vm-cloudflare/OUROBOROS_LOOP_EXPLAINED.md
2025-12-17 00:02:39 +00:00

# The Ouroboros Loop: Self-Correcting Security Architecture
**How Layer 0 Shadow learns from Layer 7 telemetry to improve itself**

---
## What is the Ouroboros Loop?
The **Ouroboros** (an ancient symbol of a snake eating its own tail) represents a self-referential, self-improving system. In Layer 0 Shadow, the Ouroboros loop is the mechanism by which **Layer 7 telemetry feeds back into Layer 0 risk heuristics**, creating a self-correcting security substrate that learns from actual usage patterns.

---
## The Loop Structure
```
Layer 7 (Telemetry)
    ↓
[Feedback Analysis]
    ↓
Layer 0 (Shadow Eval) ← [Improved Risk Heuristics]
    ↓
Layer 1 (Boot/Doctrine)
    ↓
Layer 2 (Routing)
    ↓
Layer 3 (MCP Tools)
    ↓
Layer 4 (Guardrails)
    ↓
Layer 5 (Terraform)
    ↓
Layer 6 (GitOps)
    ↓
Layer 7 (Telemetry) ← [Back to start]
```
**The cycle repeats:** Each query flows through all layers, and Layer 7's telemetry informs Layer 0's future classifications.

---
## How It Works: Step by Step
### Phase 1: Initial Query (Layer 0)
**Query:** "add a WAF rule to block bots"
**Layer 0 Evaluation:**
```python
# Current heuristics (initial state)
query = "add a WAF rule to block bots"

if "skip git" in query:
    classification = "FORBIDDEN"
elif "dashboard" in query:
    classification = "FORBIDDEN"
elif "disable guardrails" in query:
    classification = "CATASTROPHIC"
# ... other patterns
else:
    classification = "BLESSED"  # no violations detected

# This query: classification == "BLESSED"
# Action: HANDOFF_TO_LAYER1
```
**Result:** The query passes through all layers and completes successfully.

---
### Phase 2: Telemetry Collection (Layer 7)
**After processing completes, Layer 7 logs:**
```json
{
"timestamp": "2025-12-10T14:23:45Z",
"query": "add a WAF rule to block bots",
"agent": "cloudflare-ops",
"tools_used": ["gh_grep", "filesystem", "waf_intelligence"],
"guardrails_passed": true,
"terraform_generated": true,
"pr_created": true,
"pr_number": 42,
"confidence": 92,
"threat_type": "scanner",
"layer0_classification": "blessed",
"layer0_risk_score": 0,
"processing_time_ms": 1250,
"outcome": "success"
}
```
**Location:** `observatory/cognition_flow_logs.jsonl`
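
Because the log is a JSON Lines file, each record occupies one line and can be appended without rewriting the file. The following is a minimal sketch of writing and reading such records, assuming plain UTF-8 JSONL; the helper names `append_record` and `load_records` are hypothetical and not part of the project.

```python
import json
from pathlib import Path

# Path taken from the document; the helper names below are hypothetical.
LOG_PATH = Path("observatory/cognition_flow_logs.jsonl")


def append_record(record: dict, path: Path = LOG_PATH) -> None:
    """Append one telemetry record as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_records(path: Path = LOG_PATH) -> list[dict]:
    """Read every non-empty line back into a list of dicts."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```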

---
### Phase 3: Feedback Analysis (Between Layer 7 and Layer 0)
**The system analyzes telemetry to identify patterns:**
#### Pattern 1: False Negatives (Missed Threats)
**Example:** A query was classified as BLESSED but later triggered guardrail warnings.
**Telemetry:**
```json
{
"query": "update the WAF to allow all traffic",
"layer0_classification": "blessed",
"layer0_risk_score": 0,
"guardrails_passed": false,
"guardrail_warnings": ["zero_trust_violation", "security_risk"],
"outcome": "blocked_by_guardrails"
}
```
**Learning:** Layer 0 should have classified this as FORBIDDEN or AMBIGUOUS.
**Heuristic Update:**
```python
# New patterns learned
if "allow all traffic" in query:
    classification = "FORBIDDEN"
if "bypass security" in query:
    classification = "FORBIDDEN"
```
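
One way to surface such false negatives is to filter telemetry for records that Layer 0 blessed but the guardrails rejected. This is only a sketch: the function name `find_false_negatives` is hypothetical, and it assumes the field names shown in the telemetry example above.

```python
def find_false_negatives(records: list[dict]) -> list[dict]:
    """Return records that Layer 0 blessed but guardrails later rejected."""
    return [
        r for r in records
        if r.get("layer0_classification") == "blessed"
        and r.get("guardrails_passed") is False
    ]


records = [
    {"query": "add a WAF rule to block bots",
     "layer0_classification": "blessed", "guardrails_passed": True},
    {"query": "update the WAF to allow all traffic",
     "layer0_classification": "blessed", "guardrails_passed": False},
]
missed = find_false_negatives(records)
print([r["query"] for r in missed])
```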
#### Pattern 2: False Positives (Over-Blocking)
**Example:** A query was classified as FORBIDDEN but was actually legitimate.
**Telemetry:**
```json
{
"query": "check the dashboard for current WAF rules",
"layer0_classification": "forbidden",
"layer0_risk_score": 3,
"layer0_reason": "governance_violation",
"outcome": "blocked_by_layer0",
"user_feedback": "legitimate_read_only_query"
}
```
**Learning:** "dashboard" in read-only context should be allowed.
**Heuristic Update:**
```python
# Refined pattern: parentheses are required, because `and` binds tighter than `or`
if "dashboard" in query and ("read" in query or "check" in query):
    classification = "BLESSED"  # read-only operations
elif "dashboard" in query and ("change" in query or "update" in query):
    classification = "FORBIDDEN"  # write operations
```
#### Pattern 3: Ambiguity Detection Improvement
**Example:** Queries that should have been flagged as ambiguous.
**Telemetry:**
```json
{
"query": "fix it",
"layer0_classification": "blessed",
"layer0_risk_score": 0,
"agent": "cloudflare-ops",
"tools_used": ["filesystem"],
"guardrails_passed": true,
"terraform_generated": false,
"outcome": "incomplete",
"user_clarification_required": true
}
```
**Learning:** Very short queries (< 3 words) should be AMBIGUOUS, not BLESSED.
**Heuristic Update:**
```python
# Improved ambiguity detection
if len(query.split()) <= 2 and not query.endswith("?"):
    classification = "AMBIGUOUS"  # needs clarification
```
---
### Phase 4: Heuristic Update (Layer 0 Re-Awakens)
**Layer 0's classifier is updated with new patterns:**
```python
class ShadowClassifier:
    def __init__(self):
        # Initial patterns (static)
        self.catastrophic_patterns = [
            "disable guardrails",
            "override agent permissions",
            "bypass governance",
            "self-modifying",
        ]
        self.forbidden_patterns = [
            "skip git",
            "apply directly",
            "dashboard",  # ← Refined: read-only allowed
            "manual change",
        ]
        # Learned patterns (from telemetry)
        self.learned_forbidden = [
            "allow all traffic",  # ← Learned from false negative
            "bypass security",  # ← Learned from false negative
        ]
        self.learned_ambiguous = [
            # Short queries (< 3 words) → AMBIGUOUS
        ]

    def classify(self, query: str) -> ShadowEvalResult:
        q = query.lower().strip()
        # Check learned patterns first (more specific)
        if any(pattern in q for pattern in self.learned_forbidden):
            return ShadowEvalResult(
                classification=Classification.FORBIDDEN,
                reason="learned_pattern",
                risk_score=3,
                flags=["telemetry_learned"],
            )
        # Then check static patterns
        # ... existing logic
```
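
The excerpt above relies on `ShadowEvalResult` and `Classification`, which are defined elsewhere in the project. To experiment with the learned-first ordering in isolation, a self-contained sketch might look like the following. Everything here is an assumption for illustration: the class `MiniShadowClassifier`, the reason strings other than `learned_pattern`, and the risk scores for the catastrophic and ambiguous tiers are not taken from the real implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    BLESSED = "blessed"
    AMBIGUOUS = "ambiguous"
    FORBIDDEN = "forbidden"
    CATASTROPHIC = "catastrophic"


@dataclass
class ShadowEvalResult:
    classification: Classification
    reason: str
    risk_score: int
    flags: list[str] = field(default_factory=list)


class MiniShadowClassifier:
    """Hypothetical, trimmed-down stand-in for ShadowClassifier."""

    def __init__(self) -> None:
        self.catastrophic_patterns = ["disable guardrails", "bypass governance"]
        self.forbidden_patterns = ["skip git", "apply directly", "manual change"]
        self.learned_forbidden = ["allow all traffic", "bypass security"]

    def classify(self, query: str) -> ShadowEvalResult:
        q = query.lower().strip()
        # Learned patterns are checked first, mirroring the excerpt above.
        if any(p in q for p in self.learned_forbidden):
            return ShadowEvalResult(
                Classification.FORBIDDEN, "learned_pattern", 3, ["telemetry_learned"]
            )
        if any(p in q for p in self.catastrophic_patterns):
            return ShadowEvalResult(Classification.CATASTROPHIC, "static_pattern", 5)
        if any(p in q for p in self.forbidden_patterns):
            return ShadowEvalResult(Classification.FORBIDDEN, "static_pattern", 3)
        # Short queries without a question mark need clarification.
        if len(q.split()) <= 2 and not q.endswith("?"):
            return ShadowEvalResult(Classification.AMBIGUOUS, "short_query", 1)
        return ShadowEvalResult(Classification.BLESSED, "no_violation", 0)


clf = MiniShadowClassifier()
for text in ["update the WAF to allow all traffic", "fix it", "add a WAF rule to block bots"]:
    print(text, "->", clf.classify(text).classification.value)
```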
---
## What Telemetry Feeds Back?
### Layer 7 Logs (Complete Query Lifecycle)
```json
{
"timestamp": "ISO-8601",
"query": "original user query",
"layer0_classification": "blessed | ambiguous | forbidden | catastrophic",
"layer0_risk_score": 0-5,
"layer0_reason": "classification reason",
"layer0_trace_id": "uuid-v4",
"agent": "cloudflare-ops | security-audit | data-engineer",
"tools_used": ["gh_grep", "filesystem", "waf_intelligence"],
"guardrails_passed": true | false,
"guardrail_warnings": ["list of warnings"],
"terraform_generated": true | false,
"pr_created": true | false,
"pr_number": 42,
"confidence": 0-100,
"threat_type": "scanner | bot | ddos",
"processing_time_ms": 1250,
"outcome": "success | blocked | incomplete | error",
"user_feedback": "optional user correction"
}
```
### Key Metrics for Learning
1. **Classification Accuracy**
- `layer0_classification` vs `outcome`
- False positives (over-blocking)
- False negatives (missed threats)
2. **Risk Score Calibration**
- `layer0_risk_score` vs actual risk (from guardrails)
- Adjust risk thresholds based on outcomes
3. **Pattern Effectiveness**
- Which patterns catch real threats?
- Which patterns cause false positives?
4. **Resource Efficiency**
- `processing_time_ms` for blocked queries (should be near zero, since Layer 0 blocks them before any further processing)
- Queries that should have been blocked earlier
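
The classification-accuracy metric can be approximated by counting mismatches between Layer 0's verdict and what happened afterwards. The sketch below is an illustration under two assumptions: a blessed query that fails guardrails counts as a false negative, and a forbidden query that carries any `user_feedback` counts as a false positive (as in the dashboard example earlier). The function name `summarize_layer0_accuracy` is hypothetical.

```python
from collections import Counter


def summarize_layer0_accuracy(records: list[dict]) -> Counter:
    """Count false negatives, false positives, and consistent verdicts."""
    counts: Counter = Counter()
    for r in records:
        verdict = r.get("layer0_classification")
        if verdict == "blessed" and r.get("guardrails_passed") is False:
            counts["false_negative"] += 1  # missed threat
        elif verdict == "forbidden" and r.get("user_feedback"):
            counts["false_positive"] += 1  # over-blocking reported by the user
        else:
            counts["consistent"] += 1
    return counts


sample = [
    {"layer0_classification": "blessed", "guardrails_passed": True},
    {"layer0_classification": "blessed", "guardrails_passed": False},
    {"layer0_classification": "forbidden", "user_feedback": "legitimate_read_only_query"},
]
print(dict(summarize_layer0_accuracy(sample)))
```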
---
## Self-Correction Examples
### Example 1: Learning New Threat Patterns
**Initial State:**
```python
# Layer 0 doesn't know about "terraform destroy" risks
if "terraform destroy" in query:
    classification = "BLESSED"  # not in forbidden patterns
```
**After Processing:**
```json
{
"query": "terraform destroy production",
"layer0_classification": "blessed",
"guardrails_passed": false,
"guardrail_warnings": ["destructive_operation", "production_risk"],
"outcome": "blocked_by_guardrails"
}
```
**Learning:**
```python
# New pattern learned
if "terraform destroy" in query:
    classification = "FORBIDDEN"  # destructive operation
```
**Next Query:**
```python
# Query: "terraform destroy staging"
# Classification: FORBIDDEN (learned pattern)
# Action: HANDOFF_TO_GUARDRAILS (immediate)
# Result: Blocked before any processing
```
---
### Example 2: Refining Ambiguity Detection
**Initial State:**
```python
# Very short queries
if len(query.split()) <= 2:
    classification = "AMBIGUOUS"
```
**After Processing:**
```json
{
"query": "git status",
"layer0_classification": "ambiguous",
"outcome": "success",
"user_feedback": "common_command_should_be_blessed"
}
```
**Learning:**
```python
# Refined: Common commands are blessed
common_commands = ["git status", "terraform plan", "terraform validate"]
if query.lower().strip() in common_commands:
    classification = "BLESSED"
elif len(query.split()) <= 2:
    classification = "AMBIGUOUS"
```
---
### Example 3: Multi-Account Risk Weighting
**Initial State:**
```python
# All queries treated equally
if "skip git" in query:
    classification, risk_score = "FORBIDDEN", 3
```
**After Processing** (the reported `actual_risk` is higher than a risk score of 3 implies):
```json
{
"query": "skip git and apply to production",
"layer0_classification": "forbidden",
"layer0_risk_score": 3,
"account": "production",
"outcome": "blocked",
"actual_risk": "critical"
}
```
**Learning:**
```python
# Production account queries need higher risk scores
if "production" in query and "skip git" in query:
    classification, risk_score = "FORBIDDEN", 5  # increased from 3
elif "skip git" in query:
    classification, risk_score = "FORBIDDEN", 3
```
---
## Current Implementation Status
### ✅ What's Implemented
1. **Layer 0 Classification** - Four-tier system (blessed/ambiguous/forbidden/catastrophic)
2. **Layer 7 Telemetry** - Logging structure defined
3. **Preboot Logging** - Violations logged to `preboot_shield.jsonl`
4. **Trace IDs** - Each query has unique trace ID for correlation
### 🚧 What's Planned (Future Enhancements)
From `LAYER0_SHADOW.md` Section 9:
1. **Threat-Signature Learning**
- Analyze forbidden queries to extract new patterns
- Automatically update `ShadowClassifier` patterns
2. **Multi-Account Risk Weighting**
- Different risk scores for production vs staging
- Account-specific pattern matching
3. **Synthetic Replay Mode**
- Replay historical queries to test new heuristics
- Audit reconstruction for compliance
4. **Metacognitive Hints**
- Improve ambiguity detection with context
- Better understanding of user intent
---
## Implementation Architecture
### Current: Static Patterns
```python
class ShadowClassifier:
    def classify(self, query: str) -> ShadowEvalResult:
        # Static pattern matching
        if "skip git" in query:
            return FORBIDDEN
        # ... more static patterns
```
### Future: Dynamic Learning
```python
class ShadowClassifier:
    def __init__(self):
        self.static_patterns = {...}  # Initial patterns
        self.learned_patterns = {}  # From telemetry
        self.risk_weights = {}  # Account-specific weights

    def classify(self, query: str) -> ShadowEvalResult:
        # Check learned patterns first (more specific)
        result = self._check_learned_patterns(query)
        if result:
            return result
        # Then check static patterns
        return self._check_static_patterns(query)

    def update_from_telemetry(self, telemetry_log: dict):
        """Update heuristics based on Layer 7 telemetry"""
        if telemetry_log["outcome"] == "blocked_by_guardrails":
            # False negative: should have been caught by Layer 0
            self._learn_forbidden_pattern(telemetry_log["query"])
        elif telemetry_log["outcome"] == "success" and telemetry_log["layer0_classification"] == "forbidden":
            # False positive: over-blocked
            self._refine_pattern(telemetry_log["query"])
```
---
## The Feedback Loop in Action
### Cycle 1: Initial State

- **Query:** "skip git and apply directly"
- **Layer 0:** FORBIDDEN (static pattern)
- **Layer 7:** Logs violation
- **Learning:** Pattern works correctly

---

### Cycle 2: New Threat Pattern

- **Query:** "terraform destroy production infrastructure"
- **Layer 0:** BLESSED (not in patterns)
- **Layer 4 (Guardrails):** Blocks (destructive operation)
- **Layer 7:** Logs false negative
- **Learning:** Add "terraform destroy" to forbidden patterns

---

### Cycle 3: Improved Detection

- **Query:** "terraform destroy staging"
- **Layer 0:** FORBIDDEN (learned pattern)
- **Action:** Blocked immediately (no processing)
- **Layer 7:** Logs successful early block
- **Learning:** Pattern confirmed effective

---
## Benefits of the Ouroboros Loop
### 1. **Self-Improving Security**
- Learns from actual threats
- Adapts to new attack patterns
- Reduces false positives over time
### 2. **Resource Efficiency**
- Catches threats earlier (Layer 0 vs Layer 4)
- Prevents wasted processing on bad queries
- Improves system performance
### 3. **Governance Enforcement**
- Learns infrastructure-specific violations
- Adapts to organizational policies
- Enforces GitOps/Terraform rules automatically
### 4. **Reduced Maintenance**
- Less manual pattern updates
- Automatic threat detection
- Self-correcting without human intervention
---
## Comparison to Static Systems
### Static System (Industry Standard)
```
Patterns defined once → Never change → Manual updates required
```
**Problems:**
- ❌ Can't adapt to new threats
- ❌ Requires manual updates
- ❌ False positives/negatives persist
- ❌ No learning from mistakes
### Ouroboros Loop (Layer 0 Shadow)
```
Patterns → Learn from outcomes → Improve patterns → Better detection
```
**Benefits:**
- ✅ Adapts to new threats automatically
- ✅ Self-improving without manual updates
- ✅ Reduces false positives/negatives over time
- ✅ Learns from actual usage patterns
---
## Philosophical Foundation
From `RED-BOOK.md` - The Fourfold Work:
1. **Nigredo** (Black) - Breakdown, dissolution
- Layer 0 detects violations (breakdown of governance)
2. **Albedo** (White) - Purification, clarity
- Layer 7 telemetry provides clarity on what happened
3. **Citrinitas** (Yellow) - Insight, pattern recognition
- Feedback analysis identifies patterns
4. **Rubedo** (Red) - Integration, completion
- Layer 0 heuristics updated (integration of learning)
**The Ouroboros loop completes the Work:** Each violation (Nigredo) becomes learning (Albedo) → insight (Citrinitas) → improvement (Rubedo) → better protection (back to Nigredo prevention).

---
## Future Enhancements: Detailed Plans
### 1. Threat-Signature Learning
**Implementation:**
```python
from typing import List


def analyze_forbidden_queries(telemetry_logs: List[dict]) -> List[str]:
    """Extract common patterns from forbidden queries"""
    patterns = []
    for log in telemetry_logs:
        if log["layer0_classification"] == "forbidden":
            # Extract key phrases (extract_patterns is a helper defined elsewhere)
            patterns.extend(extract_patterns(log["query"]))
    # most_common_patterns is a helper defined elsewhere
    return most_common_patterns(patterns)
```
**Example:**
- 10 queries with "skip git" → Add to forbidden patterns
- 5 queries with "terraform destroy" → Add to forbidden patterns
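
The helpers `extract_patterns` and `most_common_patterns` are not specified in this document. One simple way they might work is word bigrams plus a frequency threshold, sketched below. The bigram choice and the `min_count` default of 5 are assumptions for illustration; frequency alone will also surface harmless phrases, so candidates would still need review before being added to the forbidden list.

```python
from collections import Counter


def extract_patterns(query: str, n: int = 2) -> list[str]:
    """Return the word n-grams of a query (bigrams by default)."""
    words = query.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def most_common_patterns(patterns: list[str], min_count: int = 5) -> list[str]:
    """Keep patterns seen at least min_count times, most frequent first."""
    counts = Counter(patterns)
    return [p for p, c in counts.most_common() if c >= min_count]


queries = ["skip git and deploy"] * 3 + ["please skip git"] * 2 + ["terraform destroy prod"]
candidates: list[str] = []
for q in queries:
    candidates.extend(extract_patterns(q))
print(most_common_patterns(candidates))
```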
---
### 2. Multi-Account Risk Weighting
**Implementation:**
```python
import math


def calculate_risk_score(query: str, account: str) -> int:
    base_score = get_base_risk(query)
    # Production accounts = higher risk
    if account == "production":
        # Round up so the result stays an int: 3 * 1.5 = 4.5 -> 5
        return min(math.ceil(base_score * 1.5), 5)  # Cap at 5
    return base_score
```
**Example:**
- "skip git" in staging → risk_score: 3
- "skip git" in production → risk_score: 5 (catastrophic)
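
To check the arithmetic behind these two numbers, here is a runnable sketch with a stand-in for `get_base_risk`. The `BASE_RISK` table is hypothetical (the document only fixes "skip git" at 3), and rounding the weighted score up is an assumption chosen so that 3 × 1.5 = 4.5 lands on the stated production score of 5.

```python
import math

# Hypothetical base scores; only the "skip git" value comes from the document.
BASE_RISK = {"skip git": 3, "disable guardrails": 5}


def get_base_risk(query: str) -> int:
    """Return the highest base score of any pattern found in the query."""
    q = query.lower()
    return max((score for pattern, score in BASE_RISK.items() if pattern in q), default=0)


def calculate_risk_score(query: str, account: str) -> int:
    """Weight production queries by 1.5, rounded up and capped at 5."""
    base = get_base_risk(query)
    if account == "production":
        return min(math.ceil(base * 1.5), 5)
    return base


print(calculate_risk_score("skip git", "staging"))     # 3
print(calculate_risk_score("skip git", "production"))  # 5
```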
---
### 3. Synthetic Replay Mode
**Implementation:**
```python
def replay_historical_queries(new_heuristics: ShadowClassifier):
    """Test new heuristics against historical queries"""
    historical_logs = load_telemetry_logs()
    for log in historical_logs:
        # classify() returns a ShadowEvalResult; compare its label with the logged
        # string (assumes Classification values match the telemetry strings).
        new_classification = new_heuristics.classify(log["query"]).classification.value
        old_classification = log["layer0_classification"]
        if new_classification != old_classification:
            print(f"Changed: {log['query']}")
            print(f"  Old: {old_classification}")
            print(f"  New: {new_classification}")
```
**Use Case:** Before deploying new heuristics, replay last 1000 queries to ensure no regressions.
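
For a regression check it helps to collect the differences rather than print them, so they can be reviewed or asserted on. The following self-contained sketch assumes the candidate heuristic is exposed as a plain function returning a lowercase label; `replay` and `candidate_classify` are hypothetical names, and the three history records are taken from queries used earlier in this document.

```python
from typing import Callable, Iterable


def replay(records: Iterable[dict], classify: Callable[[str], str]) -> list[dict]:
    """Return one entry per record whose classification would change."""
    changes = []
    for record in records:
        old = record["layer0_classification"]
        new = classify(record["query"])
        if new != old:
            changes.append({"query": record["query"], "old": old, "new": new})
    return changes


def candidate_classify(query: str) -> str:
    """Toy candidate heuristic that adds the learned "terraform destroy" pattern."""
    q = query.lower()
    if "terraform destroy" in q or "skip git" in q:
        return "forbidden"
    return "blessed"


history = [
    {"query": "terraform destroy production", "layer0_classification": "blessed"},
    {"query": "skip git and apply directly", "layer0_classification": "forbidden"},
    {"query": "add a WAF rule to block bots", "layer0_classification": "blessed"},
]
for change in replay(history, candidate_classify):
    print(change)
```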

---
### 4. Metacognitive Hints
**Implementation:**
```python
def classify_with_context(query: str, context: dict) -> ShadowEvalResult:
    """Use context to improve classification"""
    # Context includes:
    # - Previous queries in session
    # - User's role (admin, developer, etc.)
    # - Current working directory
    # - Recent file changes
    # .get() avoids a KeyError when the role is missing from the context
    if context.get("user_role") == "admin" and "production" in query:
        # Admins querying production = higher scrutiny
        return classify_with_higher_risk(query)
    return standard_classify(query)
```
**Example:**
- "update WAF" from admin → BLESSED
- "update WAF" from developer → AMBIGUOUS (needs clarification)
---
## Summary
The **Ouroboros Loop** is a self-correcting security architecture that:
1. **Collects telemetry** from Layer 7 (complete query lifecycle)
2. **Analyzes patterns** to identify false positives/negatives
3. **Updates heuristics** in Layer 0 based on actual outcomes
4. **Improves detection** over time without manual intervention
**Key Innovation:** Unlike static security systems, Layer 0 Shadow is designed to learn from its mistakes and adapt to new threats automatically, so that detection improves over time.

**Current Status:** Architecture defined, telemetry structure in place, learning mechanisms planned for future implementation.

**The Loop:** Layer 7 → Analysis → Layer 0 → Layer 1 → ... → Layer 7 (repeat)

---
## References
- [LAYER0_SHADOW.md](LAYER0_SHADOW.md) - Layer 0 specification
- [COGNITION_FLOW.md](COGNITION_FLOW.md) - 8-layer architecture
- [RED-BOOK.md](RED-BOOK.md) - Philosophical foundation
- [DEMO_COGNITION.md](DEMO_COGNITION.md) - Real-world examples
---
**Last Updated:** 2025-12-10
**Status:** 🟢 Architecture Defined, Learning Mechanisms Planned
**Ouroboros Loop:** Telemetry → Analysis → Improvement (telemetry structure defined; analysis and improvement steps planned)