The Ouroboros Loop: Self-Correcting Security Architecture
How Layer 0 Shadow learns from Layer 7 telemetry to improve itself
What is the Ouroboros Loop?
The Ouroboros (ancient symbol of a snake eating its own tail) represents a self-referential, self-improving system. In Layer 0 Shadow, the Ouroboros loop is the mechanism by which Layer 7 telemetry feeds back into Layer 0 risk heuristics, creating a self-correcting security substrate that learns from actual usage patterns.
The Loop Structure
Layer 7 (Telemetry)
↓
[Feedback Analysis]
↓
Layer 0 (Shadow Eval) ← [Improved Risk Heuristics]
↓
Layer 1 (Boot/Doctrine)
↓
Layer 2 (Routing)
↓
Layer 3 (MCP Tools)
↓
Layer 4 (Guardrails)
↓
Layer 5 (Terraform)
↓
Layer 6 (GitOps)
↓
Layer 7 (Telemetry) ← [Back to start]
The cycle repeats: Each query flows through all layers, and Layer 7's telemetry informs Layer 0's future classifications.
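The full cycle can be sketched as a tiny simulation. This is a hypothetical illustration, not the project's actual API: `run_cycle`, `ToyClassifier`, and the `learn` hook are invented names standing in for Layer 0 classification, Layer 7 logging, and the feedback step.

```python
# Hypothetical sketch of one Ouroboros cycle: classify (Layer 0),
# record the outcome (Layer 7), then feed the record back (feedback
# analysis). All names are illustrative.

class ToyClassifier:
    def __init__(self):
        self.forbidden = {"skip git"}

    def classify(self, query):
        q = query.lower()
        return "forbidden" if any(p in q for p in self.forbidden) else "blessed"

    def learn(self, record):
        # Placeholder: real feedback analysis would inspect guardrail outcomes.
        pass

def run_cycle(query, classifier, telemetry_log):
    classification = classifier.classify(query)      # Layer 0
    outcome = "blocked" if classification == "forbidden" else "success"
    record = {"query": query,
              "layer0_classification": classification,
              "outcome": outcome}                    # Layer 7
    telemetry_log.append(record)
    classifier.learn(record)                         # feedback step
    return record
```

The essential property is that the classifier object survives across cycles, so anything `learn` stores changes how the next query is classified.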
How It Works: Step by Step
Phase 1: Initial Query (Layer 0)
Query: "add a WAF rule to block bots"
Layer 0 Evaluation:
# Current heuristics (initial state)
if "skip git" in query: → FORBIDDEN
if "dashboard" in query: → FORBIDDEN
if "disable guardrails" in query: → CATASTROPHIC
# ... other patterns
# This query: "add a WAF rule to block bots"
# Classification: BLESSED (no violations detected)
# Action: HANDOFF_TO_LAYER1
Result: Query passes through all layers, completes successfully.
Phase 2: Telemetry Collection (Layer 7)
After processing completes, Layer 7 logs:
{
"timestamp": "2025-12-10T14:23:45Z",
"query": "add a WAF rule to block bots",
"agent": "cloudflare-ops",
"tools_used": ["gh_grep", "filesystem", "waf_intelligence"],
"guardrails_passed": true,
"terraform_generated": true,
"pr_created": true,
"pr_number": 42,
"confidence": 92,
"threat_type": "scanner",
"layer0_classification": "blessed",
"layer0_risk_score": 0,
"processing_time_ms": 1250,
"outcome": "success"
}
Location: observatory/cognition_flow_logs.jsonl
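Appending records to a JSONL file like the one above is a one-liner per event. This is a minimal sketch, assuming each record is a flat JSON-serializable dict; the function name and default path mirror the example but are not the project's confirmed API.

```python
import json
import pathlib

def log_telemetry(record: dict,
                  path: str = "observatory/cognition_flow_logs.jsonl") -> None:
    """Append one telemetry record as a single JSON line (JSONL)."""
    p = pathlib.Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)  # create observatory/ if missing
    with p.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
```

JSONL (one JSON object per line) suits telemetry because appends are atomic enough for a single writer and the file can be scanned line-by-line without loading the whole history.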
Phase 3: Feedback Analysis (Between Layer 7 and Layer 0)
The system analyzes telemetry to identify patterns:
Pattern 1: False Negatives (Missed Threats)
Example: A query was classified as BLESSED but later triggered guardrail warnings.
Telemetry:
{
"query": "update the WAF to allow all traffic",
"layer0_classification": "blessed",
"layer0_risk_score": 0,
"guardrails_passed": false,
"guardrail_warnings": ["zero_trust_violation", "security_risk"],
"outcome": "blocked_by_guardrails"
}
Learning: Layer 0 should have classified this as FORBIDDEN or AMBIGUOUS.
Heuristic Update:
# New pattern learned
if "allow all traffic" in query: → FORBIDDEN
if "bypass security" in query: → FORBIDDEN
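The false-negative sweep described above amounts to a simple scan over the telemetry log. The field names (`layer0_classification`, `outcome`) match the telemetry examples in this document; the function name is illustrative.

```python
def find_false_negatives(logs: list[dict]) -> list[str]:
    """Return queries Layer 0 blessed that guardrails later had to block."""
    return [log["query"] for log in logs
            if log.get("layer0_classification") == "blessed"
            and log.get("outcome") == "blocked_by_guardrails"]
```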
Pattern 2: False Positives (Over-Blocking)
Example: A query was classified as FORBIDDEN but was actually legitimate.
Telemetry:
{
"query": "check the dashboard for current WAF rules",
"layer0_classification": "forbidden",
"layer0_risk_score": 3,
"layer0_reason": "governance_violation",
"outcome": "blocked_by_layer0",
"user_feedback": "legitimate_read_only_query"
}
Learning: "dashboard" in read-only context should be allowed.
Heuristic Update:
# Refined pattern
# Refined pattern (note the parentheses: without them, `and` binds
# tighter than `or` and any query containing "check" would match)
if "dashboard" in query and ("read" in query or "check" in query):
    → BLESSED (read-only operations)
elif "dashboard" in query and ("change" in query or "update" in query):
    → FORBIDDEN (write operations)
Pattern 3: Ambiguity Detection Improvement
Example: Queries that should have been flagged as ambiguous.
Telemetry:
{
"query": "fix it",
"layer0_classification": "blessed",
"layer0_risk_score": 0,
"agent": "cloudflare-ops",
"tools_used": ["filesystem"],
"guardrails_passed": true,
"terraform_generated": false,
"outcome": "incomplete",
"user_clarification_required": true
}
Learning: Very short queries (< 3 words) should be AMBIGUOUS, not BLESSED.
Heuristic Update:
# Improved ambiguity detection
if len(query.split()) <= 2 and not query.endswith("?"):
→ AMBIGUOUS (needs clarification)
Phase 4: Heuristic Update (Layer 0 Re-Awakens)
Layer 0's classifier is updated with new patterns:
class ShadowClassifier:
    def __init__(self):
        # Initial patterns (static)
        self.catastrophic_patterns = [
            "disable guardrails",
            "override agent permissions",
            "bypass governance",
            "self-modifying",
        ]
        self.forbidden_patterns = [
            "skip git",
            "apply directly",
            "dashboard",  # ← Refined: read-only allowed
            "manual change",
        ]
        # Learned patterns (from telemetry)
        self.learned_forbidden = [
            "allow all traffic",  # ← Learned from false negative
            "bypass security",    # ← Learned from false negative
        ]
        self.learned_ambiguous = [
            # Short queries (< 3 words) → AMBIGUOUS
        ]

    def classify(self, query: str) -> ShadowEvalResult:
        q = query.lower().strip()
        # Check learned patterns first (more specific)
        if any(pattern in q for pattern in self.learned_forbidden):
            return ShadowEvalResult(
                classification=Classification.FORBIDDEN,
                reason="learned_pattern",
                risk_score=3,
                flags=["telemetry_learned"],
            )
        # Then check static patterns
        # ... existing logic
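The two-tier lookup (learned patterns before static ones) can be reduced to a self-contained, runnable sketch. `MiniShadow` is a hypothetical stand-in for the real `ShadowClassifier`, with strings instead of the `ShadowEvalResult` type so the example runs on its own:

```python
class MiniShadow:
    """Minimal two-tier lookup: learned patterns checked before static ones."""

    def __init__(self):
        self.static_forbidden = ["skip git", "apply directly"]
        self.learned_forbidden = ["allow all traffic", "bypass security"]

    def classify(self, query: str) -> str:
        q = query.lower().strip()
        # Learned patterns first: they encode the most recent evidence.
        if any(p in q for p in self.learned_forbidden):
            return "forbidden:learned_pattern"
        if any(p in q for p in self.static_forbidden):
            return "forbidden:static_pattern"
        return "blessed"
```

Checking learned patterns first also makes the provenance of a block visible in the result, which is useful when auditing what the loop has taught the classifier.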
What Telemetry Feeds Back?
Layer 7 Logs (Complete Query Lifecycle)
{
"timestamp": "ISO-8601",
"query": "original user query",
"layer0_classification": "blessed | ambiguous | forbidden | catastrophic",
"layer0_risk_score": 0-5,
"layer0_reason": "classification reason",
"layer0_trace_id": "uuid-v4",
"agent": "cloudflare-ops | security-audit | data-engineer",
"tools_used": ["gh_grep", "filesystem", "waf_intelligence"],
"guardrails_passed": true | false,
"guardrail_warnings": ["list of warnings"],
"terraform_generated": true | false,
"pr_created": true | false,
"pr_number": 42,
"confidence": 0-100,
"threat_type": "scanner | bot | ddos",
"processing_time_ms": 1250,
"outcome": "success | blocked | incomplete | error",
"user_feedback": "optional user correction"
}
Key Metrics for Learning
- Classification Accuracy
  - Compare layer0_classification vs outcome
  - False positives (over-blocking)
  - False negatives (missed threats)
- Risk Score Calibration
  - Compare layer0_risk_score vs actual risk (from guardrails)
  - Adjust risk thresholds based on outcomes
- Pattern Effectiveness
  - Which patterns catch real threats?
  - Which patterns cause false positives?
- Resource Efficiency
  - processing_time_ms for blocked queries (should be near zero)
  - Queries that should have been blocked earlier
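The first two metrics can be computed directly from the telemetry records shown earlier. This sketch assumes the field values used in this document's examples (`blocked_by_guardrails`, a `user_feedback` string starting with `legitimate`); the function name is illustrative.

```python
def classification_metrics(logs: list[dict]) -> dict:
    """Count Layer 0 false negatives and false positives from telemetry."""
    # False negative: blessed by Layer 0, later blocked by guardrails.
    false_negatives = sum(
        1 for log in logs
        if log.get("layer0_classification") == "blessed"
        and log.get("outcome") == "blocked_by_guardrails")
    # False positive: blocked by Layer 0, but the user flagged it legitimate.
    false_positives = sum(
        1 for log in logs
        if log.get("layer0_classification") == "forbidden"
        and log.get("user_feedback", "").startswith("legitimate"))
    return {"false_negatives": false_negatives,
            "false_positives": false_positives}
```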
Self-Correction Examples
Example 1: Learning New Threat Patterns
Initial State:
# Layer 0 doesn't know about "terraform destroy" risks
if "terraform destroy" in query:
→ BLESSED (not in forbidden patterns)
After Processing:
{
"query": "terraform destroy production",
"layer0_classification": "blessed",
"guardrails_passed": false,
"guardrail_warnings": ["destructive_operation", "production_risk"],
"outcome": "blocked_by_guardrails"
}
Learning:
# New pattern learned
if "terraform destroy" in query:
→ FORBIDDEN (destructive operation)
Next Query:
# Query: "terraform destroy staging"
# Classification: FORBIDDEN (learned pattern)
# Action: HANDOFF_TO_GUARDRAILS (immediate)
# Result: Blocked before any processing
Example 2: Refining Ambiguity Detection
Initial State:
# Very short queries
if len(query.split()) <= 2:
→ AMBIGUOUS
After Processing:
{
"query": "git status",
"layer0_classification": "ambiguous",
"outcome": "success",
"user_feedback": "common_command_should_be_blessed"
}
Learning:
# Refined: Common commands are blessed
common_commands = ["git status", "terraform plan", "terraform validate"]
if query.lower() in common_commands:
→ BLESSED
elif len(query.split()) <= 2:
→ AMBIGUOUS
Example 3: Multi-Account Risk Weighting
Initial State:
# All queries treated equally
if "skip git" in query:
→ FORBIDDEN (risk_score: 3)
After Processing:
{
"query": "skip git and apply to production",
"layer0_classification": "forbidden",
"layer0_risk_score": 3,
"account": "production",
"outcome": "blocked",
"actual_risk": "critical" # Higher than risk_score 3
}
Learning:
# Production account queries need higher risk scores
if "production" in query and "skip git" in query:
→ FORBIDDEN (risk_score: 5) # Increased from 3
elif "skip git" in query:
→ FORBIDDEN (risk_score: 3)
Current Implementation Status
✅ What's Implemented
- Layer 0 Classification - Four-tier system (blessed/ambiguous/forbidden/catastrophic)
- Layer 7 Telemetry - Logging structure defined
- Preboot Logging - Violations logged to preboot_shield.jsonl
- Trace IDs - Each query has a unique trace ID for correlation
🚧 What's Planned (Future Enhancements)
From LAYER0_SHADOW.md Section 9:
- Threat-Signature Learning
  - Analyze forbidden queries to extract new patterns
  - Automatically update ShadowClassifier patterns
- Multi-Account Risk Weighting
  - Different risk scores for production vs staging
  - Account-specific pattern matching
- Synthetic Replay Mode
  - Replay historical queries to test new heuristics
  - Audit reconstruction for compliance
- Metacognitive Hints
  - Improve ambiguity detection with context
  - Better understanding of user intent
Implementation Architecture
Current: Static Patterns
class ShadowClassifier:
    def classify(self, query: str) -> ShadowEvalResult:
        # Static pattern matching
        if "skip git" in query:
            return FORBIDDEN
        # ... more static patterns
Future: Dynamic Learning
class ShadowClassifier:
    def __init__(self):
        self.static_patterns = {...}  # Initial patterns
        self.learned_patterns = {}    # From telemetry
        self.risk_weights = {}        # Account-specific weights

    def classify(self, query: str) -> ShadowEvalResult:
        # Check learned patterns first (more specific)
        result = self._check_learned_patterns(query)
        if result:
            return result
        # Then check static patterns
        return self._check_static_patterns(query)

    def update_from_telemetry(self, telemetry_log: dict):
        """Update heuristics based on Layer 7 telemetry"""
        if telemetry_log["outcome"] == "blocked_by_guardrails":
            # False negative: should have been caught by Layer 0
            self._learn_forbidden_pattern(telemetry_log["query"])
        elif (telemetry_log["outcome"] == "success"
              and telemetry_log["layer0_classification"] == "forbidden"):
            # False positive: over-blocked
            self._refine_pattern(telemetry_log["query"])
The Feedback Loop in Action
Cycle 1: Initial State
Query: "skip git and apply directly"
- Layer 0: FORBIDDEN (static pattern)
- Layer 7: Logs violation
- Learning: Pattern works correctly
Cycle 2: New Threat Pattern
Query: "terraform destroy production infrastructure"
- Layer 0: BLESSED (not in patterns)
- Layer 4 (Guardrails): Blocks (destructive operation)
- Layer 7: Logs false negative
- Learning: Add "terraform destroy" to forbidden patterns
Cycle 3: Improved Detection
Query: "terraform destroy staging"
- Layer 0: FORBIDDEN (learned pattern)
- Action: Blocked immediately (no processing)
- Layer 7: Logs successful early block
- Learning: Pattern confirmed effective
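Cycles 2 and 3 can be reproduced with a toy classifier that turns a guardrail block into a learned pattern. Everything here is illustrative: the real system's learning step is planned, not implemented, and the "learn a two-word prefix" rule is a deliberately crude stand-in for proper signature extraction.

```python
class LearningClassifier:
    """Toy classifier that turns a guardrail block into a learned pattern."""

    def __init__(self):
        self.forbidden = ["skip git"]

    def classify(self, query: str) -> str:
        q = query.lower()
        return "forbidden" if any(p in q for p in self.forbidden) else "blessed"

    def learn_from(self, record: dict) -> None:
        # A blessed query blocked downstream is a false negative:
        # learn a crude two-word prefix as a new forbidden pattern.
        if (record["layer0_classification"] == "blessed"
                and record["outcome"] == "blocked_by_guardrails"):
            prefix = " ".join(record["query"].lower().split()[:2])
            self.forbidden.append(prefix)
```

After one false negative on "terraform destroy production infrastructure", the learned prefix "terraform destroy" blocks "terraform destroy staging" immediately, matching the Cycle 3 behavior described above.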
Benefits of the Ouroboros Loop
1. Self-Improving Security
- Learns from actual threats
- Adapts to new attack patterns
- Reduces false positives over time
2. Resource Efficiency
- Catches threats earlier (Layer 0 vs Layer 4)
- Prevents wasted processing on bad queries
- Improves system performance
3. Governance Enforcement
- Learns infrastructure-specific violations
- Adapts to organizational policies
- Enforces GitOps/Terraform rules automatically
4. Reduced Maintenance
- Less manual pattern updates
- Automatic threat detection
- Self-correcting without human intervention
Comparison to Static Systems
Static System (Industry Standard)
Patterns defined once → Never change → Manual updates required
Problems:
- ❌ Can't adapt to new threats
- ❌ Requires manual updates
- ❌ False positives/negatives persist
- ❌ No learning from mistakes
Ouroboros Loop (Layer 0 Shadow)
Patterns → Learn from outcomes → Improve patterns → Better detection
Benefits:
- ✅ Adapts to new threats automatically
- ✅ Self-improving without manual updates
- ✅ Reduces false positives/negatives over time
- ✅ Learns from actual usage patterns
Philosophical Foundation
From RED-BOOK.md - The Fourfold Work:
- Nigredo (Black) - Breakdown, dissolution: Layer 0 detects violations (breakdown of governance)
- Albedo (White) - Purification, clarity: Layer 7 telemetry provides clarity on what happened
- Citrinitas (Yellow) - Insight, pattern recognition: feedback analysis identifies patterns
- Rubedo (Red) - Integration, completion: Layer 0 heuristics updated (integration of learning)
The Ouroboros loop completes the Work: Each violation (Nigredo) becomes learning (Albedo) → insight (Citrinitas) → improvement (Rubedo) → better protection (back to Nigredo prevention).
Future Enhancements: Detailed Plans
1. Threat-Signature Learning
Implementation:
def analyze_forbidden_queries(telemetry_logs: List[dict]) -> List[str]:
    """Extract common patterns from forbidden queries"""
    patterns = []
    for log in telemetry_logs:
        if log["layer0_classification"] == "forbidden":
            # Extract key phrases
            patterns.extend(extract_patterns(log["query"]))
    return most_common_patterns(patterns)
Example:
- 10 queries with "skip git" → Add to forbidden patterns
- 5 queries with "terraform destroy" → Add to forbidden patterns
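The helpers referenced above (`extract_patterns`, `most_common_patterns`) are not defined in this document; one plausible implementation treats every two-word phrase as a candidate signature and keeps phrases seen in enough distinct forbidden queries. The threshold and n-gram size are assumptions, and this variant takes the logs directly rather than a pre-extracted pattern list:

```python
from collections import Counter

def extract_patterns(query: str, n: int = 2) -> list[str]:
    """All lowercase n-word phrases in a query (crude signature candidates)."""
    words = query.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def most_common_patterns(logs: list[dict], threshold: int = 3) -> list[str]:
    """Phrases appearing in at least `threshold` distinct forbidden queries."""
    counts = Counter()
    for log in logs:
        if log.get("layer0_classification") == "forbidden":
            # set() so one query can't inflate a phrase's count
            counts.update(set(extract_patterns(log["query"])))
    return [phrase for phrase, c in counts.items() if c >= threshold]
```

A frequency threshold guards against learning one-off phrasing as a permanent pattern; real signature learning would also need a review step before patterns go live.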
2. Multi-Account Risk Weighting
Implementation:
import math

def calculate_risk_score(query: str, account: str) -> int:
    base_score = get_base_risk(query)
    # Production accounts = higher risk; round up so 3 → 5, cap at 5
    if account == "production":
        return min(math.ceil(base_score * 1.5), 5)
    return base_score
Example:
- "skip git" in staging → risk_score: 3
- "skip git" in production → risk_score: 5 (catastrophic)
3. Synthetic Replay Mode
Implementation:
def replay_historical_queries(new_heuristics: ShadowClassifier):
    """Test new heuristics against historical queries"""
    historical_logs = load_telemetry_logs()
    for log in historical_logs:
        new_classification = new_heuristics.classify(log["query"])
        old_classification = log["layer0_classification"]
        if new_classification != old_classification:
            print(f"Changed: {log['query']}")
            print(f"  Old: {old_classification}")
            print(f"  New: {new_classification}")
Use Case: Before deploying new heuristics, replay last 1000 queries to ensure no regressions.
4. Metacognitive Hints
Implementation:
def classify_with_context(query: str, context: dict) -> ShadowEvalResult:
    """Use context to improve classification"""
    # Context includes:
    # - Previous queries in session
    # - User's role (admin, developer, etc.)
    # - Current working directory
    # - Recent file changes
    if context["user_role"] == "admin" and "production" in query:
        # Admins querying production = higher scrutiny
        return classify_with_higher_risk(query)
    return standard_classify(query)
Example:
- "update WAF" from admin → BLESSED
- "update WAF" from developer → AMBIGUOUS (needs clarification)
Summary
The Ouroboros Loop is a self-correcting security architecture that:
- Collects telemetry from Layer 7 (complete query lifecycle)
- Analyzes patterns to identify false positives/negatives
- Updates heuristics in Layer 0 based on actual outcomes
- Improves detection over time without manual intervention
Key Innovation: Unlike static security systems, Layer 0 Shadow learns from its mistakes and adapts to new threats automatically, creating a self-improving security substrate that becomes more effective over time.
Current Status: Architecture defined, telemetry structure in place, learning mechanisms planned for future implementation.
The Loop: Layer 7 → Analysis → Layer 0 → Layer 1 → ... → Layer 7 (repeat)
References
- LAYER0_SHADOW.md - Layer 0 specification
- COGNITION_FLOW.md - 8-layer architecture
- RED-BOOK.md - Philosophical foundation
- DEMO_COGNITION.md - Real-world examples
Last Updated: 2025-12-10
Status: 🟢 Architecture Defined, Learning Mechanisms Planned
Ouroboros Loop: Active (Telemetry → Analysis → Improvement)