Initial commit: Cloudflare infrastructure with WAF Intelligence

- Complete Cloudflare Terraform configuration (DNS, WAF, tunnels, access)
- WAF Intelligence MCP server with threat analysis and ML classification
- GitOps automation with PR workflows and drift detection
- Observatory monitoring stack with Prometheus/Grafana
- IDE operator rules for governed development
- Security playbooks and compliance frameworks
- Autonomous remediation and state reconciliation
Vault Sovereign
2025-12-16 18:31:53 +00:00
commit 37a867c485
123 changed files with 25407 additions and 0 deletions

observatory/.env.example

@@ -0,0 +1,26 @@
# Cloudflare Mesh Observatory Environment
# Copy to .env and fill in values
# Cloudflare API Credentials
CLOUDFLARE_API_TOKEN=
CLOUDFLARE_ZONE_ID=
CLOUDFLARE_ACCOUNT_ID=
# Grafana Admin Password
GRAFANA_PASSWORD=changeme
# ==============================================
# Phase 5B - Alerting Configuration
# ==============================================
# Slack Integration
# Create incoming webhook: https://api.slack.com/messaging/webhooks
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/XXX/YYY/ZZZ
# PagerDuty Integration
# Create service integration: https://support.pagerduty.com/docs/services-and-integrations
PAGERDUTY_SERVICE_KEY=
# Email (SMTP) Settings
SMTP_USERNAME=
SMTP_PASSWORD=


@@ -0,0 +1,19 @@
# Cloudflare Metrics Exporter Container
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
RUN pip install --no-cache-dir requests
# Copy exporter script
COPY metrics-exporter.py /app/
# Non-root user
RUN useradd -r -s /sbin/nologin exporter
USER exporter
EXPOSE 9100
ENTRYPOINT ["python3", "/app/metrics-exporter.py"]
CMD ["--port", "9100"]

observatory/README.md

@@ -0,0 +1,171 @@
# Mesh Observatory
Prometheus + Grafana monitoring stack for Cloudflare infrastructure state.
## Components
| Component | Port | Description |
|-----------|------|-------------|
| Prometheus | 9090 | Metrics collection and storage |
| Grafana | 3000 | Visualization dashboards |
| Metrics Exporter | 9100 | Custom Cloudflare metrics |
## Quick Start
### 1. Configure Environment
```bash
cp .env.example .env
# Edit .env with your credentials
```
Required environment variables:
```
CLOUDFLARE_API_TOKEN=<your-token>
CLOUDFLARE_ZONE_ID=<your-zone-id>
CLOUDFLARE_ACCOUNT_ID=<your-account-id>
GRAFANA_PASSWORD=<secure-password>
```
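Before starting the stack it can help to confirm that `.env` actually sets these variables. The following is a minimal sketch, assuming plain `KEY=value` lines; the helper name `missing_env_keys` is hypothetical and not part of this repository:

```python
REQUIRED_KEYS = (
    "CLOUDFLARE_API_TOKEN",
    "CLOUDFLARE_ZONE_ID",
    "CLOUDFLARE_ACCOUNT_ID",
    "GRAFANA_PASSWORD",
)


def missing_env_keys(env_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in .env-style text."""
    values = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    # Preserve the order of `required` so the report is stable.
    return [key for key in required if not values.get(key)]
```

Calling `missing_env_keys(open(".env").read())` returns an empty list when every required variable has a value. The sketch does not handle quoting or `export` prefixes.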
### 2. Start Stack
```bash
docker-compose up -d
```
### 3. Access Dashboards
- Grafana: http://localhost:3000 (admin / $GRAFANA_PASSWORD)
- Prometheus: http://localhost:9090
## Dashboards
| Dashboard | UID | Description |
|-----------|-----|-------------|
| Cloudflare Mesh Overview | cf-overview | Main command center |
| DNS Health | cf-dns | DNS records, DNSSEC, types |
| Tunnel Status | cf-tunnel | Tunnel health, connections |
| Invariants & Compliance | cf-invariants | Invariant pass/fail, anomalies |
| Security Settings | cf-security | SSL, TLS, Access apps |
| ProofChain & Anchors | cf-proofchain | Merkle roots, snapshot freshness |
## Metrics Reference
### DNS Metrics
- `cloudflare_dns_records_total` - Total DNS records
- `cloudflare_dns_records_proxied` - Proxied records count
- `cloudflare_dns_records_unproxied` - DNS-only records count
- `cloudflare_dns_records_by_type{type="A|AAAA|CNAME|..."}` - Records by type
- `cloudflare_dnssec_enabled` - DNSSEC status (0/1)
### Tunnel Metrics
- `cloudflare_tunnels_total` - Total active tunnels
- `cloudflare_tunnels_healthy` - Tunnels with active connections
- `cloudflare_tunnels_unhealthy` - Tunnels without connections
- `cloudflare_tunnel_connections_total` - Total tunnel connections
### Zone Settings
- `cloudflare_zone_ssl_strict` - SSL mode is strict (0/1)
- `cloudflare_zone_tls_version_secure` - TLS 1.2+ enforced (0/1)
- `cloudflare_zone_always_https` - HTTPS redirect enabled (0/1)
- `cloudflare_zone_browser_check` - Browser integrity check (0/1)
### Access Metrics
- `cloudflare_access_apps_total` - Total Access applications
- `cloudflare_access_apps_by_type{type="..."}` - Apps by type
### Invariant Metrics
- `cloudflare_invariants_total` - Total invariant checks
- `cloudflare_invariants_passed` - Passing invariants
- `cloudflare_invariants_failed` - Failing invariants
- `cloudflare_invariants_pass_rate` - Pass percentage
- `cloudflare_invariant_report_age_seconds` - Report freshness
### Snapshot Metrics
- `cloudflare_snapshot_age_seconds` - Seconds since last snapshot
- `cloudflare_snapshot_merkle_root_set` - Merkle root present (0/1)
### Anomaly Metrics
- `cloudflare_anomalies_total` - Total anomaly receipts
- `cloudflare_anomalies_last_24h` - Recent anomalies
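To illustrate how gauges like the ones above reach Prometheus, here is a small sketch that renders values in the Prometheus text exposition format. It is not taken from `metrics-exporter.py`; the function names are hypothetical, label values are not escaped, and the pass rate is assumed to be a percentage on a 0-100 scale:

```python
def invariant_gauges(passed, failed):
    """Build the invariant gauges from pass/fail counts (rate assumed 0-100)."""
    total = passed + failed
    rate = 100.0 * passed / total if total else 0.0
    return {
        ("cloudflare_invariants_total", ()): total,
        ("cloudflare_invariants_passed", ()): passed,
        ("cloudflare_invariants_failed", ()): failed,
        ("cloudflare_invariants_pass_rate", ()): rate,
    }


def render_gauges(gauges):
    """Render {(name, ((label, value), ...)): number} as exposition text."""
    lines = []
    seen = set()
    for (name, labels), value in sorted(gauges.items()):
        if name not in seen:
            # Emit one TYPE line per metric family.
            lines.append(f"# TYPE {name} gauge")
            seen.add(name)
        if labels:
            label_text = ",".join(f'{key}="{val}"' for key, val in labels)
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

A labelled series such as `cloudflare_dns_records_by_type` would use a key like `("cloudflare_dns_records_by_type", (("type", "A"),))`.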
## Drift Visualizer
Standalone tool for comparing state sources.
### Usage
```bash
python3 drift-visualizer.py \
  --snapshot ../snapshots/cloudflare-latest.json \
  --manifest ../cloudflare_dns_manifest.md \
  --output-dir ../reports
```
### Output
- `drift-report-<timestamp>.json` - Machine-readable diff
- `drift-report-<timestamp>.html` - Visual HTML report
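The core of a drift comparison can be sketched as a three-way classification of records. This is an illustration only: the record shape (a mapping from `(name, type)` to content) and the function name `diff_records` are assumptions, not the actual format used by `drift-visualizer.py`:

```python
def diff_records(expected, actual):
    """Classify differences between two {(name, type): content} mappings."""
    # Present in the expected state but absent from the live state.
    missing = sorted(key for key in expected if key not in actual)
    # Present in the live state but not declared anywhere.
    unexpected = sorted(key for key in actual if key not in expected)
    # Present in both, but with different content.
    changed = sorted(
        key for key in expected
        if key in actual and expected[key] != actual[key]
    )
    return {"missing": missing, "unexpected": unexpected, "changed": changed}
```

An empty list in all three categories means the two sources agree.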
## Directory Structure
```
observatory/
├── docker-compose.yml # Stack definition
├── Dockerfile.exporter # Metrics exporter container
├── prometheus.yml # Prometheus config
├── metrics-exporter.py # Custom exporter
├── drift-visualizer.py # Drift analysis tool
├── datasources/ # Grafana datasource provisioning
│ └── prometheus.yml
├── dashboards/ # Grafana dashboard provisioning
│ ├── dashboards.yml
│ ├── cloudflare-overview.json
│ ├── dns-health.json
│ ├── tunnel-status.json
│ ├── invariants.json
│ ├── security-settings.json
│ └── proofchain.json
└── rules/ # Prometheus alerting rules (optional)
```
## Integration with CI/CD
The metrics exporter reads from:
- `../snapshots/` - State snapshots from state-reconciler.py
- `../anomalies/` - Anomaly receipts from invariant-checker.py
Ensure these directories are populated by the GitLab CI pipeline or systemd services.
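A freshness gauge such as `cloudflare_snapshot_age_seconds` could be derived from file modification times in one of these directories. The sketch below assumes that the newest file's mtime is a fair proxy for the last snapshot; the function name is hypothetical and the real exporter may compute this differently:

```python
import os
import time


def newest_file_age_seconds(directory, now=None):
    """Return seconds since the newest regular file in `directory` changed.

    Returns None when the directory is missing or contains no files.
    """
    try:
        with os.scandir(directory) as entries:
            mtimes = [entry.stat().st_mtime for entry in entries if entry.is_file()]
    except FileNotFoundError:
        return None
    if not mtimes:
        return None
    current = time.time() if now is None else now
    # Clamp at zero in case of clock skew between writer and reader.
    return max(0.0, current - max(mtimes))
```

A `None` result is worth surfacing separately from a large age, since an unpopulated directory usually means the pipeline has never run.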
## Alerting (Optional)
Create alerting rules in `rules/alerts.yml`:
```yaml
groups:
  - name: cloudflare
    rules:
      - alert: InvariantFailure
        expr: cloudflare_invariants_failed > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Cloudflare invariant check failing"
      - alert: TunnelUnhealthy
        expr: cloudflare_tunnels_unhealthy > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cloudflare tunnel has no connections"
      - alert: SnapshotStale
        expr: cloudflare_snapshot_age_seconds > 7200
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cloudflare state snapshot older than 2 hours"
```


@@ -0,0 +1,365 @@
# Alertmanager Configuration for Cloudflare Mesh Observatory
# Phase 5B - Alerts & Escalation
# NOTE: Alertmanager does not expand ${VAR} placeholders itself. Render this
# file with a templating step before starting Alertmanager.
global:
  # Default SMTP settings (override in receivers)
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'cloudflare-alerts@yourdomain.com'
  smtp_auth_username: '${SMTP_USERNAME}'
  smtp_auth_password: '${SMTP_PASSWORD}'
  smtp_require_tls: true
  # Slack API URL (set via environment)
  slack_api_url: '${SLACK_WEBHOOK_URL}'
  # PagerDuty Events API endpoint (the integration key is set per receiver)
  pagerduty_url: 'https://events.pagerduty.com/v2/enqueue'
  # Resolve timeout
  resolve_timeout: 5m
# Templates for notifications
templates:
  - '/etc/alertmanager/templates/*.tmpl'
# Routing tree
route:
  # Default receiver
  receiver: 'slack-default'
  # Group alerts by these labels
  group_by: ['alertname', 'severity', 'component']
  # Wait before sending first notification
  group_wait: 30s
  # Wait before sending notification about new alerts in group
  group_interval: 5m
  # Wait before re-sending notification
  repeat_interval: 4h
  # Child routes for different severities and components
  routes:
    # ============================================
    # PHASE 6 - GITOPS DRIFT REMEDIATION
    # Route drift alerts to GitOps webhook for auto-PR.
    # Listed first: sibling routes are evaluated in order and matching
    # stops at the first route without `continue: true`, so these routes
    # would be unreachable after the severity and component routes.
    # ============================================
    - match:
        alertname: DNSDriftDetected
      receiver: 'gitops-drift-pr'
      continue: true  # Also send to slack-dns
    - match:
        alertname: WAFRuleMissing
      receiver: 'gitops-drift-pr'
      continue: true
    - match:
        alertname: FirewallRuleMissing
      receiver: 'gitops-drift-pr'
      continue: true
    - match:
        alertname: TunnelConfigChanged
      receiver: 'gitops-drift-pr'
      continue: true
    - match_re:
        alertname: '.*(Drift|Mismatch|Changed).*'
      receiver: 'gitops-drift-pr'
      continue: true
    # ============================================
    # CRITICAL ALERTS - Immediate PagerDuty
    # ============================================
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      group_wait: 10s
      repeat_interval: 1h
      continue: true  # Also send to Slack
    - match:
        severity: critical
      receiver: 'slack-critical'
      group_wait: 10s
    # ============================================
    # TUNNEL ALERTS
    # ============================================
    - match:
        component: tunnel
      receiver: 'slack-tunnels'
      routes:
        - match:
            severity: critical
          receiver: 'pagerduty-critical'
          continue: true
        - match:
            severity: critical
          receiver: 'slack-critical'
    # ============================================
    # DNS ALERTS
    # ============================================
    - match:
        component: dns
      receiver: 'slack-dns'
      routes:
        - match:
            severity: critical
          receiver: 'pagerduty-critical'
          continue: true
        - match:
            alertname: DNSHijackDetected
          receiver: 'pagerduty-critical'
    # ============================================
    # WAF ALERTS
    # ============================================
    - match:
        component: waf
      receiver: 'slack-waf'
      routes:
        - match:
            severity: critical
          receiver: 'pagerduty-critical'
          continue: true
        - match:
            alertname: WAFMassiveAttack
          receiver: 'pagerduty-critical'
    # ============================================
    # INVARIANT ALERTS (Security Policy Violations)
    # ============================================
    - match:
        component: invariant
      receiver: 'slack-security'
      routes:
        - match:
            severity: critical
          receiver: 'pagerduty-critical'
          continue: true
    # ============================================
    # PROOFCHAIN ALERTS
    # ============================================
    - match:
        component: proofchain
      receiver: 'slack-proofchain'
      routes:
        - match:
            severity: critical
          receiver: 'pagerduty-critical'
    # ============================================
    # WARNING ALERTS - Slack only
    # ============================================
    - match:
        severity: warning
      receiver: 'slack-warnings'
      repeat_interval: 8h
    # ============================================
    # INFO ALERTS - Daily digest
    # ============================================
    - match:
        severity: info
      receiver: 'email-daily'
      group_wait: 1h
      repeat_interval: 24h
# Inhibition rules - suppress lower severity when higher fires
inhibit_rules:
  # If critical fires, suppress warning for same alert
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'component']
  # If warning fires, suppress info for same alert
  - source_match:
      severity: 'warning'
    target_match:
      severity: 'info'
    equal: ['alertname', 'component']
  # Suppress all tunnel alerts if Cloudflare API is down
  - source_match:
      alertname: 'CloudflareAPIDown'
    target_match:
      component: 'tunnel'
    equal: []
  # Suppress DNS alerts during planned maintenance
  - source_match:
      alertname: 'PlannedMaintenance'
    target_match:
      component: 'dns'
    equal: []
# Receivers definition
receivers:
  # ============================================
  # SLACK RECEIVERS
  # Grafana links use the dashboard UIDs (cf-*) listed in observatory/README.md
  # ============================================
  - name: 'slack-default'
    slack_configs:
      - channel: '#cloudflare-alerts'
        send_resolved: true
        title: '{{ template "slack.cloudflare.title" . }}'
        text: '{{ template "slack.cloudflare.text" . }}'
        color: '{{ template "slack.cloudflare.color" . }}'
        actions:
          - type: button
            text: 'Runbook'
            url: '{{ template "slack.cloudflare.runbook" . }}'
          - type: button
            text: 'Grafana'
            url: 'http://localhost:3000/d/cf-overview'
  - name: 'slack-critical'
    slack_configs:
      - channel: '#cloudflare-critical'
        send_resolved: true
        title: '{{ template "slack.cloudflare.title" . }}'
        text: '{{ template "slack.cloudflare.text" . }}'
        color: 'danger'
        actions:
          - type: button
            text: 'Runbook'
            url: '{{ template "slack.cloudflare.runbook" . }}'
          - type: button
            text: 'Grafana'
            url: 'http://localhost:3000/d/cf-overview'
  - name: 'slack-warnings'
    slack_configs:
      - channel: '#cloudflare-alerts'
        send_resolved: true
        title: '{{ template "slack.cloudflare.title" . }}'
        text: '{{ template "slack.cloudflare.text" . }}'
        color: 'warning'
  - name: 'slack-tunnels'
    slack_configs:
      - channel: '#cloudflare-tunnels'
        send_resolved: true
        title: '{{ template "slack.cloudflare.title" . }}'
        text: '{{ template "slack.cloudflare.text" . }}'
        color: '{{ template "slack.cloudflare.color" . }}'
        actions:
          - type: button
            text: 'Tunnel Playbook'
            url: 'https://wiki.internal/playbooks/tunnel-rotation'
          - type: button
            text: 'Tunnel Dashboard'
            url: 'http://localhost:3000/d/cf-tunnel'
  - name: 'slack-dns'
    slack_configs:
      - channel: '#cloudflare-dns'
        send_resolved: true
        title: '{{ template "slack.cloudflare.title" . }}'
        text: '{{ template "slack.cloudflare.text" . }}'
        color: '{{ template "slack.cloudflare.color" . }}'
        actions:
          - type: button
            text: 'DNS Playbook'
            url: 'https://wiki.internal/playbooks/dns-compromise'
          - type: button
            text: 'DNS Dashboard'
            url: 'http://localhost:3000/d/cf-dns'
  - name: 'slack-waf'
    slack_configs:
      - channel: '#cloudflare-waf'
        send_resolved: true
        title: '{{ template "slack.cloudflare.title" . }}'
        text: '{{ template "slack.cloudflare.text" . }}'
        color: '{{ template "slack.cloudflare.color" . }}'
        actions:
          - type: button
            text: 'WAF Playbook'
            url: 'https://wiki.internal/playbooks/waf-incident'
          - type: button
            text: 'WAF Dashboard'
            url: 'http://localhost:3000/d/cf-security'
  - name: 'slack-security'
    slack_configs:
      - channel: '#cloudflare-security'
        send_resolved: true
        title: '{{ template "slack.cloudflare.title" . }}'
        text: '{{ template "slack.cloudflare.text" . }}'
        color: '{{ template "slack.cloudflare.color" . }}'
        actions:
          - type: button
            text: 'Invariants Dashboard'
            url: 'http://localhost:3000/d/cf-invariants'
  - name: 'slack-proofchain'
    slack_configs:
      - channel: '#cloudflare-proofchain'
        send_resolved: true
        title: '{{ template "slack.cloudflare.title" . }}'
        text: '{{ template "slack.cloudflare.text" . }}'
        color: '{{ template "slack.cloudflare.color" . }}'
        actions:
          - type: button
            text: 'Proofchain Dashboard'
            url: 'http://localhost:3000/d/cf-proofchain'
  # ============================================
  # PAGERDUTY RECEIVERS
  # ============================================
  - name: 'pagerduty-critical'
    pagerduty_configs:
      - service_key: '${PAGERDUTY_SERVICE_KEY}'
        send_resolved: true
        description: '{{ template "pagerduty.cloudflare.description" . }}'
        severity: 'critical'
        client: 'Cloudflare Mesh Observatory'
        client_url: 'http://localhost:3000'
        details:
          alertname: '{{ .GroupLabels.alertname }}'
          component: '{{ .GroupLabels.component }}'
          severity: '{{ .GroupLabels.severity }}'
          summary: '{{ .CommonAnnotations.summary }}'
          runbook: '{{ .CommonAnnotations.runbook_url }}'
  # ============================================
  # EMAIL RECEIVERS
  # ============================================
  - name: 'email-daily'
    email_configs:
      - to: 'cloudflare-team@yourdomain.com'
        send_resolved: true
        html: '{{ template "email.cloudflare.html" . }}'
        headers:
          Subject: '[Cloudflare] Daily Alert Digest - {{ .Status | toUpper }}'
  # ============================================
  # WEBHOOK RECEIVERS (for custom integrations)
  # ============================================
  - name: 'webhook-remediation'
    webhook_configs:
      - url: 'http://autonomous-remediator:8080/webhook/alert'
        send_resolved: true
        max_alerts: 10
  # ============================================
  # PHASE 6 - GITOPS WEBHOOK RECEIVER
  # ============================================
  - name: 'gitops-drift-pr'
    webhook_configs:
      # The ${VAR:-default} form needs a renderer that understands shell-style defaults
      - url: '${GITOPS_WEBHOOK_URL:-http://gitops-webhook:8080/webhook/alert}'
        send_resolved: false  # Only fire on new alerts, not resolved
        max_alerts: 5
        http_config:
          # Optional: Add bearer token or basic auth
          # authorization:
          #   type: Bearer
          #   credentials: '${GITOPS_WEBHOOK_TOKEN}'


@@ -0,0 +1,326 @@
{{/* Email notification templates for Cloudflare Mesh Observatory */}}
{{/* HTML email template */}}
{{ define "email.cloudflare.html" }}
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<style>
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
line-height: 1.6;
color: #333;
max-width: 800px;
margin: 0 auto;
padding: 20px;
}
.header {
background: linear-gradient(135deg, #F6821F 0%, #F38020 100%);
color: white;
padding: 20px;
border-radius: 8px 8px 0 0;
text-align: center;
}
.header h1 {
margin: 0;
font-size: 24px;
}
.status-badge {
display: inline-block;
padding: 4px 12px;
border-radius: 20px;
font-size: 12px;
font-weight: bold;
text-transform: uppercase;
margin-top: 10px;
}
.status-firing { background: #dc3545; color: white; }
.status-resolved { background: #28a745; color: white; }
.content {
background: #fff;
border: 1px solid #e0e0e0;
border-top: none;
padding: 20px;
border-radius: 0 0 8px 8px;
}
.alert-card {
background: #f8f9fa;
border-left: 4px solid #F6821F;
padding: 15px;
margin: 15px 0;
border-radius: 0 4px 4px 0;
}
.alert-card.critical { border-left-color: #dc3545; }
.alert-card.warning { border-left-color: #ffc107; }
.alert-card.info { border-left-color: #17a2b8; }
.alert-card.resolved { border-left-color: #28a745; }
.alert-title {
font-size: 16px;
font-weight: bold;
color: #333;
margin-bottom: 10px;
}
.alert-meta {
font-size: 12px;
color: #666;
margin-bottom: 10px;
}
.alert-meta span {
display: inline-block;
margin-right: 15px;
}
.label {
display: inline-block;
background: #e9ecef;
padding: 2px 8px;
border-radius: 4px;
font-size: 11px;
margin: 2px;
}
.description {
margin: 10px 0;
padding: 10px;
background: white;
border-radius: 4px;
}
.runbook-link {
display: inline-block;
background: #F6821F;
color: white;
padding: 8px 16px;
border-radius: 4px;
text-decoration: none;
font-size: 14px;
margin-top: 10px;
}
.runbook-link:hover {
background: #e67316;
}
.summary-table {
width: 100%;
border-collapse: collapse;
margin: 20px 0;
}
.summary-table th, .summary-table td {
padding: 10px;
text-align: left;
border-bottom: 1px solid #e0e0e0;
}
.summary-table th {
background: #f8f9fa;
font-weight: 600;
}
.footer {
text-align: center;
font-size: 12px;
color: #888;
margin-top: 20px;
padding-top: 20px;
border-top: 1px solid #e0e0e0;
}
.footer a {
color: #F6821F;
text-decoration: none;
}
</style>
</head>
<body>
<div class="header">
<h1>Cloudflare Mesh Observatory</h1>
<span class="status-badge status-{{ .Status }}">{{ .Status }}</span>
</div>
<div class="content">
<h2>Alert Summary</h2>
<table class="summary-table">
<tr>
<th>Status</th>
<td>{{ .Status | toUpper }}</td>
</tr>
<tr>
<th>Alert Name</th>
<td>{{ .CommonLabels.alertname }}</td>
</tr>
<tr>
<th>Severity</th>
<td>{{ .CommonLabels.severity | toUpper }}</td>
</tr>
<tr>
<th>Component</th>
<td>{{ .CommonLabels.component }}</td>
</tr>
<tr>
<th>Firing Alerts</th>
<td>{{ .Alerts.Firing | len }}</td>
</tr>
<tr>
<th>Resolved Alerts</th>
<td>{{ .Alerts.Resolved | len }}</td>
</tr>
</table>
<h2>Alert Details</h2>
{{ range .Alerts }}
<div class="alert-card {{ .Labels.severity }}{{ if eq .Status "resolved" }} resolved{{ end }}">
<div class="alert-title">
{{ .Labels.alertname }}
<span class="status-badge status-{{ .Status }}" style="font-size: 10px; padding: 2px 8px;">{{ .Status }}</span>
</div>
<div class="alert-meta">
<span><strong>Severity:</strong> {{ .Labels.severity }}</span>
<span><strong>Component:</strong> {{ .Labels.component }}</span>
<span><strong>Started:</strong> {{ .StartsAt.Format "2006-01-02 15:04:05 UTC" }}</span>
{{ if eq .Status "resolved" }}
<span><strong>Resolved:</strong> {{ .EndsAt.Format "2006-01-02 15:04:05 UTC" }}</span>
{{ end }}
</div>
<div class="description">
<strong>Summary:</strong> {{ .Annotations.summary }}<br>
<strong>Description:</strong> {{ .Annotations.description }}
</div>
<div style="margin-top: 10px;">
<strong>Labels:</strong><br>
{{ range .Labels.SortedPairs }}
<span class="label">{{ .Name }}: {{ .Value }}</span>
{{ end }}
</div>
{{ if .Annotations.runbook_url }}
<a href="{{ .Annotations.runbook_url }}" class="runbook-link">View Runbook</a>
{{ end }}
</div>
{{ end }}
<h2>Quick Links</h2>
<ul>
<li><a href="http://localhost:3000">Grafana Dashboard</a></li>
<li><a href="http://localhost:9090">Prometheus</a></li>
<li><a href="https://dash.cloudflare.com">Cloudflare Dashboard</a></li>
</ul>
</div>
<div class="footer">
<p>
This alert was generated by <strong>Cloudflare Mesh Observatory</strong><br>
<a href="http://localhost:9093">Alertmanager</a> |
<a href="http://localhost:3000">Grafana</a> |
<a href="http://localhost:9090">Prometheus</a>
</p>
<p>
Sent via Alertmanager at {{ .ExternalURL }}
</p>
</div>
</body>
</html>
{{ end }}
{{/* Plain text email template */}}
{{ define "email.cloudflare.text" }}
================================================================================
CLOUDFLARE MESH OBSERVATORY - ALERT {{ .Status | toUpper }}
================================================================================
Status: {{ .Status | toUpper }}
Alert: {{ .CommonLabels.alertname }}
Severity: {{ .CommonLabels.severity | toUpper }}
Component: {{ .CommonLabels.component }}
Firing: {{ .Alerts.Firing | len }} alerts
Resolved: {{ .Alerts.Resolved | len }} alerts
================================================================================
ALERT DETAILS
================================================================================
{{ range .Alerts }}
--------------------------------------------------------------------------------
{{ .Labels.alertname }} [{{ .Status | toUpper }}]
--------------------------------------------------------------------------------
Severity: {{ .Labels.severity }}
Component: {{ .Labels.component }}
Started: {{ .StartsAt.Format "2006-01-02 15:04:05 UTC" }}
{{ if eq .Status "resolved" }}Resolved: {{ .EndsAt.Format "2006-01-02 15:04:05 UTC" }}{{ end }}
Summary: {{ .Annotations.summary }}
Description: {{ .Annotations.description }}
Labels:
{{ range .Labels.SortedPairs }} - {{ .Name }}: {{ .Value }}
{{ end }}
{{ if .Annotations.runbook_url }}Runbook: {{ .Annotations.runbook_url }}{{ end }}
{{ end }}
================================================================================
QUICK LINKS
================================================================================
Grafana: http://localhost:3000
Prometheus: http://localhost:9090
Alertmanager: http://localhost:9093
Cloudflare: https://dash.cloudflare.com
================================================================================
Generated by Cloudflare Mesh Observatory
{{ end }}
{{/* Daily digest email template */}}
{{ define "email.cloudflare.digest" }}
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<style>
/* Same styles as above */
</style>
</head>
<body>
<div class="header">
<h1>Daily Alert Digest</h1>
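{{/* Alertmanager templates have no "now" function; show the first alert's start date instead */}}
<p>{{ with index .Alerts 0 }}{{ .StartsAt.Format "Monday, January 2, 2006" }}{{ end }}</p>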
</div>
<div class="content">
<h2>24-Hour Summary</h2>
<table class="summary-table">
<tr>
<th>Metric</th>
<th>Count</th>
</tr>
<tr>
<td>Total Alerts</td>
<td>{{ len .Alerts }}</td>
</tr>
<tr>
<td>Currently Firing</td>
<td>{{ .Alerts.Firing | len }}</td>
</tr>
<tr>
<td>Resolved</td>
<td>{{ .Alerts.Resolved | len }}</td>
</tr>
</table>
<h2>Alerts by Severity</h2>
<!-- Alert breakdown would go here -->
<h2>Alerts by Component</h2>
<!-- Component breakdown would go here -->
</div>
<div class="footer">
<p>This is an automated daily digest from Cloudflare Mesh Observatory</p>
</div>
</body>
</html>
{{ end }}


@@ -0,0 +1,169 @@
{{/* PagerDuty notification templates for Cloudflare Mesh Observatory */}}
{{/* Main description template */}}
{{ define "pagerduty.cloudflare.description" -}}
[{{ .CommonLabels.severity | toUpper }}] {{ .CommonLabels.alertname }} - {{ .CommonAnnotations.summary }}
{{- end }}
{{/* Detailed incident description */}}
{{ define "pagerduty.cloudflare.details" -}}
{{ range .Alerts }}
Alert: {{ .Labels.alertname }}
Severity: {{ .Labels.severity }}
Component: {{ .Labels.component }}
Summary: {{ .Annotations.summary }}
Description: {{ .Annotations.description }}
Labels:
{{ range .Labels.SortedPairs -}}
{{ .Name }}: {{ .Value }}
{{ end }}
Started: {{ .StartsAt.Format "2006-01-02 15:04:05 UTC" }}
{{ if eq .Status "resolved" }}Resolved: {{ .EndsAt.Format "2006-01-02 15:04:05 UTC" }}{{ end }}
Runbook: {{ if .Annotations.runbook_url }}{{ .Annotations.runbook_url }}{{ else }}https://wiki.internal/playbooks/cloudflare{{ end }}
---
{{ end }}
{{- end }}
{{/* Critical tunnel incident */}}
{{ define "pagerduty.cloudflare.tunnel.critical" -}}
CRITICAL TUNNEL FAILURE
Tunnel: {{ .CommonLabels.tunnel_name }} ({{ .CommonLabels.tunnel_id }})
Zone: {{ .CommonLabels.zone }}
All tunnel connections have failed. Services behind this tunnel are UNREACHABLE.
Immediate Actions Required:
1. Check cloudflared daemon status on origin server
2. Verify network path to Cloudflare edge
3. Review recent configuration changes
4. Consider emergency tunnel rotation
Impact: {{ .CommonAnnotations.impact }}
ETA to degradation: IMMEDIATE
Escalation Chain:
1. On-call Infrastructure Engineer
2. Platform Team Lead
3. Security Team (if compromise suspected)
{{- end }}
{{/* Critical DNS incident */}}
{{ define "pagerduty.cloudflare.dns.critical" -}}
CRITICAL DNS INCIDENT
Type: {{ .CommonLabels.alertname }}
Zone: {{ .CommonLabels.zone }}
Record: {{ .CommonLabels.record_name }}
{{ if eq .CommonLabels.alertname "DNSHijackDetected" -}}
POTENTIAL DNS HIJACK DETECTED
This is a SECURITY INCIDENT. DNS records do not match expected configuration.
Immediate Actions:
1. Verify DNS resolution from multiple locations
2. Check Cloudflare dashboard for unauthorized changes
3. Review audit logs for suspicious activity
4. Engage security incident response
DO NOT dismiss without verification.
{{- else -}}
DNS configuration drift detected. Records have changed from expected baseline.
Actions:
1. Compare current vs expected records
2. Determine if change was authorized
3. Restore from known-good state if needed
{{- end }}
{{- end }}
{{/* Critical WAF incident */}}
{{ define "pagerduty.cloudflare.waf.critical" -}}
CRITICAL WAF INCIDENT
Attack Type: {{ .CommonLabels.attack_type }}
Source: {{ .CommonLabels.source_ip }}
Request Volume: {{ .CommonLabels.request_count }} requests
{{ if eq .CommonLabels.alertname "WAFMassiveAttack" -}}
MASSIVE ATTACK IN PROGRESS
Request volume significantly exceeds baseline. This may indicate:
- DDoS attack
- Credential stuffing
- Application-layer attack
Immediate Actions:
1. Review attack traffic patterns
2. Consider enabling Under Attack Mode
3. Tighten rate limiting thresholds
4. Block attacking IPs if identified
Current Mitigation: {{ .CommonAnnotations.current_mitigation }}
{{- else -}}
WAF rule bypass detected. Malicious traffic may be reaching origin.
Actions:
1. Analyze bypassed requests
2. Tighten rule specificity
3. Add supplementary blocking rules
{{- end }}
{{- end }}
{{/* Critical invariant violation */}}
{{ define "pagerduty.cloudflare.invariant.critical" -}}
SECURITY INVARIANT VIOLATION
Invariant: {{ .CommonLabels.invariant_name }}
Category: {{ .CommonLabels.category }}
A critical security invariant has been violated. This indicates:
- Unauthorized configuration change
- Potential security misconfiguration
- Compliance violation
Violation Details:
- Expected: {{ .CommonLabels.expected_value }}
- Actual: {{ .CommonLabels.actual_value }}
- Impact: {{ .CommonAnnotations.impact }}
Affected Frameworks: {{ .CommonLabels.frameworks }}
This violation requires immediate investigation and remediation.
{{- end }}
{{/* Critical proofchain incident */}}
{{ define "pagerduty.cloudflare.proofchain.critical" -}}
PROOFCHAIN INTEGRITY FAILURE
Chain: {{ .CommonLabels.chain_name }}
Receipt Type: {{ .CommonLabels.receipt_type }}
CRITICAL: Proofchain integrity verification has FAILED.
This indicates one of:
1. Ledger tampering
2. Receipt corruption
3. Chain fork
4. Hash collision (extremely unlikely)
Integrity Details:
- Last Valid Hash: {{ .CommonLabels.last_valid_hash }}
- Expected Hash: {{ .CommonLabels.expected_hash }}
- Computed Hash: {{ .CommonLabels.computed_hash }}
IMMEDIATE ACTIONS:
1. HALT all new receipt generation
2. Preserve current state for forensics
3. Identify last known-good checkpoint
4. Engage proofchain administrator
This is a potential SECURITY INCIDENT if tampering is suspected.
{{- end }}


@@ -0,0 +1,200 @@
{{/* Slack notification templates for Cloudflare Mesh Observatory */}}
{{/* Title template */}}
{{ define "slack.cloudflare.title" -}}
{{ if eq .Status "firing" }}{{ .Alerts.Firing | len }} FIRING{{ end }}{{ if and (eq .Status "resolved") (gt (.Alerts.Resolved | len) 0) }}{{ .Alerts.Resolved | len }} RESOLVED{{ end }} | {{ .CommonLabels.alertname }}
{{- end }}
{{/* Color template based on severity */}}
{{ define "slack.cloudflare.color" -}}
{{ if eq .Status "resolved" }}good{{ else if eq .CommonLabels.severity "critical" }}danger{{ else if eq .CommonLabels.severity "warning" }}warning{{ else }}#439FE0{{ end }}
{{- end }}
{{/* Main text body */}}
{{ define "slack.cloudflare.text" -}}
{{ range .Alerts }}
*Alert:* {{ .Labels.alertname }}
*Severity:* {{ .Labels.severity | toUpper }}
*Component:* {{ .Labels.component }}
*Status:* {{ .Status | toUpper }}
*Summary:* {{ .Annotations.summary }}
*Description:* {{ .Annotations.description }}
{{ if .Annotations.runbook_url }}*Runbook:* <{{ .Annotations.runbook_url }}|View Playbook>{{ end }}
*Labels:*
{{ range .Labels.SortedPairs -}}
- {{ .Name }}: `{{ .Value }}`
{{ end }}
*Started:* {{ .StartsAt.Format "2006-01-02 15:04:05 UTC" }}
{{ if eq .Status "resolved" }}*Resolved:* {{ .EndsAt.Format "2006-01-02 15:04:05 UTC" }}{{ end }}
---
{{ end }}
{{- end }}
{{/* Runbook URL template */}}
{{ define "slack.cloudflare.runbook" -}}
{{ if .CommonAnnotations.runbook_url }}{{ .CommonAnnotations.runbook_url }}{{ else }}https://wiki.internal/playbooks/cloudflare{{ end }}
{{- end }}
{{/* Compact alert list for summary */}}
{{ define "slack.cloudflare.alertlist" -}}
{{ range . }}
- {{ .Labels.alertname }} ({{ .Labels.severity }})
{{ end }}
{{- end }}
{{/* Tunnel-specific template */}}
{{ define "slack.cloudflare.tunnel" -}}
{{ range .Alerts }}
*Tunnel Alert*
*Tunnel ID:* {{ .Labels.tunnel_id }}
*Tunnel Name:* {{ .Labels.tunnel_name }}
*Status:* {{ .Status | toUpper }}
{{ .Annotations.description }}
*Action Required:*
{{ if eq .Labels.alertname "TunnelDown" }}
1. Check cloudflared service status
2. Verify network connectivity
3. Run tunnel rotation if unrecoverable
{{ else if eq .Labels.alertname "TunnelRotationDue" }}
1. Schedule maintenance window
2. Execute tunnel rotation protocol
3. Verify new tunnel connectivity
{{ end }}
---
{{ end }}
{{- end }}
{{/* DNS-specific template */}}
{{ define "slack.cloudflare.dns" -}}
{{ range .Alerts }}
*DNS Alert*
*Record:* {{ .Labels.record_name }}
*Type:* {{ .Labels.record_type }}
*Zone:* {{ .Labels.zone }}
*Status:* {{ .Status | toUpper }}
{{ .Annotations.description }}
*Immediate Actions:*
{{ if eq .Labels.alertname "DNSHijackDetected" }}
1. CRITICAL: Potential DNS hijack detected
2. Immediately verify DNS resolution
3. Check Cloudflare audit logs
4. Engage incident response team
{{ else if eq .Labels.alertname "DNSDriftDetected" }}
1. Compare current vs expected records
2. Check for unauthorized changes
3. Run state reconciler if needed
{{ end }}
---
{{ end }}
{{- end }}
{{/* WAF-specific template */}}
{{ define "slack.cloudflare.waf" -}}
{{ range .Alerts }}
*WAF Alert*
*Rule ID:* {{ .Labels.rule_id }}
*Action:* {{ .Labels.action }}
*Source:* {{ .Labels.source_ip }}
*Status:* {{ .Status | toUpper }}
{{ .Annotations.description }}
*Threat Intelligence:*
- Request Count: {{ .Labels.request_count }}
- Block Rate: {{ .Labels.block_rate }}%
- Attack Type: {{ .Labels.attack_type }}
*Recommended Actions:*
{{ if eq .Labels.alertname "WAFMassiveAttack" }}
1. Verify attack is not false positive
2. Consider enabling Under Attack Mode
3. Review and adjust rate limiting
4. Document attack patterns
{{ else if eq .Labels.alertname "WAFRuleBypass" }}
1. Analyze bypassed requests
2. Tighten rule specificity
3. Add supplementary rules
{{ end }}
---
{{ end }}
{{- end }}
{{/* Security/Invariant template */}}
{{ define "slack.cloudflare.security" -}}
{{ range .Alerts }}
*Security Invariant Violation*
*Invariant:* {{ .Labels.invariant_name }}
*Category:* {{ .Labels.category }}
*Status:* {{ .Status | toUpper }}
{{ .Annotations.description }}
*Violation Details:*
- Expected: {{ .Labels.expected_value }}
- Actual: {{ .Labels.actual_value }}
- First Seen: {{ .StartsAt.UTC.Format "2006-01-02 15:04:05 UTC" }}
*Compliance Impact:*
This violation may affect:
- {{ reReplaceAll "," "\n- " .Labels.frameworks }}
*Remediation Steps:*
1. Review invariant definition
2. Check for authorized changes
3. Run autonomous remediator or manual fix
4. Document change justification
---
{{ end }}
{{- end }}
{{/* Proofchain template */}}
{{ define "slack.cloudflare.proofchain" -}}
{{ range .Alerts }}
*Proofchain Alert*
*Chain:* {{ .Labels.chain_name }}
*Receipt Type:* {{ .Labels.receipt_type }}
*Status:* {{ .Status | toUpper }}
{{ .Annotations.description }}
*Integrity Details:*
- Last Valid Hash: {{ .Labels.last_valid_hash }}
- Expected Hash: {{ .Labels.expected_hash }}
- Computed Hash: {{ .Labels.computed_hash }}
*This indicates potential:*
- Ledger tampering
- Receipt corruption
- Chain fork
- Missing anchors
*Immediate Actions:*
1. DO NOT write new receipts until resolved
2. Identify last known-good state
3. Investigate discrepancy source
4. Contact proofchain administrator
---
{{ end }}
{{- end }}
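
The `slack.cloudflare.security` template above renders the comma-separated `frameworks` label as one bullet per framework. A minimal Python sketch of that transformation follows, which may help when checking label values before they reach Alertmanager. The function name and the sample label value are hypothetical, and unlike the template this sketch also trims whitespace and skips empty entries.

```python
def framework_bullets(frameworks_label: str) -> str:
    """Render a comma-separated label value as Slack-style bullets."""
    # Split on commas, trim whitespace around each entry.
    names = [part.strip() for part in frameworks_label.split(",")]
    # Drop empty entries (for example, from a trailing comma).
    return "\n".join(f"- {name}" for name in names if name)


if __name__ == "__main__":
    # Hypothetical label value; the real one comes from the alert rule.
    print(framework_bullets("SOC2, ISO27001,PCI-DSS"))
```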


@@ -0,0 +1,415 @@
{
"annotations": {
"list": []
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "red", "value": 1}
]
}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
"id": 1,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "10.2.2",
"targets": [
{
"expr": "cloudflare_invariants_failed",
"refId": "A"
}
],
"title": "Invariant Failures",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null}
]
}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "cloudflare_dns_records_total",
"refId": "A"
}
],
"title": "DNS Records",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "red", "value": null},
{"color": "green", "value": 1}
]
}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
"id": 3,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "cloudflare_tunnels_healthy",
"refId": "A"
}
],
"title": "Healthy Tunnels",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 3600},
{"color": "red", "value": 7200}
]
},
"unit": "s"
},
"overrides": []
},
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
"id": 4,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "cloudflare_snapshot_age_seconds",
"refId": "A"
}
],
"title": "Snapshot Age",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [
{"options": {"0": {"color": "red", "index": 0, "text": "OFF"}}, "type": "value"},
{"options": {"1": {"color": "green", "index": 1, "text": "ON"}}, "type": "value"}
],
"thresholds": {
"mode": "absolute",
"steps": [{"color": "green", "value": null}]
}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 4, "x": 16, "y": 0},
"id": 5,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "cloudflare_dnssec_enabled",
"refId": "A"
}
],
"title": "DNSSEC",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 1},
{"color": "red", "value": 5}
]
}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
"id": 6,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "cloudflare_anomalies_last_24h",
"refId": "A"
}
],
"title": "Anomalies (24h)",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {"legend": false, "tooltip": false, "viz": false},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {"type": "linear"},
"showPoints": "never",
"spanNulls": false,
"stacking": {"group": "A", "mode": "none"},
"thresholdsStyle": {"mode": "off"}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [{"color": "green", "value": null}]
}
},
"overrides": []
},
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 4},
"id": 7,
"options": {
"legend": {"calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true},
"tooltip": {"mode": "single", "sort": "none"}
},
"targets": [
{
"expr": "cloudflare_invariants_passed",
"legendFormat": "Passed",
"refId": "A"
},
{
"expr": "cloudflare_invariants_failed",
"legendFormat": "Failed",
"refId": "B"
}
],
"title": "Invariant Status Over Time",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {"legend": false, "tooltip": false, "viz": false},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {"type": "linear"},
"showPoints": "never",
"spanNulls": false,
"stacking": {"group": "A", "mode": "none"},
"thresholdsStyle": {"mode": "off"}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [{"color": "green", "value": null}]
}
},
"overrides": []
},
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 4},
"id": 8,
"options": {
"legend": {"calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true},
"tooltip": {"mode": "single", "sort": "none"}
},
"targets": [
{
"expr": "cloudflare_tunnels_healthy",
"legendFormat": "Healthy",
"refId": "A"
},
{
"expr": "cloudflare_tunnels_unhealthy",
"legendFormat": "Unhealthy",
"refId": "B"
}
],
"title": "Tunnel Health Over Time",
"type": "timeseries"
}
],
"refresh": "1m",
"schemaVersion": 38,
"style": "dark",
"tags": ["cloudflare", "mesh", "overview"],
"templating": {
"list": []
},
"time": {
"from": "now-24h",
"to": "now"
},
"timepicker": {},
"timezone": "utc",
"title": "Cloudflare Mesh Overview",
"uid": "cf-overview",
"version": 1,
"weekStart": ""
}
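
Each stat panel in the dashboard above colors its value from an absolute `thresholds.steps` list: the first step (value `null`) is the base color, and each later step takes over once the reading reaches its value. A minimal sketch of that lookup, assuming the steps are sorted ascending; the function name is hypothetical and the sample steps are copied from the "Snapshot Age" panel:

```python
from typing import Optional, Sequence


def threshold_color(steps: Sequence[dict], reading: float) -> str:
    """Return the color of the last step whose value is <= reading.

    Assumes steps are sorted ascending and the first step has value None,
    which acts as the base color.
    """
    color = steps[0]["color"]
    for step in steps:
        value: Optional[float] = step["value"]
        if value is None or reading >= value:
            color = step["color"]
    return color


# Steps copied from the "Snapshot Age" panel (unit: seconds).
SNAPSHOT_AGE_STEPS = [
    {"color": "green", "value": None},
    {"color": "yellow", "value": 3600},
    {"color": "red", "value": 7200},
]
```

With these steps, a snapshot younger than one hour reads green, one to two hours reads yellow, and two hours or older reads red.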


@@ -0,0 +1,14 @@
# Grafana Dashboard Provisioning
apiVersion: 1
providers:
  - name: 'Cloudflare Mesh'
    orgId: 1
    folder: 'Cloudflare'
    folderUid: 'cloudflare'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    allowUiUpdates: true
    options:
      path: /etc/grafana/provisioning/dashboards


@@ -0,0 +1,195 @@
{
"annotations": {"list": []},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
"id": 1,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_dns_records_total", "refId": "A"}],
"title": "Total Records",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "orange", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 6, "y": 0},
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_dns_records_proxied", "refId": "A"}],
"title": "Proxied Records",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "blue", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 12, "y": 0},
"id": 3,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_dns_records_unproxied", "refId": "A"}],
"title": "DNS-Only Records",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [
{"options": {"0": {"color": "red", "index": 0, "text": "DISABLED"}}, "type": "value"},
{"options": {"1": {"color": "green", "index": 1, "text": "ACTIVE"}}, "type": "value"}
],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 18, "y": 0},
"id": 4,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_dnssec_enabled", "refId": "A"}],
"title": "DNSSEC Status",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "palette-classic"},
"custom": {"hideFrom": {"legend": false, "tooltip": false, "viz": false}},
"mappings": []
},
"overrides": []
},
"gridPos": {"h": 10, "w": 12, "x": 0, "y": 4},
"id": 5,
"options": {
"displayLabels": ["name", "value"],
"legend": {"displayMode": "list", "placement": "right", "showLegend": true},
"pieType": "pie",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"tooltip": {"mode": "single", "sort": "none"}
},
"targets": [
{"expr": "cloudflare_dns_records_by_type{type=\"A\"}", "legendFormat": "A", "refId": "A"},
{"expr": "cloudflare_dns_records_by_type{type=\"AAAA\"}", "legendFormat": "AAAA", "refId": "B"},
{"expr": "cloudflare_dns_records_by_type{type=\"CNAME\"}", "legendFormat": "CNAME", "refId": "C"},
{"expr": "cloudflare_dns_records_by_type{type=\"TXT\"}", "legendFormat": "TXT", "refId": "D"},
{"expr": "cloudflare_dns_records_by_type{type=\"MX\"}", "legendFormat": "MX", "refId": "E"},
{"expr": "cloudflare_dns_records_by_type{type=\"SRV\"}", "legendFormat": "SRV", "refId": "F"}
],
"title": "Records by Type",
"type": "piechart"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "palette-classic"},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {"legend": false, "tooltip": false, "viz": false},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {"type": "linear"},
"showPoints": "never",
"spanNulls": false,
"stacking": {"group": "A", "mode": "none"},
"thresholdsStyle": {"mode": "off"}
},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 10, "w": 12, "x": 12, "y": 4},
"id": 6,
"options": {
"legend": {"calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true},
"tooltip": {"mode": "single", "sort": "none"}
},
"targets": [
{"expr": "cloudflare_dns_records_total", "legendFormat": "Total", "refId": "A"},
{"expr": "cloudflare_dns_records_proxied", "legendFormat": "Proxied", "refId": "B"}
],
"title": "DNS Records Over Time",
"type": "timeseries"
}
],
"refresh": "1m",
"schemaVersion": 38,
"style": "dark",
"tags": ["cloudflare", "dns"],
"templating": {"list": []},
"time": {"from": "now-24h", "to": "now"},
"timepicker": {},
"timezone": "utc",
"title": "DNS Health",
"uid": "cf-dns",
"version": 1,
"weekStart": ""
}
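
The `gridPos` values in the dashboard above place panels on Grafana's 24-column grid. A minimal sketch of a layout check that could run before a dashboard JSON file is committed: it flags duplicate panel ids, panels wider than the grid, and overlapping panels. The function name and the `DNS_LAYOUT` constant are hypothetical; the positions are copied from the DNS dashboard.

```python
from itertools import combinations

GRID_COLUMNS = 24  # Grafana dashboards use a 24-column grid


def layout_problems(dashboard: dict) -> list[str]:
    """Return human-readable layout problems; an empty list means none found."""
    problems = []
    panels = dashboard.get("panels", [])

    ids = [panel["id"] for panel in panels]
    if len(ids) != len(set(ids)):
        problems.append("duplicate panel ids")

    for panel in panels:
        grid = panel["gridPos"]
        if grid["x"] + grid["w"] > GRID_COLUMNS:
            problems.append(f"panel {panel['id']} exceeds the grid width")

    # Two rectangles overlap when they intersect on both axes.
    for a, b in combinations(panels, 2):
        ga, gb = a["gridPos"], b["gridPos"]
        overlap_x = ga["x"] < gb["x"] + gb["w"] and gb["x"] < ga["x"] + ga["w"]
        overlap_y = ga["y"] < gb["y"] + gb["h"] and gb["y"] < ga["y"] + ga["h"]
        if overlap_x and overlap_y:
            problems.append(f"panels {a['id']} and {b['id']} overlap")
    return problems


# Panel ids and positions copied from the DNS dashboard.
DNS_LAYOUT = {
    "panels": [
        {"id": 1, "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0}},
        {"id": 2, "gridPos": {"h": 4, "w": 6, "x": 6, "y": 0}},
        {"id": 3, "gridPos": {"h": 4, "w": 6, "x": 12, "y": 0}},
        {"id": 4, "gridPos": {"h": 4, "w": 6, "x": 18, "y": 0}},
        {"id": 5, "gridPos": {"h": 10, "w": 12, "x": 0, "y": 4}},
        {"id": 6, "gridPos": {"h": 10, "w": 12, "x": 12, "y": 4}},
    ]
}
```

In practice the dictionary would come from `json.load` on the dashboard file rather than a hand-copied constant.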


@@ -0,0 +1,238 @@
{
"annotations": {"list": []},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "blue", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
"id": 1,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_invariants_total", "refId": "A"}],
"title": "Total Invariants",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 6, "y": 0},
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_invariants_passed", "refId": "A"}],
"title": "Passed",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [
{"color": "green", "value": null},
{"color": "red", "value": 1}
]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 12, "y": 0},
"id": 3,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_invariants_failed", "refId": "A"}],
"title": "Failed",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"max": 100,
"min": 0,
"thresholds": {"mode": "absolute", "steps": [
{"color": "red", "value": null},
{"color": "yellow", "value": 80},
{"color": "green", "value": 95}
]},
"unit": "percent"
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 18, "y": 0},
"id": 4,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_invariants_pass_rate", "refId": "A"}],
"title": "Pass Rate",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "palette-classic"},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "none",
"hideFrom": {"legend": false, "tooltip": false, "viz": false},
"insertNulls": false,
"lineInterpolation": "stepAfter",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {"type": "linear"},
"showPoints": "never",
"spanNulls": false,
"stacking": {"group": "A", "mode": "none"},
"thresholdsStyle": {"mode": "off"}
},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": [
{
"matcher": {"id": "byName", "options": "Failed"},
"properties": [{"id": "color", "value": {"fixedColor": "red", "mode": "fixed"}}]
},
{
"matcher": {"id": "byName", "options": "Passed"},
"properties": [{"id": "color", "value": {"fixedColor": "green", "mode": "fixed"}}]
}
]
},
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 4},
"id": 5,
"options": {
"legend": {"calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true},
"tooltip": {"mode": "single", "sort": "none"}
},
"targets": [
{"expr": "cloudflare_invariants_passed", "legendFormat": "Passed", "refId": "A"},
{"expr": "cloudflare_invariants_failed", "legendFormat": "Failed", "refId": "B"}
],
"title": "Invariant Status Over Time",
"type": "timeseries"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 3600},
{"color": "red", "value": 7200}
]},
"unit": "s"
},
"overrides": []
},
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 14},
"id": 6,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_invariant_report_age_seconds", "refId": "A"}],
"title": "Report Age",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 1},
{"color": "red", "value": 5}
]}
},
"overrides": []
},
"gridPos": {"h": 6, "w": 12, "x": 12, "y": 14},
"id": 7,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_anomalies_last_24h", "refId": "A"}],
"title": "Anomalies (Last 24h)",
"type": "stat"
}
],
"refresh": "1m",
"schemaVersion": 38,
"style": "dark",
"tags": ["cloudflare", "invariants", "compliance"],
"templating": {"list": []},
"time": {"from": "now-7d", "to": "now"},
"timepicker": {},
"timezone": "utc",
"title": "Invariants & Compliance",
"uid": "cf-invariants",
"version": 1,
"weekStart": ""
}


@@ -0,0 +1,217 @@
{
"annotations": {"list": []},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [
{"options": {"0": {"color": "red", "index": 0, "text": "MISSING"}}, "type": "value"},
{"options": {"1": {"color": "green", "index": 1, "text": "SET"}}, "type": "value"}
],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
"id": 1,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_snapshot_merkle_root_set", "refId": "A"}],
"title": "Merkle Root",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 3600},
{"color": "red", "value": 7200}
]},
"unit": "s"
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 6, "y": 0},
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_snapshot_age_seconds", "refId": "A"}],
"title": "Snapshot Age",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "blue", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 12, "y": 0},
"id": 3,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_anomalies_total", "refId": "A"}],
"title": "Total Anomalies",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 1},
{"color": "red", "value": 5}
]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 18, "y": 0},
"id": 4,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_anomalies_last_24h", "refId": "A"}],
"title": "Anomalies (24h)",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "palette-classic"},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {"legend": false, "tooltip": false, "viz": false},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {"type": "linear"},
"showPoints": "never",
"spanNulls": false,
"stacking": {"group": "A", "mode": "none"},
"thresholdsStyle": {"mode": "off"}
},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]},
"unit": "s"
},
"overrides": []
},
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 4},
"id": 5,
"options": {
"legend": {"calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true},
"tooltip": {"mode": "single", "sort": "none"}
},
"targets": [
{"expr": "cloudflare_snapshot_age_seconds", "legendFormat": "Snapshot Age", "refId": "A"},
{"expr": "cloudflare_invariant_report_age_seconds", "legendFormat": "Report Age", "refId": "B"}
],
"title": "Data Freshness",
"type": "timeseries"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "palette-classic"},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "bars",
"fillOpacity": 80,
"gradientMode": "none",
"hideFrom": {"legend": false, "tooltip": false, "viz": false},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {"type": "linear"},
"showPoints": "never",
"spanNulls": false,
"stacking": {"group": "A", "mode": "none"},
"thresholdsStyle": {"mode": "off"}
},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 12},
"id": 6,
"options": {
"legend": {"calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true},
"tooltip": {"mode": "single", "sort": "none"}
},
"targets": [
{"expr": "cloudflare_anomalies_last_24h", "legendFormat": "Anomalies", "refId": "A"}
],
"title": "Anomaly Timeline",
"type": "timeseries"
}
],
"refresh": "1m",
"schemaVersion": 38,
"style": "dark",
"tags": ["cloudflare", "proofchain", "vaultmesh"],
"templating": {"list": []},
"time": {"from": "now-7d", "to": "now"},
"timepicker": {},
"timezone": "utc",
"title": "ProofChain & Anchors",
"uid": "cf-proofchain",
"version": 1,
"weekStart": ""
}


@@ -0,0 +1,245 @@
{
"annotations": {"list": []},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [
{"options": {"0": {"color": "red", "index": 0, "text": "OFF"}}, "type": "value"},
{"options": {"1": {"color": "green", "index": 1, "text": "ON"}}, "type": "value"}
],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
"id": 1,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_zone_ssl_strict", "refId": "A"}],
"title": "SSL Strict",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [
{"options": {"0": {"color": "red", "index": 0, "text": "WEAK"}}, "type": "value"},
{"options": {"1": {"color": "green", "index": 1, "text": "SECURE"}}, "type": "value"}
],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_zone_tls_version_secure", "refId": "A"}],
"title": "TLS Version",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [
{"options": {"0": {"color": "red", "index": 0, "text": "OFF"}}, "type": "value"},
{"options": {"1": {"color": "green", "index": 1, "text": "ON"}}, "type": "value"}
],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
"id": 3,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_zone_always_https", "refId": "A"}],
"title": "Always HTTPS",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [
{"options": {"0": {"color": "red", "index": 0, "text": "OFF"}}, "type": "value"},
{"options": {"1": {"color": "green", "index": 1, "text": "ON"}}, "type": "value"}
],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
"id": 4,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_zone_browser_check", "refId": "A"}],
"title": "Browser Check",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [
{"options": {"0": {"color": "red", "index": 0, "text": "DISABLED"}}, "type": "value"},
{"options": {"1": {"color": "green", "index": 1, "text": "ACTIVE"}}, "type": "value"}
],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 4, "x": 16, "y": 0},
"id": 5,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_dnssec_enabled", "refId": "A"}],
"title": "DNSSEC",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "blue", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
"id": 6,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_access_apps_total", "refId": "A"}],
"title": "Access Apps",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"description": "Security posture score based on enabled security features",
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"max": 6,
"min": 0,
"thresholds": {"mode": "absolute", "steps": [
{"color": "red", "value": null},
{"color": "yellow", "value": 3},
{"color": "green", "value": 5}
]}
},
"overrides": []
},
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 4},
"id": 7,
"options": {
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"targets": [
{
"expr": "cloudflare_zone_ssl_strict + cloudflare_zone_tls_version_secure + cloudflare_zone_always_https + cloudflare_zone_browser_check + cloudflare_dnssec_enabled + (cloudflare_tunnels_healthy > 0)",
"refId": "A"
}
],
"title": "Security Score",
"type": "gauge"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "palette-classic"},
"custom": {"hideFrom": {"legend": false, "tooltip": false, "viz": false}},
"mappings": []
},
"overrides": []
},
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 4},
"id": 8,
"options": {
"displayLabels": ["name", "value"],
"legend": {"displayMode": "list", "placement": "right", "showLegend": true},
"pieType": "pie",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"tooltip": {"mode": "single", "sort": "none"}
},
"targets": [
{"expr": "cloudflare_access_apps_by_type{type=\"self_hosted\"}", "legendFormat": "Self-Hosted", "refId": "A"},
{"expr": "cloudflare_access_apps_by_type{type=\"saas\"}", "legendFormat": "SaaS", "refId": "B"},
{"expr": "cloudflare_access_apps_by_type{type=\"ssh\"}", "legendFormat": "SSH", "refId": "C"},
{"expr": "cloudflare_access_apps_by_type{type=\"vnc\"}", "legendFormat": "VNC", "refId": "D"},
{"expr": "cloudflare_access_apps_by_type{type=\"bookmark\"}", "legendFormat": "Bookmark", "refId": "E"}
],
"title": "Access Apps by Type",
"type": "piechart"
}
],
"refresh": "1m",
"schemaVersion": 38,
"style": "dark",
"tags": ["cloudflare", "security", "access"],
"templating": {"list": []},
"time": {"from": "now-24h", "to": "now"},
"timepicker": {},
"timezone": "utc",
"title": "Security Settings",
"uid": "cf-security",
"version": 1,
"weekStart": ""
}
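The Security Score gauge in the dashboard above is meant to add six 0/1 indicators, giving a range of 0 to 6 with thresholds at 3 (yellow) and 5 (green). A minimal Python sketch of that intended arithmetic follows. It assumes the scraped values are already available in a plain mapping; `security_score`, `score_color`, and `INDICATORS` are hypothetical names used only for illustration and are not part of the exporter.

```python
from typing import Mapping

# Indicator metrics summed by the gauge; each is expected to be 0 or 1.
INDICATORS = (
    "cloudflare_zone_ssl_strict",
    "cloudflare_zone_tls_version_secure",
    "cloudflare_zone_always_https",
    "cloudflare_zone_browser_check",
    "cloudflare_dnssec_enabled",
)


def security_score(samples: Mapping[str, float]) -> int:
    """Return a 0-6 score: five boolean settings plus one point for any healthy tunnel."""
    # Only an exact 1 counts, so an error sentinel such as -1 adds nothing.
    score = sum(1 for name in INDICATORS if samples.get(name, 0) == 1)
    if samples.get("cloudflare_tunnels_healthy", 0) > 0:
        score += 1
    return score


def score_color(score: int) -> str:
    """Map a score onto the gauge thresholds: red below 3, yellow from 3, green from 5."""
    if score >= 5:
        return "green"
    if score >= 3:
        return "yellow"
    return "red"
```

The tunnel term contributes at most one point regardless of how many tunnels are healthy, which keeps the total within the gauge's configured maximum of 6.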


@@ -0,0 +1,204 @@
{
"annotations": {"list": []},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
"id": 1,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_tunnels_total", "refId": "A"}],
"title": "Total Tunnels",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [
{"color": "red", "value": null},
{"color": "green", "value": 1}
]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 6, "y": 0},
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_tunnels_healthy", "refId": "A"}],
"title": "Healthy Tunnels",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [
{"color": "green", "value": null},
{"color": "red", "value": 1}
]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 12, "y": 0},
"id": 3,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_tunnels_unhealthy", "refId": "A"}],
"title": "Unhealthy Tunnels",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "blue", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 4, "w": 6, "x": 18, "y": 0},
"id": 4,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"textMode": "auto"
},
"targets": [{"expr": "cloudflare_tunnel_connections_total", "refId": "A"}],
"title": "Total Connections",
"type": "stat"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "palette-classic"},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {"legend": false, "tooltip": false, "viz": false},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {"type": "linear"},
"showPoints": "never",
"spanNulls": false,
"stacking": {"group": "A", "mode": "none"},
"thresholdsStyle": {"mode": "off"}
},
"mappings": [],
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}
},
"overrides": []
},
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 4},
"id": 5,
"options": {
"legend": {"calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true},
"tooltip": {"mode": "single", "sort": "none"}
},
"targets": [
{"expr": "cloudflare_tunnels_healthy", "legendFormat": "Healthy", "refId": "A"},
{"expr": "cloudflare_tunnels_unhealthy", "legendFormat": "Unhealthy", "refId": "B"},
{"expr": "cloudflare_tunnel_connections_total", "legendFormat": "Connections", "refId": "C"}
],
"title": "Tunnel Health Over Time",
"type": "timeseries"
},
{
"datasource": {"type": "prometheus", "uid": "prometheus"},
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"mappings": [],
"max": 100,
"min": 0,
"thresholds": {"mode": "absolute", "steps": [
{"color": "red", "value": null},
{"color": "yellow", "value": 50},
{"color": "green", "value": 80}
]},
"unit": "percent"
},
"overrides": []
},
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 14},
"id": 6,
"options": {
"orientation": "auto",
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "10.2.2",
"targets": [
{
"expr": "(cloudflare_tunnels_healthy / cloudflare_tunnels_total) * 100",
"refId": "A"
}
],
"title": "Tunnel Health Percentage",
"type": "gauge"
}
],
"refresh": "1m",
"schemaVersion": 38,
"style": "dark",
"tags": ["cloudflare", "tunnel"],
"templating": {"list": []},
"time": {"from": "now-24h", "to": "now"},
"timepicker": {},
"timezone": "utc",
"title": "Tunnel Status",
"uid": "cf-tunnel",
"version": 1,
"weekStart": ""
}
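The Tunnel Health Percentage gauge above divides healthy tunnels by total tunnels and colors the result red below 50, yellow from 50, and green from 80. A small Python sketch of the same calculation follows, with an explicit guard for the zero-tunnel case (in PromQL, 0/0 evaluates to NaN). The function names are hypothetical and exist only for this illustration.

```python
from typing import Optional


def tunnel_health_percent(healthy: int, total: int) -> Optional[float]:
    """Return healthy/total as a percentage, or None when no tunnels exist."""
    if total <= 0:
        return None  # nothing to divide by; the PromQL equivalent is NaN
    return healthy / total * 100.0


def health_color(percent: float) -> str:
    """Apply the gauge thresholds: red below 50, yellow from 50, green from 80."""
    if percent >= 80:
        return "green"
    if percent >= 50:
        return "yellow"
    return "red"
```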


@@ -0,0 +1,13 @@
# Grafana Datasource Provisioning
apiVersion: 1

datasources:
  - name: Prometheus
    # The dashboards reference this datasource by uid "prometheus"
    uid: prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
    jsonData:
      timeInterval: "60s"
      httpMethod: POST


@@ -0,0 +1,123 @@
# Cloudflare Mesh Observatory Docker Stack
# Prometheus + Grafana + Alertmanager + Custom Metrics Exporter
# Phase 5B - Full Observability + Alerting

services:
  # Prometheus - Metrics Collection
  prometheus:
    image: prom/prometheus:v2.48.0
    container_name: cf-prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alerts:/etc/prometheus/alerts:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    networks:
      - observatory
    depends_on:
      - alertmanager
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Alertmanager - Alert Routing & Notifications
  alertmanager:
    image: prom/alertmanager:v0.26.0
    container_name: cf-alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - ./alertmanager/templates:/etc/alertmanager/templates:ro
      - alertmanager_data:/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
      - '--web.listen-address=:9093'
      - '--cluster.listen-address='
    environment:
      - SLACK_WEBHOOK_URL=${SLACK_WEBHOOK_URL}
      - PAGERDUTY_SERVICE_KEY=${PAGERDUTY_SERVICE_KEY}
      - SMTP_USERNAME=${SMTP_USERNAME}
      - SMTP_PASSWORD=${SMTP_PASSWORD}
    networks:
      - observatory
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9093/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Grafana - Visualization
  grafana:
    image: grafana/grafana:10.2.2
    container_name: cf-grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-changeme}
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=%(protocol)s://%(domain)s:%(http_port)s/
      - GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-piechart-panel
    volumes:
      - grafana_data:/var/lib/grafana
      - ./dashboards:/etc/grafana/provisioning/dashboards:ro
      - ./datasources:/etc/grafana/provisioning/datasources:ro
    networks:
      - observatory
    depends_on:
      - prometheus
    healthcheck:
      test: ["CMD-SHELL", "wget -q --spider http://localhost:3000/api/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Cloudflare Metrics Exporter
  metrics-exporter:
    build:
      context: .
      dockerfile: Dockerfile.exporter
    container_name: cf-metrics-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    environment:
      - CLOUDFLARE_API_TOKEN=${CLOUDFLARE_API_TOKEN}
      - CLOUDFLARE_ZONE_ID=${CLOUDFLARE_ZONE_ID}
      - CLOUDFLARE_ACCOUNT_ID=${CLOUDFLARE_ACCOUNT_ID}
      - SNAPSHOT_DIR=/data/snapshots
      - ANOMALY_DIR=/data/anomalies
    volumes:
      - ../snapshots:/data/snapshots:ro
      - ../anomalies:/data/anomalies:ro
    networks:
      - observatory
    healthcheck:
      # python:3.11-slim ships without wget or curl, so probe with the interpreter
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:9100/health', timeout=5)"]
      interval: 30s
      timeout: 10s
      retries: 3

networks:
  observatory:
    driver: bridge

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:


@@ -0,0 +1,344 @@
#!/usr/bin/env python3
"""
Drift Visualizer

Compares Terraform state, DNS manifest, and live Cloudflare configuration.
Outputs JSON diff and HTML report.

Usage:
    python3 drift-visualizer.py --snapshot <path> --manifest <path> --output-dir <dir>
"""

import argparse
import html
import json
import os
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Set, Tuple

# Default report directory: "reports" one level above this script's directory.
# abspath() keeps this stable when the script is invoked via a relative path.
OUTPUT_DIR = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "reports")


class DriftAnalyzer:
    """Analyzes drift between different state sources."""

    def __init__(self):
        self.diffs: List[Dict[str, Any]] = []

    def compare_dns_records(
        self,
        source_name: str,
        source_records: List[Dict],
        target_name: str,
        target_records: List[Dict]
    ) -> List[Dict[str, Any]]:
        """Compare DNS records between two sources."""
        diffs = []

        # Build lookup maps
        source_map = {(r.get("type"), r.get("name")): r for r in source_records}
        target_map = {(r.get("type"), r.get("name")): r for r in target_records}
        all_keys = set(source_map.keys()) | set(target_map.keys())

        for key in all_keys:
            rtype, name = key
            source_rec = source_map.get(key)
            target_rec = target_map.get(key)

            if source_rec and not target_rec:
                diffs.append({
                    "type": "missing",
                    "source": source_name,
                    "target": target_name,
                    "record_type": rtype,
                    "record_name": name,
                    "detail": f"Record exists in {source_name} but not in {target_name}",
                    "severity": "high",
                })
            elif target_rec and not source_rec:
                diffs.append({
                    "type": "extra",
                    "source": source_name,
                    "target": target_name,
                    "record_type": rtype,
                    "record_name": name,
                    "detail": f"Record exists in {target_name} but not in {source_name}",
                    "severity": "medium",
                })
            else:
                # Both exist - check for content/config drift
                content_diff = self._compare_record_content(source_rec, target_rec)
                if content_diff:
                    diffs.append({
                        "type": "modified",
                        "source": source_name,
                        "target": target_name,
                        "record_type": rtype,
                        "record_name": name,
                        "detail": content_diff,
                        "source_value": source_rec,
                        "target_value": target_rec,
                        "severity": "medium",
                    })

        return diffs

    def _compare_record_content(self, rec1: Dict, rec2: Dict) -> Optional[str]:
        """Compare record content and return diff description."""
        diffs = []
        if rec1.get("content") != rec2.get("content"):
            diffs.append(f"content: {rec1.get('content')} -> {rec2.get('content')}")
        if rec1.get("proxied") != rec2.get("proxied"):
            diffs.append(f"proxied: {rec1.get('proxied')} -> {rec2.get('proxied')}")
        if rec1.get("ttl") != rec2.get("ttl"):
            diffs.append(f"ttl: {rec1.get('ttl')} -> {rec2.get('ttl')}")
        return "; ".join(diffs) if diffs else None

    def compare_settings(
        self,
        source_name: str,
        source_settings: Dict,
        target_name: str,
        target_settings: Dict
    ) -> List[Dict[str, Any]]:
        """Compare zone settings."""
        diffs = []
        all_keys = set(source_settings.keys()) | set(target_settings.keys())

        for key in all_keys:
            src_val = source_settings.get(key)
            tgt_val = target_settings.get(key)
            if src_val != tgt_val:
                diffs.append({
                    "type": "setting_drift",
                    "source": source_name,
                    "target": target_name,
                    "setting": key,
                    "source_value": src_val,
                    "target_value": tgt_val,
                    "severity": "medium" if key in ("ssl", "min_tls_version") else "low",
                })

        return diffs

    def analyze(
        self,
        snapshot: Optional[Dict] = None,
        manifest: Optional[Dict] = None,
        terraform_state: Optional[Dict] = None
    ) -> Dict[str, Any]:
        """Run full drift analysis."""
        self.diffs = []
        comparisons = []

        # Snapshot vs Manifest
        if snapshot and manifest:
            snapshot_dns = snapshot.get("state", {}).get("dns", {}).get("records", [])
            manifest_dns = manifest.get("records", [])
            dns_diffs = self.compare_dns_records(
                "manifest", manifest_dns,
                "cloudflare", snapshot_dns
            )
            self.diffs.extend(dns_diffs)
            comparisons.append("manifest_vs_cloudflare")

        # Summary
        high = len([d for d in self.diffs if d.get("severity") == "high"])
        medium = len([d for d in self.diffs if d.get("severity") == "medium"])
        low = len([d for d in self.diffs if d.get("severity") == "low"])

        return {
            "analysis_type": "drift_report",
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "comparisons": comparisons,
            "summary": {
                "total_diffs": len(self.diffs),
                "high_severity": high,
                "medium_severity": medium,
                "low_severity": low,
                "drift_detected": len(self.diffs) > 0,
            },
            "diffs": self.diffs,
        }


def generate_html_report(analysis: Dict[str, Any]) -> str:
    """Generate HTML visualization of drift report."""
    timestamp = analysis.get("timestamp", "")
    summary = analysis.get("summary", {})
    diffs = analysis.get("diffs", [])

    # CSS styles
    css = """
    <style>
        body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
               max-width: 1200px; margin: 0 auto; padding: 20px; background: #0d1117; color: #c9d1d9; }
        h1 { color: #58a6ff; border-bottom: 1px solid #30363d; padding-bottom: 10px; }
        h2 { color: #8b949e; }
        .summary { display: flex; gap: 20px; margin: 20px 0; }
        .card { background: #161b22; padding: 20px; border-radius: 8px; border: 1px solid #30363d; flex: 1; }
        .card h3 { margin-top: 0; color: #58a6ff; }
        .stat { font-size: 2em; font-weight: bold; }
        .high { color: #f85149; }
        .medium { color: #d29922; }
        .low { color: #3fb950; }
        .ok { color: #3fb950; }
        table { width: 100%; border-collapse: collapse; margin: 20px 0; }
        th, td { padding: 12px; text-align: left; border-bottom: 1px solid #30363d; }
        th { background: #161b22; color: #8b949e; }
        tr:hover { background: #161b22; }
        .badge { padding: 4px 8px; border-radius: 4px; font-size: 0.8em; font-weight: bold; }
        .badge-high { background: #f85149; color: white; }
        .badge-medium { background: #d29922; color: black; }
        .badge-low { background: #238636; color: white; }
        .badge-missing { background: #f85149; }
        .badge-extra { background: #d29922; }
        .badge-modified { background: #1f6feb; color: white; }
        .no-drift { text-align: center; padding: 40px; color: #3fb950; }
        code { background: #21262d; padding: 2px 6px; border-radius: 4px; }
    </style>
    """

    # Header
    html_parts = [
        "<!DOCTYPE html>",
        "<html><head>",
        "<meta charset='utf-8'>",
        "<title>Cloudflare Drift Report</title>",
        css,
        "</head><body>",
        "<h1>Cloudflare Drift Report</h1>",
        f"<p>Generated: {timestamp}</p>",
    ]

    # Summary cards
    html_parts.append("<div class='summary'>")
    html_parts.append(f"""
    <div class='card'>
        <h3>Total Diffs</h3>
        <div class='stat {"ok" if summary.get("total_diffs") == 0 else "high"}'>{summary.get("total_diffs", 0)}</div>
    </div>
    """)
    html_parts.append(f"""
    <div class='card'>
        <h3>High Severity</h3>
        <div class='stat high'>{summary.get("high_severity", 0)}</div>
    </div>
    """)
    html_parts.append(f"""
    <div class='card'>
        <h3>Medium Severity</h3>
        <div class='stat medium'>{summary.get("medium_severity", 0)}</div>
    </div>
    """)
    html_parts.append(f"""
    <div class='card'>
        <h3>Low Severity</h3>
        <div class='stat low'>{summary.get("low_severity", 0)}</div>
    </div>
    """)
    html_parts.append("</div>")

    # Diffs table
    if diffs:
        html_parts.append("<h2>Drift Details</h2>")
        html_parts.append("<table>")
        html_parts.append("""
        <tr>
            <th>Type</th>
            <th>Severity</th>
            <th>Record</th>
            <th>Detail</th>
        </tr>
        """)
        for diff in diffs:
            dtype = diff.get("type", "unknown")
            severity = diff.get("severity", "low")
            record = f"{diff.get('record_type', '')} {diff.get('record_name', '')}"
            detail = html.escape(str(diff.get("detail", "")))
            html_parts.append(f"""
            <tr>
                <td><span class='badge badge-{dtype}'>{dtype}</span></td>
                <td><span class='badge badge-{severity}'>{severity.upper()}</span></td>
                <td><code>{html.escape(record)}</code></td>
                <td>{detail}</td>
            </tr>
            """)
        html_parts.append("</table>")
    else:
        html_parts.append("<div class='no-drift'>No drift detected. Configuration is in sync.</div>")

    html_parts.append("</body></html>")
    return "\n".join(html_parts)


def main():
    parser = argparse.ArgumentParser(description="Drift Visualizer")
    parser.add_argument("--snapshot", help="Path to state snapshot JSON")
    # Only JSON is parsed below, so the help text does not advertise YAML
    parser.add_argument("--manifest", help="Path to DNS manifest JSON")
    parser.add_argument("--output-dir", default=OUTPUT_DIR, help="Output directory")
    parser.add_argument("--format", choices=["json", "html", "both"], default="both",
                        help="Output format")
    args = parser.parse_args()

    # Load files
    snapshot = None
    manifest = None
    if args.snapshot:
        with open(args.snapshot) as f:
            snapshot = json.load(f)
    if args.manifest:
        with open(args.manifest) as f:
            manifest = json.load(f)

    if not snapshot and not manifest:
        print("Error: At least one of --snapshot or --manifest required")
        return 1

    # Ensure output directory
    os.makedirs(args.output_dir, exist_ok=True)

    # Run analysis
    analyzer = DriftAnalyzer()
    analysis = analyzer.analyze(snapshot=snapshot, manifest=manifest)

    # Output
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    if args.format in ("json", "both"):
        json_path = os.path.join(args.output_dir, f"drift-report-{timestamp}.json")
        with open(json_path, "w") as f:
            json.dump(analysis, f, indent=2)
        print(f"JSON report: {json_path}")
    if args.format in ("html", "both"):
        html_content = generate_html_report(analysis)
        html_path = os.path.join(args.output_dir, f"drift-report-{timestamp}.html")
        with open(html_path, "w") as f:
            f.write(html_content)
        print(f"HTML report: {html_path}")

    # Summary
    summary = analysis.get("summary", {})
    print("\nDrift Summary:")
    print(f"  Total diffs: {summary.get('total_diffs', 0)}")
    print(f"  High: {summary.get('high_severity', 0)}")
    print(f"  Medium: {summary.get('medium_severity', 0)}")
    print(f"  Low: {summary.get('low_severity', 0)}")

    # Non-zero exit status signals drift to calling automation
    return 0 if summary.get("total_diffs", 0) == 0 else 1


if __name__ == "__main__":
    raise SystemExit(main())
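
The DNS comparison in the script above keys every record by its `(type, name)` pair and then classifies each key as missing, extra, or modified. The standalone sketch below shows that keyed-diff technique on two small in-memory record lists. `diff_records` and the sample records (using documentation-reserved addresses and `example.com`) are hypothetical and are not part of the script.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
Key = Tuple[object, object]


def diff_records(expected: List[Record], actual: List[Record]) -> Dict[str, List[Key]]:
    """Classify records as missing, extra, or modified, keyed by (type, name)."""
    expected_map = {(r.get("type"), r.get("name")): r for r in expected}
    actual_map = {(r.get("type"), r.get("name")): r for r in actual}

    result: Dict[str, List[Key]] = {"missing": [], "extra": [], "modified": []}
    for key in expected_map.keys() | actual_map.keys():
        if key not in actual_map:
            result["missing"].append(key)      # declared but not live
        elif key not in expected_map:
            result["extra"].append(key)        # live but not declared
        elif any(
            expected_map[key].get(field) != actual_map[key].get(field)
            for field in ("content", "proxied", "ttl")
        ):
            result["modified"].append(key)     # same key, different settings
    return result


manifest = [
    {"type": "A", "name": "app.example.com", "content": "192.0.2.10", "proxied": True, "ttl": 1},
    {"type": "CNAME", "name": "www.example.com", "content": "app.example.com", "proxied": True, "ttl": 1},
]
live = [
    {"type": "A", "name": "app.example.com", "content": "192.0.2.99", "proxied": True, "ttl": 1},
    {"type": "TXT", "name": "example.com", "content": "v=spf1 -all", "ttl": 300},
]

report = diff_records(manifest, live)
```

One consequence of keying on `(type, name)` alone: when several records share a type and name (multiple TXT or MX entries, for example), the dictionary keeps only the last one, so such record sets are compared incompletely.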


@@ -0,0 +1,351 @@
# Cloudflare Mesh Observatory - Escalation Matrix
# Phase 5B - Alerts & Escalation
#
# This matrix defines who gets notified for what, and when to escalate.
# Used by Alertmanager routing and for human reference.
---
version: "1.0"
last_updated: "2024-01-01"
# ==============================================================================
# SEVERITY DEFINITIONS
# ==============================================================================
severity_definitions:
  critical:
    description: "Service down, security incident, or data integrity issue"
    response_time: "15 minutes"
    notification_channels: ["pagerduty", "slack-critical", "phone"]
    escalation_after: "30 minutes"

  warning:
    description: "Degraded service, policy violation, or impending issue"
    response_time: "1 hour"
    notification_channels: ["slack"]
    escalation_after: "4 hours"

  info:
    description: "Informational, audit, or metric threshold"
    response_time: "Next business day"
    notification_channels: ["email-digest"]
    escalation_after: null
# ==============================================================================
# ESCALATION CHAINS
# ==============================================================================
escalation_chains:
  infrastructure:
    name: "Infrastructure Team"
    stages:
      - stage: 1
        delay: "0m"
        contacts: ["infra-oncall"]
        channels: ["pagerduty", "slack"]
      - stage: 2
        delay: "30m"
        contacts: ["infra-lead"]
        channels: ["pagerduty", "phone"]
      - stage: 3
        delay: "1h"
        contacts: ["platform-director"]
        channels: ["phone"]

  security:
    name: "Security Team"
    stages:
      - stage: 1
        delay: "0m"
        contacts: ["security-oncall"]
        channels: ["pagerduty", "slack-security"]
      - stage: 2
        delay: "15m"
        contacts: ["security-lead", "ciso"]
        channels: ["pagerduty", "phone"]

  platform:
    name: "Platform Team"
    stages:
      - stage: 1
        delay: "0m"
        contacts: ["platform-oncall"]
        channels: ["slack"]
      - stage: 2
        delay: "1h"
        contacts: ["platform-lead"]
        channels: ["pagerduty"]
# ==============================================================================
# COMPONENT -> ESCALATION CHAIN MAPPING
# ==============================================================================
component_ownership:
  tunnel:
    primary_chain: infrastructure
    backup_chain: platform
    slack_channel: "#cloudflare-tunnels"
    playbooks:
      - "TUNNEL-ROTATION-PROTOCOL.md"

  dns:
    primary_chain: infrastructure
    backup_chain: security  # DNS can be security-related
    slack_channel: "#cloudflare-dns"
    playbooks:
      - "DNS-COMPROMISE-PLAYBOOK.md"

  waf:
    primary_chain: security
    backup_chain: infrastructure
    slack_channel: "#cloudflare-waf"
    playbooks:
      - "waf_incident_playbook.md"

  invariant:
    primary_chain: security
    backup_chain: platform
    slack_channel: "#cloudflare-security"
    playbooks:
      - "SECURITY-INVARIANTS.md"

  proofchain:
    primary_chain: platform
    backup_chain: security
    slack_channel: "#cloudflare-proofchain"
    playbooks:
      - "proofchain-incident.md"
# ==============================================================================
# ALERT -> RESPONSE MAPPING
# ==============================================================================
alert_responses:
  # TUNNEL ALERTS
  TunnelDown:
    severity: critical
    escalation_chain: infrastructure
    immediate_actions:
      - "Check cloudflared service status"
      - "Verify network connectivity to origin"
      - "Check Cloudflare status page"
    playbook: "TUNNEL-ROTATION-PROTOCOL.md"
    auto_remediation: false  # Manual intervention required

  AllTunnelsDown:
    severity: critical
    escalation_chain: infrastructure
    immediate_actions:
      - "DECLARE INCIDENT"
      - "Check all cloudflared instances"
      - "Verify DNS resolution"
      - "Check for Cloudflare outage"
    playbook: "TUNNEL-ROTATION-PROTOCOL.md"
    auto_remediation: false

  TunnelRotationDue:
    severity: warning
    escalation_chain: platform
    immediate_actions:
      - "Schedule maintenance window"
      - "Prepare new tunnel credentials"
    playbook: "TUNNEL-ROTATION-PROTOCOL.md"
    auto_remediation: true  # Can be auto-scheduled

  # DNS ALERTS
  DNSHijackDetected:
    severity: critical
    escalation_chain: security
    immediate_actions:
      - "DECLARE SECURITY INCIDENT"
      - "Verify DNS resolution from multiple locations"
      - "Check Cloudflare audit logs"
      - "Preserve evidence"
    playbook: "DNS-COMPROMISE-PLAYBOOK.md"
    auto_remediation: false  # NEVER auto-remediate security incidents

  DNSDriftDetected:
    severity: warning
    escalation_chain: infrastructure
    immediate_actions:
      - "Run state reconciler"
      - "Identify changed records"
      - "Verify authorization"
    playbook: "DNS-COMPROMISE-PLAYBOOK.md"
    auto_remediation: true  # Can auto-reconcile if authorized

  # WAF ALERTS
  WAFMassiveAttack:
    severity: critical
    escalation_chain: security
    immediate_actions:
      - "Verify attack is real (not false positive)"
      - "Consider Under Attack Mode"
      - "Check rate limiting"
      - "Document attack patterns"
    playbook: "waf_incident_playbook.md"
    auto_remediation: false

  WAFRuleBypass:
    severity: critical
    escalation_chain: security
    immediate_actions:
      - "Analyze bypassed requests"
      - "Tighten rule immediately"
      - "Check for related vulnerabilities"
    playbook: "waf_incident_playbook.md"
    auto_remediation: false

  WAFDisabled:
    severity: critical
    escalation_chain: security
    immediate_actions:
      - "IMMEDIATELY investigate why WAF is disabled"
      - "Re-enable unless documented exception"
      - "Review audit logs"
    playbook: "waf_incident_playbook.md"
    auto_remediation: true  # Auto-enable WAF

  # INVARIANT ALERTS
  SSLModeDowngraded:
    severity: critical
    escalation_chain: security
    immediate_actions:
      - "Restore Full (Strict) SSL mode"
      - "Investigate who made the change"
      - "Review audit logs"
    playbook: null
    auto_remediation: true  # Auto-restore SSL mode

  AccessPolicyViolation:
    severity: critical
    escalation_chain: security
    immediate_actions:
      - "Review access attempt"
      - "Block if malicious"
      - "Notify affected user if legitimate"
    playbook: null
    auto_remediation: false

  # PROOFCHAIN ALERTS
  ProofchainIntegrityFailure:
    severity: critical
    escalation_chain: security
    immediate_actions:
      - "HALT all new receipt generation"
      - "Preserve current state"
      - "Identify last known-good checkpoint"
      - "Do NOT attempt auto-recovery"
    playbook: null
    auto_remediation: false  # NEVER auto-remediate integrity failures

  ReceiptHashMismatch:
    severity: critical
    escalation_chain: security
    immediate_actions:
      - "Identify affected receipt"
      - "Compare against backup"
      - "Preserve for forensics"
    playbook: null
    auto_remediation: false
# ==============================================================================
# CONTACTS
# ==============================================================================
contacts:
  infra-oncall:
    name: "Infrastructure On-Call"
    pagerduty_service: "PXXXXXX"
    slack_handle: "@infra-oncall"
    schedule: "follow-the-sun"

  infra-lead:
    name: "Infrastructure Team Lead"
    pagerduty_user: "UXXXXXX"
    phone: "+1-XXX-XXX-XXXX"
    email: "infra-lead@company.com"

  security-oncall:
    name: "Security On-Call"
    pagerduty_service: "PXXXXXX"
    slack_handle: "@security-oncall"
    schedule: "24x7"

  security-lead:
    name: "Security Team Lead"
    pagerduty_user: "UXXXXXX"
    phone: "+1-XXX-XXX-XXXX"
    email: "security-lead@company.com"

  ciso:
    name: "Chief Information Security Officer"
    phone: "+1-XXX-XXX-XXXX"
    email: "ciso@company.com"

  platform-oncall:
    name: "Platform On-Call"
    pagerduty_service: "PXXXXXX"
    slack_handle: "@platform-oncall"

  platform-lead:
    name: "Platform Team Lead"
    pagerduty_user: "UXXXXXX"
    email: "platform-lead@company.com"

  platform-director:
    name: "Platform Director"
    phone: "+1-XXX-XXX-XXXX"
    email: "platform-director@company.com"
# ==============================================================================
# NOTIFICATION CHANNELS
# ==============================================================================
channels:
  slack:
    default: "#cloudflare-alerts"
    critical: "#cloudflare-critical"
    tunnels: "#cloudflare-tunnels"
    dns: "#cloudflare-dns"
    waf: "#cloudflare-waf"
    security: "#cloudflare-security"
    proofchain: "#cloudflare-proofchain"

  pagerduty:
    integration_key: "${PAGERDUTY_SERVICE_KEY}"
    escalation_policy: "cloudflare-infrastructure"

  email:
    daily_digest: "cloudflare-team@company.com"
    weekly_report: "platform-leadership@company.com"
# ==============================================================================
# AUTO-REMEDIATION POLICIES
# ==============================================================================
auto_remediation:
  enabled: true
  require_confirmation_for:
    - "critical"
    - "security_incident"
  never_auto_remediate:
    - "ProofchainIntegrityFailure"
    - "ReceiptHashMismatch"
    - "DNSHijackDetected"
    - "WAFRuleBypass"
  max_auto_remediations_per_hour: 5
  cooldown_period: "10m"
# ==============================================================================
# MAINTENANCE WINDOWS
# ==============================================================================
maintenance_windows:
  weekly_rotation:
    schedule: "0 3 * * SUN"  # 3 AM Sunday
    duration: "2h"
    suppress_alerts:
      - "TunnelDown"
      - "TunnelDegraded"
    notify_channel: "#cloudflare-alerts"

  monthly_patch:
    schedule: "0 2 15 * *"  # 2 AM on the 15th
    duration: "4h"
    suppress_alerts:
      - "TunnelDown"
      - "CloudflaredOutdated"
    notify_channel: "#cloudflare-alerts"
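
The matrix above combines three auto-remediation controls: a per-alert `auto_remediation` flag, a global `never_auto_remediate` list, and `require_confirmation_for` severities. The Python sketch below shows one way they could interact. The precedence order (never-list first, then the per-alert flag, then confirmation) is an assumption, since the matrix does not state it, and `remediation_decision` is a hypothetical helper. The hourly rate limit and cooldown are not modeled.

```python
from typing import Dict

# Global policy values copied from the auto_remediation section of the matrix.
POLICY = {
    "enabled": True,
    "require_confirmation_for": {"critical", "security_incident"},
    "never_auto_remediate": {
        "ProofchainIntegrityFailure",
        "ReceiptHashMismatch",
        "DNSHijackDetected",
        "WAFRuleBypass",
    },
}

# A few entries from alert_responses: severity plus the per-alert flag.
ALERTS: Dict[str, Dict[str, object]] = {
    "TunnelRotationDue": {"severity": "warning", "auto_remediation": True},
    "WAFDisabled": {"severity": "critical", "auto_remediation": True},
    "DNSHijackDetected": {"severity": "critical", "auto_remediation": False},
    "TunnelDown": {"severity": "critical", "auto_remediation": False},
}


def remediation_decision(alert_name: str) -> str:
    """Return "deny", "confirm", or "auto" for a known alert (assumed precedence)."""
    alert = ALERTS[alert_name]
    # The global switch and the never-list override everything else.
    if not POLICY["enabled"] or alert_name in POLICY["never_auto_remediate"]:
        return "deny"
    # An alert must opt in through its own flag.
    if not alert["auto_remediation"]:
        return "deny"
    # Opted-in alerts of a listed severity still need a human to confirm.
    if alert["severity"] in POLICY["require_confirmation_for"]:
        return "confirm"
    return "auto"
```

Under these assumptions, `WAFDisabled` is eligible for remediation but only after confirmation because it is critical, while `TunnelRotationDue` can proceed unattended.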


@@ -0,0 +1,355 @@
#!/usr/bin/env python3
"""
Cloudflare Metrics Exporter for Prometheus
Exports Cloudflare state and invariant status as Prometheus metrics.
Usage:
python3 metrics-exporter.py --port 9100
Environment Variables:
CLOUDFLARE_API_TOKEN - API token
CLOUDFLARE_ZONE_ID - Zone ID
CLOUDFLARE_ACCOUNT_ID - Account ID
SNAPSHOT_DIR - Directory containing state snapshots
ANOMALY_DIR - Directory containing invariant reports
"""
import argparse
import glob
import json
import os
import time
from datetime import datetime, timezone
from http.server import HTTPServer, BaseHTTPRequestHandler
from typing import Any, Dict, List, Optional
import requests
# Configuration
CF_API_BASE = "https://api.cloudflare.com/client/v4"
DEFAULT_PORT = 9100
SCRAPE_INTERVAL = 60 # seconds
class CloudflareMetricsCollector:
"""Collects Cloudflare metrics for Prometheus export."""
def __init__(self, api_token: str, zone_id: str, account_id: str,
snapshot_dir: str, anomaly_dir: str):
self.api_token = api_token
self.zone_id = zone_id
self.account_id = account_id
self.snapshot_dir = snapshot_dir
self.anomaly_dir = anomaly_dir
self.session = requests.Session()
self.session.headers.update({
"Authorization": f"Bearer {api_token}",
"Content-Type": "application/json"
})
self.metrics: Dict[str, Any] = {}
self.last_scrape = 0
def _cf_request(self, endpoint: str) -> Dict[str, Any]:
"""Make Cloudflare API request."""
url = f"{CF_API_BASE}{endpoint}"
response = self.session.get(url, timeout=30)  # avoid hanging a scrape on a stalled connection
response.raise_for_status()
return response.json()
def _get_latest_file(self, pattern: str) -> Optional[str]:
"""Get most recent file matching pattern."""
files = glob.glob(pattern)
if not files:
return None
return max(files, key=os.path.getmtime)
def collect_dns_metrics(self):
"""Collect DNS record metrics."""
try:
data = self._cf_request(f"/zones/{self.zone_id}/dns_records?per_page=500")
records = data.get("result", [])
# Count by type
type_counts = {}
proxied_count = 0
unproxied_count = 0
for r in records:
rtype = r.get("type", "UNKNOWN")
type_counts[rtype] = type_counts.get(rtype, 0) + 1
if r.get("proxied"):
proxied_count += 1
else:
unproxied_count += 1
self.metrics["dns_records_total"] = len(records)
self.metrics["dns_records_proxied"] = proxied_count
self.metrics["dns_records_unproxied"] = unproxied_count
for rtype, count in type_counts.items():
self.metrics[f"dns_records_by_type{{type=\"{rtype}\"}}"] = count
except Exception:
self.metrics["dns_scrape_errors_total"] = self.metrics.get("dns_scrape_errors_total", 0) + 1
def collect_dnssec_metrics(self):
"""Collect DNSSEC status."""
try:
data = self._cf_request(f"/zones/{self.zone_id}/dnssec")
result = data.get("result", {})
status = result.get("status", "unknown")
self.metrics["dnssec_enabled"] = 1 if status == "active" else 0
except Exception:
self.metrics["dnssec_enabled"] = -1
def collect_tunnel_metrics(self):
"""Collect tunnel metrics."""
try:
data = self._cf_request(f"/accounts/{self.account_id}/cfd_tunnel")
tunnels = data.get("result", [])
active = 0
healthy = 0
total_connections = 0
for t in tunnels:
if not t.get("deleted_at"):
active += 1
# Check connections
try:
conn_data = self._cf_request(
f"/accounts/{self.account_id}/cfd_tunnel/{t['id']}/connections"
)
conns = conn_data.get("result", [])
if conns:
healthy += 1
total_connections += len(conns)
except Exception:
pass
self.metrics["tunnels_total"] = active
self.metrics["tunnels_healthy"] = healthy
self.metrics["tunnels_unhealthy"] = active - healthy
self.metrics["tunnel_connections_total"] = total_connections
except Exception:
self.metrics["tunnel_scrape_errors_total"] = self.metrics.get("tunnel_scrape_errors_total", 0) + 1
def collect_access_metrics(self):
"""Collect Access app metrics."""
try:
data = self._cf_request(f"/accounts/{self.account_id}/access/apps")
apps = data.get("result", [])
self.metrics["access_apps_total"] = len(apps)
# Count by type
type_counts = {}
for app in apps:
app_type = app.get("type", "unknown")
type_counts[app_type] = type_counts.get(app_type, 0) + 1
for app_type, count in type_counts.items():
self.metrics[f"access_apps_by_type{{type=\"{app_type}\"}}"] = count
except Exception:
self.metrics["access_scrape_errors_total"] = self.metrics.get("access_scrape_errors_total", 0) + 1

    def collect_zone_settings_metrics(self):
        """Collect zone security settings."""
        try:
            data = self._cf_request(f"/zones/{self.zone_id}/settings")
            settings = {s["id"]: s["value"] for s in data.get("result", [])}
            # TLS settings
            ssl = settings.get("ssl", "unknown")
            self.metrics["zone_ssl_strict"] = 1 if ssl in ("strict", "full_strict") else 0
            min_tls = settings.get("min_tls_version", "unknown")
            self.metrics["zone_tls_version_secure"] = 1 if min_tls in ("1.2", "1.3") else 0
            # Security features
            self.metrics["zone_always_https"] = 1 if settings.get("always_use_https") == "on" else 0
            self.metrics["zone_browser_check"] = 1 if settings.get("browser_check") == "on" else 0
        except Exception:
            pass

    def collect_snapshot_metrics(self):
        """Collect metrics from state snapshots."""
        latest = self._get_latest_file(os.path.join(self.snapshot_dir, "cloudflare-*.json"))
        if not latest:
            self.metrics["snapshot_age_seconds"] = -1
            return
        try:
            mtime = os.path.getmtime(latest)
            age = time.time() - mtime
            self.metrics["snapshot_age_seconds"] = int(age)
            with open(latest) as f:
                snapshot = json.load(f)
            integrity = snapshot.get("integrity", {})
            self.metrics["snapshot_merkle_root_set"] = 1 if integrity.get("merkle_root") else 0
        except Exception:
            self.metrics["snapshot_age_seconds"] = -1

    def collect_invariant_metrics(self):
        """Collect metrics from invariant reports."""
        latest = self._get_latest_file(os.path.join(self.anomaly_dir, "invariant-report-*.json"))
        if not latest:
            self.metrics["invariants_total"] = 0
            self.metrics["invariants_passed"] = 0
            self.metrics["invariants_failed"] = 0
            return
        try:
            with open(latest) as f:
                report = json.load(f)
            summary = report.get("summary", {})
            self.metrics["invariants_total"] = summary.get("total", 0)
            self.metrics["invariants_passed"] = summary.get("passed", 0)
            self.metrics["invariants_failed"] = summary.get("failed", 0)
            self.metrics["invariants_pass_rate"] = summary.get("pass_rate", 0)
            # Report age
            mtime = os.path.getmtime(latest)
            self.metrics["invariant_report_age_seconds"] = int(time.time() - mtime)
        except Exception:
            pass

    def collect_anomaly_metrics(self):
        """Count anomaly receipts."""
        anomaly_files = glob.glob(os.path.join(self.anomaly_dir, "anomaly-*.json"))
        self.metrics["anomalies_total"] = len(anomaly_files)
        # Recent anomalies (last 24h)
        recent = 0
        day_ago = time.time() - 86400
        for f in anomaly_files:
            if os.path.getmtime(f) > day_ago:
                recent += 1
        self.metrics["anomalies_last_24h"] = recent

    def collect_all(self):
        """Collect all metrics."""
        now = time.time()
        if now - self.last_scrape < SCRAPE_INTERVAL:
            return  # Rate limit: keep serving the cached metrics
        self.last_scrape = now
        # Carry error counters across scrapes; rebuilding the dict from scratch
        # would pin every *_errors_total counter at 1.
        error_counters = {
            key: value
            for key, value in getattr(self, "metrics", {}).items()
            if key.endswith("_errors_total")
        }
        self.metrics = {"scrape_timestamp": int(now), **error_counters}
        self.collect_dns_metrics()
        self.collect_dnssec_metrics()
        self.collect_tunnel_metrics()
        self.collect_access_metrics()
        self.collect_zone_settings_metrics()
        self.collect_snapshot_metrics()
        self.collect_invariant_metrics()
        self.collect_anomaly_metrics()

    def format_prometheus(self) -> str:
        """Format metrics as Prometheus exposition format."""
        lines = [
            "# HELP cloudflare_dns_records_total Total DNS records",
            "# TYPE cloudflare_dns_records_total gauge",
            "# HELP cloudflare_tunnels_total Total active tunnels",
            "# TYPE cloudflare_tunnels_total gauge",
            "# HELP cloudflare_tunnels_healthy Healthy tunnels with connections",
            "# TYPE cloudflare_tunnels_healthy gauge",
            "# HELP cloudflare_invariants_passed Invariants passing",
            "# TYPE cloudflare_invariants_passed gauge",
            "# HELP cloudflare_invariants_failed Invariants failing",
            "# TYPE cloudflare_invariants_failed gauge",
            "",
        ]
        for key, value in self.metrics.items():
            if isinstance(value, (int, float)):
                # Keys may already carry a label set, e.g. name{type="A"}
                lines.append(f"cloudflare_{key} {value}")
        # The exposition format requires the last line to end with a line feed
        return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """HTTP handler for Prometheus scrapes."""
    collector: CloudflareMetricsCollector = None

    def do_GET(self):
        if self.path == "/metrics":
            self.collector.collect_all()
            output = self.collector.format_prometheus()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(output.encode())
        elif self.path == "/health":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"OK")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, format, *args):
        pass  # Suppress default logging


def main():
    parser = argparse.ArgumentParser(description="Cloudflare Metrics Exporter")
    parser.add_argument("--port", type=int, default=DEFAULT_PORT,
                        help=f"Port to listen on (default: {DEFAULT_PORT})")
    parser.add_argument("--zone-id", default=os.environ.get("CLOUDFLARE_ZONE_ID"))
    parser.add_argument("--account-id", default=os.environ.get("CLOUDFLARE_ACCOUNT_ID"))
    parser.add_argument("--snapshot-dir",
                        default=os.environ.get("SNAPSHOT_DIR", "../snapshots"))
    parser.add_argument("--anomaly-dir",
                        default=os.environ.get("ANOMALY_DIR", "../anomalies"))
    args = parser.parse_args()
    api_token = os.environ.get("CLOUDFLARE_API_TOKEN")
    if not api_token:
        print("Error: CLOUDFLARE_API_TOKEN required")
        return 1
    if not args.zone_id or not args.account_id:
        print("Error: Zone ID and Account ID required")
        return 1
    # Initialize collector
    collector = CloudflareMetricsCollector(
        api_token, args.zone_id, args.account_id,
        args.snapshot_dir, args.anomaly_dir
    )
    MetricsHandler.collector = collector
    # Start server
    server = HTTPServer(("0.0.0.0", args.port), MetricsHandler)
    print(f"Cloudflare Metrics Exporter listening on :{args.port}")
    print("  /metrics - Prometheus metrics")
    print("  /health  - Health check")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        print("\nShutting down...")
        server.shutdown()
    return 0


if __name__ == "__main__":
    # SystemExit works without the site-provided exit() helper
    raise SystemExit(main())
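
The exporter above builds label sets by embedding them in the dictionary key and emits all HELP/TYPE lines ahead of the samples. As a rough sketch of an alternative layout (not the exporter's actual code), the snippet below groups samples by metric family, writes HELP/TYPE directly before each family, and escapes label values. The `METRIC_META` table, `render`, and `escape_label_value` are hypothetical names introduced only for this illustration.

```python
from typing import Dict, List, Tuple

Labels = Tuple[Tuple[str, str], ...]

# Hypothetical metadata table: metric family -> (help text, type).
METRIC_META: Dict[str, Tuple[str, str]] = {
    "cloudflare_tunnels_total": ("Total active tunnels", "gauge"),
    "cloudflare_dns_records_by_type": ("DNS records per record type", "gauge"),
}


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(samples: Dict[Tuple[str, Labels], float]) -> str:
    """Render samples grouped by family, with HELP/TYPE ahead of each group."""
    by_family: Dict[str, List[Tuple[Labels, float]]] = {}
    for (name, labels), value in samples.items():
        by_family.setdefault(name, []).append((labels, value))

    lines: List[str] = []
    for name in sorted(by_family):
        if name in METRIC_META:
            help_text, metric_type = METRIC_META[name]
            lines.append(f"# HELP {name} {help_text}")
            lines.append(f"# TYPE {name} {metric_type}")
        for labels, value in by_family[name]:
            if labels:
                rendered = ",".join(
                    f'{key}="{escape_label_value(val)}"' for key, val in labels
                )
                lines.append(f"{name}{{{rendered}}} {value}")
            else:
                lines.append(f"{name} {value}")
    # The text format requires the final line to end with a line feed.
    return "\n".join(lines) + "\n"
```

Keeping labels as structured tuples rather than pre-formatted strings means a record type or app type containing a quote cannot corrupt the scrape output.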

@@ -0,0 +1,43 @@
# Prometheus Configuration for Cloudflare Mesh Observatory
# Scrapes metrics from the custom exporter

global:
  scrape_interval: 60s
  evaluation_interval: 60s
  external_labels:
    monitor: 'cloudflare-mesh'

# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

# Rule files - Load all alert rules from the alerts directory
rule_files:
  - /etc/prometheus/alerts/*.yml

# Scrape configurations
scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
    metrics_path: /metrics
    scheme: http

  # Cloudflare metrics exporter
  - job_name: 'cloudflare'
    static_configs:
      - targets: ['metrics-exporter:9100']
    metrics_path: /metrics
    scheme: http
    scrape_interval: 60s
    scrape_timeout: 30s
    honor_labels: true

  # Optional: Node exporter for host metrics
  # - job_name: 'node'
  #   static_configs:
  #     - targets: ['node-exporter:9100']
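
Prometheus refuses a scrape job whose `scrape_timeout` is greater than its `scrape_interval`, so the 30s/60s pair in the `cloudflare` job is worth guarding when either value is edited. The following is a minimal sketch of such a pre-flight check, assuming the job is already available as a plain dictionary; `parse_duration` and `check_job` are hypothetical helpers, and the 10s fallback reflects Prometheus's documented default timeout.

```python
import re
from typing import Dict, List

_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600,
                 "d": 86400, "w": 604800, "y": 31536000}
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text: str) -> float:
    """Convert a Prometheus-style duration such as '60s' or '1m30s' to seconds."""
    position, total = 0, 0.0
    for match in _PART.finditer(text):
        if match.start() != position:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    if not text or position != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


def check_job(job: Dict[str, str], global_interval: str = "60s") -> List[str]:
    """Return a list of problems found in one scrape job (empty when fine)."""
    interval = parse_duration(job.get("scrape_interval", global_interval))
    timeout = parse_duration(job.get("scrape_timeout", "10s"))
    problems = []
    if timeout > interval:
        problems.append(
            f"{job.get('job_name', '?')}: scrape_timeout {timeout:g}s "
            f"exceeds scrape_interval {interval:g}s"
        )
    return problems
```

For the `cloudflare` job as configured, `check_job` returns an empty list; swapping the two values would produce one problem entry.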

@@ -0,0 +1,228 @@
# DNS Alert Rules for Cloudflare Mesh Observatory
# Phase 5B - Alerts & Escalation
groups:
- name: dns_alerts
interval: 60s
rules:
# ============================================
# CRITICAL - DNS Hijack Detection
# ============================================
- alert: DNSHijackDetected
expr: cloudflare_dns_record_mismatch == 1
for: 1m
labels:
severity: critical
component: dns
playbook: dns-compromise
security_incident: "true"
annotations:
summary: "POTENTIAL DNS HIJACK: {{ $labels.record_name }}"
description: |
DNS record {{ $labels.record_name }} ({{ $labels.record_type }}) in zone
{{ $labels.zone }} does not match expected value.
Expected: {{ $labels.expected_value }}
Actual: {{ $labels.actual_value }}
This may indicate DNS hijacking or unauthorized modification.
TREAT AS SECURITY INCIDENT until verified.
impact: "Traffic may be routed to unauthorized destinations"
runbook_url: "https://wiki.internal/playbooks/dns-compromise"
# ============================================
# CRITICAL - Critical DNS Record Missing
# ============================================
- alert: CriticalDNSRecordMissing
expr: cloudflare_dns_critical_record_exists == 0
for: 2m
labels:
severity: critical
component: dns
playbook: dns-compromise
annotations:
summary: "Critical DNS record missing: {{ $labels.record_name }}"
description: |
Critical DNS record {{ $labels.record_name }} ({{ $labels.record_type }})
is missing from zone {{ $labels.zone }}.
This record is marked as critical in the DNS manifest.
impact: "Service reachability may be affected"
runbook_url: "https://wiki.internal/playbooks/dns-compromise"
# ============================================
# WARNING - DNS Drift Detected
# ============================================
- alert: DNSDriftDetected
expr: cloudflare_dns_drift_count > 0
for: 5m
labels:
severity: warning
component: dns
annotations:
summary: "DNS drift detected in zone {{ $labels.zone }}"
description: |
{{ $value }} DNS records in zone {{ $labels.zone }} differ from
the expected baseline configuration.
Run state reconciler to identify specific changes.
runbook_url: "https://wiki.internal/playbooks/dns-compromise"
# ============================================
# WARNING - DNS Record TTL Mismatch
# ============================================
- alert: DNSTTLMismatch
expr: cloudflare_dns_ttl_mismatch == 1
for: 10m
labels:
severity: warning
component: dns
annotations:
summary: "DNS TTL mismatch: {{ $labels.record_name }}"
description: |
DNS record {{ $labels.record_name }} has unexpected TTL.
Expected: {{ $labels.expected_ttl }}s
Actual: {{ $labels.actual_ttl }}s
This may affect caching behavior and failover timing.
# ============================================
# WARNING - DNS Propagation Slow
# ============================================
- alert: DNSPropagationSlow
expr: cloudflare_dns_propagation_time_seconds > 300
for: 5m
labels:
severity: warning
component: dns
annotations:
summary: "Slow DNS propagation for {{ $labels.record_name }}"
description: |
DNS changes for {{ $labels.record_name }} are taking longer than
5 minutes to propagate.
Current propagation time: {{ $value | humanizeDuration }}
# ============================================
# CRITICAL - DNS Propagation Failed
# ============================================
- alert: DNSPropagationFailed
expr: cloudflare_dns_propagation_time_seconds > 900
for: 5m
labels:
severity: critical
component: dns
annotations:
summary: "DNS propagation failed for {{ $labels.record_name }}"
description: |
DNS changes for {{ $labels.record_name }} have not propagated
after 15 minutes. This may indicate a configuration issue.
# ============================================
# WARNING - Unexpected DNS Record
# ============================================
- alert: UnexpectedDNSRecord
expr: cloudflare_dns_unexpected_record == 1
for: 5m
labels:
severity: warning
component: dns
annotations:
summary: "Unexpected DNS record: {{ $labels.record_name }}"
description: |
DNS record {{ $labels.record_name }} ({{ $labels.record_type }}) exists
but is not defined in the DNS manifest.
This may be an unauthorized addition.
# ============================================
# INFO - DNS Record Added
# ============================================
- alert: DNSRecordAdded
expr: delta(cloudflare_dns_records_total[1h]) > 0
for: 0m
labels:
severity: info
component: dns
annotations:
summary: "DNS record added in zone {{ $labels.zone }}"
description: |
{{ $value }} new DNS record(s) detected in zone {{ $labels.zone }}
in the last hour. Verify this was authorized.
# ============================================
# INFO - DNS Record Removed
# ============================================
- alert: DNSRecordRemoved
expr: -delta(cloudflare_dns_records_total[1h]) > 0
for: 0m
labels:
severity: info
component: dns
annotations:
summary: "DNS record removed from zone {{ $labels.zone }}"
description: |
{{ $value }} DNS record(s) removed from zone {{ $labels.zone }}
in the last hour. Verify this was authorized.
# ============================================
# WARNING - DNSSEC Disabled
# ============================================
- alert: DNSSECDisabled
expr: cloudflare_zone_dnssec_enabled == 0
for: 5m
labels:
severity: warning
component: dns
annotations:
summary: "DNSSEC disabled for zone {{ $labels.zone }}"
description: |
DNSSEC is not enabled for zone {{ $labels.zone }}.
This reduces protection against DNS spoofing attacks.
# ============================================
# WARNING - Zone Transfer Enabled
# ============================================
- alert: ZoneTransferEnabled
expr: cloudflare_zone_axfr_enabled == 1
for: 5m
labels:
severity: warning
component: dns
annotations:
summary: "Zone transfer (AXFR) enabled for {{ $labels.zone }}"
description: |
Zone transfer is enabled for {{ $labels.zone }}.
This exposes DNS records to potential enumeration.
Disable unless explicitly required.
# ============================================
# WARNING - DNS Query Spike
# ============================================
- alert: DNSQuerySpike
expr: |
rate(cloudflare_dns_queries_total[5m])
> 3 * avg_over_time(rate(cloudflare_dns_queries_total[5m])[24h:5m])
for: 5m
labels:
severity: warning
component: dns
annotations:
summary: "DNS query spike for zone {{ $labels.zone }}"
description: |
DNS queries for zone {{ $labels.zone }} are more than 3x the 24-hour average.
This may indicate a DDoS attack or misconfigured client.
# ============================================
# WARNING - High DNS Error Rate
# ============================================
- alert: HighDNSErrorRate
expr: |
rate(cloudflare_dns_errors_total[5m])
/ rate(cloudflare_dns_queries_total[5m]) > 0.01
for: 10m
labels:
severity: warning
component: dns
annotations:
summary: "High DNS error rate for zone {{ $labels.zone }}"
description: |
DNS error rate exceeds 1% for zone {{ $labels.zone }}.
Current error rate: {{ $value | humanizePercentage }}
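
Several rules in this file (`DNSHijackDetected`, `CriticalDNSRecordMissing`, `UnexpectedDNSRecord`, `DNSDriftDetected`) depend on a comparison between a DNS manifest and the live zone. The sketch below shows one way such a comparison could be computed before being exported as gauges. It is an illustration only: the `(name, type) -> content` record shape and the function names are assumptions, not the project's manifest format.

```python
from typing import Dict, List, Tuple

RecordKey = Tuple[str, str]  # (record name, record type), e.g. ("www.example.com", "A")


def compare_records(
    expected: Dict[RecordKey, str],
    actual: Dict[RecordKey, str],
) -> Dict[str, List[RecordKey]]:
    """Classify differences between the manifest and the live zone."""
    return {
        # Present in both, but the content differs (possible hijack).
        "mismatched": sorted(
            key for key in expected if key in actual and expected[key] != actual[key]
        ),
        # Declared in the manifest but absent from the zone.
        "missing": sorted(key for key in expected if key not in actual),
        # Live in the zone but not declared in the manifest.
        "unexpected": sorted(key for key in actual if key not in expected),
    }


def drift_count(result: Dict[str, List[RecordKey]]) -> int:
    """Total number of records that differ from the baseline."""
    return sum(len(keys) for keys in result.values())
```

Each entry in `mismatched` would map to one series with value 1 for the mismatch rule, while `drift_count` corresponds to the single per-zone number the drift rule compares against zero.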

@@ -0,0 +1,284 @@
# Security Invariant Alert Rules for Cloudflare Mesh Observatory
# Phase 5B - Alerts & Escalation
groups:
- name: invariant_alerts
interval: 60s
rules:
# ============================================
# CRITICAL - SSL Mode Downgrade
# ============================================
- alert: SSLModeDowngraded
expr: cloudflare_zone_ssl_mode != 1 # 1 = Full (Strict)
for: 2m
labels:
severity: critical
component: invariant
invariant_name: ssl_strict_mode
category: encryption
frameworks: "SOC2,PCI-DSS,ISO27001"
annotations:
summary: "SSL mode is not Full (Strict) for {{ $labels.zone }}"
description: |
Zone {{ $labels.zone }} SSL mode has been changed from Full (Strict).
Current mode: {{ $labels.ssl_mode }}
This weakens TLS security and may allow MITM attacks.
This is a compliance violation for multiple frameworks.
impact: "Reduced TLS security, potential MITM vulnerability"
runbook_url: "https://wiki.internal/invariants/ssl-mode"
# ============================================
# CRITICAL - Always Use HTTPS Disabled
# ============================================
- alert: HTTPSNotEnforced
expr: cloudflare_zone_always_use_https == 0
for: 2m
labels:
severity: critical
component: invariant
invariant_name: always_use_https
category: encryption
frameworks: "SOC2,PCI-DSS,HIPAA"
annotations:
summary: "Always Use HTTPS disabled for {{ $labels.zone }}"
description: |
Zone {{ $labels.zone }} allows HTTP traffic.
This may expose sensitive data in transit.
impact: "Data transmitted over unencrypted connections"
runbook_url: "https://wiki.internal/invariants/https-enforcement"
# ============================================
# CRITICAL - TLS Version Below Minimum
# ============================================
- alert: TLSVersionTooLow
expr: cloudflare_zone_min_tls_version < 1.2
for: 2m
labels:
severity: critical
component: invariant
invariant_name: min_tls_version
category: encryption
frameworks: "PCI-DSS,NIST"
annotations:
summary: "Minimum TLS version below 1.2 for {{ $labels.zone }}"
description: |
Zone {{ $labels.zone }} allows TLS versions below 1.2.
Current minimum: TLS {{ $labels.min_tls }}
TLS 1.0 and 1.1 have known vulnerabilities.
PCI-DSS requires TLS 1.2 minimum.
impact: "Vulnerable TLS versions allowed"
runbook_url: "https://wiki.internal/invariants/tls-version"
# ============================================
# WARNING - HSTS Not Enabled
# ============================================
- alert: HSTSNotEnabled
expr: cloudflare_zone_hsts_enabled == 0
for: 5m
labels:
severity: warning
component: invariant
invariant_name: hsts_enabled
category: encryption
frameworks: "SOC2,OWASP"
annotations:
summary: "HSTS not enabled for {{ $labels.zone }}"
description: |
HTTP Strict Transport Security is not enabled for {{ $labels.zone }}.
This allows SSL stripping attacks.
runbook_url: "https://wiki.internal/invariants/hsts"
# ============================================
# WARNING - Security Headers Missing
# ============================================
- alert: SecurityHeadersMissing
expr: cloudflare_zone_security_headers_score < 0.8
for: 5m
labels:
severity: warning
component: invariant
invariant_name: security_headers
category: headers
frameworks: "OWASP,SOC2"
annotations:
summary: "Security headers score below threshold for {{ $labels.zone }}"
description: |
Zone {{ $labels.zone }} security headers score: {{ $value }}
Expected minimum: 0.8
Missing headers may include: CSP, X-Frame-Options, X-Content-Type-Options
runbook_url: "https://wiki.internal/invariants/security-headers"
# ============================================
# CRITICAL - Origin IP Exposed
# ============================================
- alert: OriginIPExposed
expr: cloudflare_origin_ip_exposed == 1
for: 1m
labels:
severity: critical
component: invariant
invariant_name: origin_hidden
category: network
frameworks: "SOC2"
annotations:
summary: "Origin IP may be exposed for {{ $labels.zone }}"
description: |
DNS or headers may be exposing the origin server IP.
Exposed via: {{ $labels.exposure_method }}
Attackers can bypass Cloudflare protection by attacking origin directly.
impact: "Origin server exposed to direct attacks"
runbook_url: "https://wiki.internal/invariants/origin-protection"
# ============================================
# WARNING - Rate Limiting Not Configured
# ============================================
- alert: RateLimitingMissing
expr: cloudflare_zone_rate_limiting_rules == 0
for: 5m
labels:
severity: warning
component: invariant
invariant_name: rate_limiting
category: protection
frameworks: "SOC2,OWASP"
annotations:
summary: "No rate limiting rules for {{ $labels.zone }}"
description: |
Zone {{ $labels.zone }} has no rate limiting rules configured.
This leaves the zone vulnerable to brute force attacks.
runbook_url: "https://wiki.internal/invariants/rate-limiting"
# ============================================
# WARNING - Authenticated Origin Pulls Disabled
# ============================================
- alert: AuthenticatedOriginPullsDisabled
expr: cloudflare_zone_authenticated_origin_pulls == 0
for: 5m
labels:
severity: warning
component: invariant
invariant_name: aop_enabled
category: authentication
frameworks: "SOC2,Zero-Trust"
annotations:
summary: "Authenticated Origin Pulls disabled for {{ $labels.zone }}"
description: |
Authenticated Origin Pulls is not enabled for {{ $labels.zone }}.
Origin cannot verify requests come from Cloudflare.
runbook_url: "https://wiki.internal/invariants/authenticated-origin-pulls"
# ============================================
# WARNING - Bot Protection Disabled
# ============================================
- alert: BotProtectionDisabled
expr: cloudflare_zone_bot_management_enabled == 0
for: 5m
labels:
severity: warning
component: invariant
invariant_name: bot_management
category: protection
annotations:
summary: "Bot management disabled for {{ $labels.zone }}"
description: |
Bot management is not enabled for {{ $labels.zone }}.
Zone is vulnerable to automated attacks and scraping.
runbook_url: "https://wiki.internal/invariants/bot-management"
# ============================================
# CRITICAL - Access Policy Violation
# ============================================
- alert: AccessPolicyViolation
expr: cloudflare_access_policy_violations > 0
for: 1m
labels:
severity: critical
component: invariant
invariant_name: access_policy
category: access_control
frameworks: "SOC2,Zero-Trust,ISO27001"
annotations:
summary: "Access policy violations detected"
description: |
{{ $value }} access policy violations detected.
Policy: {{ $labels.policy_name }}
Review access logs for unauthorized access attempts.
impact: "Potential unauthorized access"
runbook_url: "https://wiki.internal/invariants/access-control"
# ============================================
# WARNING - Browser Integrity Check Disabled
# ============================================
- alert: BrowserIntegrityCheckDisabled
expr: cloudflare_zone_browser_integrity_check == 0
for: 5m
labels:
severity: warning
component: invariant
invariant_name: browser_integrity_check
category: protection
annotations:
summary: "Browser Integrity Check disabled for {{ $labels.zone }}"
description: |
Browser Integrity Check is disabled for {{ $labels.zone }}.
This allows requests with suspicious headers.
# ============================================
# INFO - Email Obfuscation Disabled
# ============================================
- alert: EmailObfuscationDisabled
expr: cloudflare_zone_email_obfuscation == 0
for: 5m
labels:
severity: info
component: invariant
invariant_name: email_obfuscation
category: privacy
annotations:
summary: "Email obfuscation disabled for {{ $labels.zone }}"
description: |
Email obfuscation is disabled. Email addresses on pages
may be harvested by spam bots.
# ============================================
# WARNING - Development Mode Active
# ============================================
- alert: DevelopmentModeActive
expr: cloudflare_zone_development_mode == 1
for: 5m
labels:
severity: warning
component: invariant
invariant_name: development_mode
category: configuration
annotations:
summary: "Development mode active for {{ $labels.zone }}"
description: |
Development mode is active for {{ $labels.zone }}.
This bypasses Cloudflare's cache and should only be used temporarily.
Remember to disable after development is complete.
# ============================================
# CRITICAL - Invariant Check Failure
# ============================================
- alert: InvariantCheckFailed
expr: cloudflare_invariant_check_status == 0
for: 5m
labels:
severity: critical
component: invariant
category: monitoring
annotations:
summary: "Invariant checker is failing"
description: |
The invariant checker script is not running successfully.
Last success: {{ $labels.last_success }}
Error: {{ $labels.error_message }}
Security invariants are not being monitored.
runbook_url: "https://wiki.internal/invariants/checker-troubleshooting"
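
The invariant rules compare numeric gauges (for example `cloudflare_zone_ssl_mode != 1` and `cloudflare_zone_min_tls_version < 1.2`), while the Cloudflare zone settings API returns string values. The sketch below shows one possible translation from a settings dictionary to those gauges. The encoding (1 for `strict`, the TLS version as a float) is an assumption inferred from the rule comments, and `settings_to_gauges` is a hypothetical helper, not part of the exporter shown in this commit.

```python
from typing import Dict


def settings_to_gauges(settings: Dict[str, str]) -> Dict[str, float]:
    """Translate Cloudflare zone setting values into numeric gauges."""

    def on_off(setting_id: str) -> int:
        return 1 if settings.get(setting_id) == "on" else 0

    gauges: Dict[str, float] = {
        # Assumed encoding: 1 = Full (Strict), 0 = any weaker mode.
        "zone_ssl_mode": 1 if settings.get("ssl") == "strict" else 0,
        "zone_always_use_https": on_off("always_use_https"),
        "zone_browser_integrity_check": on_off("browser_check"),
        "zone_development_mode": on_off("development_mode"),
        "zone_email_obfuscation": on_off("email_obfuscation"),
    }
    # Omit the gauge when the value is absent or unparsable, so that a failed
    # lookup is not reported as "TLS below 1.2".
    try:
        gauges["zone_min_tls_version"] = float(settings["min_tls_version"])
    except (KeyError, TypeError, ValueError):
        pass
    return gauges
```

Leaving a gauge out on error, rather than writing 0, keeps the critical rules from firing on an API failure; a separate staleness or scrape-error alert is the better place to surface that condition.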

@@ -0,0 +1,257 @@
# Proofchain Alert Rules for Cloudflare Mesh Observatory
# Phase 5B - Alerts & Escalation
groups:
- name: proofchain_alerts
interval: 60s
rules:
# ============================================
# CRITICAL - Chain Integrity Failure
# ============================================
- alert: ProofchainIntegrityFailure
expr: cloudflare_proofchain_integrity_valid == 0
for: 1m
labels:
severity: critical
component: proofchain
security_incident: "true"
annotations:
summary: "CRITICAL: Proofchain integrity verification FAILED"
description: |
Proofchain {{ $labels.chain_name }} has failed integrity verification.
Last valid hash: {{ $labels.last_valid_hash }}
Expected hash: {{ $labels.expected_hash }}
Computed hash: {{ $labels.computed_hash }}
This indicates potential:
- Ledger tampering
- Receipt corruption
- Chain fork
IMMEDIATELY HALT new receipt generation until resolved.
impact: "Audit trail integrity compromised"
runbook_url: "https://wiki.internal/playbooks/proofchain-incident"
# ============================================
# CRITICAL - Receipt Hash Mismatch
# ============================================
- alert: ReceiptHashMismatch
expr: cloudflare_receipt_hash_valid == 0
for: 1m
labels:
severity: critical
component: proofchain
security_incident: "true"
annotations:
summary: "Receipt hash mismatch detected"
description: |
Receipt {{ $labels.receipt_id }} ({{ $labels.receipt_type }})
hash does not match stored value.
This receipt may have been modified after creation.
Investigate for potential tampering.
runbook_url: "https://wiki.internal/playbooks/proofchain-incident"
# ============================================
# CRITICAL - Anchor Missing
# ============================================
- alert: ProofchainAnchorMissing
expr: cloudflare_proofchain_anchor_age_hours > 24
for: 1h
labels:
severity: critical
component: proofchain
annotations:
summary: "Proofchain anchor overdue"
description: |
No proofchain anchor has been created in {{ $value | humanize }} hours.
Anchors should be created at least daily.
This weakens the audit trail's immutability guarantees.
runbook_url: "https://wiki.internal/playbooks/proofchain-maintenance"
# ============================================
# WARNING - Receipt Generation Failed
# ============================================
- alert: ReceiptGenerationFailed
expr: increase(cloudflare_receipt_generation_failures_total[1h]) > 0
for: 5m
labels:
severity: warning
component: proofchain
annotations:
summary: "Receipt generation failures detected"
description: |
{{ $value }} receipt generation failures in the last hour.
Receipt type: {{ $labels.receipt_type }}
Error: {{ $labels.error_type }}
Operations are proceeding but not being properly logged.
# ============================================
# WARNING - Chain Growth Stalled
# ============================================
- alert: ProofchainGrowthStalled
expr: increase(cloudflare_proofchain_receipts_total[6h]) == 0
for: 6h
labels:
severity: warning
component: proofchain
annotations:
summary: "No new receipts in 6 hours"
description: |
Proofchain {{ $labels.chain_name }} has not received new receipts
in 6 hours. This may indicate:
- Receipt generation failure
- System not operational
- Configuration issue
Verify receipt generation is working.
# ============================================
# WARNING - Chain Drift from Root
# ============================================
- alert: ProofchainDrift
expr: cloudflare_proofchain_drift_receipts > 100
for: 1h
labels:
severity: warning
component: proofchain
annotations:
summary: "Proofchain has {{ $value }} unanchored receipts"
description: |
Chain {{ $labels.chain_name }} has {{ $value }} receipts since
the last anchor. Consider creating a new anchor to checkpoint
the current state.
# ============================================
# INFO - Anchor Created
# ============================================
- alert: ProofchainAnchorCreated
expr: changes(cloudflare_proofchain_anchor_count[1h]) > 0
for: 0m
labels:
severity: info
component: proofchain
annotations:
summary: "New proofchain anchor created"
description: |
A new anchor has been created for chain {{ $labels.chain_name }}.
Anchor hash: {{ $labels.anchor_hash }}
Receipts anchored: {{ $labels.receipts_anchored }}
# ============================================
# CRITICAL - Frontier Corruption
# ============================================
- alert: ProofchainFrontierCorrupt
expr: cloudflare_proofchain_frontier_valid == 0
for: 1m
labels:
severity: critical
component: proofchain
annotations:
summary: "Proofchain frontier is corrupt"
description: |
The frontier (latest state) of chain {{ $labels.chain_name }}
cannot be verified. The chain may be in an inconsistent state.
Do not append new receipts until this is resolved.
runbook_url: "https://wiki.internal/playbooks/proofchain-incident"
# ============================================
# WARNING - Receipt Backlog
# ============================================
- alert: ReceiptBacklog
expr: cloudflare_receipt_queue_depth > 100
for: 10m
labels:
severity: warning
component: proofchain
annotations:
summary: "Receipt generation backlog"
description: |
{{ $value }} receipts waiting to be written.
This may indicate performance issues or blocked writes.
# ============================================
# CRITICAL - Receipt Queue Overflow
# ============================================
- alert: ReceiptQueueOverflow
expr: cloudflare_receipt_queue_depth > 1000
for: 5m
labels:
severity: critical
component: proofchain
annotations:
summary: "Receipt queue overflow imminent"
description: |
{{ $value }} receipts in queue. Queue may overflow.
Some operational events may not be recorded.
Investigate and resolve immediately.
# ============================================
# WARNING - Receipt Write Latency High
# ============================================
- alert: ReceiptWriteLatencyHigh
expr: cloudflare_receipt_write_duration_seconds > 5
for: 5m
labels:
severity: warning
component: proofchain
annotations:
summary: "High receipt write latency"
description: |
Receipt write operations taking {{ $value | humanize }}s.
This may cause backlog buildup.
Check storage performance.
# ============================================
# CRITICAL - Storage Near Capacity
# ============================================
- alert: ProofchainStorageNearFull
expr: cloudflare_proofchain_storage_used_bytes / cloudflare_proofchain_storage_total_bytes > 0.9
for: 1h
labels:
severity: critical
component: proofchain
annotations:
summary: "Proofchain storage >90% full"
description: |
Proofchain storage is {{ $value | humanizePercentage }} full.
Expand storage or archive old receipts immediately.
# ============================================
# WARNING - Cross-Ledger Verification Failed
# ============================================
- alert: CrossLedgerVerificationFailed
expr: cloudflare_proofchain_cross_verification_valid == 0
for: 5m
labels:
severity: warning
component: proofchain
annotations:
summary: "Cross-ledger verification failed"
description: |
Verification between {{ $labels.chain_a }} and {{ $labels.chain_b }}
has failed. The ledgers may have diverged.
Investigate the root cause before proceeding.
# ============================================
# INFO - Receipt Type Distribution Anomaly
# ============================================
- alert: ReceiptDistributionAnomaly
expr: |
sum(rate(cloudflare_receipts_by_type_total{type="anomaly"}[1h]))
/ sum(rate(cloudflare_receipts_by_type_total[1h])) > 0.5
for: 1h
labels:
severity: info
component: proofchain
annotations:
summary: "High proportion of anomaly receipts"
description: |
More than 50% of recent receipts are anomaly type.
This may indicate systemic issues being logged.
Review recent anomaly receipts for patterns.
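
The integrity rules above (`ProofchainIntegrityFailure`, `ReceiptHashMismatch`) assume some process that recomputes receipt hashes and compares them with stored values. The following is a minimal sketch of that idea using a simple SHA-256 hash chain. The receipt layout (`prev_hash`, `payload`, `hash`), the all-zero genesis value, and the function names are assumptions made for illustration; the project's actual receipt format is not shown in this section.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed predecessor hash for the first receipt


def receipt_hash(prev_hash: str, payload: Dict) -> str:
    """Hash the previous link together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_receipt(chain: List[Dict], payload: Dict) -> None:
    """Append a receipt that commits to the current head of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": receipt_hash(prev_hash, payload),
    })


def first_invalid_index(chain: List[Dict]) -> int:
    """Return the index of the first broken receipt, or -1 if the chain verifies."""
    prev_hash = GENESIS
    for index, receipt in enumerate(chain):
        if receipt["prev_hash"] != prev_hash:
            return index
        if receipt["hash"] != receipt_hash(prev_hash, receipt["payload"]):
            return index
        prev_hash = receipt["hash"]
    return -1
```

A checker built this way could export 1 for the integrity gauge when `first_invalid_index` returns -1 and 0 otherwise, which is the shape the `== 0` expressions expect.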

@@ -0,0 +1,210 @@
# Tunnel Alert Rules for Cloudflare Mesh Observatory
# Phase 5B - Alerts & Escalation
groups:
- name: tunnel_alerts
interval: 30s
rules:
# ============================================
# CRITICAL - Tunnel Down
# ============================================
- alert: TunnelDown
expr: cloudflare_tunnel_status == 0
for: 2m
labels:
severity: critical
component: tunnel
playbook: tunnel-rotation
annotations:
summary: "Cloudflare Tunnel {{ $labels.tunnel_name }} is DOWN"
description: |
Tunnel {{ $labels.tunnel_name }} (ID: {{ $labels.tunnel_id }}) has been
unreachable for more than 2 minutes. Services behind this tunnel are
likely unreachable.
impact: "Services behind tunnel are unreachable from the internet"
runbook_url: "https://wiki.internal/playbooks/tunnel-rotation"
# ============================================
# CRITICAL - All Tunnels Down
# ============================================
- alert: AllTunnelsDown
expr: max(cloudflare_tunnel_status) == 0
for: 1m
labels:
severity: critical
component: tunnel
playbook: tunnel-rotation
annotations:
summary: "ALL Cloudflare Tunnels are DOWN"
description: |
No healthy tunnels detected. Complete loss of tunnel connectivity.
This is a P0 incident requiring immediate attention.
impact: "Complete loss of external connectivity via tunnels"
runbook_url: "https://wiki.internal/playbooks/tunnel-rotation"
# ============================================
# WARNING - Tunnel Degraded
# ============================================
- alert: TunnelDegraded
expr: cloudflare_tunnel_connections < 2
for: 5m
labels:
severity: warning
component: tunnel
annotations:
summary: "Tunnel {{ $labels.tunnel_name }} has reduced connections"
description: |
Tunnel {{ $labels.tunnel_name }} has fewer than 2 active connections.
This may indicate network issues or cloudflared problems.
runbook_url: "https://wiki.internal/playbooks/tunnel-rotation"
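# Sketch (assumption, not part of this commit): if the exporter reports
# 0 connections for a tunnel that is down, TunnelDegraded fires alongside
# TunnelDown for the same tunnel. An Alertmanager inhibit rule along these
# lines could mute the warning while the critical alert is active. It would
# live in the Alertmanager configuration, not in this rules file.
#
#   inhibit_rules:
#     - source_matchers:
#         - alertname = TunnelDown
#       target_matchers:
#         - alertname = TunnelDegraded
#       # only inhibit when both alerts refer to the same tunnel
#       equal: ['tunnel_name']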
# ============================================
# WARNING - Tunnel Rotation Due
# ============================================
- alert: TunnelRotationDue
expr: (time() - cloudflare_tunnel_created_timestamp) > (86400 * 30)
for: 1h
labels:
severity: warning
component: tunnel
playbook: tunnel-rotation
annotations:
summary: "Tunnel {{ $labels.tunnel_name }} rotation is due"
description: |
Tunnel {{ $labels.tunnel_name }} was created more than 30 days ago.
Per security policy, tunnels should be rotated monthly.
Age: {{ $value | humanizeDuration }}
runbook_url: "https://wiki.internal/playbooks/tunnel-rotation"
# ============================================
# CRITICAL - Tunnel Rotation Overdue
# ============================================
- alert: TunnelRotationOverdue
expr: (time() - cloudflare_tunnel_created_timestamp) > (86400 * 45)
for: 1h
labels:
severity: critical
component: tunnel
playbook: tunnel-rotation
annotations:
summary: "Tunnel {{ $labels.tunnel_name }} rotation is OVERDUE"
description: |
Tunnel {{ $labels.tunnel_name }} is more than 45 days old.
This exceeds the maximum rotation interval and represents a
security policy violation.
Age: {{ $value | humanizeDuration }}
runbook_url: "https://wiki.internal/playbooks/tunnel-rotation"
# ============================================
# WARNING - Tunnel High Latency
# ============================================
- alert: TunnelHighLatency
expr: cloudflare_tunnel_latency_ms > 500
for: 5m
labels:
severity: warning
component: tunnel
annotations:
summary: "High latency on tunnel {{ $labels.tunnel_name }}"
description: |
Tunnel {{ $labels.tunnel_name }} is experiencing latency above 500ms.
Current latency: {{ $value }}ms
This may impact user experience.
# ============================================
# CRITICAL - Tunnel Very High Latency
# ============================================
- alert: TunnelVeryHighLatency
expr: cloudflare_tunnel_latency_ms > 2000
for: 2m
labels:
severity: critical
component: tunnel
annotations:
summary: "Critical latency on tunnel {{ $labels.tunnel_name }}"
description: |
Tunnel {{ $labels.tunnel_name }} latency exceeds 2000ms.
Current latency: {{ $value }}ms
Services may be timing out.
# ============================================
# WARNING - Tunnel Error Rate High
# ============================================
- alert: TunnelHighErrorRate
expr: |
rate(cloudflare_tunnel_errors_total[5m])
/ rate(cloudflare_tunnel_requests_total[5m]) > 0.05
for: 5m
labels:
severity: warning
component: tunnel
annotations:
summary: "High error rate on tunnel {{ $labels.tunnel_name }}"
description: |
Tunnel {{ $labels.tunnel_name }} error rate exceeds 5%.
Current error rate: {{ $value | humanizePercentage }}
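# Sketch (assumption, not part of this commit): when both rates are 0 the
# ratio is NaN and the comparison drops the sample, so an idle tunnel stays
# silent. At very low volume, though, a handful of errors can exceed 5%.
# A minimum-traffic guard is one way to damp that; the 1 req/s floor below
# is a hypothetical value to tune.
#
#   expr: |
#     (
#       rate(cloudflare_tunnel_errors_total[5m])
#         / rate(cloudflare_tunnel_requests_total[5m]) > 0.05
#     )
#     and rate(cloudflare_tunnel_requests_total[5m]) > 1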
# ============================================
# CRITICAL - Tunnel Error Rate Critical
# ============================================
- alert: TunnelCriticalErrorRate
expr: |
rate(cloudflare_tunnel_errors_total[5m])
/ rate(cloudflare_tunnel_requests_total[5m]) > 0.20
for: 2m
labels:
severity: critical
component: tunnel
annotations:
summary: "Critical error rate on tunnel {{ $labels.tunnel_name }}"
description: |
Tunnel {{ $labels.tunnel_name }} error rate exceeds 20%.
Current error rate: {{ $value | humanizePercentage }}
This indicates severe connectivity issues.
# ============================================
# INFO - Tunnel Configuration Changed
# ============================================
- alert: TunnelConfigChanged
expr: changes(cloudflare_tunnel_config_hash[1h]) > 0
for: 0m
labels:
severity: info
component: tunnel
annotations:
summary: "Tunnel {{ $labels.tunnel_name }} configuration changed"
description: |
The configuration for tunnel {{ $labels.tunnel_name }} has changed
in the last hour. Verify this was an authorized change.
# ============================================
# WARNING - Cloudflared Version Outdated
# ============================================
- alert: CloudflaredOutdated
expr: cloudflare_cloudflared_version_age_days > 90
for: 24h
labels:
severity: warning
component: tunnel
annotations:
summary: "cloudflared version is outdated"
description: |
The cloudflared binary is more than 90 days old.
Current version age: {{ $value }} days
Consider upgrading to the latest version to pick up security patches.
# ============================================
# WARNING - Tunnel Connection Flapping
# ============================================
- alert: TunnelConnectionFlapping
expr: changes(cloudflare_tunnel_status[10m]) > 3
for: 10m
labels:
severity: warning
component: tunnel
annotations:
summary: "Tunnel {{ $labels.tunnel_name }} is flapping"
description: |
Tunnel {{ $labels.tunnel_name }} has changed state {{ $value }} times
in the last 10 minutes. This indicates instability.
Check network connectivity and cloudflared logs.

@@ -0,0 +1,266 @@
# WAF Alert Rules for Cloudflare Mesh Observatory
# Phase 5B - Alerts & Escalation
groups:
- name: waf_alerts
interval: 30s
rules:
# ============================================
# CRITICAL - Massive Attack Detected
# ============================================
- alert: WAFMassiveAttack
expr: |
rate(cloudflare_waf_blocked_requests_total[5m]) > 1000
for: 2m
labels:
severity: critical
component: waf
playbook: waf-incident
annotations:
summary: "Massive attack detected - {{ $value | humanize }} blocks/sec"
description: |
WAF is blocking more than 1000 requests per second.
This indicates a significant attack in progress.
Consider enabling Under Attack Mode if not already active.
impact: "Potential service degradation under attack load"
current_mitigation: "WAF blocking enabled"
runbook_url: "https://wiki.internal/playbooks/waf-incident"
# ============================================
# CRITICAL - WAF Rule Bypass Detected
# ============================================
- alert: WAFRuleBypass
expr: cloudflare_waf_bypass_detected == 1
for: 1m
labels:
severity: critical
component: waf
playbook: waf-incident
security_incident: "true"
annotations:
summary: "WAF rule bypass detected for rule {{ $labels.rule_id }}"
description: |
Malicious traffic matching known attack patterns has bypassed
WAF rule {{ $labels.rule_id }}.
Attack type: {{ $labels.attack_type }}
Bypassed requests: {{ $labels.bypass_count }}
Review and tighten rule immediately.
runbook_url: "https://wiki.internal/playbooks/waf-incident"
# ============================================
# WARNING - Attack Spike
# ============================================
- alert: WAFAttackSpike
expr: |
rate(cloudflare_waf_blocked_requests_total[5m])
> 5 * avg_over_time(rate(cloudflare_waf_blocked_requests_total[5m])[24h:5m])
for: 5m
labels:
severity: warning
component: waf
annotations:
summary: "WAF block rate 5x above normal"
description: |
WAF is blocking more than 5x the 24-hour average block rate.
Current rate: {{ $value | humanize }}/s
This may indicate an attack or new attack pattern.
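# Sketch (assumption, not part of this commit): the subquery above
# re-evaluates rate() at every 5m step across 24h each time the rule runs.
# A recording rule could precompute the 5m rate so the alert reads stored
# samples instead. The recorded series name below is hypothetical, and the
# recording rule would need to sit before the alert in the same group.
#
#   - record: cloudflare_waf_blocked_requests:rate5m
#     expr: rate(cloudflare_waf_blocked_requests_total[5m])
#
#   - alert: WAFAttackSpike
#     expr: |
#       cloudflare_waf_blocked_requests:rate5m
#         > 5 * avg_over_time(cloudflare_waf_blocked_requests:rate5m[24h])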
# ============================================
# WARNING - SQL Injection Attempts
# ============================================
- alert: WAFSQLiAttack
expr: rate(cloudflare_waf_sqli_blocks_total[5m]) > 10
for: 2m
labels:
severity: warning
component: waf
attack_type: sqli
annotations:
summary: "SQL injection attack detected"
description: |
WAF is blocking SQL injection attempts at {{ $value | humanize }}/s.
Source IPs may need to be blocked at firewall level.
# ============================================
# WARNING - XSS Attempts
# ============================================
- alert: WAFXSSAttack
expr: rate(cloudflare_waf_xss_blocks_total[5m]) > 10
for: 2m
labels:
severity: warning
component: waf
attack_type: xss
annotations:
summary: "XSS attack detected"
description: |
WAF is blocking cross-site scripting attempts at {{ $value | humanize }}/s.
Review application input validation.
# ============================================
# WARNING - Bot Attack
# ============================================
- alert: WAFBotAttack
expr: rate(cloudflare_waf_bot_blocks_total[5m]) > 100
for: 5m
labels:
severity: warning
component: waf
attack_type: bot
annotations:
summary: "High bot traffic detected"
description: |
WAF is blocking bot traffic at {{ $value | humanize }}/s.
Consider enabling Bot Fight Mode or stricter challenges.
# ============================================
# CRITICAL - Rate Limit Exhaustion
# ============================================
- alert: WAFRateLimitExhausted
expr: cloudflare_waf_rate_limit_triggered == 1
for: 1m
labels:
severity: critical
component: waf
annotations:
summary: "Rate limit triggered for {{ $labels.rule_name }}"
description: |
Rate limiting rule {{ $labels.rule_name }} has been triggered.
Source: {{ $labels.source_ip }}
Requests blocked: {{ $labels.blocked_count }}
Legitimate users may be affected.
# ============================================
# WARNING - WAF Rule Disabled
# ============================================
- alert: WAFRuleDisabled
expr: cloudflare_waf_rule_enabled == 0
for: 5m
labels:
severity: warning
component: waf
annotations:
summary: "WAF rule {{ $labels.rule_id }} is disabled"
description: |
WAF rule {{ $labels.rule_id }} ({{ $labels.rule_name }}) is currently disabled.
Verify this is intentional and not a misconfiguration.
# ============================================
# WARNING - WAF Mode Changed
# ============================================
- alert: WAFModeChanged
expr: changes(cloudflare_waf_mode[1h]) > 0
for: 0m
labels:
severity: warning
component: waf
annotations:
summary: "WAF mode changed for zone {{ $labels.zone }}"
description: |
WAF operation mode has changed in the last hour.
New mode: {{ $labels.mode }}
Verify this was an authorized change.
# ============================================
# INFO - Under Attack Mode Active
# ============================================
- alert: UnderAttackModeActive
expr: cloudflare_zone_under_attack == 1
for: 0m
labels:
severity: info
component: waf
annotations:
summary: "Under Attack Mode is ACTIVE for {{ $labels.zone }}"
description: |
Under Attack Mode is currently enabled for zone {{ $labels.zone }}.
This adds a JavaScript challenge to all visitors.
Remember to disable it once the attack subsides.
# ============================================
# WARNING - Under Attack Mode Extended
# ============================================
- alert: UnderAttackModeExtended
expr: cloudflare_zone_under_attack == 1
for: 2h
labels:
severity: warning
component: waf
annotations:
summary: "Under Attack Mode active for 2+ hours on {{ $labels.zone }}"
description: |
Under Attack Mode has been active for {{ $labels.zone }} for more
than 2 hours. Verify it's still needed as it impacts user experience.
# ============================================
# CRITICAL - WAF Completely Disabled
# ============================================
- alert: WAFDisabled
expr: cloudflare_waf_enabled == 0
for: 5m
labels:
severity: critical
component: waf
annotations:
summary: "WAF is DISABLED for zone {{ $labels.zone }}"
description: |
The Web Application Firewall is completely disabled for {{ $labels.zone }}.
This leaves the zone unprotected against application-layer attacks.
Enable immediately unless there's a documented exception.
# ============================================
# INFO - Low WAF Efficacy
# ============================================
- alert: WAFLowEfficacy
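# Compare rates over a window; dividing the raw counters would give the
# ratio since the last counter reset rather than recent behaviour.
expr: |
rate(cloudflare_waf_blocked_requests_total[1h])
/ rate(cloudflare_waf_analyzed_requests_total[1h]) < 0.001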
for: 1h
labels:
severity: info
component: waf
annotations:
summary: "Low WAF block rate for {{ $labels.zone }}"
description: |
WAF is blocking fewer than 0.1% of analyzed requests.
This might indicate rules are too permissive or
the zone is not receiving attack traffic.
# ============================================
# WARNING - Firewall Rule Missing
# ============================================
- alert: FirewallRuleMissing
expr: cloudflare_firewall_critical_rule_exists == 0
for: 5m
labels:
severity: warning
component: waf
annotations:
summary: "Critical firewall rule missing: {{ $labels.rule_name }}"
description: |
Expected firewall rule {{ $labels.rule_name }} is not configured.
This rule is marked as critical in the WAF baseline.
# ============================================
# WARNING - High False Positive Rate
# ============================================
- alert: WAFHighFalsePositives
expr: |
rate(cloudflare_waf_false_positives_total[1h])
/ rate(cloudflare_waf_blocked_requests_total[1h]) > 0.1
for: 1h
labels:
severity: warning
component: waf
annotations:
summary: "High WAF false positive rate"
description: |
WAF false positive rate exceeds 10%.
Current rate: {{ $value | humanizePercentage }}
Review and tune rules to reduce legitimate traffic blocking.