vm-cloudflare/observatory/README.md
Vault Sovereign 37a867c485 Initial commit: Cloudflare infrastructure with WAF Intelligence
- Complete Cloudflare Terraform configuration (DNS, WAF, tunnels, access)
- WAF Intelligence MCP server with threat analysis and ML classification
- GitOps automation with PR workflows and drift detection
- Observatory monitoring stack with Prometheus/Grafana
- IDE operator rules for governed development
- Security playbooks and compliance frameworks
- Autonomous remediation and state reconciliation
2025-12-16 18:31:53 +00:00


# Mesh Observatory
Prometheus + Grafana monitoring stack for Cloudflare infrastructure state.
## Components
| Component | Port | Description |
|-----------|------|-------------|
| Prometheus | 9090 | Metrics collection and storage |
| Grafana | 3000 | Visualization dashboards |
| Metrics Exporter | 9100 | Custom Cloudflare metrics |
## Quick Start
### 1. Configure Environment
```bash
cp .env.example .env
# Edit .env with your credentials
```
Required environment variables:
```
CLOUDFLARE_API_TOKEN=<your-token>
CLOUDFLARE_ZONE_ID=<your-zone-id>
CLOUDFLARE_ACCOUNT_ID=<your-account-id>
GRAFANA_PASSWORD=<secure-password>
```
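
Before starting the stack, it can help to confirm that `.env` no longer contains the template placeholders. The following is a minimal sketch, not part of the repository: `parse_env` and `missing_vars` are hypothetical helpers, and the parser assumes plain `KEY=VALUE` lines without quoting or `export` prefixes.

```python
# Variable names taken from the "Required environment variables" list.
REQUIRED_VARS = (
    "CLOUDFLARE_API_TOKEN",
    "CLOUDFLARE_ZONE_ID",
    "CLOUDFLARE_ACCOUNT_ID",
    "GRAFANA_PASSWORD",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(values):
    """Return required names that are absent, empty, or still a <placeholder>."""
    missing = []
    for name in REQUIRED_VARS:
        value = values.get(name, "")
        is_placeholder = value.startswith("<") and value.endswith(">")
        if not value or is_placeholder:
            missing.append(name)
    return missing
```

For example, `missing_vars(parse_env(open(".env").read()))` returns an empty list once every variable has a real value.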
### 2. Start Stack
```bash
docker-compose up -d
```
### 3. Access Dashboards
- Grafana: http://localhost:3000 (admin / $GRAFANA_PASSWORD)
- Prometheus: http://localhost:9090
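
To confirm that both services came up, one option is to probe their health endpoints. This sketch assumes the default ports listed above and uses Prometheus's `/-/healthy` and Grafana's `/api/health` endpoints; `is_up` and `check_stack` are hypothetical helper names, not part of this repository.

```python
import http.client
import urllib.request

# Health endpoints on the default ports from the Components table.
ENDPOINTS = {
    "prometheus": "http://localhost:9090/-/healthy",
    "grafana": "http://localhost:3000/api/health",
}


def is_up(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


def check_stack(probe=is_up):
    """Probe every endpoint and return a name -> bool mapping."""
    return {name: probe(url) for name, url in ENDPOINTS.items()}
```

Calling `check_stack()` after `docker-compose up -d` returns a mapping such as `{"prometheus": True, "grafana": True}` when both containers respond. The `probe` parameter exists only so the check can be exercised without a running stack.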
## Dashboards
| Dashboard | UID | Description |
|-----------|-----|-------------|
| Cloudflare Mesh Overview | cf-overview | Main command center |
| DNS Health | cf-dns | DNS records, DNSSEC, types |
| Tunnel Status | cf-tunnel | Tunnel health, connections |
| Invariants & Compliance | cf-invariants | Invariant pass/fail, anomalies |
| Security Settings | cf-security | SSL, TLS, Access apps |
| ProofChain & Anchors | cf-proofchain | Merkle roots, snapshot freshness |
## Metrics Reference
### DNS Metrics
- `cloudflare_dns_records_total` - Total DNS records
- `cloudflare_dns_records_proxied` - Proxied records count
- `cloudflare_dns_records_unproxied` - DNS-only records count
- `cloudflare_dns_records_by_type{type="A|AAAA|CNAME|..."}` - Records by type
- `cloudflare_dnssec_enabled` - DNSSEC status (0/1)
### Tunnel Metrics
- `cloudflare_tunnels_total` - Total active tunnels
- `cloudflare_tunnels_healthy` - Tunnels with active connections
- `cloudflare_tunnels_unhealthy` - Tunnels without connections
- `cloudflare_tunnel_connections_total` - Total tunnel connections
### Zone Settings
- `cloudflare_zone_ssl_strict` - SSL mode is strict (0/1)
- `cloudflare_zone_tls_version_secure` - TLS 1.2+ enforced (0/1)
- `cloudflare_zone_always_https` - HTTPS redirect enabled (0/1)
- `cloudflare_zone_browser_check` - Browser integrity check (0/1)
### Access Metrics
- `cloudflare_access_apps_total` - Total Access applications
- `cloudflare_access_apps_by_type{type="..."}` - Apps by type
### Invariant Metrics
- `cloudflare_invariants_total` - Total invariant checks
- `cloudflare_invariants_passed` - Passing invariants
- `cloudflare_invariants_failed` - Failing invariants
- `cloudflare_invariants_pass_rate` - Pass percentage
- `cloudflare_invariant_report_age_seconds` - Report freshness
### Snapshot Metrics
- `cloudflare_snapshot_age_seconds` - Seconds since last snapshot
- `cloudflare_snapshot_merkle_root_set` - Merkle root present (0/1)
### Anomaly Metrics
- `cloudflare_anomalies_total` - Total anomaly receipts
- `cloudflare_anomalies_last_24h` - Recent anomalies
## Drift Visualizer
Standalone tool that compares a state snapshot against the DNS manifest and reports the differences.
### Usage
```bash
python3 drift-visualizer.py \
  --snapshot ../snapshots/cloudflare-latest.json \
  --manifest ../cloudflare_dns_manifest.md \
  --output-dir ../reports
```
### Output
- `drift-report-<timestamp>.json` - Machine-readable diff
- `drift-report-<timestamp>.html` - Visual HTML report
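
The core of a drift comparison can be sketched as a set difference over record keys. The snippet below is an assumption-heavy illustration rather than the logic in `drift-visualizer.py`: the record shape (`type`, `name`, `content`), the function name `diff_records`, and the result keys are all hypothetical, and the real snapshot and manifest formats may differ.

```python
def diff_records(snapshot, manifest):
    """Compare live records (snapshot) with declared records (manifest).

    Both inputs are iterables of dicts with 'type', 'name', and 'content'
    keys (an assumed shape). Records are keyed by (type, name), so several
    records sharing a type and name would collide in this simplified version.
    """
    def index(records):
        return {(r["type"], r["name"]): r["content"] for r in records}

    live = index(snapshot)
    declared = index(manifest)
    shared = live.keys() & declared.keys()
    return {
        "missing_from_live": sorted(declared.keys() - live.keys()),
        "undeclared_in_live": sorted(live.keys() - declared.keys()),
        "content_mismatch": sorted(k for k in shared if live[k] != declared[k]),
    }
```

Each result list holds `(type, name)` tuples, which makes it straightforward to serialize into a machine-readable report.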
## Directory Structure
```
observatory/
├── docker-compose.yml          # Stack definition
├── Dockerfile.exporter         # Metrics exporter container
├── prometheus.yml              # Prometheus config
├── metrics-exporter.py         # Custom exporter
├── drift-visualizer.py         # Drift analysis tool
├── datasources/                # Grafana datasource provisioning
│   └── prometheus.yml
├── dashboards/                 # Grafana dashboard provisioning
│   ├── dashboards.yml
│   ├── cloudflare-overview.json
│   ├── dns-health.json
│   ├── tunnel-status.json
│   ├── invariants.json
│   ├── security-settings.json
│   └── proofchain.json
└── rules/                      # Prometheus alerting rules (optional)
```
## Integration with CI/CD
The metrics exporter reads from:

- `../snapshots/` - State snapshots from state-reconciler.py
- `../anomalies/` - Anomaly receipts from invariant-checker.py

Ensure these directories are populated by the GitLab CI pipeline or systemd services.
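
A quick way to verify that the pipeline is keeping `../snapshots/` populated is to check the age of the newest file there. This is a minimal sketch under stated assumptions: `newest_file_age_seconds` is a hypothetical helper, snapshots are assumed to be `*.json` files, and file modification time is used as the freshness signal, which may differ from how the exporter derives `cloudflare_snapshot_age_seconds`.

```python
import time
from pathlib import Path


def newest_file_age_seconds(directory, pattern="*.json", now=None):
    """Return seconds since the newest matching file was modified.

    Returns None when the directory holds no matching files, which
    usually means the pipeline has not produced a snapshot yet.
    """
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    if not files:
        return None
    newest = max(p.stat().st_mtime for p in files)
    now = time.time() if now is None else now
    return max(0.0, now - newest)
```

For example, `newest_file_age_seconds("../snapshots")` returning `None` or a value far above the pipeline interval suggests the CI job or systemd service is not running.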
## Alerting (Optional)
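Create alerting rules in `rules/alerts.yml`, and make sure `prometheus.yml` lists that file under `rule_files` so Prometheus loads it. Example rules: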
```yaml
groups:
  - name: cloudflare
    rules:
      - alert: InvariantFailure
        expr: cloudflare_invariants_failed > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Cloudflare invariant check failing"
      - alert: TunnelUnhealthy
        expr: cloudflare_tunnels_unhealthy > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cloudflare tunnel has no connections"
      - alert: SnapshotStale
        expr: cloudflare_snapshot_age_seconds > 7200
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cloudflare state snapshot older than 2 hours"
```
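
Once rules are loaded, the Prometheus HTTP API exposes active alerts at `/api/v1/alerts`. The sketch below lists the ones that are firing. It assumes Prometheus on the default `localhost:9090`; `fetch_alerts` and `firing_alerts` are hypothetical helper names, not part of this repository.

```python
import json
import urllib.request


def fetch_alerts(base_url="http://localhost:9090", timeout=5.0):
    """Fetch the raw JSON payload from the Prometheus alerts endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alerts", timeout=timeout) as resp:
        return json.load(resp)


def firing_alerts(payload):
    """Return sorted (alertname, severity) pairs for alerts in the firing state."""
    alerts = payload.get("data", {}).get("alerts", [])
    return sorted(
        (a.get("labels", {}).get("alertname", "?"), a.get("labels", {}).get("severity", ""))
        for a in alerts
        if a.get("state") == "firing"
    )
```

Running `firing_alerts(fetch_alerts())` against a live stack returns an empty list when nothing is firing. Alerts still inside their `for:` window are reported by Prometheus as `pending` and are skipped here.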