# Mesh Observatory

Prometheus + Grafana monitoring stack for Cloudflare infrastructure state.

## Components

| Component | Port | Description |
|-----------|------|-------------|
| Prometheus | 9090 | Metrics collection and storage |
| Grafana | 3000 | Visualization dashboards |
| Metrics Exporter | 9100 | Custom Cloudflare metrics |

## Quick Start

### 1. Configure Environment

```bash
cp .env.example .env
# Edit .env with your credentials
```

Required environment variables:

```
CLOUDFLARE_API_TOKEN=<your-token>
CLOUDFLARE_ZONE_ID=<your-zone-id>
CLOUDFLARE_ACCOUNT_ID=<your-account-id>
GRAFANA_PASSWORD=<secure-password>
```
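
A missing or placeholder value in `.env` usually only surfaces once the exporter starts failing. The following is a minimal pre-flight sketch, not part of this repository: the helper names `parse_env` and `missing_vars` are hypothetical, and the parser deliberately handles only plain `KEY=value` lines and `#` comments (no quoting or variable expansion).

```python
# Hypothetical pre-flight check for the variables listed above.
REQUIRED = (
    "CLOUDFLARE_API_TOKEN",
    "CLOUDFLARE_ZONE_ID",
    "CLOUDFLARE_ACCOUNT_ID",
    "GRAFANA_PASSWORD",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments.

    Deliberately simplified: no quoting, escaping or variable expansion.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return required names that are absent, empty, or still a <placeholder>."""
    return [
        name
        for name in REQUIRED
        if not values.get(name) or values[name].startswith("<")
    ]
```

Calling `missing_vars(parse_env(open(".env").read()))` returns an empty list once every required variable has a real value.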

### 2. Start Stack

```bash
docker-compose up -d
```

### 3. Access Dashboards

- Grafana: http://localhost:3000 (admin / $GRAFANA_PASSWORD)
- Prometheus: http://localhost:9090
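
To confirm the three components from the Components table are reachable after `docker-compose up -d`, a small probe can hit each service's health endpoint. This sketch assumes the default ports above; `/-/healthy` and `/api/health` are the standard Prometheus and Grafana health endpoints, while `/metrics` on port 9100 is an assumption about how the custom exporter is served. The `is_up` and `check_stack` names are hypothetical.

```python
import urllib.request

# Default ports from the Components table. /-/healthy (Prometheus) and
# /api/health (Grafana) are standard endpoints; /metrics on the exporter
# is an assumption.
ENDPOINTS = {
    "prometheus": "http://localhost:9090/-/healthy",
    "grafana": "http://localhost:3000/api/health",
    "exporter": "http://localhost:9100/metrics",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and timeouts
        # are all OSError subclasses.
        return False


def check_stack(endpoints: dict[str, str] = ENDPOINTS) -> dict[str, bool]:
    """Probe every endpoint and map component name to reachability."""
    return {name: is_up(url) for name, url in endpoints.items()}
```

`check_stack()` returns something like `{"prometheus": True, "grafana": True, "exporter": False}`, which makes a half-started stack easy to spot.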

## Dashboards

| Dashboard | UID | Description |
|-----------|-----|-------------|
| Cloudflare Mesh Overview | cf-overview | Main command center |
| DNS Health | cf-dns | DNS records, DNSSEC, types |
| Tunnel Status | cf-tunnel | Tunnel health, connections |
| Invariants & Compliance | cf-invariants | Invariant pass/fail, anomalies |
| Security Settings | cf-security | SSL, TLS, Access apps |
| ProofChain & Anchors | cf-proofchain | Merkle roots, snapshot freshness |
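
The UIDs in the table are what Grafana's HTTP API uses to address a dashboard, via `GET /api/dashboards/uid/<uid>`. A sketch for scripted checks (for example, confirming that provisioning worked), assuming basic auth with the `admin` user and `GRAFANA_PASSWORD` from Quick Start; `dashboard_request` and `fetch_dashboard_title` are hypothetical helpers.

```python
import base64
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # from the Quick Start section

# UIDs from the Dashboards table.
DASHBOARD_UIDS = (
    "cf-overview", "cf-dns", "cf-tunnel",
    "cf-invariants", "cf-security", "cf-proofchain",
)


def dashboard_request(uid: str, password: str, user: str = "admin",
                      base_url: str = GRAFANA_URL) -> urllib.request.Request:
    """Build a basic-auth GET for Grafana's /api/dashboards/uid/<uid> endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(f"{base_url}/api/dashboards/uid/{uid}")
    req.add_header("Authorization", f"Basic {token}")
    return req


def fetch_dashboard_title(uid: str, password: str) -> str:
    """Fetch one provisioned dashboard and return its title (needs a running Grafana)."""
    with urllib.request.urlopen(dashboard_request(uid, password), timeout=5) as resp:
        payload = json.load(resp)
    return payload["dashboard"]["title"]
```

Looping `fetch_dashboard_title` over `DASHBOARD_UIDS` raises an `HTTPError` (404) for any dashboard that was not provisioned.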

## Metrics Reference

### DNS Metrics

- `cloudflare_dns_records_total` - Total DNS records
- `cloudflare_dns_records_proxied` - Proxied records count
- `cloudflare_dns_records_unproxied` - DNS-only records count
- `cloudflare_dns_records_by_type{type="A|AAAA|CNAME|..."}` - Records by type
- `cloudflare_dnssec_enabled` - DNSSEC status (0/1)
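
How these DNS gauges relate to each other can be shown with a small derivation from a list of records. This is an illustrative sketch, not the logic in `metrics-exporter.py`: it assumes Cloudflare API record objects with `type` and `proxied` fields, treats every non-proxied record as DNS-only, and counts DNSSEC as enabled only when the zone's DNSSEC status is `"active"`.

```python
from collections import Counter


def dns_metrics(records: list[dict], dnssec_status: str) -> dict[str, object]:
    """Derive the DNS gauges from Cloudflare DNS record objects.

    Assumes each record carries "type" and "proxied" fields and that
    DNSSEC counts as enabled only when its status is "active".
    """
    proxied = sum(1 for r in records if r.get("proxied"))
    return {
        "cloudflare_dns_records_total": len(records),
        "cloudflare_dns_records_proxied": proxied,
        # Under this assumption, proxied + unproxied always equals total.
        "cloudflare_dns_records_unproxied": len(records) - proxied,
        # One labelled sample per record type, e.g. {"A": 2, "MX": 1}.
        "cloudflare_dns_records_by_type": dict(Counter(r["type"] for r in records)),
        "cloudflare_dnssec_enabled": int(dnssec_status == "active"),
    }
```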

### Tunnel Metrics

- `cloudflare_tunnels_total` - Total active tunnels
- `cloudflare_tunnels_healthy` - Tunnels with active connections
- `cloudflare_tunnels_unhealthy` - Tunnels without connections
- `cloudflare_tunnel_connections_total` - Total tunnel connections
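
The healthy/unhealthy split above depends only on whether a tunnel has connections. A sketch of that classification, assuming Cloudflare tunnel objects that carry a `connections` list and a `deleted_at` field that is null for tunnels that still exist; `tunnel_metrics` is a hypothetical name.

```python
def tunnel_metrics(tunnels: list[dict]) -> dict[str, int]:
    """Classify tunnels by whether they currently have any connections.

    Assumes tunnel objects with a "connections" list and a "deleted_at"
    field that is null/absent for tunnels that have not been deleted.
    """
    active = [t for t in tunnels if not t.get("deleted_at")]
    conns = [len(t.get("connections") or []) for t in active]
    healthy = sum(1 for n in conns if n > 0)
    return {
        "cloudflare_tunnels_total": len(active),
        "cloudflare_tunnels_healthy": healthy,
        "cloudflare_tunnels_unhealthy": len(active) - healthy,
        "cloudflare_tunnel_connections_total": sum(conns),
    }
```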

### Zone Settings

- `cloudflare_zone_ssl_strict` - SSL mode is strict (0/1)
- `cloudflare_zone_tls_version_secure` - TLS 1.2+ enforced (0/1)
- `cloudflare_zone_always_https` - HTTPS redirect enabled (0/1)
- `cloudflare_zone_browser_check` - Browser integrity check (0/1)
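
Each zone flag collapses one Cloudflare zone setting into 0 or 1. A sketch of that mapping, assuming the `{"id": ..., "value": ...}` items returned by the zone settings API with the setting IDs `ssl`, `min_tls_version`, `always_use_https` and `browser_check`; the exact rules in the real exporter may differ.

```python
def zone_setting_flags(settings: list[dict]) -> dict[str, int]:
    """Turn Cloudflare zone settings into the 0/1 gauges listed above.

    Assumes {"id": ..., "value": ...} items; a missing setting yields 0.
    """
    values = {s["id"]: s.get("value") for s in settings}
    tls = values.get("min_tls_version") or "1.0"
    return {
        "cloudflare_zone_ssl_strict": int(values.get("ssl") == "strict"),
        # Compare as (major, minor) integer tuples so "1.3" >= "1.2" is numeric.
        "cloudflare_zone_tls_version_secure": int(
            tuple(int(part) for part in tls.split(".")) >= (1, 2)
        ),
        "cloudflare_zone_always_https": int(values.get("always_use_https") == "on"),
        "cloudflare_zone_browser_check": int(values.get("browser_check") == "on"),
    }
```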

### Access Metrics

- `cloudflare_access_apps_total` - Total Access applications
- `cloudflare_access_apps_by_type{type="..."}` - Apps by type
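
Labelled gauges such as `cloudflare_access_apps_by_type` appear as one sample per label value in the text the exporter serves. The sketch below renders both Access metrics in the Prometheus text exposition format; it assumes each Access application object has a `type` field, and it does not escape quotes or backslashes in label values, which a real exporter (or the `prometheus_client` library) would handle.

```python
from collections import Counter


def access_exposition(apps: list[dict]) -> str:
    """Render the Access gauges in Prometheus text exposition format.

    Assumes each Access application object has a "type" field
    (e.g. "self_hosted", "saas"). Label values are not escaped.
    """
    by_type = Counter(app.get("type", "unknown") for app in apps)
    lines = [
        "# TYPE cloudflare_access_apps_total gauge",
        f"cloudflare_access_apps_total {len(apps)}",
        "# TYPE cloudflare_access_apps_by_type gauge",
    ]
    # One sample per label value, sorted for stable output.
    for app_type, count in sorted(by_type.items()):
        lines.append(f'cloudflare_access_apps_by_type{{type="{app_type}"}} {count}')
    return "\n".join(lines) + "\n"
```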

### Invariant Metrics

- `cloudflare_invariants_total` - Total invariant checks
- `cloudflare_invariants_passed` - Passing invariants
- `cloudflare_invariants_failed` - Failing invariants
- `cloudflare_invariants_pass_rate` - Percentage of invariants passing (0-100)
- `cloudflare_invariant_report_age_seconds` - Age of the latest invariant report, in seconds
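
The pass rate and report age are derived values. A sketch of the arithmetic, assuming one boolean result per invariant and a timezone-aware report timestamp; reporting 0% for an empty report is a choice made here, not documented behavior, and `invariant_metrics` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional


def invariant_metrics(results: list[bool], generated_at: datetime,
                      now: Optional[datetime] = None) -> dict[str, float]:
    """Summarise invariant results (True = pass) from one report.

    pass_rate is a percentage (0-100); an empty report yields 0.0.
    """
    now = now or datetime.now(timezone.utc)
    total = len(results)
    passed = sum(results)
    return {
        "cloudflare_invariants_total": total,
        "cloudflare_invariants_passed": passed,
        "cloudflare_invariants_failed": total - passed,
        "cloudflare_invariants_pass_rate": 100.0 * passed / total if total else 0.0,
        "cloudflare_invariant_report_age_seconds": (now - generated_at).total_seconds(),
    }
```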

### Snapshot Metrics

- `cloudflare_snapshot_age_seconds` - Seconds since last snapshot
- `cloudflare_snapshot_merkle_root_set` - Merkle root present (0/1)
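
Snapshot freshness can be read straight from the filesystem. In this sketch the age comes from the snapshot file's modification time, and `merkle_root` is a hypothetical key name for the root hash; the real snapshot schema written by `state-reconciler.py` may differ.

```python
import json
import time
from pathlib import Path
from typing import Optional


def snapshot_metrics(path: Path, now: Optional[float] = None) -> dict[str, float]:
    """Compute snapshot freshness and Merkle-root presence for one file.

    Age uses the file's mtime; "merkle_root" is a hypothetical field name.
    """
    now = time.time() if now is None else now
    data = json.loads(path.read_text())
    root = data.get("merkle_root") if isinstance(data, dict) else None
    return {
        "cloudflare_snapshot_age_seconds": now - path.stat().st_mtime,
        "cloudflare_snapshot_merkle_root_set": int(bool(root)),
    }
```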

### Anomaly Metrics

- `cloudflare_anomalies_total` - Total anomaly receipts
- `cloudflare_anomalies_last_24h` - Anomaly receipts from the last 24 hours
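
`cloudflare_anomalies_last_24h` is a trailing-window count. A sketch, assuming each anomaly receipt has a timezone-aware timestamp; `anomaly_metrics` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def anomaly_metrics(timestamps: Iterable[datetime],
                    now: Optional[datetime] = None) -> dict[str, int]:
    """Count anomaly receipts overall and within the trailing 24 hours.

    Expects timezone-aware receipt timestamps.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=24)
    stamps = list(timestamps)
    return {
        "cloudflare_anomalies_total": len(stamps),
        "cloudflare_anomalies_last_24h": sum(1 for ts in stamps if ts >= cutoff),
    }
```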

## Drift Visualizer

Standalone tool that compares state sources, such as a Cloudflare state snapshot and the DNS manifest, and reports the drift between them.

### Usage

```bash
python3 drift-visualizer.py \
  --snapshot ../snapshots/cloudflare-latest.json \
  --manifest ../cloudflare_dns_manifest.md \
  --output-dir ../reports
```

### Output

- `drift-report-<timestamp>.json` - Machine-readable diff
- `drift-report-<timestamp>.html` - Visual HTML report
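
The JSON report is described only as a machine-readable diff. One plausible shape for such a diff, shown here as a sketch rather than the tool's actual format, splits records into those missing from the snapshot, those not in the manifest, and those whose content changed; the `"TYPE name"` keys are a hypothetical normalization of both sources.

```python
def diff_records(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list]:
    """Compare expected (manifest) records against actual (snapshot) records.

    Keys are hypothetical "TYPE name" strings; values are record content.
    """
    missing = sorted(expected.keys() - actual.keys())     # in manifest only
    unexpected = sorted(actual.keys() - expected.keys())  # in snapshot only
    changed = [
        {"record": key, "expected": expected[key], "actual": actual[key]}
        for key in sorted(expected.keys() & actual.keys())
        if expected[key] != actual[key]
    ]
    return {"missing": missing, "unexpected": unexpected, "changed": changed}
```

An empty result in all three lists means the two sources agree; `json.dumps(diff_records(...), indent=2)` would produce a report file in this shape.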

## Directory Structure

```
observatory/
├── docker-compose.yml        # Stack definition
├── Dockerfile.exporter       # Metrics exporter container
├── prometheus.yml            # Prometheus config
├── metrics-exporter.py       # Custom exporter
├── drift-visualizer.py       # Drift analysis tool
├── datasources/              # Grafana datasource provisioning
│   └── prometheus.yml
├── dashboards/               # Grafana dashboard provisioning
│   ├── dashboards.yml
│   ├── cloudflare-overview.json
│   ├── dns-health.json
│   ├── tunnel-status.json
│   ├── invariants.json
│   ├── security-settings.json
│   └── proofchain.json
└── rules/                    # Prometheus alerting rules (optional)
```

## Integration with CI/CD

The metrics exporter reads from:

- `../snapshots/` - State snapshots from `state-reconciler.py`
- `../anomalies/` - Anomaly receipts from `invariant-checker.py`

Ensure these directories are populated by the GitLab CI pipeline or systemd services.
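
When the exporter runs outside the CI pipeline, it helps to know which snapshot it will pick up. A sketch of one way to select the newest snapshot and count receipts, assuming both directories hold `*.json` files; the helper names are hypothetical, and an absent directory is treated as empty rather than as an error.

```python
from pathlib import Path
from typing import Optional


def latest_file(directory: Path, pattern: str = "*.json") -> Optional[Path]:
    """Return the most recently modified matching file, or None if there is none."""
    if not directory.is_dir():
        return None
    candidates = [p for p in directory.glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def count_receipts(directory: Path, pattern: str = "*.json") -> int:
    """Count anomaly receipt files; a missing directory counts as zero."""
    if not directory.is_dir():
        return 0
    return sum(1 for p in directory.glob(pattern) if p.is_file())
```

For example, `latest_file(Path("../snapshots"))` returning `None` is a quick signal that the reconciler has not run yet.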
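
## Alerting (Optional)

Create alerting rules in `rules/alerts.yml`. Prometheus only evaluates rule files listed under `rule_files` in `prometheus.yml`, so make sure that file references `rules/alerts.yml` (for example through a `rules/*.yml` glob) and that the `rules/` directory is visible inside the Prometheus container: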

```yaml
groups:
  - name: cloudflare
    rules:
      - alert: InvariantFailure
        expr: cloudflare_invariants_failed > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Cloudflare invariant check failing"

      - alert: TunnelUnhealthy
        expr: cloudflare_tunnels_unhealthy > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cloudflare tunnel has no connections"

      - alert: SnapshotStale
        expr: cloudflare_snapshot_age_seconds > 7200
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cloudflare state snapshot older than 2 hours"
```
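
Once Prometheus loads these rules, an alert is `pending` while its expression is true but the `for:` duration has not yet elapsed, and `firing` afterwards. Prometheus lists both states at `GET /api/v1/alerts`; the sketch below filters that response down to firing alerts. The helper names are hypothetical, and `fetch_alerts` needs a running Prometheus on the default port.

```python
import json
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"


def firing_alerts(payload: dict) -> list[tuple[str, str]]:
    """Extract (alertname, severity) pairs for alerts in the "firing" state.

    payload is the decoded JSON body of Prometheus's GET /api/v1/alerts.
    """
    alerts = payload.get("data", {}).get("alerts", [])
    return [
        (a["labels"].get("alertname", "?"), a["labels"].get("severity", "none"))
        for a in alerts
        if a.get("state") == "firing"
    ]


def fetch_alerts(base_url: str = PROMETHEUS_URL) -> dict:
    """Query a running Prometheus for its active (pending or firing) alerts."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alerts", timeout=5) as resp:
        return json.load(resp)
```

`firing_alerts(fetch_alerts())` returns an empty list while every alert is inactive or still pending.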