Initial commit: Cloudflare infrastructure with WAF Intelligence

- Complete Cloudflare Terraform configuration (DNS, WAF, tunnels, access)
- WAF Intelligence MCP server with threat analysis and ML classification
- GitOps automation with PR workflows and drift detection
- Observatory monitoring stack with Prometheus/Grafana
- IDE operator rules for governed development
- Security playbooks and compliance frameworks
- Autonomous remediation and state reconciliation
171
observatory/README.md
Normal file
@@ -0,0 +1,171 @@
# Mesh Observatory

Prometheus + Grafana monitoring stack for Cloudflare infrastructure state.

## Components

| Component | Port | Description |
|-----------|------|-------------|
| Prometheus | 9090 | Metrics collection and storage |
| Grafana | 3000 | Visualization dashboards |
| Metrics Exporter | 9100 | Custom Cloudflare metrics |

## Quick Start

### 1. Configure Environment

```bash
cp .env.example .env
# Edit .env with your credentials
```

Required environment variables:
```
CLOUDFLARE_API_TOKEN=<your-token>
CLOUDFLARE_ZONE_ID=<your-zone-id>
CLOUDFLARE_ACCOUNT_ID=<your-account-id>
GRAFANA_PASSWORD=<secure-password>
```
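
Before starting the stack, it can help to confirm that `.env` actually defines the four variables listed above. The following is a minimal sketch, not part of the repository: the helper names `parse_env` and `missing_vars` are hypothetical, and it assumes `.env` uses plain `KEY=VALUE` lines with `#` comments.

```python
# Variable names are taken from the list above; everything else is illustrative.
REQUIRED_VARS = (
    "CLOUDFLARE_API_TOKEN",
    "CLOUDFLARE_ZONE_ID",
    "CLOUDFLARE_ACCOUNT_ID",
    "GRAFANA_PASSWORD",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(text: str) -> list:
    """Return required variables that are absent, empty, or still <placeholders>."""
    values = parse_env(text)
    missing = []
    for name in REQUIRED_VARS:
        value = values.get(name, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing
```

Calling `missing_vars(open(".env").read())` returns an empty list when every variable has a real value, which makes it easy to fail fast before `docker-compose up -d`.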

### 2. Start Stack

```bash
docker-compose up -d
```

### 3. Access Dashboards

- Grafana: http://localhost:3000 (admin / $GRAFANA_PASSWORD)
- Prometheus: http://localhost:9090

## Dashboards

| Dashboard | UID | Description |
|-----------|-----|-------------|
| Cloudflare Mesh Overview | cf-overview | Main command center |
| DNS Health | cf-dns | DNS records, DNSSEC, types |
| Tunnel Status | cf-tunnel | Tunnel health, connections |
| Invariants & Compliance | cf-invariants | Invariant pass/fail, anomalies |
| Security Settings | cf-security | SSL, TLS, Access apps |
| ProofChain & Anchors | cf-proofchain | Merkle roots, snapshot freshness |

## Metrics Reference

### DNS Metrics
- `cloudflare_dns_records_total` - Total DNS records
- `cloudflare_dns_records_proxied` - Proxied records count
- `cloudflare_dns_records_unproxied` - DNS-only records count
- `cloudflare_dns_records_by_type{type="A|AAAA|CNAME|..."}` - Records by type
- `cloudflare_dnssec_enabled` - DNSSEC status (0/1)

### Tunnel Metrics
- `cloudflare_tunnels_total` - Total active tunnels
- `cloudflare_tunnels_healthy` - Tunnels with active connections
- `cloudflare_tunnels_unhealthy` - Tunnels without connections
- `cloudflare_tunnel_connections_total` - Total tunnel connections

### Zone Settings
- `cloudflare_zone_ssl_strict` - SSL mode is strict (0/1)
- `cloudflare_zone_tls_version_secure` - TLS 1.2+ enforced (0/1)
- `cloudflare_zone_always_https` - HTTPS redirect enabled (0/1)
- `cloudflare_zone_browser_check` - Browser integrity check (0/1)

### Access Metrics
- `cloudflare_access_apps_total` - Total Access applications
- `cloudflare_access_apps_by_type{type="..."}` - Apps by type

### Invariant Metrics
- `cloudflare_invariants_total` - Total invariant checks
- `cloudflare_invariants_passed` - Passing invariants
- `cloudflare_invariants_failed` - Failing invariants
- `cloudflare_invariants_pass_rate` - Pass percentage
- `cloudflare_invariant_report_age_seconds` - Report freshness
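
The five invariant metrics are related by simple arithmetic. The sketch below shows one plausible way they could be derived; it is not the exporter's actual code. It assumes the total is `passed + failed`, that the pass rate is reported on a 0-100 scale (the list above says "percentage"), that an empty report yields a rate of 0, and that report age is measured from a hypothetical `report_mtime` timestamp.

```python
import time
from typing import Optional


def invariant_metrics(
    passed: int,
    failed: int,
    report_mtime: float,
    now: Optional[float] = None,
) -> dict:
    """Derive the invariant gauges from pass/fail counts and a report timestamp.

    report_mtime and now are Unix timestamps in seconds (an assumption).
    """
    now = time.time() if now is None else now
    total = passed + failed
    # Avoid division by zero when no checks have run yet.
    rate = 100.0 * passed / total if total else 0.0
    return {
        "cloudflare_invariants_total": total,
        "cloudflare_invariants_passed": passed,
        "cloudflare_invariants_failed": failed,
        "cloudflare_invariants_pass_rate": rate,
        # Clamp at zero in case of small clock skew.
        "cloudflare_invariant_report_age_seconds": max(0.0, now - report_mtime),
    }
```

For example, nine passing and one failing check give a pass rate of 90.0.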

### Snapshot Metrics
- `cloudflare_snapshot_age_seconds` - Seconds since last snapshot
- `cloudflare_snapshot_merkle_root_set` - Merkle root present (0/1)

### Anomaly Metrics
- `cloudflare_anomalies_total` - Total anomaly receipts
- `cloudflare_anomalies_last_24h` - Recent anomalies
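
As an illustration of how the two anomaly gauges differ, the sketch below counts receipts inside a rolling 24-hour window. It assumes each anomaly receipt carries a timezone-aware timestamp; the receipt format itself is not documented here, and `count_recent` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def count_recent(timestamps, now, window=timedelta(hours=24)):
    """Count timestamps that fall within the window ending at `now` (inclusive)."""
    cutoff = now - window
    return sum(1 for ts in timestamps if cutoff <= ts <= now)


# Example with a fixed clock so the result is reproducible.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
receipts = [
    now - timedelta(hours=1),
    now - timedelta(hours=23, minutes=59),
    now - timedelta(hours=25),  # outside the window
]
anomalies_total = len(receipts)
anomalies_last_24h = count_recent(receipts, now)
```

Here `anomalies_total` counts every receipt, while `anomalies_last_24h` excludes the one that is 25 hours old.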

## Drift Visualizer

Standalone tool for comparing state sources.

### Usage

```bash
python3 drift-visualizer.py \
  --snapshot ../snapshots/cloudflare-latest.json \
  --manifest ../cloudflare_dns_manifest.md \
  --output-dir ../reports
```

### Output

- `drift-report-<timestamp>.json` - Machine-readable diff
- `drift-report-<timestamp>.html` - Visual HTML report
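
To make the idea of a machine-readable diff concrete, here is a minimal sketch of comparing two state sources. It is not the logic of `drift-visualizer.py`: the record keys (`"TYPE name"` strings), the three result categories, and the function name `diff_records` are all assumptions chosen for illustration.

```python
import json


def diff_records(expected: dict, actual: dict) -> dict:
    """Compare two mappings of record key -> record content.

    `expected` stands in for the manifest, `actual` for the live snapshot.
    """
    expected_keys, actual_keys = set(expected), set(actual)
    return {
        # Declared in the manifest but absent from the snapshot.
        "missing": sorted(expected_keys - actual_keys),
        # Present in the snapshot but not declared in the manifest.
        "unexpected": sorted(actual_keys - expected_keys),
        # Present in both, with different content.
        "changed": sorted(
            key
            for key in expected_keys & actual_keys
            if expected[key] != actual[key]
        ),
    }


manifest = {"A www.example.com": "192.0.2.10", "CNAME api.example.com": "www.example.com"}
snapshot = {"A www.example.com": "192.0.2.99", "TXT example.com": "v=spf1 -all"}

report = diff_records(manifest, snapshot)
print(json.dumps(report, indent=2))
```

Sorting each category keeps the JSON output stable between runs, which makes successive reports easier to compare.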

## Directory Structure

```
observatory/
├── docker-compose.yml          # Stack definition
├── Dockerfile.exporter         # Metrics exporter container
├── prometheus.yml              # Prometheus config
├── metrics-exporter.py         # Custom exporter
├── drift-visualizer.py         # Drift analysis tool
├── datasources/                # Grafana datasource provisioning
│   └── prometheus.yml
├── dashboards/                 # Grafana dashboard provisioning
│   ├── dashboards.yml
│   ├── cloudflare-overview.json
│   ├── dns-health.json
│   ├── tunnel-status.json
│   ├── invariants.json
│   ├── security-settings.json
│   └── proofchain.json
└── rules/                      # Prometheus alerting rules (optional)
```

## Integration with CI/CD

The metrics exporter reads from:
- `../snapshots/` - State snapshots from state-reconciler.py
- `../anomalies/` - Anomaly receipts from invariant-checker.py

Ensure these directories are populated by the GitLab CI pipeline or systemd services.
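
The sketch below illustrates why these directories must stay populated: a file-based exporter can only report freshness for files it finds. This is not the code in `metrics-exporter.py`. It assumes snapshots are `*.json` files, that age is derived from file modification time, and that output uses the Prometheus text exposition format; the helper names are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def newest_mtime(directory: Path) -> Optional[float]:
    """Return the modification time of the newest *.json file, or None if none exist."""
    mtimes = [path.stat().st_mtime for path in directory.glob("*.json")]
    return max(mtimes) if mtimes else None


def render_gauge(name: str, help_text: str, value: float) -> str:
    """Render a single gauge in the Prometheus text exposition format."""
    return f"# HELP {name} {help_text}\n# TYPE {name} gauge\n{name} {value}\n"


def snapshot_age_metric(directory: Path, now: Optional[float] = None) -> str:
    """Render cloudflare_snapshot_age_seconds for the newest snapshot file."""
    now = time.time() if now is None else now
    mtime = newest_mtime(directory)
    if mtime is None:
        # No snapshot yet: emit nothing rather than a misleading age of 0.
        return ""
    return render_gauge(
        "cloudflare_snapshot_age_seconds",
        "Seconds since last snapshot",
        max(0.0, now - mtime),
    )
```

With an empty `../snapshots/` directory this sketch emits no sample at all, so an alert such as `cloudflare_snapshot_age_seconds > 7200` would never fire; a separate absence check would be needed to catch that case.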

## Alerting (Optional)

Create alerting rules in `rules/alerts.yml`:

```yaml
groups:
  - name: cloudflare
    rules:
      - alert: InvariantFailure
        expr: cloudflare_invariants_failed > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Cloudflare invariant check failing"

      - alert: TunnelUnhealthy
        expr: cloudflare_tunnels_unhealthy > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cloudflare tunnel has no connections"

      - alert: SnapshotStale
        expr: cloudflare_snapshot_age_seconds > 7200
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cloudflare state snapshot older than 2 hours"
```