chore: initial import
docs/ARCHITECTURE.md
# VaultMesh Command Center Architecture

## Overview

The VaultMesh Command Center is a minimal control plane for monitoring and managing VaultMesh nodes. It consists of two components:

1. **Command Center (CC)** - Central Rust/Axum web server
2. **Node Agent** - Lightweight daemon running on each VaultMesh node

## Communication Model

```
┌─────────────────┐     HTTPS (Cloudflare Tunnel)     ┌─────────────────┐
│   Node Agent    │ ─────────────────────────────────▶│ Command Center  │
│  (ArchVault)    │     POST /api/agent/heartbeat     │     (Axum)      │
└─────────────────┘                                   └─────────────────┘
                                                               │
┌─────────────────┐     HTTPS (Cloudflare Tunnel)              │
│   Node Agent    │ ───────────────────────────────────────────┘
│  (DebianVault)  │     POST /api/agent/heartbeat
└─────────────────┘

                  ┌─────────────────────────────────┐
                  │        Cloudflare Access        │
                  │   (Zero Trust Authentication)   │
                  └─────────────────────────────────┘
                                   │
                                   ▼
                  ┌─────────────────────────────────┐
                  │          Admin Browser          │
                  │        GET / (Dashboard)        │
                  └─────────────────────────────────┘
```

## Security Model

### Outbound-Only from Nodes

Nodes only make **outbound** connections to the Command Center. No inbound ports are opened on VaultMesh nodes for CC communication.

### Cloudflare Zero Trust

- **Cloudflare Tunnel**: CC is exposed via `cloudflared` tunnel, not a public IP
- **Cloudflare Access**: Admin dashboard protected by Cloudflare Access policies
- **Agent Authentication**: Future: signed JWTs or shared secrets for agent auth

### Network Flow

1. Node agent wakes up every N seconds
2. Agent collects local health metrics
3. Agent POSTs heartbeat JSON to CC via Cloudflare Tunnel
4. CC stores heartbeat in memory (future: SQLite)
5. Admin views dashboard via Cloudflare Access-protected URL

## Data Model

### NodeHeartbeat

```json
{
  "node_id": "550e8400-e29b-41d4-a716-446655440000",
  "hostname": "vault-node-01",
  "os_profile": "ArchVault",
  "cloudflare_ok": true,
  "services_ok": true,
  "vaultmesh_root": "/var/lib/vaultmesh",
  "timestamp": "2024-01-15T10:30:00Z"
}
```

### Storage (V1)

In-memory `HashMap<Uuid, NodeHeartbeat>` wrapped in `Arc<RwLock<>>`.

Nodes are keyed by `node_id`. Each heartbeat overwrites the previous entry for that node.
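
The overwrite-per-node behaviour can be illustrated with a minimal, standard-library-only sketch. It is not the project's code: `NodeId` is a `u128` stand-in for `Uuid`, the struct is a trimmed version of the heartbeat, and the helper names `new_store`, `record_heartbeat`, and `latest` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Stand-in for `uuid::Uuid` so the sketch needs only the standard library.
pub type NodeId = u128;

/// Trimmed-down heartbeat; field names follow the NodeHeartbeat JSON example.
#[derive(Clone, Debug, PartialEq)]
pub struct NodeHeartbeat {
    pub node_id: NodeId,
    pub hostname: String,
    pub os_profile: String,
    pub cloudflare_ok: bool,
    pub services_ok: bool,
    pub timestamp: String, // RFC 3339 text, kept as a plain string here
}

/// Shared, thread-safe map holding the latest heartbeat per node.
pub type HeartbeatStore = Arc<RwLock<HashMap<NodeId, NodeHeartbeat>>>;

pub fn new_store() -> HeartbeatStore {
    Arc::new(RwLock::new(HashMap::new()))
}

/// Inserts or overwrites the entry for `hb.node_id`.
/// Returns the heartbeat that was replaced, if there was one.
pub fn record_heartbeat(store: &HeartbeatStore, hb: NodeHeartbeat) -> Option<NodeHeartbeat> {
    let mut map = store.write().expect("heartbeat store lock poisoned");
    map.insert(hb.node_id, hb)
}

/// Returns a copy of the latest heartbeat for one node.
pub fn latest(store: &HeartbeatStore, node_id: NodeId) -> Option<NodeHeartbeat> {
    let map = store.read().expect("heartbeat store lock poisoned");
    map.get(&node_id).cloned()
}
```

Cloning the `Arc` gives each request handler its own handle to the same map; readers take the shared lock, and only the heartbeat handler needs the exclusive one.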

## Component Details

### Command Center

| Component | Technology |
|-----------|------------|
| Web Framework | Axum 0.7 |
| Async Runtime | Tokio |
| Serialization | Serde + serde_json |
| Logging | tracing + tracing-subscriber |
| HTML | Server-rendered (HTMX-ready) |

### Node Agent

| Component | Technology |
|-----------|------------|
| HTTP Client | reqwest (rustls-tls) |
| Async Runtime | Tokio |
| System Checks | systemctl calls |
| Config | Environment variables |
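
As a rough illustration of the "systemctl calls" row, the sketch below shells out to `systemctl is-active --quiet <unit>` and treats exit status 0 as healthy. The function names and the idea of folding several units into one `services_ok` flag are assumptions; the document does not say which units the agent checks or how it combines the results.

```rust
use std::process::{Command, Stdio};

/// Returns true when `systemctl is-active --quiet <unit>` exits with status 0.
/// Failing to launch `systemctl` at all (for example on a host without
/// systemd) is treated as "not active".
pub fn service_is_active(unit: &str) -> bool {
    Command::new("systemctl")
        .args(["is-active", "--quiet", unit])
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Folds several unit checks into one flag, as `services_ok` might be computed.
/// An empty list counts as healthy.
pub fn services_ok(units: &[&str]) -> bool {
    units.iter().all(|unit| service_is_active(unit))
}
```

Spawning a process per unit is simple and adequate for a check that runs once per heartbeat interval; an agent built on Tokio would more likely use an async process API so the check does not block the runtime.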

## Deployment

### Command Center Deployment

1. Build: `cargo build --release -p vaultmesh-command-center`
2. Install binary to `/usr/local/bin/`
3. Install systemd unit
4. Configure Cloudflare Tunnel to `http://127.0.0.1:8088`
5. Configure Cloudflare Access policies

### Node Agent Deployment

1. Build: `cargo build --release -p vaultmesh-node-agent`
2. Install binary to `/usr/local/bin/`
3. Create `/etc/vaultmesh/agent.env` with:
   - `VAULTMESH_CC_URL=https://cc.your-domain.example`
   - `VAULTMESH_OS_PROFILE=ArchVault`
4. Install systemd unit
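
How the agent might consume the two variables from `agent.env` can be sketched as follows. Only the variable names and the `/api/agent/heartbeat` path come from this document; the `AgentConfig` and `ConfigError` types, the validation rules, and the decision to treat a missing value as an error are assumptions.

```rust
/// Settings the agent reads at startup (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct AgentConfig {
    pub cc_url: String,
    pub os_profile: String,
}

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    Missing(&'static str),
    InvalidUrl(String),
}

impl AgentConfig {
    /// Builds the config from any key lookup, so tests can avoid touching
    /// the real process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, ConfigError>
    where
        F: Fn(&str) -> Option<String>,
    {
        // Trim whitespace and a trailing slash so URL joining stays predictable.
        let cc_url = lookup("VAULTMESH_CC_URL")
            .map(|v| v.trim().trim_end_matches('/').to_string())
            .filter(|v| !v.is_empty())
            .ok_or(ConfigError::Missing("VAULTMESH_CC_URL"))?;
        if !cc_url.starts_with("https://") && !cc_url.starts_with("http://") {
            return Err(ConfigError::InvalidUrl(cc_url));
        }
        let os_profile = lookup("VAULTMESH_OS_PROFILE")
            .map(|v| v.trim().to_string())
            .filter(|v| !v.is_empty())
            .ok_or(ConfigError::Missing("VAULTMESH_OS_PROFILE"))?;
        Ok(AgentConfig { cc_url, os_profile })
    }

    /// Reads the real process environment.
    pub fn from_env() -> Result<Self, ConfigError> {
        Self::from_lookup(|key| std::env::var(key).ok())
    }

    /// Full URL of the heartbeat endpoint.
    pub fn heartbeat_url(&self) -> String {
        format!("{}/api/agent/heartbeat", self.cc_url)
    }
}
```

With the example values above, `heartbeat_url()` yields `https://cc.your-domain.example/api/agent/heartbeat`. Failing fast on a missing variable makes a misconfigured unit visible in the journal at startup rather than at the first heartbeat.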

## Version History

### V0.7.2: Communication Layer (Current)
- Unified `EventEnvelope` as canonical message format for comms events.
- `POST /api/events` - Ingest endpoint for operators, agents, and bots.
- `GET /api/events` - Query endpoint with since/kind/node_id/limit filtering.
- New event kinds: `note`, `incident`, `ack`, `tag`, `resolve`.
- SSE broadcast of envelope events by their `kind` name.
- Durable persistence to `events.jsonl` with replay on startup.
- Memory-bounded in-memory store (500 most recent envelopes).
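
The memory-bounded store and the since/kind/node_id/limit filters can be sketched with a `VecDeque` used as a ring buffer. The `EventEnvelope` fields below are hypothetical (the real envelope layout is not shown in this document), as are the `EventStore` type and the newest-first ordering of query results.

```rust
use std::collections::VecDeque;

/// Hypothetical, trimmed-down envelope for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub struct EventEnvelope {
    pub ts_unix: u64, // seconds since the Unix epoch
    pub kind: String, // e.g. "note", "incident", "ack", "tag", "resolve"
    pub node_id: Option<String>,
    pub body: String,
}

/// Keeps only the `capacity` most recent envelopes.
pub struct EventStore {
    capacity: usize,
    events: VecDeque<EventEnvelope>,
}

impl EventStore {
    pub fn new(capacity: usize) -> Self {
        EventStore { capacity, events: VecDeque::with_capacity(capacity) }
    }

    /// Appends an envelope, evicting the oldest one once the store is full.
    pub fn push(&mut self, ev: EventEnvelope) {
        if self.capacity == 0 {
            return;
        }
        if self.events.len() == self.capacity {
            self.events.pop_front();
        }
        self.events.push_back(ev);
    }

    pub fn len(&self) -> usize {
        self.events.len()
    }

    /// Newest-first query; every filter is optional, `limit` caps the result.
    pub fn query(
        &self,
        since: Option<u64>,
        kind: Option<&str>,
        node_id: Option<&str>,
        limit: usize,
    ) -> Vec<&EventEnvelope> {
        self.events
            .iter()
            .rev()
            .filter(|e| since.map_or(true, |s| e.ts_unix >= s))
            .filter(|e| kind.map_or(true, |k| e.kind == k))
            .filter(|e| node_id.map_or(true, |n| e.node_id.as_deref() == Some(n)))
            .take(limit)
            .collect()
    }
}
```

Constructing the store with `EventStore::new(500)` matches the bound listed above; after 600 pushes the 100 oldest envelopes are gone and `len()` stays at 500.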

### V0.7.1: Mission Console
- NASA-style 3-panel dashboard at `GET /console`.
- Global Mission Bar with fleet KPIs (Total/Healthy/Attention/Critical).
- Left panel: Node list with status pills (OK/ATTN/CRIT), live heartbeat glow.
- Center panel: Selected node telemetry + per-node event timeline.
- Right panel: Attention summary, scan findings, global event feed.
- Full SSE wiring for real-time DOM updates without page refresh.

### V0.7: SSE Event Bus
- Real-time event streaming via `GET /events` (Server-Sent Events).
- Named events: `heartbeat`, `scan`, `command`, `attention`.
- Broadcast channel distributes events to all connected SSE clients.
- Keepalive every 15s to prevent connection timeouts.
- JS probe in dashboard for console.log debugging.

### V0.6.1: Log Tools
- CLI subcommands for querying JSONL event logs.
- `logs view` with filters: `--kind`, `--node`, `--since`, `--min-severity`, `--limit`.
- `logs tail` for real-time log following.
- `logs stats` for per-node event statistics.

### V0.6: Append-Only Persistence
- Event logging to JSONL files (heartbeats, scans, commands).
- State replay on startup from `$VAULTMESH_LOG_DIR`.
- Foundation for V0.8 Ledger Bridge (Merkle over logs).
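
The append-then-replay pattern behind the JSONL logs can be sketched with the standard library alone. The function names are hypothetical, and records are passed in as already-serialized JSON strings because the project's actual record types are not shown here.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// Appends one already-serialized JSON record as a single line.
/// Embedded newlines are rejected because they would break the
/// one-record-per-line format.
pub fn append_record(path: &Path, json_line: &str) -> io::Result<()> {
    if json_line.contains('\n') || json_line.contains('\r') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "record must be a single line",
        ));
    }
    if let Some(dir) = path.parent() {
        fs::create_dir_all(dir)?;
    }
    // `append(true)` never truncates, so existing history is preserved.
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}", json_line)
}

/// Reads every non-empty line back in write order.
/// A missing file replays as an empty history.
pub fn replay_records(path: &Path) -> io::Result<Vec<String>> {
    let file = match fs::File::open(path) {
        Ok(f) => f,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(e) => return Err(e),
    };
    let mut records = Vec::new();
    for line in BufReader::new(file).lines() {
        let line = line?;
        if !line.trim().is_empty() {
            records.push(line);
        }
    }
    Ok(records)
}
```

On startup, a server following this pattern would call `replay_records` for each log under `$VAULTMESH_LOG_DIR` and feed the parsed records back into its in-memory state before accepting new requests.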

### V0.5: Fleet Orchestrator
- Background scheduler for autonomous scans.
- Attention model with staleness / drift detection.
- Command policy enforcement (per-profile allowlists).
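
Staleness detection in an attention model can be as simple as comparing the age of the last heartbeat against multiples of the expected interval. The sketch below is an assumption throughout: the `Attention` enum, the function name, and the 3x / 10x thresholds are illustrative, and drift detection is not covered.

```rust
use std::time::Duration;

/// Hypothetical attention levels, named after the OK/ATTN/CRIT status pills.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Attention {
    Ok,
    Attn,
    Crit,
}

/// Classifies a node by the age of its last heartbeat.
/// `interval` is the expected heartbeat period; the 3x and 10x multipliers
/// are illustrative assumptions, not values taken from the project.
pub fn staleness(last_seen_unix: u64, now_unix: u64, interval: Duration) -> Attention {
    // saturating_sub guards against clock skew where the node's
    // timestamp is ahead of the Command Center's clock.
    let age = now_unix.saturating_sub(last_seen_unix);
    let interval_secs = interval.as_secs().max(1);
    if age >= interval_secs.saturating_mul(10) {
        Attention::Crit
    } else if age >= interval_secs.saturating_mul(3) {
        Attention::Attn
    } else {
        Attention::Ok
    }
}
```

With a 30-second interval, a node last seen 90 seconds ago is flagged `Attn` and one silent for 300 seconds or more is flagged `Crit`.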

### V0.4: Commands & Sovereign Scan
- Push commands to nodes (Ed25519 signed).
- Command receipts with proof chain.
- Sovereign scan execution and reporting.

### V0.3: Signed Commands
- Ed25519 key generation and command signing.
- Agent signature verification.
- Nonce-based replay protection.
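
Nonce-based replay protection boils down to remembering which nonces have already been accepted. The following sketch shows that bookkeeping only; the `NonceGuard` type and its bounded FIFO window are assumptions, and signature verification is deliberately left out because the standard library has no Ed25519 support.

```rust
use std::collections::{HashSet, VecDeque};

/// Remembers recently seen nonces and rejects repeats.
/// Bounding the memory with a FIFO window is an assumption of this sketch.
pub struct NonceGuard {
    capacity: usize,
    seen: HashSet<String>,
    order: VecDeque<String>,
}

impl NonceGuard {
    pub fn new(capacity: usize) -> Self {
        NonceGuard {
            capacity: capacity.max(1),
            seen: HashSet::new(),
            order: VecDeque::new(),
        }
    }

    /// Returns true the first time a nonce is presented and false on a replay.
    /// Intended to be called only after the command's signature has verified.
    pub fn check_and_record(&mut self, nonce: &str) -> bool {
        if self.seen.contains(nonce) {
            return false;
        }
        // Forget the oldest nonce once the window is full.
        if self.order.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.seen.remove(&oldest);
            }
        }
        self.seen.insert(nonce.to_string());
        self.order.push_back(nonce.to_string());
        true
    }
}
```

Because old nonces eventually fall out of the window, a bounded guard like this is only safe when commands also carry a timestamp or expiry that the agent checks, so that anything old enough to have been evicted is rejected on age alone.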

### V0.2: Node Metrics
- System metrics in heartbeat (load, memory, disk).
- Heartbeat history tracking.

## Future Extensions

### V0.7.1: Mission Console (delivered; see Version History)
- NASA-inspired dashboard layout with node tree sidebar.
- Live row updates via SSE (no page refresh).
- Attention summary panel with fleet health overview.

### V0.8: Ledger Bridge
- Merkle tree over `sovereign-scans.jsonl`.
- `ROOT.txt` + `PROOFCHAIN.json` for VaultMesh proof layer.