chore: initial import

docs/ARCHITECTURE.md (new file)
@@ -0,0 +1,182 @@

# VaultMesh Command Center Architecture

## Overview

The VaultMesh Command Center is a minimal control plane for monitoring and managing VaultMesh nodes. It consists of two components:

1. **Command Center (CC)** - Central Rust/Axum web server
2. **Node Agent** - Lightweight daemon running on each VaultMesh node

## Communication Model

```
┌─────────────────┐    HTTPS (Cloudflare Tunnel)     ┌─────────────────┐
│   Node Agent    │ ────────────────────────────────▶│ Command Center  │
│  (ArchVault)    │    POST /api/agent/heartbeat     │     (Axum)      │
└─────────────────┘                                  └─────────────────┘
                                                              │
┌─────────────────┐    HTTPS (Cloudflare Tunnel)             │
│   Node Agent    │ ─────────────────────────────────────────┘
│  (DebianVault)  │    POST /api/agent/heartbeat
└─────────────────┘

                   ┌─────────────────────────────────┐
                   │        Cloudflare Access        │
                   │   (Zero Trust Authentication)   │
                   └─────────────────────────────────┘
                                    │
                                    ▼
                   ┌─────────────────────────────────┐
                   │          Admin Browser          │
                   │        GET / (Dashboard)        │
                   └─────────────────────────────────┘
```

## Security Model

### Outbound-Only from Nodes

Nodes only make **outbound** connections to the Command Center. No inbound ports are opened on VaultMesh nodes for CC communication.

### Cloudflare Zero Trust

- **Cloudflare Tunnel**: CC is exposed via a `cloudflared` tunnel, not a public IP
- **Cloudflare Access**: Admin dashboard protected by Cloudflare Access policies
- **Agent Authentication**: planned; signed JWTs or shared secrets for agent auth

### Network Flow

1. Node agent wakes up every N seconds
2. Agent collects local health metrics
3. Agent POSTs heartbeat JSON to CC via Cloudflare Tunnel
4. CC stores heartbeat in memory (future: SQLite)
5. Admin views dashboard via Cloudflare Access-protected URL
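
A minimal sketch of the agent side of this cycle, assuming the reqwest client (with its `json` feature) and Tokio runtime listed under Component Details below; the `heartbeat_loop` name and the payload values are illustrative, not the shipped implementation:

```rust
use std::time::Duration;

/// Illustrative agent heartbeat cycle (steps 1-5 above).
async fn heartbeat_loop(cc_url: &str, node_id: uuid::Uuid, interval_secs: u64) {
    let client = reqwest::Client::new();
    let url = format!("{cc_url}/api/agent/heartbeat");
    loop {
        // Step 2: collect local health metrics (placeholder values here).
        let body = serde_json::json!({
            "node_id": node_id.to_string(),
            "hostname": "vault-node-01",
            "os_profile": "ArchVault",
            "cloudflare_ok": true,
            "services_ok": true,
            "vaultmesh_root": "/var/lib/vaultmesh",
            "timestamp": "2024-01-15T10:30:00Z", // RFC 3339; normally the current time
        });
        // Step 3: POST the heartbeat; failures are logged and retried next cycle.
        if let Err(err) = client.post(&url).json(&body).send().await {
            tracing::warn!("heartbeat failed: {err}");
        }
        tokio::time::sleep(Duration::from_secs(interval_secs)).await;
    }
}
```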

## Data Model

### NodeHeartbeat

```json
{
  "node_id": "550e8400-e29b-41d4-a716-446655440000",
  "hostname": "vault-node-01",
  "os_profile": "ArchVault",
  "cloudflare_ok": true,
  "services_ok": true,
  "vaultmesh_root": "/var/lib/vaultmesh",
  "timestamp": "2024-01-15T10:30:00Z"
}
```
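
As a sketch, the same shape as a serde struct (field names follow the JSON above; the actual definition, in particular the timestamp handling, may differ; assumes the uuid and time crates' serde support):

```rust
use serde::{Deserialize, Serialize};
use time::OffsetDateTime;
use uuid::Uuid;

/// Sketch of the heartbeat payload, mirroring the JSON schema above.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NodeHeartbeat {
    pub node_id: Uuid,
    pub hostname: String,
    pub os_profile: String,
    pub cloudflare_ok: bool,
    pub services_ok: bool,
    pub vaultmesh_root: String,
    // RFC 3339 timestamps via the `time` crate's serde helper.
    #[serde(with = "time::serde::rfc3339")]
    pub timestamp: OffsetDateTime,
}
```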

### Storage (V1)

In-memory `HashMap<Uuid, NodeHeartbeat>` wrapped in `Arc<RwLock<>>`.

Nodes are keyed by `node_id`. Each heartbeat overwrites the previous entry for that node.
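
A minimal sketch of that store and its overwrite-on-upsert behaviour, reusing the `NodeHeartbeat` sketch above (the `NodeStore` name and methods are illustrative):

```rust
use std::{collections::HashMap, sync::Arc};
use tokio::sync::RwLock;
use uuid::Uuid;

/// V1 storage: the latest heartbeat per node, keyed by node_id.
#[derive(Clone, Default)]
pub struct NodeStore {
    nodes: Arc<RwLock<HashMap<Uuid, NodeHeartbeat>>>,
}

impl NodeStore {
    /// Each heartbeat simply overwrites the previous entry for its node.
    pub async fn upsert(&self, hb: NodeHeartbeat) {
        self.nodes.write().await.insert(hb.node_id, hb);
    }

    /// Snapshot of all known nodes, e.g. for rendering the dashboard.
    pub async fn all(&self) -> Vec<NodeHeartbeat> {
        self.nodes.read().await.values().cloned().collect()
    }
}
```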

## Component Details

### Command Center

| Component | Technology |
|-----------|------------|
| Web Framework | Axum 0.7 |
| Async Runtime | Tokio |
| Serialization | Serde + serde_json |
| Logging | tracing + tracing-subscriber |
| HTML | Server-rendered (HTMX-ready) |

### Node Agent

| Component | Technology |
|-----------|------------|
| HTTP Client | reqwest (rustls-tls) |
| Async Runtime | Tokio |
| System Checks | systemctl calls |
| Config | Environment variables |

## Deployment

### Command Center Deployment

1. Build: `cargo build --release -p vaultmesh-command-center`
2. Install binary to `/usr/local/bin/`
3. Install systemd unit
4. Configure Cloudflare Tunnel to `http://127.0.0.1:8088`
5. Configure Cloudflare Access policies

### Node Agent Deployment

1. Build: `cargo build --release -p vaultmesh-node-agent`
2. Install binary to `/usr/local/bin/`
3. Create `/etc/vaultmesh/agent.env` with:
   - `VAULTMESH_CC_URL=https://cc.your-domain.example`
   - `VAULTMESH_OS_PROFILE=ArchVault`
4. Install systemd unit

## Version History

### V0.7.2: Communication Layer (Current)

- Unified `EventEnvelope` as canonical message format for comms events.
- `POST /api/events` - Ingest endpoint for operators, agents, and bots.
- `GET /api/events` - Query endpoint with since/kind/node_id/limit filtering.
- New event kinds: `note`, `incident`, `ack`, `tag`, `resolve`.
- SSE broadcast of envelope events by their `kind` name.
- Durable persistence to `events.jsonl` with replay on startup.
- Memory-bounded in-memory store (500 most recent envelopes).

### V0.7.1: Mission Console

- NASA-style 3-panel dashboard at `GET /console`.
- Global Mission Bar with fleet KPIs (Total/Healthy/Attention/Critical).
- Left panel: Node list with status pills (OK/ATTN/CRIT), live heartbeat glow.
- Center panel: Selected node telemetry + per-node event timeline.
- Right panel: Attention summary, scan findings, global event feed.
- Full SSE wiring for real-time DOM updates without page refresh.

### V0.7: SSE Event Bus

- Real-time event streaming via `GET /events` (Server-Sent Events).
- Named events: `heartbeat`, `scan`, `command`, `attention`.
- Broadcast channel distributes events to all connected SSE clients.
- Keepalive every 15s to prevent connection timeouts.
- JS probe in dashboard for console.log debugging.

### V0.6.1: Log Tools

- CLI subcommands for querying JSONL event logs.
- `logs view` with filters: `--kind`, `--node`, `--since`, `--min-severity`, `--limit`.
- `logs tail` for real-time log following.
- `logs stats` for per-node event statistics.

### V0.6: Append-Only Persistence

- Event logging to JSONL files (heartbeats, scans, commands).
- State replay on startup from `$VAULTMESH_LOG_DIR`.
- Foundation for V0.8 Ledger Bridge (Merkle over logs).

### V0.5: Fleet Orchestrator

- Background scheduler for autonomous scans.
- Attention model with staleness / drift detection.
- Command policy enforcement (per-profile allowlists).

### V0.4: Commands & Sovereign Scan

- Push commands to nodes (Ed25519 signed).
- Command receipts with proof chain.
- Sovereign scan execution and reporting.

### V0.3: Signed Commands

- Ed25519 key generation and command signing.
- Agent signature verification.
- Nonce-based replay protection.

### V0.2: Node Metrics

- System metrics in heartbeat (load, memory, disk).
- Heartbeat history tracking.

## Future Extensions

### V0.7.1: Mission Console (delivered; see Version History above)

- NASA-inspired dashboard layout with node tree sidebar.
- Live row updates via SSE (no page refresh).
- Attention summary panel with fleet health overview.

### V0.8: Ledger Bridge

- Merkle tree over `sovereign-scans.jsonl`.
- `ROOT.txt` + `PROOFCHAIN.json` for the VaultMesh proof layer.

docs/EVENT_GENERATION.md (new file)
@@ -0,0 +1,83 @@

# VaultMesh Command Center: Event Generation Mechanism

## Overview

The VaultMesh Command Center turns raw node data into structured events through a multi-stage pipeline (data collection, event processing, state management) designed for real-time monitoring and fleet management.

## Event Types

### 1. Heartbeat Events

- **Trigger**: Node heartbeat submission
- **Payload Includes**:
  * Timestamp
  * Node ID
  * Hostname
  * OS Profile
  * Cloudflare Status
  * Services Status
  * VaultMesh Root Path
  * System Metrics (uptime, load averages)

### 2. Scan Events

- **Trigger**: Successful scan result submission
- **Payload Includes**:
  * Timestamp
  * Node ID
  * Hostname
  * OS Profile
  * Scan Summary (critical/high/medium/low findings)
  * Real/Mock Findings Flag
  * Receipt Hash

### 3. Command Events

- **Trigger**: Command execution result
- **Payload Includes**:
  * Timestamp
  * Node ID
  * Hostname
  * OS Profile
  * Command Name
  * Execution Status
  * Exit Code
  * Nonce (for replay protection)
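
A sketch of these three kinds as one serializable enum; the shipped `ServerEvent` enum (see Extensibility below) may use different variants and field names, and timestamps are omitted here for brevity:

```rust
use serde::Serialize;
use uuid::Uuid;

/// Illustrative shape for the event kinds above; not the exact shipped enum.
#[derive(Debug, Clone, Serialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ServerEvent {
    Heartbeat {
        node_id: Uuid,
        hostname: String,
        os_profile: String,
        cloudflare_ok: bool,
        services_ok: bool,
    },
    Scan {
        node_id: Uuid,
        critical: u32,
        high: u32,
        medium: u32,
        low: u32,
        receipt_hash: String,
    },
    Command {
        node_id: Uuid,
        command: String,
        exit_code: i32,
        nonce: String,
    },
}
```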

## Event Generation Flow

1. **Data Collection**
   - Node agents submit heartbeats and scan results
   - Command results are reported back to the Command Center

2. **Event Processing**
   - Raw data is transformed into structured event payloads
   - Events are published to a broadcast channel
   - Server-Sent Events (SSE) distribute events to connected clients

3. **State Management**
   - Events trigger state updates (node history, last scan, etc.)
   - Attention status is recomputed based on new events
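
A minimal sketch of step 2's publish side, assuming Tokio's broadcast channel feeds the SSE layer and reusing the `ServerEvent` sketch above; the `EventBus` name and channel capacity are illustrative:

```rust
use tokio::sync::broadcast;

/// Illustrative event bus: publishers push `ServerEvent`s, SSE handlers subscribe.
#[derive(Clone)]
pub struct EventBus {
    tx: broadcast::Sender<ServerEvent>,
}

impl EventBus {
    pub fn new(capacity: usize) -> Self {
        let (tx, _rx) = broadcast::channel(capacity);
        Self { tx }
    }

    /// Publish an event; an error here only means there are no subscribers yet.
    pub fn publish(&self, event: ServerEvent) {
        let _ = self.tx.send(event);
    }

    pub fn subscribe(&self) -> broadcast::Receiver<ServerEvent> {
        self.tx.subscribe()
    }
}
```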

## Advanced Features

- **Automatic Scan Scheduling**
  - Periodic scans triggered based on node profile and last scan timestamp
  - Configurable scan intervals

- **Attention Computation**
  - Dynamic assessment of node health
  - Tracks critical findings, heartbeat staleness, service status

## Security Considerations

- Ed25519 key signing for commands
- Nonce-based replay protection
- Configurable command policies per node profile

## Performance Characteristics

- In-memory event storage (500 most recent events)
- Optional JSONL log persistence
- Low-overhead event broadcasting

## Extensibility

The event system supports easy addition of new event types and payloads through the `ServerEvent` enum and corresponding payload structures.

docs/EVENT_PROCESSING.md (new file)
@@ -0,0 +1,170 @@

# VaultMesh Command Center: Event Processing Architecture

## Overview

The Command Center processes events in three stages (ingestion, state update, broadcasting) to support robust, real-time fleet management.

## Key Components

### 1. Event Types

- `NodeHeartbeat`: Node status updates
- `ScanEvent`: Scan findings and results
- `CommandEvent`: Command execution outcomes
- `EventEnvelope`: Generic communication events

### 2. Processing Stages

#### a. Ingestion

- Raw event data received via HTTP endpoints
- Validated and transformed into structured payloads
- Logged to append-only JSONL files for durability
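
A sketch of the append-only logging step, assuming one JSON document per line; the real `append_json_line` helper lives on the log store and may differ in signature and error handling:

```rust
use std::{fs::OpenOptions, io::Write, path::Path};

/// Append one serialized event as a single JSONL line (sketch).
pub fn append_json_line<T: serde::Serialize>(
    log_dir: &Path,
    file: &str,
    event: &T,
) -> std::io::Result<()> {
    let line = serde_json::to_string(event)?;
    let mut f = OpenOptions::new()
        .create(true)
        .append(true)
        .open(log_dir.join(file))?;
    writeln!(f, "{line}")
}
```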

#### b. State Update

- In-memory state updated with latest information
- Sliding window of recent events maintained (max 50 per node)
- Derived states computed (attention status, last scan)

#### c. Broadcasting

- Events published via Server-Sent Events (SSE)
- Broadcast to all connected clients
- Low-latency, real-time updates
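
A sketch of the SSE fan-out, assuming axum 0.7's `sse` response type and a Tokio broadcast receiver; the `ServerEvent` placeholder, the hard-coded event name, and the omission of `State`-based router wiring are all simplifications:

```rust
use std::{convert::Infallible, time::Duration};

use axum::response::sse::{Event, KeepAlive, Sse};
use tokio::sync::broadcast;
use tokio_stream::{wrappers::BroadcastStream, Stream, StreamExt};

/// Placeholder for the server's event enum; see EVENT_GENERATION.md.
#[derive(Clone, serde::Serialize)]
struct ServerEvent {
    kind: String,
}

/// Turn a broadcast subscription into an SSE response (sketch).
fn sse_stream(
    rx: broadcast::Receiver<ServerEvent>,
) -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    let stream = BroadcastStream::new(rx).filter_map(|msg| {
        // A lagged receiver simply skips the messages it missed.
        let ev = msg.ok()?;
        // Named per event kind ("heartbeat", "scan", "command", "attention") in practice.
        Some(Ok(Event::default().event("heartbeat").json_data(&ev).unwrap()))
    });
    // Keepalive every 15s prevents idle connection timeouts.
    Sse::new(stream).keep_alive(KeepAlive::default().interval(Duration::from_secs(15)))
}
```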

## Event Processing Workflow

### Heartbeat Processing

1. Receive heartbeat data
2. Log to `heartbeats.jsonl`
3. Update node history
4. Publish heartbeat event
5. Recompute node attention status
6. Broadcast attention event

```rust
pub async fn upsert_heartbeat(&self, hb: NodeHeartbeat) {
    // Log the heartbeat event to heartbeats.jsonl
    let event = HeartbeatEvent { ... };
    self.logs.append_json_line("heartbeats.jsonl", &event);

    // Update in-memory state
    self.upsert_heartbeat_no_log(hb).await;
}
```

### Scan Result Processing

1. Receive scan results
2. Log to `scans.jsonl`
3. Update last scan information
4. Publish scan event
5. Recompute node attention status
6. Broadcast attention event

```rust
pub async fn update_last_scan(&self, node_id: Uuid, scan: LastScan) {
    // Record the most recent scan for this node
    let mut scans = self.last_scans.write().await;
    scans.insert(node_id, scan);
}
```

### Command Result Processing

1. Receive command result
2. Log to `commands.jsonl`
3. Store command history
4. Publish command event
5. Optionally trigger additional actions

```rust
pub async fn record_command_result(&self, result: CommandResult) {
    // Log command event
    let event = CommandEvent { ... };
    self.logs.append_json_line("commands.jsonl", &event);

    // Update command result history
    self.record_command_result_no_log(result).await;
}
```

## Attention Computation

The system dynamically computes a node's attention status based on multiple factors:

- Heartbeat staleness
- Scan staleness
- Scan findings severity
- Service status
- Cloudflare tunnel status

```rust
pub fn compute_attention(
    now: OffsetDateTime,
    hb: &NodeHeartbeat,
    scan: Option<&LastScan>,
    cfg: &SchedulerConfig,
) -> NodeAttentionStatus {
    let mut reasons = Vec::new();

    // Check heartbeat age
    if now - hb.timestamp > cfg.heartbeat_stale {
        reasons.push("heartbeat_stale");
    }

    // Check scan status
    match scan {
        None => reasons.push("never_scanned"),
        Some(s) => {
            if now - s.ts > cfg.scan_stale {
                reasons.push("scan_stale");
            }
            if s.summary.critical > 0 {
                reasons.push("critical_findings");
            }
        }
    }

    // Check service flags
    if !hb.cloudflare_ok {
        reasons.push("cloudflare_down");
    }

    NodeAttentionStatus {
        needs_attention: !reasons.is_empty(),
        reasons,
    }
}
```

## Persistence and State Replay

### Log Replay Mechanism

- On startup, reconstruct in-memory state from JSONL logs
- Replay events in chronological order
- Recreate node history, scan results, and command results

```rust
pub async fn replay_from_logs(&self) {
    self.replay_heartbeats().await;
    self.replay_scans().await;
    self.replay_commands().await;
    self.replay_envelopes().await;
}
```

## Performance Characteristics

- In-memory event store: 500 most recent events
- Append-only logging for durability
- Non-blocking event processing
- Low-overhead broadcasting

## Security Considerations

- Ed25519 key signing for commands
- Configurable command policies
- Nonce-based replay protection
- No sensitive data stored in logs

## Extensibility

- Easy to add new event types
- Flexible attention computation
- Modular event processing pipeline

docs/NODE_AGENT_CONTRACT.md (new file)
@@ -0,0 +1,163 @@

# Node Agent Contract

This document defines the API contract between VaultMesh Node Agents and the Command Center.

## Heartbeat Endpoint

### Request

```
POST /api/agent/heartbeat
Content-Type: application/json
```

### Heartbeat Schema

```json
{
  "node_id": "UUID v4",
  "hostname": "string",
  "os_profile": "string",
  "cloudflare_ok": "boolean",
  "services_ok": "boolean",
  "vaultmesh_root": "string",
  "timestamp": "ISO 8601 / RFC 3339"
}
```

### Field Definitions

| Field | Type | Description |
|-------|------|-------------|
| `node_id` | UUID v4 | Unique identifier for this node. Should persist across reboots. |
| `hostname` | String | System hostname (from `hostname::get()` or `/etc/hostname`) |
| `os_profile` | String | VaultMesh profile name: `ArchVault`, `DebianVault`, etc. |
| `cloudflare_ok` | Boolean | `true` if the `cloudflared` service is active |
| `services_ok` | Boolean | `true` if `VAULTMESH_ROOT` exists and is healthy |
| `vaultmesh_root` | String | Path to `VAULTMESH_ROOT` (e.g., `/var/lib/vaultmesh`) |
| `timestamp` | RFC 3339 | UTC timestamp when the heartbeat was generated |

### Response

**Success (200 OK)**:
```json
{
  "status": "ok"
}
```

**Error (4xx/5xx)**:
```json
{
  "error": "description"
}
```
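
A sketch of the server side of this contract as an Axum 0.7 handler; `NodeHeartbeat` here is a stand-in for a struct matching the schema above, and storage plus error responses are omitted:

```rust
use axum::{http::StatusCode, Json};
use serde_json::{json, Value};

/// Placeholder for the full heartbeat struct defined by the schema above.
#[derive(serde::Deserialize)]
struct NodeHeartbeat {
    node_id: uuid::Uuid,
    hostname: String,
    // Remaining schema fields omitted in this sketch.
}

/// Accept a heartbeat and acknowledge it with {"status": "ok"} (sketch).
async fn agent_heartbeat(Json(hb): Json<NodeHeartbeat>) -> (StatusCode, Json<Value>) {
    // Self-registration: a new node_id creates an entry, a known one updates it.
    tracing::info!("heartbeat from {} ({})", hb.node_id, hb.hostname);
    (StatusCode::OK, Json(json!({ "status": "ok" })))
}
```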

## Environment Variables

The node agent is configured via environment variables, typically set in `/etc/vaultmesh/agent.env`.

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `VAULTMESH_NODE_ID` | No | Auto-generated UUID v4 | Persistent node identifier |
| `VAULTMESH_CC_URL` | No | `http://127.0.0.1:8088` | Command Center base URL |
| `VAULTMESH_OS_PROFILE` | No | `ArchVault` | OS profile name to report |
| `VAULTMESH_ROOT` | No | `/var/lib/vaultmesh` | Path to check for `services_ok` |
| `VAULTMESH_HEARTBEAT_SECS` | No | `30` | Seconds between heartbeats |
| `RUST_LOG` | No | `info` | Log level (trace, debug, info, warn, error) |

### Example `/etc/vaultmesh/agent.env`

```bash
VAULTMESH_NODE_ID=550e8400-e29b-41d4-a716-446655440000
VAULTMESH_CC_URL=https://cc.vaultmesh.example
VAULTMESH_OS_PROFILE=ArchVault
VAULTMESH_ROOT=/var/lib/vaultmesh
VAULTMESH_HEARTBEAT_SECS=30
RUST_LOG=info
```
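
A sketch of how the agent might resolve this configuration, using the defaults from the table above; the `AgentConfig` type is illustrative and the auto-generated fallback assumes the uuid crate's `v4` feature:

```rust
use std::env;

/// Illustrative agent configuration resolved from the variables above.
struct AgentConfig {
    node_id: String,
    cc_url: String,
    os_profile: String,
    vaultmesh_root: String,
    heartbeat_secs: u64,
}

impl AgentConfig {
    fn from_env() -> Self {
        Self {
            // Not recommended for production: without VAULTMESH_NODE_ID set,
            // a fresh UUID is generated on every start (see Node Registration).
            node_id: env::var("VAULTMESH_NODE_ID")
                .unwrap_or_else(|_| uuid::Uuid::new_v4().to_string()),
            cc_url: env::var("VAULTMESH_CC_URL")
                .unwrap_or_else(|_| "http://127.0.0.1:8088".into()),
            os_profile: env::var("VAULTMESH_OS_PROFILE")
                .unwrap_or_else(|_| "ArchVault".into()),
            vaultmesh_root: env::var("VAULTMESH_ROOT")
                .unwrap_or_else(|_| "/var/lib/vaultmesh".into()),
            heartbeat_secs: env::var("VAULTMESH_HEARTBEAT_SECS")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(30),
        }
    }
}
```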

## Node Registration

Nodes self-register on first heartbeat. There is no explicit registration endpoint.

When the Command Center receives a heartbeat with a new `node_id`, it creates a new entry. Subsequent heartbeats update the existing entry.

### Node ID Persistence

For consistent tracking, the `VAULTMESH_NODE_ID` should be persisted. Options:

1. **Environment file**: Set in `/etc/vaultmesh/agent.env`
2. **Machine ID**: Could derive from `/etc/machine-id`
3. **Auto-generated**: If not set, the agent generates a new UUID on each start (not recommended for production)

**Recommended**: Generate a UUID once during node bootstrap and store it in `agent.env`:

```bash
# During node bootstrap
echo "VAULTMESH_NODE_ID=$(uuidgen)" >> /etc/vaultmesh/agent.env
```

## Health Checks

### cloudflare_ok

The agent runs:
```bash
systemctl is-active --quiet cloudflared
```

Returns `true` if the exit code is 0 (service active).
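
A minimal sketch of that check using the standard library; the helper name is illustrative:

```rust
use std::process::Command;

/// True when `systemctl is-active --quiet cloudflared` exits with status 0.
fn cloudflare_ok() -> bool {
    Command::new("systemctl")
        .args(["is-active", "--quiet", "cloudflared"])
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}
```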

### services_ok

The agent checks if `VAULTMESH_ROOT` exists and is a directory:
```rust
std::path::Path::new(vaultmesh_root).is_dir()
```

Future versions may add additional checks:
- Disk space
- Key services running
- ProofChain integrity

## Error Handling

### Network Errors

If the agent cannot reach the Command Center:
- Log error at WARN/ERROR level
- Sleep for heartbeat interval
- Retry on next cycle

No exponential backoff in V1. The agent will retry every `VAULTMESH_HEARTBEAT_SECS` seconds indefinitely.

### Invalid Response

If CC returns non-2xx status:
- Log warning with status code
- Continue normal operation
- Retry on next cycle

## Security Considerations

### Transport Security

- Agent should connect to CC via HTTPS (Cloudflare Tunnel)
- `reqwest` configured with `rustls-tls` (no OpenSSL dependency)

### Authentication (Future)

V1 has no agent authentication. Future versions may add:
- Signed JWTs
- Shared secrets
- mTLS

### Data Sensitivity

Heartbeat data is low-sensitivity:
- No secrets or credentials
- No PII
- No file contents

The `vaultmesh_root` path and hostname are the most identifying fields.