vm-control/CHANGELOG.md
2025-12-18 00:14:03 +00:00

# Changelog
## V0.7.3 Envelope Canonicalization
- Added `format` and `schema` to `EventEnvelope` for stable codec/semantics pinning.
- Renamed `body` to `payload` in the comms event API (server still accepts `body` as an alias).
- Canonicalization is enforced before persistence and broadcast:
  - Timestamps are truncated to RFC3339 UTC (`Z` suffix) with seconds precision.
  - `payload` object keys are sorted recursively (arrays preserve order).
- `events.jsonl` now stores one canonical `EventEnvelope` per line.
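
The canonicalization rules above can be illustrated with a minimal Python sketch. It is not the server's implementation; the function names `canonical_ts`, `sort_keys`, and `canonicalize` are hypothetical, and only the `ts`, `payload`, and `body` field names come from this changelog.

```python
from datetime import datetime, timezone


def canonical_ts(ts: str) -> str:
    """Convert an RFC3339 timestamp to UTC, drop sub-second digits, use a Z suffix."""
    # Older Python versions reject a trailing "Z" in fromisoformat(), so map it to +00:00.
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    # Note: a timestamp without an offset would be treated as local time here.
    utc = parsed.astimezone(timezone.utc).replace(microsecond=0)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


def sort_keys(value):
    """Recursively sort object keys; arrays keep their element order."""
    if isinstance(value, dict):
        return {key: sort_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [sort_keys(item) for item in value]
    return value


def canonicalize(envelope: dict) -> dict:
    """Return a canonical copy of an envelope (illustrative helper, not the real API)."""
    out = dict(envelope)
    # Accept the legacy `body` field as an alias for `payload`.
    if "payload" not in out and "body" in out:
        out["payload"] = out.pop("body")
    out["ts"] = canonical_ts(out["ts"])
    out["payload"] = sort_keys(out.get("payload", {}))
    return out
```

Sorting keys and fixing the timestamp precision means two envelopes with the same content serialize to the same bytes, which is what makes one-envelope-per-line logs comparable and hashable.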
## V0.7.2 Communication Layer
### Unified Event API
- Added `EventEnvelope` as the canonical message format for all comms events.
- New event kinds: `note`, `incident`, `ack`, `tag`, `resolve`.
- Events support optional `node_id` for node-specific or global scope.
- Author field tracks origin: "operator", "system", "vm-copilot", etc.
### New API Endpoints
- `POST /api/events` - Create a new event envelope.
  - Server assigns `id` and `ts`.
  - Author can be overridden via `X-VM-Author` header.
- `GET /api/events` - Query events with filtering:
  - `?since=RFC3339` - Filter by timestamp.
  - `?kind=note,incident` - Comma-separated kind filter.
  - `?node_id=uuid` - Filter by node.
  - `?limit=N` - Max results (default: 100).
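
As a client-side illustration of the query parameters listed above, the following sketch builds a `GET /api/events` URL. The helper name `events_url` is hypothetical, and the base address is simply the one used in the usage examples of this changelog.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8088"  # address taken from the usage examples


def events_url(since=None, kinds=None, node_id=None, limit=None, base_url=BASE_URL):
    """Build a query URL for GET /api/events (hypothetical helper)."""
    params = {}
    if since is not None:
        params["since"] = since            # RFC3339 timestamp
    if kinds:
        params["kind"] = ",".join(kinds)   # comma-separated kind filter
    if node_id is not None:
        params["node_id"] = node_id
    if limit is not None:
        params["limit"] = str(limit)
    # Keep ":" and "," readable instead of percent-encoding them.
    query = urlencode(params, safe=":,")
    return f"{base_url}/api/events" + (f"?{query}" if query else "")
```

Omitting `limit` leaves the server default (100) in effect.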
### SSE Integration
- Envelope events are broadcast via SSE using their `kind` as the event name.
  - E.g., `event: note` for note envelopes.
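
A rough sketch of how an envelope could be framed as an SSE message with its `kind` as the event name. The function `sse_frame` is hypothetical, and it assumes the `data:` field carries the whole envelope as compact JSON, which this changelog does not state explicitly.

```python
import json


def sse_frame(envelope: dict) -> str:
    """Render one envelope as a Server-Sent Events frame (illustrative only)."""
    # Compact JSON never contains raw newlines, so a single data: line is enough.
    data = json.dumps(envelope, separators=(",", ":"))
    # A blank line terminates the frame.
    return f"event: {envelope['kind']}\ndata: {data}\n\n"
```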
### Persistence
- Events logged to `events.jsonl` in `$VAULTMESH_LOG_DIR`.
- Replayed on startup to restore in-memory state.
- Memory-bounded to 500 most recent envelopes.
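
The replay-with-bound behaviour described above can be sketched with a fixed-size deque. This is an assumption about the approach, not the actual implementation; `replay` and `MAX_ENVELOPES` are illustrative names.

```python
import json
from collections import deque

MAX_ENVELOPES = 500  # bound stated in this changelog


def replay(lines) -> deque:
    """Rebuild the in-memory window from an iterable of JSONL lines."""
    recent = deque(maxlen=MAX_ENVELOPES)  # oldest entries fall off automatically
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines
            recent.append(json.loads(line))
    return recent
```

A file object can be passed directly, for example `with open("events.jsonl") as fh: state = replay(fh)`.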
### Usage
```bash
# Post a note
curl -X POST http://127.0.0.1:8088/api/events \
  -H "Content-Type: application/json" \
  -d '{"kind":"note","node_id":"uuid","author":"operator","payload":{"text":"Test","severity":"info"}}'
# Query events
curl "http://127.0.0.1:8088/api/events?since=2025-01-01T00:00:00Z&kind=note"
```
> V0.7.2 transforms CC from a control plane into a coordination plane.
---
## V0.7.1 Mission Console
### NASA-Style Dashboard
- Added `GET /console` endpoint with 3-panel Mission Console UI.
- Global Mission Bar with fleet KPIs: Total, Healthy, Attention, Critical, Last Scan.
- Left panel: Clickable node list with live status pills (OK/ATTN/CRIT).
- Center panel: Selected node telemetry with metrics cards and per-node event timeline.
- Right panel: Attention summary chips, global scan findings, and global event feed.
### Live SSE Integration
- Real-time heartbeat indicator glow when nodes report in.
- Live status pill updates via `attention` events.
- Dynamic KPI recalculation without page refresh.
- Per-node and global event timelines populated from SSE stream.
### Visual Design
- Dark NASA-inspired theme (#05070a background).
- Monospace typography (JetBrains Mono).
- CSS Grid 3-column layout with sticky headers.
- Animated pulse for critical nodes.
- Color-coded severity indicators (green/yellow/red).
### Usage
```bash
# Open Mission Console
open http://127.0.0.1:8088/console
# Original table dashboard still at /
open http://127.0.0.1:8088/
```
> V0.7.1 transforms the basic table into an operator-grade control room.
---
## V0.7 SSE Event Bus
### Real-Time Event Streaming
- Added `GET /events` endpoint for Server-Sent Events (SSE).
- Events are streamed in real-time as they occur (no page refresh needed).
- Broadcast channel (`tokio::sync::broadcast`) distributes events to all connected clients.
### Event Types
Four named SSE events are published:
- `heartbeat` - when a node sends a heartbeat.
- `scan` - when a sovereign-scan completes.
- `command` - when any command result is reported.
- `attention` - when a node's attention status changes.
### Wire Format
Each event is JSON-encoded with wire-efficient payloads:
```
event: heartbeat
data: {"ts":"2024-01-15T10:30:00Z","node_id":"...","hostname":"vault-01","cloudflare_ok":true,...}

event: attention
data: {"ts":"2024-01-15T10:30:00Z","node_id":"...","needs_attention":true,"reasons":["critical_findings"]}
```
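
For clients that do not use a browser `EventSource`, the wire format above can be consumed with a small parser. The sketch below is a simplified reading of the SSE format (it handles only `event:` and `data:` fields); `parse_sse` is a hypothetical helper, and ignoring comment lines is based on the general SSE convention, not on anything this changelog specifies about keepalives.

```python
import json


def parse_sse(lines):
    """Yield (event_name, payload_dict) for each complete frame in an SSE stream."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the frame collected so far.
            if data:
                yield event or "message", json.loads("\n".join(data))
            event, data = None, []
        elif line.startswith(":"):
            continue  # comment line; servers commonly use these as keepalives
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format allows one optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
```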
### Dashboard Integration
- Minimal JS probe added to dashboard and node detail pages.
- Events are logged to browser console (`[SSE][HB]`, `[SSE][SCAN]`, etc.) for debugging.
- Foundation for V0.7.1 Mission Console with live row updates.
### Keepalive
- SSE connection sends keepalive every 15 seconds to prevent timeouts.
- Clients that lag behind receive a warning and skip missed events.
### Testing
```bash
# Connect to SSE stream
curl -N http://127.0.0.1:8088/events
# Or open dashboard in browser, open DevTools Console
# Trigger heartbeat → see [SSE][HB] in console
```
> V0.7 provides the real-time infrastructure for live dashboards without page refresh.
---
## V0.6.1 Log Tools
### CLI Log Commands
- Added `logs view` subcommand to query event logs with filters:
  - `--kind heartbeats|scans|commands` - filter by event type
  - `--node NODE_ID` - filter by node UUID
  - `--since 1h|24h|7d|30m` - time-based filtering
  - `--min-severity info|low|medium|high|critical` - severity filter (scans only)
  - `-n|--limit N` - number of events to show (default: 20)
- Added `logs tail` subcommand for real-time log following
- Added `logs stats` subcommand for per-node and per-kind statistics
- Server mode unchanged: `vm-cc` or `vm-cc serve` starts HTTP server
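
To make the `--since` and `--min-severity` semantics concrete, here is a small Python sketch. It is not the CLI's code: it assumes that only the `m`, `h`, and `d` suffixes shown above are accepted and that severities are ordered as listed, from `info` (lowest) to `critical` (highest).

```python
import re
from datetime import timedelta

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}  # suffixes seen in the examples
SEVERITIES = ["info", "low", "medium", "high", "critical"]  # assumed ascending order


def parse_since(spec: str) -> timedelta:
    """Turn a duration such as '30m', '24h' or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", spec)
    if match is None:
        raise ValueError(f"unsupported duration: {spec!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def meets_min_severity(severity: str, minimum: str) -> bool:
    """True if `severity` is at least as severe as `minimum`."""
    return SEVERITIES.index(severity) >= SEVERITIES.index(minimum)
```

An event would then pass the filter when its timestamp is no older than `now - parse_since(spec)`.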
### Files Added
- `command-center/src/cli.rs` - CLI argument parsing with clap
- `command-center/src/logs.rs` - Log reading and query logic
### Usage Examples
```bash
vm-cc logs view # last 20 events
vm-cc logs view --kind scans # last 20 scans
vm-cc logs view --since 1h --kind heartbeats
vm-cc logs view --kind scans --min-severity high
vm-cc logs stats # event counts by node
vm-cc logs tail # follow all logs
```
---
## V0.6 Append-Only Persistence
### Event Logging
- All CC state changes are now persisted to append-only JSONL log files.
- Log files are stored in `$VAULTMESH_LOG_DIR` (default: `/var/lib/vaultmesh/cc-logs`).
- Three log files, one per event type:
  - `heartbeats.jsonl` - HeartbeatEvent for each agent heartbeat.
  - `scans.jsonl` - ScanEvent when sovereign-scan completes successfully.
  - `commands.jsonl` - CommandEvent when agent reports any command result.
### Replay on Startup
- CC replays all log files on startup to reconstruct in-memory state.
- Nodes, heartbeat history, scan results, and command history survive restarts.
- Log replay counts are printed at startup for visibility.
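
The append-then-replay cycle can be sketched as follows. This is a simplified illustration, not the CC's code: `append_event` and `replay_counts` are hypothetical helpers, and the event dictionaries stand in for the real HeartbeatEvent, ScanEvent, and CommandEvent types.

```python
import json
from pathlib import Path

LOG_FILES = ("heartbeats.jsonl", "scans.jsonl", "commands.jsonl")


def append_event(log_dir, filename, event: dict) -> None:
    """Append one event as a single JSON line; existing lines are never rewritten."""
    path = Path(log_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, separators=(",", ":")) + "\n")


def replay_counts(log_dir) -> dict:
    """Count the events in each log file, as a startup summary might report them."""
    counts = {}
    for name in LOG_FILES:
        path = Path(log_dir) / name
        if not path.exists():
            counts[name] = 0
            continue
        with path.open(encoding="utf-8") as fh:
            counts[name] = sum(1 for line in fh if line.strip())
    return counts
```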
### Ledger-Ready Format
- JSONL format is ledger-native; each line is a self-contained JSON object.
- Designed as foundation for V0.8 Ledger Bridge (Merkle tree over logs).
- No external database dependencies (SQLite not required).
### Configuration
| Variable | Default | Description |
|----------------------|------------------------------|----------------------------|
| `VAULTMESH_LOG_DIR` | `/var/lib/vaultmesh/cc-logs` | Directory for event logs |
> With V0.6, CC state is durable across restarts. The append-only design provides an audit trail of all fleet events.
---
## V0.5 Fleet Orchestrator
### Scan Orchestrator
- Background scheduler runs every `VAULTMESH_SCHEDULER_TICK_SECONDS` seconds (default: `300`).
- Automatically queues `sovereign-scan` commands for nodes whose last scan is older than `VAULTMESH_SCAN_INTERVAL_HOURS` hours (default: `24`).
- Commands are signed with the CC Ed25519 key and added to the per-node command queue for agent pickup.
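
The selection step of the scheduler can be sketched as below. The function `nodes_due_for_scan` is hypothetical, and treating never-scanned nodes as due is an assumption; signing and queueing are left out.

```python
from datetime import datetime, timedelta


def nodes_due_for_scan(last_scans: dict, now: datetime, interval_hours: int = 24) -> list:
    """Return node IDs whose last scan is older than the interval.

    `last_scans` maps node ID to the time of its last scan, or None if the
    node has never been scanned (assumed here to count as due).
    """
    cutoff = now - timedelta(hours=interval_hours)
    return [
        node_id
        for node_id, last_scan in last_scans.items()
        if last_scan is None or last_scan < cutoff
    ]
```

On each tick, the scheduler would run a check like this and queue one `sovereign-scan` command per returned node.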
### Staleness / Drift Detection
- New **Attention** column on the main dashboard for at-a-glance fleet health.
- Attention reasons:
  - `never_scanned` - node has no scan history.
  - `scan_stale` - last scan older than `VAULTMESH_SCAN_STALE_HOURS`.
  - `heartbeat_stale` - no heartbeat within `VAULTMESH_HEARTBEAT_STALE_MINUTES`.
  - `critical_findings` - last scan reported critical vulnerabilities.
  - `high_findings` - last scan reported high-severity vulnerabilities.
  - `cloudflare_down` - Cloudflare service flag is `false`.
  - `services_down` - services flag is `false`.
- Visual cues:
  - `.attn-ok` - green background, label `OK` for healthy nodes.
  - `.attn-bad` - red background, comma-separated reasons for nodes needing attention.
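
A minimal sketch of how the attention reasons above could be derived from node state. The `NodeState` fields and the function names are hypothetical; only the reason strings, the default thresholds, and the `OK` label come from this changelog.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class NodeState:  # hypothetical shape, for illustration only
    last_heartbeat: datetime
    last_scan: Optional[datetime] = None
    critical: int = 0
    high: int = 0
    cloudflare_ok: bool = True
    services_ok: bool = True


def attention_reasons(node, now, scan_stale_hours=48, heartbeat_stale_minutes=10):
    """Return the list of attention reasons for a node; empty means healthy."""
    reasons = []
    if node.last_scan is None:
        reasons.append("never_scanned")
    elif now - node.last_scan > timedelta(hours=scan_stale_hours):
        reasons.append("scan_stale")
    if now - node.last_heartbeat > timedelta(minutes=heartbeat_stale_minutes):
        reasons.append("heartbeat_stale")
    if node.critical > 0:
        reasons.append("critical_findings")
    if node.high > 0:
        reasons.append("high_findings")
    if not node.cloudflare_ok:
        reasons.append("cloudflare_down")
    if not node.services_ok:
        reasons.append("services_down")
    return reasons


def attention_label(reasons):
    """Text shown in the Attention column: OK, or the comma-separated reasons."""
    return "OK" if not reasons else ", ".join(reasons)
```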
### Command Policy Enforcement
- Introduced `CommandPolicy` with:
  - `global_allowed` commands.
  - Optional `per_profile` allowlists keyed by OS profile.
- Default allowed commands:
  - `service-status`
  - `tail-journal`
  - `sovereign-scan`
  - `restart-service`
- Web UI returns HTTP `403 Forbidden` for disallowed commands.
- Scheduler respects policy and will not auto-queue scans if they are disallowed for the node's profile.
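
The policy check can be pictured with the sketch below. It mirrors the `global_allowed` and `per_profile` fields named above but is not the real `CommandPolicy`. In particular, it assumes that a per-profile allowlist replaces the global list for that profile; the changelog does not say whether the two are instead combined.

```python
from dataclasses import dataclass, field

DEFAULT_ALLOWED = frozenset(
    {"service-status", "tail-journal", "sovereign-scan", "restart-service"}
)


@dataclass
class CommandPolicy:  # illustrative model, not the actual type
    global_allowed: frozenset = DEFAULT_ALLOWED
    per_profile: dict = field(default_factory=dict)  # OS profile -> allowlist

    def is_allowed(self, command: str, profile=None) -> bool:
        # Assumption: a profile-specific allowlist, when present, overrides the global one.
        allowed = self.per_profile.get(profile, self.global_allowed)
        return command in allowed
```

A web handler would answer `403 Forbidden` when `is_allowed` returns false, and the scheduler would skip auto-queueing `sovereign-scan` for that node. The profile name used in any call is up to the deployment; none is defined here.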
### Configuration (Environment Variables)
| Variable | Default | Description |
|------------------------------------|---------|-------------------------------------|
| `VAULTMESH_SCAN_INTERVAL_HOURS` | `24` | How often each node should be scanned |
| `VAULTMESH_SCAN_STALE_HOURS` | `48` | Scan staleness threshold |
| `VAULTMESH_HEARTBEAT_STALE_MINUTES`| `10` | Heartbeat staleness threshold |
| `VAULTMESH_SCHEDULER_TICK_SECONDS` | `300` | Scheduler loop interval |
> With V0.5, the fleet tends itself; operators only need to investigate nodes marked red in the Attention column.
---
## V0.4.1 Scan Status UI
- Dashboard columns: Last Scan, Crit/High, Source (REAL/MOCK badges).
- Parse `sovereign-scan` stdout to track `LastScan` in state.
- Visual indicators for critical (red) and high (orange) findings.
## V0.4 Sovereign Scan Integration
- Node agent executes `sovereign-scan` command.
- Scan results reported via command-result API.
- ProofChain receipts written to `/var/lib/vaultmesh/proofchain/`.
## V0.3 Signed Commands
- Ed25519 command signing (CC signs, agent verifies).
- Command queue per node with nonce replay protection.
- Commands: `service-status`, `restart-service`, `tail-journal`.
## V0.2 Node Metrics
- Added system metrics to heartbeat (load, memory, disk).
- Node detail page with metrics cards.
- Heartbeat history tracking.
## V0.1 Initial Release
- Command Center with HTML dashboard.
- Node Agent with heartbeat loop.
- Basic health monitoring (`cloudflare_ok`, `services_ok`).