vm-control/CHANGELOG.md
2025-12-18 00:14:03 +00:00

Changelog

V0.7.3 Envelope Canonicalization

  • Added format and schema to EventEnvelope for stable codec/semantics pinning.
  • Renamed body to payload in the comms event API (the server still accepts body as an alias).
  • Canonicalization enforced before persistence/broadcast:
    • Timestamps truncated to seconds precision and written as RFC3339 UTC with a Z suffix
    • payload object keys recursively sorted (arrays preserve order)
  • events.jsonl now stores one canonical EventEnvelope per line.
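
The canonicalization rules above can be sketched roughly as follows. This is an illustrative Python sketch, not the server's implementation: the helper names canonical_ts, sort_keys, and canonicalize are hypothetical, and the compact JSON separators are an assumption.

```python
import json
from datetime import datetime, timezone


def canonical_ts(ts: str) -> str:
    """Normalize a timezone-aware RFC3339 timestamp to UTC, Z suffix, seconds precision."""
    # Older Python versions reject a trailing "Z" in fromisoformat().
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    utc = parsed.astimezone(timezone.utc).replace(microsecond=0)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


def sort_keys(value):
    """Recursively sort object keys; arrays keep their original order."""
    if isinstance(value, dict):
        return {key: sort_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [sort_keys(item) for item in value]
    return value


def canonicalize(envelope: dict) -> str:
    """Return one canonical JSON line for an envelope (hypothetical helper)."""
    out = dict(envelope)
    # The changelog says "body" is still accepted as an alias for "payload".
    if "payload" not in out and "body" in out:
        out["payload"] = out.pop("body")
    out["ts"] = canonical_ts(out["ts"])
    out["payload"] = sort_keys(out.get("payload", {}))
    return json.dumps(out, separators=(",", ":"))
```

Sorting keys before serialization means two envelopes with the same content produce byte-identical lines, which is what makes a line in events.jsonl stable enough to hash or diff.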

V0.7.2 Communication Layer

Unified Event API

  • Added EventEnvelope as the canonical message format for all comms events.
  • New event kinds: note, incident, ack, tag, resolve.
  • Events support optional node_id for node-specific or global scope.
  • Author field tracks origin: "operator", "system", "vm-copilot", etc.

New API Endpoints

  • POST /api/events - Create a new event envelope.
    • Server assigns id and ts.
    • Author can be overridden via X-VM-Author header.
  • GET /api/events - Query events with filtering:
    • ?since=RFC3339 - Filter by timestamp.
    • ?kind=note,incident - Comma-separated kind filter.
    • ?node_id=uuid - Filter by node.
    • ?limit=N - Max results (default: 100).
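
As a rough illustration of how these query parameters might combine, the sketch below filters an in-memory list of envelopes. It is not the server's code: query_events is a hypothetical name, and it assumes that since is inclusive, that events are stored oldest first, that limit keeps the newest matches, and that a node_id filter excludes global events.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse an RFC3339 timestamp with a Z or numeric offset."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def query_events(events, since=None, kind=None, node_id=None, limit=100):
    """Apply the since / kind / node_id / limit filters to a list of envelopes."""
    kinds = set(kind.split(",")) if kind else None  # comma-separated kind filter
    cutoff = parse_ts(since) if since else None
    matched = []
    for event in events:
        if cutoff is not None and parse_ts(event["ts"]) < cutoff:
            continue
        if kinds is not None and event["kind"] not in kinds:
            continue
        if node_id is not None and event.get("node_id") != node_id:
            continue
        matched.append(event)
    # Assumption: keep the most recent `limit` matches.
    return matched[-limit:] if limit > 0 else []
```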

SSE Integration

  • Envelope events broadcast via SSE using their kind as the event name.
  • E.g., event: note for note envelopes.
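
A minimal sketch of that mapping, assuming the whole envelope is serialized onto a single data line (sse_frame is a hypothetical helper, not part of the codebase):

```python
import json


def sse_frame(envelope: dict) -> str:
    """Format an envelope as one SSE message named after its kind."""
    data = json.dumps(envelope, separators=(",", ":"))
    # An SSE message ends with a blank line.
    return f"event: {envelope['kind']}\ndata: {data}\n\n"
```

Because the event name is the kind, a browser client can subscribe per kind with EventSource.addEventListener("note", ...) instead of parsing every message.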

Persistence

  • Events logged to events.jsonl in $VAULTMESH_LOG_DIR.
  • Replayed on startup to restore in-memory state.
  • Memory-bounded to 500 most recent envelopes.
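
The replay-with-a-bound behavior can be approximated with a fixed-size deque. This is only a sketch: replay is a hypothetical name, and the assumption is that the oldest envelopes are the ones evicted.

```python
import json
from collections import deque

MAX_ENVELOPES = 500  # bound stated in this changelog


def replay(lines, max_envelopes=MAX_ENVELOPES):
    """Load JSONL lines, keeping only the most recent envelopes in memory."""
    recent = deque(maxlen=max_envelopes)  # oldest entries drop off automatically
    for line in lines:
        line = line.strip()
        if line:
            recent.append(json.loads(line))
    return recent
```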

Usage

# Post a note
curl -X POST http://127.0.0.1:8088/api/events \
  -H "Content-Type: application/json" \
  -d '{"kind":"note","node_id":"uuid","author":"operator","payload":{"text":"Test","severity":"info"}}'

# Query events
curl "http://127.0.0.1:8088/api/events?since=2025-01-01T00:00:00Z&kind=note"

V0.7.2 transforms CC from a control plane into a coordination plane.


V0.7.1 Mission Console

NASA-Style Dashboard

  • Added GET /console endpoint with 3-panel Mission Console UI.
  • Global Mission Bar with fleet KPIs: Total, Healthy, Attention, Critical, Last Scan.
  • Left panel: Clickable node list with live status pills (OK/ATTN/CRIT).
  • Center panel: Selected node telemetry with metrics cards and per-node event timeline.
  • Right panel: Attention summary chips, global scan findings, and global event feed.

Live SSE Integration

  • Real-time heartbeat indicator glow when nodes report in.
  • Live status pill updates via attention events.
  • Dynamic KPI recalculation without page refresh.
  • Per-node and global event timelines populated from SSE stream.

Visual Design

  • Dark NASA-inspired theme (#05070a background).
  • Monospace typography (JetBrains Mono).
  • CSS Grid 3-column layout with sticky headers.
  • Animated pulse for critical nodes.
  • Color-coded severity indicators (green/yellow/red).

Usage

# Open Mission Console
open http://127.0.0.1:8088/console

# Original table dashboard still at /
open http://127.0.0.1:8088/

V0.7.1 transforms the basic table into an operator-grade control room.


V0.7 SSE Event Bus

Real-Time Event Streaming

  • Added GET /events endpoint for Server-Sent Events (SSE).
  • Events are streamed in real time as they occur (no page refresh needed).
  • Broadcast channel (tokio::sync::broadcast) distributes events to all connected clients.

Event Types

Four named SSE events are published:

  • heartbeat - when a node sends a heartbeat.
  • scan - when a sovereign-scan completes.
  • command - when any command result is reported.
  • attention - when a node's attention status changes.

Wire Format

Each event is sent as a named SSE message whose data line carries a compact JSON payload:

event: heartbeat
data: {"ts":"2024-01-15T10:30:00Z","node_id":"...","hostname":"vault-01","cloudflare_ok":true,...}

event: attention
data: {"ts":"2024-01-15T10:30:00Z","node_id":"...","needs_attention":true,"reasons":["critical_findings"]}
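
For clients other than a browser, the wire format above can be consumed with a small line parser. The sketch below is an assumption-laden illustration: parse_sse is a hypothetical helper, and it treats lines starting with ":" as comments, which is how SSE keepalives are commonly sent (the changelog does not say what form the keepalive takes).

```python
import json


def parse_sse(lines):
    """Yield (event_name, payload) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current message.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line (assumed keepalive form)
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
```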

Dashboard Integration

  • Minimal JS probe added to dashboard and node detail pages.
  • Events are logged to browser console ([SSE][HB], [SSE][SCAN], etc.) for debugging.
  • Foundation for V0.7.1 Mission Console with live row updates.

Keepalive

  • SSE connection sends keepalive every 15 seconds to prevent timeouts.
  • Clients that lag behind receive a warning and skip missed events.

Testing

# Connect to SSE stream
curl -N http://127.0.0.1:8088/events

# Or open dashboard in browser, open DevTools Console
# Trigger heartbeat → see [SSE][HB] in console

V0.7 provides the real-time infrastructure for live dashboards without page refresh.


V0.6.1 Log Tools

CLI Log Commands

  • Added logs view subcommand to query event logs with filters:
    • --kind heartbeats|scans|commands - filter by event type
    • --node NODE_ID - filter by node UUID
    • --since 1h|24h|7d|30m - time-based filtering
    • --min-severity info|low|medium|high|critical - severity filter (scans only)
    • -n|--limit N - number of events to show (default: 20)
  • Added logs tail subcommand for real-time log following
  • Added logs stats subcommand for per-node and per-kind statistics
  • Server mode unchanged: vm-cc or vm-cc serve starts HTTP server
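
To make the --since and --min-severity semantics concrete, here is a small sketch of how such values could be interpreted. It does not reproduce the actual CLI (which is built with clap): parse_since and meets_min_severity are hypothetical names, and the severity order is assumed to be ascending as listed.

```python
import re
from datetime import timedelta

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}
SEVERITIES = ["info", "low", "medium", "high", "critical"]  # assumed ascending


def parse_since(value: str) -> timedelta:
    """Turn a value such as 30m, 1h, or 7d into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def meets_min_severity(severity: str, minimum: str) -> bool:
    """True when severity is at least as severe as the minimum."""
    return SEVERITIES.index(severity) >= SEVERITIES.index(minimum)
```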

Files Added

  • command-center/src/cli.rs - CLI argument parsing with clap
  • command-center/src/logs.rs - Log reading and query logic

Usage Examples

vm-cc logs view                          # last 20 events
vm-cc logs view --kind scans             # last 20 scans
vm-cc logs view --since 1h --kind heartbeats
vm-cc logs view --kind scans --min-severity high
vm-cc logs stats                         # event counts by node
vm-cc logs tail                          # follow all logs

V0.6 Append-Only Persistence

Event Logging

  • All CC state changes are now persisted to append-only JSONL log files.
  • Log files are stored in $VAULTMESH_LOG_DIR (default: /var/lib/vaultmesh/cc-logs).
  • Three log files, one per event type:
    • heartbeats.jsonl - HeartbeatEvent for each agent heartbeat.
    • scans.jsonl - ScanEvent when sovereign-scan completes successfully.
    • commands.jsonl - CommandEvent when agent reports any command result.

Replay on Startup

  • CC replays all log files on startup to reconstruct in-memory state.
  • Nodes, heartbeat history, scan results, and command history survive restarts.
  • Log replay counts are printed at startup for visibility.
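
The append-then-replay cycle can be sketched as below. This is an illustration only: append_event and replay_all are hypothetical names, the state is a plain dict of lists rather than the real in-memory structures, and the exact startup message is invented.

```python
import json
import os

LOG_FILES = ("heartbeats.jsonl", "scans.jsonl", "commands.jsonl")


def append_event(log_dir: str, filename: str, event: dict) -> None:
    """Append one event as a single JSON line; existing lines are never rewritten."""
    with open(os.path.join(log_dir, filename), "a", encoding="utf-8") as handle:
        handle.write(json.dumps(event) + "\n")


def replay_all(log_dir: str) -> dict:
    """Read every log file back in order and report how many events were replayed."""
    state = {}
    for name in LOG_FILES:
        path = os.path.join(log_dir, name)
        events = []
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                events = [json.loads(line) for line in handle if line.strip()]
        state[name] = events
        print(f"replayed {len(events)} events from {name}")
    return state
```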

Ledger-Ready Format

  • JSONL format is ledger-native; each line is a self-contained JSON object.
  • Designed as foundation for V0.8 Ledger Bridge (Merkle tree over logs).
  • No external database dependencies (SQLite not required).

Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| VAULTMESH_LOG_DIR | /var/lib/vaultmesh/cc-logs | Directory for event logs |

With V0.6, CC state is durable across restarts. The append-only design provides an audit trail of all fleet events.


V0.5 Fleet Orchestrator

Scan Orchestrator

  • Background scheduler runs every VAULTMESH_SCHEDULER_TICK_SECONDS (default: 300s).
  • Automatically queues sovereign-scan commands for nodes whose last scan is older than VAULTMESH_SCAN_INTERVAL_HOURS (default: 24h).
  • Commands are signed with the CC Ed25519 key and added to the per-node command queue for agent pickup.
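
The due-for-scan decision made on each scheduler tick might look roughly like this. It is a sketch under assumptions: nodes_due_for_scan is a hypothetical name, nodes that have never been scanned are assumed to be queued as well, and command signing is left out.

```python
from datetime import datetime, timedelta, timezone

SCAN_INTERVAL = timedelta(hours=24)  # VAULTMESH_SCAN_INTERVAL_HOURS default


def nodes_due_for_scan(last_scans: dict, now: datetime, interval=SCAN_INTERVAL) -> list:
    """Return node IDs whose last scan is missing or older than the interval."""
    return [
        node_id
        for node_id, last_scan in last_scans.items()
        if last_scan is None or now - last_scan > interval
    ]
```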

Staleness / Drift Detection

  • New Attention column on the main dashboard for at-a-glance fleet health.
  • Attention reasons:
    • never_scanned - node has no scan history.
    • scan_stale - last scan older than VAULTMESH_SCAN_STALE_HOURS.
    • heartbeat_stale - no heartbeat within VAULTMESH_HEARTBEAT_STALE_MINUTES.
    • critical_findings - last scan reported critical vulnerabilities.
    • high_findings - last scan reported high-severity vulnerabilities.
    • cloudflare_down - Cloudflare service flag is false.
    • services_down - services flag is false.
  • Visual cues:
    • .attn-ok - green background, label OK for healthy nodes.
    • .attn-bad - red background, comma-separated reasons for nodes needing attention.
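
One way to read the attention reasons above is as a pure function of node state. The sketch below assumes a hypothetical NodeState shape and uses the documented default thresholds; the real field names and evaluation order may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class NodeState:  # hypothetical shape, for illustration only
    last_heartbeat: Optional[datetime]
    last_scan: Optional[datetime]
    critical: int = 0
    high: int = 0
    cloudflare_ok: bool = True
    services_ok: bool = True


def attention_reasons(node, now, scan_stale=timedelta(hours=48),
                      heartbeat_stale=timedelta(minutes=10)):
    """Return the list of attention reasons; an empty list means the node is OK."""
    reasons = []
    if node.last_scan is None:
        reasons.append("never_scanned")
    elif now - node.last_scan > scan_stale:
        reasons.append("scan_stale")
    if node.last_heartbeat is None or now - node.last_heartbeat > heartbeat_stale:
        reasons.append("heartbeat_stale")
    if node.critical > 0:
        reasons.append("critical_findings")
    if node.high > 0:
        reasons.append("high_findings")
    if not node.cloudflare_ok:
        reasons.append("cloudflare_down")
    if not node.services_ok:
        reasons.append("services_down")
    return reasons
```

An empty result would correspond to the .attn-ok cue, and a non-empty one to .attn-bad with the reasons joined by commas.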

Command Policy Enforcement

  • Introduced CommandPolicy with:
    • global_allowed commands.
    • Optional per_profile allowlists keyed by OS profile.
  • Default allowed commands:
    • service-status
    • tail-journal
    • sovereign-scan
    • restart-service
  • Web UI returns HTTP 403 Forbidden for disallowed commands.
  • Scheduler respects policy and will not auto-queue scans if they are disallowed for the node's profile.
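
A minimal sketch of such a policy check follows. The field names global_allowed and per_profile come from this changelog; everything else is assumed, in particular that a per-profile allowlist replaces the global list rather than intersecting with it, and the profile name "minimal" is made up for the example.

```python
from dataclasses import dataclass, field

DEFAULT_ALLOWED = frozenset(
    {"service-status", "tail-journal", "sovereign-scan", "restart-service"}
)


@dataclass
class CommandPolicy:
    global_allowed: frozenset = DEFAULT_ALLOWED
    per_profile: dict = field(default_factory=dict)  # OS profile -> allowlist

    def is_allowed(self, command: str, profile: str = None) -> bool:
        """Check a command against the profile allowlist, else the global one."""
        allowed = self.per_profile.get(profile, self.global_allowed)
        return command in allowed


# Hypothetical profile that only permits status checks.
policy = CommandPolicy(per_profile={"minimal": frozenset({"service-status"})})
```

With this shape, the web UI would answer 403 when is_allowed returns False, and the scheduler would skip queuing sovereign-scan for nodes whose profile disallows it.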

Configuration (Environment Variables)

| Variable | Default | Description |
|----------|---------|-------------|
| VAULTMESH_SCAN_INTERVAL_HOURS | 24 | How often each node should be scanned |
| VAULTMESH_SCAN_STALE_HOURS | 48 | Scan staleness threshold |
| VAULTMESH_HEARTBEAT_STALE_MINUTES | 10 | Heartbeat staleness threshold |
| VAULTMESH_SCHEDULER_TICK_SECONDS | 300 | Scheduler loop interval |
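
Reading these variables with their defaults can be sketched as follows. The variable names and defaults are the ones listed above; load_config itself is a hypothetical helper, and integer parsing is an assumption.

```python
import os

DEFAULTS = {
    "VAULTMESH_SCAN_INTERVAL_HOURS": 24,
    "VAULTMESH_SCAN_STALE_HOURS": 48,
    "VAULTMESH_HEARTBEAT_STALE_MINUTES": 10,
    "VAULTMESH_SCHEDULER_TICK_SECONDS": 300,
}


def load_config(env=os.environ) -> dict:
    """Return each setting as an int, falling back to the documented default."""
    return {name: int(env.get(name, default)) for name, default in DEFAULTS.items()}
```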

With V0.5, the fleet tends itself; operators only need to investigate nodes marked red in the Attention column.


V0.4.1 Scan Status UI

  • Dashboard columns: Last Scan, Crit/High, Source (REAL/MOCK badges).
  • Parse sovereign-scan stdout to track LastScan in state.
  • Visual indicators for critical (red) and high (orange) findings.

V0.4 Sovereign Scan Integration

  • Node agent executes sovereign-scan command.
  • Scan results reported via command-result API.
  • ProofChain receipts written to /var/lib/vaultmesh/proofchain/.

V0.3 Signed Commands

  • Ed25519 command signing (CC signs, agent verifies).
  • Command queue per node with nonce replay protection.
  • Commands: service-status, restart-service, tail-journal.
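
The replay-protection half of this design can be illustrated with a simple seen-nonce set. This sketch deliberately omits Ed25519 verification, which needs a cryptography library; NonceGuard is a hypothetical name, and a real agent would presumably check the nonce only after the signature verifies and would bound or persist the set.

```python
class NonceGuard:
    """Reject any command whose nonce has already been accepted."""

    def __init__(self):
        self._seen = set()

    def accept(self, nonce: str) -> bool:
        """Return True the first time a nonce is seen, False on any replay."""
        if nonce in self._seen:
            return False
        self._seen.add(nonce)
        return True
```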

V0.2 Node Metrics

  • Added system metrics to heartbeat (load, memory, disk).
  • Node detail page with metrics cards.
  • Heartbeat history tracking.

V0.1 Initial Release

  • Command Center with HTML dashboard.
  • Node Agent with heartbeat loop.
  • Basic health monitoring (cloudflare_ok, services_ok).