chore: initial import

Sovereign committed 2025-12-18 00:29:15 +01:00
commit 789397eb33
22 changed files with 5944 additions and 0 deletions

.gitignore

@@ -0,0 +1,5 @@
/target
Cargo.lock
*.log
.env
/cc-ed25519.key

CHANGELOG.md

@@ -0,0 +1,260 @@
# Changelog
## V0.7.2 Communication Layer
### Unified Event API
- Added `EventEnvelope` as the canonical message format for all comms events.
- New event kinds: `note`, `incident`, `ack`, `tag`, `resolve`.
- Events support optional `node_id` for node-specific or global scope.
- Author field tracks origin: "operator", "system", "vm-copilot", etc.
### New API Endpoints
- `POST /api/events` - Create a new event envelope.
- Server assigns `id` and `ts`.
- Author can be overridden via `X-VM-Author` header.
- `GET /api/events` - Query events with filtering:
- `?since=RFC3339` - Filter by timestamp.
- `?kind=note,incident` - Comma-separated kind filter.
- `?node_id=uuid` - Filter by node.
- `?limit=N` - Max results (default: 100).
### SSE Integration
- Envelope events broadcast via SSE using their `kind` as the event name.
- E.g., `event: note` for note envelopes, as shown below.
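A note envelope, for instance, arrives on the stream as a frame like this (field values illustrative):
```
event: note
data: {"id":"<uuid>","kind":"note","ts":"2025-01-01T12:00:00Z","node_id":"<uuid>","author":"operator","body":{"text":"Test","severity":"info"}}
```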
### Persistence
- Events logged to `events.jsonl` in `$VAULTMESH_LOG_DIR`.
- Replayed on startup to restore in-memory state.
- Memory-bounded to 500 most recent envelopes.
### Usage
```bash
# Post a note
curl -X POST http://127.0.0.1:8088/api/events \
-H "Content-Type: application/json" \
-d '{"kind":"note","node_id":"uuid","author":"operator","body":{"text":"Test","severity":"info"}}'
# Query events
curl "http://127.0.0.1:8088/api/events?since=2025-01-01T00:00:00Z&kind=note"
```
> V0.7.2 transforms CC from a control plane into a coordination plane.
---
## V0.7.1 Mission Console
### NASA-Style Dashboard
- Added `GET /console` endpoint with 3-panel Mission Console UI.
- Global Mission Bar with fleet KPIs: Total, Healthy, Attention, Critical, Last Scan.
- Left panel: Clickable node list with live status pills (OK/ATTN/CRIT).
- Center panel: Selected node telemetry with metrics cards and per-node event timeline.
- Right panel: Attention summary chips, global scan findings, and global event feed.
### Live SSE Integration
- Real-time heartbeat indicator glow when nodes report in.
- Live status pill updates via `attention` events.
- Dynamic KPI recalculation without page refresh.
- Per-node and global event timelines populated from SSE stream.
### Visual Design
- Dark NASA-inspired theme (#05070a background).
- Monospace typography (JetBrains Mono).
- CSS Grid 3-column layout with sticky headers.
- Animated pulse for critical nodes.
- Color-coded severity indicators (green/yellow/red).
### Usage
```bash
# Open Mission Console
open http://127.0.0.1:8088/console
# Original table dashboard still at /
open http://127.0.0.1:8088/
```
> V0.7.1 transforms the basic table into an operator-grade control room.
---
## V0.7 SSE Event Bus
### Real-Time Event Streaming
- Added `GET /events` endpoint for Server-Sent Events (SSE).
- Events are streamed in real-time as they occur (no page refresh needed).
- Broadcast channel (`tokio::sync::broadcast`) distributes events to all connected clients.
### Event Types
Four named SSE events are published:
- `heartbeat` when a node sends a heartbeat.
- `scan` when a sovereign-scan completes.
- `command` when any command result is reported.
- `attention` when a node's attention status changes.
### Wire Format
Each event is JSON-encoded with wire-efficient payloads:
```
event: heartbeat
data: {"ts":"2024-01-15T10:30:00Z","node_id":"...","hostname":"vault-01","cloudflare_ok":true,...}
event: attention
data: {"ts":"2024-01-15T10:30:00Z","node_id":"...","needs_attention":true,"reasons":["critical_findings"]}
```
### Dashboard Integration
- Minimal JS probe added to dashboard and node detail pages.
- Events are logged to browser console (`[SSE][HB]`, `[SSE][SCAN]`, etc.) for debugging.
- Foundation for V0.7.1 Mission Console with live row updates.
### Keepalive
- SSE connection sends keepalive every 15 seconds to prevent timeouts.
- Clients that lag behind receive a warning and skip missed events; see the sketch below.
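A minimal sketch of how a subscriber loop handles lag, assuming tokio's `broadcast` receiver and the `ServerEvent` type from `state.rs` (the actual SSE handler lives in `routes.rs`, suppressed below):
```rust
use tokio::sync::broadcast;

use crate::state::ServerEvent;

// Illustrative subscriber loop; not the actual routes.rs handler.
async fn forward_events(mut rx: broadcast::Receiver<ServerEvent>) {
    loop {
        match rx.recv().await {
            Ok(event) => {
                // Serialize `event` into an SSE frame and write it out.
                let _ = event;
            }
            // Receiver fell behind: the channel drops the oldest events
            // and reports how many were skipped.
            Err(broadcast::error::RecvError::Lagged(skipped)) => {
                tracing::warn!("SSE client lagged, skipped {skipped} events");
            }
            Err(broadcast::error::RecvError::Closed) => break,
        }
    }
}
```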
### Testing
```bash
# Connect to SSE stream
curl -N http://127.0.0.1:8088/events
# Or open dashboard in browser, open DevTools Console
# Trigger heartbeat → see [SSE][HB] in console
```
> V0.7 provides the real-time infrastructure for live dashboards without page refresh.
---
## V0.6.1 Log Tools
### CLI Log Commands
- Added `logs view` subcommand to query event logs with filters:
- `--kind heartbeats|scans|commands`: filter by event type
- `--node NODE_ID`: filter by node UUID
- `--since 1h|24h|7d|30m`: time-based filtering
- `--min-severity info|low|medium|high|critical`: severity filter (scans only)
- `-n|--limit N`: number of events to show (default: 20)
- Added `logs tail` subcommand for real-time log following
- Added `logs stats` subcommand for per-node and per-kind statistics
- Server mode unchanged: `vm-cc` or `vm-cc serve` starts the HTTP server
### Files Added
- `command-center/src/cli.rs`: CLI argument parsing with clap
- `command-center/src/logs.rs`: Log reading and query logic
### Usage Examples
```bash
vm-cc logs view # last 20 events
vm-cc logs view --kind scans # last 20 scans
vm-cc logs view --since 1h --kind heartbeats
vm-cc logs view --kind scans --min-severity high
vm-cc logs stats # event counts by node
vm-cc logs tail # follow all logs
```
---
## V0.6 Append-Only Persistence
### Event Logging
- All CC state changes are now persisted to append-only JSONL log files.
- Log files are stored in `$VAULTMESH_LOG_DIR` (default: `/var/lib/vaultmesh/cc-logs`).
- Three event types:
- `heartbeats.jsonl`: `HeartbeatEvent` for each agent heartbeat.
- `scans.jsonl`: `ScanEvent` when sovereign-scan completes successfully.
- `commands.jsonl`: `CommandEvent` when the agent reports any command result.
### Replay on Startup
- CC replays all log files on startup to reconstruct in-memory state.
- Nodes, heartbeat history, scan results, and command history survive restarts.
- Log replay counts are printed at startup for visibility.
### Ledger-Ready Format
- JSONL format is ledger-native; each line is a self-contained JSON object (see the example after this list).
- Designed as foundation for V0.8 Ledger Bridge (Merkle tree over logs).
- No external database dependencies (SQLite not required).
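For illustration, a single `commands.jsonl` line might look like this (values invented; field names follow `CommandEvent` in `state.rs`):
```json
{"ts":"2025-01-15T10:30:00Z","node_id":"550e8400-e29b-41d4-a716-446655440000","cmd":"sovereign-scan","status":"ok","exit_code":0,"nonce":"7c9e6679-7425-40de-944b-e07fc1f90ae7"}
```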
### Configuration
| Variable | Default | Description |
|----------------------|------------------------------|----------------------------|
| `VAULTMESH_LOG_DIR` | `/var/lib/vaultmesh/cc-logs` | Directory for event logs |
> With V0.6, CC state is durable across restarts. The append-only design provides an audit trail of all fleet events.
---
## V0.5 Fleet Orchestrator
### Scan Orchestrator
- Background scheduler runs every `VAULTMESH_SCHEDULER_TICK_SECONDS` (default: `300s`).
- Automatically queues `sovereign-scan` commands for nodes whose last scan is older than
`VAULTMESH_SCAN_INTERVAL_HOURS` (default: `24h`).
- Commands are signed with the CC Ed25519 key and added to the per-node command queue for agent pickup.
### Staleness / Drift Detection
- New **Attention** column on the main dashboard for at-a-glance fleet health.
- Attention reasons:
- `never_scanned`: node has no scan history.
- `scan_stale`: last scan older than `VAULTMESH_SCAN_STALE_HOURS`.
- `heartbeat_stale`: no heartbeat within `VAULTMESH_HEARTBEAT_STALE_MINUTES`.
- `critical_findings`: last scan reported critical vulnerabilities.
- `high_findings`: last scan reported high-severity vulnerabilities.
- `cloudflare_down`: Cloudflare service flag is `false`.
- `services_down`: services flag is `false`.
- Visual cues:
- `.attn-ok`: green background, label `OK` for healthy nodes.
- `.attn-bad`: red background, comma-separated reasons for nodes needing attention.
### Command Policy Enforcement
- Introduced `CommandPolicy` with:
- `global_allowed` commands.
- Optional `per_profile` allowlists keyed by OS profile.
- Default allowed commands:
- `service-status`
- `tail-journal`
- `sovereign-scan`
- `restart-service`
- Web UI returns HTTP `403 Forbidden` for disallowed commands.
- Scheduler respects policy and will not auto-queue scans if they are disallowed for the node's profile (policy semantics sketched below).
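A minimal sketch of these semantics, using `CommandPolicy` from `command-center/src/state.rs` (the import path is illustrative):
```rust
use std::collections::HashSet;
// Illustrative path; adjust to however the crate exposes the module.
use vaultmesh_command_center::state::CommandPolicy;

fn main() {
    let mut policy = CommandPolicy::default_policy();

    // A per-profile entry *replaces* the global set for that profile.
    let mut arch = HashSet::new();
    arch.insert("sovereign-scan".to_string());
    policy.per_profile.insert("ArchVault".to_string(), arch);

    assert!(policy.is_allowed("ArchVault", "sovereign-scan"));
    assert!(!policy.is_allowed("ArchVault", "restart-service")); // not in the profile set
    assert!(policy.is_allowed("DebianVault", "restart-service")); // global fallback
}
```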
### Configuration (Environment Variables)
| Variable | Default | Description |
|------------------------------------|---------|-------------------------------------|
| `VAULTMESH_SCAN_INTERVAL_HOURS` | `24` | How often each node should be scanned |
| `VAULTMESH_SCAN_STALE_HOURS` | `48` | Scan staleness threshold |
| `VAULTMESH_HEARTBEAT_STALE_MINUTES`| `10` | Heartbeat staleness threshold |
| `VAULTMESH_SCHEDULER_TICK_SECONDS` | `300` | Scheduler loop interval |
> With V0.5, the fleet tends itself; operators only need to investigate nodes marked red in the Attention column.
---
## V0.4.1 Scan Status UI
- Dashboard columns: Last Scan, Crit/High, Source (REAL/MOCK badges).
- Parse `sovereign-scan` stdout to track `LastScan` in state.
- Visual indicators for critical (red) and high (orange) findings.
## V0.4 Sovereign Scan Integration
- Node agent executes `sovereign-scan` command.
- Scan results reported via command-result API.
- ProofChain receipts written to `/var/lib/vaultmesh/proofchain/`.
## V0.3 Signed Commands
- Ed25519 command signing (CC signs, agent verifies; verification sketched below).
- Command queue per node with nonce replay protection.
- Commands: `service-status`, `restart-service`, `tail-journal`.
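A sketch of the agent-side check (the agent's real implementation lives in `node-agent/` and is not part of this listing; assumes `ed25519-dalek` v2 and `base64` 0.22):
```rust
use base64::Engine;
use ed25519_dalek::{Signature, Verifier, VerifyingKey};

/// Verify a signed command payload against the CC's published public key.
fn verify_command(pk_b64: &str, payload_json: &[u8], sig_b64: &str) -> anyhow::Result<()> {
    let b64 = base64::engine::general_purpose::STANDARD;
    let pk: [u8; 32] = b64
        .decode(pk_b64)?
        .try_into()
        .map_err(|_| anyhow::anyhow!("public key must be 32 bytes"))?;
    let sig: [u8; 64] = b64
        .decode(sig_b64)?
        .try_into()
        .map_err(|_| anyhow::anyhow!("signature must be 64 bytes"))?;
    VerifyingKey::from_bytes(&pk)?
        .verify(payload_json, &Signature::from_bytes(&sig))?;
    // A real agent also tracks nonces to reject replayed commands.
    Ok(())
}
```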
## V0.2 Node Metrics
- Added system metrics to heartbeat (load, memory, disk).
- Node detail page with metrics cards.
- Heartbeat history tracking.
## V0.1 Initial Release
- Command Center with HTML dashboard.
- Node Agent with heartbeat loop.
- Basic health monitoring (`cloudflare_ok`, `services_ok`).

Cargo.toml

@@ -0,0 +1,7 @@
[workspace]
members = [
"command-center",
"node-agent",
]
resolver = "2"

README.md

@@ -0,0 +1,155 @@
# VaultMesh Command Center
Minimal, no-bloat control plane for VaultMesh nodes.
- Rust backend (Axum) with server-rendered HTML (HTMX-ready)
- Node agent (Rust daemon) that runs on each VaultMesh node
- Zero external infra deps (no Redis, Kafka, k8s)
- Cloudflare-native: fronted by Cloudflare Tunnel + Access
## Repository Layout
```text
vaultmesh-command-center/
├── Cargo.toml # Workspace manifest
├── README.md
├── .gitignore
├── command-center/ # Backend + Web UI
│ ├── Cargo.toml
│ └── src/
│ ├── main.rs # Entry point, server setup
│ ├── routes.rs # HTTP handlers
│ └── state.rs # AppState, in-memory node store
├── node-agent/ # Daemon for each VaultMesh node
│ ├── Cargo.toml
│ └── src/
│ ├── main.rs # Heartbeat loop
│ └── config.rs # Env config loader
├── docs/
│ ├── ARCHITECTURE.md # How it all fits together
│ └── NODE_AGENT_CONTRACT.md # Agent API spec
└── systemd/
├── vaultmesh-command-center.service
└── vaultmesh-node-agent.service
```
## Quick Start
```bash
# Clone and build
git clone <your-url> vaultmesh-command-center
cd vaultmesh-command-center
# Run the command center locally
cd command-center
RUST_LOG=info cargo run
# listens on 0.0.0.0:8088; open http://127.0.0.1:8088
# In another terminal, run the agent (pointing at local CC)
cd ../node-agent
RUST_LOG=info VAULTMESH_OS_PROFILE=ArchVault cargo run
```
Then:
- Put the Command Center behind a Cloudflare Tunnel.
- Protect it with Cloudflare Access.
- Install the node agent as a systemd service on each VaultMesh node.
## Deployment
### Command Center
1. Build release binary:
```bash
cargo build --release -p vaultmesh-command-center
```
2. Copy to `/usr/local/bin/`:
```bash
sudo cp target/release/vaultmesh-command-center /usr/local/bin/
```
3. Install systemd unit:
```bash
sudo cp systemd/vaultmesh-command-center.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now vaultmesh-command-center
```
4. Configure Cloudflare Tunnel to point at `http://127.0.0.1:8088`.
### Node Agent
1. Build release binary:
```bash
cargo build --release -p vaultmesh-node-agent
```
2. Copy to each node:
```bash
sudo cp target/release/vaultmesh-node-agent /usr/local/bin/
```
3. Create environment file `/etc/vaultmesh/agent.env`:
```bash
VAULTMESH_CC_URL=https://cc.your-domain.example
VAULTMESH_OS_PROFILE=ArchVault
VAULTMESH_ROOT=/var/lib/vaultmesh
VAULTMESH_HEARTBEAT_SECS=30
```
4. Install systemd unit:
```bash
sudo cp systemd/vaultmesh-node-agent.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now vaultmesh-node-agent
```
## API Endpoints
| Method | Path | Description |
|--------|------------------------------------|------------------------------------------|
| GET | `/` | HTML dashboard showing all nodes |
| GET | `/nodes` | JSON array of all node heartbeats |
| GET | `/nodes/:id` | Node detail page (HTML) |
| POST | `/nodes/:id/commands` | Queue command for node (web form or API) |
| POST | `/api/agent/heartbeat` | Agent heartbeat endpoint |
| GET | `/api/agent/commands?node_id=<uuid>` | Agent polls for pending commands |
| POST | `/api/agent/command-result` | Agent reports command execution result |
## Fleet Operation Model (V0.5)
**Green fleet:**
- Attention = `OK`.
- Heartbeats fresh (age < `VAULTMESH_HEARTBEAT_STALE_MINUTES`).
- Last scan age < `VAULTMESH_SCAN_STALE_HOURS`.
- No critical or high findings.
**Yellow/Red fleet:**
Check Attention reasons in order:
1. `heartbeat_stale` → connectivity / host issue.
2. `cloudflare_down` / `services_down` → control plane or local service failure.
3. `never_scanned` / `scan_stale` → wait for scheduler or trigger manual scan (if policy allows).
4. `critical_findings` / `high_findings` → prioritize remediation on that node.
**Policy guardrail:**
- Disallowed commands are blocked with HTTP 403.
- Scheduler respects policy; `sovereign-scan` must be allowed for a profile to be auto-queued.
## Configuration
| Variable | Default | Description |
|------------------------------------|-------------------|------------------------------|
| `VAULTMESH_SCAN_INTERVAL_HOURS` | `24` | Auto-scan interval |
| `VAULTMESH_SCAN_STALE_HOURS` | `48` | Scan staleness threshold |
| `VAULTMESH_HEARTBEAT_STALE_MINUTES`| `10` | Heartbeat staleness |
| `VAULTMESH_SCHEDULER_TICK_SECONDS` | `300` | Scheduler tick interval |
| `VAULTMESH_CC_KEY_PATH` | `cc-ed25519.key` | Command signing key path |
## License
MIT

command-center/Cargo.toml

@@ -0,0 +1,23 @@
[package]
name = "vaultmesh-command-center"
version = "0.1.0"
edition = "2021"
[dependencies]
axum = { version = "0.7", features = ["macros", "json"] }
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tower = "0.4"
tower-http = { version = "0.5", features = ["trace"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
uuid = { version = "1", features = ["v4", "serde"] }
time = { version = "0.3", features = ["serde", "macros", "formatting", "parsing"] }
ed25519-dalek = { version = "2", features = ["rand_core"] }
rand = "0.8"
base64 = "0.22"
anyhow = "1"
clap = { version = "4", features = ["derive"] }
async-stream = "0.3"
futures-core = "0.3"

command-center/src/cli.rs

@@ -0,0 +1,61 @@
use clap::{Parser, Subcommand};
#[derive(Parser)]
#[command(name = "vm-cc")]
#[command(about = "VaultMesh Command Center")]
pub struct Cli {
/// Subcommand to run. If omitted, starts the HTTP server.
#[command(subcommand)]
pub command: Option<Commands>,
}
#[derive(Subcommand)]
pub enum Commands {
/// View and query event logs
Logs {
#[command(subcommand)]
action: LogsAction,
},
/// Start the HTTP server (default if no subcommand is provided)
Serve,
}
#[derive(Subcommand)]
pub enum LogsAction {
/// View recent log events
View {
/// Event kind: heartbeats, scans, commands (default: all)
#[arg(short, long)]
kind: Option<String>,
/// Filter by node ID (UUID)
// Long-only: `-n` is taken by `--limit` below, and clap panics on duplicate shorts.
#[arg(long)]
node: Option<String>,
/// Time filter: 1h, 24h, 7d, 30m, etc.
#[arg(short, long)]
since: Option<String>,
/// Minimum severity for scans: info, low, medium, high, critical
#[arg(long)]
min_severity: Option<String>,
/// Number of events to show (default: 20)
#[arg(short = 'n', long, default_value = "20")]
limit: usize,
},
/// Follow logs in real-time
Tail {
/// Event kind to follow
#[arg(short, long)]
kind: Option<String>,
/// Filter by node ID (UUID)
#[arg(short, long)]
node: Option<String>,
},
/// Show log statistics
Stats,
}

command-center/src/logs.rs

@@ -0,0 +1,396 @@
// V0.6.1: CLI log reading and query commands
use crate::state::{CommandEvent, HeartbeatEvent, ScanEvent};
use std::collections::HashMap;
use std::io::{BufRead, BufReader};
use std::path::PathBuf;
use time::{Duration, OffsetDateTime};
use uuid::Uuid;
/// Reader for JSONL log files.
pub struct LogReader {
dir: PathBuf,
}
impl LogReader {
pub fn new() -> Self {
let dir = std::env::var("VAULTMESH_LOG_DIR")
.map(PathBuf::from)
.unwrap_or_else(|_| PathBuf::from("/var/lib/vaultmesh/cc-logs"));
LogReader { dir }
}
pub fn read_heartbeats(&self) -> Vec<HeartbeatEvent> {
self.read_jsonl("heartbeats.jsonl")
}
pub fn read_scans(&self) -> Vec<ScanEvent> {
self.read_jsonl("scans.jsonl")
}
pub fn read_commands(&self) -> Vec<CommandEvent> {
self.read_jsonl("commands.jsonl")
}
fn read_jsonl<T: serde::de::DeserializeOwned>(&self, filename: &str) -> Vec<T> {
let path = self.dir.join(filename);
if !path.exists() {
return vec![];
}
let file = match std::fs::File::open(&path) {
Ok(f) => f,
Err(_) => return vec![],
};
BufReader::new(file)
.lines()
.filter_map(|line| line.ok())
.filter(|line| !line.trim().is_empty())
.filter_map(|line| serde_json::from_str(&line).ok())
.collect()
}
}
/// Parse a duration string like "1h", "24h", "7d", "30m".
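/// Suffixes: `h` = hours, `d` = days, `m` = minutes; anything else
/// (including seconds) returns `None`.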
pub fn parse_duration(s: &str) -> Option<Duration> {
let s = s.trim().to_lowercase();
if s.ends_with('h') {
s[..s.len() - 1].parse::<i64>().ok().map(Duration::hours)
} else if s.ends_with('d') {
s[..s.len() - 1].parse::<i64>().ok().map(Duration::days)
} else if s.ends_with('m') {
s[..s.len() - 1].parse::<i64>().ok().map(Duration::minutes)
} else {
None
}
}
/// Convert severity string to a rank for filtering.
pub fn severity_to_rank(s: &str) -> u8 {
match s.to_lowercase().as_str() {
"critical" => 5,
"high" => 4,
"medium" => 3,
"low" => 2,
"info" => 1,
_ => 0,
}
}
/// View recent log events with filtering.
pub fn cmd_logs_view(
kind: Option<String>,
node: Option<String>,
since: Option<String>,
min_severity: Option<String>,
limit: usize,
) {
let reader = LogReader::new();
let now = OffsetDateTime::now_utc();
let since_ts = since.as_ref().and_then(|s| parse_duration(s)).map(|d| now - d);
let node_filter: Option<Uuid> = node.as_ref().and_then(|s| s.parse().ok());
let min_sev_rank = min_severity.as_ref().map(|s| severity_to_rank(s)).unwrap_or(0);
// Collect events based on kind
let mut events: Vec<(OffsetDateTime, String)> = vec![];
if kind.is_none() || kind.as_deref() == Some("heartbeats") {
for e in reader.read_heartbeats() {
if let Some(ts) = since_ts {
if e.ts < ts {
continue;
}
}
if let Some(nf) = node_filter {
if e.node_id != nf {
continue;
}
}
let line = format!(
"[HB] {} {} {} cf={} svc={}",
format_ts(e.ts),
e.hostname,
e.os_profile,
if e.heartbeat.cloudflare_ok { "OK" } else { "DOWN" },
if e.heartbeat.services_ok { "OK" } else { "DOWN" }
);
events.push((e.ts, line));
}
}
if kind.is_none() || kind.as_deref() == Some("scans") {
for e in reader.read_scans() {
if let Some(ts) = since_ts {
if e.ts < ts {
continue;
}
}
if let Some(nf) = node_filter {
if e.node_id != nf {
continue;
}
}
let max_sev = if e.summary.critical > 0 {
5
} else if e.summary.high > 0 {
4
} else if e.summary.medium > 0 {
3
} else if e.summary.low > 0 {
2
} else {
1
};
if max_sev < min_sev_rank {
continue;
}
let source = if e.findings_real { "REAL" } else { "MOCK" };
let line = format!(
"[SCAN] {} C:{} H:{} M:{} L:{} [{}]",
format_ts(e.ts),
e.summary.critical,
e.summary.high,
e.summary.medium,
e.summary.low,
source
);
events.push((e.ts, line));
}
}
if kind.is_none() || kind.as_deref() == Some("commands") {
for e in reader.read_commands() {
if let Some(ts) = since_ts {
if e.ts < ts {
continue;
}
}
if let Some(nf) = node_filter {
if e.node_id != nf {
continue;
}
}
let line = format!(
"[CMD] {} {} status={} exit={:?}",
format_ts(e.ts),
e.cmd,
e.status,
e.exit_code
);
events.push((e.ts, line));
}
}
if events.is_empty() {
println!("No events found.");
return;
}
// Sort by timestamp descending, take limit
events.sort_by(|a, b| b.0.cmp(&a.0));
events.truncate(limit);
// Print in chronological order
for (_, line) in events.into_iter().rev() {
println!("{}", line);
}
}
/// Show log statistics.
pub fn cmd_logs_stats() {
let reader = LogReader::new();
let heartbeats = reader.read_heartbeats();
let scans = reader.read_scans();
let commands = reader.read_commands();
println!("=== Log Statistics ===\n");
println!("Heartbeats: {}", heartbeats.len());
println!("Scans: {}", scans.len());
println!("Commands: {}", commands.len());
// Count by node
let mut by_node: HashMap<Uuid, (usize, usize, usize, String)> = HashMap::new();
for e in &heartbeats {
let entry = by_node.entry(e.node_id).or_insert((0, 0, 0, e.hostname.clone()));
entry.0 += 1;
}
for e in &scans {
let entry = by_node.entry(e.node_id).or_insert((0, 0, 0, String::new()));
entry.1 += 1;
}
for e in &commands {
let entry = by_node.entry(e.node_id).or_insert((0, 0, 0, String::new()));
entry.2 += 1;
}
if !by_node.is_empty() {
println!("\n=== By Node ===\n");
println!(
"{:<38} {:<16} {:>8} {:>8} {:>8}",
"Node ID", "Hostname", "HB", "Scans", "Cmds"
);
println!("{:-<80}", "");
for (node_id, (hb, sc, cmd, hostname)) in by_node {
println!(
"{:<38} {:<16} {:>8} {:>8} {:>8}",
node_id, hostname, hb, sc, cmd
);
}
}
}
/// Follow logs in real-time (tail -f style).
pub fn cmd_logs_tail(kind: Option<String>, node: Option<String>) {
use std::io::Seek;
use std::thread;
use std::time::Duration as StdDuration;
let node_filter: Option<Uuid> = node.as_ref().and_then(|s| s.parse().ok());
// Track file positions
let mut hb_pos: u64 = 0;
let mut scan_pos: u64 = 0;
let mut cmd_pos: u64 = 0;
// Seek to end of existing files
let log_dir = std::env::var("VAULTMESH_LOG_DIR")
.map(PathBuf::from)
.unwrap_or_else(|_| PathBuf::from("/var/lib/vaultmesh/cc-logs"));
if let Ok(mut f) = std::fs::File::open(log_dir.join("heartbeats.jsonl")) {
hb_pos = f.seek(std::io::SeekFrom::End(0)).unwrap_or(0);
}
if let Ok(mut f) = std::fs::File::open(log_dir.join("scans.jsonl")) {
scan_pos = f.seek(std::io::SeekFrom::End(0)).unwrap_or(0);
}
if let Ok(mut f) = std::fs::File::open(log_dir.join("commands.jsonl")) {
cmd_pos = f.seek(std::io::SeekFrom::End(0)).unwrap_or(0);
}
println!("Following logs (Ctrl+C to stop)...\n");
loop {
// Check heartbeats
if kind.is_none() || kind.as_deref() == Some("heartbeats") {
if let Some((events, new_pos)) =
read_new_lines::<HeartbeatEvent>(&log_dir.join("heartbeats.jsonl"), hb_pos)
{
hb_pos = new_pos;
for e in events {
if let Some(nf) = node_filter {
if e.node_id != nf {
continue;
}
}
println!(
"[HB] {} {} {}",
format_ts(e.ts),
e.hostname,
e.os_profile
);
}
}
}
// Check scans
if kind.is_none() || kind.as_deref() == Some("scans") {
if let Some((events, new_pos)) =
read_new_lines::<ScanEvent>(&log_dir.join("scans.jsonl"), scan_pos)
{
scan_pos = new_pos;
for e in events {
if let Some(nf) = node_filter {
if e.node_id != nf {
continue;
}
}
let source = if e.findings_real { "REAL" } else { "MOCK" };
println!(
"[SCAN] {} C:{} H:{} M:{} L:{} [{}]",
format_ts(e.ts),
e.summary.critical,
e.summary.high,
e.summary.medium,
e.summary.low,
source
);
}
}
}
// Check commands
if kind.is_none() || kind.as_deref() == Some("commands") {
if let Some((events, new_pos)) =
read_new_lines::<CommandEvent>(&log_dir.join("commands.jsonl"), cmd_pos)
{
cmd_pos = new_pos;
for e in events {
if let Some(nf) = node_filter {
if e.node_id != nf {
continue;
}
}
println!(
"[CMD] {} {} status={}",
format_ts(e.ts),
e.cmd,
e.status
);
}
}
}
thread::sleep(StdDuration::from_secs(1));
}
}
/// Read new lines from a file starting at a given position.
fn read_new_lines<T: serde::de::DeserializeOwned>(
path: &PathBuf,
start_pos: u64,
) -> Option<(Vec<T>, u64)> {
use std::io::Seek;
let mut file = std::fs::File::open(path).ok()?;
let end_pos = file.seek(std::io::SeekFrom::End(0)).ok()?;
if end_pos <= start_pos {
return None;
}
file.seek(std::io::SeekFrom::Start(start_pos)).ok()?;
let reader = BufReader::new(file);
let mut events = Vec::new();
for line in reader.lines() {
if let Ok(line) = line {
if line.trim().is_empty() {
continue;
}
if let Ok(event) = serde_json::from_str(&line) {
events.push(event);
}
}
}
Some((events, end_pos))
}
/// Format timestamp for display.
fn format_ts(ts: OffsetDateTime) -> String {
format!(
"{:04}-{:02}-{:02} {:02}:{:02}:{:02}",
ts.year(),
ts.month() as u8,
ts.day(),
ts.hour(),
ts.minute(),
ts.second()
)
}

command-center/src/main.rs

@@ -0,0 +1,174 @@
mod cli;
mod logs;
mod routes;
mod state;
use crate::cli::{Cli, Commands, LogsAction};
use crate::routes::app;
use crate::state::{AppState, CommandPayload, SignedCommand};
use base64::Engine;
use clap::Parser;
use ed25519_dalek::{Signer, SigningKey};
use std::net::SocketAddr;
use std::path::Path;
use std::time::Duration;
use time::OffsetDateTime;
use tokio::net::TcpListener;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};
use uuid::Uuid;
/// Load an existing Ed25519 signing key or generate a new one.
fn load_or_init_signing_key() -> SigningKey {
let key_path = std::env::var("VAULTMESH_CC_KEY_PATH")
.unwrap_or_else(|_| "cc-ed25519.key".into());
let path = Path::new(&key_path);
if path.exists() {
let bytes = std::fs::read(path).expect("Failed to read signing key");
let key_bytes: [u8; 32] = bytes
.try_into()
.expect("Signing key must be exactly 32 bytes");
SigningKey::from_bytes(&key_bytes)
} else {
// Generate a new key
let mut csprng = rand::thread_rng();
let signing_key = SigningKey::generate(&mut csprng);
std::fs::write(path, signing_key.to_bytes()).expect("Failed to write signing key");
tracing::info!("Generated new signing key at {}", key_path);
signing_key
}
}
/// Background scheduler loop that auto-triggers scans on nodes.
async fn scheduler_loop(state: AppState) {
let tick_secs = std::env::var("VAULTMESH_SCHEDULER_TICK_SECONDS")
.ok()
.and_then(|s| s.parse::<u64>().ok())
.unwrap_or(300);
tracing::info!("Scheduler started with tick interval: {}s", tick_secs);
loop {
if let Err(e) = run_scheduler_tick(&state).await {
tracing::warn!("scheduler tick failed: {e}");
}
tokio::time::sleep(Duration::from_secs(tick_secs)).await;
}
}
/// Run a single scheduler tick: check each node and queue scans if needed.
async fn run_scheduler_tick(state: &AppState) -> anyhow::Result<()> {
let now = OffsetDateTime::now_utc();
let latest = state.list_latest().await;
let last_scans = state.list_last_scans().await;
for hb in latest.iter() {
// Check policy allows sovereign-scan for this profile
if !state.is_command_allowed(&hb.os_profile, &hb.node_id, "sovereign-scan") {
continue;
}
// Check if scan needed (never scanned or interval elapsed)
let needs_scan = match last_scans.get(&hb.node_id) {
None => true,
Some(scan) => {
let elapsed = now - scan.ts;
elapsed >= state.scheduler_cfg.scan_interval
}
};
if needs_scan {
// Build and sign the command
let payload = CommandPayload {
node_id: hb.node_id,
ts: now,
nonce: Uuid::new_v4().to_string(),
cmd: "sovereign-scan".into(),
args: serde_json::json!({}),
};
let payload_json = serde_json::to_vec(&payload)?;
let signature = state.signing_key.sign(&payload_json);
let signature_b64 =
base64::engine::general_purpose::STANDARD.encode(signature.to_bytes());
let signed_cmd = SignedCommand {
payload,
signature: signature_b64,
};
state.queue_command(signed_cmd).await;
tracing::info!(
"scheduler: queued sovereign-scan for {} ({})",
hb.node_id,
hb.hostname
);
}
}
Ok(())
}
#[tokio::main]
async fn main() {
let cli = Cli::parse();
match cli.command {
Some(Commands::Logs { action }) => {
// CLI log commands - no tracing/server init needed
match action {
LogsAction::View {
kind,
node,
since,
min_severity,
limit,
} => {
logs::cmd_logs_view(kind, node, since, min_severity, limit);
}
LogsAction::Tail { kind, node } => {
logs::cmd_logs_tail(kind, node);
}
LogsAction::Stats => {
logs::cmd_logs_stats();
}
}
}
Some(Commands::Serve) | None => {
// Run the HTTP server (default)
run_server().await;
}
}
}
async fn run_server() {
tracing_subscriber::registry()
.with(tracing_subscriber::EnvFilter::new(
std::env::var("RUST_LOG").unwrap_or_else(|_| "info".into()),
))
.with(tracing_subscriber::fmt::layer())
.init();
let signing_key = load_or_init_signing_key();
let state = AppState::new(signing_key);
// V0.6: Replay logs to reconstruct state
state.replay_from_logs().await;
// Log the public key for distribution to nodes
tracing::info!("CC public key (base64): {}", state.public_key_b64);
// Spawn the scheduler background task
let scheduler_state = state.clone();
tokio::spawn(async move {
scheduler_loop(scheduler_state).await;
});
let app = app(state);
let addr: SocketAddr = "0.0.0.0:8088".parse().unwrap();
tracing::info!("VaultMesh Command Center listening on http://{}", addr);
let listener = TcpListener::bind(&addr).await.unwrap();
axum::serve(listener, app).await.unwrap();
}

command-center/src/routes.rs

File diff suppressed because it is too large.

command-center/src/state.rs

@@ -0,0 +1,842 @@
use ed25519_dalek::SigningKey;
use serde::{Deserialize, Serialize};
use std::{
collections::{HashMap, HashSet, VecDeque},
fs::OpenOptions,
io::{BufRead, BufReader, Write},
path::PathBuf,
sync::Arc,
};
use time::{Duration, OffsetDateTime};
use tokio::sync::{broadcast, RwLock};
use uuid::Uuid;
/// How many heartbeats we keep per node (for history).
const MAX_HEARTBEATS_PER_NODE: usize = 50;
/// How many command records we keep per node.
const MAX_COMMAND_RECORDS_PER_NODE: usize = 50;
/// Metrics reported by the node agent.
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct NodeMetrics {
pub uptime_seconds: Option<u64>,
pub load1: Option<f32>,
pub load5: Option<f32>,
pub load15: Option<f32>,
pub mem_total_mb: Option<u64>,
pub mem_used_mb: Option<u64>,
pub disk_root_total_gb: Option<f32>,
pub disk_root_used_gb: Option<f32>,
}
/// Heartbeat data reported by each node.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NodeHeartbeat {
pub node_id: Uuid,
pub hostname: String,
pub os_profile: String,
pub cloudflare_ok: bool,
pub services_ok: bool,
pub vaultmesh_root: String,
#[serde(with = "time::serde::rfc3339")]
pub timestamp: OffsetDateTime,
#[serde(default)]
pub metrics: NodeMetrics,
}
/// A single node's heartbeat history.
#[derive(Clone)]
pub struct NodeHistory {
pub latest: NodeHeartbeat,
pub history: VecDeque<NodeHeartbeat>,
}
/// Command payload to be signed and sent to agents.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CommandPayload {
pub node_id: Uuid,
#[serde(with = "time::serde::rfc3339")]
pub ts: OffsetDateTime,
pub nonce: String,
pub cmd: String,
pub args: serde_json::Value,
}
/// Signed command delivered to the agent.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SignedCommand {
pub payload: CommandPayload,
/// Base64-encoded Ed25519 signature of `payload` JSON bytes.
pub signature: String,
}
/// Result of executing a command on the node.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CommandResult {
pub node_id: Uuid,
#[serde(with = "time::serde::rfc3339")]
pub ts: OffsetDateTime,
pub nonce: String,
pub cmd: String,
pub status: String,
pub exit_code: Option<i32>,
pub stdout: String,
pub stderr: String,
}
/// Summary counts from a sovereign scan.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ScanSummary {
pub critical: u32,
pub high: u32,
pub medium: u32,
pub low: u32,
pub info: u32,
}
/// Last scan result for a node.
#[derive(Debug, Clone)]
pub struct LastScan {
pub ts: OffsetDateTime,
pub summary: ScanSummary,
pub findings_real: bool,
pub receipt_hash: String,
}
/// Scheduler configuration for automatic scans.
#[derive(Debug, Clone)]
pub struct SchedulerConfig {
pub scan_interval: Duration,
pub scan_stale: Duration,
pub heartbeat_stale: Duration,
}
/// Node attention status for dashboard.
#[derive(Debug, Clone)]
pub struct NodeAttentionStatus {
pub needs_attention: bool,
pub reasons: Vec<String>,
}
// ============================================================================
// V0.7: SSE Event Bus payloads
// ============================================================================
/// SSE payload for heartbeat events (wire-efficient subset).
#[derive(Debug, Clone, Serialize)]
pub struct HeartbeatEventPayload {
#[serde(with = "time::serde::rfc3339")]
pub ts: OffsetDateTime,
pub node_id: Uuid,
pub hostname: String,
pub os_profile: String,
pub cloudflare_ok: bool,
pub services_ok: bool,
pub vaultmesh_root: String,
pub uptime_secs: Option<u64>,
pub load_1: Option<f32>,
pub load_5: Option<f32>,
pub load_15: Option<f32>,
}
/// SSE payload for scan events.
#[derive(Debug, Clone, Serialize)]
pub struct ScanEventPayload {
#[serde(with = "time::serde::rfc3339")]
pub ts: OffsetDateTime,
pub node_id: Uuid,
pub hostname: String,
pub os_profile: String,
pub summary: ScanSummary,
pub findings_real: bool,
pub receipt_hash: String,
}
/// SSE payload for command events.
#[derive(Debug, Clone, Serialize)]
pub struct CommandEventPayload {
#[serde(with = "time::serde::rfc3339")]
pub ts: OffsetDateTime,
pub node_id: Uuid,
pub hostname: String,
pub os_profile: String,
pub cmd: String,
pub status: String,
pub exit_code: Option<i32>,
pub nonce: String,
}
/// SSE payload for attention state changes.
#[derive(Debug, Clone, Serialize)]
pub struct AttentionEventPayload {
#[serde(with = "time::serde::rfc3339")]
pub ts: OffsetDateTime,
pub node_id: Uuid,
pub hostname: String,
pub os_profile: String,
pub needs_attention: bool,
pub reasons: Vec<String>,
}
/// Server event types for SSE broadcast.
#[derive(Debug, Clone)]
pub enum ServerEvent {
Heartbeat(HeartbeatEventPayload),
Scan(ScanEventPayload),
Command(CommandEventPayload),
Attention(AttentionEventPayload),
/// V0.7.2: Generic envelope for comms events (note, incident, ack, tag, resolve)
Envelope(EventEnvelope),
}
// ============================================================================
// V0.7.2: Communication Layer - EventEnvelope
// ============================================================================
/// Canonical message format for all comms events.
/// Used for notes, incidents, acknowledgements, tags, and resolutions.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EventEnvelope {
/// Unique event ID (server-assigned)
pub id: Uuid,
/// Event kind: "note", "incident", "ack", "tag", "resolve"
pub kind: String,
/// Timestamp (server-assigned)
#[serde(with = "time::serde::rfc3339")]
pub ts: OffsetDateTime,
/// Node this event relates to (optional for global events)
#[serde(skip_serializing_if = "Option::is_none")]
pub node_id: Option<Uuid>,
/// Author: "operator", "system", "vm-copilot", "scheduler", agent name
pub author: String,
/// Structured payload (kind-specific)
pub body: serde_json::Value,
}
/// Log entry wrapper for EventEnvelope (versioned for future compatibility).
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EventEnvelopeLogEntry {
pub version: u8,
pub event: EventEnvelope,
}
/// Maximum number of envelopes to keep in memory.
const MAX_ENVELOPES_IN_MEMORY: usize = 500;
// ============================================================================
// V0.6: Append-only log events
// ============================================================================
/// Event logged when a heartbeat is received.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct HeartbeatEvent {
#[serde(with = "time::serde::rfc3339")]
pub ts: OffsetDateTime,
pub node_id: Uuid,
pub hostname: String,
pub os_profile: String,
pub heartbeat: NodeHeartbeat,
}
/// Event logged when a sovereign-scan completes successfully.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ScanEvent {
#[serde(with = "time::serde::rfc3339")]
pub ts: OffsetDateTime,
pub node_id: Uuid,
pub os_profile: String,
pub summary: ScanSummary,
pub findings_real: bool,
pub receipt_hash: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub receipt_path: Option<String>,
}
/// Event logged when any command result is reported.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CommandEvent {
#[serde(with = "time::serde::rfc3339")]
pub ts: OffsetDateTime,
pub node_id: Uuid,
pub cmd: String,
pub status: String,
pub exit_code: Option<i32>,
pub nonce: String,
}
/// Append-only log sink for durable event storage.
#[derive(Clone)]
pub struct LogSink {
pub dir: PathBuf,
}
impl LogSink {
/// Create a new LogSink, initializing the directory if needed.
pub fn new_from_env() -> std::io::Result<Self> {
let dir = std::env::var("VAULTMESH_LOG_DIR")
.map(PathBuf::from)
.unwrap_or_else(|_| PathBuf::from("/var/lib/vaultmesh/cc-logs"));
std::fs::create_dir_all(&dir)?;
Ok(LogSink { dir })
}
/// Get the full path for a log file.
pub fn file_path(&self, name: &str) -> PathBuf {
self.dir.join(name)
}
/// Append a JSON line to a log file.
pub fn append_json_line<T: Serialize>(&self, file: &str, value: &T) -> std::io::Result<()> {
let path = self.file_path(file);
let mut f = OpenOptions::new()
.create(true)
.append(true)
.open(path)?;
serde_json::to_writer(&mut f, value).map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;
f.write_all(b"\n")?;
f.flush()?;
Ok(())
}
}
/// Command policy for allowed commands per profile.
#[derive(Clone)]
pub struct CommandPolicy {
/// Per-profile allowed command names.
pub per_profile: HashMap<String, HashSet<String>>,
/// Global fallback set (applied if no per-profile entry).
pub global_allowed: HashSet<String>,
}
impl CommandPolicy {
pub fn default_policy() -> Self {
let mut global = HashSet::new();
global.insert("service-status".to_string());
global.insert("tail-journal".to_string());
global.insert("sovereign-scan".to_string());
global.insert("restart-service".to_string());
CommandPolicy {
per_profile: HashMap::new(),
global_allowed: global,
}
}
pub fn is_allowed(&self, profile: &str, cmd: &str) -> bool {
if let Some(set) = self.per_profile.get(profile) {
set.contains(cmd)
} else {
self.global_allowed.contains(cmd)
}
}
}
/// Shared application state.
#[derive(Clone)]
pub struct AppState {
pub nodes: Arc<RwLock<HashMap<Uuid, NodeHistory>>>,
pub commands_pending: Arc<RwLock<HashMap<Uuid, VecDeque<SignedCommand>>>>,
pub command_results: Arc<RwLock<HashMap<Uuid, VecDeque<CommandResult>>>>,
pub last_scans: Arc<RwLock<HashMap<Uuid, LastScan>>>,
pub signing_key: Arc<SigningKey>,
pub public_key_b64: String,
pub scheduler_cfg: SchedulerConfig,
pub policy: CommandPolicy,
/// V0.6: Append-only log sink for persistence
pub logs: LogSink,
/// V0.7: SSE event broadcast channel
pub events_tx: broadcast::Sender<ServerEvent>,
/// V0.7.2: In-memory event envelope store
pub events: Arc<RwLock<Vec<EventEnvelope>>>,
}
impl AppState {
pub fn new(signing_key: SigningKey) -> Self {
use base64::Engine;
let pk_bytes = signing_key.verifying_key().to_bytes();
let public_key_b64 = base64::engine::general_purpose::STANDARD.encode(pk_bytes);
// Parse scheduler config from env (with sane defaults)
let scan_interval_hours = std::env::var("VAULTMESH_SCAN_INTERVAL_HOURS")
.ok()
.and_then(|s| s.parse::<i64>().ok())
.unwrap_or(24);
let scan_stale_hours = std::env::var("VAULTMESH_SCAN_STALE_HOURS")
.ok()
.and_then(|s| s.parse::<i64>().ok())
.unwrap_or(48);
let heartbeat_stale_minutes = std::env::var("VAULTMESH_HEARTBEAT_STALE_MINUTES")
.ok()
.and_then(|s| s.parse::<i64>().ok())
.unwrap_or(10);
let scheduler_cfg = SchedulerConfig {
scan_interval: Duration::hours(scan_interval_hours),
scan_stale: Duration::hours(scan_stale_hours),
heartbeat_stale: Duration::minutes(heartbeat_stale_minutes),
};
tracing::info!(
"Scheduler config: scan_interval={}h, scan_stale={}h, heartbeat_stale={}m",
scan_interval_hours,
scan_stale_hours,
heartbeat_stale_minutes
);
// V0.6: Initialize log sink
let logs = LogSink::new_from_env()
.expect("failed to initialize log directory");
tracing::info!("Log directory: {:?}", logs.dir);
// V0.7: Initialize SSE broadcast channel
let (events_tx, _) = broadcast::channel(1024);
AppState {
nodes: Arc::new(RwLock::new(HashMap::new())),
commands_pending: Arc::new(RwLock::new(HashMap::new())),
command_results: Arc::new(RwLock::new(HashMap::new())),
last_scans: Arc::new(RwLock::new(HashMap::new())),
signing_key: Arc::new(signing_key),
public_key_b64,
scheduler_cfg,
policy: CommandPolicy::default_policy(),
logs,
events_tx,
events: Arc::new(RwLock::new(Vec::new())),
}
}
/// Check if a command is allowed by policy.
pub fn is_command_allowed(&self, profile: &str, node_id: &Uuid, cmd: &str) -> bool {
let allowed = self.policy.is_allowed(profile, cmd);
if !allowed {
tracing::warn!(
"policy: command {} denied for node {} (profile={})",
cmd,
node_id,
profile
);
}
allowed
}
// ========================================================================
// V0.7: SSE Event Bus
// ========================================================================
/// Publish an event to all SSE subscribers.
pub fn publish(&self, event: ServerEvent) {
if let Err(e) = self.events_tx.send(event) {
tracing::debug!("no SSE listeners: {e}");
}
}
/// Recompute and publish attention status for a node.
pub async fn recompute_and_publish_attention(&self, node_id: Uuid) {
let now = OffsetDateTime::now_utc();
let latest = self.list_latest().await;
let last_scans = self.list_last_scans().await;
if let Some(hb) = latest.iter().find(|h| h.node_id == node_id) {
let attn = compute_attention(now, hb, last_scans.get(&node_id), &self.scheduler_cfg);
let payload = AttentionEventPayload {
ts: now,
node_id: hb.node_id,
hostname: hb.hostname.clone(),
os_profile: hb.os_profile.clone(),
needs_attention: attn.needs_attention,
reasons: attn.reasons,
};
self.publish(ServerEvent::Attention(payload));
}
}
// ========================================================================
// V0.7.2: Communication Layer
// ========================================================================
/// Record an EventEnvelope: log to JSONL, store in memory, broadcast via SSE.
pub async fn record_envelope(&self, ev: EventEnvelope) {
// 1) Log to JSONL
let entry = EventEnvelopeLogEntry { version: 1, event: ev.clone() };
if let Err(e) = self.logs.append_json_line("events.jsonl", &entry) {
tracing::warn!("failed to append events log: {e}");
}
// 2) In-memory store with cap
{
let mut events = self.events.write().await;
events.push(ev.clone());
if events.len() > MAX_ENVELOPES_IN_MEMORY {
let overflow = events.len() - MAX_ENVELOPES_IN_MEMORY;
events.drain(0..overflow);
}
}
// 3) SSE broadcast
self.publish(ServerEvent::Envelope(ev));
}
/// Replay envelopes from events.jsonl on startup.
pub async fn replay_envelopes(&self) -> std::io::Result<()> {
let path = self.logs.file_path("events.jsonl");
if !path.exists() {
return Ok(());
}
let file = std::fs::File::open(&path)?;
let reader = BufReader::new(file);
let mut count = 0;
for line in reader.lines() {
let line = line?;
if line.trim().is_empty() {
continue;
}
let entry: EventEnvelopeLogEntry = match serde_json::from_str(&line) {
Ok(v) => v,
Err(e) => {
tracing::warn!("invalid events line: {e}");
continue;
}
};
// Store in memory (no broadcast during replay)
let mut events = self.events.write().await;
events.push(entry.event);
if events.len() > MAX_ENVELOPES_IN_MEMORY {
let overflow = events.len() - MAX_ENVELOPES_IN_MEMORY;
events.drain(0..overflow);
}
count += 1;
}
tracing::info!("replayed {} events from {:?}", count, path);
Ok(())
}
/// Insert a new heartbeat for a node, updating history.
/// Logs the event to heartbeats.jsonl.
pub async fn upsert_heartbeat(&self, hb: NodeHeartbeat) {
// V0.6: Log first
let event = HeartbeatEvent {
ts: hb.timestamp,
node_id: hb.node_id,
hostname: hb.hostname.clone(),
os_profile: hb.os_profile.clone(),
heartbeat: hb.clone(),
};
if let Err(e) = self.logs.append_json_line("heartbeats.jsonl", &event) {
tracing::warn!("failed to append heartbeat log: {e}");
}
self.upsert_heartbeat_no_log(hb).await;
}
/// Insert heartbeat without logging (used during replay).
async fn upsert_heartbeat_no_log(&self, hb: NodeHeartbeat) {
use std::collections::hash_map::Entry;
let mut nodes = self.nodes.write().await;
match nodes.entry(hb.node_id) {
Entry::Occupied(mut occupied) => {
let node = occupied.get_mut();
// Archive the previous latest heartbeat before replacing it.
node.history.push_front(node.latest.clone());
if node.history.len() > MAX_HEARTBEATS_PER_NODE {
node.history.pop_back();
}
node.latest = hb;
}
// First heartbeat for this node: no prior entry to archive,
// so it must not be duplicated into history.
Entry::Vacant(vacant) => {
vacant.insert(NodeHistory {
latest: hb,
history: VecDeque::new(),
});
}
}
}
/// Get a snapshot of all latest heartbeats.
pub async fn list_latest(&self) -> Vec<NodeHeartbeat> {
let nodes = self.nodes.read().await;
nodes.values().map(|h| h.latest.clone()).collect()
}
/// Queue a signed command for a node.
pub async fn queue_command(&self, cmd: SignedCommand) {
let mut pending = self.commands_pending.write().await;
let entry = pending
.entry(cmd.payload.node_id)
.or_insert_with(VecDeque::new);
entry.push_back(cmd);
}
/// Take (and clear) all pending commands for a node.
pub async fn take_pending_commands(&self, node_id: Uuid) -> Vec<SignedCommand> {
let mut pending = self.commands_pending.write().await;
if let Some(mut queue) = pending.remove(&node_id) {
queue.drain(..).collect()
} else {
Vec::new()
}
}
/// Record a command result.
/// Logs the event to commands.jsonl.
pub async fn record_command_result(&self, result: CommandResult) {
// V0.6: Log command event
let event = CommandEvent {
ts: result.ts,
node_id: result.node_id,
cmd: result.cmd.clone(),
status: result.status.clone(),
exit_code: result.exit_code,
nonce: result.nonce.clone(),
};
if let Err(e) = self.logs.append_json_line("commands.jsonl", &event) {
tracing::warn!("failed to append command log: {e}");
}
self.record_command_result_no_log(result).await;
}
/// Record command result without logging (used during replay).
async fn record_command_result_no_log(&self, result: CommandResult) {
let mut results = self.command_results.write().await;
let entry = results
.entry(result.node_id)
.or_insert_with(VecDeque::new);
entry.push_front(result);
if entry.len() > MAX_COMMAND_RECORDS_PER_NODE {
entry.pop_back();
}
}
/// Get recent command results for a node.
pub async fn list_command_results(&self, node_id: Uuid) -> Vec<CommandResult> {
let results = self.command_results.read().await;
results
.get(&node_id)
.map(|q| q.iter().cloned().collect())
.unwrap_or_default()
}
/// Update the last scan result for a node.
/// Note: ScanEvent logging is handled in routes.rs where we have os_profile context.
pub async fn update_last_scan(&self, node_id: Uuid, scan: LastScan) {
let mut scans = self.last_scans.write().await;
scans.insert(node_id, scan);
}
/// Update last scan without logging (used during replay).
pub async fn update_last_scan_no_log(&self, node_id: Uuid, scan: LastScan) {
let mut scans = self.last_scans.write().await;
scans.insert(node_id, scan);
}
/// Get the last scan result for a node.
pub async fn get_last_scan(&self, node_id: &Uuid) -> Option<LastScan> {
let scans = self.last_scans.read().await;
scans.get(node_id).cloned()
}
/// Get all last scan results.
pub async fn list_last_scans(&self) -> HashMap<Uuid, LastScan> {
let scans = self.last_scans.read().await;
scans.clone()
}
// ========================================================================
// V0.6: Log replay on startup
// ========================================================================
/// Replay all logs to reconstruct state on startup.
pub async fn replay_from_logs(&self) {
if let Err(e) = self.replay_heartbeats().await {
tracing::warn!("failed to replay heartbeats: {e}");
}
if let Err(e) = self.replay_scans().await {
tracing::warn!("failed to replay scans: {e}");
}
if let Err(e) = self.replay_commands().await {
tracing::warn!("failed to replay commands: {e}");
}
// V0.7.2: Replay event envelopes
if let Err(e) = self.replay_envelopes().await {
tracing::warn!("failed to replay envelopes: {e}");
}
}
async fn replay_heartbeats(&self) -> std::io::Result<()> {
let path = self.logs.file_path("heartbeats.jsonl");
if !path.exists() {
return Ok(());
}
let file = std::fs::File::open(&path)?;
let reader = BufReader::new(file);
let mut count = 0;
for line in reader.lines() {
let line = line?;
if line.trim().is_empty() {
continue;
}
let event: HeartbeatEvent = match serde_json::from_str(&line) {
Ok(v) => v,
Err(e) => {
tracing::warn!("invalid heartbeat line: {e}");
continue;
}
};
// Reconstruct heartbeat from event
let hb = NodeHeartbeat {
node_id: event.node_id,
hostname: event.hostname,
os_profile: event.os_profile,
cloudflare_ok: event.heartbeat.cloudflare_ok,
services_ok: event.heartbeat.services_ok,
vaultmesh_root: event.heartbeat.vaultmesh_root,
timestamp: event.ts,
metrics: event.heartbeat.metrics,
};
self.upsert_heartbeat_no_log(hb).await;
count += 1;
}
tracing::info!("replayed {} heartbeats from {:?}", count, path);
Ok(())
}
async fn replay_scans(&self) -> std::io::Result<()> {
let path = self.logs.file_path("scans.jsonl");
if !path.exists() {
return Ok(());
}
let file = std::fs::File::open(&path)?;
let reader = BufReader::new(file);
let mut count = 0;
for line in reader.lines() {
let line = line?;
if line.trim().is_empty() {
continue;
}
let event: ScanEvent = match serde_json::from_str(&line) {
Ok(v) => v,
Err(e) => {
tracing::warn!("invalid scan line: {e}");
continue;
}
};
let scan = LastScan {
ts: event.ts,
summary: event.summary,
findings_real: event.findings_real,
receipt_hash: event.receipt_hash,
};
self.update_last_scan_no_log(event.node_id, scan).await;
count += 1;
}
tracing::info!("replayed {} scans from {:?}", count, path);
Ok(())
}
async fn replay_commands(&self) -> std::io::Result<()> {
let path = self.logs.file_path("commands.jsonl");
if !path.exists() {
return Ok(());
}
let file = std::fs::File::open(&path)?;
let reader = BufReader::new(file);
let mut count = 0;
for line in reader.lines() {
let line = line?;
if line.trim().is_empty() {
continue;
}
let event: CommandEvent = match serde_json::from_str(&line) {
Ok(v) => v,
Err(e) => {
tracing::warn!("invalid command line: {e}");
continue;
}
};
// Build minimal CommandResult for history (stdout/stderr not stored in log)
let result = CommandResult {
node_id: event.node_id,
ts: event.ts,
nonce: event.nonce,
cmd: event.cmd,
status: event.status,
exit_code: event.exit_code,
stdout: String::new(),
stderr: String::new(),
};
self.record_command_result_no_log(result).await;
count += 1;
}
tracing::info!("replayed {} commands from {:?}", count, path);
Ok(())
}
}
// ============================================================================
// V0.7: Attention computation (moved from routes.rs for SSE access)
// ============================================================================
/// Compute attention status for a node based on current state.
pub fn compute_attention(
now: OffsetDateTime,
hb: &NodeHeartbeat,
scan: Option<&LastScan>,
cfg: &SchedulerConfig,
) -> NodeAttentionStatus {
let mut reasons = Vec::new();
// Heartbeat staleness
if now - hb.timestamp > cfg.heartbeat_stale {
reasons.push("heartbeat_stale".to_string());
}
// Scan staleness / findings
match scan {
None => reasons.push("never_scanned".to_string()),
Some(s) => {
if now - s.ts > cfg.scan_stale {
reasons.push("scan_stale".to_string());
}
if s.summary.critical > 0 {
reasons.push("critical_findings".to_string());
} else if s.summary.high > 0 {
reasons.push("high_findings".to_string());
}
}
}
// Service flags
if !hb.cloudflare_ok {
reasons.push("cloudflare_down".to_string());
}
if !hb.services_ok {
reasons.push("services_down".to_string());
}
NodeAttentionStatus {
needs_attention: !reasons.is_empty(),
reasons,
}
}

docs/ARCHITECTURE.md

@@ -0,0 +1,182 @@
# VaultMesh Command Center Architecture
## Overview
The VaultMesh Command Center is a minimal control plane for monitoring and managing VaultMesh nodes. It consists of two components:
1. **Command Center (CC)** - Central Rust/Axum web server
2. **Node Agent** - Lightweight daemon running on each VaultMesh node
## Communication Model
```
┌─────────────────┐   HTTPS (Cloudflare Tunnel)   ┌─────────────────┐
│   Node Agent    │ ────────────────────────────▶ │ Command Center  │
│   (ArchVault)   │   POST /api/agent/heartbeat   │     (Axum)      │
└─────────────────┘                               └────────▲────────┘
┌─────────────────┐   HTTPS (Cloudflare Tunnel)            │
│   Node Agent    │ ───────────────────────────────────────┤
│  (DebianVault)  │   POST /api/agent/heartbeat            │
└─────────────────┘                                        │
                               ┌─────────────────────────────────┐
                               │        Cloudflare Access        │
                               │  (Zero Trust Authentication)    │
                               └────────────────▲────────────────┘
                                                │
                               ┌────────────────┴────────────────┐
                               │          Admin Browser          │
                               │        GET / (Dashboard)        │
                               └─────────────────────────────────┘
```
## Security Model
### Outbound-Only from Nodes
Nodes only make **outbound** connections to the Command Center. No inbound ports are opened on VaultMesh nodes for CC communication.
### Cloudflare Zero Trust
- **Cloudflare Tunnel**: CC is exposed via `cloudflared` tunnel, not a public IP
- **Cloudflare Access**: Admin dashboard protected by Cloudflare Access policies
- **Agent Authentication**: Future: signed JWTs or shared secrets for agent auth
### Network Flow
1. Node agent wakes up every N seconds
2. Agent collects local health metrics
3. Agent POSTs heartbeat JSON to CC via Cloudflare Tunnel (sketched below)
4. CC stores heartbeat in memory (future: SQLite)
5. Admin views dashboard via Cloudflare Access-protected URL
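A minimal sketch of steps 1-3 from the agent side, assuming `reqwest` with its `json` feature (the real loop lives in `node-agent/src/main.rs`; names and payload here are illustrative):
```rust
use std::time::Duration;

// Illustrative heartbeat loop: collect, POST outbound via the tunnel, sleep.
async fn heartbeat_loop(cc_url: &str) -> anyhow::Result<()> {
    let client = reqwest::Client::new();
    loop {
        // The real agent sends a full NodeHeartbeat; a stub stands in here.
        let hb = serde_json::json!({ "hostname": "vault-node-01" });
        client
            .post(format!("{cc_url}/api/agent/heartbeat"))
            .json(&hb)
            .send()
            .await?
            .error_for_status()?;
        tokio::time::sleep(Duration::from_secs(30)).await;
    }
}
```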
## Data Model
### NodeHeartbeat
```json
{
"node_id": "550e8400-e29b-41d4-a716-446655440000",
"hostname": "vault-node-01",
"os_profile": "ArchVault",
"cloudflare_ok": true,
"services_ok": true,
"vaultmesh_root": "/var/lib/vaultmesh",
"timestamp": "2024-01-15T10:30:00Z"
}
```
### Storage (V1)
In-memory `HashMap<Uuid, NodeHeartbeat>` wrapped in `Arc<RwLock<>>`.
Nodes are keyed by `node_id`. Each heartbeat overwrites the previous entry for that node.
## Component Details
### Command Center
| Component | Technology |
|-----------|------------|
| Web Framework | Axum 0.7 |
| Async Runtime | Tokio |
| Serialization | Serde + serde_json |
| Logging | tracing + tracing-subscriber |
| HTML | Server-rendered (HTMX-ready) |
### Node Agent
| Component | Technology |
|-----------|------------|
| HTTP Client | reqwest (rustls-tls) |
| Async Runtime | Tokio |
| System Checks | systemctl calls |
| Config | Environment variables |
## Deployment
### Command Center Deployment
1. Build: `cargo build --release -p vaultmesh-command-center`
2. Install binary to `/usr/local/bin/`
3. Install systemd unit
4. Configure Cloudflare Tunnel to `http://127.0.0.1:8088`
5. Configure Cloudflare Access policies
### Node Agent Deployment
1. Build: `cargo build --release -p vaultmesh-node-agent`
2. Install binary to `/usr/local/bin/`
3. Create `/etc/vaultmesh/agent.env` with:
- `VAULTMESH_CC_URL=https://cc.your-domain.example`
- `VAULTMESH_OS_PROFILE=ArchVault`
4. Install systemd unit
## Version History
### V0.7.2: Communication Layer (Current)
- Unified `EventEnvelope` as canonical message format for comms events.
- `POST /api/events` - Ingest endpoint for operators, agents, and bots.
- `GET /api/events` - Query endpoint with since/kind/node_id/limit filtering.
- New event kinds: `note`, `incident`, `ack`, `tag`, `resolve`.
- SSE broadcast of envelope events by their `kind` name.
- Durable persistence to `events.jsonl` with replay on startup.
- Memory-bounded in-memory store (500 most recent envelopes).
### V0.7.1: Mission Console
- NASA-style 3-panel dashboard at `GET /console`.
- Global Mission Bar with fleet KPIs (Total/Healthy/Attention/Critical).
- Left panel: Node list with status pills (OK/ATTN/CRIT), live heartbeat glow.
- Center panel: Selected node telemetry + per-node event timeline.
- Right panel: Attention summary, scan findings, global event feed.
- Full SSE wiring for real-time DOM updates without page refresh.
### V0.7: SSE Event Bus
- Real-time event streaming via `GET /events` (Server-Sent Events).
- Named events: `heartbeat`, `scan`, `command`, `attention`.
- Broadcast channel distributes events to all connected SSE clients.
- Keepalive every 15s to prevent connection timeouts.
- JS probe in dashboard for console.log debugging.
### V0.6.1: Log Tools
- CLI subcommands for querying JSONL event logs.
- `logs view` with filters: --kind, --node, --since, --min-severity, --limit.
- `logs tail` for real-time log following.
- `logs stats` for per-node event statistics.
### V0.6: Append-Only Persistence
- Event logging to JSONL files (heartbeats, scans, commands).
- State replay on startup from `$VAULTMESH_LOG_DIR`.
- Foundation for V0.8 Ledger Bridge (Merkle over logs).
### V0.5: Fleet Orchestrator
- Background scheduler for autonomous scans.
- Attention model with staleness / drift detection.
- Command policy enforcement (per-profile allowlists).
### V0.4: Commands & Sovereign Scan
- Push commands to nodes (Ed25519 signed).
- Command receipts with proof chain.
- Sovereign scan execution and reporting.
### V0.3: Signed Commands
- Ed25519 key generation and command signing.
- Agent signature verification.
- Nonce-based replay protection.
### V0.2: Node Metrics
- System metrics in heartbeat (load, memory, disk).
- Heartbeat history tracking.
## Future Extensions
### V0.8: Ledger Bridge
- Merkle tree over `sovereign-scans.jsonl`.
- `ROOT.txt` + `PROOFCHAIN.json` for VaultMesh proof layer.
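One possible shape for that Merkle computation, as a sketch only: blake3 is assumed because the agent already depends on it, and the pairing rules (odd-leaf duplication, no domain separation) are illustrative choices, not settled design.

```rust
use std::fs;

// Sketch of a Merkle root over the JSONL scan log. Each non-empty line is a
// leaf; levels are folded pairwise until a single root remains.
fn merkle_root(path: &str) -> Option<blake3::Hash> {
    let contents = fs::read_to_string(path).ok()?;
    let mut level: Vec<blake3::Hash> = contents
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| blake3::hash(l.as_bytes()))
        .collect();
    if level.is_empty() {
        return None;
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                let mut h = blake3::Hasher::new();
                h.update(pair[0].as_bytes());
                // An odd node out is paired with itself (illustrative choice).
                h.update(pair.get(1).unwrap_or(&pair[0]).as_bytes());
                h.finalize()
            })
            .collect();
    }
    Some(level[0])
}
```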

83
docs/EVENT_GENERATION.md Normal file

@@ -0,0 +1,83 @@
# VaultMesh Command Center: Event Generation Mechanism
## Overview
The VaultMesh Command Center generates events through a layered pipeline (collection, transformation, broadcast) built for real-time monitoring and fleet management.
## Event Types
### 1. Heartbeat Events
- **Trigger**: Node heartbeat submission
- **Payload Includes**:
* Timestamp
* Node ID
* Hostname
* OS Profile
* Cloudflare Status
* Services Status
* VaultMesh Root Path
* System Metrics (uptime, load averages)
### 2. Scan Events
- **Trigger**: Successful scan result submission
- **Payload Includes**:
* Timestamp
* Node ID
* Hostname
* OS Profile
* Scan Summary (critical/high/medium/low findings)
* Real/Mock Findings Flag
* Receipt Hash
### 3. Command Events
- **Trigger**: Command execution result
- **Payload Includes**:
* Timestamp
* Node ID
* Hostname
* OS Profile
* Command Name
* Execution Status
* Exit Code
* Nonce (for replay protection)
## Event Generation Flow
1. **Data Collection**
- Node agents submit heartbeats and scan results
- Command results are reported back to the Command Center
2. **Event Processing**
- Raw data is transformed into structured event payloads
- Events are published to a broadcast channel (sketched below)
- Server-Sent Events (SSE) distribute events to connected clients
3. **State Management**
- Events trigger state updates (node history, last scan, etc.)
- Attention status is recomputed based on new events
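A minimal sketch of the broadcast hand-off in stage 2, assuming a `tokio::sync::broadcast` channel and a simplified stand-in for the `ServerEvent` enum:

```rust
use tokio::sync::broadcast;

// Simplified stand-in for the real ServerEvent enum, which carries
// structured payloads rather than pre-serialized JSON strings.
#[derive(Clone, Debug)]
enum ServerEvent {
    Heartbeat(String),
    Scan(String),
    Command(String),
    Attention(String),
}

fn publish(tx: &broadcast::Sender<ServerEvent>, ev: ServerEvent) {
    // send() only errors when no subscriber is connected; events are
    // fire-and-forget, so that case is deliberately ignored.
    let _ = tx.send(ev);
}
```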
## Advanced Features
- **Automatic Scan Scheduling**
- Periodic scans triggered based on node profile and last scan timestamp
- Configurable scan intervals
- **Attention Computation**
- Dynamic assessment of node health
- Tracks critical findings, heartbeat staleness, service status
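A sketch of how these two features can interlock in a single scheduler tick; `AppState` and its helper methods are assumptions for illustration, not the real API:

```rust
use time::OffsetDateTime;
use tokio::time::{interval, Duration};

// Hypothetical scheduler tick: recompute attention for every node, then
// dispatch a scan to any node whose last scan is older than its profile's
// configured interval.
async fn scheduler_loop(state: AppState) {
    let mut tick = interval(Duration::from_secs(60));
    loop {
        tick.tick().await;
        let now = OffsetDateTime::now_utc();
        state.recompute_attention(now).await;
        for node_id in state.nodes_due_for_scan(now).await {
            state.dispatch_signed_scan(node_id).await;
        }
    }
}
```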
## Security Considerations
- Ed25519 key signing for commands
- Nonce-based replay protection
- Configurable command policies per node profile
## Performance Characteristics
- In-memory event storage (500 most recent events)
- Optional JSONL log persistence
- Low-overhead event broadcasting
## Extensibility
The event system supports easy addition of new event types and payloads through the `ServerEvent` enum and corresponding payload structures.

170
docs/EVENT_PROCESSING.md Normal file

@@ -0,0 +1,170 @@
# VaultMesh Command Center: Event Processing Architecture
## Overview
The Command Center processes events in three stages (ingestion, state update, broadcasting) to support robust, real-time fleet management.
## Key Components
### 1. Event Types
- `NodeHeartbeat`: Node status updates
- `ScanEvent`: Scan findings and results
- `CommandEvent`: Command execution outcomes
- `EventEnvelope`: Generic communication events
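For reference, a sketch of the envelope shape implied by the V0.7.2 API; field names follow the changelog, while the exact Rust types are assumptions:

```rust
use serde::{Deserialize, Serialize};
use time::OffsetDateTime;
use uuid::Uuid;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EventEnvelope {
    pub id: Uuid,                // assigned by the server
    #[serde(with = "time::serde::rfc3339")]
    pub ts: OffsetDateTime,      // assigned by the server
    pub kind: String,            // "note" | "incident" | "ack" | "tag" | "resolve"
    pub node_id: Option<Uuid>,   // None = global scope
    pub author: String,          // "operator", "system", "vm-copilot", ...
    pub body: serde_json::Value, // kind-specific payload
}
```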
### 2. Processing Stages
#### a. Ingestion
- Raw event data received via HTTP endpoints
- Validated and transformed into structured payloads
- Logged to append-only JSONL files for durability
#### b. State Update
- In-memory state updated with latest information
- Maintain sliding window of recent events (max 50 per node)
- Compute derived states (attention status, last scan)
#### c. Broadcasting
- Events published via Server-Sent Events (SSE)
- Broadcast to all connected clients
- Low-latency, real-time updates
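A sketch of this stage, assuming axum's SSE support and `tokio_stream`'s `BroadcastStream`; the `(kind, json)` tuple stands in for the real `ServerEvent` payloads:

```rust
use std::convert::Infallible;
use axum::response::sse::{Event, KeepAlive, Sse};
use tokio::sync::broadcast;
use tokio_stream::{wrappers::BroadcastStream, Stream, StreamExt};

// Each connected client gets its own broadcast receiver; events are named
// by their kind ("heartbeat", "scan", "command", "attention").
fn sse_stream(
    tx: &broadcast::Sender<(String, String)>,
) -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    let stream = BroadcastStream::new(tx.subscribe())
        .filter_map(|msg| msg.ok()) // a lagged receiver simply skips events
        .map(|(kind, json)| Ok::<Event, Infallible>(Event::default().event(kind).data(json)));
    Sse::new(stream).keep_alive(KeepAlive::default())
}
```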
## Event Processing Workflow
### Heartbeat Processing
1. Receive heartbeat data
2. Log to `heartbeats.jsonl`
3. Update node history
4. Publish heartbeat event
5. Recompute node attention status
6. Broadcast attention event
```rust
pub async fn upsert_heartbeat(&self, hb: NodeHeartbeat) {
// Log event
let event = HeartbeatEvent { ... };
self.logs.append_json_line("heartbeats.jsonl", &event);
// Update in-memory state
self.upsert_heartbeat_no_log(hb).await;
}
```
### Scan Result Processing
1. Receive scan results
2. Log to `scans.jsonl`
3. Update last scan information
4. Publish scan event
5. Recompute node attention status
6. Broadcast attention event
```rust
pub async fn update_last_scan(&self, node_id: Uuid, scan: LastScan) {
// Update scan history
let mut scans = self.last_scans.write().await;
scans.insert(node_id, scan);
}
```
### Command Result Processing
1. Receive command result
2. Log to `commands.jsonl`
3. Store command history
4. Publish command event
5. Optionally trigger additional actions
```rust
pub async fn record_command_result(&self, result: CommandResult) {
// Log command event
let event = CommandEvent { ... };
self.logs.append_json_line("commands.jsonl", &event);
// Update command result history
self.record_command_result_no_log(result).await;
}
```
## Attention Computation
The system dynamically computes a node's attention status based on multiple factors:
- Heartbeat staleness
- Scan staleness
- Scan findings severity
- Service status
- Cloudflare tunnel status
```rust
pub fn compute_attention(
now: OffsetDateTime,
hb: &NodeHeartbeat,
scan: Option<&LastScan>,
cfg: &SchedulerConfig,
) -> NodeAttentionStatus {
let mut reasons = Vec::new();
// Check heartbeat age
if now - hb.timestamp > cfg.heartbeat_stale {
reasons.push("heartbeat_stale");
}
// Check scan status
match scan {
None => reasons.push("never_scanned"),
Some(s) => {
if now - s.ts > cfg.scan_stale {
reasons.push("scan_stale");
}
if s.summary.critical > 0 {
reasons.push("critical_findings");
}
}
}
// Check service flags
if !hb.cloudflare_ok {
reasons.push("cloudflare_down");
}
NodeAttentionStatus {
needs_attention: !reasons.is_empty(),
reasons,
}
}
```
## Persistence and State Replay
### Log Replay Mechanism
- On startup, reconstruct in-memory state from JSONL logs
- Replay events in chronological order
- Recreate node history, scan results, and command results
```rust
pub async fn replay_from_logs(&self) {
self.replay_heartbeats().await;
self.replay_scans().await;
self.replay_commands().await;
self.replay_envelopes().await;
}
```
## Performance Characteristics
- In-memory event store: 500 most recent events (sketched below)
- Append-only logging for durability
- Non-blocking event processing
- Low-overhead broadcasting
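The memory bound amounts to a ring-buffer discipline; a minimal sketch:

```rust
use std::collections::VecDeque;

const MAX_EVENTS: usize = 500;

// Keep only the newest 500 entries in memory; older ones survive in the
// append-only JSONL log and can be replayed from there.
fn push_bounded<T>(buf: &mut VecDeque<T>, item: T) {
    buf.push_back(item);
    while buf.len() > MAX_EVENTS {
        buf.pop_front();
    }
}
```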
## Security Considerations
- Ed25519 key signing for commands
- Configurable command policies
- Nonce-based replay protection
- No sensitive data stored in logs
## Extensibility
- Easy to add new event types
- Flexible attention computation
- Modular event processing pipeline

163
docs/NODE_AGENT_CONTRACT.md Normal file

@@ -0,0 +1,163 @@
# Node Agent Contract
This document defines the API contract between VaultMesh Node Agents and the Command Center.
## Heartbeat Endpoint
### Request
```
POST /api/agent/heartbeat
Content-Type: application/json
```
### Heartbeat Schema
```json
{
"node_id": "UUID v4",
"hostname": "string",
"os_profile": "string",
"cloudflare_ok": "boolean",
"services_ok": "boolean",
"vaultmesh_root": "string",
"timestamp": "ISO 8601 / RFC 3339"
}
```
### Field Definitions
| Field | Type | Description |
|-------|------|-------------|
| `node_id` | UUID v4 | Unique identifier for this node. Should persist across reboots. |
| `hostname` | String | System hostname (from `hostname::get()` or `/etc/hostname`) |
| `os_profile` | String | VaultMesh profile name: `ArchVault`, `DebianVault`, etc. |
| `cloudflare_ok` | Boolean | `true` if `cloudflared` service is active |
| `services_ok` | Boolean | `true` if VAULTMESH_ROOT exists and is healthy |
| `vaultmesh_root` | String | Path to VAULTMESH_ROOT (e.g., `/var/lib/vaultmesh`) |
| `timestamp` | RFC 3339 | UTC timestamp when heartbeat was generated |
### Response
**Success (200 OK)**:
```json
{
"status": "ok"
}
```
**Error (4xx/5xx)**:
```json
{
"error": "description"
}
```
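A minimal client-side sketch of this exchange, illustrative only (the shipped agent lives in `node-agent/src/main.rs`); `heartbeat` here is any value that serializes to the schema above:

```rust
// Post one heartbeat and fail loudly on a non-2xx response.
async fn send_heartbeat(
    client: &reqwest::Client,
    cc_url: &str,
    heartbeat: &serde_json::Value,
) -> anyhow::Result<()> {
    let resp = client
        .post(format!("{}/api/agent/heartbeat", cc_url))
        .json(heartbeat)
        .send()
        .await?;
    anyhow::ensure!(resp.status().is_success(), "CC returned {}", resp.status());
    Ok(())
}
```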
## Environment Variables
The node agent is configured via environment variables, typically set in `/etc/vaultmesh/agent.env`.
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `VAULTMESH_NODE_ID` | No | Auto-generated UUID v4 | Persistent node identifier |
| `VAULTMESH_CC_URL` | No | `http://127.0.0.1:8088` | Command Center base URL |
| `VAULTMESH_OS_PROFILE` | No | `ArchVault` | OS profile name to report |
| `VAULTMESH_ROOT` | No | `/var/lib/vaultmesh` | Path to check for services_ok |
| `VAULTMESH_HEARTBEAT_SECS` | No | `30` | Seconds between heartbeats |
| `RUST_LOG` | No | `info` | Log level (trace, debug, info, warn, error) |
| `VAULTMESH_CC_PUBKEY` | No | `/etc/vaultmesh/cc-ed25519.pub` | CC Ed25519 public key used to verify signed commands; command polling is disabled if the key cannot be loaded |
### Example `/etc/vaultmesh/agent.env`
```bash
VAULTMESH_NODE_ID=550e8400-e29b-41d4-a716-446655440000
VAULTMESH_CC_URL=https://cc.vaultmesh.example
VAULTMESH_OS_PROFILE=ArchVault
VAULTMESH_ROOT=/var/lib/vaultmesh
VAULTMESH_HEARTBEAT_SECS=30
RUST_LOG=info
```
## Node Registration
Nodes self-register on first heartbeat. There is no explicit registration endpoint.
When the Command Center receives a heartbeat with a new `node_id`, it creates a new entry. Subsequent heartbeats update the existing entry.
### Node ID Persistence
For consistent tracking, the `VAULTMESH_NODE_ID` should be persisted. Options:
1. **Environment file**: Set in `/etc/vaultmesh/agent.env`
2. **Machine ID**: Could derive from `/etc/machine-id`
3. **Auto-generated**: If not set, agent generates a new UUID on each start (not recommended for production)
**Recommended**: Generate a UUID once during node bootstrap and store in `agent.env`:
```bash
# During node bootstrap
echo "VAULTMESH_NODE_ID=$(uuidgen)" >> /etc/vaultmesh/agent.env
```
## Health Checks
### cloudflare_ok
The agent runs:
```bash
systemctl is-active --quiet cloudflared
```
Returns `true` if exit code is 0 (service active).
### services_ok
The agent checks if `VAULTMESH_ROOT` exists and is a directory:
```rust
std::path::Path::new(vaultmesh_root).is_dir()
```
Future versions may add additional checks:
- Disk space
- Key services running
- ProofChain integrity
## Error Handling
### Network Errors
If the agent cannot reach the Command Center:
- Log error at WARN/ERROR level
- Sleep for heartbeat interval
- Retry on next cycle
No exponential backoff in V1. The agent will retry every `VAULTMESH_HEARTBEAT_SECS` seconds indefinitely.
### Invalid Response
If CC returns non-2xx status:
- Log warning with status code
- Continue normal operation
- Retry on next cycle
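Taken together, the error policy reduces to a fixed-interval loop. A sketch, with the request body elided:

```rust
use tokio::time::{sleep, Duration};

// Fixed-interval retry: failures are logged and the loop simply waits for
// the next cycle. No backoff, no give-up.
async fn heartbeat_loop(client: reqwest::Client, heartbeat_url: String, interval_secs: u64) {
    loop {
        match client.post(&heartbeat_url).send().await {
            Ok(resp) if resp.status().is_success() => {}
            Ok(resp) => tracing::warn!("heartbeat returned {}", resp.status()),
            Err(e) => tracing::error!("failed to send heartbeat: {}", e),
        }
        sleep(Duration::from_secs(interval_secs)).await;
    }
}
```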
## Security Considerations
### Transport Security
- Agent should connect to CC via HTTPS (Cloudflare Tunnel)
- `reqwest` configured with `rustls-tls` (no OpenSSL dependency)
### Authentication (Future)
V1 has no agent authentication. Future versions may add:
- Signed JWTs
- Shared secrets
- mTLS
### Data Sensitivity
Heartbeat data is low-sensitivity:
- No secrets or credentials
- No PII
- No file contents
The `vaultmesh_root` path and hostname are the most identifying fields.

19
node-agent/Cargo.toml Normal file

@@ -0,0 +1,19 @@
[package]
name = "vaultmesh-node-agent"
version = "0.1.0"
edition = "2021"
[dependencies]
reqwest = { version = "0.12", features = ["json", "rustls-tls"], default-features = false }
tokio = { version = "1", features = ["rt-multi-thread", "macros", "time", "process"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
uuid = { version = "1", features = ["v4", "serde"] }
time = { version = "0.3", features = ["serde", "macros", "formatting", "parsing"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
hostname = "0.4"
anyhow = "1"
ed25519-dalek = "2"
base64 = "0.22"
blake3 = "1"

52
node-agent/src/config.rs Normal file

@@ -0,0 +1,52 @@
use std::env;
use uuid::Uuid;
/// Agent configuration loaded from environment variables.
#[derive(Debug, Clone)]
pub struct Config {
pub node_id: Uuid,
pub cc_url: String,
pub os_profile: String,
pub vaultmesh_root: String,
pub heartbeat_secs: u64,
}
impl Config {
/// Load configuration from environment variables.
///
/// | Variable | Default |
/// |---------------------------|----------------------------|
/// | VAULTMESH_NODE_ID | auto-generated UUID |
/// | VAULTMESH_CC_URL | http://127.0.0.1:8088 |
/// | VAULTMESH_OS_PROFILE | ArchVault |
/// | VAULTMESH_ROOT | /var/lib/vaultmesh |
/// | VAULTMESH_HEARTBEAT_SECS | 30 |
pub fn from_env() -> Self {
let node_id = env::var("VAULTMESH_NODE_ID")
.ok()
.and_then(|s| s.parse().ok())
.unwrap_or_else(Uuid::new_v4);
let cc_url = env::var("VAULTMESH_CC_URL")
.unwrap_or_else(|_| "http://127.0.0.1:8088".into());
let os_profile = env::var("VAULTMESH_OS_PROFILE")
.unwrap_or_else(|_| "ArchVault".into());
let vaultmesh_root = env::var("VAULTMESH_ROOT")
.unwrap_or_else(|_| "/var/lib/vaultmesh".into());
let heartbeat_secs = env::var("VAULTMESH_HEARTBEAT_SECS")
.ok()
.and_then(|s| s.parse().ok())
.unwrap_or(30);
Self {
node_id,
cc_url,
os_profile,
vaultmesh_root,
heartbeat_secs,
}
}
}

818
node-agent/src/main.rs Normal file

@@ -0,0 +1,818 @@
mod config;
use crate::config::Config;
use anyhow::{bail, Result};
use ed25519_dalek::{Signature, Verifier, VerifyingKey};
use serde::{Deserialize, Serialize};
use std::{fs, path::Path, process::Command as StdCommand};
use time::OffsetDateTime;
use tokio::time::{sleep, Duration};
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};
use uuid::Uuid;
/// Metrics collected from the node.
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct NodeMetrics {
pub uptime_seconds: Option<u64>,
pub load1: Option<f32>,
pub load5: Option<f32>,
pub load15: Option<f32>,
pub mem_total_mb: Option<u64>,
pub mem_used_mb: Option<u64>,
pub disk_root_total_gb: Option<f32>,
pub disk_root_used_gb: Option<f32>,
}
/// Heartbeat data sent to the Command Center.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NodeHeartbeat {
pub node_id: Uuid,
pub hostname: String,
pub os_profile: String,
pub cloudflare_ok: bool,
pub services_ok: bool,
pub vaultmesh_root: String,
#[serde(with = "time::serde::rfc3339")]
pub timestamp: OffsetDateTime,
pub metrics: NodeMetrics,
}
/// Command payload from CC.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CommandPayload {
pub node_id: Uuid,
#[serde(with = "time::serde::rfc3339")]
pub ts: OffsetDateTime,
pub nonce: String,
pub cmd: String,
pub args: serde_json::Value,
}
/// Signed command from CC.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SignedCommand {
pub payload: CommandPayload,
pub signature: String,
}
/// Command execution result sent back to CC.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CommandResult {
pub node_id: Uuid,
#[serde(with = "time::serde::rfc3339")]
pub ts: OffsetDateTime,
pub nonce: String,
pub cmd: String,
pub status: String,
pub exit_code: Option<i32>,
pub stdout: String,
pub stderr: String,
}
/// Summary of scan findings by severity.
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct ScanSummary {
pub critical: u32,
pub high: u32,
pub medium: u32,
pub low: u32,
pub info: u32,
}
impl ScanSummary {
fn merge(&mut self, other: &ScanSummary) {
self.critical += other.critical;
self.high += other.high;
self.medium += other.medium;
self.low += other.low;
self.info += other.info;
}
}
/// Result from running a single scanner.
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct ScannerResult {
pub available: bool,
pub ran: bool,
pub version: Option<String>,
pub findings_real: bool,
pub findings: u32,
pub output_file: Option<String>,
pub error: Option<String>,
pub summary: ScanSummary,
}
/// Check if cloudflared service is active via systemctl.
fn check_cloudflared() -> bool {
StdCommand::new("systemctl")
.args(["is-active", "--quiet", "cloudflared"])
.status()
.map(|s| s.success())
.unwrap_or(false)
}
/// Check if essential VaultMesh services are healthy.
fn check_services(vaultmesh_root: &str) -> bool {
Path::new(vaultmesh_root).is_dir()
}
/// Gather system metrics from /proc and df.
fn gather_metrics() -> NodeMetrics {
NodeMetrics {
uptime_seconds: read_uptime_seconds(),
load1: read_loadavg_index(0),
load5: read_loadavg_index(1),
load15: read_loadavg_index(2),
mem_total_mb: read_meminfo_total_mb(),
mem_used_mb: read_meminfo_used_mb(),
disk_root_total_gb: read_disk_root_total_gb(),
disk_root_used_gb: read_disk_root_used_gb(),
}
}
fn read_uptime_seconds() -> Option<u64> {
let contents = fs::read_to_string("/proc/uptime").ok()?;
let first = contents.split_whitespace().next()?;
let secs_str = first.split('.').next().unwrap_or(first);
secs_str.parse().ok()
}
fn read_loadavg_index(idx: usize) -> Option<f32> {
let contents = fs::read_to_string("/proc/loadavg").ok()?;
let mut parts = contents.split_whitespace();
parts.nth(idx)?.parse().ok()
}
fn read_meminfo_total_mb() -> Option<u64> {
read_meminfo_field_kb("MemTotal").map(|kb| kb / 1024)
}
fn read_meminfo_used_mb() -> Option<u64> {
let total_kb = read_meminfo_field_kb("MemTotal")?;
let avail_kb = read_meminfo_field_kb("MemAvailable")?;
let used_kb = total_kb.saturating_sub(avail_kb);
Some(used_kb / 1024)
}
fn read_meminfo_field_kb(field: &str) -> Option<u64> {
let contents = fs::read_to_string("/proc/meminfo").ok()?;
for line in contents.lines() {
if line.starts_with(field) {
let val = line.split_whitespace().nth(1)?;
return val.parse().ok();
}
}
None
}
fn read_disk_root_total_gb() -> Option<f32> {
read_df_root_column(1)
}
fn read_disk_root_used_gb() -> Option<f32> {
read_df_root_column(2)
}
fn read_df_root_column(col_index: usize) -> Option<f32> {
let output = StdCommand::new("df")
.arg("-B1")
.arg("/")
.output()
.ok()?;
if !output.status.success() {
return None;
}
let stdout = String::from_utf8_lossy(&output.stdout);
let mut lines = stdout.lines();
lines.next()?;
let line = lines.next()?;
let cols: Vec<&str> = line.split_whitespace().collect();
let val: f64 = cols.get(col_index)?.parse().ok()?;
Some((val / (1024.0 * 1024.0 * 1024.0)) as f32)
}
/// Load CC public key from file.
fn load_cc_public_key(path: &str) -> Result<VerifyingKey> {
use base64::Engine;
let p = Path::new(path);
if !p.exists() {
bail!("CC public key file not found: {}", path);
}
let contents = fs::read_to_string(p)?.trim().to_string();
let bytes = base64::engine::general_purpose::STANDARD.decode(&contents)?;
let key_bytes: [u8; 32] = bytes
.try_into()
.map_err(|_| anyhow::anyhow!("Public key must be 32 bytes"))?;
Ok(VerifyingKey::from_bytes(&key_bytes)?)
}
/// Verify a signed command.
fn verify_command(cmd: &SignedCommand, pubkey: &VerifyingKey) -> Result<()> {
use base64::Engine;
let payload_bytes = serde_json::to_vec(&cmd.payload)?;
let sig_bytes = base64::engine::general_purpose::STANDARD.decode(&cmd.signature)?;
let sig_arr: [u8; 64] = sig_bytes
.try_into()
.map_err(|_| anyhow::anyhow!("Signature must be 64 bytes"))?;
let signature = Signature::from_bytes(&sig_arr);
pubkey.verify(&payload_bytes, &signature)?;
Ok(())
}
/// Execute an allowlisted command.
fn execute_command(cmd: &SignedCommand, config: &Config) -> CommandResult {
let payload = &cmd.payload;
let allowed = ["restart-service", "service-status", "tail-journal", "sovereign-scan"];
if !allowed.contains(&payload.cmd.as_str()) {
return CommandResult {
node_id: payload.node_id,
ts: OffsetDateTime::now_utc(),
nonce: payload.nonce.clone(),
cmd: payload.cmd.clone(),
status: "rejected".into(),
exit_code: None,
stdout: String::new(),
stderr: format!("Command '{}' not in whitelist", payload.cmd),
};
}
match payload.cmd.as_str() {
"service-status" => {
let service = payload.args["service"].as_str().unwrap_or("cloudflared");
let output = StdCommand::new("systemctl")
.args(["status", service])
.output();
make_result(payload, output)
}
"restart-service" => {
let service = payload.args["service"].as_str().unwrap_or("cloudflared");
let output = StdCommand::new("systemctl")
.args(["restart", service])
.output();
make_result(payload, output)
}
"tail-journal" => {
let unit = payload.args["unit"].as_str().unwrap_or("cloudflared");
let lines = payload.args["lines"].as_u64().unwrap_or(50);
let output = StdCommand::new("journalctl")
.args(["-u", unit, "-n", &lines.to_string(), "--no-pager"])
.output();
make_result(payload, output)
}
"sovereign-scan" => {
run_sovereign_scan(payload, config)
}
_ => CommandResult {
node_id: payload.node_id,
ts: OffsetDateTime::now_utc(),
nonce: payload.nonce.clone(),
cmd: payload.cmd.clone(),
status: "error".into(),
exit_code: None,
stdout: String::new(),
stderr: "Unknown command".into(),
},
}
}
/// Convert command output to CommandResult.
fn make_result(payload: &CommandPayload, output: std::io::Result<std::process::Output>) -> CommandResult {
match output {
Ok(out) => CommandResult {
node_id: payload.node_id,
ts: OffsetDateTime::now_utc(),
nonce: payload.nonce.clone(),
cmd: payload.cmd.clone(),
status: if out.status.success() { "success".into() } else { "failed".into() },
exit_code: out.status.code(),
stdout: String::from_utf8_lossy(&out.stdout).into_owned(),
stderr: String::from_utf8_lossy(&out.stderr).into_owned(),
},
Err(e) => CommandResult {
node_id: payload.node_id,
ts: OffsetDateTime::now_utc(),
nonce: payload.nonce.clone(),
cmd: payload.cmd.clone(),
status: "error".into(),
exit_code: None,
stdout: String::new(),
stderr: format!("Failed to execute: {}", e),
},
}
}
/// Check if a scanner binary is available on the system.
fn scanner_available(name: &str) -> bool {
StdCommand::new("which")
.arg(name)
.output()
.map(|o| o.status.success())
.unwrap_or(false)
}
/// Get scanner version string.
fn get_scanner_version(name: &str) -> Option<String> {
let output = StdCommand::new(name)
.arg("--version")
.output()
.ok()?;
if output.status.success() {
let stdout = String::from_utf8_lossy(&output.stdout);
Some(stdout.lines().next().unwrap_or("unknown").to_string())
} else {
None
}
}
/// Run nuclei scanner or generate mock results.
fn run_nuclei(nonce: &str, profile: &str, receipts_dir: &str) -> ScannerResult {
if !scanner_available("nuclei") {
tracing::info!("nuclei not available, generating mock results");
return generate_mock_nuclei_result(nonce, receipts_dir);
}
let output_file = format!("{}/nuclei-{}.json", receipts_dir, nonce);
let templates = match profile {
"fast" => "cves/",
"deep" => ".",
_ => "cves/,vulnerabilities/",
};
let output = StdCommand::new("nuclei")
.args(["-t", templates, "-silent", "-json", "-o", &output_file])
.args(["-target", "localhost"])
.output();
match output {
Ok(out) if out.status.success() => {
let summary = parse_nuclei_findings(&output_file);
ScannerResult {
available: true,
ran: true,
version: get_scanner_version("nuclei"),
findings_real: true,
findings: summary.critical + summary.high + summary.medium + summary.low + summary.info,
output_file: Some(format!("nuclei-{}.json", nonce)),
error: None,
summary,
}
}
Ok(out) => ScannerResult {
available: true,
ran: true,
version: get_scanner_version("nuclei"),
findings_real: false,
findings: 0,
output_file: None,
error: Some(format!("nuclei exited with code {:?}: {}",
out.status.code(),
String::from_utf8_lossy(&out.stderr))),
summary: ScanSummary::default(),
},
Err(e) => ScannerResult {
available: true,
ran: false,
version: None,
findings_real: false,
findings: 0,
output_file: None,
error: Some(format!("Failed to run nuclei: {}", e)),
summary: ScanSummary::default(),
},
}
}
/// Generate mock nuclei results when scanner not available.
fn generate_mock_nuclei_result(nonce: &str, receipts_dir: &str) -> ScannerResult {
let output_file = format!("{}/nuclei-{}.json", receipts_dir, nonce);
let mock_findings = serde_json::json!([
{"info": {"severity": "high", "name": "MOCK: Example CVE Detection"}, "host": "localhost", "matched-at": "localhost:80"},
{"info": {"severity": "medium", "name": "MOCK: Outdated Software Version"}, "host": "localhost", "matched-at": "localhost:443"},
{"info": {"severity": "low", "name": "MOCK: Information Disclosure"}, "host": "localhost", "matched-at": "localhost:22"}
]);
fs::write(&output_file, serde_json::to_string_pretty(&mock_findings).unwrap()).ok();
ScannerResult {
available: false,
ran: false,
version: None,
findings_real: false,
findings: 3,
output_file: Some(format!("nuclei-{}.json", nonce)),
error: Some("Scanner not installed - mock data generated".into()),
summary: ScanSummary {
critical: 0,
high: 1,
medium: 1,
low: 1,
info: 0,
},
}
}
/// Parse nuclei JSON output for findings summary.
fn parse_nuclei_findings(output_file: &str) -> ScanSummary {
let mut summary = ScanSummary::default();
if let Ok(contents) = fs::read_to_string(output_file) {
for line in contents.lines() {
if let Ok(finding) = serde_json::from_str::<serde_json::Value>(line) {
if let Some(severity) = finding["info"]["severity"].as_str() {
match severity.to_lowercase().as_str() {
"critical" => summary.critical += 1,
"high" => summary.high += 1,
"medium" => summary.medium += 1,
"low" => summary.low += 1,
"info" => summary.info += 1,
_ => {}
}
}
}
}
}
summary
}
/// Run trivy scanner or generate mock results.
fn run_trivy(nonce: &str, profile: &str, receipts_dir: &str) -> ScannerResult {
if !scanner_available("trivy") {
tracing::info!("trivy not available, generating mock results");
return generate_mock_trivy_result(nonce, receipts_dir);
}
let output_file = format!("{}/trivy-{}.json", receipts_dir, nonce);
let severity = match profile {
"fast" => "CRITICAL,HIGH",
"deep" => "CRITICAL,HIGH,MEDIUM,LOW",
_ => "CRITICAL,HIGH,MEDIUM",
};
let output = StdCommand::new("trivy")
.args(["fs", "/", "--format", "json", "--output", &output_file])
.args(["--severity", severity])
.args(["--scanners", "vuln"])
.output();
match output {
Ok(out) if out.status.success() => {
let summary = parse_trivy_findings(&output_file);
ScannerResult {
available: true,
ran: true,
version: get_scanner_version("trivy"),
findings_real: true,
findings: summary.critical + summary.high + summary.medium + summary.low + summary.info,
output_file: Some(format!("trivy-{}.json", nonce)),
error: None,
summary,
}
}
Ok(out) => ScannerResult {
available: true,
ran: true,
version: get_scanner_version("trivy"),
findings_real: false,
findings: 0,
output_file: None,
error: Some(format!("trivy exited with code {:?}: {}",
out.status.code(),
String::from_utf8_lossy(&out.stderr))),
summary: ScanSummary::default(),
},
Err(e) => ScannerResult {
available: true,
ran: false,
version: None,
findings_real: false,
findings: 0,
output_file: None,
error: Some(format!("Failed to run trivy: {}", e)),
summary: ScanSummary::default(),
},
}
}
/// Generate mock trivy results when scanner not available.
fn generate_mock_trivy_result(nonce: &str, receipts_dir: &str) -> ScannerResult {
let output_file = format!("{}/trivy-{}.json", receipts_dir, nonce);
let mock_findings = serde_json::json!({
"Results": [{
"Target": "/",
"Vulnerabilities": [
{"VulnerabilityID": "MOCK-CVE-2024-0001", "Severity": "CRITICAL", "Title": "MOCK: Critical System Vulnerability"},
{"VulnerabilityID": "MOCK-CVE-2024-0002", "Severity": "HIGH", "Title": "MOCK: High Risk Package"},
{"VulnerabilityID": "MOCK-CVE-2024-0003", "Severity": "HIGH", "Title": "MOCK: Outdated Dependency"},
{"VulnerabilityID": "MOCK-CVE-2024-0004", "Severity": "MEDIUM", "Title": "MOCK: Medium Risk Finding"},
{"VulnerabilityID": "MOCK-CVE-2024-0005", "Severity": "MEDIUM", "Title": "MOCK: Configuration Issue"}
]
}]
});
fs::write(&output_file, serde_json::to_string_pretty(&mock_findings).unwrap()).ok();
ScannerResult {
available: false,
ran: false,
version: None,
findings_real: false,
findings: 5,
output_file: Some(format!("trivy-{}.json", nonce)),
error: Some("Scanner not installed - mock data generated".into()),
summary: ScanSummary {
critical: 1,
high: 2,
medium: 2,
low: 0,
info: 0,
},
}
}
/// Parse trivy JSON output for findings summary.
fn parse_trivy_findings(output_file: &str) -> ScanSummary {
let mut summary = ScanSummary::default();
if let Ok(contents) = fs::read_to_string(output_file) {
if let Ok(report) = serde_json::from_str::<serde_json::Value>(&contents) {
if let Some(results) = report["Results"].as_array() {
for result in results {
if let Some(vulns) = result["Vulnerabilities"].as_array() {
for vuln in vulns {
if let Some(severity) = vuln["Severity"].as_str() {
match severity.to_uppercase().as_str() {
"CRITICAL" => summary.critical += 1,
"HIGH" => summary.high += 1,
"MEDIUM" => summary.medium += 1,
"LOW" => summary.low += 1,
_ => summary.info += 1,
}
}
}
}
}
}
}
}
summary
}
/// Execute the sovereign-scan command.
fn run_sovereign_scan(payload: &CommandPayload, config: &Config) -> CommandResult {
use time::format_description::well_known::Rfc3339;
let profile = payload.args["profile"].as_str().unwrap_or("default");
let scanners: Vec<&str> = payload.args["scanners"]
.as_array()
.map(|arr| arr.iter().filter_map(|v| v.as_str()).collect())
.unwrap_or_else(|| vec!["nuclei", "trivy"]);
tracing::info!("Running sovereign-scan: profile={}, scanners={:?}", profile, scanners);
// Create receipts directory
let receipts_dir = format!("{}/receipts/sovereign", config.vaultmesh_root);
if let Err(e) = fs::create_dir_all(&receipts_dir) {
return CommandResult {
node_id: payload.node_id,
ts: OffsetDateTime::now_utc(),
nonce: payload.nonce.clone(),
cmd: "sovereign-scan".into(),
status: "error".into(),
exit_code: None,
stdout: String::new(),
stderr: format!("Failed to create receipts directory: {}", e),
};
}
let mut scanner_results = serde_json::Map::new();
let mut summary = ScanSummary::default();
// Run each requested scanner
for scanner in &scanners {
match *scanner {
"nuclei" => {
let result = run_nuclei(&payload.nonce, profile, &receipts_dir);
summary.merge(&result.summary);
scanner_results.insert("nuclei".into(), serde_json::to_value(&result).unwrap());
}
"trivy" => {
let result = run_trivy(&payload.nonce, profile, &receipts_dir);
summary.merge(&result.summary);
scanner_results.insert("trivy".into(), serde_json::to_value(&result).unwrap());
}
_ => {
tracing::warn!("Unknown scanner requested: {}", scanner);
}
}
}
// Build receipt (without hash first)
let mut receipt = serde_json::json!({
"kind": "sovereign-scan",
"node_id": payload.node_id,
"nonce": payload.nonce,
"timestamp": OffsetDateTime::now_utc().format(&Rfc3339).unwrap(),
"os_profile": config.os_profile,
"vaultmesh_root": config.vaultmesh_root,
"profile": profile,
"scanners": scanner_results,
"summary": summary
});
// Hash the receipt content (before adding hash field)
let receipt_bytes = serde_json::to_vec(&receipt).unwrap();
let hash = blake3::hash(&receipt_bytes);
let hash_hex = format!("blake3:{}", hash.to_hex());
receipt["receipt_hash"] = serde_json::Value::String(hash_hex.clone());
// Write receipt file
let receipt_path = format!("{}/SCAN-{}.json", receipts_dir, payload.nonce);
if let Err(e) = fs::write(&receipt_path, serde_json::to_string_pretty(&receipt).unwrap()) {
tracing::error!("Failed to write receipt: {}", e);
}
tracing::info!("Sovereign scan complete: receipt_hash={}", hash_hex);
// Append to proofchain log
let proofchain_dir = format!("{}/proofchain", config.vaultmesh_root);
fs::create_dir_all(&proofchain_dir).ok();
let proofchain_entry = serde_json::json!({
"ts": OffsetDateTime::now_utc().format(&Rfc3339).unwrap(),
"node_id": payload.node_id,
"nonce": payload.nonce,
"receipt_hash": hash_hex
});
let proofchain_path = format!("{}/sovereign-scans.jsonl", proofchain_dir);
if let Ok(mut file) = fs::OpenOptions::new()
.create(true)
.append(true)
.open(&proofchain_path)
{
use std::io::Write;
writeln!(file, "{}", serde_json::to_string(&proofchain_entry).unwrap()).ok();
}
// Determine if any scanner had real findings
let any_real = scanner_results.values().any(|v| {
v.get("findings_real").and_then(|f| f.as_bool()).unwrap_or(false)
});
// Return summary as stdout
let stdout = serde_json::to_string_pretty(&serde_json::json!({
"status": "completed",
"receipt_path": receipt_path,
"receipt_hash": hash_hex,
"summary": summary,
"findings_real": any_real
}))
.unwrap();
CommandResult {
node_id: payload.node_id,
ts: OffsetDateTime::now_utc(),
nonce: payload.nonce.clone(),
cmd: "sovereign-scan".into(),
status: "success".into(),
exit_code: Some(0),
stdout,
stderr: String::new(),
}
}
#[tokio::main]
async fn main() {
tracing_subscriber::registry()
.with(tracing_subscriber::EnvFilter::new(
std::env::var("RUST_LOG").unwrap_or_else(|_| "info".into()),
))
.with(tracing_subscriber::fmt::layer())
.init();
let config = Config::from_env();
let hostname = hostname::get()
.map(|h| h.to_string_lossy().to_string())
.unwrap_or_else(|_| "unknown".into());
tracing::info!(
"VaultMesh Node Agent starting (node_id={}, hostname={})",
config.node_id,
hostname
);
tracing::info!("Command Center URL: {}", config.cc_url);
tracing::info!("Heartbeat interval: {}s", config.heartbeat_secs);
// Load CC public key for signature verification
let cc_pubkey_path = std::env::var("VAULTMESH_CC_PUBKEY")
.unwrap_or_else(|_| "/etc/vaultmesh/cc-ed25519.pub".into());
let cc_pubkey = match load_cc_public_key(&cc_pubkey_path) {
Ok(pk) => {
tracing::info!("Loaded CC public key from {}", cc_pubkey_path);
Some(pk)
}
Err(e) => {
tracing::warn!("Could not load CC public key: {} - commands disabled", e);
None
}
};
let client = reqwest::Client::new();
let heartbeat_url = format!("{}/api/agent/heartbeat", config.cc_url);
let commands_url = format!("{}/api/agent/commands?node_id={}", config.cc_url, config.node_id);
let result_url = format!("{}/api/agent/command-result", config.cc_url);
loop {
// Send heartbeat
let metrics = gather_metrics();
let heartbeat = NodeHeartbeat {
node_id: config.node_id,
hostname: hostname.clone(),
os_profile: config.os_profile.clone(),
cloudflare_ok: check_cloudflared(),
services_ok: check_services(&config.vaultmesh_root),
vaultmesh_root: config.vaultmesh_root.clone(),
timestamp: OffsetDateTime::now_utc(),
metrics,
};
tracing::debug!("Sending heartbeat: {:?}", heartbeat);
match client.post(&heartbeat_url).json(&heartbeat).send().await {
Ok(resp) if resp.status().is_success() => {
tracing::info!(
"Heartbeat sent (cf={}, svc={}, load={:.2}, mem={}MiB, disk={:.1}GiB)",
heartbeat.cloudflare_ok,
heartbeat.services_ok,
heartbeat.metrics.load1.unwrap_or(0.0),
heartbeat.metrics.mem_used_mb.unwrap_or(0),
heartbeat.metrics.disk_root_used_gb.unwrap_or(0.0),
);
}
Ok(resp) => {
tracing::warn!("Heartbeat returned non-success status: {}", resp.status());
}
Err(e) => {
tracing::error!("Failed to send heartbeat: {}", e);
}
}
// Poll for commands (only if we have the public key)
if let Some(ref pubkey) = cc_pubkey {
match client.get(&commands_url).send().await {
Ok(resp) if resp.status().is_success() => {
match resp.json::<Vec<SignedCommand>>().await {
Ok(commands) => {
for cmd in commands {
tracing::info!("Received command: {} (nonce={})", cmd.payload.cmd, cmd.payload.nonce);
// Verify signature
if let Err(e) = verify_command(&cmd, pubkey) {
tracing::error!("Signature verification failed: {}", e);
let result = CommandResult {
node_id: cmd.payload.node_id,
ts: OffsetDateTime::now_utc(),
nonce: cmd.payload.nonce.clone(),
cmd: cmd.payload.cmd.clone(),
status: "rejected".into(),
exit_code: None,
stdout: String::new(),
stderr: format!("Signature verification failed: {}", e),
};
let _ = client.post(&result_url).json(&result).send().await;
continue;
}
tracing::info!("Signature verified, executing command");
let result = execute_command(&cmd, &config);
tracing::info!(
"Command '{}' completed: {} (exit={:?})",
result.cmd,
result.status,
result.exit_code
);
// Report result
if let Err(e) = client.post(&result_url).json(&result).send().await {
tracing::error!("Failed to report command result: {}", e);
}
}
}
Err(e) => {
tracing::warn!("Failed to parse commands response: {}", e);
}
}
}
Ok(resp) => {
tracing::warn!("Commands endpoint returned: {}", resp.status());
}
Err(e) => {
tracing::debug!("Failed to poll commands: {}", e);
}
}
}
sleep(Duration::from_secs(config.heartbeat_secs)).await;
}
}

113
scripts/event_generation_demo.py Executable file

@@ -0,0 +1,113 @@
#!/usr/bin/env python3
"""
VaultMesh Command Center: Event Generation Demonstration
This script simulates the event generation mechanism of the VaultMesh Command Center.
"""
import uuid
import json
import time
from datetime import datetime, timezone
import requests
class EventGenerator:
def __init__(self, cc_url="http://localhost:8088"):
self.cc_url = cc_url
self.node_id = str(uuid.uuid4())
self.hostname = "demo-node"
self.os_profile = "DemoVault"
def generate_heartbeat(self):
"""Generate a heartbeat event payload."""
return {
"node_id": self.node_id,
"hostname": self.hostname,
"os_profile": self.os_profile,
"cloudflare_ok": True,
"services_ok": True,
"vaultmesh_root": "/var/lib/vaultmesh",
"timestamp": datetime.now(timezone.utc).isoformat(),
"metrics": {
"uptime_seconds": int(time.time()),
"load1": 0.5,
"load5": 0.3,
"load15": 0.2
}
}
def generate_scan_result(self):
"""Generate a mock scan result event payload."""
return {
"cmd": "sovereign-scan",
"node_id": self.node_id,
"ts": datetime.now(timezone.utc).isoformat(),
"status": "success",
"stdout": json.dumps({
"summary": {
"critical": 1,
"high": 3,
"medium": 5,
"low": 10,
"info": 15
},
"findings_real": False,
"receipt_hash": str(uuid.uuid4())
}),
"exit_code": 0
}
def generate_command_result(self):
"""Generate a mock command result event payload."""
return {
"node_id": self.node_id,
"cmd": "service-status",
"ts": datetime.now(timezone.utc).isoformat(),
"status": "success",
"stdout": "cloudflared: active\nnginx: active",
"exit_code": 0,
"nonce": str(uuid.uuid4())
}
def send_event(self, event_type, event_data):
"""Send an event to the Command Center."""
try:
response = requests.post(
f"{self.cc_url}/api/events",
json={
"kind": event_type,
"node_id": self.node_id,
"body": event_data
},
headers={"Content-Type": "application/json"}
)
response.raise_for_status()
print(f"[SUCCESS] Sent {event_type} event for node {self.node_id}")
return response.json()
except requests.RequestException as e:
print(f"[ERROR] Failed to send {event_type} event: {e}")
return None
def main():
generator = EventGenerator()
print("🚀 VaultMesh Command Center Event Generation Demo")
print("---------------------------------------------")
# Simulate event generation
events = [
("heartbeat", generator.generate_heartbeat()),
("scan", generator.generate_scan_result()),
("command", generator.generate_command_result())
]
for event_type, event_data in events:
print(f"\nGenerating {event_type.upper()} event:")
print(json.dumps(event_data, indent=2))
result = generator.send_event(event_type, event_data)
if result:
print(f"Event ID: {result.get('id')}")
time.sleep(1) # Add a small delay between events
if __name__ == "__main__":
main()


@@ -0,0 +1,23 @@
[Unit]
Description=VaultMesh Command Center
After=network-online.target cloudflared.service
Wants=network-online.target
[Service]
Type=exec
ExecStart=/usr/local/bin/vaultmesh-command-center
Restart=on-failure
RestartSec=5
Environment=RUST_LOG=info
# Hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
[Install]
WantedBy=multi-user.target


@@ -0,0 +1,23 @@
[Unit]
Description=VaultMesh Node Agent
After=network-online.target cloudflared.service
Wants=network-online.target
[Service]
Type=exec
ExecStart=/usr/local/bin/vaultmesh-node-agent
Restart=on-failure
RestartSec=10
EnvironmentFile=/etc/vaultmesh/agent.env
# Hardening
NoNewPrivileges=yes
ProtectSystem=strict
# The agent writes receipts under VAULTMESH_ROOT; allow that path explicitly.
ReadWritePaths=/var/lib/vaultmesh
ProtectHome=yes
PrivateTmp=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
[Install]
WantedBy=multi-user.target


@@ -0,0 +1 @@
requests>=2.28.0


@@ -0,0 +1,305 @@
#!/usr/bin/env python3
"""
VaultMesh vm-copilot Agent
A bot that monitors Command Center nodes and posts incidents/notes
when attention conditions are detected.
Usage:
python vm_copilot_agent.py
Environment variables:
VAULTMESH_CC_URL - Command Center URL (default: http://127.0.0.1:8088)
VM_COPILOT_INTERVAL - Poll interval in seconds (default: 60)
VM_COPILOT_NODES - Comma-separated list of node IDs to watch
"""
import os
import sys
import time
import logging
from datetime import datetime, timezone
from typing import Optional
import requests
# -----------------------------------------------------------------------------
# Configuration
# -----------------------------------------------------------------------------
CC_URL = os.environ.get("VAULTMESH_CC_URL", "http://127.0.0.1:8088")
POLL_INTERVAL = int(os.environ.get("VM_COPILOT_INTERVAL", "60"))
# Node IDs to watch (can be overridden via env var)
DEFAULT_NODES = [
"b5d5a72e-71d9-46d7-a4d7-a1c89d53224b", # gamma (machine-id derived)
]
WATCH_NODES = os.environ.get("VM_COPILOT_NODES", ",".join(DEFAULT_NODES)).split(",")
WATCH_NODES = [n.strip() for n in WATCH_NODES if n.strip()]
AUTHOR = "vm-copilot"
# Logging
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
)
log = logging.getLogger("vm-copilot")
# -----------------------------------------------------------------------------
# State Tracking
# -----------------------------------------------------------------------------
# Track conditions we've already alerted on: node_id -> set of condition keys
alerted_conditions: dict[str, set[str]] = {}
# Track last event timestamp per node for incremental fetching
last_event_ts: dict[str, str] = {}
# -----------------------------------------------------------------------------
# HTTP Helpers
# -----------------------------------------------------------------------------
def get_node(node_id: str) -> Optional[dict]:
"""Fetch a single node's state from CC by querying the nodes list."""
try:
# /nodes returns JSON array of all nodes
resp = requests.get(f"{CC_URL}/nodes", timeout=10)
if resp.status_code == 200:
nodes = resp.json()
# Find the node by ID
for node in nodes:
if node.get("node_id") == node_id:
return node
log.warning(f"Node {node_id} not found in nodes list")
return None
else:
log.error(f"Failed to get nodes: {resp.status_code}")
return None
except requests.RequestException as e:
log.error(f"Request error getting nodes: {e}")
return None
def get_events(
node_id: str,
since: Optional[str] = None,
kinds: Optional[list[str]] = None,
limit: int = 50,
) -> list[dict]:
"""Query events for a node from CC."""
params = {"node_id": node_id, "limit": str(limit)}
if since:
params["since"] = since
if kinds:
params["kind"] = ",".join(kinds)
try:
resp = requests.get(f"{CC_URL}/api/events", params=params, timeout=10)
if resp.status_code == 200:
return resp.json()
else:
log.error(f"Failed to get events: {resp.status_code}")
return []
except requests.RequestException as e:
log.error(f"Request error getting events: {e}")
return []
def post_event(kind: str, node_id: str, body: dict) -> Optional[dict]:
"""Post an event to CC."""
payload = {
"kind": kind,
"node_id": node_id,
"author": AUTHOR,
"body": body,
}
try:
resp = requests.post(
f"{CC_URL}/api/events",
json=payload,
headers={"Content-Type": "application/json"},
timeout=10,
)
if resp.status_code == 200:
result = resp.json()
log.info(f"Posted {kind} event: {result.get('id', 'unknown')}")
return result
else:
log.error(f"Failed to post event: {resp.status_code} - {resp.text}")
return None
except requests.RequestException as e:
log.error(f"Request error posting event: {e}")
return None
# -----------------------------------------------------------------------------
# Rule Engine
# -----------------------------------------------------------------------------
def analyze_node(node: dict, recent_events: list[dict]) -> list[tuple[str, str, dict]]:
"""
Analyze a node's state and return events to emit.
Returns:
List of (kind, condition_key, body) tuples
"""
to_emit = []
attention = node.get("attention", [])
hostname = node.get("hostname", "unknown")
node_id = node.get("node_id", "unknown")
# Rule 1: critical_findings - needs attention
if "critical_findings" in attention:
scan_summary = node.get("last_scan_summary")
crit_count = 0
if scan_summary:
crit_count = scan_summary.get("critical", 0) + scan_summary.get("high", 0)
to_emit.append((
"incident",
"critical_findings",
{
"title": "Critical Findings Detected",
"description": f"Node {hostname} has {crit_count} critical/high scan findings requiring attention",
"severity": "high",
},
))
to_emit.append((
"note",
"critical_findings_note",
{
"text": f"Auto-detected critical_findings on node {hostname}. Recommend manual review of scan results.",
"severity": "warn",
},
))
# Rule 2: heartbeat_stale - node may be offline
if "heartbeat_stale" in attention:
last_seen = node.get("last_seen", "unknown")
to_emit.append((
"incident",
"heartbeat_stale",
{
"title": "Node Heartbeat Stale",
"description": f"Node {hostname} has not sent a heartbeat recently. Last seen: {last_seen}",
"severity": "high",
},
))
# Rule 3: cloudflare_down - informational note
if "cloudflare_down" in attention:
to_emit.append((
"note",
"cloudflare_down",
{
"text": f"Node {hostname} reports Cloudflare tunnel is down.",
"severity": "warn",
},
))
# Rule 4: never_scanned - needs initial scan
if "never_scanned" in attention:
to_emit.append((
"note",
"never_scanned",
{
"text": f"Node {hostname} has never been scanned. Recommend triggering initial sovereign scan.",
"severity": "info",
},
))
return to_emit
def has_recent_operator_ack(events: list[dict]) -> bool:
"""Check if there's a recent ack from someone other than vm-copilot."""
for ev in events:
if ev.get("kind") == "ack":
author = ev.get("author", "")
if author and author != AUTHOR:
return True
return False
# -----------------------------------------------------------------------------
# Main Loop
# -----------------------------------------------------------------------------
def run_cycle():
"""Run a single monitoring cycle."""
for node_id in WATCH_NODES:
log.debug(f"Checking node {node_id}")
# Fetch node state
node = get_node(node_id)
if not node:
continue
hostname = node.get("hostname", node_id[:8])
attention = node.get("attention", [])
log.info(f"Node {hostname}: attention={attention}")
# Fetch recent events
since = last_event_ts.get(node_id)
events = get_events(
node_id,
since=since,
kinds=["note", "incident", "ack", "resolve"],
)
# Update last seen timestamp
if events:
# Events are returned oldest-first by default, get the latest
latest_ts = max(ev.get("ts", "") for ev in events)
if latest_ts:
last_event_ts[node_id] = latest_ts
# Check for operator acks
recent_ack = has_recent_operator_ack(events)
if recent_ack:
log.info(f"Node {hostname}: operator ack detected, resetting alerts")
alerted_conditions.pop(node_id, None)
# Analyze and emit events
to_emit = analyze_node(node, events)
for kind, condition_key, body in to_emit:
# Skip if we've already alerted on this condition
node_alerts = alerted_conditions.setdefault(node_id, set())
if condition_key in node_alerts:
log.debug(f"Already alerted on {condition_key} for {hostname}")
continue
# Post the event
result = post_event(kind, node_id, body)
if result:
node_alerts.add(condition_key)
def main():
"""Main entry point."""
log.info("=" * 60)
log.info("VaultMesh vm-copilot starting")
log.info(f" CC URL: {CC_URL}")
log.info(f" Poll interval: {POLL_INTERVAL}s")
log.info(f" Watching nodes: {WATCH_NODES}")
log.info("=" * 60)
while True:
try:
run_cycle()
except Exception as e:
log.exception(f"Error in monitoring cycle: {e}")
log.info(f"Sleeping {POLL_INTERVAL}s...")
time.sleep(POLL_INTERVAL)
if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
log.info("Shutting down...")
sys.exit(0)