vm-core/spec/sentinel/canonicalization.md
2025-12-27 00:10:32 +00:00

# Sentinel v1 Canonicalization & Hashing Rules
This document defines deterministic event hashing and Merkle root computation for Sentinel v1. Verification MUST be deterministic across platforms given the same artifacts.
## 1) Hash function (`hash_algo`)
`hash_algo` MUST be one of:
- `blake3` (recommended)
- `sha256` (fallback for constrained platforms)
The chosen `hash_algo` MUST be constant for a given Sentinel instance/build. Verifiers MUST reject mixed algorithms within a single bundle unless explicitly versioned.
### 1.1 `vmhash`
`vmhash(data: bytes) -> string` returns:
- `"blake3:" + hex(blake3(data))` when `hash_algo=blake3`
- `"sha256:" + hex(sha256(data))` when `hash_algo=sha256`
`hex(...)` is lowercase hex with no separators.
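A minimal Python sketch of `vmhash` as defined above. Passing `hash_algo` as an explicit argument is an illustrative choice, since the spec treats it as a per-instance constant. The BLAKE3 branch assumes the third-party `blake3` package is installed (the spec does not name a library); SHA-256 uses the standard-library `hashlib`.

```python
import hashlib


def vmhash(data: bytes, hash_algo: str) -> str:
    """Return "<algo>:<lowercase hex digest>" for the given bytes."""
    if hash_algo == "sha256":
        digest = hashlib.sha256(data).hexdigest()
    elif hash_algo == "blake3":
        # Assumption: the third-party `blake3` package is available.
        from blake3 import blake3
        digest = blake3(data).hexdigest()
    else:
        raise ValueError(f"unsupported hash_algo: {hash_algo!r}")
    # hexdigest() already yields lowercase hex with no separators.
    return f"{hash_algo}:{digest}"
```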
## 2) JSON canonicalization (`canonicalization_version`)
`canonicalization_version` for Sentinel v1 events is:
- `sentinel-event-jcs-v1`
Canonical JSON MUST use RFC 8785 (JSON Canonicalization Scheme, “JCS”):
- UTF-8 encoding
- Object keys sorted by the UTF-16 code units of their names (no locale-aware collation)
- No insignificant whitespace
- Numbers encoded per JCS rules (ECMAScript serialization of IEEE 754 double-precision values)
If a platform cannot implement full JCS, it MUST NOT claim `sentinel-event-jcs-v1`.
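To make these rules concrete, here is a sketch that produces JCS-style bytes for a restricted subset of JSON. It is not a full RFC 8785 implementation, so on its own it would not justify claiming `sentinel-event-jcs-v1`. The name `jcs_bytes_subset` is hypothetical. It rejects floats, because JCS number serialization follows ECMAScript rules that Python's `json` module does not reproduce in general, and it assumes Python's string escaping matches JCS for the text involved.

```python
import json


def jcs_bytes_subset(value) -> bytes:
    """Approximate RFC 8785 output for JSON without floating-point numbers."""

    def normalize(v):
        if v is None or isinstance(v, (bool, str)):
            return v
        if isinstance(v, int):
            # JCS numbers are IEEE 754 doubles; larger integers lose precision.
            if abs(v) > 2**53 - 1:
                raise ValueError("integer outside the exactly representable range")
            return v
        if isinstance(v, float):
            raise ValueError("floats require full JCS number serialization")
        if isinstance(v, list):
            return [normalize(item) for item in v]
        if isinstance(v, dict):
            if not all(isinstance(k, str) for k in v):
                raise TypeError("object keys must be strings")
            # Comparing big-endian UTF-16 bytes orders keys by UTF-16 code units.
            keys = sorted(v, key=lambda k: k.encode("utf-16-be"))
            return {k: normalize(v[k]) for k in keys}
        raise TypeError(f"not a JSON value: {type(v).__name__}")

    # Dicts keep insertion order, so the sorted key order survives json.dumps.
    text = json.dumps(normalize(value), ensure_ascii=False, separators=(",", ":"))
    return text.encode("utf-8")
```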
## 3) Event canonical bytes
Each exported event is a JSON object that conforms to `event.schema.json`.
`event_canonical_bytes` is the UTF-8 bytes of the JCS-canonicalized event object. For hashing, the `event_hash` property is removed before canonicalization (see section 4.1).
## 4) Event hash + hash chain
### 4.1 `event_hash`
`event_hash` MUST be computed over the canonical bytes of the event object *excluding* the `event_hash` field itself.
Define:
- `event_without_event_hash = event` with the `event_hash` property removed (if present)
- `event_canonical_bytes = jcs_bytes(event_without_event_hash)`
Then:
`event_hash = vmhash(event_canonical_bytes)`
For exported artifacts, `event_hash` MUST be present in the event record and verifiers MUST recompute and compare it.
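The definitions above can be sketched in Python as follows. Both helpers are stand-ins chosen so the example runs with only the standard library. `jcs_bytes` here uses `json.dumps` and is adequate only for ASCII keys and values without floats, and `vmhash` is fixed to SHA-256. The function names `compute_event_hash` and `verify_event_hash` are hypothetical.

```python
import hashlib
import json


def jcs_bytes(obj) -> bytes:
    # Stand-in for a real RFC 8785 implementation (assumption, see lead-in).
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def vmhash(data: bytes) -> str:
    # Fixed to sha256 for this sketch; a real instance may use blake3.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def compute_event_hash(event: dict) -> str:
    # Remove `event_hash` (if present) before canonicalizing.
    event_without_event_hash = {k: v for k, v in event.items() if k != "event_hash"}
    return vmhash(jcs_bytes(event_without_event_hash))


def verify_event_hash(event: dict) -> bool:
    # The stored hash must be present and must equal the recomputed value.
    return event.get("event_hash") == compute_event_hash(event)
```

Because the `event_hash` field is stripped before hashing, recomputing over a stored record gives the same value as hashing the record before the field was added.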
### 4.2 `prev_event_hash`
- For `seq = 0` (or the first event in a new ledger): `prev_event_hash = "0"`
- For `seq = n > 0`: `prev_event_hash` MUST equal the computed `event_hash` of the immediately preceding event (`seq = n-1`) in the same ledger.
This provides fast tamper evidence even without Merkle recomputation.
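A sketch of the chain check, assuming the events are supplied in `seq` order starting at `seq = 0`. The function name `verify_chain` is hypothetical, and the hash function is passed in as `compute_event_hash` so the sketch stays independent of any particular canonicalization or hashing code.

```python
from typing import Callable, Iterable


def verify_chain(events: Iterable[dict],
                 compute_event_hash: Callable[[dict], str]) -> bool:
    """Check prev_event_hash linkage across a ledger that starts at seq 0."""
    expected_prev = "0"  # required value for the first event in a ledger
    for event in events:
        if event.get("prev_event_hash") != expected_prev:
            return False
        # Link against the recomputed hash rather than the stored field.
        expected_prev = compute_event_hash(event)
    return True
```

A bundle that starts at a later `seq` would need a trusted starting value in place of `"0"`; this sketch does not cover that case.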
## 5) Operation digest (`op_digest`)
`op_digest` commits to the *normalized* operation descriptor.
Define the normalized object:
```json
{
  "op": "<op>",
  "params": { "canonical": "params" }
}
```
Normalization rules:
- `op` MUST be a stable, versioned identifier (e.g., `sentinel.export_seal.v1`).
- `params` MUST be valid JSON (no NaN/Infinity); where possible, omit unset fields rather than setting them to null.
- Canonicalize the object using `sentinel-event-jcs-v1`, then hash:
`op_digest = vmhash(jcs_bytes({"op": op, "params": params}))`
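As an illustration of the rules above, the following sketch computes an `op_digest`. It makes two simplifying assumptions: `json.dumps` with sorted keys stands in for a real JCS implementation (adequate only for ASCII keys and values without floats), and the hash is fixed to SHA-256 so the example needs only the standard library.

```python
import hashlib
import json


def op_digest(op: str, params: dict) -> str:
    """Hash the normalized {"op", "params"} descriptor."""
    normalized = {"op": op, "params": params}
    # allow_nan=False makes json.dumps raise ValueError on NaN or Infinity.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False, allow_nan=False).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()
```

Because the descriptor is canonicalized before hashing, the key order in which a caller builds `params` does not affect the digest.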
## 6) Merkle root (`ROOT.current.txt`)
### 6.1 Leaves
The Merkle tree commits to the ordered list of event hashes:
`leaves = [event_hash(seq=0), event_hash(seq=1), ...]`
Each leaf is a `vmhash` string (`algo:hex`).
Note on ranged bundles: A verifier can only recompute the global Merkle root for an arbitrary `since_seq > 0` bundle if it is also given a verifiable Merkle continuation state (e.g., a frontier snapshot) at `since_seq-1`. Otherwise, verification MUST fall back to hash-chain + file-integrity checks for that range, or the bundle MUST start at `since_seq = 0`.
### 6.2 Parent computation (VaultMesh-style)
To compute a parent from two children:
- Let `left_hex = left.split(":", 1)[-1]`
- Let `right_hex = right.split(":", 1)[-1]`
- `parent = vmhash( (left_hex + right_hex).encode("utf-8") )`
If the level has an odd count, duplicate the last element (i.e., `right = left`).
### 6.3 Empty tree root
If there are no leaves, the root MUST be:
`vmhash(b"empty")`
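The parent rule, odd-level duplication, and empty-tree case can be combined into a short sketch. Two points are assumptions rather than statements of the spec: `vmhash` is fixed to SHA-256 here, and a level containing a single node is treated as the root, so a one-leaf tree returns that leaf unchanged. The names `merkle_parent` and `merkle_root` are hypothetical.

```python
import hashlib


def vmhash(data: bytes) -> str:
    # Fixed to sha256 for this sketch; a real instance may use blake3.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def merkle_parent(left: str, right: str) -> str:
    # Strip the "algo:" prefix, concatenate the hex strings, hash their UTF-8 bytes.
    left_hex = left.split(":", 1)[-1]
    right_hex = right.split(":", 1)[-1]
    return vmhash((left_hex + right_hex).encode("utf-8"))


def merkle_root(leaves: list[str]) -> str:
    if not leaves:
        return vmhash(b"empty")
    level = list(leaves)
    # Assumption: a single remaining node is the root (no self-pairing).
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: duplicate the last element
        level = [merkle_parent(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Note that the parent hash is taken over the hex text of the children, not over their raw digest bytes, so implementations that concatenate raw bytes would produce different roots.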
### 6.4 Root publication file format
`ROOT.current.txt` MUST be human-readable and parseable as key/value lines:
```
format=vm-sentinel-root-v1
root=<algo:hex>
seq=<u64>
updated_at=<ISO-8601 Z>
hash_algo=<blake3|sha256>
canonicalization_version=sentinel-event-jcs-v1
```
Additional keys MAY be included, but verifiers MUST ignore unknown keys.
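A minimal parser sketch for this file format. It makes several assumptions beyond the text above: all six listed keys are treated as required, lines that are blank or lack `=` are skipped, and `seq` is converted to an integer. The function name `parse_root_file` is hypothetical.

```python
KNOWN_KEYS = {"format", "root", "seq", "updated_at",
              "hash_algo", "canonicalization_version"}


def parse_root_file(text: str) -> dict:
    """Parse key=value lines, keeping only keys defined for vm-sentinel-root-v1."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # assumption: blank or malformed lines are skipped
        key, value = line.split("=", 1)
        if key in KNOWN_KEYS:  # unknown keys are ignored
            fields[key] = value
    missing = KNOWN_KEYS - fields.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if fields["format"] != "vm-sentinel-root-v1":
        raise ValueError(f"unexpected format: {fields['format']!r}")
    fields["seq"] = int(fields["seq"])
    return fields
```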