vm-core/spec/sentinel/canonicalization.md
2025-12-27 00:10:32 +00:00

Sentinel v1 Canonicalization & Hashing Rules

This document defines deterministic event hashing and Merkle root computation for Sentinel v1. Verification MUST be deterministic across platforms given the same artifacts.

1) Hash function (hash_algo)

hash_algo MUST be one of:

  • blake3 (recommended)
  • sha256 (fallback for constrained platforms)

The chosen hash_algo MUST be constant for a given Sentinel instance/build. Verifiers MUST reject mixed algorithms within a single bundle unless explicitly versioned.

1.1 vmhash

vmhash(data: bytes) -> string returns:

  • "blake3:" + hex(blake3(data)) when hash_algo=blake3
  • "sha256:" + hex(sha256(data)) when hash_algo=sha256

hex(...) is lowercase hex with no separators.
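
As an illustration, a minimal Python sketch of vmhash follows. The hash_algo keyword argument is an assumption made for the example (the spec fixes the algorithm per instance/build), and the blake3 branch assumes the third-party blake3 package is installed; only the sha256 branch relies on the standard library.

```python
import hashlib

def vmhash(data: bytes, hash_algo: str = "sha256") -> str:
    """Return "<algo>:<lowercase hex digest>" for the given bytes."""
    if hash_algo == "sha256":
        digest = hashlib.sha256(data).hexdigest()
    elif hash_algo == "blake3":
        # Assumption: the third-party `blake3` package is available.
        import blake3
        digest = blake3.blake3(data).hexdigest()
    else:
        raise ValueError(f"unsupported hash_algo: {hash_algo!r}")
    # hexdigest() is already lowercase hex with no separators.
    return f"{hash_algo}:{digest}"
```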

2) JSON canonicalization (canonicalization_version)

canonicalization_version for Sentinel v1 events is:

  • sentinel-event-jcs-v1

Canonical JSON MUST use RFC 8785 (JSON Canonicalization Scheme, “JCS”):

  • UTF-8 encoding
  • Object keys sorted by their UTF-16 code units (the RFC 8785 property ordering)
  • No insignificant whitespace
  • Numbers serialized per the JCS rules (ECMAScript number-to-string formatting)

If a platform cannot implement full JCS, it MUST NOT claim sentinel-event-jcs-v1.
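
The following Python sketch illustrates the shape of the rules above for a restricted subset of JSON. It is not a full RFC 8785 implementation and would not by itself justify claiming sentinel-event-jcs-v1: it rejects floats rather than reproducing the ECMAScript number formatting, and it does not check integer range. The names jcs_bytes_subset and _sort_keys_utf16 are hypothetical.

```python
import json

def _sort_keys_utf16(value):
    """Recursively order object keys by UTF-16 code units."""
    if isinstance(value, dict):
        # Comparing UTF-16BE bytes gives the same order as comparing code units.
        ordered = sorted(value.items(), key=lambda kv: kv[0].encode("utf-16-be"))
        return {k: _sort_keys_utf16(v) for k, v in ordered}
    if isinstance(value, list):
        return [_sort_keys_utf16(v) for v in value]
    if isinstance(value, float):
        # Python float formatting differs from the rules JCS requires,
        # so this sketch refuses floats instead of guessing.
        raise TypeError("floats are outside this sketch's supported subset")
    return value

def jcs_bytes_subset(value) -> bytes:
    """Serialize with sorted keys, no insignificant whitespace, UTF-8 output."""
    text = json.dumps(
        _sort_keys_utf16(value),
        ensure_ascii=False,      # emit non-ASCII characters directly
        separators=(",", ":"),   # no whitespace between tokens
        allow_nan=False,         # NaN/Infinity are not valid JSON
    )
    return text.encode("utf-8")
```

A production verifier would more plausibly rely on a vetted RFC 8785 library and the test vectors published with the RFC.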

3) Event canonical bytes

Each exported event is a JSON object that conforms to event.schema.json.

event_canonical_bytes is the UTF-8 bytes of the JCS-canonicalized event object. For hashing, the event_hash field is removed before canonicalization (see 4.1).

4) Event hash + hash chain

4.1 event_hash

event_hash MUST be computed over the canonical bytes of the event object excluding the event_hash field itself.

Define:

  • event_without_event_hash = event with the event_hash property removed (if present)
  • event_canonical_bytes = jcs_bytes(event_without_event_hash)

Then:

event_hash = vmhash(event_canonical_bytes)

For exported artifacts, event_hash MUST be present in the event record and verifiers MUST recompute and compare it.
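
A minimal sketch of the recompute-and-compare step, under stated assumptions: sha256 is used as hash_algo, and jcs_bytes is a stand-in built on json.dumps that matches JCS only for simple events (ASCII keys, integer numbers). The function names compute_event_hash and verify_event_hash are hypothetical.

```python
import hashlib
import json

def jcs_bytes(obj) -> bytes:
    # Stand-in for a real RFC 8785 implementation (assumption, see above).
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False)
    return text.encode("utf-8")

def vmhash(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()

def compute_event_hash(event: dict) -> str:
    # Remove event_hash (if present) before canonicalizing.
    event_without_event_hash = {k: v for k, v in event.items() if k != "event_hash"}
    return vmhash(jcs_bytes(event_without_event_hash))

def verify_event_hash(event: dict) -> bool:
    # A missing event_hash fails verification, since exported events must carry it.
    return event.get("event_hash") == compute_event_hash(event)
```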

4.2 prev_event_hash

  • For seq = 0 (or the first event in a new ledger): prev_event_hash = "0"
  • For seq = n > 0: prev_event_hash MUST equal the computed event_hash of the immediately preceding event (seq = n-1) in the same ledger.

This provides fast tamper evidence even without Merkle recomputation.

5) Operation digest (op_digest)

op_digest commits to the normalized operation descriptor.

Define the normalized object:

{
  "op": "<op>",
  "params": { "canonical": "params" }
}

Normalization rules:

  • op MUST be a stable, versioned identifier (e.g., sentinel.export_seal.v1).
  • params MUST be JSON (no NaN/Infinity); omit unset fields rather than using null where possible.
  • Canonicalize the object using sentinel-event-jcs-v1, then hash:

op_digest = vmhash(jcs_bytes({"op": op, "params": params}))
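
For illustration, the digest can be sketched as below, assuming sha256 and a json.dumps approximation of JCS that is adequate only for ASCII keys and integer or string values. Passing allow_nan=False makes the serializer reject NaN and Infinity, in line with the rule on params.

```python
import hashlib
import json

def op_digest(op: str, params: dict) -> str:
    # The normalized object contains exactly the two keys "op" and "params".
    normalized = {"op": op, "params": params}
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False, allow_nan=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Since keys are sorted during canonicalization, two callers that build params in a different key order obtain the same digest.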

6) Merkle root (ROOT.current.txt)

6.1 Leaves

The Merkle tree commits to the ordered list of event hashes:

leaves = [event_hash(seq=0), event_hash(seq=1), ...]

Each leaf is a vmhash string (algo:hex).

Note on ranged bundles: A verifier can recompute the global Merkle root for a bundle with since_seq > 0 only if it is also given a verifiable Merkle continuation state (e.g., a frontier snapshot) at since_seq-1. Otherwise, verification MUST fall back to hash-chain + file-integrity checks for that range, or the bundle MUST start at since_seq = 0.

6.2 Parent computation (VaultMesh-style)

To compute a parent from two children:

  • Let left_hex = left.split(":", 1)[-1]
  • Let right_hex = right.split(":", 1)[-1]
  • parent = vmhash( (left_hex + right_hex).encode("utf-8") )

If the level has an odd count, duplicate the last element (i.e., right = left).

6.3 Empty tree root

If there are no leaves, the root MUST be:

vmhash(b"empty")
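
Sections 6.1 to 6.3 can be sketched together as follows, assuming sha256. One point is an assumption of this sketch rather than something stated above: a tree with a single leaf returns that leaf as the root without hashing it against itself. The names merkle_parent and merkle_root are hypothetical.

```python
import hashlib

def vmhash(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()

def merkle_parent(left: str, right: str) -> str:
    # Strip the "algo:" prefix, concatenate the hex text, hash its UTF-8 bytes.
    left_hex = left.split(":", 1)[-1]
    right_hex = right.split(":", 1)[-1]
    return vmhash((left_hex + right_hex).encode("utf-8"))

def merkle_root(leaves: list) -> str:
    if not leaves:
        return vmhash(b"empty")
    level = list(leaves)
    # Assumption: a single leaf is its own root (loop body never runs).
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: duplicate the last element
        level = [merkle_parent(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Note that parents hash the concatenated hex strings, not the raw digest bytes, so implementations that concatenate binary digests will produce different roots.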

6.4 Root publication file format

ROOT.current.txt MUST be human-readable and parseable as key/value lines:

format=vm-sentinel-root-v1
root=<algo:hex>
seq=<u64>
updated_at=<ISO-8601 Z>
hash_algo=<blake3|sha256>
canonicalization_version=sentinel-event-jcs-v1

Additional keys MAY be included, but verifiers MUST ignore unknown keys.
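
A parser for this format might look like the sketch below. Several details are assumptions not fixed by the text above: blank lines are skipped, surrounding whitespace is stripped, a later duplicate key overwrites an earlier one, and updated_at is kept as an unvalidated string. Unknown keys are carried through without validation. The name parse_root_file is hypothetical.

```python
REQUIRED_KEYS = ("format", "root", "seq", "updated_at",
                 "hash_algo", "canonicalization_version")

def parse_root_file(text: str) -> dict:
    fields = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines are tolerated
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw_line!r}")
        fields[key] = value
    missing = [k for k in REQUIRED_KEYS if k not in fields]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if fields["format"] != "vm-sentinel-root-v1":
        raise ValueError("unexpected format")
    if fields["hash_algo"] not in ("blake3", "sha256"):
        raise ValueError("unsupported hash_algo")
    if fields["canonicalization_version"] != "sentinel-event-jcs-v1":
        raise ValueError("unexpected canonicalization_version")
    # The root's algo prefix must agree with the declared hash_algo.
    if not fields["root"].startswith(fields["hash_algo"] + ":"):
        raise ValueError("root prefix does not match hash_algo")
    fields["seq"] = int(fields["seq"])
    return fields
```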