v0.7.0 L2-5 (issue #670) — forensic evidence bundle.
This module assembles and verifies the procurement-grade evidence
tarball produced by ai-memory export-forensic-bundle. The bundle
is the OSS surface for the AgenticMem Attest tier — a single
tar file an external auditor can re-verify with no network and no
daemon state, just the public keys of the signing agents.
§Bundle layout
<bundle>.tar
  manifest.json                     bundle metadata + SHA-256s + sig
  verification.json                 L1-3 `verify-reflection-chain` JSON
  memories/<id>.json                target memory + sources
  edges/<src>__<rel>__<dst>.json    reflects_on / supersedes / derived_from edges + signatures
  signed_events/<event_id>.json     append-only audit rows for the chain
  transcripts/<id>.json             transcript metadata
  transcripts/<id>.content          raw decompressed UTF-8 body
§Determinism + reproducibility
Acceptance criterion from #670 is “byte-identical mod timestamp”. We enforce that by:
- Writing a minimal POSIX ustar archive in-process (no tar crate dep, which keeps the dep surface flat per repo convention).
- Sorting every file name lexicographically before emission so two builds over the same DB produce identical bytes regardless of SQLite row order.
- Pinning every per-file ustar header field (uid, gid, mtime, mode, uname, gname) to a constant; there is no caller-supplied filesystem metadata in the archive.
- Pinning the manifest field order via a struct definition rather than a serde_json::Map (which is BTreeMap-backed, but the default to_string writer is still order-preserving for the struct path) and emitting via serde_json::to_vec_pretty, which is deterministic for #[derive(Serialize)] structs.
The only legitimate non-determinism is manifest.generated_at —
the RFC3339 instant the bundle was assembled. That field is
explicitly documented as “expected to vary across rebuilds” and
lives in a stable position so a downstream diff tool can ignore it
exactly.
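The header-pinning and sorting rules above can be sketched in a few lines of std-only Rust. This is a minimal illustration and not the module's own writer: the names pinned_header and pack_sketch are hypothetical, and the concrete pinned values (mode 0644, uid/gid 0, mtime 0, empty uname/gname) are assumptions, since the description only says each field is pinned to a constant.

```rust
use std::collections::BTreeMap;

const BLOCK: usize = 512;

/// Write `value` as zero-padded octal, leaving the field's last byte NUL.
fn put_octal(field: &mut [u8], value: u64) {
    let width = field.len() - 1;
    let text = format!("{:0width$o}", value, width = width);
    assert!(text.len() == width, "value does not fit the header field");
    field[..width].copy_from_slice(text.as_bytes());
}

/// Build one 512-byte ustar header whose metadata never depends on the host.
/// The pinned values below are assumptions chosen for this sketch.
fn pinned_header(path: &str, size: u64) -> [u8; BLOCK] {
    assert!(path.len() <= 100, "sketch handles short paths only");
    let mut h = [0u8; BLOCK];
    h[..path.len()].copy_from_slice(path.as_bytes());
    put_octal(&mut h[100..108], 0o644); // mode
    put_octal(&mut h[108..116], 0); // uid
    put_octal(&mut h[116..124], 0); // gid
    put_octal(&mut h[124..136], size); // body size
    put_octal(&mut h[136..148], 0); // mtime
    h[148..156].fill(b' '); // checksum field counts as spaces while summing
    h[156] = b'0'; // typeflag: regular file
    h[257..263].copy_from_slice(b"ustar\0"); // magic
    h[263..265].copy_from_slice(b"00"); // version
    // uname and gname stay all-NUL (empty).
    let sum: u64 = h.iter().map(|&b| u64::from(b)).sum();
    let text = format!("{:06o}\0 ", sum);
    h[148..156].copy_from_slice(text.as_bytes());
    h
}

/// Emit entries in BTreeMap (lexicographic) order, pad each body to a
/// 512-byte boundary, and finish with two zero blocks.
fn pack_sketch(files: &BTreeMap<String, Vec<u8>>) -> Vec<u8> {
    let mut out = Vec::new();
    for (path, body) in files {
        out.extend_from_slice(&pinned_header(path, body.len() as u64));
        out.extend_from_slice(body);
        let pad = (BLOCK - body.len() % BLOCK) % BLOCK;
        out.resize(out.len() + pad, 0);
    }
    out.resize(out.len() + 2 * BLOCK, 0); // end-of-archive marker
    out
}
```

Because the map iterates in key order and no header field comes from the filesystem, two calls over the same path-to-bytes map return identical bytes regardless of insertion order.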
§Signature
The bundle’s manifest.json includes a SHA-256 over every file in
the archive AND, when an AlphaOne operator keypair is on disk, an
Ed25519 signature over a canonical concatenation of those hashes.
An auditor verifies the bundle by:
- Re-hashing every file in the tar.
- Comparing each hash to manifest.files[path].sha256.
- (If manifest.signature is present) re-deriving the same canonical concat and verifying the Ed25519 signature against the operator’s public key.
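The first two auditor steps amount to a map comparison, sketched below under stated assumptions. The function name find_discrepancies is hypothetical, and the digest is passed in as a closure so the sketch stays std-only; a real auditor would supply SHA-256 (for example from a hashing crate) and would handle manifest.json separately, since the manifest cannot list its own hash. Signature verification is not shown.

```rust
use std::collections::BTreeMap;

/// Compare each archive entry against the digest recorded for it.
/// `files` maps path -> body, `expected` maps path -> hex digest.
/// Returns one human-readable row per discrepancy; empty means all matched.
fn find_discrepancies<F>(
    files: &BTreeMap<String, Vec<u8>>,
    expected: &BTreeMap<String, String>,
    digest: F,
) -> Vec<String>
where
    F: Fn(&[u8]) -> String,
{
    let mut problems = Vec::new();
    for (path, body) in files {
        match expected.get(path) {
            None => problems.push(format!("{}: not listed in manifest", path)),
            Some(want) if *want != digest(body) => {
                problems.push(format!("{}: hash mismatch", path))
            }
            Some(_) => {}
        }
    }
    // An entry the manifest promises but the archive lacks is also a failure.
    for path in expected.keys() {
        if !files.contains_key(path) {
            problems.push(format!("{}: listed in manifest but missing", path));
        }
    }
    problems
}
```

Checking both directions matters: a bundle with a file silently removed would otherwise pass a check that only walks the archive.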
Structs§
- AtomisationEnvelope - v0.7.0 WT-1-E: per-memory atomisation enrichment block. Carries the substrate-visible signals (atomised_into, archived_at, atom_ids, atom_of) directly so an auditor can reconstruct the chain from a single envelope.
- EdgeEnvelope - One signed link inside the bundle. Carries the canonical SignableLink field set plus the raw signature so an auditor can re-derive the canonical-CBOR bytes and re-verify the Ed25519 signature without joining back to a substrate row.
- ExportForensicBundleArgs - Arguments for ai-memory export-forensic-bundle.
- Manifest - Manifest metadata + integrity index for the bundle.
- ManifestFile - One entry in the manifest’s per-file index.
- MemoryEnvelope - One stored memory inside the bundle. We re-emit a stable subset of the crate::models::Memory shape so a future struct refactor doesn’t silently break the on-disk format.
- SignedEventEnvelope - One signed_events audit row inside the bundle. Mirrors the column shape of crate::signed_events::SignedEvent but emits payload_hash and signature as hex strings so the on-wire format is JSON-safe.
- TranscriptEnvelope - One transcript inside the bundle. We split metadata from content so callers can deserialise the metadata without holding the body in memory.
- VerificationReport - Result of verify. One row per discrepancy plus an ok flag.
- VerifyForensicBundleArgs - Arguments for ai-memory verify-forensic-bundle <path>.
Enums§
- SignatureStatus - Manifest-signature outcome.
Constants§
- BUNDLE_SCHEMA_VERSION - Bundle schema version pin. Bumped on any change that breaks the auditor’s deserialisation contract (new mandatory field, removed field, reshuffled enum, etc.).
- MAX_TAR_ENTRY_BYTES - #1250: hard cap on the per-entry body size accepted by read_ustar. Set to 1 GiB: two orders of magnitude above the largest realistic forensic-bundle file (a fully-attested signed chain of a 7-day mid-tier namespace is ~10 MB) and small enough that pos.checked_add(size) cannot wrap on any supported platform. A crafted bundle declaring a larger entry is refused at parse time with tar entry … exceeds the … hard cap.
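The size cap and the checked_add guard described for MAX_TAR_ENTRY_BYTES could look roughly like the sketch below. The function entry_end, the constant name, and the error strings are all hypothetical; only the 1 GiB value and the use of checked_add come from the description above.

```rust
use std::convert::TryFrom;

/// 1 GiB, mirroring the documented cap; the name is local to this sketch.
const ENTRY_CAP_SKETCH: u64 = 1 << 30;

/// Given the offset where an entry body starts and the size its header
/// declares, return the offset one past the body, or refuse the entry.
fn entry_end(pos: usize, declared_size: u64, archive_len: usize) -> Result<usize, String> {
    // Refuse oversized declarations before doing any arithmetic with them.
    if declared_size > ENTRY_CAP_SKETCH {
        return Err(format!(
            "declared size {} is over the {} byte cap",
            declared_size, ENTRY_CAP_SKETCH
        ));
    }
    let size = usize::try_from(declared_size)
        .map_err(|_| "declared size does not fit in usize".to_string())?;
    // checked_add turns a would-be wraparound into an explicit error.
    let end = pos
        .checked_add(size)
        .ok_or_else(|| "entry offset overflows".to_string())?;
    if end > archive_len {
        return Err("entry body runs past the end of the archive".to_string());
    }
    Ok(end)
}
```

Rejecting the size first keeps the later slice bounds small and predictable, so a crafted header cannot trigger a huge allocation or an out-of-range read.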
Functions§
- build - Build the bundle for the given memory id, writing the tarball to output_path.
- build_files - In-memory variant of build. Returns the path-keyed BundleFiles map ready to be serialised by either write_ustar (production) or pack_to_vec (tests). Public so the integration test suite can rebuild the same bundle twice and diff the bytes without going through the filesystem.
- canonical_signed_bytes - Canonical signing input: path:size:sha256 per file, joined by \n, then the bundle’s schema version + memory id appended. The ordering of the manifest.files vec is already deterministic (it reflects BundleFiles’s BTreeMap iteration order), so the same bundle always produces the same signing input.
- pack_to_vec - Serialise files to an in-memory Vec<u8>; used by the reproducibility tests so two builds can be byte-diffed without hitting the disk.
- read_ustar - Parse a POSIX ustar archive emitted by write_ustar back into a path-keyed BundleFiles map. We deliberately keep the parser strict: only the field set we ourselves emit is accepted, so a downstream auditor running this code path is auditing the same minimal grammar the build path emits.
- run_export - Run ai-memory export-forensic-bundle.
- run_verify - Run ai-memory verify-forensic-bundle.
- verify - Verify a forensic bundle on disk.
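As an illustration of the signing input described for canonical_signed_bytes, the sketch below joins path:size:sha256 lines with \n and then appends the schema version and memory id. The exact way the two trailing values are appended is not stated above, so placing each on its own line is an assumption, as are the FileEntry struct, the u32 version type, and the function name signing_input_sketch.

```rust
/// Hypothetical stand-in for one manifest file entry.
struct FileEntry {
    path: String,
    size: u64,
    sha256: String,
}

/// Build a canonical byte string to sign. `files` is expected to arrive
/// already sorted by path, as the manifest ordering is deterministic.
fn signing_input_sketch(files: &[FileEntry], schema_version: u32, memory_id: &str) -> Vec<u8> {
    let mut lines: Vec<String> = files
        .iter()
        .map(|f| format!("{}:{}:{}", f.path, f.size, f.sha256))
        .collect();
    // Assumption: version and memory id each go on their own trailing line.
    lines.push(schema_version.to_string());
    lines.push(memory_id.to_string());
    lines.join("\n").into_bytes()
}
```

Whatever the precise layout, the point is that signer and auditor derive the bytes from the manifest alone, so the auditor can rebuild the input offline and check the Ed25519 signature against it.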