Module bundle

v0.7.0 L2-5 (issue #670) — forensic evidence bundle.

This module assembles and verifies the procurement-grade evidence tarball produced by ai-memory export-forensic-bundle. The bundle is the OSS surface for the AgenticMem Attest tier — a single tar file an external auditor can re-verify with no network and no daemon state, just the public keys of the signing agents.

§Bundle layout

    <bundle>.tar
      manifest.json                 — bundle metadata + SHA-256s + sig
      verification.json             — L1-3 `verify-reflection-chain` JSON
      memories/<id>.json            — target memory + sources
      edges/<src>__<rel>__<dst>.json — reflects_on / supersedes /
                                      derived_from edges + signatures
      signed_events/<event_id>.json  — append-only audit rows for
                                      the chain
      transcripts/<id>.json          — transcript metadata
      transcripts/<id>.content       — raw decompressed UTF-8 body

§Determinism + reproducibility

The acceptance criterion from #670 is “byte-identical mod timestamp”. We enforce that by:

  • Writing a minimal POSIX ustar archive in-process (no tar crate dep — keeps the dep surface flat per repo convention).
  • Sorting every file name lexicographically before emission so two builds over the same DB produce identical bytes regardless of SQLite row order.
  • Pinning every per-file ustar header field (uid, gid, mtime, mode, uname, gname) to a constant — there is no caller-supplied filesystem metadata in the archive.
  • Pinning the manifest field order via a struct definition rather than a serde_json::Map (which is BTreeMap-backed, whereas the struct path serialises fields in declaration order) and emitting via serde_json::to_vec_pretty, which is deterministic for #[derive(Serialize)] structs.
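The first three points can be illustrated with a minimal sketch of an in-process ustar writer. This is not the module's write_ustar or pack_to_vec: the names pack_sketch, header_sketch and put_octal are hypothetical, and the pinned values (mode 0644, uid/gid/mtime 0, empty uname/gname) are assumptions chosen for illustration. It also only handles paths of at most 100 bytes.

```rust
use std::collections::BTreeMap;

const BLOCK: usize = 512;

/// Write `value` as zero-padded octal into `field`, ending with a NUL byte.
fn put_octal(field: &mut [u8], value: u64) {
    let width = field.len() - 1;
    let s = format!("{:0width$o}", value, width = width);
    assert!(s.len() == width, "value does not fit the octal field");
    field[..width].copy_from_slice(s.as_bytes());
    field[width] = 0;
}

/// Build one 512-byte ustar header in which every metadata field is a constant.
fn header_sketch(path: &str, size: u64) -> [u8; BLOCK] {
    assert!(path.len() <= 100, "sketch only handles short names");
    let mut h = [0u8; BLOCK];
    h[..path.len()].copy_from_slice(path.as_bytes());
    put_octal(&mut h[100..108], 0o644); // mode (assumed constant)
    put_octal(&mut h[108..116], 0); // uid
    put_octal(&mut h[116..124], 0); // gid
    put_octal(&mut h[124..136], size); // the only per-file varying number
    put_octal(&mut h[136..148], 0); // mtime
    h[148..156].fill(b' '); // checksum field counts as spaces while summing
    h[156] = b'0'; // typeflag: regular file
    h[257..263].copy_from_slice(b"ustar\0"); // magic
    h[263..265].copy_from_slice(b"00"); // version
    // uname and gname stay all-NUL in this sketch.
    let sum: u64 = h.iter().map(|&b| b as u64).sum();
    h[148..156].copy_from_slice(format!("{:06o}\0 ", sum).as_bytes());
    h
}

/// Emit entries in BTreeMap (lexicographic) order, pad each body to a
/// 512-byte boundary, and finish with two zero blocks.
fn pack_sketch(files: &BTreeMap<String, Vec<u8>>) -> Vec<u8> {
    let mut out = Vec::new();
    for (path, body) in files {
        out.extend_from_slice(&header_sketch(path, body.len() as u64));
        out.extend_from_slice(body);
        let pad = (BLOCK - body.len() % BLOCK) % BLOCK;
        out.resize(out.len() + pad, 0);
    }
    out.resize(out.len() + 2 * BLOCK, 0);
    out
}
```

Because the map iterates in key order and no header field depends on the filesystem or the clock, two calls over the same contents return identical bytes regardless of insertion order.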

The only legitimate non-determinism is manifest.generated_at — the RFC3339 instant the bundle was assembled. That field is explicitly documented as “expected to vary across rebuilds” and lives in a stable position so a downstream diff tool can ignore it exactly.
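As a rough illustration of how a downstream diff tool might ignore that one field, the sketch below compares two pretty-printed manifests after dropping the generated_at line. It assumes the pretty printer places each field on its own line; the helper names are hypothetical and not part of this module.

```rust
/// Drop the line carrying the `generated_at` field from a pretty-printed manifest.
fn strip_generated_at(manifest: &str) -> String {
    manifest
        .lines()
        .filter(|line| !line.trim_start().starts_with("\"generated_at\""))
        .collect::<Vec<_>>()
        .join("\n")
}

/// True when two manifests differ at most in their `generated_at` line.
fn manifests_match_mod_timestamp(a: &str, b: &str) -> bool {
    strip_generated_at(a) == strip_generated_at(b)
}
```

A line-based filter is enough here only because the field sits in a stable position; a tool that cannot rely on the pretty-printed layout would need to parse the JSON instead.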

§Signature

The bundle’s manifest.json includes a SHA-256 over every file in the archive AND, when an AlphaOne operator keypair is on disk, an Ed25519 signature over a canonical concatenation of those hashes. An auditor verifies the bundle by:

  1. Re-hashing every file in the tar.
  2. Comparing each hash to manifest.files[path].sha256.
  3. (If manifest.signature is present) re-deriving the same canonical concat and verifying the Ed25519 signature against the operator’s public key.
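Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the module's verify function: ReportSketch and check_hashes are hypothetical names, the digest is passed in as a closure, and the bundled toy_hash_hex is a 64-bit FNV-1a stand-in so the example runs without dependencies. A real auditor would supply SHA-256, and step 3 (the Ed25519 check) is left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical report shape: one message per discrepancy plus an `ok` flag.
#[derive(Debug)]
struct ReportSketch {
    ok: bool,
    discrepancies: Vec<String>,
}

/// Re-hash every archive file and compare it with the manifest's expected digest.
fn check_hashes<F>(
    files: &BTreeMap<String, Vec<u8>>,
    expected: &BTreeMap<String, String>,
    hash_hex: F,
) -> ReportSketch
where
    F: Fn(&[u8]) -> String,
{
    let mut discrepancies = Vec::new();
    for (path, body) in files {
        match expected.get(path) {
            None => discrepancies.push(format!("{path}: not listed in manifest")),
            Some(want) if *want != hash_hex(body) => {
                discrepancies.push(format!("{path}: hash mismatch"))
            }
            Some(_) => {}
        }
    }
    // A file the manifest promises but the archive lacks is also a failure.
    for path in expected.keys() {
        if !files.contains_key(path) {
            discrepancies.push(format!("{path}: listed in manifest but missing"));
        }
    }
    ReportSketch { ok: discrepancies.is_empty(), discrepancies }
}

/// Stand-in digest (64-bit FNV-1a). NOT SHA-256; used only to keep the sketch runnable.
fn toy_hash_hex(data: &[u8]) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    format!("{h:016x}")
}
```

Checking both directions (extra files and missing files) matters: comparing only the files that happen to be present would let an attacker silently drop an entry.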

Structs§

AtomisationEnvelope
v0.7.0 WT-1-E — per-memory atomisation enrichment block. Carries the substrate-visible signals (atomised_into, archived_at, atom_ids, atom_of) directly so an auditor can reconstruct the chain from a single envelope.
EdgeEnvelope
One signed link inside the bundle. Carries the canonical SignableLink field set plus the raw signature so an auditor can re-derive the canonical-CBOR bytes and re-verify the Ed25519 signature without joining back to a substrate row.
ExportForensicBundleArgs
Arguments for ai-memory export-forensic-bundle.
Manifest
Manifest metadata + integrity index for the bundle.
ManifestFile
One entry in the manifest’s per-file index.
MemoryEnvelope
One stored memory inside the bundle. We re-emit a stable subset of the crate::models::Memory shape so a future struct refactor doesn’t silently break the on-disk format.
SignedEventEnvelope
One signed_events audit row inside the bundle. Mirrors the column shape of crate::signed_events::SignedEvent but emits payload_hash and signature as hex strings so the on-wire format is JSON-safe.
TranscriptEnvelope
One transcript inside the bundle. We split metadata from content so callers can deserialise the metadata without holding the body in memory.
VerificationReport
Result of verify. One row per discrepancy plus an ok flag.
VerifyForensicBundleArgs
Arguments for ai-memory verify-forensic-bundle <path>.

Enums§

SignatureStatus
Manifest-signature outcome.

Constants§

BUNDLE_SCHEMA_VERSION
Bundle schema version pin. Bumped on any change that breaks the auditor’s deserialisation contract (new mandatory field, removed field, reshuffled enum, etc.).
MAX_TAR_ENTRY_BYTES
#1250 — hard cap on the per-entry body size accepted by read_ustar. Set to 1 GiB: two orders of magnitude above the largest realistic forensic-bundle file (a fully-attested signed chain of a 7-day mid-tier namespace is ~10 MB) and small enough that pos.checked_add(size) cannot wrap on any supported platform. A crafted bundle declaring a larger entry is refused at parse time with tar entry … exceeds the … hard cap.
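A minimal sketch of how such a cap can guard the offset arithmetic is shown below. The function names, the constant name and the error strings are hypothetical and do not reproduce read_ustar; only the 1 GiB value and the use of checked_add come from the description above.

```rust
use std::ops::Range;

/// 1 GiB, matching the documented value of the cap (the name is hypothetical).
const MAX_ENTRY_BYTES_SKETCH: u64 = 1 << 30;

/// Parse a NUL- or space-terminated octal field from a ustar header.
fn parse_octal(field: &[u8]) -> Result<u64, String> {
    let end = field
        .iter()
        .position(|&b| b == 0 || b == b' ')
        .unwrap_or(field.len());
    let digits = &field[..end];
    if digits.is_empty() {
        return Err("empty octal field".into());
    }
    let mut value: u64 = 0;
    for &d in digits {
        if !(b'0'..=b'7').contains(&d) {
            return Err(format!("non-octal byte {d:#04x}"));
        }
        value = value
            .checked_mul(8)
            .and_then(|v| v.checked_add((d - b'0') as u64))
            .ok_or("octal overflow")?;
    }
    Ok(value)
}

/// Compute the byte range of an entry body starting at `pos`, refusing
/// oversized or out-of-bounds entries before any slice is taken.
fn entry_body_range(
    pos: usize,
    size_field: &[u8],
    archive_len: usize,
) -> Result<Range<usize>, String> {
    let size = parse_octal(size_field)?;
    if size > MAX_ENTRY_BYTES_SKETCH {
        return Err(format!("declared size {size} is over the sketch cap"));
    }
    let end = pos
        .checked_add(size as usize)
        .ok_or_else(|| "entry end overflows usize".to_string())?;
    if end > archive_len {
        return Err("entry body runs past end of archive".into());
    }
    Ok(pos..end)
}
```

Rejecting the size before converting it to usize keeps the cast lossless on 32-bit targets, and the bounds check afterwards means a truncated archive fails cleanly instead of panicking on a slice.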

Functions§

build
Build the bundle for the given memory id, writing the tarball to output_path.
build_files
In-memory variant of build. Returns the path-keyed BundleFiles map ready to be serialised by either write_ustar (production) or pack_to_vec (tests). Public so the integration test suite can rebuild the same bundle twice and diff the bytes without going through the filesystem.
canonical_signed_bytes
Canonical signing input: path:size:sha256 per file, joined by \n, then the bundle’s schema version + memory id appended. The ordering of the manifest.files vec is already deterministic (it reflects BundleFiles’s BTreeMap iteration order), so the same bundle always produces the same signing input.
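The shape of that signing input can be sketched as below. The per-file path:size:sha256 lines joined by \n follow the description above; how the schema version and memory id are appended is not specified here, so the schema_version= and memory_id= trailer lines are an assumption, as are the FileEntrySketch type and the function name.

```rust
/// Hypothetical stand-in for one manifest file entry.
struct FileEntrySketch {
    path: String,
    size: u64,
    sha256: String,
}

/// Join `path:size:sha256` lines with '\n', then append the schema version
/// and memory id. The trailer format is assumed for illustration only.
fn canonical_bytes_sketch(
    files: &[FileEntrySketch],
    schema_version: u32,
    memory_id: &str,
) -> Vec<u8> {
    let mut lines: Vec<String> = files
        .iter()
        .map(|f| format!("{}:{}:{}", f.path, f.size, f.sha256))
        .collect();
    lines.push(format!("schema_version={schema_version}"));
    lines.push(format!("memory_id={memory_id}"));
    lines.join("\n").into_bytes()
}
```

The function does no sorting of its own: it relies on the caller passing entries in the already deterministic manifest order, which mirrors the contract described above.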
pack_to_vec
Serialise files to an in-memory Vec<u8> — used by the reproducibility tests so two builds can be byte-diffed without hitting the disk.
read_ustar
Parse a POSIX ustar archive emitted by write_ustar back into a path-keyed BundleFiles map. We deliberately keep the parser strict: only the field set we ourselves emit is accepted, so a downstream auditor running this code path is auditing the same minimal grammar the build path emits.
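What "strict" could look like for a single header block is sketched below. The exact set of checks read_ustar performs is not documented here, so the rules in this example (ustar magic and version, regular-file typeflag only, empty link name, valid checksum) are assumptions, and both function names are hypothetical.

```rust
/// Sum of all header bytes, with the checksum field (bytes 148..156)
/// counted as ASCII spaces, as the ustar format requires.
fn header_checksum(h: &[u8; 512]) -> u32 {
    h.iter()
        .enumerate()
        .map(|(i, &b)| if (148..156).contains(&i) { b' ' as u32 } else { b as u32 })
        .sum()
}

/// Accept only the narrow header grammar a minimal writer would emit.
fn check_header_strict(h: &[u8; 512]) -> Result<(), String> {
    if h[257..263] != b"ustar\0"[..] {
        return Err("bad magic".into());
    }
    if h[263..265] != b"00"[..] {
        return Err("bad version".into());
    }
    if h[156] != b'0' {
        // Symlinks, directories, devices and extension records are all refused.
        return Err(format!("unsupported typeflag {:#04x}", h[156]));
    }
    if h[157..257].iter().any(|&b| b != 0) {
        return Err("link name must be empty".into());
    }
    let stored = std::str::from_utf8(&h[148..154])
        .ok()
        .and_then(|s| u32::from_str_radix(s, 8).ok())
        .ok_or("unreadable checksum")?;
    if stored != header_checksum(h) {
        return Err("checksum mismatch".into());
    }
    Ok(())
}
```

Refusing every typeflag other than a regular file is the main safety gain of a strict grammar: the parser never has to reason about link targets or special files in a bundle it did not produce.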
run_export
Run ai-memory export-forensic-bundle.
run_verify
Run ai-memory verify-forensic-bundle.
verify
Verify a forensic bundle on disk.