boundary-compiler
RFC 8785 JSON Canonicalization Scheme (JCS) for Rust, with strict
duplicate-key rejection and a content-hash primitive. Extracted from
semantic-memory during the V30 hostile
audit so that JCS lives in a reusable, audited crate with a small dependency footprint.
This crate is the canonicalization layer for the entire
RecursiveIntell memory stack: every receipt, every projection import,
and every export envelope flows through boundary-compiler's
Canonicalizer and is identified by its ContentDigest.
Why a separate JCS crate?
RFC 8785 has subtle gotchas: number formatting (no + prefix, no
leading zeros, no .0 for integers), string escaping (lone surrogates
must be rejected), and the rule that duplicate object keys are
an error (not "last wins"). Many JSON canonicalization implementations
either (a) get these details wrong, or (b) are buried inside a larger crate that
you can't reuse.
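To make the number-formatting gotcha concrete: RFC 8785 serializes numbers the way ECMAScript's Number-to-String does. The following is a minimal std-only sketch, not the crate's implementation; the function name `jcs_number` is hypothetical. It assumes that Rust's `{}` and `{:e}` float formatters emit the shortest digit string that round-trips, so the remaining work is choosing plain vs. exponent notation, adding the `+` to positive exponents, and mapping -0 to `0`.

```rust
/// Hypothetical helper: format an f64 the way RFC 8785 (ECMAScript) does.
/// Returns None for NaN and infinities, which JSON cannot represent.
fn jcs_number(x: f64) -> Option<String> {
    if !x.is_finite() {
        return None;
    }
    // Both +0 and -0 serialize as "0".
    if x == 0.0 {
        return Some("0".to_string());
    }
    let abs = x.abs();
    if (1e-6..1e21).contains(&abs) {
        // Plain notation; Rust's Display already omits a trailing ".0"
        // and never switches to exponent form.
        Some(format!("{}", x))
    } else {
        // Exponent notation: Rust prints "1e21" / "1.5e-7",
        // ECMAScript wants "1e+21" / "1.5e-7".
        let s = format!("{:e}", x);
        let (mantissa, exp) = s.split_once('e')?;
        if exp.starts_with('-') {
            Some(format!("{mantissa}e{exp}"))
        } else {
            Some(format!("{mantissa}e+{exp}"))
        }
    }
}
```

The 1e-6 and 1e21 cut-offs come from the ECMAScript rule: values in that magnitude range are written without an exponent, and everything else uses exponent form.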
boundary-compiler is the small, correct, reusable version. It
depends on blake3, serde, serde_json, sha2, and thiserror.
No async, no platform-specific code, no FFI. Builds in 2 seconds.
Quick Start
use ;
use json;
Run it: cargo run --example quick_start (see examples/).
What you get
Core API
- Canonicalizer::new(): construct a canonicalizer with the default RFC 8785 settings.
- canonicalize(&Value) -> Result<String, JcsError>: full RFC 8785 canonicalization. Sorted object keys, RFC 8785 number formatting (no +, no leading zeros, integers as integers, floats in their shortest round-trip form), RFC 8785 string escaping (lone surrogates are rejected), no insignificant whitespace.
- parse_with_dup_check(&str) -> Result<Value, JcsError>: strict JSON parser that rejects duplicate object keys (RFC 8785 requires I-JSON input, which forbids duplicate names). Returns JcsError::DuplicateKey { key, line, column } on conflict.
- parse_and_validate(&str) -> Result<Value, JcsError>: parsing plus duplicate detection in one call.
- canonicalize_flexible(&Value) -> Result<String, JcsError>: convenience for "the value is already in memory, just canonicalize it."
- ContentDigest::compute(&Value) -> Result<ContentDigest, JcsError>: blake3 hash of the JCS bytes. This is the single handle a downstream receipt needs to verify content integrity.
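The canonicalization rules above can be sketched in plain Rust. This is a hypothetical, std-only illustration over a made-up `Json` enum (the crate itself works on `serde_json::Value`), not the crate's code. It sorts object members by UTF-16 code units as RFC 8785 specifies, escapes only `"`, `\` and control characters, and rejects duplicate member names. Because the sketch has no parser, it reports duplicates at serialization time without a line and column, and it limits numbers to `i64` to stay short.

```rust
use std::fmt::Write;

/// Hypothetical JSON value for this sketch. Numbers are i64 only;
/// floats would need ECMAScript-style formatting.
#[derive(Debug)]
enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>), // a Vec, so duplicate names are representable
}

/// RFC 8785 string escaping: escape `"`, `\` and U+0000..U+001F only;
/// everything else (including non-ASCII) is emitted verbatim.
fn write_string(out: &mut String, s: &str) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\u{8}' => out.push_str("\\b"),
            '\u{c}' => out.push_str("\\f"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Other control characters use the lowercase \u00xx form.
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out.push('"');
}

fn write_value(out: &mut String, v: &Json) -> Result<(), String> {
    match v {
        Json::Null => out.push_str("null"),
        Json::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Json::Int(i) => {
            let _ = write!(out, "{}", i);
        }
        Json::Str(s) => write_string(out, s),
        Json::Arr(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_value(out, item)?;
            }
            out.push(']');
        }
        Json::Obj(members) => {
            // Sort member names by UTF-16 code units, as RFC 8785 requires
            // (this can differ from UTF-8 byte order outside the BMP).
            let mut sorted: Vec<(Vec<u16>, &String, &Json)> = members
                .iter()
                .map(|(k, val)| (k.encode_utf16().collect(), k, val))
                .collect();
            sorted.sort_by(|a, b| a.0.cmp(&b.0));
            out.push('{');
            for (i, (key16, key, val)) in sorted.iter().enumerate() {
                if i > 0 {
                    // After sorting, duplicate names sit next to each other.
                    if sorted[i - 1].0 == *key16 {
                        return Err(format!("duplicate key: {key}"));
                    }
                    out.push(',');
                }
                write_string(out, key);
                out.push(':');
                write_value(out, val)?;
            }
            out.push('}');
        }
    }
    Ok(())
}

/// Serialize a value to its canonical JCS text, or fail on duplicate names.
fn canonicalize(v: &Json) -> Result<String, String> {
    let mut out = String::new();
    write_value(&mut out, v)?;
    Ok(out)
}
```

The UTF-16 ordering matters for keys outside the Basic Multilingual Plane. For example, with member names "\u{FFFD}" and "\u{1F600}", a UTF-8 byte sort (which is what an ordered String map gives you) puts U+FFFD first. RFC 8785 puts the emoji first because its UTF-16 form starts with the surrogate 0xD83D.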
Boundary profiles
A BoundaryProfile is a configuration bundle that names the
canonicalization dialect, the schema id + version, the
canonicalization profile, the unknown-field policy, and the resource
ceilings. It travels with the canonicalized value as a typed
contract — not a free-form string.
use ;
let profile = builder
.dialect
.canonicalization
.unknown_fields
.resource_ceilings
.schema_id
.schema_version
.build;
UnknownFieldPolicy::Reject is the strict mode: values with
fields not in the declared schema are rejected. Strip removes
those fields silently, and Accept passes them through unchanged.
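The three policies can be pictured with a short std-only sketch. `UnknownFieldPolicy` here is a local stand-in for the crate's type. `apply_policy`, the `BTreeMap` field representation, and the declared-field set are all hypothetical illustrations, not the crate's API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Local stand-in for the crate's policy enum (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnknownFieldPolicy {
    Reject,
    Strip,
    Accept,
}

/// Hypothetical helper: apply a policy to a flat map of fields,
/// given the set of field names the schema declares.
fn apply_policy<V>(
    mut fields: BTreeMap<String, V>,
    declared: &BTreeSet<&str>,
    policy: UnknownFieldPolicy,
) -> Result<BTreeMap<String, V>, String> {
    // Collect names the schema does not declare.
    let unknown: Vec<String> = fields
        .keys()
        .filter(|k| !declared.contains(k.as_str()))
        .cloned()
        .collect();
    match policy {
        UnknownFieldPolicy::Accept => Ok(fields),
        UnknownFieldPolicy::Strip => {
            for k in &unknown {
                fields.remove(k);
            }
            Ok(fields)
        }
        UnknownFieldPolicy::Reject if unknown.is_empty() => Ok(fields),
        UnknownFieldPolicy::Reject => Err(format!("unknown fields: {}", unknown.join(", "))),
    }
}
```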
ResourceCeilings is a DoS guard. The default profile allows 32 levels
of nesting and 10,000 keys per object: generous enough for typical
payloads, but too loose to bound an adversarial one tightly. Tighten
the limits for your context.
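A ceiling check of this kind is a recursive walk that fails fast. The following is a hypothetical std-only sketch: the `Ceilings` struct, its field names and the `Node` tree are assumptions, and only the default limits (32 levels, 10,000 keys per object) come from this README. It checks depth before descending, so the walk never recurses more than one level past the limit.

```rust
/// Hypothetical stand-in for ResourceCeilings; field names are assumptions.
struct Ceilings {
    max_depth: usize,
    max_keys_per_object: usize,
}

impl Default for Ceilings {
    // Defaults as stated in the README: 32 levels, 10,000 keys per object.
    fn default() -> Self {
        Ceilings { max_depth: 32, max_keys_per_object: 10_000 }
    }
}

/// Minimal JSON shape: only the structure matters for ceiling checks.
enum Node {
    Scalar,
    Array(Vec<Node>),
    Object(Vec<(String, Node)>),
}

/// Walk the tree; each container adds one nesting level (top level = 1).
fn check_ceilings(node: &Node, limits: &Ceilings, depth: usize) -> Result<(), String> {
    let children: Vec<&Node> = match node {
        Node::Scalar => return Ok(()),
        Node::Array(items) => items.iter().collect(),
        Node::Object(members) => {
            if members.len() > limits.max_keys_per_object {
                return Err(format!(
                    "object has {} keys (limit {})",
                    members.len(),
                    limits.max_keys_per_object
                ));
            }
            members.iter().map(|(_, v)| v).collect()
        }
    };
    // Check depth before descending, so recursion stays bounded.
    if depth > limits.max_depth {
        return Err(format!("nesting depth {} exceeds limit {}", depth, limits.max_depth));
    }
    children
        .into_iter()
        .try_for_each(|c| check_ceilings(c, limits, depth + 1))
}
```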
Schema validation
SchemaValidator is currently a pass-through; only the validation
result struct is in place. The actual JSON Schema validator (via the
jsonschema crate) is planned behind a future schema-validation
feature flag. For now, use the boundary profile's
UnknownFieldPolicy to reject non-conforming payloads, and use
semantic-memory (which depends on this crate) if you need full
JSON Schema validation.
Error handling
All fallible operations return Result<_, JcsError>. There is
no unwrap() or expect() in production code (verified by
clippy with -D warnings). Errors are:
- JcsError::DuplicateKey { key, line, column }: the input JSON has two object members with the same key.
- JcsError::InvalidJson(String): the input JSON is malformed, as reported by the strict parser.
- JcsError::NumberOutOfRange(String): the value contains a number that JSON cannot represent (NaN, Infinity, etc.).
- JcsError::SchemaValidation(String): a BoundaryProfile check failed.
- JcsError::ResourceExceeded(String): a ResourceCeilings limit was hit.
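For callers, the practical question is how to branch on these variants. The sketch below is a local, std-only mirror of the listed variants, for illustration only. The field types (`String`, `usize`) are assumptions, and the real crate derives `Display` via thiserror rather than implementing it by hand. The `is_hostile` classification is a hypothetical caller policy, not something the crate provides.

```rust
use std::fmt;

/// Local mirror of the documented variants; field types are assumptions.
#[derive(Debug)]
enum JcsError {
    DuplicateKey { key: String, line: usize, column: usize },
    InvalidJson(String),
    NumberOutOfRange(String),
    SchemaValidation(String),
    ResourceExceeded(String),
}

// Hand-written Display, roughly what a thiserror derive would generate.
impl fmt::Display for JcsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            JcsError::DuplicateKey { key, line, column } => {
                write!(f, "duplicate key {key:?} at {line}:{column}")
            }
            JcsError::InvalidJson(msg) => write!(f, "invalid JSON: {msg}"),
            JcsError::NumberOutOfRange(msg) => write!(f, "number out of range: {msg}"),
            JcsError::SchemaValidation(msg) => write!(f, "schema validation failed: {msg}"),
            JcsError::ResourceExceeded(msg) => write!(f, "resource ceiling exceeded: {msg}"),
        }
    }
}

impl std::error::Error for JcsError {}

/// Hypothetical caller policy: treat duplicate keys and ceiling hits as
/// likely hostile input, and everything else as an ordinary bad request.
fn is_hostile(e: &JcsError) -> bool {
    matches!(e, JcsError::DuplicateKey { .. } | JcsError::ResourceExceeded(_))
}
```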
Test coverage
- 27 integration tests in tests/:
  - RFC 8785 key sorting (single object, nested, mixed with arrays)
  - Duplicate-key rejection (single, nested, with arrays)
  - Number formatting (integers, negative, scientific, large floats, 0.0, -0.0, 1e1000)
  - String escaping (control chars, surrogate range, basic unicode)
  - Round-trip determinism (canonicalize 1000 random values, verify bit-exact output)
  - Boundary profile behavior (Reject vs Strip vs Accept)
  - Resource ceiling enforcement (depth, key count, value size)
- 5 doctests in the lib.rs doc comment.
cargo test is clean, and cargo clippy --all-targets -- -D warnings is clean.
Performance
The canonicalizer's cost is roughly linear in the size of the JSON value; the main extra term is sorting each object's keys, which is O(k log k) for an object with k keys. For typical payloads (≤10 KB):
- canonicalize of a 1 KB object: ~10 µs
- ContentDigest::compute of a 1 KB object: ~15 µs
- Round trip of 1000 canonicalizations of 1 KB objects: ~25 ms
Numbers measured on a Fedora 43 dev box with cargo bench
(included as benches/canonicalize.rs).
MSRV
Rust 1.75 (2021 edition). The crate uses only stable features.
Dependencies
- blake3: for ContentDigest; the only hash function the crate uses for digests.
- serde / serde_json: for the strict JSON parser and value traversal.
- sha2: kept in the dependency tree for cross-crate compatibility with the wider Libraries stack (semantic-memory and bitemporal-runtime both use sha2 for digests).
- thiserror: for the JcsError enum.
Zero platform-specific code. Zero FFI. Zero async. Builds in ~2s.
License
Apache-2.0 (single-licensed). See LICENSE-APACHE for the full text.
A LICENSE-MIT file is also included for downstream projects that
require MIT terms, but the canonical license of this crate is
Apache-2.0.
Changelog
See CHANGELOG.md for the release history.
Where it's used
boundary-compiler is a foundational layer of:
- semantic-memory: every receipt, every projection import, and every export envelope is JCS-canonicalized and content-hashed by this crate.
- bitemporal-runtime: the SupersessionReceipt digest uses boundary_compiler::ContentDigest for the value-binding hash (added in the V31 hostile audit fix).
- forge-memory-bridge: the bridge from Forge export envelopes to projection import batches canonicalizes every envelope via this crate.
Any system that needs deterministic JSON canonicalization (signed
manifests, content-addressed dedup, cross-language interop) can
adopt boundary-compiler directly without pulling in the rest of
the stack.