boundary-compiler 0.1.0

RFC 8785 JSON Canonicalization (JCS) with boundary profiles and duplicate-key rejection

boundary-compiler

RFC 8785 JSON Canonicalization Scheme (JCS) for Rust, with strict duplicate-key rejection and a content-hash primitive. Extracted from semantic-memory as part of the V30 hostile audit so that JCS is available as a reusable, audited crate with a small dependency set.

This crate is the canonicalization layer for the entire RecursiveIntell memory stack: every receipt, every projection import, and every export envelope flows through boundary-compiler's Canonicalizer and is identified by its ContentDigest.

Why a separate JCS crate?

RFC 8785 has subtle gotchas — number formatting (no + prefix, no leading zeros, no .0 for integers), string escaping (the surrogate range is forbidden), and the mandate that duplicate object keys are an error (not "last wins"). Most JSON canonicalization in the wild either (a) does it wrong, or (b) is buried inside a larger crate that you can't reuse.
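
To make the difference between rejecting duplicates and "last wins" concrete, here is a minimal std-only sketch. It does not use boundary-compiler's parser: first_duplicate_key and last_wins are hypothetical helpers, and the slice of pairs stands in for the members of one already-tokenized JSON object.

```rust
use std::collections::{HashMap, HashSet};

/// Returns the first key that appears more than once, if any.
/// `members` stands in for the (key, value) pairs of one JSON object.
fn first_duplicate_key<'a>(members: &[(&'a str, i64)]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    for (key, _) in members {
        // `insert` returns false when the key was already present.
        if !seen.insert(*key) {
            return Some(*key);
        }
    }
    None
}

/// The lenient "last wins" behaviour: later members silently
/// overwrite earlier ones when collected into a map.
fn last_wins<'a>(members: &[(&'a str, i64)]) -> HashMap<&'a str, i64> {
    members.iter().copied().collect()
}
```

Given the members amount=10, currency=840, amount=9999, the lenient map quietly keeps 9999, while the strict check reports amount as a conflict. Two parsers that disagree on which value wins would hash different content, which is why a canonicalizer has to treat the duplicate as an error.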

boundary-compiler is the small, correct, reusable version. It depends on blake3, serde, serde_json, sha2, and thiserror. No async, no platform-specific code, no FFI. Builds in about 2 seconds.

Quick Start

use boundary_compiler::{Canonicalizer, ContentDigest};
use serde_json::json;

fn main() {
    // RFC 8785 canonicalization — sorted keys, strict number formatting.
    let c = Canonicalizer::new();
    let val = json!({"b": 2, "a": 1});
    let canonical = c.canonicalize(&val).unwrap();
    assert_eq!(canonical, r#"{"a":1,"b":2}"#);

    // blake3 content digest of the JCS bytes — what you hash for receipts.
    let digest = ContentDigest::compute(&val).unwrap();
    println!("JCS digest: {}", digest);
}

Run it: cargo run --example quick_start (see examples/).

What you get

Core API

  • Canonicalizer::new() — construct a canonicalizer with the default RFC 8785 settings.
  • canonicalize(&Value) -> Result<String, JcsError> — full RFC 8785 canonicalization: sorted object keys, RFC 8785 number formatting (no +, no leading zeros, integers as integers, floats in the shortest form that round-trips, following ECMAScript number serialization), RFC 8785 string escaping (the surrogate range is forbidden), and no insignificant whitespace.
  • parse_with_dup_check(&str) -> Result<Value, JcsError> — strict JSON parser that rejects duplicate object keys (required by RFC 8785 §3.1 through the I-JSON constraints of RFC 7493). Returns JcsError::DuplicateKey { key, line, column } on conflict.
  • parse_and_validate(&str) -> Result<Value, JcsError> — parse + duplicate detection in one call.
  • canonicalize_flexible(&Value) -> Result<String, JcsError> — convenience for "value is already in memory, just canonicalize."
  • ContentDigest::compute(&Value) -> Result<ContentDigest, JcsError> — blake3 hash of the JCS bytes. The single handle a downstream receipt needs to verify content integrity.
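
The core of what canonicalize does (sorted keys, fixed escaping, no whitespace) can be sketched with the standard library alone. This is an illustration, not the crate's implementation: the Value enum below is a hypothetical stand-in for serde_json::Value, it omits floats entirely, and it assumes duplicate keys were already rejected at parse time.

```rust
use std::fmt::Write;

/// A deliberately small JSON value type for illustration (no floats).
#[derive(Debug, Clone)]
enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Escapes a string as RFC 8785 describes: short escapes for the common
/// control characters, \u00xx for the rest, everything else verbatim.
fn write_string(out: &mut String, s: &str) {
    out.push('"');
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\u{08}' => out.push_str("\\b"),
            '\t' => out.push_str("\\t"),
            '\n' => out.push_str("\\n"),
            '\u{0C}' => out.push_str("\\f"),
            '\r' => out.push_str("\\r"),
            c if (c as u32) < 0x20 => {
                // Writing to a String cannot fail, so the result is ignored.
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out.push('"');
}

fn write_value(out: &mut String, value: &Value) {
    match value {
        Value::Null => out.push_str("null"),
        Value::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Value::Int(n) => {
            let _ = write!(out, "{}", n);
        }
        Value::Str(s) => write_string(out, s),
        Value::Array(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_value(out, item);
            }
            out.push(']');
        }
        Value::Object(members) => {
            // RFC 8785 sorts property names by their UTF-16 code units.
            let mut sorted: Vec<&(String, Value)> = members.iter().collect();
            sorted.sort_by(|a, b| a.0.encode_utf16().cmp(b.0.encode_utf16()));
            out.push('{');
            for (i, (key, val)) in sorted.into_iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_string(out, key);
                out.push(':');
                write_value(out, val);
            }
            out.push('}');
        }
    }
}

/// Serializes `value` with sorted keys and no insignificant whitespace.
fn canonicalize(value: &Value) -> String {
    let mut out = String::new();
    write_value(&mut out, value);
    out
}
```

Comparing keys by UTF-16 code units rather than by bytes or code points only matters for keys outside the Basic Multilingual Plane, but it is one of the details that makes independent implementations disagree.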

Boundary profiles

A BoundaryProfile is a configuration bundle that names the canonicalization dialect, the schema id + version, the canonicalization profile, the unknown-field policy, and the resource ceilings. It travels with the canonicalized value as a typed contract — not a free-form string.

use boundary_compiler::{
    BoundaryProfile, CanonicalizationProfile, Dialect, ResourceCeilings,
    SchemaValidator, UnknownFieldPolicy,
};

let profile = BoundaryProfile::builder()
    .dialect(Dialect::StrictJSON)
    .canonicalization(CanonicalizationProfile::Rfc8785)
    .unknown_fields(UnknownFieldPolicy::Reject)
    .resource_ceilings(ResourceCeilings {
        max_depth: 32,
        max_keys: 10_000,
        max_value_bytes: 1_048_576, // 1 MiB
    })
    .schema_id("https://schemas.example.com/claim-v1")
    .schema_version("1.0.0")
    .build();

UnknownFieldPolicy::Reject is the strict mode — values with fields not in the declared schema are rejected. Strip removes them silently. Accept allows them.
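
The three behaviours can be illustrated with a small std-only sketch. Everything here is hypothetical: Policy mirrors the variant names above, apply_policy is not a boundary-compiler function, and the "schema" is reduced to a list of declared field names.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the three policies described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Policy {
    Reject,
    Strip,
    Accept,
}

/// Applies `policy` to `fields`, given the field names the schema declares.
/// On rejection, returns the name of the first unknown field as the error.
fn apply_policy(
    policy: Policy,
    declared: &[&str],
    fields: BTreeMap<String, i64>,
) -> Result<BTreeMap<String, i64>, String> {
    match policy {
        // Accept: pass everything through untouched.
        Policy::Accept => Ok(fields),
        // Strip: silently drop anything the schema does not declare.
        Policy::Strip => Ok(fields
            .into_iter()
            .filter(|(name, _)| declared.contains(&name.as_str()))
            .collect()),
        // Reject: fail on the first undeclared field.
        Policy::Reject => {
            let unknown = fields
                .keys()
                .find(|name| !declared.contains(&name.as_str()))
                .cloned();
            match unknown {
                Some(name) => Err(name),
                None => Ok(fields),
            }
        }
    }
}
```

One consequence worth keeping in mind: under Strip, two inputs that differ only in undeclared fields produce the same output and therefore the same digest, whereas under Accept they do not.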

ResourceCeilings is a DoS guard. The default profile allows up to 32 levels of nesting and 10,000 keys per object, which accommodates normal payloads while rejecting adversarially deep or wide ones. Tighten it for your context.
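
A ceiling check of this kind can be sketched as a recursive walk that fails fast. The types below are hypothetical stand-ins, not the crate's own: Node replaces the real JSON value, Ceilings covers only depth and key count (the byte-size limit is left out), and the root container is assumed to be at depth 1.

```rust
/// Minimal JSON-like tree for illustration only.
enum Node {
    Leaf,
    Array(Vec<Node>),
    Object(Vec<(String, Node)>),
}

/// Hypothetical limits, loosely modeled on the fields shown above.
struct Ceilings {
    max_depth: usize,
    max_keys: usize,
}

/// Walks the tree and returns an error as soon as a ceiling is exceeded.
/// `depth` is the nesting level of `node`; pass 1 for the root.
fn check(node: &Node, limits: &Ceilings, depth: usize) -> Result<(), String> {
    if matches!(node, Node::Leaf) {
        return Ok(());
    }
    // Checking before descending keeps the recursion itself bounded.
    if depth > limits.max_depth {
        return Err(format!(
            "nesting depth {} exceeds limit {}",
            depth, limits.max_depth
        ));
    }
    match node {
        Node::Leaf => Ok(()),
        Node::Array(items) => items
            .iter()
            .try_for_each(|item| check(item, limits, depth + 1)),
        Node::Object(members) => {
            if members.len() > limits.max_keys {
                return Err(format!(
                    "object has {} keys, limit is {}",
                    members.len(),
                    limits.max_keys
                ));
            }
            members
                .iter()
                .try_for_each(|(_, child)| check(child, limits, depth + 1))
        }
    }
}

/// Builds `levels` arrays nested inside each other, for trying out the check.
fn nested(levels: usize) -> Node {
    (0..levels).fold(Node::Leaf, |inner, _| Node::Array(vec![inner]))
}
```

With max_depth set to 3, nested(3) passes and nested(4) is refused at the fourth level without visiting anything below it.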

Schema validation

SchemaValidator is currently a pass-through with the validation result struct in place. The actual JSON Schema validator (via the jsonschema crate) will be wired in behind a future schema-validation feature flag. For now, use the boundary profile's UnknownFieldPolicy to reject non-conforming payloads, and use semantic-memory (which depends on this crate) if you need full JSON Schema validation.

Error handling

All fallible operations return Result<_, JcsError>. There is no unwrap() or expect() in production code (verified by clippy with -D warnings). Errors are:

  • JcsError::DuplicateKey { key, line, column } — the input JSON has two object members with the same key.
  • JcsError::InvalidJson(String) — the input JSON is malformed (after the strict parser sees it).
  • JcsError::NumberOutOfRange(String) — the value contains a number that JSON cannot represent (NaN, Infinity, etc.).
  • JcsError::SchemaValidation(String) — a BoundaryProfile check failed.
  • JcsError::ResourceExceeded(String) — a ResourceCeilings limit was hit.
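
A caller would typically match on these variants to decide what to report. The sketch below uses a local stand-in enum so that it compiles on its own: the variant names come from the list above, but the field types (String, usize) and the describe helper are assumptions, not the crate's definitions.

```rust
/// Local stand-in for the documented error variants (field types assumed).
#[derive(Debug)]
enum JcsError {
    DuplicateKey { key: String, line: usize, column: usize },
    InvalidJson(String),
    NumberOutOfRange(String),
    SchemaValidation(String),
    ResourceExceeded(String),
}

/// Turns an error into a short message suitable for a log line.
fn describe(err: &JcsError) -> String {
    match err {
        JcsError::DuplicateKey { key, line, column } => {
            format!("duplicate key \"{}\" at {}:{}", key, line, column)
        }
        JcsError::InvalidJson(msg) => format!("malformed JSON: {}", msg),
        JcsError::NumberOutOfRange(msg) => format!("unrepresentable number: {}", msg),
        JcsError::SchemaValidation(msg) => format!("profile check failed: {}", msg),
        JcsError::ResourceExceeded(msg) => format!("resource ceiling hit: {}", msg),
    }
}
```

Matching exhaustively, without a wildcard arm, means the compiler flags every call site if a variant is added to the enum later.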

Test coverage

  • 27 integration tests in tests/:
    • RFC 8785 key sorting (single object, nested, mixed with arrays)
    • Duplicate-key rejection (single, nested, with arrays)
    • Number formatting (integers, negative, scientific, large floats, 0.0, -0.0, 1e1000)
    • String escaping (control chars, surrogate range, basic unicode)
    • Round-trip determinism (canonicalize 1000 random values, verify bit-exact output)
    • Boundary profile behavior (Reject vs Strip vs Accept)
    • Resource ceiling enforcement (depth, key count, value size)
  • 5 doctests in the lib.rs doc-comment.
  • cargo test clean, cargo clippy --all-targets -- -D warnings clean.

Performance

The canonicalizer is close to linear in the size of the JSON value. Sorting the keys of each object adds an O(k log k) term for an object with k keys, and that sorting pass dominates the constant. For typical payloads (≤10 KB):

  • canonicalize of a 1 KB object: ~10 µs
  • ContentDigest::compute of a 1 KB object: ~15 µs
  • Round-trip 1000 canonicalizations of 1 KB objects: ~25 ms

Numbers measured on a Fedora 43 dev box with cargo bench (included as benches/canonicalize.rs).

MSRV

Rust 1.75 (2021 edition). The crate uses only stable features.

Dependencies

  • blake3 — for ContentDigest. The single hash function used.
  • serde / serde_json — for the strict JSON parser and value traversal.
  • sha2 — kept in the dep tree for cross-crate compatibility with the wider Libraries stack (semantic-memory and bitemporal-runtime both use sha2 for their digests).
  • thiserror — for the JcsError enum.

Zero platform-specific code. Zero FFI. Zero async. Builds in ~2s.

License

Apache-2.0 (single-licensed). See LICENSE-APACHE for the full text. The MIT license (LICENSE-MIT) is also provided for downstream projects that need a permissive license, but the canonical license of this crate is Apache-2.0.

Changelog

See CHANGELOG.md for the release history.

Where it's used

boundary-compiler is a foundational layer of:

  • semantic-memory — every receipt, every projection import, and every export envelope is JCS-canonicalized and content-hashed by this crate.
  • bitemporal-runtime — the SupersessionReceipt digest uses boundary_compiler::ContentDigest for the value-binding hash (added in the V31 hostile audit fix).
  • forge-memory-bridge — the bridge from Forge export envelopes to projection import batches canonicalizes every envelope via this crate.

Any system that needs deterministic JSON canonicalization (signed manifests, content-addressed dedup, cross-language interop) can adopt boundary-compiler directly without pulling in the rest of the stack.