HEDL Canonicalization
Provides deterministic output generation for HEDL documents. Canonical output ensures stable hashing, diffing, and round-trips.
§Overview
This crate implements the canonical serialization format for HEDL documents, as specified in SPEC.md Section 13.2. Canonicalization ensures:
- Deterministic output: Same document always produces same output
- Idempotency: canonicalize(canonicalize(x)) == canonicalize(x)
- Round-trip preservation: Parsing canonical output preserves semantics
- Stable hashing: Enables content-addressable storage and diffing
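The determinism and idempotency guarantees can be illustrated with a small, self-contained sketch. It does not use this crate's API or the HEDL format: `toy_canonicalize` and its `key: value` line format are hypothetical stand-ins chosen only to show the properties.

```rust
use std::collections::BTreeMap;

/// Toy canonicalizer for `key: value` lines (a hypothetical format, not HEDL).
/// Keys are sorted and surrounding whitespace is trimmed, so inputs that
/// differ only in ordering or spacing map to the same output.
fn toy_canonicalize(input: &str) -> String {
    // BTreeMap keeps entries ordered by key, which gives a stable order.
    let mut entries: BTreeMap<&str, &str> = BTreeMap::new();
    for line in input.lines() {
        // Lines without a ':' separator are ignored in this sketch.
        if let Some((key, value)) = line.split_once(':') {
            entries.insert(key.trim(), value.trim());
        }
    }

    let mut out = String::new();
    for (key, value) in &entries {
        out.push_str(key);
        out.push_str(": ");
        out.push_str(value);
        out.push('\n');
    }
    out
}
```

Feeding the output back into `toy_canonicalize` returns it unchanged, and two inputs that differ only in key order yield byte-identical strings, which is what makes hashing the canonical form meaningful.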
§Features
- Minimal or always-quote string formatting strategies
- Legacy ditto support for pre-v2.0 documents
- Proper escaping of quotes and control characters
- Alphabetically sorted keys, aliases, and struct declarations
- Count hints in STRUCT directives for performance optimization
- Security: Recursion depth limits prevent stack overflow DoS attacks
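As a rough illustration of the two quoting strategies and of escaping, the sketch below formats a string either minimally or always quoted. The `Quoting` enum, the `needs_quotes` rule, and the escape set are assumptions made for this example; the actual rules are defined by the HEDL specification and this crate's `QuotingStrategy`.

```rust
/// Hypothetical stand-in for the crate's `QuotingStrategy`.
#[derive(Clone, Copy)]
enum Quoting {
    Minimal,
    Always,
}

/// Escape backslashes, double quotes, and a few common control characters.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            other => out.push(other),
        }
    }
    out
}

/// Assumed rule for this sketch: quote anything that is empty or contains
/// characters other than ASCII alphanumerics, '_' and '-'.
fn needs_quotes(s: &str) -> bool {
    s.is_empty() || !s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

/// Format a string value according to the chosen strategy.
fn format_string(s: &str, strategy: Quoting) -> String {
    match strategy {
        Quoting::Always => format!("\"{}\"", escape(s)),
        Quoting::Minimal if needs_quotes(s) => format!("\"{}\"", escape(s)),
        Quoting::Minimal => s.to_string(),
    }
}
```

With these assumptions, `format_string("alpha", Quoting::Minimal)` leaves the value bare, while `Quoting::Always` wraps it in double quotes. Either strategy is deterministic, so the choice mainly affects readability versus uniformity.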
§Examples
use hedl_c14n::{canonicalize, CanonicalConfig, CanonicalConfigBuilder, QuotingStrategy};
use hedl_core::Document;
// Simple canonicalization with defaults (`doc` is an existing `Document`)
let output = canonicalize(&doc)?;
// Custom configuration using fluent API
let config = CanonicalConfig::new()
.with_quoting(QuotingStrategy::Always)
.with_ditto(false);
let output = hedl_c14n::canonicalize_with_config(&doc, &config)?;
// Custom configuration using builder pattern
let config = CanonicalConfig::builder()
.quoting(QuotingStrategy::Always)
.use_ditto(false)
.sort_keys(true)
.build();
let output = hedl_c14n::canonicalize_with_config(&doc, &config)?;
§Security
This crate implements protections against denial-of-service and injection attacks:
- Recursion depth limit: Maximum nesting depth of 1000 levels prevents stack overflow
- Proper escaping: All special characters are escaped to prevent injection attacks
- Type safety: Rust’s type system prevents memory safety issues
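A recursion depth limit of this kind is commonly implemented by threading a depth counter through the recursive writer and returning an error once it passes the limit. The sketch below shows that pattern with hypothetical `Node` and `WriteError` types; only the limit of 1000 comes from the text above, and whether the real check is inclusive or exclusive is an assumption.

```rust
/// Hypothetical nested value type; the crate's real node types are not shown.
enum Node {
    Leaf(String),
    List(Vec<Node>),
}

/// Maximum nesting depth, matching the limit stated above.
const MAX_DEPTH: usize = 1000;

#[derive(Debug, PartialEq)]
enum WriteError {
    DepthExceeded,
}

/// Recursively write `node`, failing instead of recursing past MAX_DEPTH.
fn write_node(node: &Node, depth: usize, out: &mut String) -> Result<(), WriteError> {
    // Check before descending so hostile input cannot exhaust the stack.
    if depth > MAX_DEPTH {
        return Err(WriteError::DepthExceeded);
    }
    match node {
        Node::Leaf(text) => out.push_str(text),
        Node::List(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_node(item, depth + 1, out)?;
            }
            out.push(']');
        }
    }
    Ok(())
}
```

Returning an error rather than panicking lets the caller reject an overly deep document cleanly, which is the point of treating this as a denial-of-service protection.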
§Performance
Several optimizations are implemented:
- P0: Direct BTreeMap iteration eliminates key cloning (1.15x speedup, 10-15% fewer allocations)
- P1: Pre-allocated output buffer (1.2-1.3x speedup)
- P1: Cell buffer reuse across rows (1.05-1.1x speedup for large matrices)
- Count hints: add_count_hints() function to automatically add count hints to matrix lists
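The first three optimizations follow general Rust patterns, sketched below outside the crate. `write_sorted` and `write_rows` are hypothetical helpers rather than functions of this crate, and the sketch makes no claim about the speedups quoted above.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Write sorted `key: value` lines by iterating the map by reference,
/// so no key is cloned or collected into a temporary Vec first.
fn write_sorted(map: &BTreeMap<String, String>) -> String {
    // Pre-compute the output size (": " plus '\n' is 3 bytes per entry)
    // so the buffer is allocated once.
    let estimated: usize = map.iter().map(|(k, v)| k.len() + v.len() + 3).sum();
    let mut out = String::with_capacity(estimated);
    for (key, value) in map {
        out.push_str(key);
        out.push_str(": ");
        out.push_str(value);
        out.push('\n');
    }
    out
}

/// Write matrix rows, reusing one scratch buffer for every cell.
fn write_rows(rows: &[Vec<i64>]) -> String {
    let mut out = String::new();
    let mut cell = String::new(); // reused across all rows
    for row in rows {
        for (i, value) in row.iter().enumerate() {
            cell.clear(); // drops the contents but keeps the allocation
            write!(cell, "{value}").expect("writing to a String cannot fail");
            if i > 0 {
                out.push_str(", ");
            }
            out.push_str(&cell);
        }
        out.push('\n');
    }
    out
}
```

`BTreeMap` already iterates in key order, so sorted output needs no separate sort step, and `String::clear` retains capacity, so the scratch buffer only grows to the size of the largest cell.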
Structs§
- CanonicalConfig - Configuration for canonical output format.
- CanonicalConfigBuilder - Builder for constructing a CanonicalConfig with a chainable API.
- CanonicalWriter - Writer for canonical HEDL output.
Enums§
- QuotingStrategy - Quoting strategy for string values.
Functions§
- add_count_hints - Recursively add count hints to all matrix lists in the document.
- can_use_ditto - Check if a value can use ditto marker from previous row.
- canonicalize - Canonicalize a HEDL document to a string.
- canonicalize_with_config - Canonicalize a HEDL document with custom configuration.