lnmp-codec
Parser and encoder implementations for LNMP (LLM Native Minimal Protocol) v0.3 text format and v0.4 binary format.
Features
Text Format (v0.3)
- Deterministic serialization: Fields always sorted by FID for consistent output
- Canonical format: Newline-separated, no extra whitespace (v0.2)
- Type hints: Optional type annotations (:i, :f, :b, :s, :sa, :r, :ra)
- Nested structures: Parse and encode nested records and arrays (v0.3)
- Semantic checksums: Optional SC32 checksums for drift prevention (v0.3)
- Value normalization: Canonical value transformations (v0.3)
- Equivalence mapping: Synonym recognition (v0.3)
- Semantic dictionary (optional): Apply lnmp-sfe dictionaries during parse/encode to map values to canonical equivalents
- Strict mode: Validates canonical format compliance
- Loose mode: Accepts format variations (default)
- Lenient sanitizer: Optional pre-parse repair layer shared with lnmp-sanitize for LLM-facing inputs
- Round-trip stability: parse(encode(parse(x))) == parse(x)
Binary Format (v0.4)
- Efficient encoding: 30-50% size reduction compared to text format
- Zero-copy design: Fast serialization and deserialization
- Bidirectional conversion: Seamless text ↔ binary conversion
- Canonical binary: Fields sorted by FID, deterministic encoding
- VarInt encoding: Space-efficient integer representation
- Type safety: Explicit type tags for all values
- Version validation: Protocol version checking
- Interoperability: Compatible with v0.3 text format for supported types
Quick Start
Text Format
use lnmp_codec::{Encoder, Parser};
// Parse LNMP text
let input = "F12=14532\nF7=1\nF23=[admin,dev]";
let mut parser = Parser::new(input).unwrap();
let record = parser.parse_record().unwrap();
// Encode to canonical format
let encoder = Encoder::new();
let output = encoder.encode(&record);
// Output: F7=1\nF12=14532\nF23=[admin,dev] (sorted by FID)
Semantic Dictionary Normalization (optional)
use ;
use SemanticDictionary;
// Build a dictionary: map Admin/ADMIN -> admin for field 23
let mut dict = new;
dict.add_equivalence;
// Parse with dictionary (applies equivalence during parse)
let mut parser = with_config
.unwrap;
let record = parser.parse_record.unwrap;
// Encode with the same dictionary (ensures canonical output)
let encoder = with_config;
let output = encoder.encode;
assert_eq!;
Binary Format (v0.4)
use lnmp_codec::binary::{BinaryDecoder, BinaryEncoder};
// Encode text to binary
let text = "F7=1\nF12=14532\nF23=[admin,dev]";
let encoder = BinaryEncoder::new();
let binary = encoder.encode_text(text).unwrap();
// Decode binary to text
let decoder = BinaryDecoder::new();
let decoded_text = decoder.decode_to_text(&binary).unwrap();
// Output: F7=1\nF12=14532\nF23=[admin,dev] (canonical format)
// Round-trip conversion maintains data integrity
assert_eq!(text, decoded_text);
v0.3 Quick Start - Nested Structures
use lnmp_codec::{Encoder, Parser};
// Parse nested record
let input = "F50={F12=1;F7=1}";
let mut parser = Parser::new(input).unwrap();
let record = parser.parse_record().unwrap();
// Parse nested array
let input = "F60=[{F1=alice},{F1=bob}]";
let mut parser = Parser::new(input).unwrap();
let record = parser.parse_record().unwrap();
// Encode with checksums
use EncoderConfig;
let config = EncoderConfig ;
let encoder = with_config;
let output = encoder.encode;
// Output: F12=14532#36AAE667 (with checksum)
Lenient LLM-Friendly Parsing
use ;
use BinaryEncoder;
let messy = r#"F1=hello "world"; F2 = yes;F3=00042"#;
// Parser profile geared for LLM output
let mut parser = with_config.unwrap;
let record = parser.parse_record.unwrap;
// Binary encoder also provides lenient/strict helpers
let encoder = new;
let bytes = encoder.encode_text_llm_profile.unwrap;
// For M2M strict flows use `Parser::new_strict` or `encode_text_strict_profile`.
LNMP v0.2 Features
Deterministic Serialization
Fields are always sorted by FID, ensuring consistent output:
let mut record = new;
record.add_field;
record.add_field;
record.add_field;
let encoder = new;
let output = encoder.encode;
// Output: F5=1\nF50=2\nF100=3 (sorted)
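The sorting behavior can be sketched without the crate. This is a minimal standalone illustration, not the crate's implementation: `Field` and `encode_canonical` are hypothetical names, and values are assumed to be already formatted as text.

```rust
/// Hypothetical stand-in for a record field: (FID, already-formatted value).
type Field = (u16, String);

/// Encode fields as `F<fid>=<value>` lines, sorted by FID.
fn encode_canonical(fields: &[Field]) -> String {
    // Clone before sorting so the caller's record is left untouched.
    let mut sorted: Vec<Field> = fields.to_vec();
    // A stable sort keeps the insertion order of any duplicate FIDs.
    sorted.sort_by_key(|(fid, _)| *fid);
    sorted
        .iter()
        .map(|(fid, value)| format!("F{}={}", fid, value))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Encoding fields inserted as 100, 5, 50 yields the same text as any other insertion order, which is what makes the output deterministic.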
Type Hints
Optional type annotations for explicit typing:
use ;
let config = EncoderConfig ;
let encoder = with_config;
let output = encoder.encode;
// Output: F5:f=3.14\nF7:b=1\nF12:i=14532 (sorted by FID)
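How a type hint is derived from a value can be sketched as follows. This is a standalone illustration under assumptions: the `Value` enum, `type_hint`, and `encode_with_hints` are hypothetical and cover only the scalar and string-array hints (:i, :f, :b, :s, :sa), not :r or :ra.

```rust
/// Hypothetical value type covering the non-nested hints.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
    Bool(bool),
    Str(String),
    StrArray(Vec<String>),
}

/// Map a value to its type-hint suffix.
fn type_hint(value: &Value) -> &'static str {
    match value {
        Value::Int(_) => "i",
        Value::Float(_) => "f",
        Value::Bool(_) => "b",
        Value::Str(_) => "s",
        Value::StrArray(_) => "sa",
    }
}

/// Format the value part of a field.
fn format_value(value: &Value) -> String {
    match value {
        Value::Int(n) => n.to_string(),
        Value::Float(x) => x.to_string(),
        Value::Bool(true) => "1".to_string(),
        Value::Bool(false) => "0".to_string(),
        Value::Str(s) => s.clone(),
        Value::StrArray(items) => format!("[{}]", items.join(",")),
    }
}

/// Encode `F<fid>:<hint>=<value>` lines, sorted by FID.
fn encode_with_hints(fields: &[(u16, Value)]) -> String {
    let mut sorted: Vec<&(u16, Value)> = fields.iter().collect();
    sorted.sort_by_key(|field| field.0);
    sorted
        .into_iter()
        .map(|(fid, value)| format!("F{}:{}={}", fid, type_hint(value), format_value(value)))
        .collect::<Vec<_>>()
        .join("\n")
}
```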
Strict vs Loose Parsing
use ;
// Loose mode (default): accepts format variations
let mut parser = new.unwrap; // Unsorted, semicolons OK
// Strict mode: requires canonical format
let mut parser = with_mode.unwrap;
// Strict input mode (no sanitizer)
let mut strict_input_parser = new_strict.unwrap;
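To make the difference concrete, here is a rough standalone sketch of the kind of input loose mode tolerates: unsorted fields, semicolon separators, and spaces around the equals sign. `parse_loose` is a hypothetical helper, not the crate's parser, and it assumes flat (non-nested, unquoted) values only.

```rust
/// Parse flat `F<fid>=<value>` entries separated by newlines or semicolons.
/// Whitespace around keys and values is ignored; order is preserved as given.
fn parse_loose(text: &str) -> Result<Vec<(u16, String)>, String> {
    let mut fields = Vec::new();
    for entry in text.split(|c| c == '\n' || c == ';') {
        let entry = entry.trim();
        if entry.is_empty() {
            continue; // tolerate blank lines and trailing separators
        }
        let (key, value) = entry
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in {:?}", entry))?;
        let fid = key
            .trim()
            .strip_prefix('F')
            .ok_or_else(|| format!("missing 'F' prefix in {:?}", entry))?
            .parse::<u16>()
            .map_err(|e| format!("bad FID in {:?}: {}", entry, e))?;
        fields.push((fid, value.trim().to_string()));
    }
    Ok(fields)
}
```

A strict parser would reject the same input instead of repairing it, which is why the strict profile suits machine-to-machine pipelines.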
v0.3 Features
Nested Structures
Parse and encode hierarchical data:
use lnmp_codec::Parser;
// Nested record: F50={F12=1;F7=1}
let input = "F50={F12=1;F7=1}";
let mut parser = Parser::new(input).unwrap();
let record = parser.parse_record().unwrap();
// Nested array: F60=[{F1=alice},{F1=bob}]
let input = "F60=[{F1=alice},{F1=bob}]";
let mut parser = Parser::new(input).unwrap();
let record = parser.parse_record().unwrap();
// Deep nesting
let input = "F100={F1=user;F2={F10=nested;F11=data}}";
let mut parser = Parser::new(input).unwrap();
let record = parser.parse_record().unwrap();
Compliance & Lenient Test Suite
- tests/compliance/rust contains the cross-language suite for strict flows.
- tests/compliance/rust/test-cases-lenient.yaml mirrors the shared sanitizer behavior (auto-quote, comment trimming, nested repairs).
- Run cargo test -p lnmp-codec --tests test-driver -- --nocapture to execute both strict and lenient suites.
The lenient path uses the lnmp-sanitize crate under the hood so SDKs (Rust/TS/Go/Python) can apply identical repair logic before calling strict parsers.
Recommended SDK Profiles
| Profile | Parser Config | Binary Encoder | Intended Use |
|---|---|---|---|
| LLM-facing | text_input_mode = Lenient, mode = ParsingMode::Loose, normalize_values = true | encode_text_llm_profile | Repair user/LLM text before strict parsing |
| M2M strict | Parser::new_strict() or ParserConfig { text_input_mode = Strict, mode = ParsingMode::Strict } | encode_text_strict_profile | Deterministic machine-to-machine pipelines |
- Rust exposes helpers (Parser::new_lenient, Parser::new_strict, binary profile methods).
- TypeScript/Go/Python SDKs mirror the same defaults: LLMProfile (Lenient+Loose) for agent/model traffic and M2MProfile (Strict+Strict) for canonical pipelines.
- All SDKs rely on the same sanitizer rules from lnmp-sanitize, ensuring identical repairs across languages.
Nested Structure Rules:
- Nested records use {...} syntax with semicolon separators
- Nested arrays use [{...},{...}] syntax
- Fields sorted by FID at every nesting level
- Arbitrary nesting depth supported
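The rules above can be sketched as a small recursive encoder. This is a standalone illustration, not the crate's code: the `Node` enum and the `encode_*` functions are hypothetical, and scalar values are assumed to be pre-formatted text.

```rust
/// Hypothetical tree shape: a scalar, a nested record, or an array of records.
enum Node {
    Scalar(String),
    Record(Vec<(u16, Node)>),
    RecordArray(Vec<Vec<(u16, Node)>>),
}

/// Encode one level of fields, sorted by FID, joined with `sep`.
fn encode_fields(fields: &[(u16, Node)], sep: &str) -> String {
    let mut sorted: Vec<&(u16, Node)> = fields.iter().collect();
    sorted.sort_by_key(|field| field.0);
    sorted
        .into_iter()
        .map(|(fid, node)| format!("F{}={}", fid, encode_node(node)))
        .collect::<Vec<_>>()
        .join(sep)
}

/// Encode a value; nested records recurse with ';' as the separator.
fn encode_node(node: &Node) -> String {
    match node {
        Node::Scalar(text) => text.clone(),
        Node::Record(fields) => format!("{{{}}}", encode_fields(fields, ";")),
        Node::RecordArray(records) => {
            let items: Vec<String> = records
                .iter()
                .map(|fields| format!("{{{}}}", encode_fields(fields, ";")))
                .collect();
            format!("[{}]", items.join(","))
        }
    }
}

/// Top-level records use newlines between fields.
fn encode_record(fields: &[(u16, Node)]) -> String {
    encode_fields(fields, "\n")
}
```

Because `encode_fields` sorts on every call, the FID ordering holds at each nesting level without any extra bookkeeping.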
Semantic Checksums (SC32)
Enable checksums for drift prevention:
use ;
let config = EncoderConfig ;
let encoder = with_config;
let output = encoder.encode;
// Output: F12:i=14532#36AAE667
// Parse and validate checksums
use ;
let config = ParserConfig ;
let mut parser = with_config.unwrap;
let record = parser.parse_record.unwrap; // Validates checksums
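The append-and-verify flow can be sketched as below. This README does not state how SC32 is computed, so the sketch uses a plain CRC-32 (IEEE polynomial) over the field text as a stand-in; the resulting digits will not match real SC32 values such as the one shown above. `crc32`, `append_checksum`, and `verify_checksum` are hypothetical helpers.

```rust
/// Bitwise CRC-32 (reflected, polynomial 0xEDB88320). Stand-in for SC32.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = if crc & 1 == 1 {
                (crc >> 1) ^ 0xEDB8_8320
            } else {
                crc >> 1
            };
        }
    }
    !crc
}

/// Append `#XXXXXXXX` (8 uppercase hex digits) to an encoded field line.
fn append_checksum(line: &str) -> String {
    format!("{}#{:08X}", line, crc32(line.as_bytes()))
}

/// Recompute the checksum of the text before the last '#' and compare.
fn verify_checksum(line: &str) -> bool {
    match line.rsplit_once('#') {
        Some((body, hex)) => u32::from_str_radix(hex, 16)
            .map_or(false, |sum| sum == crc32(body.as_bytes())),
        None => false,
    }
}
```

A value that drifts after encoding (for example 14532 becoming 14533) no longer matches its suffix, so validation fails instead of silently accepting the change.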
Value Normalization
Canonical value transformations:
use ;
let config = NormalizationConfig ;
let normalizer = new;
// Normalizes: true → 1, -0.0 → 0.0, 3.140 → 3.14
let normalized = normalizer.normalize;
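The three transformations in the comment above can be sketched on raw text values. This is a standalone illustration with hypothetical helper names; the crate's actual rules and configuration fields may differ.

```rust
/// Normalize boolean spellings to "1" / "0"; None if not a boolean.
fn normalize_bool(raw: &str) -> Option<&'static str> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "true" | "1" => Some("1"),
        "false" | "0" => Some("0"),
        _ => None,
    }
}

/// Normalize a float: drop the sign of zero and any trailing zeros.
fn normalize_float(raw: &str) -> Option<String> {
    let parsed: f64 = raw.trim().parse().ok()?;
    if !parsed.is_finite() {
        return None;
    }
    // -0.0 == 0.0 compares true, so this replaces negative zero with 0.0.
    let value = if parsed == 0.0 { 0.0 } else { parsed };
    // f64's Display already omits trailing zeros (3.140 prints as 3.14).
    let mut text = value.to_string();
    // Whole numbers print without a fraction; keep one so the type stays visible.
    if !text.contains('.') {
        text.push_str(".0");
    }
    Some(text)
}
```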
Equivalence Mapping
Synonym recognition:
use EquivalenceMapper;
let mut mapper = new;
mapper.add_mapping;
mapper.add_mapping;
// Maps "yes" → "1" for field 7
let canonical = mapper.map; // Some("1")
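A per-field synonym table is one way to picture this. The sketch below is a standalone illustration, not the crate's EquivalenceMapper: `FieldSynonyms` is a hypothetical type, and the case-insensitive lookup is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical per-field synonym table: FID -> (synonym -> canonical value).
#[derive(Default)]
struct FieldSynonyms {
    by_field: HashMap<u16, HashMap<String, String>>,
}

impl FieldSynonyms {
    /// Register `from` as a synonym of `to` for one field.
    fn add(&mut self, fid: u16, from: &str, to: &str) {
        self.by_field
            .entry(fid)
            .or_default()
            .insert(from.to_lowercase(), to.to_string());
    }

    /// Look up the canonical value; None if the field or synonym is unknown.
    fn map(&self, fid: u16, value: &str) -> Option<&str> {
        self.by_field
            .get(&fid)?
            .get(&value.to_lowercase())
            .map(String::as_str)
    }
}
```

Keying the table by FID keeps a synonym such as "yes" from being rewritten in fields where it is ordinary text.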
Canonical Format Rules
v0.3 canonical format:
- ✓ Fields sorted by FID at all nesting levels
- ✓ Newline-separated (no semicolons at top level)
- ✓ Semicolons required in nested records
- ✓ No whitespace around equals signs
- ✓ No spaces after commas in arrays
- ✓ No comments (except in explain mode)
- ✓ Checksums appended as #XXXXXXXX when enabled
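Several of these rules can be checked mechanically. The following is a rough standalone sketch, not the crate's strict-mode validator: `check_canonical` is a hypothetical function that checks top-level FID order, top-level semicolons, and whitespace around the equals sign, and it assumes values contain no quoted braces or semicolons.

```rust
/// Check a subset of the canonical rules for top-level text.
fn check_canonical(text: &str) -> Result<(), String> {
    let mut previous: Option<u32> = None;
    for (index, line) in text.split('\n').enumerate() {
        let line_no = index + 1;
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: missing '='", line_no))?;
        if key != key.trim() || value != value.trim_start() {
            return Err(format!("line {}: whitespace around '='", line_no));
        }
        // Semicolons are only allowed inside {...} (nested records).
        let mut depth = 0i32;
        for c in value.chars() {
            match c {
                '{' | '[' => depth += 1,
                '}' | ']' => depth -= 1,
                ';' if depth == 0 => {
                    return Err(format!("line {}: semicolon at top level", line_no));
                }
                _ => {}
            }
        }
        // Key looks like F<digits> with an optional :<hint> suffix.
        let fid: u32 = key
            .strip_prefix('F')
            .and_then(|rest| rest.split(':').next())
            .and_then(|digits| digits.parse().ok())
            .ok_or_else(|| format!("line {}: invalid field id", line_no))?;
        if previous.map_or(false, |prev| fid < prev) {
            return Err(format!("line {}: fields not sorted by FID", line_no));
        }
        previous = Some(fid);
    }
    Ok(())
}
```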
Configuration Options
EncoderConfig
ParserConfig
NormalizationConfig
Migration from v0.2
v0.3 is backward compatible with v0.2. New features:
| Feature | v0.2 | v0.3 |
|---|---|---|
| Nested structures | Not supported | Supported |
| Checksums | Not supported | Optional SC32 |
| Value normalization | Not supported | Configurable |
| Equivalence mapping | Not supported | Configurable |
| Type hints | :i, :f, :b, :s, :sa | + :r, :ra |
Migration Guide
- Parsing: No changes needed - v0.3 parser accepts v0.2 format
- Encoding: New optional features (checksums, normalization)
- Nested structures: Use new NestedRecord and NestedArray variants
- Tests: Update for new value types if using nested structures
// v0.2 code (still works)
let encoder = new;
// v0.3 code with new features
let config = EncoderConfig ;
let encoder = with_config;
Performance Notes
- Sorting overhead: Minimal - uses stable sort on encode
- Memory: Sorted fields are cloned, original record unchanged
- Parsing: Loose mode has same performance as v0.1
Binary Format Details (v0.4)
Binary Frame Structure
┌─────────┬─────────┬─────────────┬──────────────────────┐
│ VERSION │ FLAGS │ ENTRY_COUNT │ ENTRIES... │
│ (1 byte)│(1 byte) │ (VarInt) │ (variable) │
└─────────┴─────────┴─────────────┴──────────────────────┘
Each entry contains:
┌──────────┬──────────┬──────────────────┐
│ FID │ THTAG │ VALUE │
│ (2 bytes)│ (1 byte) │ (variable) │
└──────────┴──────────┴──────────────────┘
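The layout above can be sketched by assembling a frame by hand. This is a standalone illustration under stated assumptions: the FID byte order is not given here, so little-endian is assumed; the entry count is written as a single-byte VarInt, which only holds for fewer than 128 entries; and only boolean values (tag 0x03) are handled. `encode_bool_frame` is a hypothetical function.

```rust
const VERSION: u8 = 0x04; // v0.4 binary format
const FLAGS: u8 = 0x00;
const TAG_BOOL: u8 = 0x03;

/// Build VERSION | FLAGS | ENTRY_COUNT | entries for boolean-only fields.
fn encode_bool_frame(fields: &[(u16, bool)]) -> Vec<u8> {
    // Canonical binary: entries sorted by FID.
    let mut sorted = fields.to_vec();
    sorted.sort_by_key(|(fid, _)| *fid);

    // A VarInt below 128 is a single byte; larger counts are out of scope here.
    assert!(sorted.len() < 128, "sketch supports fewer than 128 entries");

    let mut out = vec![VERSION, FLAGS, sorted.len() as u8];
    for (fid, value) in sorted {
        out.extend_from_slice(&fid.to_le_bytes()); // FID, 2 bytes (assumed LE)
        out.push(TAG_BOOL);                        // THTAG, 1 byte
        out.push(value as u8);                     // VALUE: 0x00 or 0x01
    }
    out
}
```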
Supported Types
- Integer (0x01): VarInt encoded signed 64-bit integers
- Float (0x02): IEEE 754 double-precision (8 bytes, little-endian)
- Boolean (0x03): Single byte (0x00 = false, 0x01 = true)
- String (0x04): Length-prefixed UTF-8 (length as VarInt + bytes)
- String Array (0x05): Count-prefixed array of length-prefixed strings
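The value encodings listed above can be sketched as follows. This is a standalone illustration with hypothetical helper names. The VarInt here is unsigned LEB128, used for lengths and counts; how signed integers are mapped onto VarInt bytes is not specified in this README, so integer values are left out.

```rust
/// Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Float value: IEEE 754 double, 8 bytes, little-endian.
fn write_float(value: f64, out: &mut Vec<u8>) {
    out.extend_from_slice(&value.to_le_bytes());
}

/// String value: byte length as VarInt, then the UTF-8 bytes.
fn write_string(value: &str, out: &mut Vec<u8>) {
    write_varint(value.len() as u64, out);
    out.extend_from_slice(value.as_bytes());
}

/// String array value: item count as VarInt, then each string.
fn write_string_array(items: &[&str], out: &mut Vec<u8>) {
    write_varint(items.len() as u64, out);
    for item in items {
        write_string(item, out);
    }
}
```

Short strings and small counts cost a single prefix byte, which is where much of the size saving over the text format comes from.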
Binary Encoding Example
use BinaryEncoder;
use ;
let mut record = new;
record.add_field;
record.add_field;
let encoder = new;
let binary = encoder.encode.unwrap;
// Binary format: [0x04, 0x00, 0x02, ...] (version, flags, entry count, entries)
Configuration Options
use ;
// Encoder configuration
let encoder_config = new
.with_validate_canonical
.with_sort_fields;
let encoder = with_config;
// Decoder configuration
let decoder_config = new
.with_validate_ordering // Enforce canonical field order
.with_strict_parsing; // Detect trailing data
let decoder = with_config;
Performance Characteristics
- Space Efficiency: 30-50% size reduction compared to text format
- Encoding Speed: < 1μs per field for simple types
- Decoding Speed: < 1μs per field for simple types
- Round-trip: < 10μs for typical 10-field record
Examples
See the examples/ directory for complete examples:
v0.2 Examples:
- type_hints.rs: Type hint usage
- strict_vs_loose.rs: Parsing mode comparison
- deterministic_serialization.rs: Canonical format demo
v0.3 Examples:
- nested_structures.rs: Nested records and arrays
- semantic_checksums.rs: SC32 checksum usage
- explain_mode.rs: Explain mode encoding
- shortform.rs: ShortForm encoding/parsing
- structural_canonicalization.rs: Structural canonicalization
v0.4 Examples (Binary Format):
- binary_encoding.rs: Basic binary encoding and decoding
- binary_roundtrip.rs: Round-trip conversion and data integrity
Run examples:
# v0.2 examples
# v0.3 examples
# v0.4 examples (binary format)
License
MIT OR Apache-2.0