§Byte Punch Compression
Profile-aware semantic tokenization for CML documents and other structured content.
Byte Punch achieves 40-70% compression by replacing common patterns with fixed-size tokens:
- 2-byte tokens: Reserved words (e.g., “shall” → 0x2001)
- 4-byte tokens: Common terms (e.g., “Congress” → 0x40000001)
- 8-byte tokens: Phrases (e.g., “We the People” → 0x8000000000000001)
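As a rough illustration of the three token sizes, the sketch below maps the example entries above to their token values and counts the bytes each replacement saves. The `Token` enum, `lookup`, and `bytes_saved` are hypothetical names invented for this example and are not part of the crate's API; the big-endian byte order is likewise an assumption.

```rust
/// Hypothetical token classes mirroring the three sizes listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Token {
    Reserved(u16), // 2-byte token
    Term(u32),     // 4-byte token
    Phrase(u64),   // 8-byte token
}

impl Token {
    /// Serialized token bytes (big-endian is an assumption of this sketch).
    fn to_bytes(self) -> Vec<u8> {
        match self {
            Token::Reserved(v) => v.to_be_bytes().to_vec(),
            Token::Term(v) => v.to_be_bytes().to_vec(),
            Token::Phrase(v) => v.to_be_bytes().to_vec(),
        }
    }
}

/// Toy lookup holding only the three example entries from the list above.
fn lookup(text: &str) -> Option<Token> {
    match text {
        "shall" => Some(Token::Reserved(0x2001)),
        "Congress" => Some(Token::Term(0x4000_0001)),
        "We the People" => Some(Token::Phrase(0x8000_0000_0000_0001)),
        _ => None,
    }
}

/// Bytes saved by replacing `text` with its token, if it has one.
fn bytes_saved(text: &str) -> Option<usize> {
    let token = lookup(text)?;
    Some(text.len() - token.to_bytes().len())
}
```

In this sketch, replacing "shall" (5 bytes) with a 2-byte token saves 3 bytes, while the 13-byte phrase "We the People" shrinks to 8 bytes. Longer tokens only pay off for longer patterns.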
§Compression Goals by Profile
- Legal: 60-70% (highest due to boilerplate repetition)
- Code: 55-65% (method names, type signatures)
- Bookstack: 50-60% (Markdown syntax, headings)
§Key Properties
- Predictable: Same input → same output, always
- Bidirectional: Perfect decompression, no data loss
- Profile-aware: Uses domain-specific dictionaries
- Fast: Simple byte replacement, no entropy encoding
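The predictable and bidirectional properties can be illustrated with a toy round trip. This is a minimal sketch under stated assumptions, not the crate's actual format: it accepts ASCII input only, uses a hypothetical marker byte (`0xF0`) followed by a dictionary index as its token, and the `DICT`, `compress`, and `decompress` names are invented for the example rather than taken from the `Compressor` or `Decompressor` types.

```rust
/// Marker byte that introduces a token. It is outside the ASCII range, so it
/// can never collide with literal text in this toy format.
const MARKER: u8 = 0xF0;

/// Hypothetical profile dictionary; the index in this slice is the token id.
const DICT: [&str; 3] = ["We the People", "Congress", "shall"];

/// Replace dictionary entries with 2-byte tokens. Returns `None` for
/// non-ASCII input, which this sketch does not support.
fn compress(input: &str) -> Option<Vec<u8>> {
    if !input.is_ascii() {
        return None;
    }
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // Entries are tried in slice order, so the output is deterministic.
        let hit = DICT
            .iter()
            .position(|entry| bytes[i..].starts_with(entry.as_bytes()));
        match hit {
            Some(id) => {
                out.extend_from_slice(&[MARKER, id as u8]);
                i += DICT[id].len();
            }
            None => {
                out.push(bytes[i]);
                i += 1;
            }
        }
    }
    Some(out)
}

/// Expand tokens back into text. Returns `None` on malformed input.
fn decompress(data: &[u8]) -> Option<String> {
    let mut out = Vec::with_capacity(data.len());
    let mut i = 0;
    while i < data.len() {
        if data[i] == MARKER {
            // A marker must be followed by a valid dictionary index.
            let id = *data.get(i + 1)? as usize;
            out.extend_from_slice(DICT.get(id)?.as_bytes());
            i += 2;
        } else {
            out.push(data[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}
```

Because the sketch is a plain table lookup with no entropy coding, compressing the same input twice yields identical bytes, and `decompress(&compress(text)?)` returns the original text for any ASCII input.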
Re-exports§
pub use compressor::Compressor;
pub use decompressor::Decompressor;
pub use dictionary::Dictionary;
pub use error::BytePunchError;
pub use error::Result;
Modules§
- compressor
- Byte Punch compressor implementation
- decompressor
- Byte Punch decompressor implementation
- dictionary
- Dictionary management for Byte Punch compression
- error
- Error types for Byte Punch compression/decompression
Structs§
- CompressionStats
- Compression statistics for a document