Crate bytepunch_rs

§Byte Punch Compression

Profile-aware semantic tokenization for CML documents and other structured content.

Byte Punch achieves 40-70% compression by replacing common patterns with fixed-size tokens:

  • 2-byte tokens: Reserved words (e.g., “shall” → 0x2001)
  • 4-byte tokens: Common terms (e.g., “Congress” → 0x40000001)
  • 8-byte tokens: Phrases (e.g., “We the People” → 0x8000000000000001)
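The three token widths can be pictured with a small sketch. The `Token` enum, its variant names, the `example_dictionary` helper, and the big-endian byte order are all assumptions made for illustration; only the three example values come from the list above, and the crate's actual `Dictionary` type may be organized quite differently.

```rust
use std::collections::HashMap;

/// Hypothetical token type covering the three fixed widths listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Token {
    Reserved(u16), // 2-byte token
    Term(u32),     // 4-byte token
    Phrase(u64),   // 8-byte token
}

impl Token {
    /// Serialize the token to its fixed-size byte form.
    /// Big-endian order is an assumption, not something the crate docs state.
    fn to_bytes(self) -> Vec<u8> {
        match self {
            Token::Reserved(v) => v.to_be_bytes().to_vec(),
            Token::Term(v) => v.to_be_bytes().to_vec(),
            Token::Phrase(v) => v.to_be_bytes().to_vec(),
        }
    }
}

/// Toy dictionary holding only the three example entries from the list above.
fn example_dictionary() -> HashMap<&'static str, Token> {
    HashMap::from([
        ("shall", Token::Reserved(0x2001)),
        ("Congress", Token::Term(0x4000_0001)),
        ("We the People", Token::Phrase(0x8000_0000_0000_0001)),
    ])
}
```

Under these assumptions, "We the People" (13 bytes of UTF-8) would shrink to 8 bytes, "Congress" (8 bytes) to 4, and "shall" (5 bytes) to 2.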

§Compression Goals by Profile

  • Legal: 60-70% (highest due to boilerplate repetition)
  • Code: 55-65% (method names, type signatures)
  • Bookstack: 50-60% (Markdown syntax, headings)
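To make the percentages concrete, the sketch below computes a size reduction from two byte counts. It assumes the goals above refer to the share of bytes saved (rather than the compressed size as a fraction of the original); the docs do not say which is meant. `percent_saved` is a hypothetical helper and is not part of the crate's `CompressionStats` API.

```rust
/// Percentage of bytes saved by compression.
/// Returns 0.0 for empty input to avoid dividing by zero.
fn percent_saved(original_len: usize, compressed_len: usize) -> f64 {
    if original_len == 0 {
        return 0.0;
    }
    (1.0 - compressed_len as f64 / original_len as f64) * 100.0
}
```

With this reading, a 1,000-byte legal document that compresses to 350 bytes would land at roughly 65%, inside the Legal target range.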

§Key Properties

  • Predictable: Same input → same output, always
  • Bidirectional: Perfect decompression, no data loss
  • Profile-aware: Uses domain-specific dictionaries
  • Fast: Simple byte replacement, no entropy encoding
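The predictable and bidirectional properties can be illustrated with a toy round trip. This is a word-level sketch under simplifying assumptions, not the crate's algorithm or wire format: `Piece`, `compress`, and `decompress` are hypothetical names, only 2-byte tokens are modeled, and token IDs are assumed to be unique within the dictionary.

```rust
use std::collections::HashMap;

/// One unit of toy output: either a dictionary token or untouched text.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Piece {
    Token(u16),
    Literal(String),
}

/// Replace every space-separated word found in `dict` with its token.
/// Plain lookups only, so the same input always yields the same output.
fn compress(text: &str, dict: &HashMap<&str, u16>) -> Vec<Piece> {
    text.split(' ')
        .map(|word| match dict.get(word) {
            Some(&id) => Piece::Token(id),
            None => Piece::Literal(word.to_string()),
        })
        .collect()
}

/// Invert `compress`. Returns `None` if a token is missing from `dict`.
fn decompress(pieces: &[Piece], dict: &HashMap<&str, u16>) -> Option<String> {
    // Build the reverse mapping from token ID back to the original word.
    let reverse: HashMap<u16, &str> = dict.iter().map(|(word, id)| (*id, *word)).collect();
    let mut words = Vec::with_capacity(pieces.len());
    for piece in pieces {
        match piece {
            Piece::Token(id) => words.push(reverse.get(id)?.to_string()),
            Piece::Literal(s) => words.push(s.clone()),
        }
    }
    // Joining on a single space exactly undoes `split(' ')`.
    Some(words.join(" "))
}
```

Because tokens and literals are distinct variants, a literal can never be mistaken for a token on the way back. A real byte-level format would need some escaping or framing rule to give the same guarantee.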

Re-exports§

pub use compressor::Compressor;
pub use decompressor::Decompressor;
pub use dictionary::Dictionary;
pub use error::BytePunchError;
pub use error::Result;

Modules§

  • compressor: Byte Punch compressor implementation
  • decompressor: Byte Punch decompressor implementation
  • dictionary: Dictionary management for Byte Punch compression
  • error: Error types for Byte Punch compression/decompression

Structs§

  • CompressionStats: Compression statistics for a document