Crate haagenti_zstd


§Haagenti Zstd

Native Rust implementation of Zstandard compression (RFC 8878).

Zstandard provides an excellent balance of compression ratio and speed, making it suitable for general-purpose compression. This implementation is fully cross-compatible with the reference zstd C library.

§Features

  • Pure Rust: No C dependencies, fully native implementation
  • Cross-Compatible: Output can be decompressed by reference zstd, and reference zstd output can be decompressed by this crate
  • Fast Decompression: 1.5x to 5.3x faster than reference zstd on 64 KB inputs
  • RFC 8878 Compliant: Follows the Zstandard specification
  • 354 Tests Passing: Comprehensive test coverage

§Quick Start

use haagenti_zstd::{ZstdCodec, ZstdCompressor, ZstdDecompressor};
use haagenti_core::{Compressor, Decompressor, CompressionLevel};

// Using the codec (compression + decompression)
let codec = ZstdCodec::new();
let compressed = codec.compress(b"Hello, World!").unwrap();
let original = codec.decompress(&compressed).unwrap();
assert_eq!(original, b"Hello, World!");

// With compression level
let compressor = ZstdCompressor::with_level(CompressionLevel::Best);
let compressed = compressor.compress(b"test data").unwrap();

§Performance vs Reference zstd

§Decompression (64KB data)

Data Type   haagenti      zstd ref      Speedup
Text        9,948 MB/s    3,755 MB/s    2.7x
Binary      15,782 MB/s   10,257 MB/s   1.5x
Random      42,827 MB/s   8,119 MB/s    5.3x

§Compression Ratio (64KB data)

Data Type    haagenti   zstd ref   Parity
Text         964x       1024x      94%
Binary       234x       237x       99%
Repetitive   4681x      3449x      136%

§Cross-Library Compatibility

  • ✓ haagenti can decompress zstd output
  • ✓ zstd can decompress haagenti output

§Architecture

┌─────────────────────────────────────────────────────────────┐
│                      haagenti-zstd                          │
├─────────────────────────────────────────────────────────────┤
│  compress/          │  decompress.rs                        │
│  ├── analysis.rs    │  (Full decompression pipeline)        │
│  ├── match_finder   │                                       │
│  ├── block.rs       │                                       │
│  └── sequences.rs   │                                       │
├─────────────────────────────────────────────────────────────┤
│  huffman/           │  fse/                                 │
│  ├── encoder.rs     │  ├── encoder.rs                       │
│  ├── decoder.rs     │  ├── decoder.rs                       │
│  └── table.rs       │  └── table.rs                         │
├─────────────────────────────────────────────────────────────┤
│  frame/             │  block/                               │
│  ├── header.rs      │  ├── literals.rs                      │
│  ├── block.rs       │  └── sequences.rs                     │
│  └── checksum.rs    │                                       │
└─────────────────────────────────────────────────────────────┘

§Implementation Status

§Completed

Decompression:

  • FSE (Finite State Entropy) decoding tables
  • FSE bitstream decoder with backward reading
  • Huffman decoding tables (single-stream and 4-stream)
  • Huffman weight parsing (direct representation)
  • Frame header parsing (all flags, window size, dictionary ID, FCS)
  • Block header parsing (Raw, RLE, Compressed)
  • XXHash64 checksum verification
  • Literals section parsing (Raw, RLE, Huffman-compressed)
  • Sequences section (count parsing, all symbol modes)
  • FSE-based sequence decoding (predefined tables, RLE mode)
  • Baseline tables for LL/ML/OF codes (extra bits, baselines)
  • Sequence execution (literal copy, match copy, overlapping matches)
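
As an illustration of the last item, sequence execution with overlapping matches can be sketched in a few lines of plain Rust. This is a minimal sketch rather than this crate's code: the function name `execute_sequence` and its signature are hypothetical, and it takes an already-resolved offset (the repeat-offset handling that Zstandard layers on top is left out).

```rust
/// Applies one LZ77-style sequence to `out` (hypothetical helper, not crate API):
/// first appends `literals`, then copies `match_len` bytes starting
/// `offset` bytes back from the current end of the output.
pub fn execute_sequence(
    out: &mut Vec<u8>,
    literals: &[u8],
    offset: usize,
    match_len: usize,
) -> Result<(), &'static str> {
    // Literal copy.
    out.extend_from_slice(literals);
    if match_len == 0 {
        return Ok(());
    }
    // The match must start inside data that has already been produced.
    if offset == 0 || offset > out.len() {
        return Err("match offset out of range");
    }
    let start = out.len() - offset;
    out.reserve(match_len);
    // Copy byte by byte so the match may read bytes it has just written.
    // When offset < match_len, this repeats the last `offset` bytes.
    for i in 0..match_len {
        let byte = out[start + i];
        out.push(byte);
    }
    Ok(())
}
```

With literals `ab`, an offset of 2, and a match length of 6, the match overlaps its own output and the two-byte pattern is repeated three more times. A bulk slice copy would not work in that case because the source range is not fully written when the copy begins.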

Compression:

  • Compressibility fingerprinting (novel approach)
  • Match finder with hash chains
  • Huffman encoding (single-stream and 4-stream)
  • Huffman weight normalization (Kraft inequality)
  • Block encoding (Raw, RLE, Compressed)
  • RLE sequence mode for uniform patterns
  • FSE sequence encoding with predefined tables
  • tANS encoder with correct state transitions
  • Frame encoding with checksum
  • Cross-library compatibility with reference zstd
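
The Kraft inequality mentioned under weight normalization can be checked with integer arithmetic. The sketch below is an assumption about how such a check might look, not the crate's implementation: `weight_to_bits` and `is_complete_prefix_code` are hypothetical names. It relies on the RFC 8878 rule that a nonzero weight `w` corresponds to a code length of `max_bits + 1 - w`, and that a weight of 0 marks an unused symbol.

```rust
/// Converts a Zstandard Huffman weight into a code length in bits
/// (hypothetical helper). Weight 0 means the symbol is unused.
pub fn weight_to_bits(weight: u8, max_bits: u8) -> Option<u8> {
    match weight {
        0 => Some(0),
        w if w <= max_bits => Some(max_bits - w + 1),
        _ => None, // weight too large for this table depth
    }
}

/// Returns true when the code lengths form a complete prefix code,
/// i.e. the Kraft sum of 2^(-len) over all used symbols is exactly 1.
/// The sum is scaled by 2^max_bits so it stays in integers.
pub fn is_complete_prefix_code(lengths: &[u8], max_bits: u8) -> bool {
    if max_bits == 0 || max_bits > 32 {
        return false;
    }
    let target = 1u64 << max_bits;
    let mut sum = 0u64;
    for &len in lengths {
        if len == 0 {
            continue; // unused symbol contributes nothing
        }
        if len > max_bits {
            return false;
        }
        sum += 1u64 << (max_bits - len);
    }
    sum == target
}
```

For example, weights `[3, 2, 1, 1]` with `max_bits = 3` map to lengths `[1, 2, 3, 3]`, whose scaled Kraft sum is 4 + 2 + 1 + 1 = 8 = 2^3, so the code is complete. A normalization step would adjust weights until this equality holds.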

§Planned

  • SIMD-accelerated match finding
  • Custom FSE table encoding (for patterns not covered by predefined)
  • FSE-compressed Huffman weights (for >127 unique symbols)
  • Dictionary support
  • Streaming compression/decompression

§Known Limitations

  1. Symbol Limit: Huffman uses direct weight format, limited to 127 symbols
  2. Predefined Tables: FSE uses only predefined tables; some patterns fall back
  3. Compression Speed: Compression runs at roughly 0.2x to 0.7x the speed of reference zstd; decompression is faster than the reference
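
To make the first limitation concrete, here is a minimal sketch of parsing Huffman weights in the direct representation as described in RFC 8878: a header byte of 128 or more means `header - 127` weights follow, packed two per byte with the first weight in the high nibble. The function name `parse_direct_weights` is hypothetical and not part of this crate's API. Deducing the final, implicit weight is omitted.

```rust
/// Parses a direct-representation Huffman weight header (hypothetical helper).
/// Returns the explicitly stored weights and the number of bytes consumed,
/// or `None` if the header is not in direct form or the input is truncated.
pub fn parse_direct_weights(input: &[u8]) -> Option<(Vec<u8>, usize)> {
    let (&header, rest) = input.split_first()?;
    if header < 128 {
        // Values below 128 signal FSE-compressed weights, not handled here.
        return None;
    }
    let count = usize::from(header) - 127;
    let byte_len = (count + 1) / 2; // two 4-bit weights per byte, rounded up
    let packed = rest.get(..byte_len)?;
    let mut weights = Vec::with_capacity(count);
    for i in 0..count {
        let byte = packed[i / 2];
        // Even indices take the high nibble, odd indices the low nibble.
        let weight = if i % 2 == 0 { byte >> 4 } else { byte & 0x0F };
        weights.push(weight);
    }
    Some((weights, 1 + byte_len))
}
```

Because the count is carried in a single header byte, the direct form can only describe a bounded number of symbols; larger alphabets need the FSE-compressed form listed under Planned.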

Re-exports§

pub use dictionary::ZstdDictCompressor;
pub use dictionary::ZstdDictDecompressor;
pub use dictionary::ZstdDictionary;

Modules§

block
Zstd block decoding.
compress
Zstd compression pipeline.
decompress
Full Zstd decompression pipeline.
dictionary
Zstandard dictionary support.
frame
Zstandard frame format.
fse
Finite State Entropy (FSE) coding.
huffman
Huffman coding for Zstandard.

Structs§

CustomFseTables
Custom FSE tables for sequence encoding.
CustomHuffmanTable
Custom Huffman table for literal encoding.
ZstdCodec
Zstandard codec combining compression and decompression.
ZstdCompressor
Zstandard compressor.
ZstdDecompressor
Zstandard decompressor.

Constants§

MAX_WINDOW_SIZE
Maximum window size (128 MB).
MIN_WINDOW_SIZE
Minimum window size (1 KB).
ZSTD_MAGIC
Zstd magic number (little-endian: 0xFD2FB528).
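
A short sketch of how these constants relate to the frame format may help. The constants below are local stand-ins that reuse the documented names and values; their integer types, and the helper functions `has_zstd_magic` and `window_size`, are assumptions for illustration rather than this crate's definitions. The window size formula follows the Window_Descriptor layout in RFC 8878 (5-bit exponent, 3-bit mantissa).

```rust
// Local stand-ins mirroring the documented values (types are assumed).
pub const ZSTD_MAGIC: u32 = 0xFD2F_B528;
pub const MIN_WINDOW_SIZE: u64 = 1 << 10; // 1 KB
pub const MAX_WINDOW_SIZE: u64 = 128 << 20; // 128 MB

/// Returns true if `data` starts with the Zstandard frame magic number.
/// The magic is stored little-endian, so the bytes on disk are 28 B5 2F FD.
pub fn has_zstd_magic(data: &[u8]) -> bool {
    match data.get(..4) {
        Some(b) => u32::from_le_bytes([b[0], b[1], b[2], b[3]]) == ZSTD_MAGIC,
        None => false,
    }
}

/// Decodes a Window_Descriptor byte into a window size in bytes.
/// Returns `None` when the size falls outside the supported range.
pub fn window_size(descriptor: u8) -> Option<u64> {
    let exponent = u32::from(descriptor >> 3);
    let mantissa = u64::from(descriptor & 0x07);
    let base = 1u64 << (10 + exponent);
    let size = base + (base / 8) * mantissa;
    (MIN_WINDOW_SIZE..=MAX_WINDOW_SIZE)
        .contains(&size)
        .then_some(size)
}
```

A descriptor of 0 yields the 1 KB minimum, and a descriptor with exponent 17 and mantissa 0 yields exactly 128 MB; anything larger is rejected by this sketch.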