§Haagenti Zstd
Native Rust implementation of Zstandard compression (RFC 8878).
Zstandard provides an excellent balance of compression ratio and speed, making it suitable for general-purpose compression. This implementation is fully cross-compatible with the reference zstd C library.
§Features
- Pure Rust: No C dependencies, fully native implementation
- Cross-Compatible: Output compatible with reference zstd, and vice versa
- Fast Decompression: 1.5x to 5.3x faster than reference zstd in the 64 KB benchmarks below
- RFC 8878 Compliant: Follows the Zstandard specification
- 354 Tests Passing: Comprehensive test coverage
§Quick Start
```rust
use haagenti_zstd::{ZstdCodec, ZstdCompressor, ZstdDecompressor};
use haagenti_core::{Compressor, Decompressor, CompressionLevel};

// Using the codec (compression + decompression)
let codec = ZstdCodec::new();
let compressed = codec.compress(b"Hello, World!").unwrap();
let original = codec.decompress(&compressed).unwrap();
assert_eq!(original, b"Hello, World!");

// With compression level
let compressor = ZstdCompressor::with_level(CompressionLevel::Best);
let compressed = compressor.compress(b"test data").unwrap();
```

§Performance vs Reference zstd
§Decompression (64KB data)
| Data Type | haagenti | zstd ref | Speedup |
|---|---|---|---|
| Text | 9,948 MB/s | 3,755 MB/s | 2.7x |
| Binary | 15,782 MB/s | 10,257 MB/s | 1.5x |
| Random | 42,827 MB/s | 8,119 MB/s | 5.3x |
§Compression Ratio (64KB data)
| Data Type | haagenti | zstd ref | Parity (haagenti / ref) |
|---|---|---|---|
| Text | 964x | 1024x | 94% |
| Binary | 234x | 237x | 99% |
| Repetitive | 4681x | 3449x | 136% |
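The Parity column is haagenti's compression ratio divided by the reference ratio, as a percentage; values above 100% mean haagenti produced smaller output. A minimal sketch of that arithmetic using the table's own numbers (`parity_percent` is a hypothetical helper, not part of the crate):

```rust
/// Parity as used in the compression-ratio table: haagenti's ratio divided by
/// the reference ratio, rounded to a whole percent.
/// Hypothetical helper for illustration; not part of haagenti-zstd.
fn parity_percent(haagenti_ratio: f64, reference_ratio: f64) -> u32 {
    (haagenti_ratio / reference_ratio * 100.0).round() as u32
}
```

For example, the Text row gives 964 / 1024 ≈ 0.941, which rounds to 94%.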
§Cross-Library Compatibility
- ✓ haagenti can decompress zstd output
- ✓ zstd can decompress haagenti output
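One piece of that interoperability is the optional content checksum. Per RFC 8878, when a frame's Content_Checksum_flag is set, the frame ends with the low 32 bits of XXH64 (seed 0) over the decompressed content, stored little-endian, so both libraries must compute bit-identical hashes. The following is a self-contained sketch of XXH64 and that 4-byte trailer. It follows the public xxHash algorithm, and the names `xxh64` and `zstd_content_checksum` are illustrative, not haagenti's API:

```rust
// XXH64 primes, from the xxHash specification.
const P1: u64 = 0x9E37_79B1_85EB_CA87;
const P2: u64 = 0xC2B2_AE3D_27D4_EB4F;
const P3: u64 = 0x1656_67B1_9E37_79F9;
const P4: u64 = 0x85EB_CA77_C2B2_AE63;
const P5: u64 = 0x27D4_EB2F_1656_67C5;

fn read_u64(b: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[..8]);
    u64::from_le_bytes(buf)
}

fn read_u32(b: &[u8]) -> u64 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&b[..4]);
    u64::from(u32::from_le_bytes(buf))
}

fn xxh_round(acc: u64, lane: u64) -> u64 {
    acc.wrapping_add(lane.wrapping_mul(P2)).rotate_left(31).wrapping_mul(P1)
}

fn merge_round(acc: u64, lane: u64) -> u64 {
    (acc ^ xxh_round(0, lane)).wrapping_mul(P1).wrapping_add(P4)
}

/// XXH64 of `input` with the given seed (illustrative implementation).
pub fn xxh64(input: &[u8], seed: u64) -> u64 {
    let mut rest = input;
    let mut h = if input.len() >= 32 {
        // Four accumulators consume 32-byte stripes.
        let mut v = [
            seed.wrapping_add(P1).wrapping_add(P2),
            seed.wrapping_add(P2),
            seed,
            seed.wrapping_sub(P1),
        ];
        while rest.len() >= 32 {
            for (i, lane) in v.iter_mut().enumerate() {
                *lane = xxh_round(*lane, read_u64(&rest[i * 8..]));
            }
            rest = &rest[32..];
        }
        let mut acc = v[0]
            .rotate_left(1)
            .wrapping_add(v[1].rotate_left(7))
            .wrapping_add(v[2].rotate_left(12))
            .wrapping_add(v[3].rotate_left(18));
        for &lane in &v {
            acc = merge_round(acc, lane);
        }
        acc
    } else {
        seed.wrapping_add(P5)
    };
    h = h.wrapping_add(input.len() as u64);

    // Tail: 8-byte lanes, then at most one 4-byte lane, then single bytes.
    while rest.len() >= 8 {
        h = (h ^ xxh_round(0, read_u64(rest))).rotate_left(27).wrapping_mul(P1).wrapping_add(P4);
        rest = &rest[8..];
    }
    if rest.len() >= 4 {
        h = (h ^ read_u32(rest).wrapping_mul(P1)).rotate_left(23).wrapping_mul(P2).wrapping_add(P3);
        rest = &rest[4..];
    }
    for &byte in rest {
        h = (h ^ u64::from(byte).wrapping_mul(P5)).rotate_left(11).wrapping_mul(P1);
    }

    // Final avalanche mixes all bits of the state.
    h ^= h >> 33;
    h = h.wrapping_mul(P2);
    h ^= h >> 29;
    h = h.wrapping_mul(P3);
    h ^ (h >> 32)
}

/// The 4-byte zstd content checksum: low 32 bits of XXH64 (seed 0), little-endian.
pub fn zstd_content_checksum(content: &[u8]) -> [u8; 4] {
    (xxh64(content, 0) as u32).to_le_bytes()
}
```

A decoder would compare `zstd_content_checksum(&decompressed)` against the last four bytes of the frame and reject the frame on mismatch.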
§Architecture
```text
┌─────────────────────────────────────────────────────────────┐
│                        haagenti-zstd                        │
├─────────────────────────────────────────────────────────────┤
│ compress/                   │ decompress.rs                 │
│ ├── analysis.rs             │ (Full decompression pipeline) │
│ ├── match_finder            │                               │
│ ├── block.rs                │                               │
│ └── sequences.rs            │                               │
├─────────────────────────────────────────────────────────────┤
│ huffman/                    │ fse/                          │
│ ├── encoder.rs              │ ├── encoder.rs                │
│ ├── decoder.rs              │ ├── decoder.rs                │
│ └── table.rs                │ └── table.rs                  │
├─────────────────────────────────────────────────────────────┤
│ frame/                      │ block/                        │
│ ├── header.rs               │ ├── literals.rs               │
│ ├── block.rs                │ └── sequences.rs              │
│ └── checksum.rs             │                               │
└─────────────────────────────────────────────────────────────┘
```

§Implementation Status
§Completed
Decompression:
- FSE (Finite State Entropy) decoding tables
- FSE bitstream decoder with backward reading
- Huffman decoding tables (single-stream and 4-stream)
- Huffman weight parsing (direct representation)
- Frame header parsing (all flags, window size, dictionary ID, FCS)
- Block header parsing (Raw, RLE, Compressed)
- XXHash64 checksum verification
- Literals section parsing (Raw, RLE, Huffman-compressed)
- Sequences section (count parsing, all symbol modes)
- FSE-based sequence decoding (predefined tables, RLE mode)
- Baseline tables for LL/ML/OF codes (extra bits, baselines)
- Sequence execution (literal copy, match copy, overlapping matches)
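Sequence execution, the last decompression step listed above, turns each decoded (literal length, match length, offset) triple into output bytes: copy the literals, then copy a match from earlier output. When the offset is smaller than the match length, the source overlaps bytes the copy itself is producing, so the copy must proceed byte by byte. A minimal sketch under stated assumptions: `Sequence` and `execute_sequences` are hypothetical names rather than haagenti's API, repeat offsets are assumed already resolved to real distances, and `out` may already hold earlier output within the window:

```rust
/// One decoded sequence (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug)]
pub struct Sequence {
    pub literal_length: usize,
    pub match_length: usize,
    pub offset: usize, // actual distance back into `out`, repeat codes already resolved
}

/// Apply `sequences` to `literals`, appending the regenerated bytes to `out`.
pub fn execute_sequences(
    literals: &[u8],
    sequences: &[Sequence],
    out: &mut Vec<u8>,
) -> Result<(), &'static str> {
    let mut lit_pos = 0;
    for seq in sequences {
        // 1. Copy this sequence's literals.
        let lit_end = lit_pos + seq.literal_length;
        if lit_end > literals.len() {
            return Err("literal length exceeds literals buffer");
        }
        out.extend_from_slice(&literals[lit_pos..lit_end]);
        lit_pos = lit_end;

        // 2. Copy the match from earlier output.
        if seq.offset == 0 || seq.offset > out.len() {
            return Err("invalid match offset");
        }
        let start = out.len() - seq.offset;
        // Byte-by-byte so that overlapping matches (offset < match_length)
        // read bytes written earlier in this same copy.
        for i in 0..seq.match_length {
            let b = out[start + i];
            out.push(b);
        }
    }
    // 3. Literals left after the last sequence are appended verbatim.
    out.extend_from_slice(&literals[lit_pos..]);
    Ok(())
}
```

For instance, literals `abX` with one sequence (literal length 2, match length 6, offset 2) yields `ababababX`: the offset-2 match repeats `ab` three times by reading its own output.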
Compression:
- Compressibility fingerprinting (novel approach)
- Match finder with hash chains
- Huffman encoding (single-stream and 4-stream)
- Huffman weight normalization (Kraft inequality)
- Block encoding (Raw, RLE, Compressed)
- RLE sequence mode for uniform patterns
- FSE sequence encoding with predefined tables
- tANS encoder with correct state transitions
- Frame encoding with checksum
- Cross-library compatibility with reference zstd
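The hash-chain match finder listed above is a standard LZ77 technique: hash the next few bytes at each position, remember the most recent position for each hash in a `head` table, and link each position to the previous one with the same hash so the search can walk back through candidates. A minimal sketch, assuming a 4-byte minimum match and a multiplicative hash; `HashChain`, `MIN_MATCH`, and the method names are hypothetical and not haagenti's implementation:

```rust
/// Shortest match worth emitting (an assumption for this sketch).
const MIN_MATCH: usize = 4;

/// Minimal hash-chain match finder (illustrative; not haagenti's code).
pub struct HashChain {
    head: Vec<Option<usize>>, // most recent position seen for each hash bucket
    prev: Vec<Option<usize>>, // per position: previous position in the same bucket
    hash_bits: u32,
}

impl HashChain {
    /// `hash_bits` must be in 1..=32.
    pub fn new(input_len: usize, hash_bits: u32) -> Self {
        HashChain {
            head: vec![None; 1usize << hash_bits],
            prev: vec![None; input_len],
            hash_bits,
        }
    }

    /// Multiplicative hash of the first 4 bytes of `bytes`, keeping the top bits.
    fn hash(&self, bytes: &[u8]) -> usize {
        let v = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        (v.wrapping_mul(2_654_435_761) >> (32 - self.hash_bits)) as usize
    }

    /// Record `pos` so later positions can match against it.
    pub fn insert(&mut self, data: &[u8], pos: usize) {
        if pos + MIN_MATCH <= data.len() {
            let h = self.hash(&data[pos..]);
            self.prev[pos] = self.head[h];
            self.head[h] = Some(pos);
        }
    }

    /// Longest earlier match for `pos` as `(offset, length)`, walking at most
    /// `max_depth` chain links and ignoring candidates more than `window` back.
    /// Call this before `insert(data, pos)` for the same position.
    pub fn find(&self, data: &[u8], pos: usize, window: usize, max_depth: usize) -> Option<(usize, usize)> {
        if pos + MIN_MATCH > data.len() {
            return None;
        }
        let mut best: Option<(usize, usize)> = None;
        let mut candidate = self.head[self.hash(&data[pos..])];
        let mut depth = 0;
        while let Some(c) = candidate {
            if depth == max_depth || pos - c > window {
                break;
            }
            // Matches may run past `pos` (overlap), which LZ77 allows.
            let len = data[c..].iter().zip(&data[pos..]).take_while(|(a, b)| a == b).count();
            // Hash collisions are filtered out here by the length check.
            if len >= MIN_MATCH && best.map_or(true, |(_, best_len)| len > best_len) {
                best = Some((pos - c, len));
            }
            candidate = self.prev[c];
            depth += 1;
        }
        best
    }
}
```

A greedy compressor would call `find` at each position, emit a sequence when it returns a match, and `insert` every position it steps over. The `max_depth` cap trades compression ratio for speed, which is roughly what compression levels tune.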
§Planned
- SIMD-accelerated match finding
- Custom FSE table encoding (for patterns the predefined tables do not cover)
- FSE-compressed Huffman weights (for >127 unique symbols)
- Dictionary support
- Streaming compression/decompression
§Known Limitations
- Symbol Limit: Huffman uses direct weight format, limited to 127 symbols
- Predefined Tables: FSE uses only predefined tables; some patterns fall back
- Compression Speed: Compression runs at roughly 0.2x to 0.7x the speed of reference zstd (decompression, by contrast, is faster than the reference)
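The symbol limit stems from the direct weight format that RFC 8878 defines for Huffman tree descriptions: a header byte of 128 or more means `header - 127` weights follow as packed 4-bit values (high nibble first), and the last symbol's weight is not stored but implied, because the weights must sum (as powers of two) to an exact power of two. Larger alphabets need the FSE-compressed weight form listed under Planned. A minimal parsing sketch; `parse_direct_weights` is a hypothetical name, not haagenti's API:

```rust
/// Parse a Huffman tree description that uses the direct weight format
/// (header byte >= 128). Returns every symbol's weight, including the
/// implied last one. Sketch only; FSE-compressed weights are rejected.
pub fn parse_direct_weights(input: &[u8]) -> Result<Vec<u8>, &'static str> {
    let header = *input.first().ok_or("empty input")?;
    if header < 128 {
        // header < 128 means the weights are FSE-compressed instead.
        return Err("FSE-compressed weights are not handled in this sketch");
    }
    let stored = usize::from(header - 127); // number of explicitly stored weights
    let packed = input.get(1..1 + (stored + 1) / 2).ok_or("truncated weights")?;

    // Two 4-bit weights per byte, the first in the high nibble.
    let mut weights: Vec<u8> = (0..stored)
        .map(|i| {
            let byte = packed[i / 2];
            if i % 2 == 0 { byte >> 4 } else { byte & 0x0F }
        })
        .collect();

    // Each non-zero weight w contributes 2^(w-1); the implied last weight
    // must bring the total up to the next power of two.
    let total: u32 = weights.iter().filter(|&&w| w > 0).map(|&w| 1u32 << (w - 1)).sum();
    if total == 0 {
        return Err("no non-zero weights");
    }
    let remainder = (total + 1).next_power_of_two() - total;
    if !remainder.is_power_of_two() {
        return Err("weights cannot be completed to a power of two");
    }
    weights.push((remainder.trailing_zeros() + 1) as u8);
    Ok(weights)
}
```

For example, header byte 130 followed by `0x21 0x10` stores weights 2, 1, 1 (total 4), so the implied fourth weight is 3 (contributing 4, for a total of 8).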
§Re-exports
- pub use dictionary::ZstdDictCompressor;
- pub use dictionary::ZstdDictDecompressor;
- pub use dictionary::ZstdDictionary;
§Modules
- block - Zstd block decoding.
- compress - Zstd compression pipeline.
- decompress - Full Zstd decompression pipeline.
- dictionary - Zstandard dictionary support.
- frame - Zstandard frame format.
- fse - Finite State Entropy (FSE) coding.
- huffman - Huffman coding for Zstandard.
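Per RFC 8878, every block inside a frame starts with a 3-byte little-endian header: bit 0 is Last_Block, bits 1-2 are Block_Type (0 = Raw, 1 = RLE, 2 = Compressed, 3 = Reserved), and bits 3-23 are Block_Size. For RLE blocks, Block_Size is the regenerated length and the block content is a single byte. A minimal sketch of the kind of parsing the `block` and `frame` modules need; `BlockHeader`, `BlockType`, and `parse_block_header` are hypothetical names, not the crate's types:

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum BlockType {
    Raw,
    Rle,
    Compressed,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct BlockHeader {
    pub last: bool,
    pub block_type: BlockType,
    /// Content size in bytes, or the regenerated size for RLE blocks.
    pub size: u32,
}

/// Decode the 3-byte block header at the start of `bytes`.
pub fn parse_block_header(bytes: &[u8]) -> Result<BlockHeader, &'static str> {
    let b = bytes.get(..3).ok_or("need 3 bytes")?;
    // Assemble the 24-bit little-endian value.
    let raw = u32::from(b[0]) | (u32::from(b[1]) << 8) | (u32::from(b[2]) << 16);
    let block_type = match (raw >> 1) & 0b11 {
        0 => BlockType::Raw,
        1 => BlockType::Rle,
        2 => BlockType::Compressed,
        _ => return Err("reserved block type"),
    };
    Ok(BlockHeader {
        last: (raw & 1) == 1,
        block_type,
        size: raw >> 3,
    })
}
```

For example, `[0x29, 0x00, 0x00]` decodes as the last block of the frame, Raw, 5 bytes long.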
§Structs
- CustomFseTables - Custom FSE tables for sequence encoding.
- CustomHuffmanTable - Custom Huffman table for literal encoding.
- ZstdCodec - Zstandard codec combining compression and decompression.
- ZstdCompressor - Zstandard compressor.
- ZstdDecompressor - Zstandard decompressor.
§Constants
- MAX_WINDOW_SIZE - Maximum window size (128 MB).
- MIN_WINDOW_SIZE - Minimum window size (1 KB).
- ZSTD_MAGIC - Zstd frame magic number, 0xFD2FB528 (stored little-endian, i.e. as the bytes 28 B5 2F FD).
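A zstd frame starts with the 4-byte magic number, and its header may carry a Window_Descriptor byte (absent when Single_Segment_flag is set). RFC 8878 splits that byte into a 5-bit exponent and a 3-bit mantissa: the base is 2^(10 + exponent) and the mantissa adds that many eighths of the base, so the smallest encodable window is exactly 1 KB, matching MIN_WINDOW_SIZE. A decoder built on these constants would presumably reject windows above MAX_WINDOW_SIZE; that check is an assumption and is not shown. A minimal sketch; `FRAME_MAGIC` is a local stand-in for ZSTD_MAGIC, and `has_zstd_magic` and `window_size` are hypothetical helpers:

```rust
/// Local stand-in for the crate's ZSTD_MAGIC constant.
const FRAME_MAGIC: u32 = 0xFD2F_B528;

/// True if `data` begins with the zstd frame magic (bytes 28 B5 2F FD).
pub fn has_zstd_magic(data: &[u8]) -> bool {
    data.len() >= 4 && u32::from_le_bytes([data[0], data[1], data[2], data[3]]) == FRAME_MAGIC
}

/// Window size in bytes encoded by a Window_Descriptor byte.
pub fn window_size(descriptor: u8) -> u64 {
    let exponent = u64::from(descriptor >> 3); // upper 5 bits
    let mantissa = u64::from(descriptor & 0b111); // lower 3 bits
    let window_base = 1u64 << (10 + exponent);
    let window_add = (window_base / 8) * mantissa;
    window_base + window_add
}
```

Descriptor 0 gives 1024 bytes, and descriptor 136 (exponent 17, mantissa 0) gives exactly 128 MB.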