zenflate
Pure Rust DEFLATE/zlib/gzip compression and decompression, ported from libdeflate.
no_std compatible: alloc is required for compression and streaming decompression; whole-buffer decompression is fully stack-allocated.
Usage
```toml
[dependencies]
zenflate = "0.2"
```
Compress
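```rust
use zenflate::{CompressionLevel, Compressor, Unstoppable};

let data = b"Hello, World! Hello, World! Hello, World!";
let mut compressor = Compressor::new(CompressionLevel::balanced());

// Worst-case compressed size for this input
let bound = Compressor::deflate_compress_bound(data.len());
let mut compressed = vec![0u8; bound];
let compressed_len = compressor
    .deflate_compress(data, &mut compressed, Unstoppable)
    .unwrap();
let compressed = &compressed[..compressed_len];
```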
Decompress
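```rust
use zenflate::{Decompressor, Unstoppable};

let mut decompressor = Decompressor::new();
// The caller sizes the output buffer; here the original length is known
let mut output = vec![0u8; data.len()];
let result = decompressor
    .deflate_decompress(compressed, &mut output, Unstoppable)
    .unwrap();
// result.input_consumed: bytes of compressed data consumed
// result.output_written: bytes of decompressed data produced
```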
Streaming decompression
Use streaming decompression for inputs that don't fit in memory or that arrive incrementally. It works with
&[u8] (zero overhead) or with any std::io::BufRead via BufReadSource.
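```rust
use zenflate::{/* … */};

// From a slice (no_std compatible):
let mut stream = /* … */::new_deflate(/* … */);
loop {
    // call stream.fill(), consume the decompressed output, break at end of stream
}

// From a BufRead (std only):
use zenflate::BufReadSource;
let file = std::io::BufReader::new(std::fs::File::open(/* … */)?);
let mut stream = /* … */::new_gzip(/* … */);
// stream also implements Read + BufRead
```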
Formats
All three DEFLATE-based formats are supported:
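```rust
// Raw DEFLATE
compressor.deflate_compress(data, &mut out, Unstoppable)?;
decompressor.deflate_decompress(compressed, &mut out, Unstoppable)?;

// zlib (2-byte header + DEFLATE + Adler-32)
compressor.zlib_compress(data, &mut out, Unstoppable)?;
decompressor.zlib_decompress(compressed, &mut out, Unstoppable)?;

// gzip (10-byte header + DEFLATE + CRC-32 + input size)
compressor.gzip_compress(data, &mut out, Unstoppable)?;
decompressor.gzip_decompress(compressed, &mut out, Unstoppable)?;
```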
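To make the zlib framing concrete, here is a minimal standalone sketch based on RFC 1950 rather than on zenflate's code: a 2-byte header (CMF = 0x78 for DEFLATE with a 32 KiB window, plus an FLG byte chosen so the 16-bit header is divisible by 31), the raw DEFLATE payload, then the Adler-32 of the uncompressed data in big-endian order. adler32 and zlib_wrap are hypothetical helper names, and the modulo is applied per byte for clarity rather than speed.

```rust
/// Adler-32 (RFC 1950): two running sums modulo 65521.
/// Reduced on every byte for clarity; fast versions defer the modulo.
fn adler32(data: &[u8]) -> u32 {
    const MOD: u32 = 65_521;
    let (mut a, mut b) = (1u32, 0u32);
    for &byte in data {
        a = (a + u32::from(byte)) % MOD;
        b = (b + a) % MOD;
    }
    (b << 16) | a
}

/// Wraps an already-compressed raw DEFLATE payload in zlib framing.
fn zlib_wrap(raw_deflate: &[u8], uncompressed: &[u8]) -> Vec<u8> {
    // CMF: CM = 8 (DEFLATE), CINFO = 7 (32 KiB window).
    let cmf: u8 = 0x78;
    // FLG: FLEVEL = 0, FDICT = 0, FCHECK chosen so (CMF * 256 + FLG) % 31 == 0.
    let header_base = u16::from(cmf) << 8;
    let flg = ((31 - header_base % 31) % 31) as u8;

    let mut out = Vec::with_capacity(2 + raw_deflate.len() + 4);
    out.push(cmf);
    out.push(flg);
    out.extend_from_slice(raw_deflate);
    // Trailer: Adler-32 of the *uncompressed* bytes, big-endian.
    out.extend_from_slice(&adler32(uncompressed).to_be_bytes());
    out
}
```

Wrapping an empty final stored block with zlib_wrap(&[0x01, 0x00, 0x00, 0xFF, 0xFF], b"") gives 78 01 01 00 00 FF FF 00 00 00 01, the minimal zlib stream for empty input. gzip follows the same wrap-the-payload pattern with a 10-byte header and a CRC-32 plus input-size trailer.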
Compression levels
Pick a preset or dial in a specific effort from 0 to 30:
```rust
use zenflate::CompressionLevel;

// Named presets
CompressionLevel::none();      // effort 0: store (no compression)
CompressionLevel::fastest();   // effort 1: turbo hash table
CompressionLevel::fast();      // effort 10: greedy hash chains
CompressionLevel::balanced();  // effort 15: lazy matching (default)
CompressionLevel::high();      // effort 22: double-lazy matching
CompressionLevel::best();      // effort 30: near-optimal parsing

// Fine-grained control (0-30, clamped)
CompressionLevel::new(14);     // lazy matching, mid-range
CompressionLevel::new(23);     // near-optimal, fast end

// Byte-identical C libdeflate compatibility (0-12)
CompressionLevel::libdeflate(6);
```
| Preset | Effort | Strategy | Description |
|---|---|---|---|
| none() | 0 | Store | Framing only, no compression |
| fastest() | 1 | Turbo | Maximum throughput |
| fast() | 10 | Greedy | Hash chains; big ratio jump over turbo |
| balanced() | 15 | Lazy | Lazy matching; good default |
| high() | 22 | Lazy2 | Double-lazy; best before near-optimal |
| best() | 30 | Near-optimal | Best compression ratio |
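Effort 0 still has to produce valid DEFLATE, so "framing only" means emitting stored blocks. A minimal sketch, assuming the plain RFC 1951 stored-block layout (zenflate's store path may differ in details such as block sizing): each block carries at most 65,535 bytes behind a byte-aligned header and a LEN/NLEN pair. deflate_store is a hypothetical name, not part of zenflate.

```rust
/// Encodes `data` as raw DEFLATE using only stored (uncompressed) blocks.
fn deflate_store(data: &[u8]) -> Vec<u8> {
    const MAX_BLOCK: usize = 65_535; // LEN is a 16-bit field
    let mut out = Vec::with_capacity(data.len() + 5 * (data.len() / MAX_BLOCK + 1));

    let mut chunks: Vec<&[u8]> = data.chunks(MAX_BLOCK).collect();
    if chunks.is_empty() {
        // Empty input still needs one (empty) final block.
        chunks.push(&data[..0]);
    }

    let last = chunks.len() - 1;
    for (i, chunk) in chunks.iter().enumerate() {
        // Bit 0 = BFINAL, bits 1-2 = BTYPE (00 = stored); the remaining
        // bits are padding because stored data starts on a byte boundary.
        out.push(if i == last { 0x01 } else { 0x00 });
        let len = chunk.len() as u16;
        out.extend_from_slice(&len.to_le_bytes());
        out.extend_from_slice(&(!len).to_le_bytes()); // NLEN = one's complement of LEN
        out.extend_from_slice(chunk);
    }
    out
}
```

Each stored block adds 5 bytes of overhead, so effort 0 output is slightly larger than the input; for example deflate_store(b"abc") is 8 bytes.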
Effort levels 0-30 map to seven strategies:
| Effort | Strategy | Notes |
|---|---|---|
| 0 | Store | No compression |
| 1-4 | Turbo | Single-entry hash table, fastest |
| 5-9 | FastHt | 2-entry hash table, increasing match length |
| 10 | Greedy | Hash chains with greedy matching |
| 11-17 | Lazy | Hash chains with lazy matching |
| 18-22 | Lazy2 | Double-lazy matching |
| 23-30 | Near-optimal | Near-optimal parsing via binary trees |
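The table above can be read as a simple lookup with inclusive boundaries. In this sketch the Strategy enum and strategy_for function are illustrative names, not zenflate's internal types; clamping values above 30 follows the "0-30, clamped" note on CompressionLevel::new.

```rust
/// Strategy families from the effort table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Store,
    Turbo,
    FastHt,
    Greedy,
    Lazy,
    Lazy2,
    NearOptimal,
}

/// Maps an effort value to its strategy family, clamping to 0-30.
fn strategy_for(effort: u32) -> Strategy {
    match effort.min(30) {
        0 => Strategy::Store,
        1..=4 => Strategy::Turbo,
        5..=9 => Strategy::FastHt,
        10 => Strategy::Greedy,
        11..=17 => Strategy::Lazy,
        18..=22 => Strategy::Lazy2,
        _ => Strategy::NearOptimal, // 23..=30 after clamping
    }
}
```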
Higher effort within a strategy increases search depth and match quality.
Strategy transitions (e.g. e9→e10, e10→e11) can occasionally produce
slightly larger output on specific inputs due to algorithmic differences.
Use CompressionLevel::monotonicity_fallback() to detect and handle these
transitions — it returns the previous strategy's max effort so you can
compare both and pick the smaller result.
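A sketch of the compare-and-pick pattern described above. previous_strategy_max is a hypothetical stand-in for CompressionLevel::monotonicity_fallback(), derived from the effort table; its return type and its behavior for efforts with no cheaper strategy are assumptions. compress is any closure from (input, effort) to compressed bytes, so the example runs without zenflate.

```rust
/// Hypothetical stand-in: highest effort of the previous strategy family,
/// or None when there is no cheaper strategy worth comparing against.
fn previous_strategy_max(effort: u32) -> Option<u32> {
    match effort.min(30) {
        0..=4 => None,
        5..=9 => Some(4),
        10 => Some(9),
        11..=17 => Some(10),
        18..=22 => Some(17),
        _ => Some(22),
    }
}

/// Compresses at `effort` and, if a fallback exists, at the fallback effort too,
/// returning whichever effort produced the smaller output along with that output.
fn compress_monotonic<F>(data: &[u8], effort: u32, compress: F) -> (u32, Vec<u8>)
where
    F: Fn(&[u8], u32) -> Vec<u8>,
{
    let primary = compress(data, effort);
    match previous_strategy_max(effort) {
        Some(fallback) => {
            let alt = compress(data, fallback);
            if alt.len() < primary.len() {
                (fallback, alt)
            } else {
                (effort, primary)
            }
        }
        None => (effort, primary),
    }
}
```

The cost is a second compression pass, so this is mainly worthwhile when output size matters more than compression time.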
Reuse Compressor and Decompressor across calls to avoid re-initialization.
Recommended effort levels
Benchmarked on real images (10 screenshots, 10 photos) from the codec-corpus. Ratio = compressed / raw size (lower is better). Speed = compression throughput.
| Effort | Preset | Strategy | Screenshots (ratio, speed) | Photos (ratio, speed) | Note |
|---|---|---|---|---|---|
| 1 | fastest() | Turbo | 6.2%, 2360 MiB/s | 73.4%, 225 MiB/s | Max throughput |
| 9 | n/a | FastHt | 5.9%, 2175 MiB/s | 73.0%, 164 MiB/s | Best cheap compression |
| 10 | fast() | Greedy | 5.3%, 630 MiB/s | 70.7%, 118 MiB/s | Hash chains; big ratio jump |
| 15 | balanced() | Lazy | 5.1%, 466 MiB/s | 69.7%, 90 MiB/s | Good default |
| 22 | high() | Lazy2 | 4.9%, 197 MiB/s | 69.8%, 72 MiB/s | Best before near-optimal |
| 30 | best() | NearOptimal | 4.4%, 11 MiB/s | 67.4%, 19 MiB/s | Maximum compression |
For most uses, balanced() (effort 15) is a good default. Use fast() (effort 10)
when speed matters more than the last few percent of compression.
Parallel gzip compression
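```rust
use zenflate::{CompressionLevel, Compressor, Unstoppable};

let num_threads = 4;
let mut compressor = Compressor::new(CompressionLevel::balanced());
// A few extra bytes per chunk on top of the single-stream bound
let bound = Compressor::gzip_compress_bound(data.len()) + num_threads * 5;
let mut compressed = vec![0u8; bound];
let size = compressor
    .gzip_compress_parallel(data, &mut compressed, num_threads, Unstoppable)
    .unwrap();
```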
This splits the input into chunks, gives each chunk the preceding 32 KiB as a dictionary, compresses the chunks in parallel, and concatenates the results into a single valid gzip stream. Scaling is near-linear: 3.3x with 4 threads at effort 1 and effort 15.
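A minimal sketch of that chunk layout, assuming equal-sized chunks where each worker compresses its own byte range and uses up to the preceding 32 KiB (DEFLATE's maximum match distance) only as a dictionary that is never emitted. ChunkPlan and plan_chunks are hypothetical names; zenflate's actual chunk sizing may differ.

```rust
use std::ops::Range;

const DICT_SIZE: usize = 32 * 1024; // DEFLATE's maximum match distance

/// One unit of parallel work.
#[derive(Debug, PartialEq)]
struct ChunkPlan {
    /// Bytes this worker compresses and emits.
    data: Range<usize>,
    /// Preceding bytes used only as a dictionary.
    dict: Range<usize>,
}

/// Splits `len` input bytes into about `num_threads` equal chunks.
fn plan_chunks(len: usize, num_threads: usize) -> Vec<ChunkPlan> {
    let chunk = len.div_ceil(num_threads.max(1)).max(1);
    (0..len)
        .step_by(chunk)
        .map(|start| ChunkPlan {
            data: start..(start + chunk).min(len),
            dict: start.saturating_sub(DICT_SIZE)..start,
        })
        .collect()
}
```

For 100,000 bytes on 4 threads this yields four 25,000-byte chunks; the third compresses bytes 50,000..75,000 with bytes 17,232..50,000 as its dictionary, so matches can still reach back across the chunk boundary.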
Cancellation
All compression and whole-buffer decompression methods accept a stop parameter
implementing the Stop trait. Pass Unstoppable to disable cancellation, or
implement Stop to check a flag periodically:
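```rust
use zenflate::{Stop, Unstoppable};

// Unstoppable: never cancels
compressor.deflate_compress(data, &mut compressed, Unstoppable)?;

// Custom cancellation: pass your own Stop implementation in place of Unstoppable
```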
Streaming decompression doesn't take a Stop parameter — the caller controls
the loop and can stop between fill() calls.
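The flag-checking pattern can be sketched without zenflate. Because the Stop trait's methods are not spelled out in this section, the example defines its own hypothetical CancelCheck trait rather than guessing Stop's signature; the idea carries over: share an AtomicBool, set it from another thread, and poll it between units of work so cancellation is prompt without a check on every byte.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a cancellation trait (not zenflate's Stop).
trait CancelCheck {
    fn should_stop(&self) -> bool;
}

/// Never cancels, similar in spirit to Unstoppable.
struct Never;
impl CancelCheck for Never {
    fn should_stop(&self) -> bool {
        false
    }
}

/// Cancels once the shared flag has been set, e.g. from another thread.
struct FlagStop(Arc<AtomicBool>);
impl CancelCheck for FlagStop {
    fn should_stop(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

#[derive(Debug, PartialEq)]
struct Cancelled;

/// Simulated long-running job that polls for cancellation every 4 KiB.
fn process(data: &[u8], stop: &impl CancelCheck) -> Result<u64, Cancelled> {
    let mut sum = 0u64;
    for block in data.chunks(4096) {
        if stop.should_stop() {
            return Err(Cancelled);
        }
        sum += block.iter().map(|&b| u64::from(b)).sum::<u64>();
    }
    Ok(sum)
}
```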
Features
| Feature | Default | Effect |
|---|---|---|
| std | yes | std::error::Error impls, BufReadSource, parallel gzip |
| alloc | yes (via std) | Compression, streaming decompression |
| avx512 | yes | AVX-512 SIMD for checksums on supported CPUs |
| unchecked | no | Elide bounds checks in hot paths (+10-25% compression speed) |
Whole-buffer decompression works in no_std without alloc; all of its state is stack-allocated.
Performance
Benchmarked on x86_64 with AVX-512 (Intel), --features unchecked.
Compression (3 MiB photo bitmap, reproducible via examples/ratio_bench.rs):
| Library | Level | Ratio | Speed | vs C |
|---|---|---|---|---|
| zenflate | effort 1 (fastest) | 91.69% | 149 MiB/s | 0.81x |
| zenflate | effort 15 (balanced) | 92.31% | 105 MiB/s | 0.88x |
| zenflate | effort 22 (high) | 92.31% | 104 MiB/s | 0.87x |
| zenflate | effort 30 (best) | 91.80% | 39 MiB/s | 0.89x |
| libdeflate (C) | L1 | 91.69% | 185 MiB/s | — |
| libdeflate (C) | L9 | 92.31% | 119 MiB/s | — |
| libdeflate (C) | L12 | 91.80% | 44 MiB/s | — |
| flate2 | L1 | 91.70% | 291 MiB/s | — |
| flate2 | L9 (best) | 91.58% | 55 MiB/s | — |
zenflate and libdeflate produce byte-identical output at every libdeflate level (0-12)
when zenflate is configured with CompressionLevel::libdeflate(n).
Decompression (compressed at L6):
| Data type | zenflate | libdeflate (C) | flate2 | miniz_oxide |
|---|---|---|---|---|
| Sequential | 27.7 GiB/s | 31.6 GiB/s | 7.2 GiB/s | 6.6 GiB/s |
| Zeros | 34.6 GiB/s | 14.5 GiB/s | 26.6 GiB/s | 17.2 GiB/s |
| Mixed | 717 MiB/s | 795 MiB/s | 585 MiB/s | 571 MiB/s |
Checksums:
| Algorithm | zenflate | libdeflate (C) | Implementation |
|---|---|---|---|
| Adler-32 | 114 GiB/s | 121 GiB/s | AVX-512 VNNI (x86), NEON (aarch64), WASM simd128 |
| CRC-32 | 78 GiB/s | 77 GiB/s | PCLMULQDQ (x86), PMULL (aarch64) |
Parallel gzip (4 MB mixed data):
| Level | 1 thread | 4 threads | Speedup |
|---|---|---|---|
| effort 1 | 161 MiB/s | 534 MiB/s | 3.3x |
| effort 15 | 133 MiB/s | 440 MiB/s | 3.3x |
| effort 30 | 46 MiB/s | 135 MiB/s | 2.9x |
How it works
A line-by-line port of Eric Biggers' libdeflate to safe Rust (#![forbid(unsafe_code)] by default). Same matchfinders (hash table, hash chains, binary trees), same Huffman construction, same block splitting heuristics, same near-optimal parser.
zenflate extends libdeflate with:
- Effort-based compression (0-30) with additional strategies (turbo, fast HT) and finer-grained parameter tuning between libdeflate's 13 fixed levels.
- Parallel gzip compression using pigz-style chunking with 32KB dictionary overlap and combined CRC-32.
- Streaming decompression via a pull-based API that works in no_std + alloc.
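The combined CRC-32 works because CRC-32 is affine over GF(2): the CRC of A followed by B depends only on crc(A), crc(B) and the length of B. A minimal bitwise sketch of that identity for the reflected CRC-32 used by gzip (polynomial 0xEDB88320). It shifts crc(A) through len(B) zero bytes in O(len(B)) time, whereas zlib's crc32_combine reaches the same result in O(log len(B)). The function names are illustrative, and this is not zenflate's implementation.

```rust
const POLY: u32 = 0xEDB8_8320; // reflected CRC-32 polynomial (gzip, zlib, PNG)

/// One bit-step of the raw CRC register.
fn clock(reg: u32) -> u32 {
    (reg >> 1) ^ (POLY & (reg & 1).wrapping_neg())
}

/// Standard CRC-32 (init and final XOR 0xFFFFFFFF), bit-at-a-time.
fn crc32(data: &[u8]) -> u32 {
    let mut reg = !0u32;
    for &byte in data {
        reg ^= u32::from(byte);
        for _ in 0..8 {
            reg = clock(reg);
        }
    }
    !reg
}

/// CRC-32 of A followed by B, from crc(A), crc(B) and len(B) only.
/// The register update is linear, so the init/final XOR terms cancel and
/// crc(A) just needs to be advanced through len(B) zero bytes.
fn crc32_combine(crc_a: u32, crc_b: u32, len_b: usize) -> u32 {
    let mut reg = crc_a;
    for _ in 0..len_b * 8 {
        reg = clock(reg);
    }
    reg ^ crc_b
}
```

The gzip trailer's other field, ISIZE, combines by plain addition modulo 2^32.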
The C original is faster — zenflate runs at roughly 0.8-0.9x the speed of libdeflate depending on level and data. The gap comes from register pressure differences and bounds checking. The unchecked feature closes some of this gap.
SIMD acceleration for checksums (AVX2/AVX-512/PCLMULQDQ on x86, NEON/PMULL on aarch64, simd128 on WASM). Runtime feature detection via archmage with zero unsafe.
License
MIT