zenflate
Pure Rust DEFLATE/zlib/gzip compression and decompression, ported from libdeflate.
Buffer-to-buffer only (no streaming). Supports compression levels 0-12. no_std compatible: compression requires alloc, decompression does not.
Usage
[dependencies]
zenflate = "0.1"
Compress
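use zenflate::{Compressor, CompressionLevel};
let data = b"Hello, World! Hello, World! Hello, World!";
let mut compressor = Compressor::new(CompressionLevel::DEFAULT);
// Worst-case output size for this input length.
let bound = Compressor::deflate_compress_bound(data.len());
let mut compressed = vec![0u8; bound];
let compressed_len = compressor
    .deflate_compress(data, &mut compressed)
    .unwrap();
// Trim the buffer to the bytes actually written.
let compressed = &compressed[..compressed_len];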
Decompress
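use zenflate::Decompressor;
let mut decompressor = Decompressor::new();
// The output buffer must be large enough for the decompressed data.
let mut output = vec![0u8; data.len()];
let decompressed_len = decompressor
    .deflate_decompress(compressed, &mut output)
    .unwrap();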
Formats
All three DEFLATE-based formats are supported:
// Raw DEFLATE
compressor.deflate_compress(data, &mut out)?;
decompressor.deflate_decompress(compressed, &mut out)?;
// zlib (2-byte header + DEFLATE + Adler-32)
compressor.zlib_compress(data, &mut out)?;
decompressor.zlib_decompress(compressed, &mut out)?;
// gzip (10-byte header + DEFLATE + CRC-32)
compressor.gzip_compress(data, &mut out)?;
decompressor.gzip_decompress(compressed, &mut out)?;
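To make the zlib layout above concrete, here is a minimal standalone sketch that wraps data in a zlib container using uncompressed ("stored") DEFLATE blocks, following RFC 1950 and RFC 1951. It does not use zenflate; the function names `adler32` and `zlib_store` are hypothetical and chosen only for illustration, and zenflate's own level 0 output may differ in detail.

```rust
/// Adler-32 as defined in RFC 1950 (simple scalar version, no SIMD).
fn adler32(data: &[u8]) -> u32 {
    const MOD: u32 = 65_521; // largest prime below 2^16
    let (mut a, mut b) = (1u32, 0u32);
    for &byte in data {
        a = (a + u32::from(byte)) % MOD;
        b = (b + a) % MOD;
    }
    (b << 16) | a
}

/// Wraps `data` in a zlib container made of stored (uncompressed) DEFLATE blocks.
/// Illustrative helper only, not part of any library API.
fn zlib_store(data: &[u8]) -> Vec<u8> {
    const MAX_STORED: usize = 65_535; // a stored block's length field is 16 bits

    // 2-byte header: CMF = 0x78 (deflate, 32 KiB window), FLG = 0x01.
    // The pair is valid because 0x7801 is a multiple of 31.
    let mut out = vec![0x78, 0x01];

    let mut chunks = data.chunks(MAX_STORED).peekable();
    if chunks.peek().is_none() {
        // Empty input still needs one final, zero-length stored block.
        out.extend_from_slice(&[0x01, 0x00, 0x00, 0xFF, 0xFF]);
    }
    while let Some(chunk) = chunks.next() {
        let is_final = chunks.peek().is_none();
        let len = chunk.len() as u16; // cannot truncate: chunk.len() <= 65_535
        out.push(u8::from(is_final)); // BFINAL bit, BTYPE = 00 (stored)
        out.extend_from_slice(&len.to_le_bytes()); // LEN
        out.extend_from_slice(&(!len).to_le_bytes()); // NLEN = one's complement of LEN
        out.extend_from_slice(chunk);
    }

    // 4-byte trailer: Adler-32 of the uncompressed data, big-endian.
    out.extend_from_slice(&adler32(data).to_be_bytes());
    out
}
```

Calling `zlib_store(b"abc")` yields 14 bytes: the 2-byte header, a 5-byte stored-block header, the 3 payload bytes, and the 4-byte checksum. The gzip container follows the same pattern with a 10-byte header and a CRC-32 in its trailer.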
Compression levels
| Level | Strategy | Speed vs ratio |
|---|---|---|
| 0 | Uncompressed | No compression, just framing |
| 1 | Fastest (hash table) | Best throughput |
| 2-3 | Greedy | |
| 4-6 | Lazy | Good balance (6 is default) |
| 7-9 | Lazy2 (double lazy eval) | Better ratio |
| 10-12 | Near-optimal parsing | Best ratio, slowest |
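The difference between the Greedy and Lazy strategies in the table can be illustrated with a toy LZ77 parser. This is a sketch under simplifying assumptions, not zenflate's or libdeflate's code: it uses a brute-force match search instead of hash chains, ignores DEFLATE's 258-byte length and 32 KiB distance limits, and only looks one position ahead. All names (`Token`, `longest_match`, `parse`, `decode`) are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Literal(u8),
    Match { len: usize, dist: usize },
}

const MIN_MATCH: usize = 3; // DEFLATE's minimum match length

/// Brute-force search for the longest earlier match at `pos`; returns (len, dist).
fn longest_match(data: &[u8], pos: usize) -> (usize, usize) {
    let mut best = (0, 0);
    for start in 0..pos {
        let mut len = 0;
        // Overlapping matches (start + len >= pos) are allowed, as in DEFLATE.
        while pos + len < data.len() && data[start + len] == data[pos + len] {
            len += 1;
        }
        if len > best.0 {
            best = (len, pos - start);
        }
    }
    best
}

/// Tokenizes `data`. With `lazy = false` every usable match is taken at once;
/// with `lazy = true` a match is deferred when the next position has a longer one.
fn parse(data: &[u8], lazy: bool) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < data.len() {
        let (len, dist) = longest_match(data, pos);
        let defer = lazy && len >= MIN_MATCH && longest_match(data, pos + 1).0 > len;
        if len < MIN_MATCH || defer {
            tokens.push(Token::Literal(data[pos]));
            pos += 1;
        } else {
            tokens.push(Token::Match { len, dist });
            pos += len;
        }
    }
    tokens
}

/// Expands tokens back into bytes, to check that a parse is lossless.
fn decode(tokens: &[Token]) -> Vec<u8> {
    let mut out = Vec::new();
    for &token in tokens {
        match token {
            Token::Literal(byte) => out.push(byte),
            Token::Match { len, dist } => {
                for _ in 0..len {
                    let byte = out[out.len() - dist];
                    out.push(byte);
                }
            }
        }
    }
    out
}
```

On the input `b"abcbcdeabcde"`, the greedy parse takes the 3-byte match `abc` at offset 7 and is left with two trailing literals, for 10 tokens. The lazy parse emits `a` as a literal and then takes the 4-byte match `bcde`, for 9 tokens. Fewer tokens usually means smaller output, which is what the higher levels pay extra search time for.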
use zenflate::CompressionLevel;
CompressionLevel::NONE     // 0
CompressionLevel::FASTEST  // 1
CompressionLevel::DEFAULT  // 6
CompressionLevel::BEST     // 12
Reuse Compressor and Decompressor across calls to avoid re-initialization.
Features
- std (default): enables std::error::Error impls
- alloc (included by std): enables compression (requires heap allocation for matchfinder tables)
Decompression works in no_std without alloc; all state is stack-allocated.
Performance
Benchmarked on x86_64 with AVX-512 (Intel), --features unchecked.
Run cargo bench --features unchecked to reproduce.
Compression (3 MiB photo bitmap, reproducible via examples/ratio_bench.rs):
| Library | Level | Ratio | Safe | Unchecked | vs C |
|---|---|---|---|---|---|
| zenflate | 1 (fastest) | 91.69% | 134 MiB/s | 149 MiB/s | 0.81x |
| zenflate | 6 (lazy) | 92.31% | 102 MiB/s | 105 MiB/s | 0.88x |
| zenflate | 9 (lazy2) | 92.31% | 102 MiB/s | 104 MiB/s | 0.87x |
| zenflate | 10 (near-opt) | 91.97% | 38 MiB/s | 47 MiB/s | 0.87x |
| zenflate | 12 (best) | 91.80% | 33 MiB/s | 39 MiB/s | 0.89x |
| libdeflate (C) | 1 | 91.69% | — | 185 MiB/s | — |
| libdeflate (C) | 9 | 92.31% | — | 119 MiB/s | — |
| libdeflate (C) | 12 | 91.80% | — | 44 MiB/s | — |
| flate2 | 1 | 91.70% | — | 291 MiB/s | — |
| flate2 | 9 (best) | 91.58% | — | 55 MiB/s | — |
| miniz_oxide | 9 (best) | 91.58% | — | 55 MiB/s | — |
zenflate and libdeflate produce byte-identical output at every level.
zenflate L6-9 runs ~2x faster than flate2/miniz_oxide at comparable ratios.
The unchecked feature helps most at L10-12 (+18-24%), less at L1-9 (+2-11%).
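The "vs C" column and the unchecked gains quoted above are plain ratios of the throughput figures in the table. A small sketch of the arithmetic, assuming "vs C" divides zenflate's unchecked throughput by libdeflate's at the same level (the helper names are hypothetical):

```rust
/// Throughput of `ours` relative to `baseline`; 1.0 means parity.
fn relative_speed(ours_mib_s: f64, baseline_mib_s: f64) -> f64 {
    ours_mib_s / baseline_mib_s
}

/// Percentage gain of `after` over `before`.
fn percent_gain(before_mib_s: f64, after_mib_s: f64) -> f64 {
    (after_mib_s / before_mib_s - 1.0) * 100.0
}
```

For level 1, `relative_speed(149.0, 185.0)` is about 0.81, matching the table. For level 10, `percent_gain(38.0, 47.0)` is about 24%, and for level 12, `percent_gain(33.0, 39.0)` is about 18%, which gives the +18-24% range.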
Decompression (compressed at L6):
| Data type | zenflate | libdeflate (C) | flate2 | miniz_oxide |
|---|---|---|---|---|
| Sequential | 27.7 GiB/s | 31.6 GiB/s | 7.2 GiB/s | 6.6 GiB/s |
| Zeros | 34.6 GiB/s | 14.5 GiB/s | 26.6 GiB/s | 17.2 GiB/s |
| Mixed | 717 MiB/s | 795 MiB/s | 585 MiB/s | 571 MiB/s |
zenflate decompression is about 4x faster than flate2/miniz_oxide on sequential data and about 1.2x faster on mixed data.
Zeros decompression is 2.4x faster than C (Rust's fill() auto-vectorizes).
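Why a fill helps on the zeros benchmark can be sketched as follows. A long run of identical bytes decodes as an LZ77 match with distance 1, and that case reduces to filling a slice with one byte. This is an illustrative sketch, not zenflate's decoder; `copy_match` is a hypothetical name, and how zenflate actually dispatches to `fill()` is an assumption here.

```rust
/// Appends `len` bytes copied from `dist` bytes back in `out` (an LZ77 match).
fn copy_match(out: &mut Vec<u8>, dist: usize, len: usize) {
    assert!(dist >= 1 && dist <= out.len(), "distance reaches before start of output");
    if dist == 1 {
        // Run of one repeated byte: extend the buffer, then fill the new tail.
        // A plain fill over a slice is a pattern compilers vectorize well.
        let byte = out[out.len() - 1];
        let start = out.len();
        out.resize(start + len, 0);
        out[start..].fill(byte);
    } else {
        // General case: copy byte by byte so that overlapping matches
        // (dist < len) repeat the pattern correctly.
        for _ in 0..len {
            let byte = out[out.len() - dist];
            out.push(byte);
        }
    }
}
```

Starting from a single zero byte, `copy_match(&mut out, 1, 7)` produces eight zeros through the fill path, while `copy_match` with distance 2 on `b"ab"` and length 5 produces `b"abababa"` through the general path.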
Checksums:
| Algorithm | zenflate | libdeflate (C) | Implementation |
|---|---|---|---|
| Adler-32 | 105 GiB/s | 120 GiB/s | AVX2 (x86), NEON (aarch64) |
| CRC-32 | 78 GiB/s | 77 GiB/s | PCLMULQDQ (x86), PMULL (aarch64) |
Parallel compression (4 MB mixed data, gzip):
| Level | 1 thread | 4 threads | Speedup |
|---|---|---|---|
| L1 | 161 MiB/s | 534 MiB/s | 3.3x |
| L6 | 133 MiB/s | 440 MiB/s | 3.3x |
| L12 | 46 MiB/s | 135 MiB/s | 2.9x |
How it works
This is a line-by-line port of Eric Biggers' libdeflate to safe Rust (#![forbid(unsafe_code)] by default). The algorithms are identical: same matchfinders (hash table, hash chains, binary trees), same Huffman construction, same block splitting heuristics, same near-optimal parser. zenflate produces byte-identical output to libdeflate at every compression level.
The C original is faster — zenflate runs at roughly 0.8-0.9x the speed of libdeflate depending on compression level and data (see benchmarks above). The gap comes from Rust's fat pointers, bounds checking, and register pressure differences. The unchecked feature closes some of this gap by eliding bounds checks in hot paths.
Parallel gzip compression is a zenflate addition — libdeflate is single-threaded. zenflate uses pigz-style chunking with dictionary overlap and combined CRC-32 for near-linear scaling.
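The two ingredients named above, dictionary overlap and a combined CRC-32, can be sketched with the standard library alone. This is not zenflate's implementation: the names are hypothetical, one thread per chunk is a simplification, the workers only compute checksums rather than compress, and `crc32_combine` runs in time linear in the chunk length where production code uses a logarithmic-time method.

```rust
use std::ops::Range;

const DICT_SIZE: usize = 32 * 1024; // DEFLATE window size

/// One unit of work: compress `body`, priming the matchfinder with `dict`.
#[derive(Debug, PartialEq)]
struct Chunk {
    dict: Range<usize>,
    body: Range<usize>,
}

/// Splits `input_len` bytes into chunks; each chunk's dictionary is the
/// (up to) 32 KiB of input immediately before its body.
fn plan_chunks(input_len: usize, chunk_size: usize) -> Vec<Chunk> {
    assert!(chunk_size > 0);
    (0..input_len)
        .step_by(chunk_size)
        .map(|start| Chunk {
            dict: start.saturating_sub(DICT_SIZE)..start,
            body: start..(start + chunk_size).min(input_len),
        })
        .collect()
}

/// Advances the raw CRC register by one bit (reflected polynomial 0xEDB88320).
fn shift(state: u32) -> u32 {
    if state & 1 != 0 { (state >> 1) ^ 0xEDB8_8320 } else { state >> 1 }
}

/// Bitwise CRC-32 as used by gzip.
fn crc32(data: &[u8]) -> u32 {
    let mut state = !0u32;
    for &byte in data {
        state ^= u32::from(byte);
        for _ in 0..8 {
            state = shift(state);
        }
    }
    !state
}

/// CRC-32 of A followed by B, given crc32(A), crc32(B), and B's length.
/// Equivalent to feeding len_b zero bytes through the raw register, then XORing.
fn crc32_combine(crc_a: u32, crc_b: u32, len_b: usize) -> u32 {
    let mut state = crc_a;
    for _ in 0..len_b * 8 {
        state = shift(state);
    }
    state ^ crc_b
}

/// Computes per-chunk CRCs on separate threads and merges them in order.
fn parallel_crc32(data: &[u8], chunk_size: usize) -> u32 {
    let chunks = plan_chunks(data.len(), chunk_size);
    let parts: Vec<(u32, usize)> = std::thread::scope(|scope| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| {
                let body = &data[chunk.body.clone()];
                scope.spawn(move || (crc32(body), body.len()))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // The CRC of empty input is 0, so 0 is the identity for combining.
    parts.into_iter().fold(0, |acc, (crc, len)| crc32_combine(acc, crc, len))
}
```

Because each chunk carries its own dictionary range, workers never wait on each other's output, and the gzip trailer's CRC-32 can be assembled from the per-chunk values without a second pass over the data.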
Checksums (AVX2/PCLMULQDQ on x86, NEON/PMULL on aarch64) and decompression are SIMD-accelerated. Runtime feature detection goes through archmage, with zero unsafe.
License
MIT