zenflate

Pure Rust DEFLATE/zlib/gzip compression and decompression.

no_std compatible: compression and streaming decompression require alloc; whole-buffer decompression is fully stack-allocated.

Usage

[dependencies]
zenflate = "0.3"

Compress

use zenflate::{Compressor, CompressionLevel, Unstoppable};

let data = b"Hello, World! Hello, World! Hello, World!";
let mut compressor = Compressor::new(CompressionLevel::balanced());

let bound = Compressor::deflate_compress_bound(data.len());
let mut compressed = vec![0u8; bound];
let compressed_len = compressor
    .deflate_compress(data, &mut compressed, Unstoppable)
    .unwrap();
let compressed = &compressed[..compressed_len];

Decompress

use zenflate::{Decompressor, Unstoppable};

let mut decompressor = Decompressor::new();
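// `compressed` is the slice produced in the Compress example;
// `original_len` is the known size of the uncompressed data (here, data.len()).
let mut output = vec![0u8; original_len];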
let result = decompressor
    .deflate_decompress(compressed, &mut output, Unstoppable)
    .unwrap();
// result.input_consumed — bytes of compressed data consumed
// result.output_written — bytes of decompressed data produced

Streaming decompression

Use streaming decompression for inputs that don't fit in memory or that arrive incrementally. It works with &[u8] (zero overhead) or any std::io::BufRead via BufReadSource.

use zenflate::{StreamDecompressor, InputSource};

// From a slice (no_std compatible):
let mut stream = StreamDecompressor::new_deflate(compressed_data);
loop {
    let chunk = stream.fill()?;
    if chunk.is_empty() { break; }
    // process chunk...
    let n = chunk.len();
    stream.advance(n);
}

// From a BufRead (std only):
use zenflate::BufReadSource;
let file = std::io::BufReader::new(std::fs::File::open("data.gz").unwrap());
let mut stream = StreamDecompressor::new_gzip(BufReadSource::new(file));
// stream also implements Read + BufRead
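Since the stream is described as implementing Read + BufRead, the same pull loop can also be written against the standard BufRead trait. The sketch below uses only std; drain_buf_read is a hypothetical helper name, and it is exercised with a plain byte slice (which implements BufRead) rather than a real zenflate stream.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: pulls every buffered chunk out of `reader`,
/// hands it to `on_chunk`, and returns the total number of bytes seen.
pub fn drain_buf_read<R: BufRead>(
    mut reader: R,
    mut on_chunk: impl FnMut(&[u8]),
) -> io::Result<u64> {
    let mut total = 0u64;
    loop {
        // fill_buf() exposes the bytes that are currently buffered.
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            break; // end of stream
        }
        on_chunk(chunk);
        let n = chunk.len();
        total += n as u64;
        // consume() marks those bytes as processed so the next
        // fill_buf() call returns fresh data.
        reader.consume(n);
    }
    Ok(total)
}
```

The fill_buf/consume pair plays the same role as fill/advance in the slice example: the caller decides when to pull and how much to acknowledge.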

Formats

All three DEFLATE-based formats are supported:

// Raw DEFLATE
compressor.deflate_compress(data, &mut out, Unstoppable)?;
decompressor.deflate_decompress(compressed, &mut out, Unstoppable)?;

// zlib (2-byte header + DEFLATE + Adler-32)
compressor.zlib_compress(data, &mut out, Unstoppable)?;
decompressor.zlib_decompress(compressed, &mut out, Unstoppable)?;

// gzip (10-byte header + DEFLATE + CRC-32)
compressor.gzip_compress(data, &mut out, Unstoppable)?;
decompressor.gzip_decompress(compressed, &mut out, Unstoppable)?;
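When the container of an incoming buffer is unknown, the header bytes usually give it away. The following is a minimal sketch based on the gzip (RFC 1952) and zlib (RFC 1950) header rules; Container and sniff_container are hypothetical names, not part of zenflate, and raw DEFLATE can only be a fallback guess because it has no header.

```rust
/// Container guessed from the leading bytes of a compressed buffer.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Container {
    Gzip,
    Zlib,
    RawDeflate,
}

/// Heuristic: inspects at most the first three bytes.
pub fn sniff_container(bytes: &[u8]) -> Container {
    match bytes {
        // gzip: magic bytes 0x1f 0x8b followed by compression method 8.
        [0x1f, 0x8b, 0x08, ..] => Container::Gzip,
        // zlib: the low nibble of CMF is 8 (deflate) and the 16-bit
        // big-endian value CMF*256 + FLG is a multiple of 31.
        [cmf, flg, ..]
            if (*cmf & 0x0f) == 8
                && ((u16::from(*cmf) << 8) | u16::from(*flg)) % 31 == 0 =>
        {
            Container::Zlib
        }
        // Anything else is treated as headerless raw DEFLATE.
        _ => Container::RawDeflate,
    }
}
```

A raw DEFLATE stream can by chance start with bytes that pass the zlib check, so treat the result as a hint and still handle decompression errors.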

Compression levels

Pick a preset or dial in a specific effort from 0 to 30:

use zenflate::CompressionLevel;

// Named presets
CompressionLevel::none()      // effort 0  — store (no compression)
CompressionLevel::fastest()   // effort 1  — turbo hash table
CompressionLevel::fast()      // effort 10 — greedy hash chains
CompressionLevel::balanced()  // effort 15 — lazy matching (default)
CompressionLevel::high()      // effort 22 — double-lazy matching
CompressionLevel::best()      // effort 30 — near-optimal parsing

// Fine-grained control (0-30, clamped)
CompressionLevel::new(12)     // lazy matching, mid-range
CompressionLevel::new(25)     // near-optimal, fast end

// Byte-identical C libdeflate compatibility (0-12)
CompressionLevel::libdeflate(6)

Preset	Effort	Strategy	Description
`none()`	0	Store	Framing only, no compression
`fastest()`	1	Turbo	Maximum throughput
`fast()`	10	Greedy	Hash chains — big ratio jump over turbo
`balanced()`	15	Lazy	Lazy matching — good default
`high()`	22	Lazy2	Double-lazy — best before near-optimal
`best()`	30	Near-optimal	Best compression ratio
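To make "framing only, no compression" concrete, here is a small sketch that wraps data in a zlib stream using a single stored DEFLATE block, following RFC 1950 and RFC 1951. It is an illustration of the format, not zenflate's code, and the bytes zenflate emits at effort 0 may differ (for example in the header flags or in how large inputs are split). The names adler32 and zlib_store are hypothetical.

```rust
/// Adler-32 checksum as defined by RFC 1950 (simple scalar version).
pub fn adler32(data: &[u8]) -> u32 {
    const MOD: u32 = 65_521;
    let (mut a, mut b) = (1u32, 0u32);
    for &byte in data {
        a = (a + u32::from(byte)) % MOD;
        b = (b + a) % MOD;
    }
    (b << 16) | a
}

/// Wraps `data` in a zlib stream with one stored block.
/// Returns None when the input exceeds the 65,535-byte stored-block limit.
pub fn zlib_store(data: &[u8]) -> Option<Vec<u8>> {
    if data.len() > 0xFFFF {
        return None;
    }
    let len = data.len() as u16;
    let mut out = Vec::with_capacity(data.len() + 11);
    out.extend_from_slice(&[0x78, 0x01]); // CMF, FLG (zlib header)
    out.push(0x01); // BFINAL = 1, BTYPE = 00 (stored)
    out.extend_from_slice(&len.to_le_bytes()); // LEN
    out.extend_from_slice(&(!len).to_le_bytes()); // NLEN (one's complement)
    out.extend_from_slice(data); // payload, copied verbatim
    out.extend_from_slice(&adler32(data).to_be_bytes()); // trailer
    Some(out)
}
```

The output is always 11 bytes larger than the input: 2 header bytes, 5 block-header bytes, and a 4-byte checksum.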

Effort levels 0-30 map to six strategies, plus Store at effort 0:

Effort	Strategy	Notes
0	Store	No compression
1-4	Turbo	Single-entry hash table, fastest
5-9	FastHt	2-entry hash table, increasing match length
10	Greedy	Hash chains with greedy matching
11-17	Lazy	Hash chains with lazy matching
18-22	Lazy2	Double-lazy matching
23-30	Near-optimal	Near-optimal parsing via binary trees
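The table above can be read as a simple lookup. The sketch below restates it in code; Strategy and strategy_for_effort are hypothetical names for illustration, and the clamp to 30 mirrors the clamping described for CompressionLevel::new rather than zenflate's actual internals.

```rust
/// Strategy names taken from the table above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Strategy {
    Store,
    Turbo,
    FastHt,
    Greedy,
    Lazy,
    Lazy2,
    NearOptimal,
}

/// Maps an effort value to its strategy; values above 30 are clamped.
pub fn strategy_for_effort(effort: u32) -> Strategy {
    match effort.min(30) {
        0 => Strategy::Store,
        1..=4 => Strategy::Turbo,
        5..=9 => Strategy::FastHt,
        10 => Strategy::Greedy,
        11..=17 => Strategy::Lazy,
        18..=22 => Strategy::Lazy2,
        _ => Strategy::NearOptimal, // 23..=30
    }
}
```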

Higher effort within a strategy increases search depth and match quality. Strategy transitions (for example effort 9 to 10, or 10 to 11) can occasionally produce slightly larger output on specific inputs because the algorithms differ. Use CompressionLevel::monotonicity_fallback() to detect and handle these transitions: it returns the previous strategy's max effort so you can compare both and pick the smaller result.
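The compare-and-pick step can be sketched generically. In the code below the compressor is passed in as a closure and the fallback effort as an Option<u32>; both are assumptions made so the example is self-contained, since the exact signature of monotonicity_fallback() is not shown here. compress_smallest is a hypothetical helper name.

```rust
/// Runs `compress` at `effort` and, if a fallback effort is given, at that
/// effort too. Returns the effort that won together with its output.
pub fn compress_smallest<F>(
    effort: u32,
    fallback: Option<u32>,
    mut compress: F,
) -> (u32, Vec<u8>)
where
    F: FnMut(u32) -> Vec<u8>,
{
    let primary = compress(effort);
    match fallback {
        Some(fb) => {
            let alt = compress(fb);
            // Keep the primary result on ties; switch only if strictly smaller.
            if alt.len() < primary.len() {
                (fb, alt)
            } else {
                (effort, primary)
            }
        }
        None => (effort, primary),
    }
}
```

This doubles the compression cost at transition efforts, so it is only worth doing when output size matters more than time.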

Reuse Compressor and Decompressor across calls to avoid re-initialization.

Recommended effort levels

Benchmarked on real images (10 screenshots, 10 photos) from the codec-corpus. Ratio = compressed / raw size (lower is better). Speed = compression throughput.

Effort	Preset	Strategy	Screenshots	Photos	Note
1	`fastest()`	Turbo	6.2%, 2360 MiB/s	73.4%, 225 MiB/s	Max throughput
9	—	FastHt	5.9%, 2175 MiB/s	73.0%, 164 MiB/s	Best cheap compression
10	`fast()`	Greedy	5.3%, 630 MiB/s	70.7%, 118 MiB/s	Hash chains — big ratio jump
15	`balanced()`	Lazy	5.1%, 466 MiB/s	69.7%, 90 MiB/s	Good default
22	`high()`	Lazy2	4.9%, 197 MiB/s	69.8%, 72 MiB/s	Best before near-optimal
30	`best()`	NearOptimal	4.4%, 11 MiB/s	67.4%, 19 MiB/s	Maximum compression

For most uses, balanced() (effort 15) is a good default. Use fast() (effort 10) when speed matters more than the last few percent of compression.
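For reading the table, the two metrics reduce to simple arithmetic. The sketch below assumes that throughput is measured in uncompressed input bytes per second, which is the usual convention but is not stated explicitly above; both function names are hypothetical.

```rust
/// Ratio as used in the table: compressed size as a percentage of raw size.
pub fn ratio_percent(compressed_bytes: u64, raw_bytes: u64) -> f64 {
    compressed_bytes as f64 / raw_bytes as f64 * 100.0
}

/// Throughput in MiB/s, assuming it is measured on the raw input bytes.
pub fn throughput_mib_s(raw_bytes: u64, seconds: f64) -> f64 {
    raw_bytes as f64 / (1024.0 * 1024.0) / seconds
}
```

For example, a ratio of 6.2% means 1000 KiB of raw input becomes about 62 KiB of output.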

Parallel gzip compression

use zenflate::{Compressor, CompressionLevel, Unstoppable};

let mut compressor = Compressor::new(CompressionLevel::balanced());
let num_threads = 4; // must match the thread count passed to gzip_compress_parallel below
let bound = Compressor::gzip_compress_bound(data.len()) + num_threads * 5;
let mut compressed = vec![0u8; bound];
let size = compressor
    .gzip_compress_parallel(data, &mut compressed, 4, Unstoppable)
    .unwrap();

The compressor splits the input into chunks with a 32 KiB dictionary overlap, compresses the chunks in parallel, and concatenates the results into a valid gzip stream. Scaling is near-linear (3.3x with 4 threads).
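The chunking idea can be sketched as a planning step: each chunk gets a payload range to compress plus up to 32 KiB of preceding input to use as a dictionary. The even split and the names ChunkPlan and plan_chunks are assumptions for illustration; zenflate's actual chunk sizing is not described here.

```rust
use std::ops::Range;

/// DEFLATE window size: the most history a chunk can reference.
const DICT_SIZE: usize = 32 * 1024;

/// Byte ranges for one chunk: `dict` is read-only history, `payload` is
/// the part this chunk actually compresses.
#[derive(Debug, PartialEq, Eq)]
pub struct ChunkPlan {
    pub dict: Range<usize>,
    pub payload: Range<usize>,
}

/// Splits `len` bytes into at most `num_chunks` roughly equal chunks.
pub fn plan_chunks(len: usize, num_chunks: usize) -> Vec<ChunkPlan> {
    let num_chunks = num_chunks.max(1);
    let chunk_len = (len + num_chunks - 1) / num_chunks; // ceiling division
    let mut plans = Vec::new();
    let mut start = 0;
    while start < len {
        let end = (start + chunk_len).min(len);
        // The first chunk has no history; later chunks see the previous 32 KiB.
        let dict_start = start.saturating_sub(DICT_SIZE);
        plans.push(ChunkPlan {
            dict: dict_start..start,
            payload: start..end,
        });
        start = end;
    }
    plans
}
```

Because every chunk only reads its dictionary and never depends on another chunk's output, the chunks can be compressed on separate threads.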

Cancellation

All compression and whole-buffer decompression methods accept a stop parameter implementing the Stop trait. Pass Unstoppable to disable cancellation, or implement Stop to check a flag periodically:

use zenflate::{Stop, StopReason, Unstoppable};

// Unstoppable — never cancels
compressor.deflate_compress(data, &mut out, Unstoppable)?;

// Custom cancellation
struct MyStop { cancelled: std::sync::Arc<std::sync::atomic::AtomicBool> }
impl Stop for MyStop {
    fn check(&self) -> Result<(), StopReason> {
        if self.cancelled.load(std::sync::atomic::Ordering::Relaxed) {
            Err(StopReason)
        } else {
            Ok(())
        }
    }
}

Streaming decompression doesn't take a Stop parameter — the caller controls the loop and can stop between fill() calls.
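To show how a long-running routine might honor a stop parameter, here is a self-contained sketch. The Stop trait and StopReason type below are local stand-ins that mirror the shape used in the example above; they are not zenflate's definitions, and checksum_with_stop is a hypothetical job that polls once per 4 KiB block.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Local stand-ins mirroring the trait shape shown above (not zenflate's types).
#[derive(Debug, PartialEq, Eq)]
pub struct StopReason;

pub trait Stop {
    fn check(&self) -> Result<(), StopReason>;
}

/// Cancels once the shared flag is set, e.g. from another thread.
pub struct FlagStop {
    pub cancelled: Arc<AtomicBool>,
}

impl Stop for FlagStop {
    fn check(&self) -> Result<(), StopReason> {
        if self.cancelled.load(Ordering::Relaxed) {
            Err(StopReason)
        } else {
            Ok(())
        }
    }
}

/// Hypothetical long-running job: sums bytes block by block and polls
/// `stop` before each block so a cancel request is noticed promptly.
pub fn checksum_with_stop(data: &[u8], stop: &impl Stop) -> Result<u64, StopReason> {
    let mut sum = 0u64;
    for block in data.chunks(4096) {
        stop.check()?;
        sum += block.iter().map(|&b| u64::from(b)).sum::<u64>();
    }
    Ok(sum)
}
```

Polling per block rather than per byte keeps the cost of the check negligible while still bounding how long a cancel request can go unnoticed.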

Features

Feature	Default	Effect
`std`	yes	`std::error::Error` impls, `BufReadSource`, parallel gzip
`alloc`	yes (via `std`)	Compression, streaming decompression
`avx512`	yes	AVX-512 SIMD for checksums on supported CPUs
`unchecked`	no	Elide bounds checks in hot paths (+10-25% compression speed)

Whole-buffer decompression works in no_std without alloc; all state is stack-allocated.

Performance

Benchmarked on x86_64 with AVX-512 (Intel), --features unchecked (v0.3.1). As of v0.3.2, NearOptimalState uses Vec instead of fixed arrays; benchmarks should be re-run to confirm performance at levels 10-12 and 30.

Compression (3 MiB photo bitmap, reproducible via examples/ratio_bench.rs):

Library	Level	Ratio	Speed	vs C
zenflate	effort 1 (fastest)	91.69%	149 MiB/s	0.81x
zenflate	effort 15 (balanced)	92.31%	105 MiB/s	0.88x
zenflate	effort 22 (high)	92.31%	104 MiB/s	0.87x
zenflate	effort 30 (best)	91.80%	39 MiB/s	0.89x
libdeflate (C)	L1	91.69%	185 MiB/s	—
libdeflate (C)	L9	92.31%	119 MiB/s	—
libdeflate (C)	L12	91.80%	44 MiB/s	—
flate2	L1	91.70%	291 MiB/s	—
flate2	L9 (best)	91.58%	55 MiB/s	—

zenflate and libdeflate produce byte-identical output at every level (via CompressionLevel::libdeflate(n)).

Decompression (compressed at L6):

Data type	zenflate	libdeflate (C)	flate2	miniz_oxide
Sequential	27.7 GiB/s	31.6 GiB/s	7.2 GiB/s	6.6 GiB/s
Zeros	34.6 GiB/s	14.5 GiB/s	26.6 GiB/s	17.2 GiB/s
Mixed	717 MiB/s	795 MiB/s	585 MiB/s	571 MiB/s

Checksums:

Algorithm	zenflate	libdeflate (C)	Implementation
Adler-32	114 GiB/s	121 GiB/s	AVX-512 VNNI (x86), NEON (aarch64), WASM simd128
CRC-32	78 GiB/s	77 GiB/s	PCLMULQDQ (x86), PMULL (aarch64)

Parallel gzip (4 MB mixed data):

Level	1 thread	4 threads	Speedup
effort 1	161 MiB/s	534 MiB/s	3.3x
effort 15	133 MiB/s	440 MiB/s	3.3x
effort 30	46 MiB/s	135 MiB/s	2.9x

How it works

zenflate started as a port of Eric Biggers' libdeflate and has grown into its own implementation. The core decompressor, matchfinders, Huffman construction, and block splitting trace back to libdeflate. On top of that foundation, zenflate pulls in techniques from several other projects and adds original work:

Effort-based compression (0-30) with six strategies and named presets, replacing libdeflate's fixed 0-12 levels. Includes two original matchfinder designs (turbo, fast HT) for the low-effort range.
Full-optimal compression (Zopfli-style iterative squeeze), ported from zenzop with Katajainen bounded package-merge for optimal length-limited Huffman codes.
Multi-strategy Huffman optimization combining Brotli-inspired frequency smoothing, Zopfli-style RLE optimization, and max-bits sweeps to find the smallest encoding per block.
Parallel gzip compression using pigz-style chunking with 32 KiB dictionary overlap and combined CRC-32 via GF(2) matrix.
Streaming decompression via a pull-based API that works in no_std + alloc.
Snapshot/restore (CompressorSnapshot) for branching compression state — try different inputs from the same point and pick the best result (designed for PNG filter selection).
Cancellation via the Stop trait for cooperative interruption.
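The "combined CRC-32 via GF(2) matrix" item refers to a well-known technique: given the CRCs of two adjacent buffers and the length of the second, the CRC of their concatenation can be computed without rereading the data. The sketch below follows the classic zlib-style crc32_combine algorithm with a simple bitwise CRC-32 for reference; it illustrates the technique and is not zenflate's implementation.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320), kept simple for clarity.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg(); // all ones if low bit set
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Multiplies a 32x32 GF(2) matrix (one u32 per column) by a bit vector.
fn gf2_times(mat: &[u32; 32], mut vec: u32) -> u32 {
    let mut sum = 0u32;
    let mut i = 0;
    while vec != 0 {
        if vec & 1 != 0 {
            sum ^= mat[i];
        }
        vec >>= 1;
        i += 1;
    }
    sum
}

/// Squares a GF(2) matrix, doubling the number of zero bits it represents.
fn gf2_square(mat: &[u32; 32]) -> [u32; 32] {
    let mut sq = [0u32; 32];
    for n in 0..32 {
        sq[n] = gf2_times(mat, mat[n]);
    }
    sq
}

/// CRC of A||B from crc32(A), crc32(B), and len(B) in bytes.
pub fn crc32_combine(mut crc1: u32, crc2: u32, mut len2: u64) -> u32 {
    if len2 == 0 {
        return crc1;
    }
    // Operator that advances a CRC over one zero bit.
    let mut odd = [0u32; 32];
    odd[0] = 0xEDB8_8320;
    for n in 1..32 {
        odd[n] = 1u32 << (n - 1);
    }
    let mut even = gf2_square(&odd); // two zero bits
    odd = gf2_square(&even); // four zero bits
    // Apply len2 zero bytes to crc1 by repeated squaring.
    loop {
        even = gf2_square(&odd);
        if len2 & 1 != 0 {
            crc1 = gf2_times(&even, crc1);
        }
        len2 >>= 1;
        if len2 == 0 {
            break;
        }
        odd = gf2_square(&even);
        if len2 & 1 != 0 {
            crc1 = gf2_times(&odd, crc1);
        }
        len2 >>= 1;
        if len2 == 0 {
            break;
        }
    }
    crc1 ^ crc2
}
```

In a parallel compressor this lets each worker checksum only its own chunk, with the per-chunk CRCs folded together at the end in time logarithmic in the chunk length.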

Safe Rust throughout (#![forbid(unsafe_code)] by default), with an opt-in unchecked feature for bounds-check elimination in compression hot paths. SIMD acceleration for checksums (AVX2/AVX-512/PCLMULQDQ on x86, NEON/PMULL on aarch64, simd128 on WASM) via archmage with zero unsafe.

zenflate can produce byte-identical output to libdeflate at every level (via CompressionLevel::libdeflate(n)), and runs at roughly 0.8-0.9x the speed of the C original depending on level and data. The gap comes from register pressure differences and bounds checking.

Acknowledgments

libdeflate by Eric Biggers — decompressor, matchfinders (hash table, hash chains, binary trees), Huffman construction, block splitting, near-optimal parser, checksum implementations
Zopfli by Lode Vandevenne and Jyrki Alakuijala (Google) — full-optimal parsing concept, iterative cost refinement, optimize_huffman_for_rle (Zopfli-style variant)
zenzop — Rust Zopfli port used as the source for katajainen, squeeze, and block splitter modules
Brotli (Google) — frequency smoothing algorithm for Huffman RLE encoding
pigz by Mark Adler — parallel gzip chunking strategy with dictionary overlap

What's different from libdeflate

CompressionLevel::libdeflate(n) produces byte-identical output to C. The recommended effort-based API (CompressionLevel::new(n)) uses different algorithms and tuning at every level above 0:

Effort	Strategy	Matchfinder	Encoding	vs libdeflate
0	Store	—	—	Same
1-4	Turbo	Single-entry hash, limited skip updates	Standard	Original matchfinder, not in libdeflate
5-9	FastHt	2-entry hash, limited skip updates	Standard	Original matchfinder, not in libdeflate
10	Greedy	Hash chains	Standard	`good_match` early-exit (libdeflate: disabled)
11-17	Lazy	Hash chains	Standard	`good_match`/`max_lazy` tuning curves (libdeflate: disabled)
18-22	Lazy2	Hash chains	Standard	`good_match`/`max_lazy` tuning (libdeflate: disabled)
23-25	NearOptimal	Binary trees	Exhaustive precode search	Multi-strategy precode flag search
26-27	NearOptimal	Binary trees	+ multi-strategy Huffman	+ Brotli/Zopfli RLE smoothing, reduced max_bits sweep
28-30	NearOptimal	Binary trees	+ diversified optimization	+ randomized cost model, 20-30 passes (libdeflate: 2-10)
31+	FullOptimal	Zopfli hash chains	Katajainen package-merge	Entirely different algorithm (from zenzop)

At effort 10-22, the core matching algorithms are the same as libdeflate (greedy, lazy, double-lazy with hash chains), but zenflate adds good_match and max_lazy early-exit thresholds that libdeflate leaves disabled. These let the compressor skip deep chain searches and lazy evaluations when it already has a good enough match, trading a small amount of compression ratio for speed at lower effort levels.
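A rough sketch of how such thresholds typically act on a matcher follows. It borrows zlib-style semantics as an assumption: a match at least good_match long quarters the chain search depth, and a match at least max_lazy long skips the lazy lookahead. The Tuning struct, its fields, and both functions are hypothetical; zenflate's actual curves and rules may differ.

```rust
/// Hypothetical tuning knobs, modeled on zlib-style parameters.
pub struct Tuning {
    pub max_chain: u32,  // hash-chain candidates to visit at most
    pub good_match: u32, // match length considered "good enough"
    pub max_lazy: u32,   // match length at which lazy lookahead is skipped
}

/// Number of hash-chain candidates to examine when a match of `cur_len`
/// is already in hand. Assumes a zlib-style quartering of the depth.
pub fn search_depth(t: &Tuning, cur_len: u32) -> u32 {
    if cur_len >= t.good_match {
        (t.max_chain / 4).max(1)
    } else {
        t.max_chain
    }
}

/// Whether to look for a better match at the next position before
/// committing to the current one.
pub fn try_lazy(t: &Tuning, cur_len: u32) -> bool {
    cur_len < t.max_lazy
}
```

Lower thresholds trigger the shortcuts more often, which is the speed-for-ratio trade described above.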

At effort 23+, the near-optimal parser is the same backward DP as libdeflate, but the block encoding pipeline diverges: multi-strategy Huffman code construction tries Brotli-inspired and Zopfli-style frequency smoothing with max-bits sweeps to find smaller encodings. At effort 28+, the optimizer runs 20-30 passes with randomized cost diversification instead of libdeflate's fixed 2-10 passes.

MSRV

The minimum supported Rust version is 1.89.

AI-Generated Code Notice

Developed with Claude (Anthropic). Not all code manually reviewed. Review critical paths before production use.

License

Dual-licensed: AGPL-3.0 or a commercial license.

Sustainable, large-scale open source work requires a funding model, and I've been doing this full-time for 15 years. If you use zenflate in closed-source software AND your company makes over $1M/year in revenue, you need a commercial license. Commercial licenses are company-specific, on a sliding scale, and similar to Apache 2.0 in what they permit. Everyone else can use this under the AGPL v3.

zenflate is an independent Rust implementation drawing on algorithms from several permissively-licensed projects. No original C/C++ code was copied. See LICENSE for detailed provenance of every component and the full text of all upstream licenses (libdeflate MIT, Zopfli Apache-2.0, Brotli MIT, pigz zlib). See Acknowledgments for links to the upstream projects.

zenflate 0.3.2