zenflate 0.1.0

Pure Rust DEFLATE/zlib/gzip compression and decompression, ported from libdeflate

Buffer-to-buffer only (no streaming). Supports compression levels 0-12. no_std compatible with alloc.

Usage

[dependencies]
zenflate = "0.1"

Compress

use zenflate::{Compressor, CompressionLevel};

let data = b"Hello, World! Hello, World! Hello, World!";
let mut compressor = Compressor::new(CompressionLevel::DEFAULT);

let bound = Compressor::deflate_compress_bound(data.len());
let mut compressed = vec![0u8; bound];
let compressed_len = compressor
    .deflate_compress(data, &mut compressed)
    .unwrap();
let compressed = &compressed[..compressed_len];

Decompress

use zenflate::Decompressor;

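// `compressed` and `data` come from the Compress example above. The output
// buffer must be large enough to hold the original data.
let original_len = data.len();
let mut decompressor = Decompressor::new();
let mut output = vec![0u8; original_len];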
let decompressed_len = decompressor
    .deflate_decompress(compressed, &mut output)
    .unwrap();
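
The output buffer is sized before the call, so the caller needs the original length from somewhere. For gzip streams the trailer carries a hint: per RFC 1952, the last four bytes (ISIZE) hold the uncompressed length modulo 2^32, little-endian. The sketch below reads that field. `gzip_isize` is a hypothetical helper, not part of zenflate, and it assumes a single-member stream from a trusted source.

```rust
/// Reads ISIZE from a gzip trailer: the last four bytes, little-endian,
/// holding the uncompressed length modulo 2^32 (RFC 1952).
/// Hypothetical helper for illustration; not a zenflate API.
fn gzip_isize(stream: &[u8]) -> Option<u32> {
    // A gzip member has at least a 10-byte header and an 8-byte trailer
    // (CRC-32 followed by ISIZE), so anything shorter cannot be valid.
    const MIN_LEN: usize = 18;
    let n = stream.len();
    if n < MIN_LEN {
        return None;
    }
    Some(u32::from_le_bytes([
        stream[n - 4],
        stream[n - 3],
        stream[n - 2],
        stream[n - 1],
    ]))
}
```

Treat the result as a hint only: it wraps at 4 GiB, and a corrupt or hostile stream can claim any size. Raw DEFLATE and zlib streams carry no length field, so the size has to be stored alongside them.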

Formats

All three DEFLATE-based formats are supported:

// Raw DEFLATE
compressor.deflate_compress(data, &mut out)?;
decompressor.deflate_decompress(compressed, &mut out)?;

// zlib (2-byte header + DEFLATE + Adler-32)
compressor.zlib_compress(data, &mut out)?;
decompressor.zlib_decompress(compressed, &mut out)?;

// gzip (10-byte header + DEFLATE + CRC-32)
compressor.gzip_compress(data, &mut out)?;
decompressor.gzip_decompress(compressed, &mut out)?;
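
The three formats differ only in framing, so the leading bytes usually identify which decompress call applies. The sketch below checks the gzip magic (RFC 1952) and the zlib header checksum (RFC 1950). `Container` and `sniff_container` are hypothetical names, not zenflate APIs, and raw DEFLATE has no header, so it can only be reported as unknown.

```rust
/// Container formats that wrap a raw DEFLATE stream.
/// Hypothetical type for illustration; not a zenflate API.
#[derive(Debug, PartialEq)]
enum Container {
    Zlib,
    Gzip,
    /// No known header matched: raw DEFLATE, or not DEFLATE at all.
    Unknown,
}

/// Guesses the container from the first bytes of a stream.
fn sniff_container(bytes: &[u8]) -> Container {
    match bytes {
        // gzip: magic bytes 0x1f 0x8b, then compression method 8 (DEFLATE).
        [0x1f, 0x8b, 0x08, ..] => Container::Gzip,
        // zlib: the low nibble of CMF is 8 (DEFLATE), and the 16-bit value
        // CMF * 256 + FLG is a multiple of 31.
        [cmf, flg, ..]
            if (*cmf & 0x0f) == 8
                && ((u16::from(*cmf) << 8) | u16::from(*flg)) % 31 == 0 =>
        {
            Container::Zlib
        }
        _ => Container::Unknown,
    }
}
```

A raw DEFLATE stream can pass the zlib check by coincidence, so this is a heuristic for choosing a first attempt, not a validator.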

Compression levels

Level   Strategy                   Speed vs ratio
0       Uncompressed               No compression, just framing
1       Fastest (hash table)       Best throughput
2-3     Greedy
4-6     Lazy                       Good balance (6 is default)
7-9     Lazy2 (double lazy eval)   Better ratio
10-12   Near-optimal parsing       Best ratio, slowest

use zenflate::CompressionLevel;

CompressionLevel::NONE     // 0
CompressionLevel::FASTEST  // 1
CompressionLevel::DEFAULT  // 6
CompressionLevel::BEST     // 12
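
The level table can be read as a lookup from numeric level to parsing strategy. The sketch below encodes it as a match. `Strategy` and `strategy_for_level` are hypothetical names used only to restate the table; they are not zenflate types.

```rust
/// Parsing strategies named in the level table.
/// Hypothetical enum for illustration; not a zenflate type.
#[derive(Debug, PartialEq)]
enum Strategy {
    Uncompressed,
    Fastest,
    Greedy,
    Lazy,
    Lazy2,
    NearOptimal,
}

/// Maps a numeric level to the strategy listed for it, or None when the
/// level is outside the supported 0-12 range.
fn strategy_for_level(level: u8) -> Option<Strategy> {
    match level {
        0 => Some(Strategy::Uncompressed),
        1 => Some(Strategy::Fastest),
        2..=3 => Some(Strategy::Greedy),
        4..=6 => Some(Strategy::Lazy),
        7..=9 => Some(Strategy::Lazy2),
        10..=12 => Some(Strategy::NearOptimal),
        _ => None,
    }
}
```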

Reuse Compressor and Decompressor across calls to avoid re-initialization.

Features

  • std (default) — enables std::error::Error impls
  • alloc (included by std) — enables compression (requires heap allocation for matchfinder tables)

Decompression works in no_std without alloc; all state is stack-allocated.

Performance

Benchmarked on x86_64 with AVX-512 (Intel), --features unchecked. Run cargo bench --features unchecked to reproduce.

Compression (3 MiB photo bitmap, reproducible via examples/ratio_bench.rs):

Library          Level           Ratio    Throughput (safe / unchecked)   vs C
zenflate         1 (fastest)     91.69%   134 / 149 MiB/s                 0.81x
zenflate         6 (lazy)        92.31%   102 / 105 MiB/s                 0.88x
zenflate         9 (lazy2)       92.31%   102 / 104 MiB/s                 0.87x
zenflate         10 (near-opt)   91.97%   38 / 47 MiB/s                   0.87x
zenflate         12 (best)       91.80%   33 / 39 MiB/s                   0.89x
libdeflate (C)   1               91.69%   185 MiB/s
libdeflate (C)   9               92.31%   119 MiB/s
libdeflate (C)   12              91.80%   44 MiB/s
flate2           1               91.70%   291 MiB/s
flate2           9 (best)        91.58%   55 MiB/s
miniz_oxide      9 (best)        91.58%   55 MiB/s

zenflate and libdeflate produce byte-identical output at every level. zenflate L6-9 runs ~2x faster than flate2/miniz_oxide at comparable ratios. The unchecked feature helps most at L10-12 (+18-24%), less at L1-9 (+2-11%).
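
The derived figures can be rechecked from the table. The sketch below assumes that "vs C" is zenflate's unchecked throughput divided by libdeflate's throughput at the same level, which is an inference from the numbers rather than something the table states. Both helper names are hypothetical.

```rust
/// Ratio of two throughputs, rounded to two decimals.
fn speed_ratio(ours_mib_s: f64, theirs_mib_s: f64) -> f64 {
    (ours_mib_s / theirs_mib_s * 100.0).round() / 100.0
}

/// Gain of `fast` over `base`, rounded to the nearest whole percent.
fn gain_percent(base_mib_s: f64, fast_mib_s: f64) -> i64 {
    ((fast_mib_s / base_mib_s - 1.0) * 100.0).round() as i64
}
```

With the table's figures, `speed_ratio(149.0, 185.0)` gives 0.81 for L1 and `speed_ratio(39.0, 44.0)` gives 0.89 for L12. `gain_percent(38.0, 47.0)` gives 24 and `gain_percent(33.0, 39.0)` gives 18, which is the +18-24% range quoted for L10-12.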

Decompression (compressed at L6):

Data type    zenflate     libdeflate (C)   flate2       miniz_oxide
Sequential   27.7 GiB/s   31.6 GiB/s       7.2 GiB/s    6.6 GiB/s
Zeros        34.6 GiB/s   14.5 GiB/s       26.6 GiB/s   17.2 GiB/s
Mixed        717 MiB/s    795 MiB/s        585 MiB/s    571 MiB/s

zenflate decompresses sequential data about 4x faster than flate2/miniz_oxide, and mixed data about 1.2x faster. Zeros decompression is 2.4x faster than C (Rust's fill() auto-vectorizes).
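
The zeros result follows from how LZ77 matches work: a run of one repeated byte decodes as a match with distance 1, and that copy is just a fill. The sketch below shows the idea with a fast path for distance 1. `copy_match` is a hypothetical helper written for this README, not zenflate's decoder.

```rust
/// Appends `len` bytes copied from `dist` bytes back in `out`, the way an
/// LZ77 match is expanded. Returns false if `dist` is zero or reaches
/// before the start of the output.
/// Hypothetical helper for illustration; not zenflate's decoder.
fn copy_match(out: &mut Vec<u8>, dist: usize, len: usize) -> bool {
    if dist == 0 || dist > out.len() {
        return false;
    }
    if dist == 1 {
        // Distance 1 repeats the last byte, so the copy is a plain fill,
        // which the compiler can turn into wide stores.
        let last = out[out.len() - 1];
        let start = out.len();
        out.resize(start + len, 0);
        out[start..].fill(last);
        return true;
    }
    // General case: source and destination may overlap, so each byte is
    // read only after the bytes before it have been written.
    let mut src = out.len() - dist;
    for _ in 0..len {
        let byte = out[src];
        out.push(byte);
        src += 1;
    }
    true
}
```

Starting from `[0]`, a single distance-1 match of length 7 yields eight zero bytes through the fill path. Starting from `b"ab"`, a distance-2 match of length 5 yields `b"abababa"` through the overlapping copy.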

Checksums:

Algorithm   zenflate    libdeflate (C)   Implementation
Adler-32    105 GiB/s   120 GiB/s        AVX2 (x86), NEON (aarch64)
CRC-32      78 GiB/s    77 GiB/s         PCLMULQDQ (x86), PMULL (aarch64)
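
For reference, this is what the Adler-32 kernels compute. The sketch below is a plain scalar version that defers the modulo to once per 5552-byte block, the same block size zlib uses to keep the sums inside a u32. It is not zenflate's code, and it makes no attempt at the SIMD layout.

```rust
/// Largest prime below 2^16, the Adler-32 modulus.
const MOD_ADLER: u32 = 65_521;
/// Largest block length for which the running sums cannot overflow a u32
/// before they are reduced.
const NMAX: usize = 5552;

/// Scalar Adler-32 with one modulo per block instead of one per byte.
/// Reference sketch for illustration; not zenflate's implementation.
fn adler32(data: &[u8]) -> u32 {
    let (mut a, mut b) = (1u32, 0u32);
    for block in data.chunks(NMAX) {
        for &byte in block {
            a += u32::from(byte); // running sum of bytes
            b += a; // running sum of the running sums
        }
        a %= MOD_ADLER;
        b %= MOD_ADLER;
    }
    (b << 16) | a
}
```

As a check, `adler32(b"Wikipedia")` is `0x11E6_0398`, and the checksum of an empty input is 1.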

Parallel compression (4 MB mixed data, gzip):

Level   1 thread    4 threads   Speedup
L1      161 MiB/s   534 MiB/s   3.3x
L6      133 MiB/s   440 MiB/s   3.3x
L12     46 MiB/s    135 MiB/s   2.9x

How it works

This is a line-by-line port of Eric Biggers' libdeflate to safe Rust (#![forbid(unsafe_code)] by default). The algorithms are identical: same matchfinders (hash table, hash chains, binary trees), same Huffman construction, same block splitting heuristics, same near-optimal parser. zenflate produces byte-identical output to libdeflate at every compression level.
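
To give a feel for the simplest of those matchfinders, here is a toy hash-table lookup: hash the next four bytes, look up the last position that hashed to the same slot, and measure how far the two positions agree. The constants, the hash multiplier, and every name below are assumptions made for illustration. libdeflate's real matchfinders differ in table layout, hashing, and match limits.

```rust
const HASH_BITS: u32 = 12;
const MIN_MATCH: usize = 4;

/// One slot per hash value, holding the most recent position seen.
fn new_table() -> Vec<Option<usize>> {
    vec![None; 1usize << HASH_BITS]
}

/// Multiplicative hash of the first four bytes of `bytes`.
fn hash4(bytes: &[u8]) -> usize {
    let v = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    (v.wrapping_mul(0x9E37_79B1) >> (32 - HASH_BITS)) as usize
}

/// Records `pos` in the table and returns (distance, length) of a match
/// against the previous position in the same slot, if it is long enough.
/// Toy sketch for illustration; not libdeflate's or zenflate's matchfinder.
fn find_match(
    data: &[u8],
    table: &mut [Option<usize>],
    pos: usize,
) -> Option<(usize, usize)> {
    if pos + MIN_MATCH > data.len() {
        return None;
    }
    let slot = hash4(&data[pos..]);
    // Store the current position and take whatever was there before.
    let prev = table[slot].replace(pos)?;
    // Count how many bytes agree; the two ranges are allowed to overlap.
    let len = data[prev..]
        .iter()
        .zip(&data[pos..])
        .take_while(|(x, y)| x == y)
        .count();
    if len >= MIN_MATCH {
        Some((pos - prev, len))
    } else {
        None
    }
}
```

A single slot per hash keeps lookups cheap but forgets older candidates, which is the usual speed-for-ratio trade at the fastest level. Hash chains and binary trees keep more candidates per position at higher cost.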

The C original is faster — zenflate runs at roughly 0.8-0.9x the speed of libdeflate depending on compression level and data (see benchmarks above). The gap comes from Rust's fat pointers, bounds checking, and register pressure differences. The unchecked feature closes some of this gap by eliding bounds checks in hot paths.

Parallel gzip compression is a zenflate addition; libdeflate is single-threaded. zenflate uses pigz-style chunking with dictionary overlap and a combined CRC-32, which gave a 2.9-3.3x speedup on 4 threads in the benchmark above.
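
The chunking scheme can be sketched with the standard library alone. Each chunk is paired with up to 32 KiB of preceding input (the DEFLATE window) as its dictionary, chunks are processed on separate threads, and the per-chunk CRC-32 values are merged into the CRC of the whole input. Everything below is an illustration under assumptions: the function names are hypothetical, the per-chunk work is only the CRC (no compression), and the bitwise CRC and linear-time combine are the simplest correct forms, not what a production library would use.

```rust
use std::thread;

/// DEFLATE's maximum match distance, used as the dictionary size.
const WINDOW: usize = 32 * 1024;
/// Reflected CRC-32 polynomial used by gzip.
const POLY: u32 = 0xEDB8_8320;

/// Bitwise CRC-32 (slow reference form).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ POLY } else { crc >> 1 };
        }
    }
    !crc
}

/// CRC of A followed by B, from crc(A), crc(B), and len(B): shift crc(A)
/// through len(B) zero bytes, then XOR in crc(B).
fn crc32_combine(mut crc_a: u32, crc_b: u32, len_b: usize) -> u32 {
    for _ in 0..len_b * 8 {
        crc_a = if crc_a & 1 != 0 { (crc_a >> 1) ^ POLY } else { crc_a >> 1 };
    }
    crc_a ^ crc_b
}

/// Returns (dictionary start, chunk start, chunk end) for each chunk.
/// A compressor would prime its window with data[dict_start..start].
fn plan_chunks(len: usize, chunk_size: usize) -> Vec<(usize, usize, usize)> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    (0..len)
        .step_by(chunk_size)
        .map(|start| {
            let end = (start + chunk_size).min(len);
            (start.saturating_sub(WINDOW), start, end)
        })
        .collect()
}

/// Computes the CRC-32 of `data` one chunk per thread, then merges.
fn parallel_crc32(data: &[u8], chunk_size: usize) -> u32 {
    let plan = plan_chunks(data.len(), chunk_size);
    let parts: Vec<(u32, usize)> = thread::scope(|s| {
        let handles: Vec<_> = plan
            .iter()
            .map(|&(_dict_start, start, end)| {
                s.spawn(move || (crc32(&data[start..end]), end - start))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // The CRC of empty input is 0, so 0 is the identity for the fold.
    parts
        .into_iter()
        .fold(0, |acc, (crc, len)| crc32_combine(acc, crc, len))
}
```

For any input and chunk size, `parallel_crc32` returns the same value as `crc32` over the whole buffer, which is what lets independently compressed chunks share one gzip trailer. The combine here costs time proportional to the chunk length; it can be done in logarithmic time with matrix or polynomial exponentiation.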

Checksums (AVX2/PCLMULQDQ on x86, NEON/PMULL on aarch64) and decompression are SIMD-accelerated. Runtime feature detection goes through archmage, with zero unsafe.

License

MIT