compcol
A collection of compression algorithms in pure Rust.
compcol puts every supported algorithm — RLE, deflate, zlib, gzip, LZMA,
LZMA2, xz, Zstandard, Brotli, LZ4, Snappy, LZW — behind one uniform
streaming trait, with each algorithm gated by its own Cargo feature so
downstream crates only pay for what they pull in. A runtime by-name
factory makes algorithms selectable from configuration or a CLI flag,
and a compcol binary turns the library into a Unix-style filter.
Design principles
- Pure Rust. No bindgen, no FFI, no C dependencies. The crate has zero runtime dependencies: nothing in [dependencies].
- 100% safe. unsafe_code = "forbid" is set crate-wide; the library never opts out.
- no_std. The library is #![no_std]. alloc is used by everything except the bare-bones rle algorithm; algorithms that need large windows or work buffers pull in alloc automatically.
- Streaming. The caller owns both buffers; the codec preserves its state across calls. Works in a 1-byte-on-both-sides streaming loop.
- Per-algorithm features. default = ["alloc", "rle", "deflate", "zlib", "gzip", "factory"]. Everything else is opt-in.
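To make the streaming principle concrete, here is a minimal sketch of a codec that keeps its state across calls while the caller owns both buffers. Every name in it (Progress, StreamEncoder, ByteRle, encode_one_byte_at_a_time) is hypothetical and chosen for illustration; compcol's real trait and its RLE format may look different. The toy encoder emits (count, byte) pairs and is driven with 1-byte input and 1-byte output buffers.

```rust
// Hypothetical names throughout: none of these items are compcol's real API.

/// Bytes consumed from the input and produced into the output by one call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Progress {
    pub read: usize,
    pub written: usize,
}

pub trait StreamEncoder {
    /// Consume some of `input` and fill some of `output`; state persists.
    fn encode(&mut self, input: &[u8], output: &mut [u8]) -> Progress;
    /// Flush buffered state; the bool is true once nothing is left to write.
    fn finish(&mut self, output: &mut [u8]) -> (usize, bool);
}

/// Toy run-length encoder emitting (count, byte) pairs.
#[derive(Default)]
pub struct ByteRle {
    run: Option<(u8, u8)>, // (byte, count) of the run being accumulated
    pending: [u8; 2],      // encoded pair still waiting for output space
    pending_pos: usize,    // next pending byte to write
    pending_len: usize,    // number of valid pending bytes (0 or 2)
}

impl ByteRle {
    fn has_pending(&self) -> bool {
        self.pending_pos < self.pending_len
    }
    /// Copy pending bytes into `output[written..]`; returns the new count.
    fn drain(&mut self, output: &mut [u8], mut written: usize) -> usize {
        while self.has_pending() && written < output.len() {
            output[written] = self.pending[self.pending_pos];
            self.pending_pos += 1;
            written += 1;
        }
        written
    }
    fn stage(&mut self, byte: u8, count: u8) {
        self.pending = [count, byte];
        self.pending_pos = 0;
        self.pending_len = 2;
    }
}

impl StreamEncoder for ByteRle {
    fn encode(&mut self, input: &[u8], output: &mut [u8]) -> Progress {
        let mut p = Progress { read: 0, written: 0 };
        loop {
            p.written = self.drain(output, p.written);
            if self.has_pending() || p.read == input.len() {
                return p; // output is full, or input is exhausted
            }
            let b = input[p.read];
            match self.run {
                Some((rb, n)) if rb == b && n < u8::MAX => {
                    self.run = Some((rb, n + 1));
                    p.read += 1;
                }
                Some((rb, n)) => {
                    // Run ended: stage it; `b` is re-examined next iteration.
                    self.stage(rb, n);
                    self.run = None;
                }
                None => {
                    self.run = Some((b, 1));
                    p.read += 1;
                }
            }
        }
    }

    fn finish(&mut self, output: &mut [u8]) -> (usize, bool) {
        if !self.has_pending() {
            if let Some((b, n)) = self.run.take() {
                self.stage(b, n);
            }
        }
        let written = self.drain(output, 0);
        (written, !self.has_pending())
    }
}

/// Drive the encoder with 1-byte buffers on both sides.
pub fn encode_one_byte_at_a_time(data: &[u8]) -> Vec<u8> {
    let mut enc = ByteRle::default();
    let (mut out, mut slot) = (Vec::new(), [0u8; 1]);
    for chunk in data.chunks(1) {
        let mut rest = chunk;
        while !rest.is_empty() {
            let p = enc.encode(rest, &mut slot);
            out.extend_from_slice(&slot[..p.written]);
            rest = &rest[p.read..];
        }
    }
    loop {
        let (written, done) = enc.finish(&mut slot);
        out.extend_from_slice(&slot[..written]);
        if done {
            return out;
        }
    }
}
```

The point of the sketch is the contract rather than the format: each call reports how much it read and wrote, and the caller simply retries with whatever input is left, so buffer sizes never affect the result.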
Supported algorithms
| Algorithm | Feature | Extension | Encoder | Decoder | Cross-validation |
|---|---|---|---|---|---|
| RLE | rle | .rle | full | full | none |
| Deflate (RFC 1951) | deflate | .deflate | full (dynamic Huffman) | full | python3 -c "import zlib" |
| Zlib (RFC 1950) | zlib | .zz | full | full | python3 -c "import zlib" |
| Gzip (RFC 1952) | gzip | .gz | full | full | gzip(1) |
| LZ4 block format | lz4 | .lz4 | LZ77 hash matcher | full | none |
| Snappy | snappy | .sz | LZ77 hash matcher (raw block format) | full | none |
| LZW (compress(1) .Z) | lzw | .lzw | full | full | compress(1) / uncompress(1) |
| LZMA (legacy .lzma) | lzma | .lzma | full | full | python3 -m lzma (FORMAT_ALONE) |
| LZMA2 | lzma2 | .lzma2 | LZMA-compressed chunks + uncompressed fallback | full (0xE0 + uncompressed) | xz --format=raw --lzma2 -d |
| xz | xz | .xz | compressed-LZMA2 chunks + uncompressed fallback | full envelope + all reset variants | xz(1) both directions |
| Zstandard (RFC 8478) | zstd | .zst | LZ77 + FSE sequences (Predefined tables) + Raw literals | full Compressed_Block | zstd(1) both directions |
| Brotli (RFC 7932) | brotli | .br | LZ77 + length-limited Huffman trees + 704-symbol IC alphabet | full (with 122 KiB static dictionary) | brotli(1) both directions |
Every algorithm decodes real-world output from its reference toolchain
and produces output that the same reference toolchain accepts. Some
encoders (zstd, brotli) ship without Huffman-encoded literals or
custom FSE tables — they emit valid compressed-format streams that are
weaker on compression ratio than zstd -1 / brotli -q1 (typically
within 1.3–1.5× on text, falling further behind on highly repetitive
input where reference tools use RLE blocks).
Library usage
# Cargo.toml
[dependencies]
compcol = { version = "0.1", features = ["gzip", "factory"] }
The trait
use ;
Streaming a round-trip
use ;
use ;
let input = b"hello world hello world hello world";
// Encode.
let mut enc = new;
let mut buf = ;
let mut encoded = Vecnew;
let p = enc.encode.unwrap;
encoded.extend_from_slice;
loop
// Decode.
let mut dec = new;
let mut decoded = Vecnew;
let p = dec.decode.unwrap;
decoded.extend_from_slice;
let p = dec.finish.unwrap;
decoded.extend_from_slice;
assert!;
assert_eq!;
Runtime selection via the factory
use ;
let mut enc = encoder_by_name
.expect;
let mut out = ;
let p = enc.encode.unwrap;
// ...
println!;
factory::extension(name) returns the conventional file extension for
each algorithm (e.g. "gz" for gzip, "zst" for zstd).
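As a rough illustration of by-name selection, the sketch below maps a configuration string to a boxed encoder and to a file extension. Only the function names encoder_by_name and extension come from the text above; the Encoder trait shown here, the unit structs, and both signatures are assumptions, so treat this as a stand-in rather than the crate's factory module.

```rust
// Stand-in types: the trait, the unit structs, and both signatures are
// assumptions for illustration; only the function names come from the text.
pub trait Encoder {
    fn algorithm(&self) -> &'static str;
}

struct Rle;
struct Gzip;

impl Encoder for Rle {
    fn algorithm(&self) -> &'static str {
        "rle"
    }
}

impl Encoder for Gzip {
    fn algorithm(&self) -> &'static str {
        "gzip"
    }
}

/// Map a configuration string to a boxed encoder; unknown names yield `None`.
pub fn encoder_by_name(name: &str) -> Option<Box<dyn Encoder>> {
    match name {
        "rle" => Some(Box::new(Rle)),
        "gzip" => Some(Box::new(Gzip)),
        _ => None,
    }
}

/// Conventional extension (no leading dot) for a few algorithm names.
pub fn extension(name: &str) -> Option<&'static str> {
    match name {
        "rle" => Some("rle"),
        "gzip" => Some("gz"),
        "zstd" => Some("zst"),
        _ => None,
    }
}
```

Returning a boxed trait object is what lets a CLI flag or config value pick the algorithm at run time; in a feature-gated crate each match arm would presumably sit behind the corresponding cfg attribute, which this sketch omits.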
Skipping decompressed bytes
Useful for tar-style archive browsing — read a header, skip past the file body, read the next header:
use Decoder;
use Decoder as _;
let mut dec = new;
// Skip past the first 100 decompressed bytes…
let p = dec.skip.unwrap;
// …then decode the next 50:
let mut out = ;
let p = dec.decode.unwrap;
The default skip implementation just reads-and-discards through a
small scratch buffer, so it works for every algorithm. Individual
decoders are free to override with a smarter implementation when the
format allows it (e.g. fast-forwarding through stored deflate blocks
without LZ77 expansion).
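The read-and-discard default and a format-specific override can be sketched as follows. The trait name StreamDecoder, its method signatures, the 64-byte scratch size, and the Stored / StoredFast types are all hypothetical; the "compressed" form here is just the raw bytes, standing in for something like stored deflate blocks.

```rust
// Hypothetical trait and types; signatures are assumptions for illustration.
pub trait StreamDecoder {
    /// Write up to `output.len()` decompressed bytes; 0 means end of stream.
    fn decode(&mut self, output: &mut [u8]) -> usize;

    /// Default: decode into a small scratch buffer and discard the bytes.
    /// Returns how many bytes were actually skipped.
    fn skip(&mut self, n: usize) -> usize {
        let mut scratch = [0u8; 64];
        let mut remaining = n;
        while remaining > 0 {
            let want = remaining.min(scratch.len());
            let got = self.decode(&mut scratch[..want]);
            if got == 0 {
                break; // stream ended before `n` bytes were skipped
            }
            remaining -= got;
        }
        n - remaining
    }
}

/// Stand-in decoder whose "compressed" form is the data itself.
pub struct Stored<'a> {
    pub data: &'a [u8],
}

impl StreamDecoder for Stored<'_> {
    fn decode(&mut self, output: &mut [u8]) -> usize {
        let n = output.len().min(self.data.len());
        output[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        n
    }
    // `skip` is inherited: it works, but copies every skipped byte.
}

/// Same format, with `skip` overridden to advance without copying.
pub struct StoredFast<'a>(pub Stored<'a>);

impl StreamDecoder for StoredFast<'_> {
    fn decode(&mut self, output: &mut [u8]) -> usize {
        self.0.decode(output)
    }
    fn skip(&mut self, n: usize) -> usize {
        let n = n.min(self.0.data.len());
        self.0.data = &self.0.data[n..];
        n
    }
}
```

Both types give the same observable result for "skip 100, then decode 50"; the override only changes how much work the skip costs.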
CLI usage
The compcol binary ships with the crate. Install with:
(or pick whichever subset of algorithms you want).
Usage: compcol -t ALGO [OPTIONS] [INPUT]
Required:
-t, --type ALGO Algorithm (use --list to see what's compiled in)
Mode:
-d, --decompress Decompress instead of compress
Output (mutually exclusive):
-c, --stdout Write to stdout, keep input file
-o, --output PATH Write to PATH
(default, INPUT given) Write to <INPUT>.<ext> on compress, or strip
<ext> on decompress; remove INPUT on success
(default, no INPUT) Read stdin, write stdout
Misc:
-k, --keep Keep input file even in in-place mode
-f, --force Overwrite an existing output file
-L, --list List available algorithms and exit
-V, --version Print version and exit
-h, --help Print this help and exit
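The default in-place naming rule from the usage text (append the extension on compress, strip it on decompress) could be derived along these lines. default_output_path is a hypothetical helper, not a function from the crate, and returning None when the input lacks the expected extension is an assumption about how that case might be handled.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive the in-place output path for INPUT.
/// `ext` has no leading dot, e.g. "gz".
pub fn default_output_path(input: &Path, ext: &str, decompress: bool) -> Option<PathBuf> {
    if decompress {
        // Strip ".<ext>" only when the input really ends with it.
        if input.extension()? == ext {
            Some(input.with_extension(""))
        } else {
            None // assumption: the caller reports this as an error
        }
    } else {
        // Append ".<ext>" to the whole name, keeping any existing extension.
        let mut name = input.as_os_str().to_os_string();
        name.push(".");
        name.push(ext);
        Some(PathBuf::from(name))
    }
}
```

With this rule notes.txt compresses to notes.txt.gz and notes.txt.gz decompresses back to notes.txt, matching the gzip(1)-style behaviour the help text describes.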
Examples
# Pipe-style use (gzip via stdin → stdout)
|
# In-place compression (mirrors gzip(1) semantics: removes the original)
# Keep the original
# Decompress
# Force overwrite of an existing output file
# Round-trip into a pager
|
# Mix algorithms
# List what's compiled in
Exit codes: 0 success, 1 runtime / I/O error, 2 usage / argument
error.
Cargo feature topology
[features]
default = ["alloc", "rle", "deflate", "zlib", "gzip", "factory"]
alloc = []
factory = ["alloc"] # by-name lookup, returns Box<dyn …>
rle = [] # no_std clean
deflate = ["alloc"]
zlib = ["deflate"]
gzip = ["deflate"]
lzma = ["alloc"]
lzma2 = ["lzma"]
xz = ["lzma2"]
lz4 = ["alloc"]
snappy = ["alloc"]
lzw = ["alloc"]
zstd = ["alloc"]
brotli = ["alloc"]
A bare --no-default-features build produces a library with just the
trait surface and the RLE algorithm — useful for the most constrained
embedded targets. Adding factory pulls in alloc and the runtime
dispatch helpers; adding any individual algorithm feature pulls in
whatever it needs.
The compcol binary is gated on features = ["factory"] so a
--no-default-features library build doesn't try to compile it.
Errors
compcol::Error is a single crate-wide enum so trait objects work
without GATs:
Development
The crate currently ships with ~355 tests across 17 test binaries,
including round-trip tests for every algorithm, cross-validation
against system gzip / xz / zstd / brotli / compress, and
hand-crafted hex fixtures for known corner cases.
License
MIT. © 2026 Karpeles Lab Inc.