compcol
A collection of compression algorithms in pure Rust.
compcol puts every supported algorithm (RLE, deflate, zlib, gzip,
LZMA, xz, Zstandard, Brotli, LZ4, Snappy, LZW, LZO, LZX, Quantum and ADC,
plus decoders for LZFSE and RAR 1/2/3/5) behind one uniform streaming trait, with
each algorithm gated by its own Cargo feature so downstream crates
only pay for what they pull in. A runtime by-name factory makes
algorithms selectable from configuration or a CLI flag, and a
`compcol` binary turns the library into a Unix-style filter.
Design principles
- Pure Rust. No `bindgen`, no FFI, no C dependencies. The crate has zero runtime dependencies: nothing in `[dependencies]`.
- 100% safe. `unsafe_code = "forbid"` is set crate-wide; the library never opts out.
- `no_std`. The library is `#![no_std]`. `alloc` is used by everything except the bare-bones `rle` algorithm; algorithms that need large windows or work buffers pull in `alloc` automatically.
- Streaming. The caller owns both buffers; the codec preserves its state across calls. Codecs work even in a streaming loop that passes 1 byte of input and 1 byte of output per call.
- Per-algorithm features. `default = ["alloc", "rle", "deflate", "zlib", "gzip", "factory"]`. Everything else is opt-in.
- `all` meta-feature. `features = ["all"]` is a single name that enables every algorithm, which is handy for downstream crates and for the CLI install command instead of a 20-item feature list.
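To make the Streaming principle concrete, here is a self-contained toy. It is not compcol's API: `Progress`, `RleEncoder`, its (count, byte) wire format, the `finish` return value and the `encode_windowed` driver are all hypothetical. The point it illustrates is what "the codec preserves its state across calls" requires in practice: the encoder keeps both the open run and a half-written output pair, so any split of the buffers, down to 1 byte in and 1 byte out, yields the same bytes.

```rust
/// Hypothetical progress report: how much of each caller-owned buffer was used.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Progress {
    pub consumed: usize,
    pub written: usize,
}

/// Toy RLE encoder emitting (count, byte) pairs with count in 1..=255.
/// All state lives in the struct (no heap), so the caller may split the
/// input and output buffers at any byte boundary.
#[derive(Default)]
pub struct RleEncoder {
    run: Option<(u8, u8)>, // (byte, count) of the run still being extended
    pending: [u8; 2],      // a finished (count, byte) pair awaiting output space
    pending_pos: usize,    // next index of `pending` to emit
    pending_len: usize,    // 0 or 2
}

impl RleEncoder {
    /// Copies as much of the pending pair as fits; true once nothing is pending.
    fn drain(&mut self, output: &mut [u8], written: &mut usize) -> bool {
        while self.pending_pos < self.pending_len {
            if *written == output.len() {
                return false;
            }
            output[*written] = self.pending[self.pending_pos];
            *written += 1;
            self.pending_pos += 1;
        }
        true
    }

    fn queue(&mut self, byte: u8, count: u8) {
        self.pending = [count, byte];
        self.pending_pos = 0;
        self.pending_len = 2;
    }

    pub fn encode(&mut self, input: &[u8], output: &mut [u8]) -> Progress {
        let (mut consumed, mut written) = (0, 0);
        // Consume input only after any pending pair has been fully written.
        while consumed < input.len() && self.drain(output, &mut written) {
            let b = input[consumed];
            let run = self.run;
            self.run = match run {
                Some((byte, count)) if byte == b && count < u8::MAX => Some((byte, count + 1)),
                Some((byte, count)) => {
                    self.queue(byte, count); // run ended: emit it, start a new one
                    Some((b, 1))
                }
                None => Some((b, 1)),
            };
            consumed += 1;
        }
        Progress { consumed, written }
    }

    /// Flushes the final run. Call again with fresh output space until it returns `true`.
    pub fn finish(&mut self, output: &mut [u8]) -> (usize, bool) {
        let mut written = 0;
        if !self.drain(output, &mut written) {
            return (written, false);
        }
        if let Some((byte, count)) = self.run.take() {
            self.queue(byte, count);
        }
        let done = self.drain(output, &mut written);
        (written, done)
    }
}

/// Feeds at most `in_win` input bytes per call into an `out_win`-byte output
/// buffer (both must be at least 1 so every call makes progress).
pub fn encode_windowed(input: &[u8], in_win: usize, out_win: usize) -> Vec<u8> {
    let mut enc = RleEncoder::default();
    let mut out = Vec::new();
    let mut buf = vec![0u8; out_win];
    let mut pos = 0;
    while pos < input.len() {
        let end = (pos + in_win).min(input.len());
        let p = enc.encode(&input[pos..end], &mut buf);
        pos += p.consumed;
        out.extend_from_slice(&buf[..p.written]);
    }
    loop {
        let (n, done) = enc.finish(&mut buf);
        out.extend_from_slice(&buf[..n]);
        if done {
            return out;
        }
    }
}
```

The design choice worth noting is the `pending` pair: the caller never has to size the output for a whole block, because a pair that does not fit simply waits until the next call.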
Supported algorithms
| Algorithm | Feature | Extension | Encoder | Decoder | Cross-validation |
|---|---|---|---|---|---|
| RLE | `rle` | `.rle` | full | full | — |
| Deflate (RFC 1951) | `deflate` | `.deflate` | full (lazy LZ77; dynamic/fixed Huffman and stored blocks; cross-block matching) | full | `python3 -c "import zlib"` |
| Zlib (RFC 1950) | `zlib` | `.zz` | full | full | `python3 -c "import zlib"` |
| Gzip (RFC 1952) | `gzip` | `.gz` | full | full | gzip(1) |
| LZ4 block format | `lz4` | `.lz4` | LZ77 hash matcher | full | — |
| Snappy | `snappy` | `.sz` | LZ77 hash matcher (raw block format) | full | — |
| LZW (compress(1) `.Z`) | `lzw` | `.lzw` | full | full | compress(1) / uncompress(1) |
| LZMA (legacy `.lzma`) | `lzma` | `.lzma` | full | full | python3 `lzma` module (`FORMAT_ALONE`) |
| xz | `xz` | `.xz` | compressed LZMA2 chunks + uncompressed fallback | full envelope + all reset variants | xz(1), both directions |
| Zstandard (RFC 8478) | `zstd` | `.zst` | LZ77 + Huffman literals + `FSE_Compressed_Mode` sequences + repeat offsets + RLE blocks | full `Compressed_Block` | zstd(1), both directions |
| Brotli (RFC 7932) | `brotli` | `.br` | LZ77 + length-limited Huffman + 704-symbol insert-and-copy alphabet + static-dictionary refs | full (with the 122,784-byte static dictionary) | brotli(1), both directions |
| LZO (LZO1X-1) | `lzo` | `.lzo` | LZ77 hash matcher | full | `python3 -c "import lzo"` |
| LZX (Microsoft CAB / WIM) | `lzx` | `.lzx` | uncompressed blocks only | full (verbatim + aligned-offset + uncompressed; E8 filter) | — |
| Quantum (Stac, old CAB) | `quantum` | `.q` | Unsupported (no public encoder exists) | full (libmspack-equivalent) | libmspack regression fixtures |
| LZFSE (Apple) | `lzfse` | `.lzfse` | Unsupported (decoder-only) | `bvx-` raw + `bvxn` (LZVN); `bvx2` returns Unsupported | hand-built fixtures (no Apple toolchain bundled) |
| ADC (Apple DMG) | `adc` | `.adc` | LZSS-style greedy match-finder | full | hand-built fixtures |
| RAR 1.x | `rar1` | `.rar` | Unsupported (license) | building blocks only (Huffman tables not license-clean) | — |
| RAR 2.x | `rar2` | `.rar` | Unsupported (license) | full LZ77+Huffman + audio predictor | real rar-2.60 fixtures |
| RAR 3.x | `rar3` | `.rar` | Unsupported (license) | full LZ77+Huffman + E8 filter; PPMd and VM filters refused | libarchive RAR3 fixtures |
| RAR 5.x | `rar5` | `.rar` | Unsupported (license) | full LZ77+Huffman + x86 filter; Delta/ARM filters refused | RARLAB-CLI fixtures |
The RAR encoders are permanently Unsupported per RARLAB's unRAR
license terms; other RAR readers (libarchive, The Unarchiver,
7-Zip) likewise ship RAR support as decoder-only.
Every other algorithm that lists a reference toolchain under Cross-validation decodes real-world output from that toolchain and, where it has an encoder, produces output the same toolchain accepts. Some encoders (zstd, brotli) compress less tightly than their reference implementations because they skip features such as FSE-compressed Huffman weight tables (zstd) or encoder-side static-dictionary lookups for non-English text (brotli); the wire format is always conformant.
Library usage
```toml
# Cargo.toml
[dependencies]
compcol = { version = "0.1", features = ["gzip", "factory"] }
```
The trait
use ;
Streaming a round-trip
use ;
use ;
let input = b"hello world hello world hello world";
// Encode.
let mut enc = new;
let mut buf = ;
let mut encoded = Vecnew;
let p = enc.encode.unwrap;
encoded.extend_from_slice;
loop
// Decode.
let mut dec = new;
let mut decoded = Vecnew;
let p = dec.decode.unwrap;
decoded.extend_from_slice;
let p = dec.finish.unwrap;
decoded.extend_from_slice;
assert!;
assert_eq!;
Runtime selection via the factory
use ;
let mut enc = encoder_by_name
.expect;
let mut out = ;
let p = enc.encode.unwrap;
// ...
println!;
factory::extension(name) returns the conventional file extension for
each algorithm (e.g. "gz" for gzip, "zst" for zstd).
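At its core, a by-name factory is a table from names to constructors. The sketch below assumes a simplified `Encoder` trait and two toy codecs, `identity` and `invert`, all hypothetical; compcol's real `encoder_by_name` and `factory::extension` may use different signatures and return types.

```rust
/// Hypothetical minimal encoder trait (compcol's real trait differs).
/// Returns (bytes consumed, bytes written).
pub trait Encoder {
    fn encode(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize);
}

/// Toy codec: copies bytes through unchanged.
struct Identity;

impl Encoder for Identity {
    fn encode(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize) {
        let n = input.len().min(output.len());
        output[..n].copy_from_slice(&input[..n]);
        (n, n)
    }
}

/// Toy codec: flips every bit.
struct Invert;

impl Encoder for Invert {
    fn encode(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize) {
        let n = input.len().min(output.len());
        for (o, i) in output.iter_mut().zip(input) {
            *o = !*i;
        }
        (n, n)
    }
}

/// One row per compiled-in algorithm. In a feature-gated crate each row
/// would sit behind its own Cargo feature, so disabled algorithms vanish.
struct Entry {
    name: &'static str,
    extension: &'static str,
    make: fn() -> Box<dyn Encoder>,
}

fn make_identity() -> Box<dyn Encoder> {
    Box::new(Identity)
}

fn make_invert() -> Box<dyn Encoder> {
    Box::new(Invert)
}

static REGISTRY: &[Entry] = &[
    Entry { name: "identity", extension: "id", make: make_identity },
    Entry { name: "invert", extension: "inv", make: make_invert },
];

/// Looks up an encoder by name; `None` if it is not compiled in.
pub fn encoder_by_name(name: &str) -> Option<Box<dyn Encoder>> {
    REGISTRY.iter().find(|e| e.name == name).map(|e| (e.make)())
}

/// Conventional file extension (without the dot) for an algorithm name.
pub fn extension(name: &str) -> Option<&'static str> {
    REGISTRY.iter().find(|e| e.name == name).map(|e| e.extension)
}

/// Names for a `--list`-style listing.
pub fn names() -> impl Iterator<Item = &'static str> {
    REGISTRY.iter().map(|e| e.name)
}
```

Keeping the name, the extension and the constructor in one row means lookup, `extension` and a listing cannot drift apart; gating a row removes all three at once.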
Skipping decompressed bytes
Useful for tar-style archive browsing — read a header, skip past the file body, read the next header:
use Decoder;
use Decoder as _;
let mut dec = new;
// Skip past the first 100 decompressed bytes…
let p = dec.skip.unwrap;
// …then decode the next 50:
let mut out = ;
let p = dec.decode.unwrap;
The default skip implementation just reads-and-discards through a
small scratch buffer, so it works for every algorithm. Individual
decoders are free to override with a smarter implementation when the
format allows it (e.g. fast-forwarding through stored deflate blocks
without LZ77 expansion).
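The default `skip` described above can be sketched as a provided trait method that loops `decode` into a fixed 64-byte stack buffer. The trait shape, the `(consumed, written)` tuples and both toy decoders are hypothetical, not compcol's API. `Stored` stands in for a format like stored deflate blocks, where skipping needs no expansion and can simply advance through the input.

```rust
/// Hypothetical decoder trait for this sketch (not compcol's real signature).
/// `decode` returns (bytes consumed from `input`, bytes written to `output`).
pub trait Decoder {
    fn decode(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize);

    /// Default skip: decode into a small stack buffer and discard the bytes.
    /// Works for any decoder; returns (bytes consumed, decompressed bytes skipped).
    fn skip(&mut self, input: &[u8], n: usize) -> (usize, usize) {
        let mut scratch = [0u8; 64];
        let (mut consumed, mut skipped) = (0, 0);
        while skipped < n {
            let want = (n - skipped).min(scratch.len());
            let (c, w) = self.decode(&input[consumed..], &mut scratch[..want]);
            if c == 0 && w == 0 {
                break; // stalled: the caller must supply more input
            }
            consumed += c;
            skipped += w;
        }
        (consumed, skipped)
    }
}

/// Toy RLE decoder for (count, byte) pairs. A run that does not fit in the
/// output is remembered, so a skip can stop in the middle of a run.
#[derive(Default)]
pub struct RleDecoder {
    byte: u8,
    remaining: usize,
}

impl Decoder for RleDecoder {
    fn decode(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize) {
        let (mut consumed, mut written) = (0, 0);
        while written < output.len() {
            if self.remaining == 0 {
                if input.len() - consumed < 2 {
                    break; // need a complete pair
                }
                self.remaining = input[consumed] as usize;
                self.byte = input[consumed + 1];
                consumed += 2;
                continue;
            }
            let k = self.remaining.min(output.len() - written);
            output[written..written + k].fill(self.byte);
            written += k;
            self.remaining -= k;
        }
        (consumed, written)
    }
}

/// Toy "stored" decoder: output is a verbatim copy of input, so it
/// overrides `skip` to fast-forward without producing any bytes.
pub struct Stored;

impl Decoder for Stored {
    fn decode(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize) {
        let n = input.len().min(output.len());
        output[..n].copy_from_slice(&input[..n]);
        (n, n)
    }

    fn skip(&mut self, input: &[u8], n: usize) -> (usize, usize) {
        let k = input.len().min(n);
        (k, k)
    }
}
```

The fixed scratch buffer keeps the default allocation-free at the cost of one `decode` call per 64 skipped bytes; that per-call expansion work is exactly what a format-specific override avoids.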
CLI usage
The compcol binary ships with the crate. Install with:
…or pick a subset:
```text
Usage: compcol -t ALGO [OPTIONS] [INPUT]

Required:
  -t, --type ALGO       Algorithm (use --list to see what's compiled in)

Mode:
  -d, --decompress      Decompress instead of compress

Output (mutually exclusive):
  -c, --stdout          Write to stdout, keep input file
  -o, --output PATH     Write to PATH
  (default, INPUT given)
                        Write to <INPUT>.<ext> on compress, or strip
                        <ext> on decompress; remove INPUT on success
  (default, no INPUT)   Read stdin, write stdout

Misc:
  -k, --keep            Keep input file even in in-place mode
  -f, --force           Overwrite an existing output file
  -L, --list            List available algorithms and exit
  -V, --version         Print version and exit
  -h, --help            Print this help and exit
```
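The in-place naming rule from the help text (append `.<ext>` on compress, strip it on decompress) is small enough to sketch directly. `default_output_path` is a hypothetical helper; returning `None` for an input that lacks the expected suffix, or for a non-UTF-8 file name, is an assumed policy rather than documented compcol behavior.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive the in-place output path.
/// Compressing appends `.<ext>`; decompressing strips it. Returns `None`
/// (assumed policy) when the name is not UTF-8 or lacks the suffix.
pub fn default_output_path(input: &Path, ext: &str, decompress: bool) -> Option<PathBuf> {
    let name = input.file_name()?.to_str()?;
    let suffix = format!(".{ext}");
    let new_name = if decompress {
        let stem = name.strip_suffix(suffix.as_str())?;
        if stem.is_empty() {
            return None; // a bare ".gz" leaves no name to write to
        }
        stem.to_string()
    } else {
        format!("{name}{suffix}")
    };
    // Keep the directory part; only the final component changes.
    Some(input.with_file_name(new_name))
}
```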
Examples
# Pipe-style use (gzip via stdin → stdout)
|
# In-place compression (mirrors gzip(1) semantics: removes the original)
# Keep the original
# Decompress
# Force overwrite of an existing output file
# Round-trip into a pager
|
# Mix algorithms
# List what's compiled in
Exit codes: 0 success, 1 runtime / I/O error, 2 usage / argument
error.
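The output options and the exit codes fit together: a conflicting combination such as `-c` with `-o` is a usage error (exit 2), while codec or I/O failures exit with 1. The `Output` and `CliError` types and `resolve_output` below are a hypothetical sketch of that mapping, not compcol's actual CLI code.

```rust
/// Hypothetical resolution of the mutually exclusive output options.
#[derive(Debug, PartialEq, Eq)]
pub enum Output {
    Stdout,       // -c, or no INPUT given
    Path(String), // -o PATH
    InPlace,      // INPUT given, neither -c nor -o
}

#[derive(Debug, PartialEq, Eq)]
pub enum CliError {
    Usage(String),   // bad flags or arguments
    Runtime(String), // codec or I/O failure
}

impl CliError {
    /// Maps each error class to the documented exit code.
    pub fn exit_code(&self) -> i32 {
        match self {
            CliError::Runtime(_) => 1,
            CliError::Usage(_) => 2,
        }
    }
}

pub fn resolve_output(stdout: bool, output: Option<&str>, input: Option<&str>) -> Result<Output, CliError> {
    match (stdout, output, input) {
        (true, Some(_), _) => Err(CliError::Usage("-c and -o are mutually exclusive".into())),
        (true, None, _) | (false, None, None) => Ok(Output::Stdout),
        (false, Some(p), _) => Ok(Output::Path(p.to_string())),
        (false, None, Some(_)) => Ok(Output::InPlace),
    }
}
```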
Cargo feature topology
```toml
[features]
default = ["alloc", "rle", "deflate", "zlib", "gzip", "factory"]
# Meta-feature: pulls in every algorithm. Equivalent to `--all-features`.
all = ["alloc", "factory",
       "rle", "deflate", "zlib", "gzip",
       "lzma", "xz",
       "zstd", "brotli", "lz4", "snappy", "lzw",
       "lzo", "lzx", "quantum", "lzfse", "adc",
       "rar1", "rar2", "rar3", "rar5"]
alloc   = []
factory = ["alloc"]   # by-name lookup, returns Box<dyn …>
rle     = []          # no_std clean (alloc not required)
deflate = ["alloc"]
zlib    = ["deflate"]
gzip    = ["deflate"]
lzma    = ["alloc"]
xz      = ["lzma"]
zstd    = ["alloc"]
brotli  = ["alloc"]
lz4     = ["alloc"]
snappy  = ["alloc"]
lzw     = ["alloc"]
lzo     = ["alloc"]
lzx     = ["alloc"]
quantum = ["alloc"]
lzfse   = ["alloc"]   # decoder-only, bvx2 returns Unsupported
adc     = ["alloc"]
rar1    = ["alloc"]
rar2    = ["alloc"]
rar3    = ["alloc"]
rar5    = ["alloc"]
```
A bare --no-default-features build produces a library with just the
trait surface — useful for the most constrained embedded targets.
Adding rle gives an algorithm that doesn't need alloc. Adding any
other algorithm feature pulls in alloc and the codec.
features = ["all"] enables every algorithm and is the most ergonomic
choice when you don't know in advance which formats you'll see.
The compcol binary is gated on features = ["factory"] so a
--no-default-features library build doesn't try to compile it.
Errors
`compcol::Error` is a single crate-wide enum, so trait objects work
without an associated error type or GATs.
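Why a single error enum matters for trait objects can be shown in a few lines. In this sketch (the variant names, `Passthrough`, `Refuse` and `all_decoders` are hypothetical), every decoder returns the same `Error`, so `Box<dyn Decoder>` needs no `Error = ...` annotation and codecs with different failure modes can share one `Vec`. Had the trait declared `type Error;`, writing `Box<dyn Decoder>` would not compile until one concrete error type was named.

```rust
use std::fmt;

/// Hypothetical crate-wide error enum; variant names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    /// Input is not valid for the format.
    Corrupt,
    /// Recognized but not implemented (e.g. a refused filter or sub-format).
    Unsupported,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Error::Corrupt => "corrupt input",
            Error::Unsupported => "unsupported",
        })
    }
}

/// Hypothetical decoder trait with a fixed error type, so it is usable as
/// `dyn Decoder` without naming any associated type.
pub trait Decoder {
    /// Returns (bytes consumed, bytes written).
    fn decode(&mut self, input: &[u8], output: &mut [u8]) -> Result<(usize, usize), Error>;
}

/// Toy decoder that copies input through unchanged.
pub struct Passthrough;

impl Decoder for Passthrough {
    fn decode(&mut self, input: &[u8], output: &mut [u8]) -> Result<(usize, usize), Error> {
        let n = input.len().min(output.len());
        output[..n].copy_from_slice(&input[..n]);
        Ok((n, n))
    }
}

/// Toy decoder that refuses everything, like a sub-format it does not handle.
pub struct Refuse;

impl Decoder for Refuse {
    fn decode(&mut self, _input: &[u8], _output: &mut [u8]) -> Result<(usize, usize), Error> {
        Err(Error::Unsupported)
    }
}

/// Heterogeneous codecs behind one trait-object type.
pub fn all_decoders() -> Vec<Box<dyn Decoder>> {
    let mut v: Vec<Box<dyn Decoder>> = Vec::new();
    v.push(Box::new(Passthrough));
    v.push(Box::new(Refuse));
    v
}
```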
Development
The crate currently ships with ~566 tests across 23 test binaries,
including round-trip tests for every algorithm with an encoder,
cross-validation against system gzip / xz / zstd / brotli /
compress / lz4 / python3 lzo / python3 lzma, and hand-crafted
hex fixtures for every decoder-only format (RAR 2/3/5, Quantum, LZX).
A simple benchmark harness lives at examples/bench.rs. Run it with:
It measures each compiled-in algorithm's encoder/decoder throughput
and compression ratio on a small fixed corpus and compares against
the system reference when one is installed. A snapshot of the output
is kept in BENCH.md.
License
MIT. © 2026 Karpeles Lab Inc.