compcol 0.1.0

A no_std collection of compression algorithms behind a uniform streaming trait, gated per-algorithm by Cargo features.
Documentation

compcol

A collection of compression algorithms in pure Rust.

compcol puts every supported algorithm — RLE, deflate, zlib, gzip, LZMA, LZMA2, xz, Zstandard, Brotli, LZ4, Snappy, LZW — behind one uniform streaming trait, with each algorithm gated by its own Cargo feature so downstream crates only pay for what they pull in. A runtime by-name factory makes algorithms selectable from configuration or a CLI flag, and a compcol binary turns the library into a Unix-style filter.

Design principles

  • Pure Rust. No bindgen, no FFI, no C dependencies. The crate has zero runtime dependencies — nothing in [dependencies].
  • 100% safe. unsafe_code = "forbid" is set crate-wide; the library never opts out.
  • no_std. The library is #![no_std]. alloc is used by everything except the bare-bones rle algorithm; algorithms that need large windows or work buffers pull in alloc automatically.
  • Streaming. The caller owns both buffers; the codec preserves its state across calls. Works in a 1-byte-on-both-sides streaming loop.
  • Per-algorithm features. default = ["alloc", "rle", "deflate", "zlib", "gzip", "factory"]. Everything else is opt-in.

Supported algorithms

Algorithm Feature Extension Encoder Decoder Cross-validation
RLE rle .rle full full
Deflate (RFC 1951) deflate .deflate full (dynamic Huffman) full python3 -c "import zlib"
Zlib (RFC 1950) zlib .zz full full python3 -c "import zlib"
Gzip (RFC 1952) gzip .gz full full gzip(1)
LZ4 block format lz4 .lz4 LZ77 hash matcher full
Snappy snappy .sz LZ77 hash matcher (raw block format) full
LZW (compress(1) .Z) lzw .lzw full full compress(1) / uncompress(1)
LZMA (legacy .lzma) lzma .lzma full full python3 -m lzma (FORMAT_ALONE)
LZMA2 lzma2 .lzma2 LZMA-compressed chunks + uncompressed fallback full (0xE0 + uncompressed) xz --format=raw --lzma2 -d
xz xz .xz compressed-LZMA2 chunks + uncompressed fallback full envelope + all reset variants xz(1) both directions
Zstandard (RFC 8478) zstd .zst LZ77 + FSE sequences (Predefined tables) + Raw literals full Compressed_Block zstd(1) both directions
Brotli (RFC 7932) brotli .br LZ77 + length-limited Huffman trees + 704-symbol IC alphabet full (with 122 KiB static dictionary) brotli(1) both directions

Every algorithm decodes real-world output from its reference toolchain and produces output that the same reference toolchain accepts. Some encoders (zstd, brotli) ship without Huffman-encoded literals or custom FSE tables — they emit valid compressed-format streams that are weaker on compression ratio than zstd -1 / brotli -q1 (typically within 1.3–1.5× on text, falling further behind on highly repetitive input where reference tools use RLE blocks).

Library usage

# Cargo.toml
[dependencies]
compcol = { version = "0.1", features = ["gzip", "factory"] }

The trait

use compcol::{Algorithm, Encoder, Decoder, Progress, Error};

pub struct Progress {
    pub consumed: usize,  // bytes read from input
    pub written:  usize,  // bytes written to output
    pub done:     bool,   // true once finish() has fully drained
}

pub trait Encoder {
    fn encode(&mut self, input: &[u8], output: &mut [u8]) -> Result<Progress, Error>;
    fn finish(&mut self, output: &mut [u8]) -> Result<Progress, Error>;
    fn reset(&mut self);
}

pub trait Decoder {
    fn decode(&mut self, input: &[u8], output: &mut [u8]) -> Result<Progress, Error>;
    fn finish(&mut self, output: &mut [u8]) -> Result<Progress, Error>;
    fn reset(&mut self);

    /// Advance the decompressed stream by up to `n` bytes without
    /// emitting them. Default impl reads-and-discards through a small
    /// scratch buffer; algorithms can override for cheaper skipping.
    fn skip(&mut self, input: &[u8], n: usize) -> Result<Progress, Error>;
}

pub trait Algorithm {
    const NAME: &'static str;
    type Encoder: Encoder;
    type Decoder: Decoder;
    fn encoder() -> Self::Encoder;
    fn decoder() -> Self::Decoder;
}

Streaming a round-trip

use compcol::gzip::{Encoder, Decoder};
use compcol::{Encoder as _, Decoder as _};

let input = b"hello world hello world hello world";

// Encode.
let mut enc = Encoder::new();
let mut buf = [0u8; 256];
let mut encoded = Vec::new();

let p = enc.encode(input, &mut buf).unwrap();
encoded.extend_from_slice(&buf[..p.written]);
loop {
    let p = enc.finish(&mut buf).unwrap();
    encoded.extend_from_slice(&buf[..p.written]);
    if p.done { break; }
}

// Decode.
let mut dec = Decoder::new();
let mut decoded = Vec::new();
let p = dec.decode(&encoded, &mut buf).unwrap();
decoded.extend_from_slice(&buf[..p.written]);
let p = dec.finish(&mut buf).unwrap();
decoded.extend_from_slice(&buf[..p.written]);
assert!(p.done);
assert_eq!(decoded, input);

Runtime selection via the factory

use compcol::{factory, Encoder as _, Decoder as _};

let mut enc = factory::encoder_by_name("gzip")
    .expect("gzip not compiled in");

let mut out = [0u8; 1024];
let p = enc.encode(b"hello", &mut out).unwrap();
// ...

println!("available algorithms: {:?}", factory::names());

factory::extension(name) returns the conventional file extension for each algorithm (e.g. "gz" for gzip, "zst" for zstd).

Skipping decompressed bytes

Useful for tar-style archive browsing — read a header, skip past the file body, read the next header:

use compcol::gzip::Decoder;
use compcol::Decoder as _;

let mut dec = Decoder::new();
// Skip past the first 100 decompressed bytes…
let p = dec.skip(&compressed[..], 100).unwrap();
// …then decode the next 50:
let mut out = [0u8; 50];
let p = dec.decode(&compressed[p.consumed..], &mut out).unwrap();

The default skip implementation just reads-and-discards through a small scratch buffer, so it works for every algorithm. Individual decoders are free to override with a smarter implementation when the format allows it (e.g. fast-forwarding through stored deflate blocks without LZ77 expansion).

CLI usage

The compcol binary ships with the crate. Install with:

cargo install --path . --features "gzip,zlib,deflate,rle,lz4,snappy,lzw,xz,zstd,brotli,lzma,lzma2,factory"

(or pick whichever subset of algorithms you want).

Usage: compcol -t ALGO [OPTIONS] [INPUT]

Required:
    -t, --type ALGO         Algorithm (use --list to see what's compiled in)

Mode:
    -d, --decompress        Decompress instead of compress

Output (mutually exclusive):
    -c, --stdout            Write to stdout, keep input file
    -o, --output PATH       Write to PATH
    (default, INPUT given)  Write to <INPUT>.<ext> on compress, or strip
                            <ext> on decompress; remove INPUT on success
    (default, no INPUT)     Read stdin, write stdout

Misc:
    -k, --keep              Keep input file even in in-place mode
    -f, --force             Overwrite an existing output file
    -L, --list              List available algorithms and exit
    -V, --version           Print version and exit
    -h, --help              Print this help and exit

Examples

# Pipe-style use (gzip via stdin → stdout)
cat README.md | compcol -t gzip > README.md.gz

# In-place compression (mirrors gzip(1) semantics: removes the original)
compcol -t gzip README.md            # → README.md.gz, removes README.md

# Keep the original
compcol -t gzip -k README.md         # → README.md.gz, keeps README.md

# Decompress
compcol -t gzip -d README.md.gz      # → README.md, removes README.md.gz

# Force overwrite of an existing output file
compcol -t gzip -f README.md

# Round-trip into a pager
compcol -t xz -d archive.xz -c | less

# Mix algorithms
compcol -t zstd payload.bin          # → payload.bin.zst
compcol -t brotli payload.bin        # → payload.bin.br

# List what's compiled in
compcol --list

Exit codes: 0 success, 1 runtime / I/O error, 2 usage / argument error.

Cargo feature topology

[features]
default = ["alloc", "rle", "deflate", "zlib", "gzip", "factory"]
alloc   = []
factory = ["alloc"]            # by-name lookup, returns Box<dyn …>
rle     = []                   # no_std clean
deflate = ["alloc"]
zlib    = ["deflate"]
gzip    = ["deflate"]
lzma    = ["alloc"]
lzma2   = ["lzma"]
xz      = ["lzma2"]
zstd    = ["alloc"]
brotli  = ["alloc"]
lz4     = ["alloc"]
snappy  = ["alloc"]
lzw     = ["alloc"]

A bare --no-default-features build produces a library with just the trait surface and the RLE algorithm — useful for the most constrained embedded targets. Adding factory pulls in alloc and the runtime dispatch helpers; adding any individual algorithm feature pulls in whatever it needs.

The compcol binary is gated on features = ["factory"] so a --no-default-features library build doesn't try to compile it.

Errors

compcol::Error is a single crate-wide enum so trait objects work without GATs:

pub enum Error {
    Corrupt,             // generic malformed input
    UnexpectedEnd,       // finish() called mid-stream
    OutputTooSmall,      // codec has a minimum atomic output size
    BadHeader,           // container header malformed
    InvalidBlockType,    // deflate BTYPE=3, etc.
    InvalidHuffmanTree,  // code lengths violate Kraft inequality
    InvalidDistance,     // LZ77 back-reference out of range
    ChecksumMismatch,    // Adler-32 / CRC-32 mismatch
    TrailerMismatch,     // gzip ISIZE doesn't match output length
    Unsupported,         // option / mode this build doesn't implement
}

Development

cargo build                                                      # builds lib + bin
cargo build --no-default-features                                # bare no_std lib
cargo build --no-default-features --features <algo>              # narrow build

cargo test --all-features                                        # full test suite
cargo clippy --all-features --all-targets -- -D warnings         # lint clean
cargo fmt --all --check                                          # format clean

The crate currently ships with ~355 tests across 17 test binaries, including round-trip tests for every algorithm, cross-validation against system gzip / xz / zstd / brotli / compress, and hand-crafted hex fixtures for known corner cases.

License

MIT. © 2026 Karpeles Lab Inc.