compcol 0.6.0

A no_std collection of compression algorithms behind a uniform streaming trait, gated per-algorithm by Cargo features.
# Contributing to compcol

Thanks for your interest. This crate has a few hard constraints that every
change must respect — please read them before opening a PR.

## Non-negotiable crate properties

- **`no_std`.** The library is `#![no_std]`. Use `core` and, when you need
  the heap, `alloc` (gated behind the `alloc` feature). Never reach for
  `std` outside the `std`/`tokio` feature-gated modules (`compcol::io`,
  `compcol::tokio_io`).
- **`#![forbid(unsafe_code)]`.** No `unsafe`, anywhere, ever. It is set
  crate-wide via `[lints.rust] unsafe_code = "forbid"`.
- **Zero runtime dependencies.** `[dependencies]` carries only `tokio`,
  and that is optional (pulled in solely by the `tokio` feature). Do not
  add runtime dependencies. Dev-dependencies for tests are acceptable when
  justified.

## DoS discipline (decoders)

Decoders process **untrusted input** and must never panic, read out of
bounds, or allocate unboundedly on malformed data:

- Use **checked arithmetic** (`checked_add`, `checked_mul`, …) and
  bounds-checked indexing/slicing — no `unwrap()`/`expect()` on
  attacker-controlled lengths, offsets, or counts.
- **Reject** malformed input by returning an appropriate
  [`crate::Error`] variant (`Corrupt`, `BadHeader`, `InvalidHuffmanTree`,
  `InvalidDistance`, `Unsupported`, …) — never by panicking.
- **Bound output.** Don't pre-allocate from an attacker-supplied size
  field without a sanity cap. Output limiting for callers is provided by
  `compcol::limit::LimitedDecoder`; your decoder must still not blow up
  internally.
- Every decoder gets a fuzz target asserting "no panic on arbitrary
  input" (see below).

## Adding a new codec

1. **Module:** create `src/<codec>/` (or `src/<codec>.rs` for a tiny one).
2. **Internal traits:** implement the private `RawEncoder` / `RawDecoder`
   from `src/traits.rs`. A blanket impl auto-derives the public
   `Encoder` / `Decoder` from these — do **not** implement the public
   traits directly. If the format has no encoder (decoder-only, or
   license-restricted), the encoder's `raw_encode`/`raw_finish` return
   `Error::Unsupported`.
3. **Marker type:** add a zero-sized type implementing `Algorithm`
   (`const NAME`, `type Encoder/Decoder`, `type EncoderConfig/DecoderConfig`,
   `encoder_with`/`decoder_with`). Use `()` for configs with no tunables.
   See `src/rle.rs` for the smallest complete example.
4. **Cargo feature:** add a `<codec> = ["alloc"]` entry in `Cargo.toml`
   (or `= []` if genuinely `alloc`-free like `rle`), with a doc comment,
   and add the feature to the `all` meta-feature list.
5. **Declare it:** add `#[cfg(feature = "<codec>")] pub mod <codec>;` to
   `src/lib.rs`.
6. **Register it:** in `src/factory.rs`, add `#[cfg(feature = "<codec>")]`
   arms for the marker type's `NAME` in `encoder_by_name`, `decoder_by_name`
   (and `encoder_by_name_with_level` if it has a level), and add it to both
   `names()` and `extension()` so it shows up in by-name listing and gets a
   CLI output suffix.
7. **Tests:** add `tests/<codec>.rs` (round-trip if there's an encoder;
   reference-tool cross-validation or hex fixtures for decoder-only
   formats).
8. **Fuzz:** add `fuzz/fuzz_targets/decoder_<codec>.rs` driving the
   decoder over arbitrary bytes and asserting it never panics (copy an
   existing target, e.g. `decoder_lz4.rs`, as the template).

## Clean-room / licensing policy

The crate is MIT and stays clean-room:

- Implement codecs from **public specifications** and facts-only
  functional descriptions only.
- **Do not copy code or data tables** from LGPL/GPL sources — notably
  The Unarchiver / XADMaster — or from RARLAB's `unRAR` distribution.
  RARLAB's license forbids using its source to recreate the RAR
  *compression* algorithm, which is why every RAR encoder is permanently
  `Unsupported`.
- If a codec genuinely needs a fixed interoperability table that exists
  only in a license-incompatible source, treat it like the existing
  `rar1` / StuffIt situations: ship the well-defined building blocks
  clean-room, keep any non-clean-room table out of the spec-derived
  material, and either leave the decoder `Unsupported` (the `rar1` case)
  or supply the table as a separately-licensed, maintainer-sanctioned
  adjunct kept out of the clean-room corpus (the `sit13` case). Document
  the provenance in the module docs, as `src/rar1/` and `src/sit13/` do.

## CI gates (must pass)

CI (`.github/workflows/ci.yml`) runs on Linux/macOS/Windows and denies
warnings. Before pushing:

```sh
cargo fmt --all --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-features
cargo build --no-default-features                       # bare no_std
cargo build --no-default-features --features all        # every algo, still no_std
RUSTDOCFLAGS="-D warnings -D rustdoc::broken-intra-doc-links" cargo doc --no-deps --all-features
```

CI also runs clippy on narrow feature subsets (`lz4`-only, `zstd`-only)
to catch dead-code regressions in the shared `bits` / `checksum` /
`huffman` modules — keep those `#[cfg]`-gated correctly.

Do not edit `CHANGELOG.md` in your PR; releases are managed by
release-plz.