# v0.3 trait redesign — design + rollout plan
User-suggested trait redesign (see feedback in the conversation that
prompted this doc) — signed off as "yes go ahead as designed," with
config to be added to every algorithm in one shot.
The first naive attempt at implementation (now stashed locally as
`v0.3-wip-incomplete`) over-reached: the cross-algorithm bulk edits
broke too many test files at once for a single session to recover.
**This doc captures the design so the next round can dispatch parallel
agents — one per algorithm — to do the port cleanly, the same playbook
that worked for the round-1 / round-2 algorithm rollouts.**
## Why this change
The current trait:
```rust
pub struct Progress {
pub consumed: usize,
pub written: usize,
pub done: bool, // only meaningful from finish()
}
pub trait Encoder {
fn encode(&mut self, input: &[u8], output: &mut [u8]) -> Result<Progress, Error>;
fn finish(&mut self, output: &mut [u8]) -> Result<Progress, Error>;
fn reset(&mut self);
}
```
works, but asks the caller to infer "why did we stop?" from byte
counts:
- `consumed == input.len()` → input exhausted? (or coincidence)
- `written == output.len()` → output full? (or coincidence)
- `consumed == 0 && written == 0` → stall (no signal which side)
- `done` only fires from `finish` — dead field on `encode`/`decode`
That inference is exactly the `avail_in`/`avail_out` dance zlib makes
painful. We can do better.
## Proposed API
```rust
pub struct Progress {
pub consumed: usize,
pub written: usize,
// no `done` field
}
pub enum Status {
InputEmpty, // consumed all of input — feed more (or call finish)
OutputFull, // output buffer full — drain and call again
StreamEnd, // codec has reached logical end (encoder fully flushed,
// or decoder consumed the trailer); further calls return
// (Progress { 0, 0 }, StreamEnd). To reuse: reset().
}
pub trait Encoder {
fn encode(&mut self, input: &[u8], output: &mut [u8])
-> Result<(Progress, Status), Error>;
fn finish(&mut self, output: &mut [u8])
-> Result<(Progress, Status), Error>;
fn reset(&mut self);
}
pub trait Decoder {
fn decode(&mut self, input: &[u8], output: &mut [u8])
-> Result<(Progress, Status), Error>;
fn finish(&mut self, output: &mut [u8])
-> Result<(Progress, Status), Error>;
fn reset(&mut self);
/// Advance the decompressed stream by up to `n` decompressed bytes
/// without writing them to the caller. Best-effort: stops at input
/// exhaustion or after `n` bytes discarded.
fn discard_output(&mut self, input: &[u8], n: usize)
-> Result<(Progress, Status), Error>;
}
pub trait Algorithm {
const NAME: &'static str;
type Encoder: Encoder;
type Decoder: Decoder;
type EncoderConfig: Clone + Default;
type DecoderConfig: Clone + Default;
fn encoder() -> Self::Encoder
where Self::EncoderConfig: Default,
{
Self::encoder_with(Self::EncoderConfig::default())
}
fn encoder_with(config: Self::EncoderConfig) -> Self::Encoder;
fn decoder() -> Self::Decoder { /* mirror */ }
fn decoder_with(config: Self::DecoderConfig) -> Self::Decoder;
}
// Convenience layer (alloc-gated, on the crate root)
pub fn compress_to_vec<A: Algorithm>(input: &[u8]) -> Result<Vec<u8>, Error>;
pub fn decompress_to_vec<A: Algorithm>(input: &[u8]) -> Result<Vec<u8>, Error>;
```
### Per-algorithm `EncoderConfig`
Algorithms with no meaningful tunables use `type EncoderConfig = ();`.
For levelled formats:
| `deflate`/`zlib`/`gzip` | 1..=9 | 6 |
| `zstd` | 1..=22 | 3 |
| `brotli` | 0..=11 | 6 |
| `lzma`/`xz` | 0..=9 | 6 |
| `lz4` | 1 (fast) only — `()` config | — |
| `snappy` | 1 only — `()` config | — |
| `lzw` | 1 only — `()` config | — |
| `lzo` | `()` (LZO1X-1 fast path; future: LZO1X-999 high level) | — |
| `rle` | `()` | — |
| decoder-only (LZX, Quantum, RAR\*) | encoder is `Unsupported`; config is `()` | — |
The level field translates to encoder internals: chain-walk depth,
lazy-match threshold, block size, nice-match cutoff, etc. Whatever
the algorithm exposes as a quality knob.
### Other contract documentation
- **Post-error state**: after any `Err(_)` return, the codec is
*poisoned* — subsequent calls without an intervening `reset()` are
unspecified.
- **`reset()` preserves configuration**: levels, dictionaries, etc.
passed at construction time stay. To wipe config too, drop the
codec and construct a new one with `encoder()`.
- **`discard_output` is best-effort**: stops at input exhaustion or
after exactly `n` bytes discarded, whichever comes first; default
impl reads-and-discards through a scratch buffer.
### Bridge pattern (recommended implementation strategy)
Per the WIP attempt, the cleanest way to port without breaking
every algorithm at once is to introduce private `RawEncoder` /
`RawDecoder` traits with the old byte-counts-only shape, then a
blanket impl bridges them to the new public `Encoder` / `Decoder`:
```rust
#[doc(hidden)]
pub trait RawEncoder {
fn raw_encode(&mut self, ...) -> Result<RawProgress, Error>;
fn raw_finish(&mut self, ...) -> Result<RawProgress, Error>;
fn raw_reset(&mut self);
}
impl<T: RawEncoder> Encoder for T {
fn encode(...) -> Result<(Progress, Status), Error> {
let r = self.raw_encode(input, output)?;
let status = if r.consumed >= input.len() { Status::InputEmpty }
else { Status::OutputFull };
Ok((Progress { consumed: r.consumed, written: r.written }, status))
}
// …
}
```
This means the per-algorithm port is roughly:
1. `impl Encoder for X` → `impl RawEncoder for X`.
2. Method names get a `raw_` prefix.
3. `Result<Progress, Error>` → `Result<RawProgress, Error>`.
4. Inside the `Algorithm for X` impl, add `type EncoderConfig`,
`type DecoderConfig`, rename `encoder()` → `encoder_with(_)`,
`decoder()` → `decoder_with(_)`.
5. For levelled algorithms: actually USE the config to set chain
depth / nice-match / etc.
6. Update the algorithm's own tests to destructure
`(Progress, Status)` instead of just `Progress`, and to look
at `Status::StreamEnd` instead of `Progress::done`.
## Rollout plan (next round)
1. Land the trait scaffolding on master:
- New `Status` enum, new `Progress` (no `done`).
- Private `RawEncoder`/`RawDecoder` traits + blanket impls.
- `Algorithm` with new associated config types.
- `compress_to_vec` / `decompress_to_vec` helpers.
- Port `rle` as a pilot — proves the bridge and the test
migration pattern.
- Update `factory.rs`, CLI, bench to consume the new
`(Progress, Status)` return.
2. Push to master.
3. Launch parallel agents — one per algorithm — branched from this
commit, each charged with:
- Mechanical rename to `RawEncoder`/`RawDecoder` impls.
- Define `EncoderConfig`/`DecoderConfig` with at least a `level`
field where applicable; wire the level into the codec internals.
- Update the algorithm's `tests/<algo>.rs` file to destructure
`(Progress, Status)` and use `Status::StreamEnd`.
4. Apply patches.
5. Verify `cargo test --all-features` is green.
6. Bump version to `0.3.0`, write CHANGELOG, commit, push.
This is exactly the round-1 algorithm-rollout playbook applied to
trait surgery instead of algorithm implementation. Mechanical work
fans out cleanly to per-algorithm agents.
## What's in the WIP stash
`git stash list` shows `stash@{0}: On master: v0.3-wip-incomplete`.
That stash has:
- `src/traits.rs` — the new traits with the bridge implemented.
- Algorithm `impl Encoder/Decoder` → `impl RawEncoder/RawDecoder`
renamed across all algorithms.
- Algorithm `impl Algorithm` updated with `EncoderConfig = ()`,
`DecoderConfig = ()`, and `encoder() → encoder_with(_)`.
- CLI ported to the new `Status` enum.
- Most test files mass-edited (correctly for many, incorrectly for
patterns involving `unwrap_err`, `.skip()`, and bare `let x = call()`
capturing the `Result` rather than unwrapping).
The lib + CLI compile in the stash; ~500 errors in the tests/examples
need cleanup. Rather than fight through that in this session, the
agent-fan-out approach above is the right call.