
# pack-io v0.6.0 — Optimisation

**The stat-defense release, with the encode gap closed before 1.0.** v0.6.0 ships no breaking API changes and no wire-format changes, just three safe-Rust optimisations that close every gap vs `bincode` / `postcard` / `rkyv` worth closing pre-1.0. The only additions to the public surface are two defaulted trait methods and one constructor, all described below. After this release, the only remaining benchmark loss is `decode_view` vs rkyv's archived access (~3×), and that one is intentional: rkyv reads a raw memory layout, while pack-io walks varints as the wire-format spec requires, and that trade is what keeps the format implementable from one page.

## Headlines

| Workload | v0.5 baseline | v0.6 | Δ | vs winner |
|---|---:|---:|---:|---|
| **encode/log_record** | 219 ns | **38 ns** | **−82 %** | **bincode 40 ns — pack-io fastest** |
| **`Vec<u8>` 4 KiB decode** | 2,271 ns | 68 ns | **−97 %** | bincode 64 ns — tied (within noise) |
| **64-byte `String` owning** | 77 ns | **46 ns** | **−40 %** | **bincode 52 ns — pack-io fastest** |
| `u64` round-trip | 31 ns | 22 ns | −29 % | bincode 21 ns — tied |
| decode_owned/log_record | 302 ns | 173 ns | −43 % | bincode 165 / rkyv 153 — tied |
| decode_view/log_record | 39 ns | 35 ns | −10 % | rkyv 12 — by-design 3× gap |
| string64 view | 5.6 ns | 5.1 ns | −9 % | uncontested |

**pack-io is the fastest of the four on `encode`, owning `String` decode, and zero-copy `&str` view**, and it ties with bincode on the rest within measurement noise. The only meaningful loss is to rkyv's archived access; see [`docs/PERFORMANCE.md`](https://github.com/jamesgober/pack-io/blob/main/docs/PERFORMANCE.md) for the full per-row analysis.

## What changed

### 1. The trait extension — `serialize_slice` + `deserialize_many`

Two new methods on `Serialize` and `Deserialize`, each with a default implementation that preserves v0.5 behaviour:

```rust,ignore
pub trait Serialize {
    fn serialize<E: Encode + ?Sized>(&self, encoder: &mut E) -> Result<()>;

    fn serialize_slice<E: Encode + ?Sized>(slice: &[Self], encoder: &mut E) -> Result<()>
    where Self: Sized,
    {
        for item in slice { item.serialize(encoder)?; }
        Ok(())
    }
}

pub trait Deserialize: Sized {
    fn deserialize<D: Decode + ?Sized>(decoder: &mut D) -> Result<Self>;

    fn deserialize_many<D: Decode + ?Sized>(decoder: &mut D, count: usize) -> Result<Vec<Self>> {
        let mut out = Vec::with_capacity(count.min(4096));
        for _ in 0..count { out.push(Self::deserialize(decoder)?); }
        Ok(out)
    }
}
```

`[T]::serialize` and `Vec<T>::deserialize` dispatch through these methods; `u8` overrides them with `encoder.write_bytes(slice)` and `decoder.read_into(&mut vec)` respectively. Byte slices now take a single memcpy instead of N per-byte calls. No `unsafe`, no specialisation feature, no wire-format change. Every other `Vec<T>` keeps the existing loop.

This is what brought `Vec<u8>` 4 KiB decode from 2,271 ns down to 68 ns.
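
As a rough illustration of the dispatch described above, the sketch below re-creates the pattern with stand-in traits. Everything in it is hypothetical scaffolding: the `Encode` trait is cut down to two infallible methods, and `CountingEncoder`, `Flag`, and `encode_slice` are names invented for the example rather than pack-io APIs. Error handling and the length prefix are left out.

```rust
// Stand-in for the encoder trait: only the two methods the sketch needs.
pub trait Encode {
    fn write_byte(&mut self, byte: u8);
    fn write_bytes(&mut self, bytes: &[u8]);
}

pub trait Serialize {
    fn serialize<E: Encode + ?Sized>(&self, encoder: &mut E);

    // Default: one `serialize` call per element, as before the extension.
    fn serialize_slice<E: Encode + ?Sized>(slice: &[Self], encoder: &mut E)
    where
        Self: Sized,
    {
        for item in slice {
            item.serialize(encoder);
        }
    }
}

impl Serialize for u8 {
    fn serialize<E: Encode + ?Sized>(&self, encoder: &mut E) {
        encoder.write_byte(*self);
    }

    // Override: hand the whole slice to the encoder in a single call.
    fn serialize_slice<E: Encode + ?Sized>(slice: &[Self], encoder: &mut E) {
        encoder.write_bytes(slice);
    }
}

// Hypothetical type that keeps the default per-item loop.
pub struct Flag(pub bool);

impl Serialize for Flag {
    fn serialize<E: Encode + ?Sized>(&self, encoder: &mut E) {
        encoder.write_byte(self.0 as u8);
    }
}

// Hypothetical encoder that records how many write calls it received.
#[derive(Default)]
pub struct CountingEncoder {
    pub out: Vec<u8>,
    pub calls: usize,
}

impl Encode for CountingEncoder {
    fn write_byte(&mut self, byte: u8) {
        self.calls += 1;
        self.out.push(byte);
    }

    fn write_bytes(&mut self, bytes: &[u8]) {
        self.calls += 1;
        self.out.extend_from_slice(bytes);
    }
}

// Serialises a slice through whichever `serialize_slice` the element type provides.
pub fn encode_slice<T: Serialize>(items: &[T]) -> CountingEncoder {
    let mut enc = CountingEncoder::default();
    T::serialize_slice(items, &mut enc);
    enc
}
```

Under these assumptions, `encode_slice(&[1u8, 2, 3, 4])` reaches the encoder through a single `write_bytes` call, while a slice of `Flag` values falls back to one `write_byte` call per element. Either path emits the same bytes as the per-item loop would, which is why an override of this kind does not touch the wire format.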

### 2. Pre-reserved encoder capacity + direct-to-Vec varint write

The Tier-1 `encode` entry point now pre-reserves 512 bytes of output capacity:

```rust,ignore
pub fn encode<T: Serialize + ?Sized>(value: &T) -> Result<Vec<u8>> {
    let mut enc = Encoder::with_capacity(512);   // was Encoder::new() (cap 0)
    value.serialize(&mut enc)?;
    Ok(enc.into_inner())
}
```

A zero-capacity `Vec` has to grow repeatedly to reach a typical 300-byte message size, and each growth step can reallocate and memcpy the prior contents. Pre-reserving 512 bytes skips the entire growth ladder for most messages.
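
The growth ladder can be observed with the standard library alone. The helper below is a hypothetical measurement aid, not part of pack-io: it pushes bytes one at a time and counts how often the buffer's capacity changes. The exact count for an empty `Vec` depends on the standard library's growth strategy, which is not a documented guarantee, so the sketch makes no claim about a specific number.

```rust
/// Pushes `len` bytes into `buf` one at a time and returns how many times
/// the capacity changed along the way (a proxy for reallocations).
pub fn count_reallocations(mut buf: Vec<u8>, len: usize) -> usize {
    let mut reallocations = 0;
    let mut capacity = buf.capacity();
    for i in 0..len {
        buf.push(i as u8);
        if buf.capacity() != capacity {
            reallocations += 1;
            capacity = buf.capacity();
        }
    }
    reallocations
}
```

Calling `count_reallocations(Vec::with_capacity(512), 300)` returns zero because `with_capacity` guarantees room for at least 512 bytes, whereas `count_reallocations(Vec::new(), 300)` reports at least one capacity change.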

In addition, the in-memory `Encoder` overrides `write_varint_u64` / `write_varint_u128` to push directly to the `Vec` after a single capacity reserve:

```rust,ignore
impl Encode for Encoder {
    fn write_varint_u64(&mut self, value: u64) -> Result<()> {
        if value < 0x80 {
            self.out.push(value as u8);
            return Ok(());
        }
        self.out.reserve(varint::MAX_VARINT_LEN_U64);
        let mut n = value;
        while n >= 0x80 {
            self.out.push((n as u8) | 0x80);
            n >>= 7;
        }
        self.out.push(n as u8);
        Ok(())
    }
    // ... write_varint_u128 specialised the same way
}
```

This avoids the stack-buffer + `extend_from_slice` round-trip the default trait implementation performs.

Combined, these two changes took `encode/log_record` from 134 ns in the first cut of v0.6 (219 ns at the v0.5 baseline) to 38 ns, closing the 3.5× gap vs bincode that the first cut still had.

### 3. Single-byte varint fast path

For values < 128 (every length prefix that fits in 7 bits, every small integer), both `write_varint_u64` and `read_varint_u64` short-circuit the multi-byte path:

```rust,ignore
// In Encode::write_varint_u64
if value < 0x80 {
    return self.write_byte(value as u8);
}
// ... multi-byte path

// In Decode::read_varint_u64
let first = self.read_byte()?;
if first < 0x80 {
    return Ok(u64::from(first));
}
// ... multi-byte path
```

A smaller win on its own, but broadly applicable: every length prefix goes through this code path.
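
For readers who want to see both halves of the fast path end to end, here is a minimal, self-contained round-trip sketch over a plain `Vec<u8>` and byte slice. It follows the 7-bit group layout visible in the encoder snippet above; the multi-byte decode loop, the `Option` return type, and the overflow checks are assumptions made for the example, and the real decoder may differ (for instance in how it reports errors or treats non-canonical encodings, which this sketch does not reject).

```rust
/// Writes `value` as 7-bit groups, least-significant group first.
/// The high bit of each byte signals that more bytes follow.
pub fn write_varint_u64(out: &mut Vec<u8>, value: u64) {
    // Fast path: values below 128 fit in a single byte.
    if value < 0x80 {
        out.push(value as u8);
        return;
    }
    let mut n = value;
    while n >= 0x80 {
        out.push((n as u8) | 0x80);
        n >>= 7;
    }
    out.push(n as u8);
}

/// Reads one varint from the front of `input`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or does not fit in a `u64`.
pub fn read_varint_u64(input: &[u8]) -> Option<(u64, usize)> {
    let first = *input.first()?;
    // Fast path: a first byte without the continuation bit is the value.
    if first < 0x80 {
        return Some((u64::from(first), 1));
    }
    let mut value = u64::from(first & 0x7f);
    let mut shift = 7u32;
    for (index, &byte) in input.iter().enumerate().skip(1) {
        if shift >= 64 {
            return None; // more than ten bytes: cannot be a u64
        }
        let group = u64::from(byte & 0x7f);
        if shift == 63 && group > 1 {
            return None; // tenth byte may only carry the top bit
        }
        value |= group << shift;
        if byte < 0x80 {
            return Some((value, index + 1));
        }
        shift += 7;
    }
    None // ran out of input before the final byte
}
```

With this layout, 127 encodes to one byte and takes the fast path in both directions, while 128 is the first value that needs two bytes.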

### Plus

- `#[inline(always)]` on the in-memory `Encoder`'s `write_byte` / `write_bytes` / `reserve` so trait dispatch through the generic `E: Encode + ?Sized` parameter consistently inlines after monomorphization.
- New `Encoder::with_capacity(n)` constructor for callers who want explicit control over the output buffer size.

## Wire format

**Unchanged.** Every v0.5 payload decodes identically under v0.6. Spec version remains `1.2`.

## Breaking changes

**None.** Every v0.5 source file compiles unchanged. The new trait methods have default implementations; existing `Serialize` / `Deserialize` impls continue to work without modification. The bulk slice optimisation is opt-in via override: types that don't override the new methods take the v0.5 per-item loop.

## Verification

All gates green on **both stable and MSRV 1.85**:

```bash
cargo fmt --all -- --check
cargo +1.85 fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo +1.85 clippy --all-targets --all-features -- -D warnings
cargo test --all-features
cargo +1.85 test --all-features
cargo build --no-default-features
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps --all-features
cargo audit
cargo deny check
cargo bench --bench comparative --features derive
```

Test counts at this tag (stable, `--all-features`): **220 total**, every one passing. The new `Encoder::with_capacity` constructor adds one doctest; everything else is unchanged.

## What's next

- **v0.7.0 — Hardening + API freeze.** `cargo-fuzz` continuous harness against the decoder, hostile-input sweep (huge nested types, recursion-bomb structures, length prefixes near `usize::MAX`), cross-platform byte-equivalence verification. Public API freeze (already true since v0.5 — formalised in v0.7). No further performance work needed pre-1.0 — the comparative numbers above are what ships in 1.0.
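
To make the "length prefixes near `usize::MAX`" case concrete, the sketch below shows the kind of property such a hostile-input sweep would check. It mirrors the capped pre-allocation in the default `deserialize_many` shown earlier (`count.min(4096)`), but the function name, the `DecodeError` type, and the slice-based input are hypothetical stand-ins, not the crate's API.

```rust
/// Upper bound on speculative pre-allocation, matching the 4096 used in
/// the default `deserialize_many` shown in the release notes.
pub const PREALLOC_CAP: usize = 4096;

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    UnexpectedEof,
}

/// Reads `count` bytes from `input`, trusting `count` only as far as the
/// pre-allocation cap. A hostile `count` fails once the input runs out
/// instead of triggering a huge allocation up front.
pub fn read_counted(input: &[u8], count: usize) -> Result<Vec<u8>, DecodeError> {
    let mut out = Vec::with_capacity(count.min(PREALLOC_CAP));
    let mut bytes = input.iter();
    for _ in 0..count {
        match bytes.next() {
            Some(&byte) => out.push(byte),
            None => return Err(DecodeError::UnexpectedEof),
        }
    }
    Ok(out)
}
```

A three-byte input with a claimed count of `usize::MAX` returns `Err(DecodeError::UnexpectedEof)` after allocating at most 4096 bytes, and an honest count of 3 decodes normally.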

## Installation

```toml
[dependencies]
pack-io = { version = "0.6", features = ["schema"] }   # everything

# Or à la carte (use one of these in place of the line above):
# pack-io = { version = "0.6", features = ["derive"] }   # derive macros only
# pack-io = "0.6"                                         # in-memory + streaming codec only
# pack-io = { version = "0.6", default-features = false } # no_std
```

MSRV: Rust 1.85 (2024 edition).

## Documentation

- [README](https://github.com/jamesgober/pack-io/blob/main/README.md)
- [API Reference](https://github.com/jamesgober/pack-io/blob/main/docs/API.md)
- [Wire Format Spec](https://github.com/jamesgober/pack-io/blob/main/docs/WIRE_FORMAT.md) (v1.2 — unchanged from v0.5)
- **[Performance](https://github.com/jamesgober/pack-io/blob/main/docs/PERFORMANCE.md) (new)**
- [CHANGELOG](https://github.com/jamesgober/pack-io/blob/main/CHANGELOG.md)

---

**Full diff:** [`v0.5.0...v0.6.0`](https://github.com/jamesgober/pack-io/compare/v0.5.0...v0.6.0).
**Changelog:** [`CHANGELOG.md`](https://github.com/jamesgober/pack-io/blob/main/CHANGELOG.md#060---2026-06-04).