rayfish 0.1.4 - Docs.rs

# Data-path microbenchmarks

Criterion benchmarks that time the CPU/allocation work rayfish does **per
forwarded packet**, in isolation from the network. They complement the Scaleway
end-to-end harness (`tests/bench/`): on a shared-vCPU box single-stream TCP is
loss/congestion-bound, which hides per-packet CPU savings, so these hold
everything else constant and measure only the data plane.

```bash
cargo bench                       # all benches
cargo bench --bench forward       # just this one
cargo bench --bench forward -- handoff   # filter by group/id
```

Criterion writes HTML reports + regression baselines under `target/criterion/`;
a second run prints `change: [...]` deltas vs the stored baseline.

## Groups (`benches/forward.rs`)

- **`handoff`** — the packet ownership transfer the zero-copy change touched,
  TX (TUN→peer) and RX (peer→TUN), old copy path vs current zero-copy, at 64 B
  and 1280 B (TUN MTU):
  - `tx_copy` = old `Bytes::copy_from_slice` (allocate + copy) ·
    `tx_zerocopy` = pooled `BytesMut::split_to(n).freeze()`
  - `rx_copy` = old `datagram.to_vec()` · `rx_zerocopy` = `Bytes` clone (refcount)
- **`firewall`** — `parse_packet_info` + `evaluate_packet`, the unavoidable
  per-packet work run once per direction. A regression guard
  (`parse_only`, `parse_eval_out_allow`, `parse_eval_in_whitelist`).

The `*_copy` variants reproduce the pre-optimization code so the delta to the
`*_zerocopy` variant is exactly the saving; they are bench-only fixtures, not
live code paths.