
# obcrypt-refactor: Performance Comparison

Side-by-side benchmark of `oboron` before and after the refactor that
moves the cryptographic core out into a separate `obcrypt-rs` crate.

The refactor keeps obtext encoding, format-string parsing, UTF-8
validation, and z-tier inside `oboron`; only the byte-level crypto
(per-scheme encrypt/decrypt for `a`-tier and `u`-tier) is delegated to
`obcrypt`. Framing (the 2-byte scheme marker XOR'd with the first
ciphertext byte) is applied inline in oboron's `enc_to_format` /
`dec_from_format` / `dec_any_scheme` and in the `codec.rs`
static-dispatch macros. Both paths call
`obcrypt::schemes::*::{encrypt, decrypt}` directly rather than going
through obcrypt's framed `encrypt_into` / `decrypt_into`, because the
framed APIs add a small amount of indirection that shows up on this
hot path even with LTO.
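
As a rough illustration of what "inline framing" could look like, the sketch below prepends a 2-byte marker whose bytes are each XOR'd with the first ciphertext byte, and reverses that on decode. The byte layout is an assumption: the description above does not say where the marker sits or how its two bytes combine with the single ciphertext byte. `frame`, `unframe`, and the `MARKER` value are hypothetical names for illustration, not part of oboron's or obcrypt's API.

```rust
/// Hypothetical marker value; real scheme markers are not given in this document.
const MARKER: [u8; 2] = [0xA1, 0x5C];

/// Prepend the marker, masking each marker byte with the first ciphertext byte.
/// Treating the marker as a 2-byte prefix is an assumption made for illustration.
fn frame(marker: [u8; 2], ciphertext: &[u8]) -> Option<Vec<u8>> {
    let first = *ciphertext.first()?; // an empty ciphertext cannot be framed
    let mut out = Vec::with_capacity(ciphertext.len() + 2);
    out.push(marker[0] ^ first);
    out.push(marker[1] ^ first);
    out.extend_from_slice(ciphertext);
    Some(out)
}

/// Recover the marker and borrow the ciphertext payload.
fn unframe(framed: &[u8]) -> Option<([u8; 2], &[u8])> {
    if framed.len() < 3 {
        return None; // need 2 marker bytes plus at least 1 ciphertext byte
    }
    let (head, ciphertext) = framed.split_at(2);
    let first = ciphertext[0];
    Some(([head[0] ^ first, head[1] ^ first], ciphertext))
}

fn main() {
    let ct = [0x10u8, 0x20, 0x30];
    let framed = frame(MARKER, &ct).expect("non-empty ciphertext");
    let (marker, payload) = unframe(&framed).expect("well-formed frame");
    assert_eq!(marker, MARKER);
    assert_eq!(payload, &ct[..]);
}
```

Because the framing step is only a couple of XORs and a copy, doing it at the call site keeps the per-scheme `encrypt` / `decrypt` call as the only cross-crate hop on the hot path.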

## Build profile

The refactor introduces a workspace → path-dep boundary
(`obcrypt = { path = "../obcrypt-rs" }`). Without LTO, the compiler
can't inline across that boundary, which is fatal on a hot path where
the AEAD call is just a few hundred nanoseconds. Pre-refactor, the
`oboron` package set `[profile.release] lto = true` — but Cargo
ignores per-package profiles on non-root members, so it was a no-op.
The refactor hoists the profile to **workspace level**, so LTO is
genuinely active for both release and bench builds:

```toml
# Cargo.toml (workspace root)
[profile.release]
opt-level = 3
lto = true
codegen-units = 1

[profile.bench]
inherits = "release"
```

All benchmark numbers below are measured under this profile on both
sides of the comparison.

## Method

- Hardware: same machine, runs back-to-back.
- Harness: criterion, `--warm-up-time 1 --measurement-time 2
  --sample-size 30`.
- 16-byte plaintext, Crockford base32 (`c32`) encoding.
- Two probe schemes:
  - `aasv` — deterministic AES-SIV, representative AEAD scheme.
  - `mock1` — identity (no crypto), isolates pure layering /
    cross-crate overhead from AEAD cost.

## Results — `Omnib` dynamic path (post-bypass + workspace LTO)

| Scheme | Op      | master+LTO | branch+LTO | Δ %    | Verdict           |
|--------|---------|-----------:|-----------:|-------:|-------------------|
| aasv   | enc     |   338.2 ns |   340.0 ns |  +0.5% | noise             |
| aasv   | dec     |   344.9 ns |   338.8 ns |  −1.8% | improved (p<.05)  |
| aasv   | autodec |   443.0 ns |   438.3 ns |  −1.1% | noise             |
| mock1  | enc     |    78.8 ns |    75.4 ns |  −4.3% | noise             |
| mock1  | dec     |    40.8 ns |    43.9 ns |  +7.5% | regressed (p<.05) |
| mock1  | autodec |    78.6 ns |    81.0 ns |  +3.0% | noise             |

For real-world AEAD schemes (`aasv`) the crypto cost dominates and
the layering overhead is invisible — all three ops within ~2% of
master. For `mock1` (no crypto, pure layering signal) the worst case
is dec at +7.5%, which is **+3 ns absolute** — the residual cost of
the cross-crate call into `obcrypt::schemes::mock1::decrypt` that LTO
doesn't fully erase, plus criterion noise.
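
To make the relative-versus-absolute distinction concrete, the following sketch recomputes both deltas from the table rows. `delta_percent` is a hypothetical helper, not part of the benchmark harness, and because it works from the rounded values in the table, its output can differ in the last digit from the change Criterion reported.

```rust
/// Relative change of `branch` versus `master`, in percent.
fn delta_percent(master_ns: f64, branch_ns: f64) -> f64 {
    (branch_ns - master_ns) / master_ns * 100.0
}

fn main() {
    // (scheme, op, master ns, branch ns), copied from the results table.
    let rows = [
        ("aasv", "enc", 338.2, 340.0),
        ("aasv", "dec", 344.9, 338.8),
        ("aasv", "autodec", 443.0, 438.3),
        ("mock1", "enc", 78.8, 75.4),
        ("mock1", "dec", 40.8, 43.9),
        ("mock1", "autodec", 78.6, 81.0),
    ];

    for (scheme, op, master, branch) in rows {
        // Print the absolute delta next to the relative one: a large
        // percentage on a ~40 ns operation is still only a few nanoseconds.
        println!(
            "{scheme:5} {op:7} {:+6.1} ns {:+5.1}%",
            branch - master,
            delta_percent(master, branch)
        );
    }
}
```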

## Why the path was re-shaped four times

This benchmark went through several rounds, each diagnostic of one
specific cost source:

1. **Initial refactor** (branch commit `4de102b`): dynamic path
   routed through `obcrypt::encrypt_into` / `decrypt_as` /
   `decrypt`. Caused a ~5–11% regression on `Omnib::autodec` across
   all schemes. The framed obcrypt API adds an extra dispatch layer
   that the inliner couldn't collapse under the default Cargo profile.
2. **Bypass framed API** (this commit's `enc.rs` / `dec.rs` /
   `dec_auto.rs`): call per-scheme `obcrypt::schemes::*::encrypt` /
   `decrypt` directly + apply marker framing inline. Recovered most
   of the dec/autodec regression but enc still regressed +5–7% on
   deterministic schemes.
3. **Per-scheme split in obcrypt** (obcrypt-rs commit `1cafca5`):
   per-scheme owned `encrypt` / `decrypt` had been routed through
   their `_into` counterparts, paying the `TailBuffer` indirection
   cost even for "give me a Vec" callers. Split each form so it
   uses the right primitive: owned calls the AEAD's own `encrypt` /
   `decrypt` (exact-capacity Vec, no adapter); `_into` keeps
   `TailBuffer` + `encrypt_in_place` for zero-extra-allocation.
4. **Workspace LTO** (this commit's `Cargo.toml`): moved
   `[profile.release] lto = true` from the `oboron` package profile,
   which Cargo ignores for non-root workspace members, to the
   workspace profile. This is the change that finally lets the
   inliner cross the workspace → path-dep boundary.
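
The owned / `_into` split from round 3 can be sketched with a toy cipher. Everything here is an assumption for illustration: the XOR "cipher" and one-byte checksum stand in for a real AEAD and its tag, the function bodies are not obcrypt's, and the sketch does not reproduce `TailBuffer` itself. It only shows the two call shapes, one allocating a fresh `Vec` at its final size and one appending to a caller-owned buffer and transforming the tail in place.

```rust
/// Toy stand-in for an AEAD key. XOR with a fixed byte is NOT real crypto.
const KEY: u8 = 0x5A;
/// Toy stand-in for the authentication tag length.
const TAG_LEN: usize = 1;

/// Wrapping byte sum, used here as a placeholder "tag".
fn checksum(data: &[u8]) -> u8 {
    data.iter().fold(0u8, |acc, b| acc.wrapping_add(*b))
}

/// Owned form: one allocation requested at the final size, no buffer adapter.
fn encrypt(plaintext: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(plaintext.len() + TAG_LEN);
    out.extend(plaintext.iter().map(|b| b ^ KEY));
    out.push(checksum(plaintext));
    out
}

/// `_into` form: append to the caller's buffer, then transform the
/// newly appended tail in place so no second buffer is needed.
fn encrypt_into(plaintext: &[u8], out: &mut Vec<u8>) {
    let start = out.len();
    out.extend_from_slice(plaintext);
    for b in &mut out[start..] {
        *b ^= KEY;
    }
    out.push(checksum(plaintext));
}

fn main() {
    let owned = encrypt(b"hello");

    // The caller already holds a buffer with a prefix it wants to keep.
    let mut buf = b"hdr".to_vec();
    encrypt_into(b"hello", &mut buf);

    assert_eq!(&buf[..3], b"hdr");
    assert_eq!(&buf[3..], owned.as_slice());
}
```

Both forms produce the same bytes; the difference is who owns the buffer and whether the caller pays for the adapter machinery it does not need.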

The mock1-isolation benchmark (per user suggestion) made the
diagnosis tractable — each round's residual cost was visible on
mock1 even when AEAD-scheme noise hid it.

## Decision

**Within noise on real schemes; ~7% worst-case (3 ns absolute) on
the mock1-only layering test.** Refactor is safe to merge once the
user is satisfied with the numbers. The branch stays unmerged until
that confirmation.

## Reproducing

```bash
# Baseline: master with the workspace-level LTO profile, so the
# comparison is fair (the package-level [profile.release] in
# oboron/Cargo.toml is ignored by Cargo).
git checkout master
# Temporarily add the same workspace-level profile to the workspace
# Cargo.toml; also add the mock1 specs to benchmarks_omnib.jsonl
# (see this branch's diff for both).
cargo bench -p oboron --bench omnib -- --warm-up-time 1 \
  --measurement-time 2 --sample-size 30 '(aasv|mock1)\.c32/16B$'

# Refactor
git checkout obcrypt-refactor
cargo bench -p oboron --bench omnib -- --warm-up-time 1 \
  --measurement-time 2 --sample-size 30 '(aasv|mock1)\.c32/16B$'
```