pbfhogg 0.3.0 - Docs.rs

# Testing

Live tracker for pbfhogg's open test-infrastructure work. Cross-ref
`reference/performance.md` for perf baselines and TODO.md's
"Important: ignored tests" for the runbook on tests that don't run
by default. The historical narrative (what landed when, which
reorgs happened) lives in `git log -- notes/testing.md`.

## Test placement

Tests go where they couple least.

- **Inline unit tests** in `src/**/*.rs` `#[cfg(test)] mod tests` -
  module internals; die with the module on rewrite, which is the
  point.
- **Stable-API integration tests** in `tests/<topic>.rs` (e.g.
  `tests/roundtrip.rs`, `read_paths.rs`, `edge_cases.rs`) - use
  only the stable allowlist below.
- **CLI integration tests** in `tests/cli_*.rs` - drive the
  `pbfhogg` binary via `CliInvoker`; internal modules are invisible.
- **Fault-injection tests** in `tests/fault_*.rs` (one per binary) -
  own their `PANIC_AT_*` hooks per-process. Partially coupled to
  internal hook surfaces by design.
- **External cross-validation** in `brokkr verify`
  (`verify_<command>.rs` modules in brokkr) - compares pbfhogg
  output against osmium / osmosis / osmconvert.

Placement and tiering are independent axes.

## Validation tiers

Runtime-ranked. Each tier subsumes the cost of the previous; higher
tier = more expensive = run less often. `#[ignore]` is only a Cargo
mechanism, not the tiering model.

| Tier | Cost | When | Driven by |
|---|---|---|---|
| 1. Fast contracts | seconds | Every edit | `brokkr check` (default) |
| 2. Command slice | tens of seconds | While working on that command | `brokkr check --profile <cmd>` |
| 3. Full in-project | minutes | Before merge | `brokkr check --profile full` |
| 4. Scale/perf | hours | Performance work, release evidence | `brokkr bench`, `brokkr suite` |
| 5. External cross-validation | depends | Release gate | `brokkr verify` |

Tiers 1-3 are `brokkr check` profiles. Tiers 4-5 are separate
brokkr commands.

`mod platform` and `mod serial` are orthogonal config overlays, not
tiers. Same for `#[ignore = "external"]` (escape hatch for in-tree
osmium checks).

### Development contract

- **During refactor:** edit code + inline tests + any fault test
  whose hook moved. Run `brokkr check`. Green ⇒ refactor
  structurally landing. Tiers 2-5 stay silent during iteration.
- **Before merge:** run tiers 2-3 to confirm CLI behaviour through
  the stable surface.
- **Before release:** run tier 4 for performance regression and
  tier 5 for external parity.

Tier 1 must be both fast AND structurally complete: every internal
contract a refactor could break needs a tier-1 test (inline unit OR
fault hook). Tier 1 must not carry real-dataset cost (a 54 s
Denmark roundtrip belongs in tier 3).

Tiers 2-5 are by construction internal-refactor-immune: every test
goes through the stable allowlist or the CLI surface, so a rewrite
of `src/commands/<X>/` cannot break them by type changes alone.
That is the load-bearing property of the test reorg.

### Stable allowlist

Imports from this set do not couple a test to an internal module
shape:

- `pbfhogg::block_builder::{BlockBuilder, HeaderBuilder, MemberData, Metadata}`
- `pbfhogg::writer::{PbfWriter, Compression}`
- `pbfhogg::{BlobDecode, BlobError, BlobReader, BlobType, Element, ElementReader, ErrorKind, HeaderOverrides, MemberId, MemberType}`

Everything else is non-stable and requires CLI conversion.

### Per-command CLI test split

Each `tests/cli_<command>.rs` splits intent inside the file:

- **Root or `tier1` module**: small CLI contracts; runs in
  `brokkr check`.
- **`tier2`**: expanded matrices, `-j` parity, adversarial
  fixtures, scratch cleanup, command-local fault injection.
- **`tier3`**: slow self-contained correctness, larger fixtures.
- **`platform`**: `--direct-io`, `--io-uring`, MEMLOCK, feature
  surface - keep out of tier 1 unless cheap and reliable on the
  reference host.
- **`serial`**: requires `--test-threads=1` or static
  fault-injection state.

`tests/cli_sort.rs` is the canonical template for new `cli_*.rs`
files.

## External cross-validation

Lives in `brokkr verify`, not the in-tree test crate:

- The `VerifyHarness` template (`run_pbfhogg`, `run_tool`,
  `diff_pbfs`, `check_sorted`, dataset config, variant matrix,
  results storage) already exists. Each new comparison is
  `verify_<command>.rs`, ~50 lines.
- External tools (osmium, osmosis, osmconvert) are operationally a
  brokkr concern. Contributors don't need them installed for clean
  `brokkr check`.
- An in-tree `mod external` tier would duplicate brokkr's verify
  machinery and act as a gravity well.

**Two in-tree tests are migration candidates:**

1. `tests/merge.rs::merge_cross_validate_osmium` - real Denmark
   data, same inputs as `brokkr verify merge`. Retire once
   `verify_merge.rs` handles the version-vs-unconditional-delete
   tolerance that `tests/merge.rs:1271-1295` does explicitly
   (osmium uses version-based deletes; pbfhogg/osmosis/osmconvert
   delete unconditionally, so osmium-only elements that fall in
   the OSC delete set are not real failures).
2. `tests/cli_sort.rs::sort_cross_validate_osmium` - handcrafted
   overlapping-blob fixture; `brokkr verify sort` runs against
   real data only and doesn't exercise the streaming sweep merge's
   overlap-run path. Migration requires `brokkr verify <command>
   --input <path>` plus an `examples/overlapping_fixture.rs`
   builder, then move the comparison into `verify_sort.rs`.

**Escape hatch.** A contributor mid-PR can write an osmium check
next to a fixture for the duration of that PR as
`#[ignore = "external"]` in-tree with a runbook comment, then
convert to a `verify_*.rs` PR against brokkr afterward. Permitted
exception, not a tier - should not survive across many PRs.

## Conventions

- **`test-hooks` Cargo feature.** Gates fault-injection hooks across
  every parallel pipeline. Off by default; enabled under
  `--all-features` (which `brokkr check` uses). Release builds
  never see the hook code.
- **Two hook shapes.** Per-instance field on a public config struct
  (race-free with sibling tests) vs. process-global static atomics.
  Picker: per-instance when the pipeline has a public config struct
  on its entry path, static atomics otherwise. Per-binary
  isolation via the fault-injection split makes static-atomic hooks
  race-free without `#[ignore]` or `--test-threads=1`.
- **CliInvoker for CLI-driven tests.** `tests/common/cli.rs`. Every
  new `tests/cli_*.rs` goes through it. 60 s default wall-clock
  timeout, override via `.timeout(Duration)`. Platform-skip
  predicates `is_o_direct_unsupported`, `is_uring_unsupported`.
  Binary located via `BROKKR_TEST_BIN_DIR` (set per sweep) with
  fallback to `CARGO_TARGET_DIR + cfg!(debug_assertions)`.
- **Adversarial fixtures.** `tests/common/adversarial.rs` provides
  byte-level mutation primitives: `locate_blobs`,
  `mutate_blob_header_indexdata`, `mutate_blob_payload`,
  `truncate_to`. Hand-rolled varint reader, no internal pbfhogg
  imports.
- **Negative-id generators.** `tests/common/mod.rs` provides
  `generate_nodes_with_negatives`, `generate_ways_with_negatives`,
  `generate_relations_with_negatives` for mixed-sign fixtures (per
  `decisions/0002-negative-ids-rejected-project-wide.md`).
- **Scratch tracking.** `tests/common/mod.rs` exports `snapshot_dir`
  and `assert_scratch_unchanged` for before/after comparisons
  around error paths, plus `with_tracked_scratch_dir(scratch_root,
  expected_new_paths, f)` wrapper.
- **Property tests.** `tests/proptests.rs`, 64 cases per property.
  `proptest-regressions/` is gitignored to avoid committing
  case-specific reproducers.
- **Hook consolidation (explicitly don't).** Static-atomic submodules
  across parallel_writer / parallel_gzip / uring_writer /
  diff-parallel / derive-parallel / altw-stage3 / geocode-pass3
  must stay per-module. Per-binary isolation depends on each binary
  owning its own copy of the atomics.
- **Policy proposal (not-yet-adopted).** Every new parallel pipeline
  should ship with three tests: a worker-panic test, a `-j N` vs
  `-j 2` parity test, and a scratch-leak test. Worth considering as
  a CI gate.

## Open work

Work item IDs are stable. Cite by ID in commits / ADRs / other
notes.

### T02 - Indexdata-trust sites without regression tests

Three cluster-2 fixes ship regression tests; seven additional
indexdata-trust sites still lack direct coverage. Each can pick up
a regression test using the byte-level adversarial primitives in
`tests/common/adversarial.rs` when a regression appears or that
area gets touched:

- `renumber/pass1.rs:179`
- `renumber/wire_rewrite.rs:272`
- `renumber/stage2.rs:226-231`
- `altw/external/stage4.rs:438-478`
- `apply_changes/scanner.rs:162,188`
- `apply_changes/streaming.rs:496`
- `commands/inspect/show_element.rs:53-57`

**Sub-item: `count_varints_strict` surgical mutation primitive.**
`renumber_rejects_truncated_relation_blob_payload` currently chops
the last byte of the relation blob's PrimitiveBlock; renumber
rejects via the upstream protobuf walk before reaching
`count_varints_strict` (`src/commands/renumber/wire_rewrite.rs:556`).
To pin `count_varints_strict` specifically, extend
`tests/common/adversarial.rs` with a
`truncate_relation_memids(pbf, blob_idx, relation_idx)` primitive
(~30 lines walking PrimitiveBlock -> PrimitiveGroup -> Relation
field 9), then strengthen the test to assert the
`reframe_relations: ... memids|types` substring.

### T05 - Deferred parity surfaces

Three sites need lib-API plumbing of `jobs` before parity tests
can be added:

- ALTW external stage 4 worker count (currently hard-coded)
- Geocode Pass 1.5 / Pass 3 Stage A parallel degree
- `check --refs` (T09 below)

A future PR can plumb a `jobs` arg through whichever pipeline is
the focus and add the matching parity test then.

### T07 - Proptest backlog

Three deferred extensions:

- `parse_osc_file` proptest - the symbol takes a `&Path`, not
  bytes; needs a wrapper that writes bytes to tempdir and parses,
  or a different entry point.
- apply/derive inverse property - needs more scaffolding than fits
  the current batch.
- Header flag combinations - small additional set, low priority.

### T08 - Boundary-twin scan across modules

Practice memo, no discrete deliverable. When landing a fix in one
module, add one regression test per twin site in the same commit:

- `commands/sort/mod.rs:178-181` is the same overlap-run
  kind-boundary bug as the just-fixed `cat/dedupe.rs:225`.
- `write/parallel_writer.rs` and `write/parallel_gzip.rs` both
  silently swallow `Drop`-path errors.
- The kind-placeholder-on-non-indexed pattern from apply-changes
  recurs in altw, extract-multi, getid, cat, tags-filter.

Cheaper than chasing each finding individually; prevents the next
regression of the same pattern.

### T09 - check --refs parity

`check_refs` has no `jobs` override in its public signature
(`src/commands/check/refs.rs:141`); a parity test needs either a
plumbed `jobs` argument or a CLI-level worker-count probe (hard to
observe from outside). Not urgent - worker-count-independent
correctness is implicitly covered by existing single-blob tests.
Unblocks the final entry in T05.

### T10 - Fuzz testing via `cargo-fuzz`

Optional follow-up to T07; only worth the setup if someone wants
to run weekend campaigns. PBF parsing, OSC parsing, and wire-format
decoders all accept untrusted input. Targets at those entry points
would catch panics, OOM, and logic errors on malformed data.

**Cost:** `fuzz/corpus/` grows to hundreds of MB, low GB per target
over long campaigns; `fuzz/target/` is ~500 MB - 1 GB of build
artifacts. Both must be gitignored.

**Schedule:** smoke runs (60 s) only verify the harness; real
bug-hunting needs hours to days per target. Skip until T07 exposes
a gap that only coverage-guided fuzzing can fill.

### Truncation handling alignment [LANDED 2026-04-26]

Stance: [`reference/truncation-handling.md`](../reference/truncation-handling.md).
Every truncation shape except a clean cut at a frame boundary
(0-3 leftover bytes from an incomplete next length prefix) is a
hard error.

Aligned contract sites:

- `BlobReader::next` header short-read check (`blob.rs:400`)
- `BlobReader::next` payload short-read check (`blob.rs:528`)
- `BlobReader::next` length-prefix tolerance for 1-3 leftover
  bytes (`blob.rs:381`)
- `BlobReader::skip_blob_body` post-skip 1-byte sentinel read
  (`blob.rs:697`) - keeps the BufReader seek optimization for
  in-range targets, hard-errors on shape 4
- `HeaderWalker::next_header` probe-pread `UnexpectedEof =>
  Ok(None)` removal (`header_walker.rs:161`)
- `HeaderWalker::next_header` payload-extent check
  (`header_walker.rs:200`)
- `FileReader::skip` post-skip 1-byte sentinel read
  (`file_reader.rs:71`) - the `read_blob_header_only` caller-
  side path used by `has_indexdata`, `diff`, `cat::dedupe`, and
  `altw::passthrough` was originally missed by the audit and
  surfaced as a silent shape-4 hole during post-pass review.
  Same fix shape as `skip_blob_body`. Promoted to a first-class
  contract site in `reference/truncation-handling.md`.

`read_raw_frame` already aligned (uses `read_exact` end-to-end);
gold-standard pattern.

`PrimitiveBlock::new` already aligned via the Cursor-based
protobuf walk in `WireBlock::parse_and_inline`.

Test coverage layered:

- Reader-level unit tests
  (`tests/read_paths.rs::trailing_partial_length_prefix_*`,
  `tests/corrupt_input.rs::truncated_header_size`,
  `truncated_header_data`) pin the `BlobReader::next` tolerance
  contract directly.
- `tests/cli_truncation_sweep.rs` is shape-aware: tolerated
  offsets (within 0-3 bytes of any frame_start) pin no-panic +
  bounded stderr only at the command level (sort may
  legitimately reject a tail truncation even when the reader's
  contract holds); shape-2-4 offsets assert non-zero exit.
- Truncation errors at shapes 3 and 4 in `BlobReader::next` /
  `skip_blob_body` include the byte offset and shape in stderr
  per the reference doc.

Caller-side defensive checks (e.g. `src/scan/classify.rs:90`)
remain as belt-and-braces but no longer load-bearing.

`cli_truncation_sweep.rs` covers `cat`, `inspect`, `sort`,
`getid`, `add-locations-to-ways`, and `renumber`. The reader
contract is pinned by unit tests so any other command going
through `BlobReader` / `HeaderWalker` / `FileReader::skip`
inherits it for free; the six commands chosen exercise distinct
read-path shapes (passthrough, header-only fast scan,
indexed-decode, header-walk-with-pread, altw passthrough through
`FileReader::skip`, full-read + pass-2 reframe). Sweep wall-time
~15 s for ~50 truncation offsets per sweep.

### Negative-id handling - decided

Resolved 2026-04-26 by walking the osmium reference vendored under
`research/`:

- **getid**: hard reject at parse time (`parse_id_spec` in
  `src/commands/getid/mod.rs`). Matches osmium exactly - osmium's
  getid rejects identically and its man page is explicit about it.
  Pinned by
  `cli_negative_id_invariants.rs::getid_rejects_negative_ids_with_named_id_and_kind`.
- **cat / sort / inspect**: stay as passthrough (osmium does not
  validate either). Pinned by the existing `_preserves_mixed_sign_ids`
  / `_handles_mixed_sign_ids` tests.
- **tags-filter**: silent-drop status quo stays pinned. Osmium
  silently abs-converts via `positive_id()` - both are silent, so
  neither is a clean precedent for a forced change. Documented as a
  residual divergence in `DEVIATIONS.md`.
- **IdSet::set / set_if_new**: silent no-op on negatives kept.
  libosmium enforces unsigned-only at the type layer (its id-set
  template carries a `static_assert(std::is_unsigned<T>::value)`),
  not as a runtime rejection. The architectural lesson:
  negative-id handling belongs at the input boundary, not the
  storage layer.

DEVIATIONS.md "Negative input IDs rejected project-wide" now names
three enforcement sites: renumber, diff/derive shard planners,
getid.