pbfhogg 0.3.0

Fast OpenStreetMap PBF reader and writer for Rust. Read, write, and merge .osm.pbf files with pipelined parallel decoding.
Documentation
# `degrade` - command design

New subcommand: produce a valid-but-adversarial PBF by stripping
properties or perturbing structure. A "make our lives difficult" tool
for exercising code paths that require less-optimised inputs
(unsorted, missing indexdata, scattered coords, etc).

Drafted 2026-04-24 as scaffolding before implementation. Will drift.

## Purpose

Production PBFs (Geofabrik, `planet.osm.org`) are well-formed, sorted,
indexed, and include all the hints pbfhogg is built to exploit. That
makes it hard to benchmark the non-optimal code paths that handle
real-world-ish edge cases:

- `sort` overlap-rewrite needs an unsorted input.
- `add-locations-to-ways` exercises nothing interesting on inputs that
  already have `LocationsOnWays` set.
- Commands with `--force` fallbacks (getid, sort, etc) need PBFs
  without indexdata to trigger those paths.
- `tags-filter` tag-index fast path needs to be measured against the
  no-tag-index fallback.

`degrade` is the knife-drawer of "take a clean PBF and make it
harder" transformations, unified under one command so the flags
compose.

Non-goals:

- No element filtering (`getid`, `tags-filter`, `extract`).
- No blob-size re-encoding (that's `repack`).

## API

```
pbfhogg degrade <input> -o <output> [--unsort]
                                     [--strip-locations]
                                     [--strip-indexdata]
                                     [--strip-tagdata]
                                     [--strip-bbox]
                                     [--recompress C]
                                     [--drop-ids N:SEED]
                                     [--compression C]
                                     [--direct-io]
                                     [--io-uring]
```

Flags are composable. Order of effects is stable (documented below)
so `--unsort --strip-locations` is equivalent to
`--strip-locations --unsort` on output.

## Transformations

Priority-ordered. v1 ships (1)-(3); the rest follow as need arises.

### (1) `--unsort` - produce unsorted input

Target: exercise `sort` pass 2's overlap-rewrite path (opp #3 in
`notes/sort.md` landed without a planet bench because we lacked an
unsorted planet). Output must:

- Have at least one overlapping blob pair per element kind (so
  `detect_overlaps` flags it).
- Remain a valid PBF (elements themselves unchanged, just blob
  grouping/order perturbed).
- Clear the `Sort.Type_then_ID` header feature.

Minimum-viable approach: take the input's sorted element stream,
rotate each kind's element-ID-sorted run by one blob's worth of
elements. This creates exactly one overlap band between adjacent
blobs per kind - enough to trigger the overlap path without
chaos-ifying the file.

Configurable chaos deferred: later `--unsort=[rotate|shuffle|reverse]`
or similar once the simple mode exists.

### (2) `--strip-locations` - remove LocationsOnWays

Target: `add-locations-to-ways` benchmarks on PBFs whose input
already carries inline coordinates (redundant starting point).
Output:

- Removes `LocationsOnWays` from header features.
- For each Way element: drop the inline node coordinates, keep only
  the `refs` list.

Simple structural transformation; no re-sorting needed.

### (3) `--strip-indexdata` - remove BlobHeader indexdata

Target: force commands into their `--force`/non-indexed fallback
paths (`sort`, `getid`, `tags-filter`, etc).

- Zeros or omits the BlobHeader `indexdata` field on every OsmData
  blob.
- Element bodies are bit-identical; only framing metadata changes.

Ideally does not re-compress blobs (if the output writer path can
reframe with the new header, it's nearly free - same idea as the
passthrough path `sort` uses today).

### (4) `--strip-tagdata` - remove tag-index hints (deferred)

Target: `tags-filter`'s no-hint fallback path.

### (5) `--strip-bbox` - clear HeaderBlock bbox (deferred)

Target: spatial-scan fallback in `extract`.

### (6) `--recompress C` - re-encode at a different codec (deferred)

Target: codec-boundary behavior; overlaps with `repack
--compression` but done without changing blob size.

### (7) `--drop-ids N:SEED` - introduce referential dangles (deferred)

Target: `check --refs`'s slow path and error recovery.

- Deterministically drop N elements' IDs from a given seed.
- Output is a PBF whose ways/relations reference missing nodes/ways.
- Useful for stress-testing `check --refs` and validating its error
  reports.

## Composability

When multiple flags are set, transformations apply in this order:

1. `--strip-indexdata` (structural, changes framing only)
2. `--strip-tagdata` (structural, changes framing only)
3. `--strip-bbox` (header-level)
4. `--strip-locations` (element-level, changes ways)
5. `--drop-ids` (element-level, removes elements)
6. `--unsort` (element/blob reordering, always last)
7. `--recompress` (during the final write, irrespective of prior
   flags)

This order ensures element-level transformations see consistent
input and that `--unsort` sees the final element set after drops.

## Implementation sketch

Unlike `repack`, which can be a single pipelined-read →
re-encode pass, `degrade` is transformation-driven. v1 does one pass
per transformation, materialising via BlockBuilder + PbfWriter.

For v1 (unsort + strip-locations + strip-indexdata):

- `--strip-indexdata`: a blob-level passthrough; reframe each blob
  with cleared indexdata. Mirrors `sort`'s passthrough path but
  drops the indexdata field.
- `--strip-locations`: element-level; decode each way, re-emit
  without inline coords, re-encode.
- `--unsort`: element-level; buffer a full rotation window per kind,
  emit rotated.

Flags can interact - `--strip-indexdata --unsort` means we need to
fully decode anyway (no blob-level passthrough possible when element
order changes). Detect combinations upfront and pick the minimal
decode path.

## Correctness criteria

**Output is a valid PBF.** Parseable by osmium and by pbfhogg's own
reader. `pbfhogg inspect` on the output succeeds.

**Only the declared property is changed.** Elements not targeted by
flags are bit-identical semantically. Tags multiset, ID set, and
per-element metadata all preserved.

**Header features reflect reality.** After `--strip-locations`, the
`LocationsOnWays` feature is gone. After `--unsort`, the
`Sort.Type_then_ID` feature is gone. Consumers that check these
features see the truth.

## Tests

Per-flag, minimal:

1. `--unsort` on a small sorted PBF → output has at least one
   overlap run (verifiable via `sort`'s own overlap counter when
   the output is piped back through sort).
2. `--strip-locations` → output header lacks `LocationsOnWays`;
   way elements have no inline coords.
3. `--strip-indexdata` → BlobHeader.indexdata is absent on every
   OsmData blob; `inspect --indexed` returns non-zero.

Round-trip:

4. `degrade --strip-locations` then `add-locations-to-ways`   element-by-element recovery of the original.
5. `degrade --strip-indexdata` then `cat` (which re-generates
   indexdata) → inspect --indexed succeeds on the recovered file.
6. `degrade --unsort` then `sort` → sorted output equivalent to the
   original.

Combination:

7. `--unsort --strip-indexdata` → output is unsorted + unindexed.

Benchmarks:

8. `degrade --unsort --dataset planet` produces a file suitable
   for `brokkr sort --dataset planet-unsorted --bench 1`. Register
   the degraded planet in `brokkr.toml` once generated.

## Scope for v1

- `--unsort`
- `--strip-locations`
- `--strip-indexdata`
- `--compression`

Everything else deferred. Flag set grows as consumers show up.

## Cross-references

- [`reference/blob-density.md`]../reference/blob-density.md -
  the parallel insight about blob density; `degrade` is orthogonal
  but shares the "generate adversarial test data" framing.
- [`notes/repack.md`]repack.md - companion command; shares input-
  read + output-write plumbing.
- [`notes/sort.md`]sort.md - primary consumer of `--unsort` for
  opp #3 parallel-overlap benchmarking.
- [`src/commands/altw/`]../src/commands/altw/ - primary consumer
  of `--strip-locations` (once that path has a reason to land).
- [`src/write/block_builder.rs`]../src/write/block_builder.rs -
  BlockBuilder; re-used for element-level transformations.
- [`src/commands/sort/mod.rs`]../src/commands/sort/mod.rs -
  structural template for the `--strip-indexdata` passthrough path.