bbnorm-rs
bbnorm-rs is a Rust implementation of the practical BBNorm read-depth
normalization workflow from BBTools. It focuses on local FASTA/FASTQ
normalization, histogram output, paired/interleaved routing, bounded memory
counting, and reproducible Java-parity behavior for covered modes.
This is an early working release, not a complete BBTools replacement. The Git repository includes a vendored BBTools snapshot for parity tests; crates.io packages intentionally exclude that snapshot to keep the package small.
Install
From crates.io, once published:
From a checkout:
Basic Usage
Common outputs:
Supported inputs and outputs include plain and gzip FASTA/FASTQ, single-end,
paired two-file, explicit interleaved, auto-interleaved, BBTools-style #
paired filename expansion, null output sinks, and common BBNorm-style
key=value aliases.
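As a rough illustration of the `#` filename expansion and key=value handling mentioned above, the sketch below shows one way such arguments could be interpreted. The function names `expand_paired`, `parse_kv`, and `parse_bool` are hypothetical and are not taken from the bbnorm-rs source. Replacing only the first `#` is an assumption about the expansion rule.

```rust
/// Expand a BBTools-style `#` placeholder into the two mate filenames.
/// Assumption: only the first `#` is substituted. Returns None without a `#`.
fn expand_paired(path: &str) -> Option<(String, String)> {
    if !path.contains('#') {
        return None;
    }
    Some((path.replacen('#', "1", 1), path.replacen('#', "2", 1)))
}

/// Split one `key=value` argument. The key is lowercased so that aliases
/// can be matched case-insensitively; the value is returned untouched.
fn parse_kv(arg: &str) -> Option<(String, &str)> {
    let (key, value) = arg.split_once('=')?;
    if key.is_empty() {
        return None;
    }
    Some((key.to_ascii_lowercase(), value))
}

/// Interpret boolean shorthand such as `t` / `f` (as in `ecc=t`).
fn parse_bool(value: &str) -> Option<bool> {
    match value.to_ascii_lowercase().as_str() {
        "t" | "true" => Some(true),
        "f" | "false" => Some(false),
        _ => None,
    }
}
```

Under these assumptions, `expand_paired("reads_#.fq.gz")` yields `reads_1.fq.gz` and `reads_2.fq.gz`, and `parse_kv("ecc=t")` followed by `parse_bool` yields the key `ecc` with the value `true`.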
Current Status
Verified locally:
- cargo fmt --all -- --check
- cargo clippy --all-targets --all-features -- -D warnings
- cargo test --all
The current suite comprises 242 library tests, 8 basic integration tests, and 108 Java-parity tests that run against the vendored BBTools snapshot.
Implemented working areas include:
- BBNorm-style CLI parsing for common normalization options and aliases.
- Plain and gzip FASTA/FASTQ I/O.
- Single-end, paired, and interleaved read routing.
- Short canonical k-mers and long BBTools-shaped hashed k-mers.
- Exact and bounded count-min counting paths.
- Depth histograms, read-depth histograms, and peak output.
- Deterministic normalization decisions for covered modes.
- Multipass, count-up, and ECC behavior for tested subsets.
- Guarded benchmark and parity harnesses in the source repository.
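To make the "short canonical k-mers" item above concrete, here is a minimal sketch of rolling 2-bit canonical k-mer extraction for k up to 31. It is not the bbnorm-rs implementation: the function names are hypothetical, and choosing the numerically smaller of the forward and reverse-complement encodings is an assumption (a tool may equally pick the larger one, and parity with BBTools would require matching its choice).

```rust
/// Encode a base as 2 bits; anything other than A/C/G/T returns None.
fn encode_base(base: u8) -> Option<u64> {
    match base.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Return the canonical 2-bit encoding of every k-mer in `seq`.
/// Ambiguous bases (for example N) reset the rolling window.
fn canonical_kmers(seq: &[u8], k: usize) -> Vec<u64> {
    assert!((1..=31).contains(&k), "this sketch packs k-mers into a u64");
    let mask = (1u64 << (2 * k)) - 1;
    let shift = 2 * (k - 1);
    let (mut fwd, mut rev, mut len) = (0u64, 0u64, 0usize);
    let mut out = Vec::new();
    for &base in seq {
        match encode_base(base) {
            Some(code) => {
                // Append the base on the right of the forward k-mer.
                fwd = ((fwd << 2) | code) & mask;
                // Prepend its complement on the left of the reverse k-mer.
                rev = (rev >> 2) | ((3 - code) << shift);
                len += 1;
                if len >= k {
                    out.push(fwd.min(rev));
                }
            }
            None => {
                fwd = 0;
                rev = 0;
                len = 0;
            }
        }
    }
    out
}
```

With this encoding a sequence and its reverse complement produce the same set of keys, which is the property a strand-independent depth count relies on. Longer k-mers such as k=40 do not fit this packing and would need a hashed representation instead.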
Known gaps remain:
- Full BBTools sketch, prefilter, and cardinality/loglog collision parity is not complete.
- ECC and overlap behavior is covered by compact and biological stress tests, but not every BBMerge/BBNorm edge case.
- Large human-read benchmarks show improved deterministic bounded counting and peak memory at or below Java's, but input counting remains the main speed bottleneck in some comparable modes.
See docs/parity.md and docs/component_buildout.md for the detailed ledger. The acceptance matrix in docs/parity_matrix.md is the current front-door workflow for deciding whether a mode is exact parity, bounded approximate parity, accepted Rust-over-Java divergence, or still a gap.
Benchmark Snapshot
The acceptance matrix is the publishable benchmark source of truth. The latest
local matrix run at
tmp/parity_acceptance_publish_ready_20260508/acceptance_summary.tsv verified
9 bundled phiX exact-output modes and 6 local human bounded-sketch modes.
Exact bundled rows:
- default
- k=40
- k=40 fixspikes=t
- passes=2
- keepall=t
- ecc=t markuncorrectableerrors=t
- qtrim=r trimq=10
- minlen=100
- passes=2 ecc=t markuncorrectableerrors=t
Local human bounded-sketch rows:
| Row | Mode | Verdict | Java time | Rust time | Java RSS | Rust RSS | Hist drift ppm | Rhist drift ppm |
|---|---|---|---|---|---|---|---|---|
| 50k | default | bounded_drift | 1.54s | 2.57s | 3.35 GiB | 3.30 GiB | 4 | 840 |
| 50k | prefilter | bounded_drift | 2.05s | 2.68s | 3.35 GiB | 3.10 GiB | 4 | 840 |
| 50k | k40_fixspikes | bounded_drift | 1.64s | 2.47s | 3.34 GiB | 3.12 GiB | 2 | 140 |
| 500k | default | bounded_drift | 9.03s | 30.12s | 3.38 GiB | 3.38 GiB | 1227 | 1492 |
| 500k | prefilter | bounded_drift | 10.53s | 31.81s | 3.26 GiB | 3.14 GiB | 49 | 998 |
| 500k | k40_fixspikes | bounded_drift | 10.82s | 28.60s | 3.39 GiB | 3.25 GiB | 245 | 982 |
The conservative publishable claim is: exact covered fixture modes match the
vendored Java oracle byte-for-byte, bounded human rows stay within the matrix
drift gate, and large-slice Rust speed still needs work in the input-counting
hot path. countup=t is tracked separately as an accepted Rust-over-Java
divergence guard rather than normal Java parity.
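The drift columns in the table above are reported in parts per million. The exact formula belongs to the acceptance matrix tooling and is not reproduced here. Purely as an assumption, one plausible definition is the summed absolute per-bin difference between the Java and Rust histograms, divided by the Java total and scaled by one million. The function name `hist_drift_ppm` is hypothetical.

```rust
/// Hypothetical drift metric (an assumption, not the project's confirmed
/// formula): sum of absolute per-bin differences, relative to the reference
/// histogram total, expressed in parts per million.
fn hist_drift_ppm(reference: &[u64], candidate: &[u64]) -> f64 {
    let bins = reference.len().max(candidate.len());
    let mut diff: u64 = 0;
    let mut total: u64 = 0;
    for i in 0..bins {
        // Bins missing from the shorter histogram count as zero.
        let r = reference.get(i).copied().unwrap_or(0);
        let c = candidate.get(i).copied().unwrap_or(0);
        diff += r.abs_diff(c);
        total += r;
    }
    if total == 0 {
        // An empty reference has no meaningful scale.
        return if diff == 0 { 0.0 } else { f64::INFINITY };
    }
    diff as f64 * 1_000_000.0 / total as f64
}
```

Under this definition, two histograms over one million k-mers that disagree by a total of four counts would score 4 ppm. A gate would then compare that score against a per-row threshold.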
For high-throughput bounded approximate runs where byte-stable collision order
is less important than speed, deterministic=f enables direct atomic packed
sketch updates and fuses input histogram collection into the normalization pass.
On the local 500k paired-human packed 16-bit lane at
tmp/fastlane_atomic_packed_fusedhist_500k_compare_20260515_115057, the
3-repeat median was 6.769s / 2.79 GiB RSS for Rust versus 7.814s / 3.39 GiB
RSS for Java. That is 13.4% less wall time and 17.8% lower peak RSS on the same
benchmark lane: identical input, read limits, bits=16, null read outputs, hist,
and rhist settings.
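The idea of direct atomic updates to packed 16-bit counters can be sketched as follows. This is an illustration under stated assumptions, not the bbnorm-rs code: the `PackedSketch` type, the layout of four 16-bit lanes per 64-bit word, and the use of relaxed ordering are all hypothetical choices. It also omits hashing and histogram fusion entirely.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const BITS: u32 = 16; // counter width, as in bits=16
const PER_WORD: usize = 4; // four 16-bit lanes per u64 (assumed layout)
const MAX: u64 = (1 << BITS) - 1;

/// Hypothetical packed counter array shared between threads.
struct PackedSketch {
    words: Vec<AtomicU64>,
}

impl PackedSketch {
    fn new(cells: usize) -> Self {
        let n = (cells + PER_WORD - 1) / PER_WORD;
        Self { words: (0..n).map(|_| AtomicU64::new(0)).collect() }
    }

    /// Read the current value of one counter.
    fn get(&self, cell: usize) -> u64 {
        let shift = (cell % PER_WORD) as u32 * BITS;
        (self.words[cell / PER_WORD].load(Ordering::Relaxed) >> shift) & MAX
    }

    /// Saturating increment of one lane via a compare-exchange loop.
    /// Returns the counter value after the update.
    fn increment(&self, cell: usize) -> u64 {
        let shift = (cell % PER_WORD) as u32 * BITS;
        let word = &self.words[cell / PER_WORD];
        let mut cur = word.load(Ordering::Relaxed);
        loop {
            let count = (cur >> shift) & MAX;
            if count == MAX {
                return MAX; // saturated: never carry into the next lane
            }
            let next = cur + (1u64 << shift);
            match word.compare_exchange_weak(cur, next, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => return count + 1,
                Err(seen) => cur = seen, // another thread won; retry
            }
        }
    }
}
```

Plain increments like these commute, so the final totals do not depend on thread scheduling. What a non-deterministic lane gives up is a reproducible order for updates whose outcome depends on the values they observe, which is why it is offered separately from the byte-stable path.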
For repeatable current baselines, use
scripts/benchmark_trustworthy_baseline.py.
It records git/tool/input metadata, command lines, raw run data, stage timings,
Java/Rust histogram drift, and aggregate p10/median/p90 summaries. See
docs/trustworthy_benchmarking.md for the
Java-default and packed 16-bit benchmark lanes.
The v0.1.3 performance patch adds the high-throughput bounded approximate
fast lane above. The v0.1.2 performance patch improved deterministic packed
bounded counting on the local 500k paired-human packed 16-bit lane. A final
3-repeat refresh at
tmp/trustworthy_baseline_500k_bits16_final_20260508 measured the current
deterministic Rust median at 19.786s wall time, 14.726s input counting, and
3.04 GiB RSS; the Java median for the same lane was 8.387s wall time and
3.41 GiB RSS.
| Metric | Before | After | Change |
|---|---|---|---|
| Rust deterministic wall time | 24.104s | 19.786s | 17.9% faster |
| Rust deterministic input counting | 19.038s | 14.726s | 22.7% faster |
| Rust deterministic max RSS | 3.41 GiB | 3.04 GiB | 10.9% lower |
Those measurements compare tmp/trustworthy_baseline_500k_bits16_20260508
against tmp/trustworthy_baseline_500k_bits16_final_20260508, with 3 repeats
per variant, null read outputs, bits=16, reads=500000, and
tablereads=500000.
Experimental GPU counting is documented in
docs/gpu_counting_integration.md. The
parity-safe GPU path must preserve deterministic chunk replay order; a naive
global GPU reduction looks faster but is semantically wrong for conservative
count-min updates, because their result depends on update order. The current
persistent CUDA helper is byte-identical to Rust
CPU on the tested lanes but remains slower than the CPU path, so it is kept
behind explicit gpucounting=t gpupersistent=t flags.
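The reason replay order matters can be shown with a toy model of conservative count-min updates. The sketch below is not the project's GPU or CPU code. Each key is represented directly by the cell indices it would hash to (an assumption that keeps the example deterministic), and the function names are hypothetical.

```rust
/// Conservative update: look up the key's current minimum, then raise only
/// the cells that are below (minimum + 1).
fn conservative_update(cells: &mut [u32], slots: &[usize]) {
    let min = slots.iter().map(|&s| cells[s]).min().unwrap_or(0);
    let target = min.saturating_add(1);
    for &s in slots {
        if cells[s] < target {
            cells[s] = target;
        }
    }
}

/// Apply conservative updates one key at a time, in stream order.
fn replay_in_order(cells: &mut [u32], stream: &[&[usize]]) {
    for slots in stream {
        conservative_update(cells, slots);
    }
}

/// Naive reduction: every hit is added to every cell it touches, which is
/// what an order-free global sum would compute.
fn naive_reduction(cells: &mut [u32], stream: &[&[usize]]) {
    for slots in stream {
        for &s in *slots {
            cells[s] = cells[s].saturating_add(1);
        }
    }
}
```

Take four cells and three keys that touch cells (0, 2), (0, 3), and (1, 2). Replaying them in that order leaves every cell at 1. Replaying them as second, third, first leaves cells 0 and 2 at 2, and the naive reduction also produces 2 in those cells regardless of order. An order-free sum therefore cannot reproduce the ordered conservative result.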
Repository Layout
- src/: Rust library and CLI implementation.
- tests/basic.rs: package-friendly integration tests.
- tests/java_parity.rs: repository-only Java parity tests requiring vendor/BBTools-master.
- docs/: implementation status and parity notes.
- scripts/: local parity, benchmark, and stress harnesses.
- vendor/: BBTools reference snapshot for repository testing only.
License
bbnorm-rs is licensed under the BSD 3-Clause License. The vendored BBTools
reference snapshot in the source repository is distributed under its own license
at vendor/BBTools-master/license.txt and is not included in crates.io
packages.