bbnorm-rs 0.1.3

# BBNorm Rust Working Buildout

Goal: make `bbnorm-rs` work end-to-end for practical local normalization workloads in Rust. This is an ASAP working checklist, not a promise of full BBTools parity.

## Core Components

| Component | Working target | Current status | Fastest next action |
| --- | --- | --- | --- |
| CLI parser | Accept common BBNorm `key=value` options and reject unsafe unsupported modes | Broad single-pass parser coverage exists, including BBTools-style `config=<file>` expansion, shared `input`/`output` aliases, extension hints, I/O worker hints, temporary-directory controls wired into managed Rust temp-file paths, kmer-table runtime controls, genome-build context controls, diagnostic sizing flags, disabled break-length controls, and disabled recalibration aliases | Keep adding guards for unsupported behavior-changing options |
| FASTA/FASTQ I/O | Read/write plain and gzip FASTA/FASTQ for single-end and paired data | Implemented for covered paths, including BBTools-style case-insensitive `null` output sinks, practical `stdin`/`-` streamed input via temp materialization that honors enabled `tmpdir` settings, and built-in zlib-rs gzip plus parallel gzip output for `.gz` outputs and BBTools-style pigz/unpigz input hooks when `threads`, `zipthreads`, `pigz`, or `unpigz` controls request more than one worker | Keep smoke coverage on generated and bundled fixtures; generic ordinary gzip still cannot be split into independent deflate work without block/index support, but Rust now uses zlib-rs and will stream `.gz` input through pigz/unpigz when available and requested |
| Paired/interleaved routing | Handle `in`, `in2`, `#` patterns, interleaved input, split/interleaved output | Implemented for covered paths | Add smoke cases only when failures appear |
| Quality handling | Support q33/q64 conversion, fake FASTA qualities, quality clamps | Implemented for covered paths; quality recalibration controls, including `_p1`/`_p2` pass suffixes and disabled `recalibrate` aliases, are accepted as output-preserving no-ops | Preserve existing Java-golden tests |
| Base cleanup | Normalize or reject junk/IUPAC/lowercase bases according to options | Implemented for covered paths | Keep as regression-tested parser/engine behavior |
| K-mer encoding | Count short canonical k-mers and long hashed k-mers | Implemented; biological `k=40` paired stress now covers plain and `fixspikes=t` long-kmer paths on local S. cerevisiae and E. coli with byte-identical `threads=1/2/auto` outputs | Keep the long-kmer biological stress in the working-pipeline gate |
| K-mer table build | Build exact Rust count table with `minq`, `minprob`, `rdk`, read limits, extra inputs | Working single-pass exact map with Rayon batch/merge counting remains the default for small inputs, while large inputs automatically route to bounded count-min sketches; default `bits=32` bounded sketches now use a shared atomic table with BBTools-style conservative updates (raise all hash cells to `min+count`) to reduce collision inflation while staying chunk-memory-bounded, while non-32-bit constrained sketches keep the packed fixed-memory fallback; kept-output side counts for `histout`/`rhistout`/`peaksout` use the same bounded sketch path, count/sketch hot paths reuse bounded chunk maps and duplicate-removal buffers, long-kmer hashing follows BBNorm's scalar `Kmer.xor()` path and now uses rolling BBTools-style Java word state for `k>31`, avoiding both per-window vector allocation and full-window rescans on common layouts, and `countup=t` uses a bounded kept-count sketch when sketch sizing is requested; `initialsize` and `prealloc`/`preallocate` still reserve exact-count map capacity when practical without changing exact-mode outputs | Profile chunk size and reserve heuristics on larger local fixtures |
| Count-min/Bloom sketch | Accept BBTools sketch, prefilter, table-sizing, and cardinality/loglog sizing knobs without blocking useful runs | Constrained count-min sizing and large-input automatic sizing now build direct fixed-memory sketches for `cells`/`matrixbits` with `bits`/`hashes` controls, plus Rust's explicit `sketchmemory`/`countminmemory` byte budget, instead of requiring an exact input table first; explicit `cells`/`matrixbits` now follow BBTools' total-cell memory semantics as one shared KCountArray-style cell universe instead of multiplying memory by hash count; `bits=32` uses a deterministic conservative atomic table and smaller bit widths use conservative packed updates rather than blind per-row increments; configured prefilter controls now build a real two-stage prefilter-plus-main bounded input sketch with an explicit KCountArray-style prefilter-limit gate and locked/conservative updates by default, with `lockedincrement=f`/`symmetricwrite=f` switching bounded main and prefilter sketches to independent row increments; the atomic independent path skips allocating the conservative lock stripe table, while the atomic conservative path now uses KCountArray7MTA-style key-striped locks around direct read-min/raise-all updates, returns the previous minimum for parity with `incrementAndReturnUnincremented`, and replays already-merged engine chunk maps through the same conservative update body without redundant per-key mutex traffic; `prefilterfraction`/`prefiltersize` partition the main table budget instead of adding an unbounded second table, and bare `prefilter=t` now applies BBTools' default 35% partition whenever bounded count-min counting is selected; bounded bucket masks now use `KCountArray7MTA` FastRandomXoshiro mask generation with per-table Java-style seed stepping, so two-stage prefilter/main sketches no longer reuse the same mask table, and large rows use KCountArray-style internal array slot placement before the prime cell modulus, with configured or active Rayon worker counts rounded up to BBTools-style minimum internal array counts; exact-mode collision-estimate fallbacks now replay through the same bounded sketch objects instead of materializing hash-bucket maps; `buildpasses>1` applies deterministic trusted-kmer depth reduction; bounded sketch `unique_kmers` reports now use BBTools-style hash-adjusted used-fraction estimates, including thresholded `mindepth` estimates for prefilter/main high-depth splits, rather than raw occupied-cell counts; cardinality/loglog knobs now drive fixed-memory HyperLogLog-style input/output estimates with bounded register arrays and stderr/benchmark extraction; compatible kmer-table reserve knobs are wired into exact Rust maps | Keep pushing toward fuller BBTools sketch/prefilter/cardinality behavior |
| Read depth scoring | Compute per-read k-mer depths and percentiles | Implemented for covered paths; histogram analysis, long paired-read analysis, large coverage sorts, and `k=40 fixspikes=t` biological stress use Rayon-safe deterministic paths | Consider parallelizing normalization decisions if profiling warrants it |
| Normalization decisions | Keep/toss reads with `target`, `max`, `min`, `minkmers`, percentile controls | Working deterministic single-pass subset with Rayon-batched decision/rename work and ordered output writes | Profile larger fixtures before parallelizing writer/output-count stages |
| Error detection | Support current toss-bad/read-depth behavior | Partially working in covered cases | Expand noisy fixture smoke before adding new modes |
| Error correction | Correct reads and emit correction-driven outputs | First table-based short-kmer `ecc=t` path works: Rust corrects representative single substitutions, supports mark-only quality reduction, preserves Java-style `markuncorrectableerrors` quality marking in both keep and `outuncorrected` outputs, matches Java `markerrors` + `trimaftermarking` output on a representative tail-error fixture, routes forced single-end and paired uncorrectable reads to `outuncorrected` like Java, verifies real-derived paired `markuncorrectableerrors` routing plus quality marking on local S. cerevisiae and E. coli across `threads=1/2/auto`, honors `ecc1`/`eccf` correction staging in covered `passes=2` runs including a noisier mixed-mutant fixture, matches Java on ECC-enabled normalization/toss routing with and without `tossbadreads`, corrects controlled substitutions in generalized real-derived paired stress fixtures including bundled phiX plus local S. cerevisiae and E. coli biological datasets, parses Java ECC tuning knobs, preserves paired phiX no-error output, matches Java's strict overlap short-read gate below 35 bp, now applies Java-shaped entropy-driven minimum overlap gating plus the expected-mismatch and probability post-filters, rejects compact repetitive ambiguous, high-confidence mismatch, and competing short-overlap ambiguity fixtures like Java, and matches Java sequence plus quality output on compact accepted-overlap high-entropy fixtures | Keep tightening overlap-ECC behavior toward broader BBMerge heuristic parity on edge cases beyond the current compact guards |
| Multipass/count-up | Default BBNorm two-pass and `countup=t` behavior | Covered `passes=2/3/4` paired phiX runs use managed temp-file orchestration and match Java keep/hist outputs; bounded real-data Java parity now also passes for `passes=2/3/4`, including keep/bin/hist outputs on a 1k paired S. cerevisiae slice, and bounded `passes=2 ecc=t` plus `markuncorrectableerrors=t` biological parity also passes on the same slice; intermediate passes now apply Java-shaped bad-read target tightening from `targetbadpercentilelow/high`; covered `passes=2` ECC staging honors `ecc1`/`eccf` on simple/noisy correction fixtures and real-derived paired phiX, S. cerevisiae, and E. coli stress fixtures with byte-identical `threads=1/2/auto` outputs; multipass keep/toss routing follows Java final-stage toss output semantics; covered multipass `outuncorrected` follows Java's final-stage-only behavior rather than carrying intermediate ECC fragments; `countup=t` now runs Rust count-up keep/toss logic with Java-shaped relaxed prepass inclusion, paired prepass `require_both_bad` semantics, byte-budgeted external sorted temp runs, bounded fan-in run compaction, low-kmer error presort, read length, expected errors, numeric ID, read ID, `tossbadreads=t` final spike rejection, `addbadreadscountup` table/prepass carry-forward behavior, bounded kept-count sketches when `cells`/`matrixbits`/`sketchmemory` are requested, renamed headers, minlen filtering, table-based ECC on kept reads, overlap-only mate repair for `ecco=t`, mark/trim-after-marking ECC, depth-bin side streams, correction-driven `outuncorrected` routing, and biological-data stress on local S. cerevisiae and E. coli paired datasets with byte-identical `threads=1/2/auto` outputs | Expand sketch/collision parity and true giant-dataset validation |
| Histograms | Emit k-mer and read/base depth histograms | Working with Rayon-batched sparse analysis reducers; `qhist`/`qualityhistogram`, `bqhist`/`basequalityhistogram`, `qchist`/`qualitycounthistogram`, `aqhist`/`averagequalityhistogram`, `obqhist`/`overallbasequalityhistogram`, `lhist`/`lengthhistogram`, `gchist`/`gchistogram`, `bhist`/`basehistogram`, `enthist`/`entropyhistogram`, sequence-input fallback `idhist`, no-alignment fallback `mhist`/`ihist`/`qahist`/`indelhist`/`ehist`, and read-header `barcodestats`/`barcodecounts` now emit covered primary quality-family, read-length, GC-bin, base-content, entropy, identity-shaped, match-shaped, error-shaped, and barcode-count side outputs from one shared read-local scan with `maxhistlen`, `gcbins`, `entropybins`, `idbins`, and entropy controls applied where relevant; local S. cerevisiae biological stress validates all covered side-output histograms across `threads=1/2/auto`; remaining shared BBTools side-output stats histogram knobs are accepted as non-emitting ASAP fallbacks | Profile memory/thread tradeoffs on larger fixtures |
| Peak calling | Emit peak summaries from histogram data | Working on representative tests; CallPeaks short aliases map to Rust peak filters | Keep existing golden tests; add smoke only if cheap |
| Depth bins | Emit low/mid/high read bins and paired bin streams | Working for covered paths; local S. cerevisiae and E. coli biological side-routing stress verifies paired low/mid/high totals and byte-identical `threads=1/2/auto` outputs | Keep biological stress in the working-pipeline gate |
| Trimming | Apply active qtrim/trim controls before decisions | Working for covered paths; comma trim thresholds fall back to the first threshold with a note | Keep golden tests; add smoke if changing trimming code |
| Output fanout | Keep/toss/uncorrected/bin outputs, comma-list outputs, append behavior, case-insensitive `null` output sinks | Working for covered paths; no-ECC uncorrected behavior is Java-covered, forced ECC single-end plus paired uncorrectable-read routing now matches Java, biological side-routing stress covers paired keep/toss/depth-bin plus real-derived ECC `outuncorrected` and `markuncorrectableerrors` quality-marked output across `threads=1/2/auto`, and multipass keep/toss outputs follow Java final-stage toss routing; output writes stay ordered | Expand only if new side-output modes are added |
| Rename/header rewriting | BBTools-style renamed read IDs | Working for covered single-end, paired, interleaved, ECC-active, and exact count-up paths, including Java-shaped ECC-active `e1=0`/`e2=0` header fields | Preserve Java-golden tests |
| Random/deterministic read selection | Preserve reproducible default output and support nondeterministic downsampling when requested | Default deterministic mode uses Java-shaped per-read coin values; `deterministic=f` now stays enabled and seeds the same fast RNG from wall clock/process/counter for true per-run random selection while preserving ordered writes | Stress larger non-keepall random-selection fixtures if users need distribution checks |
| Performance/threading | Use CPU threads effectively in Rust | Rayon-backed exact table build, conservative atomic `bits=32` bounded sketch insertion, normalization decisions, kept-output counting, sparse histogram/read-hist analysis, and guarded long-read analysis are implemented; `threads=max`/`threads=all` now explicitly select all available Rayon workers while `threads=auto` leaves Rayon on its default; `.gz` outputs can now use built-in parallel gzip compression when write-thread/compression controls request it; the biological benchmark harness now accepts mode-specific extra args so default exact-count, `countup=t`, correction-heavy `countup=t ecc=t markuncorrectableerrors=t`, plain `k=40`, `k=40 fixspikes=t`, bounded `passes=2`, and bounded `passes=2 ecc=t markuncorrectableerrors=t` can all be thread-checked through one smoke, and the biological matrix benchmark now forwards `MATRIX_EXTRA_ARGS` so the same Rust-only mode can be compared across multiple real datasets | Use low-concurrency validation on desktops; use `scripts/benchmark_working_modes_smoke.sh` for quick working-mode validation, `scripts/benchmark_biological_dataset.sh` for one larger local dataset, and `scripts/benchmark_biological_matrix.sh` plus `MATRIX_EXTRA_ARGS` for multi-dataset mode checks |
| Validation | Fast Rust smoke plus targeted Java-golden parity scripts | The following exist: Rust tests; parity scripts; phiX thread-scaling; truncated biological Java parity for the default path; explicit truncated biological Java guards for `countup=t`, the local paired E. coli PairStreamer crash, and the local paired S. pombe partial-output error-state crash; truncated biological working-mode Java parity that now matches for bounded `k=40` and `k=40 fixspikes=t`; and larger biological-data benchmarks | Run `scripts/component_smoke.sh` before deeper parity runs |
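
The conservative update described in the K-mer table build and Count-min/Bloom sketch rows (raise all hash cells to `min+count`, return the previous minimum) can be illustrated with a small single-threaded sketch. This is a minimal illustration, not the `bbnorm-rs` implementation: `ConservativeSketch`, its `slot` mixer (a SplitMix64-style finalizer), and the plain `Vec<u32>` layout are assumptions made for the example, and no atomics, lock stripes, or BBTools mask generation are modeled.

```rust
/// Hypothetical single-threaded count-min sketch with conservative updates.
/// All hash rows share one cell array, loosely mirroring the "one shared
/// cell universe" wording above; names and hashing are illustrative only.
pub struct ConservativeSketch {
    cells: Vec<u32>,
    hashes: u32,
}

impl ConservativeSketch {
    pub fn new(cell_count: usize, hashes: u32) -> Self {
        assert!(cell_count > 0 && hashes > 0);
        Self { cells: vec![0; cell_count], hashes }
    }

    /// Assumed mixer (SplitMix64-style finalizer), seeded per hash row.
    fn slot(&self, key: u64, row: u32) -> usize {
        let mut x = key ^ (u64::from(row) + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        x ^= x >> 31;
        (x % self.cells.len() as u64) as usize
    }

    /// Estimated count: the minimum over the key's cells.
    pub fn estimate(&self, key: u64) -> u32 {
        (0..self.hashes)
            .map(|row| self.cells[self.slot(key, row)])
            .min()
            .unwrap_or(0)
    }

    /// Conservative update: raise each of the key's cells to at least
    /// `min + count` and return the minimum seen before the update.
    pub fn add(&mut self, key: u64, count: u32) -> u32 {
        let previous = self.estimate(key);
        let target = previous.saturating_add(count);
        for row in 0..self.hashes {
            let i = self.slot(key, row);
            if self.cells[i] < target {
                self.cells[i] = target;
            }
        }
        previous
    }
}
```

Cells that already exceed `min + count` because of collisions are left untouched, which is why this style of update tends to inflate counts less than incrementing every row independently.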

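For the short canonical k-mer path in the K-mer encoding row, the following is a hedged sketch of rolling 2-bit encoding for `k <= 31`. The function names are hypothetical, and choosing the larger of the forward and reverse-complement codes as the canonical value is an assumption of this example, not something the checklist states. The long hashed path for `k > 31` is not modeled.

```rust
/// Map a base to a 2-bit code (A=0, C=1, G=2, T=3); anything else is None.
fn base_code(base: u8) -> Option<u64> {
    match base.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Return one canonical 2-bit packed k-mer per valid window of `seq`.
/// Assumption: canonical = max(forward, reverse complement).
pub fn canonical_kmers(seq: &[u8], k: usize) -> Vec<u64> {
    assert!((1..=31).contains(&k));
    let mask = (1u64 << (2 * k)) - 1;
    let shift = 2 * (k - 1);
    let (mut fwd, mut rev, mut len) = (0u64, 0u64, 0usize);
    let mut out = Vec::new();
    for &base in seq {
        match base_code(base) {
            Some(code) => {
                // Roll the forward k-mer left and the reverse complement right.
                fwd = ((fwd << 2) | code) & mask;
                rev = (rev >> 2) | ((3 - code) << shift);
                len += 1;
                if len >= k {
                    out.push(fwd.max(rev));
                }
            }
            // A non-ACGT base invalidates every window that overlaps it.
            None => {
                fwd = 0;
                rev = 0;
                len = 0;
            }
        }
    }
    out
}
```

Because both strands roll together, a read and its reverse complement produce the same multiset of codes, which is what lets a count table treat the two strands as one.
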
## ASAP Completion Definition

- `cargo test --all` passes.
- `scripts/component_smoke.sh` passes on generated data.
- `scripts/component_smoke.sh` also passes on bundled phiX data when `vendor/BBTools-master/resources/sample1.fq.gz` and `sample2.fq.gz` are present.
- Common single-pass Rust runs produce keep/toss/histogram/bin outputs without Java.
- Covered table-based ECC, multipass, and count-up paths now run real Rust behavior; remaining unsupported overlap-ECC/staging paths must be explicit.
- Rayon-backed parallelism is active on the highest-impact safe path without changing output ordering.
- `scripts/benchmark_thread_scaling.sh` passes byte-identical output checks across configured Rust thread counts, defaulting to `1 2 auto`.
- `scripts/parity_biological_dataset_truncated.sh` passes on the default local paired biological dataset with bounded `reads`/`tablereads`, comparing Java and Rust keep/bin/hist outputs byte-for-byte on a small real-data slice.
- `scripts/parity_biological_countup_java_guard.sh` now captures the current vendored Java limitation on bounded biological `countup=t`: Java still crashes in `normalizeInThread`, while Rust completes and emits full keep/bin/hist outputs on the same real-data slice.
- `scripts/parity_biological_pairstreamer_java_guard.sh` now captures a second vendored Java limitation on the local paired E. coli slice: Java crashes in `PairStreamer.nextList` for both default and `passes=2`, while Rust completes and emits usable keep/bin/hist outputs.
- `scripts/parity_biological_errorstate_java_guard.sh` now captures a third vendored Java limitation on the local paired S. pombe slice: Java writes substantial keep/bin/hist outputs and then terminates in an error state after a `PairStreamer.nextList` failure for both default and `passes=2`, while Rust completes and emits usable outputs.
- `scripts/parity_biological_working_modes_truncated.sh` now reuses the bounded biological Java-parity harness for the default working mode and verifies that both plain `k=40` and `k=40 fixspikes=t` match Java on the bounded biological slice.
- `scripts/benchmark_biological_dataset.sh` passes byte-identical output checks on a larger real paired dataset from `~/Projects/biological data`. It defaults to the `1 2 auto` thread cases and bounds both `reads` and `tablereads` by default; `KEEP_OUTPUTS=0` removes bulky FASTQ outputs after comparison or failure, and optional `MAX_RSS_KB=<kb>` fails fast on memory regressions.
- `scripts/benchmark_biological_matrix.sh` now also accepts `MATRIX_EXTRA_ARGS`, forwarding one Rust working mode across all selected biological presets so bounded multi-dataset checks can stress long-kmer or ECC-heavy paths without a second benchmark harness.
- `scripts/benchmark_working_modes_smoke.sh` passes on the default local paired dataset, exercising the shared biological benchmark harness across default exact-count, `countup=t`, correction-heavy `countup=t ecc=t markuncorrectableerrors=t`, plain `k=40`, `k=40 fixspikes=t`, bounded `passes=2`, and bounded `passes=2 ecc=t markuncorrectableerrors=t` modes with byte-identical thread checks.
- `scripts/benchmark_biological_guard_smoke.sh` validates the pass path across `1 2 auto` thread cases by default, validates the fail path for `MAX_RSS_KB`, and checks that `KEEP_OUTPUTS=0` cleanup works in both cases. Override with `PASS_THREAD_CASES='<cases>'` or `FAIL_THREAD_CASES='<cases>'` when a narrower smoke is needed.
- `scripts/benchmark_biological_matrix.sh` runs the biological benchmark harness over multiple S. cerevisiae, S. pombe, E. coli, and Bacillus paired-data presets and writes a compact cross-dataset summary; use `MATRIX_READS=<n>`, `MATRIX_THREAD_CASES='1 2 auto'`, and `MATRIX_CASES='<case names, all, or list>'` to scale it up or inspect presets.
- `scripts/fallback_smoke.sh` validates the ASAP fallback/compatibility modes plus real covered behavior for omitted/default `passes`, paired `passes=2`, `countup=t`, table-based `ecc=t`, real-derived ECC stress, real-derived multipass ECC staging stress including staged marked-uncorrectable `outuncorrected` routing, constrained count-min, prefilter sketch, and build-pass trusted-filter behavior plus bounded cardinality/loglog estimates, kmer-table runtime controls, wrapper sampling knobs, `deterministic=f`, Rust support for CallPeaks short aliases, comma-separated trim-quality fallback, local fallback for MPI execution controls, explicit-routing fallback for global pairing runtime controls, BBTools preparser runtime controls, pass-suffixed quality recalibration controls, BBTools-style config-file expansion, shared SAM/readgroup runtime controls, shared `input`/`output` file aliases, practical `stdin`/`-` input materialization, BBTools-style case-insensitive `null` output sinks, temporary-directory controls, shared extension/I/O worker hints, genome-build context controls, diagnostic sizing and disabled read-breaking controls, and side-output stats histogram controls including Rust `lhist` output plus non-emitting fallbacks, stderr notes, and representative output checks.
- `scripts/working_pipeline_smoke.sh` orchestrates component smoke, fallback behavior smoke, bundled phiX thread scaling, count-up biological stress, side-routing biological stress, side-output stats biological stress, long-kmer biological stress, overlap-ECC biological stress, biological RSS guard smoke, and biological matrix smoke as a one-command working-pipeline check, with top-level `summary.tsv` and `config.tsv` artifacts recording stage status, output paths, run knobs, optional count-up, side-routing, side-output stats, long-kmer, and overlap-ECC stress thread/read knobs, and optional `MAX_RSS_KB` biological matrix guard.
- `scripts/working_pipeline_smoke.sh` also runs truncated biological Java-parity smoke on the default local paired dataset for the default path, bounded biological Java parity for `passes=2/3/4`, bounded biological multipass ECC Java parity for `passes=2 ecc=t` with and without `markuncorrectableerrors=t`, explicit truncated biological Java guards for `countup=t` and the local paired E. coli PairStreamer crash, a biological working-mode Java-parity probe that now passes for both plain `k=40` and `k=40 fixspikes=t`, plus a compact working-mode benchmark smoke over default exact-count, `countup=t`, and `k=40 fixspikes=t`, using the same biological benchmark harness and thread-identity checks as the larger dataset benchmarks.
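
Several items above require parallel work to leave output ordering unchanged across `threads=1/2/auto`. A minimal sketch of that property follows. It uses `std::thread::scope` as a dependency-free stand-in for Rayon, and `keep_read` plus `decide_ordered` are hypothetical names with a placeholder decision rule, not the crate's normalization logic.

```rust
use std::thread;

/// Hypothetical stand-in for a per-read keep/toss decision.
fn keep_read(depth: u32, target: u32) -> bool {
    depth <= target
}

/// Decide over contiguous chunks in parallel, then concatenate the
/// per-chunk results in input order so output order never depends on
/// how many workers ran or which finished first.
pub fn decide_ordered(depths: &[u32], target: u32, workers: usize) -> Vec<bool> {
    let workers = workers.max(1);
    let chunk = ((depths.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = depths
            .chunks(chunk)
            .map(|part| {
                scope.spawn(move || {
                    part.iter()
                        .map(|&depth| keep_read(depth, target))
                        .collect::<Vec<bool>>()
                })
            })
            .collect();
        // Joining in spawn order preserves the original read order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}
```

Calling `decide_ordered` with 1, 2, or 8 workers on the same input yields the same vector, which is the in-memory analogue of the byte-identical file comparisons the scripts perform.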

## Quick Commands

- Fast local health check: `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_smoke_quick`
- Default working-pipeline check: `scripts/working_pipeline_smoke.sh tmp/working_pipeline_smoke_full_default`
- List biological matrix presets: `MATRIX_CASES=list scripts/benchmark_biological_matrix.sh`
- Run a selected biological matrix: `MATRIX_CASES='spombe_972_srr17530188 ecoli_mg1655_drr023054 ecoli_mg1655_drr217208' MATRIX_READS=10000 scripts/benchmark_biological_matrix.sh tmp/biological_matrix_spombe_ecoli_10k`
- Run a memory-guarded biological benchmark: `READS=250000 KEEP_OUTPUTS=0 MAX_RSS_KB=4000000 scripts/benchmark_biological_dataset.sh tmp/biological_guarded_250k`
- Full regression suite: `cargo test --all`

## Recent Local Validation

- `cargo fmt --all --check`, `cargo test packed_count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, and `RUST_MEM_AUTO_FROM_JAVA=1 RUST_MEM_AUTO_MAX_BYTES=5000000000 MODE_CASES='prefilter' REQUIRE_IDENTICAL_COMPARISONS=0 DRIFT_GATE_PROFILE=bounded ALLOW_MODE_FAILURES=1 READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=768m TIMEOUT=8m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=6000000 scripts/benchmark_java_rust_modes.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/java_rust_prefilter_2hash_special_50k_20260508` passed after specializing the deterministic packed prefilter's common 2-bit/2-hash conservative update path. The guarded prefilter matrix improved from `tmp/java_rust_modes_after_atomic_optim_50k_20260508/prefilter` Rust 3.279606s / 3159656 KB RSS with `input_prefilter_counting=2.358783s` to Rust 2.624046s / 3218868 KB RSS with `input_prefilter_counting=1.705787s`; bounded drift stayed `ok` with hist deltas 3/3 ppm and rhist deltas 840/840 ppm.
- `cargo fmt --all --check`, `cargo test count_min -- --test-threads=1`, and `READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 MEM=768m WRITE_OUTPUTS=0 TIMEOUT=8m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/det_three_hash_special_50k_20260508` passed after specializing the default 3-hash atomic count-min update path and removing saturating arithmetic from the validated bucket-index calculation. The guarded human slice improved from `tmp/det_unlocked_replay_50k_20260508` 2.677431s / 951528 KB RSS to 2.624882s / 971660 KB RSS, with `input_main_counting=2.040546s`; `hist` and `rhist` were byte-identical to the previous optimized run.
- `cargo fmt --all --check`, `cargo test atomic_count_min_bulk_replay_matches_locked_sequential_updates -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, and `READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 MEM=768m WRITE_OUTPUTS=0 TIMEOUT=8m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/det_unlocked_replay_50k_20260508` passed after removing per-key mutex acquisition from deterministic single-threaded atomic-sketch bulk replay. The guarded human slice improved from `tmp/det_capacity_hint_50k_cap131k_20260508` 3.035462s / 918388 KB RSS to 2.677431s / 951528 KB RSS, with `input_main_counting=2.136673s`; `hist` and `rhist` were byte-identical to the previous optimized run.
- `cargo fmt --all --check`, `cargo test count_min -- --test-threads=1`, `cargo test --all`, `cargo clippy --all-targets --all-features -- -D warnings`, and `READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 MEM=768m WRITE_OUTPUTS=0 TIMEOUT=8m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/det_capacity_hint_50k_cap131k_20260508` passed after adding bounded per-worker capacity hints for deterministic chunk-local count maps. The guarded human slice improved from the prior `tmp/gap_dett_50k_20260506` 4.203752s / 809480 KB RSS to 3.035462s / 918388 KB RSS, with `input_main_counting=2.511175s`; approximate count-min histogram drift versus the prior artifact stayed small (`hist` raw 61 ppm / unique 40 ppm, `rhist` reads and bases 180 ppm).
- `cargo test cardinality -- --test-threads=1`, `scripts/parity_cardinality_fallback_real_dataset.sh tmp/cardinality_estimate_heartbeat_20260425`, `cargo test count_min -- --test-threads=1`, `cargo test --all -- --test-threads=1`, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py`, and `bash -n scripts/parity_cardinality_fallback_real_dataset.sh scripts/fallback_smoke.sh scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_java_rust_modes.sh` passed after replacing the cardinality/loglog no-op surface with fixed-memory HyperLogLog-style input/output estimates. The paired phiX guard still matches local Java FASTQ/hist/rhist outputs byte-for-byte while stderr now includes `Cardinality estimate:` rows. A guarded human-slice smoke (`READS=5000 TABLE_READS=5000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 MAX_RSS_KB=1000000 TIMEOUT=2m EXTRA_ARGS='cardinality=t cardinalityout=t loglogk=21 buckets=1k' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_cardinality_human_5k_20260425`) completed in 0.404177s at 275920 KB RSS with no guard trip, input cardinality k=21/buckets=1000/unique=1077675, output cardinality unique=3726, and `input_cardinality=0.017995s`.
- `cargo fmt --all --check`, `cargo test count_min -- --test-threads=1`, `cargo test input_count_layout_summary_reports_prefilter_and_main_tables -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `bash -n scripts/benchmark_java_rust_human.sh scripts/benchmark_java_rust_modes.sh scripts/benchmark_giant_safe.sh`, and `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py` passed after adding Rust `Stage timing:` stderr rows plus `scripts/extract_stage_timings.py` and benchmark `stage_timings.tsv` output. The stage extractor records Java `table_creation`, `table_read`, `total`, and wall-clock timings where available plus Rust engine stages including `input_counting`, `input_hist`, `input_rhist`, `normalize`, and `summary_counts`.
- `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after promoting benchmark stage timings into mode-matrix `summary.tsv` columns. The two-mode sample matrix `tmp/java_rust_modes_stage_columns_sample_20260425` now records Java `table_creation/table_read/total` and Rust `input_counting/input_hist/input_rhist/normalize/summary_counts/output_hist/output_rhist/countup_*` columns in the top-level summary; default stayed Java/Rust `hist`/`rhist` identical with Rust `input_counting=0.020981s`, while prefilter stayed identical and exposed `summary_counts=0.561867s` from scanning the Java-sized prefilter/main sketches.
- `MODE_CASES='countup' EXPECTED_FAILURE_MODES='countup' SKIP_EXPECTED_FAILURE_JAVA=1 REQUIRE_IDENTICAL_COMPARISONS=0 ALLOW_MODE_FAILURES=1 READS=100 TABLE_READS=100 THREADS=2 ZIPTHREADS=1 JAVA_XMX=512m MEM=128m RUST_EXTRA_ARGS='passes=1 bits=32' TIMEOUT=2m WRITE_OUTPUTS=0 RUST_MAX_RSS_KB=1000000 scripts/benchmark_java_rust_modes.sh vendor/BBTools-master/resources/sample1.fq.gz vendor/BBTools-master/resources/sample2.fq.gz tmp/java_rust_modes_stage_columns_countup_sample_20260425` passed and verified skipped-Java count-up rows now surface Rust-specific `countup_work_source` and `countup_normalize` timing columns in `summary.tsv`.
- `RUST_MEM_AUTO_FROM_JAVA=1 RUST_MEM_AUTO_MAX_BYTES=700000000 MODE_CASES='prefilter' READS=100 TABLE_READS=1000000 THREADS=2 ZIPTHREADS=1 JAVA_XMX=512m MEM=128m RUST_EXTRA_ARGS='passes=1 bits=32' TIMEOUT=2m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=1500000 RUST_MAX_RSS_KB=1000000 scripts/benchmark_java_rust_modes.sh vendor/BBTools-master/resources/sample1.fq.gz vendor/BBTools-master/resources/sample2.fq.gz tmp/java_rust_modes_stage_timings_prefilter_sample_20260425` passed with identical Java/Rust `hist` and `rhist`; the new `stage_timings.tsv` showed Java table creation/read/total/wall timings and Rust stage rows, with Rust `summary_counts=0.545545s` dominating this intentionally tiny read-count plus Java-sized sketch smoke.
- `RUST_MEM_AUTO_FROM_JAVA=1 RUST_MEM_AUTO_MAX_BYTES=5000000000 MODE_CASES='prefilter' REQUIRE_IDENTICAL_COMPARISONS=0 DRIFT_GATE_PROFILE=bounded ALLOW_MODE_FAILURES=1 READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=768m TIMEOUT=8m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=6000000 scripts/benchmark_java_rust_modes.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/java_rust_modes_human_prefilter_stage_timings_50k_20260425` completed safely: Java used 2.091086s / 3691184 KB RSS, Rust used 6.708739s / 3361308 KB RSS with `mem=4344m`, drift gate stayed `ok`, and Rust stage timing split the remaining cost into `input_counting=4.153072s`, `summary_counts=1.686825s`, `input_hist=0.278094s`, `input_rhist=0.276445s`, and `normalize=0.277088s`.
- `cargo fmt --all --check`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py` passed after adding lazy zeroed sketch backing allocation and BBTools-style parallel occupied-cell/depth-hist scans for packed sketches. The guarded 50k human prefilter auto-memory run improved from Rust 13.677177s / 3376820 KB RSS under `tmp/java_rust_modes_human_prefilter_auto_mem_lazyalloc_50k_20260425` to Rust 6.511306s / 3373564 KB RSS under `tmp/java_rust_modes_human_prefilter_auto_mem_parallelscan_50k_20260425`, with the same Java-comparable sketch geometry, bounded drift gate `ok`, hist deltas 4/3 ppm, and rhist deltas 840/840 ppm.
- `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after promoting Java-derived Rust auto-memory telemetry into the mode-matrix `summary.tsv` (`rust_mem_auto_status`, Java sketch bytes, recommended Rust bytes, and recommended `mem=` string), correcting the recommendation to account for Rust's decimal memory suffixes plus reserved histogram memory, and skipping auto-memory when explicit Rust sketch sizing controls such as `sketchmemory`, `cells`, or `matrixbits` would override `mem=`.
- `RUST_MEM_AUTO_FROM_JAVA=1 RUST_MEM_AUTO_MAX_BYTES=700000000 MODE_CASES='default' READS=100 TABLE_READS=1000000 THREADS=2 ZIPTHREADS=1 JAVA_XMX=512m MEM=128m RUST_EXTRA_ARGS='passes=1 bits=32 sketchmemory=16m' TIMEOUT=2m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=1500000 RUST_MAX_RSS_KB=1000000 scripts/benchmark_java_rust_modes.sh vendor/BBTools-master/resources/sample1.fq.gz vendor/BBTools-master/resources/sample2.fq.gz tmp/java_rust_modes_auto_mem_explicit_sketch_skip_sample_20260425` passed and correctly reported `rust_mem_auto_status=skipped_explicit_sketch`, preventing a misleading auto-memory-applied label when explicit `sketchmemory` controls table geometry.
- `RUST_MEM_AUTO_FROM_JAVA=1 RUST_MEM_AUTO_MAX_BYTES=700000000 MODE_CASES='prefilter' READS=100 TABLE_READS=1000000 THREADS=2 ZIPTHREADS=1 JAVA_XMX=512m MEM=128m RUST_EXTRA_ARGS='passes=1 bits=32' TIMEOUT=2m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=1500000 RUST_MAX_RSS_KB=1000000 scripts/benchmark_java_rust_modes.sh vendor/BBTools-master/resources/sample1.fq.gz vendor/BBTools-master/resources/sample2.fq.gz tmp/java_rust_modes_auto_mem_prefilter_sample_20260425` passed: Rust auto-memory was `applied`, prefilter/main sketch totals were Java-comparable at 1002379 memory ppm, and Java/Rust `hist` plus `rhist` were identical.
- `RUST_MEM_AUTO_FROM_JAVA=1 RUST_MEM_AUTO_MAX_BYTES=5000000000 MODE_CASES='prefilter' REQUIRE_IDENTICAL_COMPARISONS=0 DRIFT_GATE_PROFILE=bounded ALLOW_MODE_FAILURES=1 READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=768m TIMEOUT=8m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=6000000 scripts/benchmark_java_rust_modes.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/java_rust_modes_human_prefilter_auto_mem_50k_20260425` completed safely before the packed-scan speedup: Java used 2.055834s and 3518508 KB RSS, Rust used 13.864797s and 3346844 KB RSS with `mem=4344m`, Rust/Java sketch memory ratio was about 1000019 ppm, and bounded drift gates passed with hist deltas 3/3 ppm and rhist deltas 840/840 ppm.
- `RUST_MEM_AUTO_FROM_JAVA=1 RUST_MEM_AUTO_MAX_BYTES=700000000 MODE_CASES='default' READS=100 TABLE_READS=1000000 THREADS=2 ZIPTHREADS=1 JAVA_XMX=512m MEM=128m RUST_EXTRA_ARGS='passes=1 bits=32' TIMEOUT=2m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=1500000 RUST_MAX_RSS_KB=1000000 scripts/benchmark_java_rust_modes.sh vendor/BBTools-master/resources/sample1.fq.gz vendor/BBTools-master/resources/sample2.fq.gz tmp/java_rust_modes_auto_mem_summary_sample_ceil_20260425` passed: Rust auto-memory was `applied`, Java-derived Rust memory was `mem=526m`, Rust provisioned a 288750096-byte sketch against Java's 288022856-byte sketch, and Java/Rust `hist` plus `rhist` were identical.
- `RUST_MEM_AUTO_FROM_JAVA=1 RUST_MEM_AUTO_MAX_BYTES=5000000000 MODE_CASES='default' REQUIRE_IDENTICAL_COMPARISONS=0 DRIFT_GATE_PROFILE=bounded ALLOW_MODE_FAILURES=1 READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=768m TIMEOUT=8m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=6000000 scripts/benchmark_java_rust_modes.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/java_rust_modes_human_default_auto_mem_50k_ceil_20260425` completed safely: Java used 1.734027s and 3553320 KB RSS, Rust used 2.974838s and 3337884 KB RSS with `mem=4348m`, Rust/Java sketch memory ratio was about 1000174 ppm, and bounded drift gates passed with hist deltas 3/3 ppm and rhist deltas 840/840 ppm.
- `CARGO_BUILD_JOBS=1 RAYON_NUM_THREADS=2 cargo check --all-targets`, `cargo test countup_ -- --nocapture --test-threads=1`, `cargo test packed_count_min -- --nocapture --test-threads=1`, `cargo test writer_parallel_gzip_output_round_trips_fastq -- --nocapture --test-threads=1`, and `cargo test accepts_shared_io_runtime_controls_as_noops_and_validates_values -- --nocapture --test-threads=1` passed after adding external sorted count-up temp runs with byte-budgeted spills and bounded fan-in compaction, built-in parallel gzip output wiring, and linear-counting bounded-sketch BBTools-style unique-kmer estimates.
- `CARGO_BUILD_JOBS=1 RAYON_NUM_THREADS=2 cargo fmt --all`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `cargo build --release`, and a release-mode paired phiX smoke with `countup=t keepall=t sketchmemory=64k hashes=2 bits=8 zipthreads=2 threads=2` passed after the same changes.
- `cargo test countup_ -- --nocapture`, `cargo test bounded_ -- --nocapture`, `cargo test accepts_constrained_count_min_controls_as_real_sketch_settings -- --nocapture`, `scripts/parity_sketch_controls_fallback_real_dataset.sh tmp/sketchmemory_bounded_heartbeat`, and `scripts/parity_countup_fallback_real_dataset.sh tmp/countup_bounded_sketch_heartbeat` passed after adding explicit `sketchmemory`/`countminmemory` count-min byte budgets and moving `countup=t` kept-count updates onto the bounded sketch path when sketch sizing is requested.
- `CARGO_BUILD_JOBS=1 RAYON_NUM_THREADS=2 cargo check --all-targets`, `cargo test bounded_ -- --nocapture`, `cargo test accepts_thread_counts_like_bbnorm_as_rayon_controls -- --nocapture`, and `cargo test long_kmer -- --nocapture` passed after adding `threads=max/all`, chunk-parallel bounded sketch slot-delta insertion, streaming count/sketch kmer callbacks, and allocation-light long-kmer hashing.
- `CARGO_BUILD_JOBS=1 RAYON_NUM_THREADS=2 cargo fmt --all`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `cargo build --release`, and a release-mode `threads=max sketchmemory=1k hashes=2 bits=8` paired phiX smoke passed after the same performance-hardening changes; the smoke resolved `threads=max` to 24 available Rayon workers on the local machine and emitted keep/hist/rhist outputs.
- `cargo test packed_count_min -- --nocapture`, `cargo test bounded_ -- --nocapture`, `cargo test accepts_constrained_count_min_controls_as_real_sketch_settings -- --nocapture`, and `scripts/parity_sketch_controls_fallback_real_dataset.sh tmp/direct_bounded_sketch_heartbeat` passed after moving constrained `cells`/`matrixbits` runs onto direct packed fixed-memory count-min sketches for input counts plus kept-output side counts.
- `cargo test build_pass -- --nocapture`, `cargo test real_phi_x_behavior_changing_sketch_controls_fall_back_to_exact_counting --test java_parity -- --nocapture`, and `scripts/parity_sketch_controls_fallback_real_dataset.sh tmp/sketch_buildpasses_real_heartbeat` passed after replacing the `buildpasses=2` exact-count fallback with deterministic trusted-kmer depth reduction. The paired phiX sketch guard now verifies real Rust behavior for `buildpasses=2`, `prehashes=1`, `prefilterhashes=1`, `prefiltercells=1k`, `precells=1k`, `cells=1k`, and `matrixbits=10`.
- `cargo test prefilter -- --nocapture` and `scripts/parity_sketch_controls_fallback_real_dataset.sh tmp/sketch_prefilter_hash_real_heartbeat` passed after turning hash-only prefilter controls into real Rust sketch behavior: default `prefilter=t` still matches Java, `buildpasses=2` now uses deterministic trusted-kmer filtering, and `prehashes=1`/`prefilterhashes=1` now join `prefiltercells=1k`/`precells=1k`, `cells=1k`, and `matrixbits=10` as real behavior-changing sketch requests.
- `scripts/parity_sketch_controls_fallback_real_dataset.sh tmp/sketch_prefilter_real_heartbeat` passed: Rust keeps bare default `prefilter=t` on the Java-safe exact-count path, while `prehashes=1`/`prefilterhashes=1` plus constrained `prefiltercells=1k`/`precells=1k` requests exercise real deterministic Rust prefilter sketch estimates and constrained `cells=1k`/`matrixbits=10` requests exercise direct fixed-memory count-min sketches.
- `THREAD_CASES='1 2 auto' scripts/parity_side_routing_biological_stress.sh tmp/side_routing_markuncorrectable_scer` passed on local S. cerevisiae paired biological data: side routing still preserved 1000 pairs across keep+toss and depth bins, real-derived ECC `outuncorrected` still emitted one paired uncorrected record, and the new `markuncorrectableerrors=t` branch preserved Java-shaped quality marking on the mutated mate while leaving the clean mate unchanged across `threads=1/2/auto`.
- `THREAD_CASES='1 2 auto' DATA1="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz" scripts/parity_side_routing_biological_stress.sh tmp/side_routing_markuncorrectable_ecoli` passed on local E. coli paired biological data: side routing still preserved 1000 pairs across keep+toss and depth bins, real-derived ECC `outuncorrected` still emitted one paired uncorrected record, and `markuncorrectableerrors=t` preserved the marked-quality uncorrectable mate while leaving the clean mate unchanged across `threads=1/2/auto`.
- `scripts/parity_ecc_fallback_real_dataset.sh tmp/ecc_markuncorrectable_heartbeat` passed after preserving `markuncorrectableerrors=t` quality rollback semantics: Java and Rust now both keep the uncorrectable mutant sequence while lowering the suspect quality in the keep output and mirrored `outuncorrected` output on the forced single-end fixture.
- `scripts/parity_countup_fallback_real_dataset.sh tmp/countup_markuncorrectable_heartbeat` passed after applying the same rollback fix to count-up ECC: Rust `countup=t ecc=t markuncorrectableerrors=t` now preserves the marked-quality uncorrectable record in both the keep stream and `outuncorrected`.
- `scripts/parity_kmer_table_runtime_fallback_real_dataset.sh tmp/kmer_table_prealloc_heartbeat` passed after wiring `initialsize` plus `prealloc`/`preallocate` into Rust exact-count table reserve hints: the guard still matches the local Java paired phiX baseline, checks the new preallocation notes, and rejects malformed and out-of-range preallocation values including `prealloc=1.5`.
- Manual Rust-only biological stability over local S. cerevisiae paired reads passed with `reads=10000 tablereads=10000 initialsize=1m prealloc=0.25` across `threads=1/2/auto`: baseline exact-count outputs and sized-map outputs were byte-identical for keep/toss, low/mid/high bins, `hist`, and `rhist`.
- `READS=10000 TABLE_READS=10000 THREAD_CASES='1 2 auto' scripts/parity_side_output_stats_biological_stress.sh tmp/side_output_stats_biological_stress_scer_10k` passed on local S. cerevisiae paired biological data: 10000 pairs and 1480000 bases were retained under `keepall=t`; the covered side-output histograms (quality-family, read-length, GC, base-content, entropy, identity, no-alignment match, quality-accuracy, insert, indel, and error) all had internally consistent totals; and all FASTQ/histogram artifacts were byte-identical across `threads=1/2/auto`.
- `scripts/parity_countup_fallback_real_dataset.sh tmp/countup_fallback_prepass_inclusion` passed after adding Java-shaped count-up relaxed prepass inclusion: Rust `countup=t` now excludes prepass-tossed short reads by default and carries them forward only with `addbadreadscountup=t`, while still covering renamed headers, side streams, ECC, mark/trim-after-marking, overlap-only mate repair, and uncorrected routing.
- `THREAD_CASES='1 2 auto' scripts/parity_countup_biological_stress.sh tmp/countup_biological_stress_prepass_scer` passed after adding Java-shaped count-up prepass inclusion plus presort: local S. cerevisiae paired biological data stayed byte-identical across `threads=1/2/auto`, with 993 kept pairs, 7 tossed pairs, depth-bin streams totaling 1000 pairs, empty correction-driven uncorrected streams, and 1000 retained pairs under `countup=t ecc=t keepall=t`.
- `THREAD_CASES='1 2 auto' DATA1="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz" scripts/parity_countup_biological_stress.sh tmp/countup_biological_stress_prepass_ecoli` passed after adding Java-shaped count-up prepass inclusion plus presort: local E. coli paired biological data stayed byte-identical across `threads=1/2/auto`, with 829 kept pairs, 171 tossed pairs, depth-bin streams totaling 1000 pairs, empty correction-driven uncorrected streams, and 1000 retained pairs under `countup=t ecc=t keepall=t`.
- `THREAD_CASES='1 2 auto' scripts/parity_countup_biological_stress.sh tmp/countup_biological_markuncorrectable_scer` passed after extending the biological count-up harness with a real-derived forced-uncorrectable branch: local S. cerevisiae paired data now verifies `countup=t ecc=t keepall=t addbadreadscountup=t markuncorrectableerrors=t eccmaxqual=0` preserves 31 keep pairs, emits one paired `outuncorrected` record, lowers only the mutated mate's quality string, leaves the clean mate unchanged, and stays byte-identical across `threads=1/2/auto`.
- `THREAD_CASES='1 2 auto' DATA1="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz" scripts/parity_countup_biological_stress.sh tmp/countup_biological_markuncorrectable_ecoli` passed with the same new count-up biological forced-uncorrectable branch on local E. coli paired data, confirming keep-vs-`outuncorrected` quality agreement on the mutated mate and byte-identical `threads=1/2/auto` outputs.
- `THREAD_CASES='1 2 auto' bash scripts/parity_overlap_ecc_biological_stress.sh tmp/overlap_ecc_biological_scer` passed on local S. cerevisiae paired biological data: the overlap-only stress now covers both a real-derived accepted fixture where single-pass and `countup=t` `ecco=t` repair the lower-quality mate and a compact competing-overlap ambiguity fixture where `ecco=t` correctly leaves the mutant mate untouched, while `ecco=f` leaves both fixtures unchanged; stderr records the overlap-repair note and outputs stay byte-identical across `threads=1/2/auto`.
- `THREAD_CASES='1 2 auto' DATA1="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz" bash scripts/parity_overlap_ecc_biological_stress.sh tmp/overlap_ecc_biological_ecoli` passed on local E. coli paired biological data with the same accepted-plus-rejected overlap stress coverage, confirming single-pass and count-up overlap repair behavior plus strict ambiguity rejection and byte-identical `threads=1/2/auto` outputs.
- `THREAD_CASES='1 2 auto' scripts/parity_long_kmer_biological_stress.sh tmp/long_kmer_biological_stress_scer` passed on local S. cerevisiae paired biological data: both plain `k=40` and `k=40 fixspikes=t` runs preserved 1000 pairs across keep+toss, totaled 1000 pairs across depth-bin streams, produced hist/rhist outputs, and stayed byte-identical across `threads=1/2/auto` with 990 kept pairs, 10 tossed pairs, 1000 low-bin pairs, 0 mid-bin pairs, and 0 high-bin pairs.
- `THREAD_CASES='1 2 auto' DATA1="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz" scripts/parity_long_kmer_biological_stress.sh tmp/long_kmer_biological_stress_ecoli` passed on local E. coli paired biological data: both plain `k=40` and `k=40 fixspikes=t` runs preserved 1000 pairs across keep+toss, totaled 1000 pairs across depth-bin streams, produced hist/rhist outputs, and stayed byte-identical across `threads=1/2/auto` with 779 kept pairs, 221 tossed pairs, 929 low-bin pairs, 71 mid-bin pairs, and 0 high-bin pairs.
- `THREAD_CASES='1 2 auto' scripts/parity_multipass_ecc_real_derived_stress.sh tmp/multipass_ecc_real_derived_stress_phix_toss_threads` passed on bundled paired phiX: all `ecc1=f eccf=t`, `ecc1=t eccf=f`, and `ecc=t` `passes=2` staging cases restored 46 reads per mate, left zero mutant sequences in keep-all output, preserved all 46 pairs across keep+toss routing, kept zero mutant sequences, exercised paired toss routing, and produced byte-identical outputs across `threads=1`, `threads=2`, and `threads=auto`.
- `STRESS_RECORDS=3 THREAD_CASES='1 2 auto' DATA1="$HOME/Projects/biological data/reads/short_reads/scer_s288c_pe_srr23631023/SRR23631023_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/scer_s288c_pe_srr23631023/SRR23631023_2.fastq.gz" scripts/parity_multipass_ecc_real_derived_stress.sh tmp/multipass_ecc_real_derived_stress_scer_toss_threads` passed on local S. cerevisiae paired biological data: each staged `passes=2` ECC mode restored 138 records per mate in keep-all output, preserved all 138 pairs across keep+toss routing, kept zero mutant sequences, exercised paired toss routing, and stayed byte-identical across `threads=1/2/auto`.
- `STRESS_RECORDS=3 THREAD_CASES='1 2 auto' DATA1="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz" scripts/parity_multipass_ecc_real_derived_stress.sh tmp/multipass_ecc_real_derived_stress_ecoli_toss_threads` passed on local E. coli paired biological data: each staged `passes=2` ECC mode restored 138 records per mate in keep-all output, preserved all 138 pairs across keep+toss routing, kept zero mutant sequences, exercised paired toss routing, and stayed byte-identical across `threads=1/2/auto`.
- `THREAD_CASES='1 2 auto' scripts/parity_multipass_ecc_real_derived_stress.sh tmp/multipass_ecc_markuncorrectable_phix` passed on bundled paired phiX after aligning Rust with Java's final-stage-only `outuncorrected` behavior: `passes=2` `ecc1=f eccf=t`, `ecc1=t eccf=f`, and `ecc=t` all keep the forced uncorrectable pair in keep-all output, preserve Java-shaped marked qualities on the mutated mate, emit paired `outuncorrected` only when the final ECC stage runs, and stay byte-identical across `threads=1/2/auto`.
- `STRESS_RECORDS=3 THREAD_CASES='1 2 auto' DATA1="$HOME/Projects/biological data/reads/short_reads/scer_s288c_pe_srr23631023/SRR23631023_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/scer_s288c_pe_srr23631023/SRR23631023_2.fastq.gz" scripts/parity_multipass_ecc_real_derived_stress.sh tmp/multipass_ecc_markuncorrectable_scer` passed on local S. cerevisiae paired biological data: staged multipass ECC keeps the forced uncorrectable pair, emits no paired `outuncorrected` record for `ecc1=t eccf=f`, emits one paired `outuncorrected` record for `ecc1=f eccf=t` and `ecc=t`, preserves marked mutated-mate qualities, and stays byte-identical across `threads=1/2/auto`.
- `STRESS_RECORDS=3 THREAD_CASES='1 2 auto' DATA1="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz" scripts/parity_multipass_ecc_real_derived_stress.sh tmp/multipass_ecc_markuncorrectable_ecoli` passed on local E. coli paired biological data with the same Java-shaped staged forced-uncorrectable coverage, including keep-all retention, final-stage-only paired `outuncorrected` emission, marked-quality preservation on the mutated mate, and byte-identical `threads=1/2/auto` outputs.
- `THREAD_CASES='1 2 auto' scripts/parity_side_routing_biological_stress.sh tmp/side_routing_biological_stress_scer_probe` passed on local S. cerevisiae paired biological data: side routing preserved 1000 pairs across keep+toss, depth-bin streams totaled 1000 pairs, real-derived ECC `outuncorrected` emitted one paired uncorrected record, and all outputs were byte-identical across `threads=1/2/auto`.
- `THREAD_CASES='1 2 auto' DATA1="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz" scripts/parity_side_routing_biological_stress.sh tmp/side_routing_biological_stress_ecoli` passed on local E. coli paired biological data: side routing preserved 1000 pairs across keep+toss, depth-bin streams totaled 1000 pairs, real-derived ECC `outuncorrected` emitted one paired uncorrected record, and all outputs were byte-identical across `threads=1/2/auto`.
- `THREAD_CASES='1 2 auto' scripts/parity_countup_biological_stress.sh tmp/countup_biological_stress_scer_threads` passed on local S. cerevisiae paired biological data: `threads=1`, `threads=2`, and `threads=auto` all produced byte-identical count-up and count-up-ECC outputs, with 993 kept pairs, 7 tossed pairs, depth-bin streams totaling 1000 pairs, empty correction-driven uncorrected streams, and 1000 retained pairs under `countup=t ecc=t keepall=t`.
- `THREAD_CASES='1 2 auto' DATA1="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz" scripts/parity_countup_biological_stress.sh tmp/countup_biological_stress_ecoli_threads` passed on local E. coli paired biological data before count-up presort; the current presort validation supersedes this older 819 kept / 181 tossed checkpoint.
- `scripts/parity_ecc_real_derived_stress.sh tmp/ecc_real_derived_stress_generalized_phiX_len40` passed: the generalized ECC stress guard still passes on bundled phiX after adding `DATA1`/`DATA2`, `STRESS_RECORDS`, `THREADS`, and `MIN_READ_LEN` controls plus `eccmaxqual=99` so controlled mutations are correction-eligible across real quality profiles.
- `STRESS_RECORDS=3 THREADS=auto DATA1="$HOME/Projects/biological data/reads/short_reads/scer_s288c_pe_srr23631023/SRR23631023_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/scer_s288c_pe_srr23631023/SRR23631023_2.fastq.gz" scripts/parity_ecc_real_derived_stress.sh tmp/ecc_real_derived_stress_scer_auto` passed: the stress guard now covers local S. cerevisiae biological paired data with 74 bp reads, 18 injected mutants across three source pairs, zero mutant sequences left in keep-all output, zero mutant sequences retained by normalization, and paired toss routing exercised.
- `STRESS_RECORDS=3 THREADS=auto DATA1="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz" scripts/parity_ecc_real_derived_stress.sh tmp/ecc_real_derived_stress_ecoli_auto` passed: the same generalized ECC stress guard covers local E. coli 101 bp paired data with 18 injected mutants across three source pairs, zero mutant sequences left in keep-all output, zero mutant sequences retained by normalization, and paired toss routing exercised.
- `scripts/parity_ecc_real_derived_stress.sh tmp/ecc_real_derived_noisy_uncorrectable_phix` passed after extending the real-derived ECC harness with a noisier forced-uncorrectable branch: bundled phiX now verifies that adjacent multi-mutation paired reads are retained in keep-all output and mirrored to paired `outuncorrected` streams under `ecc=t keepall=t eccmaxqual=0`.
- `STRESS_RECORDS=3 THREADS=auto DATA1="$HOME/Projects/biological data/reads/short_reads/scer_s288c_pe_srr23631023/SRR23631023_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/scer_s288c_pe_srr23631023/SRR23631023_2.fastq.gz" scripts/parity_ecc_real_derived_stress.sh tmp/ecc_real_derived_noisy_uncorrectable_scer` passed: local S. cerevisiae biological paired data now exercises the same noisier real-derived uncorrectable branch, preserving the forced multi-mutation pair in keep-all output while routing it to paired `outuncorrected` under `eccmaxqual=0`.
- `STRESS_RECORDS=3 THREADS=auto DATA1="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz" DATA2="$HOME/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz" scripts/parity_ecc_real_derived_stress.sh tmp/ecc_real_derived_noisy_uncorrectable_ecoli` passed: local E. coli biological paired data now covers the same noisier real-derived uncorrectable ECC routing, keeping the forced multi-mutation pair in keep-all output and emitting it to paired `outuncorrected`.
- `scripts/parity_countup_fallback_real_dataset.sh tmp/countup_overlap_ecc_heartbeat` passed: Rust count-up now exercises `addbadreadscountup=t` plus `rename=t` on the paired phiX guard fixture and has unit coverage proving tossed reads update the exact kept-kmer table only when `abrc` is enabled. It emits renamed count-up headers, honors `minlen` toss filtering, creates keep/toss/depth-bin/outuncorrected side streams instead of rejecting them, performs table-based ECC on kept count-up reads, routes uncorrectable count-up reads to `outuncorrected`, and covers count-up `markerrors=t trimaftermarking=t`. It also verifies that `countup=t ecco=t` can still repair an accepted high-entropy paired overlap-only low-quality mate error when `k=31` prevents table ECC from helping, guards the stricter competing short-overlap ambiguity rejection in count-up mode, and is backed by biological `markuncorrectableerrors=t` stress on forced uncorrectable pairs.
- `scripts/parity_ecc_fallback_real_dataset.sh tmp/ecc_overlap_repair_heartbeat` passed: `ecco=t` now records overlap-correction intent and Rust performs Java-shaped strict overlap handling before table-based ECC; the smoke uses one compact fixture where `ecco=t` repairs the lower-quality overlapping mate, plus a competing-overlap ambiguity fixture where Rust now matches Java by rejecting the overlap and preserving the parsed/clamped FASTQ output byte-for-byte, and it still checks the explicit stderr note.
- `scripts/parity_ecc_fallback_real_dataset.sh tmp/ecc_norm_toss_heartbeat` passed: the ECC smoke now also compares Java/Rust normalization/toss routing for `ecc=t target=2 max=2` with both `tossbadreads=f` and `tossbadreads=t`, verifies keep/toss files are non-empty, confirms the double-mutant kept read is corrected when bad-read tossing is disabled, and confirms noisy reads are tossed when bad-read tossing is enabled.
- `scripts/parity_ecc_fallback_real_dataset.sh tmp/ecc_noisy_multipass_heartbeat` passed: the ECC smoke now also compares Java/Rust `passes=2` staged correction for `ecc1=f eccf=t`, `ecc1=t eccf=f`, and `ecc=t` on a noisier mixed-mutant fixture with two distinct single substitutions plus a double-substitution read, and verifies Rust does not leave mutant sequences behind.
- `scripts/parity_ecc_fallback_real_dataset.sh tmp/ecc_behavior_probe` passed: `ecc=t` no longer emits the old fallback note, preserves paired phiX no-error output against the vendored no-ECC baseline, and corrects a representative one-substitution fixture in Rust.
- `scripts/parity_ecc_fallback_real_dataset.sh tmp/ecc_behavior_uncorrectable` passed: the ECC smoke now also forces an uncorrectable high-quality suspected error with `eccmaxqual=0` and verifies Rust keep plus `outuncorrected` FASTQ output byte-for-byte against vendored Java.
- `scripts/parity_ecc_fallback_real_dataset.sh tmp/ecc_mark_trim_behavior` passed: the ECC smoke now also compares Java/Rust `markerrors=t trimaftermarking=t qtrim=r trimq=20 optitrim=f` output on a representative tail-error fixture, proving Rust defers qtrim until after ECC marking in that path.
- `scripts/parity_ecc_fallback_real_dataset.sh tmp/ecc_multipass_staging_behavior` passed: the ECC smoke now also verifies `passes=2` correction staging for `ecc1=f eccf=t`, `ecc1=t eccf=f`, and `ecc=t` against vendored Java on the representative substitution fixture.
- `scripts/parity_ecc_fallback_real_dataset.sh tmp/ecc_paired_uncorrectable_heartbeat` passed: the ECC smoke now forces a two-file paired uncorrectable mate case and verifies Java/Rust keep1, keep2, outuncorrected1, and outuncorrected2 byte-for-byte.
- `scripts/parity_rename_real_dataset.sh tmp/rename_ecc_fields_heartbeat` passed: renamed ECC-active single-end and paired outputs now match Java's header shape, including `e1=0` and `e1=0,e2=0` fields emitted before correction work mutates reads.
- `scripts/parity_multipass_real_dataset.sh tmp/multipass_p234_targetbad_heartbeat` passed: paired phiX keep/hist outputs now compare Java/Rust for `passes=2`, `passes=3`, and `passes=4`; Rust also has unit coverage for Java-shaped `targetbadpercentilelow/high` bad-read target tightening during intermediate multipass passes.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_ecc_behavior` passed: the smoke suite now records `ecc_behavior` alongside multipass/countup/fallback guards, with the new ECC correction check included.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/benchmark_biological_matrix.sh tmp/biological_matrix_ecc_behavior_quick` passed on the local larger S. cerevisiae paired dataset preset; `threads_auto` completed in 0.038409s with 19764 KB max RSS.
- `scripts/parity_kmer_table_runtime_fallback_real_dataset.sh tmp/kmer_table_matching_runtime_heartbeat` passed: vendored `KmerNormalize` rejects shared kmer-table runtime controls such as `initialsize`, `ways`, table buffer length aliases, `tabletype`, `rcomp`, `maskmiddle`, preallocation, prefilter-memory, and prepass toggles in this path, while Rust accepts them as explicit working-path fallbacks, validates malformed numeric values, and matches the local Java paired phiX baseline outputs.
- `scripts/parity_sketch_controls_fallback_real_dataset.sh tmp/sketch_countmin_real_behavior` passed: Rust now validates malformed sketch/table-sizing values, applies real deterministic trusted filtering for `buildpasses=2`, applies real deterministic prefilter estimates for `prehashes=1`/`prefilterhashes=1` and constrained `prefiltercells=1k`/`precells=1k`, and builds direct fixed-memory count-min sketches for constrained `cells=1k` and `matrixbits=10` requests so paired phiX histograms now change under tiny sketch settings.
- `scripts/parity_tmpdir_controls_real_dataset.sh tmp/tmpdir_managed_temp_heartbeat` passed: vendored Java and Rust both accept `tmpdir`/`usetmpdir` controls, leave the requested temp directories empty in the covered single-pass paired phiX path, produce matching keep/bin/hist outputs, and Rust now creates and cleans managed multipass temp directories under the requested `tmpdir` parent.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_tmpdir_controls_heartbeat` passed: fallback smoke now includes the temporary-directory control guard.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_tmpdir_controls_quick_heartbeat` passed after the temporary-directory parser addition: component smoke, fallback smoke with `tmpdir_controls`, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_diagnostic_sizing_fallback_real_dataset.sh tmp/diagnostic_sizing_break_disabled_heartbeat` passed: Rust now accepts disabled/non-positive `breaklen`/`breaklength` controls as explicit no-ops, still rejects positive and malformed read-breaking values, and matches the local Java paired phiX baseline outputs with diagnostic sizing and disabled recalibration fallbacks.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_break_disabled_heartbeat` passed: fallback smoke includes disabled read-breaking coverage in the diagnostic sizing stage.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_break_disabled_quick_heartbeat` passed after the disabled break-length parser addition: component smoke, fallback smoke with updated diagnostics, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_null_outputs_real_dataset.sh tmp/null_outputs_case_heartbeat` passed: case-insensitive `null`/`NULL` sequence and histogram sinks now match vendored Java, while focused Rust unit coverage confirms that `none` remains a literal path, as in BBTools.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_null_case_heartbeat` passed: fallback smoke includes the expanded case-insensitive null output sink guard.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_null_case_quick_heartbeat` passed after the case-insensitive null sink correction: component smoke, fallback smoke with expanded `null_outputs`, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_null_outputs_real_dataset.sh tmp/null_outputs_heartbeat` passed: vendored Java and Rust both suppress literal `out=null`/`out=NULL` paired-bin null sequence outputs and `hist=NULL`, do not create stray `null`/`NULL` files, and retain matching paired phiX `hist`/`rhist` outputs.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_null_outputs_heartbeat` passed: the fallback smoke now includes BBTools-style null output sink coverage alongside the existing fallback/compatibility stages.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_null_outputs_quick_heartbeat` passed after the null-output sink addition: component smoke, fallback smoke with `null_outputs`, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_diagnostic_sizing_fallback_real_dataset.sh tmp/diagnostic_sizing_disabled_recal_heartbeat` passed: vendored `KmerNormalize` rejects shared `testsize`, while Rust accepts diagnostic sizing plus disabled recalibration aliases, rejects enabled/malformed recalibration, and matches the local Java paired phiX FASTQ/hist baseline.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_disabled_recal_heartbeat` passed: the diagnostic sizing stage now includes disabled recalibration alias coverage alongside the existing fallback/compatibility smoke stages.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_disabled_recal_quick_heartbeat` passed after the disabled recalibration parser addition: component smoke, fallback smoke with the updated diagnostic stage, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_quality_recal_suffix_real_dataset.sh tmp/quality_recal_suffix_heartbeat` passed: vendored `KmerNormalize` accepts `_p1`/`_p2` pass-suffixed quality recalibration controls, while Rust now accepts them as output-preserving no-ops and matches Java paired phiX FASTQ/hist outputs.
- `scripts/parity_diagnostic_sizing_fallback_real_dataset.sh tmp/diagnostic_sizing_fallback_heartbeat` passed: vendored `KmerNormalize` rejects shared `testsize`, while Rust accepts it plus disabled `recalibrate` aliases as no-ops, keeps `breaklen`, enabled recalibration, and malformed recalibration rejected, and matches the local Java paired phiX baseline outputs.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_recal_suffix_diagnostic_heartbeat` passed: the pass-suffixed quality recalibration and diagnostic sizing guards are now included alongside the existing fallback/compatibility smoke stages.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_recal_suffix_diagnostic_quick_heartbeat` passed after the quality suffix and diagnostic sizing parser additions: component smoke, fallback smoke with the new stages, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_genome_context_fallback_real_dataset.sh tmp/genome_context_fallback_heartbeat` passed: vendored `KmerNormalize` rejects shared `build`/`genome` controls in this path, while Rust now accepts them as reference-context no-ops, keeps mapping-aware filters rejected, and matches the local Java paired phiX baseline outputs.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_genome_context_heartbeat` passed: the genome-build context guard is now included alongside the existing fallback/compatibility smoke stages.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_genome_context_quick_heartbeat` passed after the genome-build context parser addition: component smoke, fallback smoke with genome context controls, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_io_runtime_hints_fallback_real_dataset.sh tmp/io_runtime_hints_fallback_heartbeat` passed: vendored `KmerNormalize` rejects shared extension and I/O worker controls such as `extin` and `workers`, while Rust now accepts them as format/runtime hints and matches the local Java paired phiX baseline outputs.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_io_hints_heartbeat` passed: the I/O hint fallback guard is now included alongside the existing fallback/compatibility smoke stages.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_io_hints_quick_heartbeat` passed after the I/O hint parser addition: component smoke, fallback smoke with extension/I/O worker controls, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_cardinality_estimate_real_dataset.sh tmp/cardinality_estimate_heartbeat` passed: vendored `KmerNormalize` rejects shared cardinality/loglog controls such as `cardinality`, while Rust now accepts them as exact-count/no-cardinality fallbacks and matches the local Java paired phiX baseline outputs.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_cardinality_heartbeat` passed: the cardinality/loglog fallback guard is now included alongside the existing fallback/compatibility smoke stages.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_cardinality_quick_heartbeat` passed after the cardinality/loglog parser addition: component smoke, fallback smoke with cardinality/loglog controls, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_side_output_stats_fallback_real_dataset.sh tmp/side_output_entropy_heartbeat` passed: vendored `KmerNormalize` rejects shared stats histogram controls such as `qhist`, while Rust now accepts them and emits covered `qhist`, `bqhist`, `qchist`, `aqhist`, `obqhist`, `lhist`, `gchist`, `bhist`, and `enthist` artifacts. On paired phiX these cover 20000 quality-counted bases, 10000/10000 paired quality-count bases, 100/100 average-quality reads, 200 reads/20000 bases in the length histogram, 200 reads/20000 bases in the GC-bin histogram, 100 paired base-quality rows, 200 paired base-content rows, and 200 entropy-scored reads. Rust keeps the remaining side-output stats as explicit non-emitting fallbacks and matches the local Java paired phiX baseline outputs.
- `scripts/parity_side_output_stats_fallback_real_dataset.sh tmp/side_output_idhist_heartbeat` passed: Rust now emits covered sequence-input `idhist` fallback artifacts with 200 reads/20000 bases at 100% identity on paired phiX, while keeping the main paired Java baseline outputs byte-identical.
- `scripts/parity_side_output_stats_fallback_real_dataset.sh tmp/side_output_alignment_fallback_heartbeat` passed: Rust now emits covered no-alignment `mhist`, `ihist`, `qahist`, `indelhist`, and `ehist` fallback artifacts on paired phiX, with match-shaped rows over 100 positions, 20000 quality-accuracy match observations, and 200 reads at zero observed alignment errors, while keeping the main paired Java baseline outputs byte-identical.
- `scripts/parity_side_output_stats_fallback_real_dataset.sh tmp/side_output_single_pass_stats_heartbeat` passed after consolidating all covered read-local side-output histograms into one trimmed primary input scan instead of separate rereads for quality, length, GC, base-content, entropy, and identity artifacts.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_side_output_stats_heartbeat` passed: the side-output stats fallback guard is now included alongside the existing fallback/compatibility smoke stages.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_side_output_stats_quick_heartbeat` passed after the side-output stats parser addition: component smoke, fallback smoke with side-output stats controls, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_file_aliases_representative_dataset.sh tmp/file_aliases_heartbeat` passed: vendored `KmerNormalize` rejects shared `input`/`output` file aliases in this path, while Rust now accepts them as real aliases and matches canonical Java output on a representative paired fixture.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_file_aliases_heartbeat` passed: the shared file-alias guard is now included alongside the existing fallback/compatibility smoke stages.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_file_aliases_quick_heartbeat` passed after the file-alias parser addition: component smoke, fallback smoke with file aliases, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_sam_runtime_noops_real_dataset.sh tmp/sam_runtime_noops_heartbeat` passed: vendored `KmerNormalize` rejects shared SAM/readgroup/streamer controls in this path, while Rust now accepts them as no-ops for covered FASTQ output and paired phiX output matches the local Java baseline.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_sam_runtime_heartbeat` passed: the new SAM/readgroup runtime fallback guard is now included alongside the existing fallback/compatibility smoke stages.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_sam_runtime_quick_heartbeat` passed after the SAM/readgroup parser addition: component smoke, fallback smoke with SAM runtime no-ops, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_config_file_real_dataset.sh tmp/config_file_heartbeat` passed: comma-separated BBTools `config=<file>` expansion now feeds the Rust parser from one-argument-per-line config files and paired phiX output matches the local Java baseline.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_config_file_heartbeat` passed: config-file expansion is now included alongside the existing default multipass, count-up, ECC, sketch-control, sampling-option, deterministic, peak-alias, trimq-comma, MPI, pairing-runtime, and preparser fallback guards.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_config_file_quick_heartbeat` passed after the config-file parser addition: component smoke, fallback smoke with config expansion, bundled phiX thread scaling, biological RSS guard, and the S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_preparser_runtime_noops_real_dataset.sh tmp/preparser_runtime_noops_heartbeat` passed: BBTools preparser controls (`json`, `silent`, `printexecuting`, proxy settings, `metadatafile`, `bufferbf`) now parse as Rust no-ops and paired phiX output matches the local Java baseline.
- `scripts/parity_pairing_runtime_fallback_real_dataset.sh tmp/pairing_runtime_fallback_heartbeat` passed: enabled global pairing controls (`pairreads=t`, `flipr2=t`) now emit explicit-routing notes and Rust output matches the local Java baseline on paired phiX.
- `scripts/parity_mpi_fallback_real_dataset.sh tmp/mpi_fallback_heartbeat` passed: enabled BBTools MPI controls (`usempi=t`, `mpi=2`, `crismpi=t`, `mpikeepall=t`) now emit local-run notes and Rust output matches the local Java baseline on paired phiX.
- `scripts/parity_trimq_comma_fallback_real_dataset.sh tmp/trimq_comma_fallback_heartbeat` passed: `trimq=10,20` now emits a position-specific trim fallback note and matches the real-derived first-threshold `trimq=10` baseline.
- `scripts/parity_peak_short_aliases_representative_dataset.sh tmp/peak_short_aliases_heartbeat` passed: vendored `KmerNormalize` still rejects `h/v/w/minp/maxp`, while Rust now accepts those aliases and produces byte-identical output to the long-option peak baseline.
- `scripts/parity_deterministic_fallback_representative_dataset.sh tmp/deterministic_mode_heartbeat` passed: `deterministic=f` no longer emits the old fallback note; Rust keeps nondeterministic read selection enabled while the keepall guard remains Java-identical.
- `scripts/parity_sampling_options_fallback_representative_dataset.sh tmp/sampling_options_fallback_heartbeat` passed: vendored Java still rejects wrapper sampling knobs, while Rust now emits ignore notes and preserves baseline representative output.
- `scripts/parity_sketch_controls_fallback_real_dataset.sh tmp/sketch_controls_fallback_heartbeat` passed: vendored Java still changes histograms for `prehashes=1` and `buildpasses=2`; Rust now exercises real prefilter collision behavior for `prehashes`/`prefilterhashes` and constrained `prefiltercells`/`precells`, `buildpasses` now exercises real trusted-kmer depth reduction, and constrained `cells`/`matrixbits` requests use direct fixed-memory Rust count-min sketches instead of staying byte-identical to the exact-count baseline.
- `scripts/parity_ecc_fallback_real_dataset.sh tmp/ecc_multimark_semantics_heartbeat` passed after aligning ECC marking semantics with Java: `markerrors`/`markuncorrectableerrors` now honor `cfl`/`cfr` direction toggles but do not cap marked sites at `ecclimit`, and the Java/Rust guard byte-compares a multi-site `markerrors=t ecclimit=1 cfl=t cfr=f` fixture.
- `cargo test ecco -- --nocapture` passed after adding explicit `ecco=auto` support: Rust now accepts the Java option, resolves it with a bounded 1%-sample paired-overlap probe, disables overlap correction when the Java-shaped sample is empty on compact fixtures, and matches the Java-golden compact auto-overlap fixture.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_heartbeat` passed: default multipass, `countup=t`, `ecc=t`, sketch-control, sampling-option, and deterministic-mode fallback behaviors all completed through their representative Java/Rust sanity scripts.
- `scripts/fallback_smoke.sh tmp/fallback_smoke_multipass_markuncorrectable` passed: the default fallback suite now includes the staged multipass marked-uncorrectable ECC guard alongside the existing compatibility smoke stages.
- `MATRIX_CASES=scer_s288c_srr23631023 MATRIX_THREAD_CASES=auto THREAD_CASES=auto MAX_RSS_KB=1000000 scripts/working_pipeline_smoke.sh tmp/working_pipeline_overlap_quick` passed after adding overlap-ECC biological stress to the working pipeline: component smoke, fallback smoke, bundled phiX thread scaling, count-up/side-routing/side-output/long-kmer biological stress, the new overlap-ECC biological stress, the biological RSS guard, and the bounded S. cerevisiae biological matrix preset all completed successfully.
- `scripts/parity_ecc_fallback_real_dataset.sh tmp/ecc_behavior_probe` passed: `ecc=t` no longer emits the old fallback note, preserves paired phiX no-error output against the vendored no-ECC baseline, and corrects a representative one-substitution fixture in Rust.
- `scripts/parity_countup_fallback_real_dataset.sh tmp/countup_fallback_heartbeat` passed: vendored Java still fails on the local paired phiX `countup=t` guard fixture, while Rust produces exact count-up keep/toss/hist/rhist outputs through the supported single-pass path.
- `scripts/parity_default_multipass_fallback_representative_dataset.sh tmp/default_multipass_fallback_heartbeat` passed: omitted `passes` now runs the supported Rust single-pass engine with a clear stderr note and matches vendored BBNorm output on the compact keep-all representative fixture.
- `READS=50000 KEEP_OUTPUTS=0 DATA1='/home/jake/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz' DATA2='/home/jake/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz' scripts/benchmark_biological_dataset.sh tmp/biological_ecoli_50k_1_2_auto` passed byte-identical `threads=1`, `threads=2`, and `threads=auto` output checks. Timings were 7.234772s, 4.036265s, and 1.607828s respectively; peak RSS was 100136 KB, 131928 KB, and 242928 KB respectively.
- `READS=250000 KEEP_OUTPUTS=0 DATA1='/home/jake/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz' DATA2='/home/jake/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz' scripts/benchmark_biological_dataset.sh tmp/biological_ecoli_250k_1_2_auto` passed byte-identical `threads=1`, `threads=2`, and `threads=auto` output checks. Timings were 35.921721s, 20.350735s, and 7.791469s respectively; peak RSS was 352872 KB, 366108 KB, and 533892 KB respectively.
- `READS=500000 KEEP_OUTPUTS=0 DATA1='/home/jake/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_1.fastq.gz' DATA2='/home/jake/Projects/biological data/reads/short_reads/ecoli_mg1655_pe_drr023054/DRR023054_2.fastq.gz' scripts/benchmark_biological_dataset.sh tmp/biological_ecoli_500k_1_2_auto` passed byte-identical `threads=1`, `threads=2`, and `threads=auto` output checks. Timings were 73.626009s, 40.766744s, and 15.435244s respectively; peak RSS was 352756 KB, 366276 KB, and 536080 KB respectively.
- `READS=250000 KEEP_OUTPUTS=0 DATA1='/home/jake/Projects/biological data/reads/short_reads/bacillus_subtilis_168_pe_drr066522/DRR066522_1.fastq.gz' DATA2='/home/jake/Projects/biological data/reads/short_reads/bacillus_subtilis_168_pe_drr066522/DRR066522_2.fastq.gz' scripts/benchmark_biological_dataset.sh tmp/biological_bsubtilis_250k_1_2_auto` passed byte-identical `threads=1`, `threads=2`, and `threads=auto` output checks. Timings were 120.074106s, 69.816927s, and 26.804570s respectively; peak RSS was 2581860 KB, 2657908 KB, and 3218776 KB respectively. This dataset is a useful memory-pressure fixture: its peak RSS is roughly six to seven times that of the E. coli run at the same `READS=250000` bound.
- `MATRIX_READS=10000 scripts/benchmark_biological_matrix.sh tmp/biological_matrix_10k_heartbeat` passed byte-identical thread checks across yeast, E. coli, and Bacillus presets. Fastest times were 0.311901s, 0.334383s, and 1.450881s respectively, all with `threads=auto`; max RSS was 132260 KB, 144944 KB, and 450008 KB respectively.
- `scripts/benchmark_biological_matrix.sh tmp/biological_matrix_1k_spombe_heartbeat` passed byte-identical thread checks across S. cerevisiae, S. pombe, E. coli, and Bacillus presets. Fastest times were 0.037934s, 0.062535s, 0.044024s, and 0.131774s respectively, all with `threads=auto`; max RSS was 19248 KB, 39872 KB, 28384 KB, and 50516 KB respectively.
- `MATRIX_CASES='spombe_972_srr17530188 ecoli_mg1655_drr023054' MATRIX_READS=10000 scripts/benchmark_biological_matrix.sh tmp/biological_matrix_spombe_ecoli_10k` passed byte-identical thread checks for multi-case filtered matrix selection. Fastest times were 0.737018s and 0.336659s respectively, both with `threads=auto`; max RSS was 267840 KB and 144704 KB respectively.
- `MATRIX_READS=10000 scripts/benchmark_biological_matrix.sh tmp/biological_matrix_10k_four_dataset` passed byte-identical thread checks across S. cerevisiae, S. pombe, E. coli, and Bacillus presets. Fastest times were 0.299141s, 0.663508s, 0.368586s, and 1.404300s respectively, all with `threads=auto`; max RSS was 132040 KB, 267648 KB, 148004 KB, and 532680 KB respectively.
- `MATRIX_CASES='scer_s288c_err915337 ecoli_mg1655_drr217208 ecoli_mg1655_srr13921545' MATRIX_READS=1000 MATRIX_THREAD_CASES='1 2 auto' scripts/benchmark_biological_matrix.sh tmp/biological_matrix_new_presets_1k` passed byte-identical thread checks for the three added biological matrix presets. Fastest times were 0.046415s, 0.074106s, and 0.046395s respectively, all with `threads=auto`; max RSS was 22008 KB, 43588 KB, and 29576 KB respectively.
- `MATRIX_READS=1000 MATRIX_THREAD_CASES='1 2 auto' scripts/benchmark_biological_matrix.sh tmp/biological_matrix_1k_seven_dataset` passed byte-identical thread checks across all seven biological matrix presets: two S. cerevisiae runs, S. pombe, three E. coli runs, and Bacillus. Fastest times were 0.043545s, 0.041624s, 0.072195s, 0.042202s, 0.066064s, 0.044206s, and 0.114736s respectively, all with `threads=auto`; max RSS was 19896 KB, 21448 KB, 33812 KB, 22376 KB, 40832 KB, 29428 KB, and 48000 KB respectively.
- `scripts/working_pipeline_smoke.sh tmp/working_pipeline_smoke_seven_dataset_default` passed the default one-command working check after biological matrix expansion: component smoke, bundled phiX thread scaling, biological RSS guard smoke, and all seven 1k-read biological matrix presets with `threads=1 2 auto`. Matrix fastest times were 0.042024s, 0.044107s, 0.066870s, 0.048569s, 0.055599s, 0.049384s, and 0.127032s respectively, all with `threads=auto`; max RSS was 20616 KB, 19856 KB, 40356 KB, 23868 KB, 40432 KB, 32076 KB, and 59984 KB respectively.
- `scripts/working_pipeline_smoke.sh tmp/working_pipeline_smoke_full_default` passed the pre-expansion default one-command working check: component smoke, bundled phiX thread scaling, biological RSS guard smoke, and the original full 1k S. cerevisiae/S. pombe/E. coli/Bacillus matrix.
- `cargo test --all` passed after the parser and harness additions: 82 library tests, 4 basic integration tests, 95 Java-golden parity tests, and doc tests all completed successfully.
- `CARGO_BUILD_JOBS=1 cargo test kmer -- --test-threads=1`, `cargo test exact_counts_remove_duplicate_kmers_per_read -- --test-threads=1`, `cargo test bounded_sketch_chunked_parallel_matches_pairwise_increment -- --test-threads=1`, `cargo test packed_count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all -- --test-threads=1` passed after switching short-kmer enumeration to BBTools-style rolling forward/reverse state, replacing per-read duplicate hash sets with sorted/deduped buffers, and updating packed count-min unique-kmer estimates to BBTools' hash-adjusted used-fraction formula.
- Release-mode human-slice no-output benchmark on `tmp/human_benchmark_8threads/human_GRCh38_500k_R{1,2}.fq.gz` with `threads=8`, `k=31`, `minq=0`, `minprob=0`, `keepall=t`, and no FASTQ output completed in 152.18s with 4,809,252 KB max RSS after the rolling-kmer/sorted-buffer changes. The previous Rust run of the same no-output sampled case took 200.27s with 4,681,136 KB max RSS, and the new Rust `hist`/`rhist` outputs are byte-identical to those of that previous run.
- `CARGO_BUILD_JOBS=1 cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `cargo build --release` passed after adding large-input automatic count-min selection, BBTools-shaped memory sizing, `mem=`/`memory=` auto budget parsing, `autocountmin=` controls, and `exact=t` exact-map override. The same 500k-pair human no-output benchmark now automatically selected the bounded sketch without explicit `sketchmemory`, completing in 73.01s with 1,729,264 KB max RSS; the prior post-rolling exact run was 152.18s with 4,809,252 KB max RSS, and the original exact run was 200.27s with 4,681,136 KB max RSS.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `cargo build --release` passed after replacing the default `bits=32` bounded count-min insertion path with a direct atomic table and reusable per-worker duplicate-removal buffers. On the same 500k-pair human 8-thread benchmark, Rust auto-sketch no-output dropped from 77.32s / 1,739,188 KB RSS to 10.13s / 1,401,356 KB RSS with byte-identical `hist`/`rhist`; full FASTQ output completed in 10.44s / 1,403,824 KB RSS with keep FASTQs byte-identical to the Java benchmark outputs and auto-sketch histograms byte-identical to the prior Rust auto-sketch run.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo build --release`, and `cargo test --all -- --test-threads=1` passed after changing the default atomic `bits=32` count-min path to BBTools-style conservative updates. On the 500k-pair human `target=40 max=80 min=5 k=31 minq=0 minprob=0 threads=8` normalization benchmark, Java kept 27,277 pairs in 8.43s / 13.05 GiB RSS, Rust exact kept 27,339 pairs in 174.05s / 4.68 GiB RSS, the previous Rust auto-sketch kept 28,362 pairs in 10.03s / 1.34 GiB RSS, and the conservative Rust auto-sketch kept 27,397 pairs in 17.06s / 1.72 GiB RSS. Conservative auto-sketch now differs from Rust exact by 58 kept pairs and from Java by 120 kept pairs while remaining memory-bounded; detailed artifacts are under `tmp/human_benchmark_8threads/normalization_target40_max80_conservative_auto_20260424`.
- `cargo check` passed after switching `flate2` to the `zlib-rs` backend, wiring `threads`/`zipthreads`/`pigz`/`unpigz` controls into `.gz` input readers, keeping parallel `.gz` output, and adding `scripts/benchmark_giant_safe.sh` as a guarded release benchmark harness that defaults to bounded count-min, null sequence outputs, repo-owned `/proc` peak-RSS/time capture, and explicit thread/memory settings for large real read sets. `scripts/install_benchmark_tools.sh` installs `pigz`/`unpigz` locally when system package installation is unavailable, and `scripts/benchmark_java_rust_human.sh` provides matched Java/Rust benchmarking with desktop-safe read limits and comparison artifacts. A bounded human-slice smoke with `READS=1000 TABLE_READS=1000 THREADS=2 ZIPTHREADS=1 MEM=512m` completed under `tmp/giant_safe_smoke_20260424_zlib`, processing 2,000 reads with 182,074 input unique kmers and no sequence-output files.
- `cargo test count_min -- --test-threads=1`, `cargo test bounded_ -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `scripts/parity_sketch_controls_fallback_real_dataset.sh tmp/sketch_total_cells_20260424`, and `cargo build --release` passed after changing explicit `cells`/`matrixbits` to BBTools-style total-cell budgets and switching packed small-bit sketches to deterministic conservative updates. `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=8 JAVA_XMX=8g MEM=512m WRITE_OUTPUTS=0 scripts/benchmark_java_rust_human.sh ... tmp/java_rust_human_safe_1k_totalcells_20260424` matched Java `hist`/`rhist` exactly, with Java at 1.071s / 6,692,328 KB RSS and Rust at 0.102s / 252,836 KB RSS. `READS=500000 TABLE_READS=500000 THREADS=8 ZIPTHREADS=8 MEM=512m WRITE_OUTPUTS=0 scripts/benchmark_giant_safe.sh ... tmp/giant_safe_human_500k_8t_totalcells_20260424` processed 1,000,000 reads / 150,000,000 bases in 16.659s with 679,660 KB peak RSS and no sequence-output files.
- `cargo test count_min_hash_uses_bbtools_row_rotation_masks -- --test-threads=1` and `cargo test count_min -- --test-threads=1` passed after replacing the generic Rust count-min bucket mixer with a KCountArray7MTA-shaped row hash: row 0 uses BBTools-style double masking, later rows rotate right by 6 bits before applying row masks, and the deterministic static mask table preserves the Java invariant of 16 set bits in each 32-bit half with the sign bit clear. This is not byte-for-byte Java random-mask parity yet, but it moves bounded sketches onto the same row-rotation/mask structure used by the vendored source.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=2 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 scripts/benchmark_giant_safe.sh ... tmp/giant_safe_hashshape_1k_20260424` passed after the KCountArray-shaped hash change. The safe Rust-only human slice processed 2,000 reads / 300,000 bases in 0.151699s with 302,020 KB peak RSS and no FASTQ outputs, keeping the validation path desktop-safe while confirming the bounded sketch still runs end-to-end.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test count_min -- --test-threads=1`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=2 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 scripts/benchmark_giant_safe.sh ... tmp/giant_safe_primecells_1k_20260424` passed after porting KCountArray7MTA-style prime cell sizing into Rust's bounded split-row sketches. Requested `cells`/memory still cap total allocation, but each hash row now rounds down to a prime length for better large-table distribution; the safe human slice processed 2,000 reads / 300,000 bases in 0.151512s with 298,500 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test prefilter -- --test-threads=1`, `cargo test --all -- --test-threads=1`, and `scripts/parity_kmer_table_runtime_fallback_real_dataset.sh tmp/kmer_table_runtime_prefilter_memory_20260424` passed after turning `filtermemory`/`prefiltermemory`/`filtermem`/`filtermemoryoverride` from note-only parser fallbacks into real Rust prefilter memory budgets. Pure kmer-table runtime knobs still byte-match the local Java phiX baseline, while the prefilter memory aliases now deliberately exercise Rust prefilter collision behavior with fixed memory and prime row sizing.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test prefilter -- --test-threads=1`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=2 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilterfraction=0.35' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_prefilterfraction_1k_20260424` passed after making `prefiltersize`/`prefilterfraction` real prefilter memory partition controls. Bare `prefilter=t` remains Java-safe unless sizing/hash/fraction controls are supplied; the guarded human slice with fraction-based prefiltering processed 2,000 reads / 300,000 bases in 0.151454s with 302,516 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test prefilter -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=2 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilterfraction=0.35' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_twostage_prefilter_partition_deterministic_1k_20260424` passed after moving configured bounded input prefiltering onto a real two-stage prefilter-plus-main sketch. Fraction-based prefiltering now steals memory from the main table budget instead of adding a second full allocation, and atomic conservative chunk replay is sorted for deterministic collision behavior; the guarded human slice processed 2,000 reads / 300,000 bases in 0.707425s with 310,488 KB peak RSS and no FASTQ outputs.
- `cargo test count_min_hash_uses_bbtools_row_rotation_masks -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=2 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilterfraction=0.35' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_java_masks_prefilter_1k_20260424` passed after replacing the deterministic placeholder count-min mask table with BBTools' seed-0 FastRandomXoshiro mask generation loop from `KCountArray7MTA`. The guarded human slice processed 2,000 reads / 300,000 bases in 0.655246s with 310,232 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=2 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilterfraction=0.35' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_kcount_arrays_prefilter_1k_20260424` passed after adding KCountArray-style internal array slot placement to bounded sketch buckets. Large sketch rows now choose an internal array from low hash bits, shift by the array bits, then mod into prime cells-per-array; the guarded human slice processed 2,000 reads / 300,000 bases in 0.655178s with 309,968 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test lockedincrement -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=2 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilterfraction=0.35 symmetricwrite=f' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_symmetricwrite_unlocked_1k_20260424` passed after wiring BBTools `lockedincrement`/`symmetricwrite` into bounded Rust sketches. Multi-bit, multi-hash tables now default to KCountArray7MTA locked/conservative writes, `symmetricwrite=f` switches both packed and atomic sketches to independent row increments, and exact-mode collision-estimate fallbacks now replay through bounded sketch objects instead of materializing unbounded hash-bucket maps; the guarded human slice processed 2,000 reads / 300,000 bases in 0.605137s with 310,116 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test unique_kmers -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=2 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilterfraction=0.35' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_prefilter_threshold_unique_1k_20260425` passed after adding BBTools-style thresholded unique-kmer estimates for exact maps, packed count-min sketches, atomic count-min sketches, and prefilter/main high-depth splits. The guarded human slice processed 2,000 reads / 300,000 bases in 0.656408s with 310,256 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test prefilter -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, and `READS=1000 TABLE_READS=1000 THREADS=2 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_prefilter_flag_default_fraction_1k_20260425` passed after making bare `prefilter=t` use BBTools' default 35% prefilter/main memory partition whenever Rust is already on a bounded count-min path. Small non-sketch inputs still remain exact; the guarded human slice processed 2,000 reads / 300,000 bases in 0.604481s with 310,180 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test prefilter -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=2 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_prefilter_limit_gate_1k_20260425` passed after making the two-stage count-min path carry an explicit KCountArray-style `prefilterLimit` gate for main-table counting, read-depth lookup, and thresholded unique-kmer estimates. The guarded human slice processed 2,000 reads / 300,000 bases in 0.705617s with 310,064 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test prefilter -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=2 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_shared_kcount_table_1k_20260425` passed after moving bounded main, output, and prefilter sketches from separated hash-row storage to one shared KCountArray-style table probed by all hash functions. The guarded human slice processed 2,000 reads / 300,000 bases in 0.604775s with 310,296 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test kcount_array -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_thread_shards_8t_1k_20260425` passed after making bounded main, output, and prefilter sketches derive their KCountArray-style internal array count from explicit Rust `threads=` settings. The guarded human slice processed 2,000 reads / 300,000 bases in 0.504708s with 284,500 KB peak RSS, kept 10 reads, tossed 1,990 reads, and wrote only histogram artifacts.
- `cargo fmt --all --check`, `cargo test active_rayon_threads_for_auto -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=auto ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_auto_thread_shards_1k_20260425` passed after making unset/`threads=auto` bounded sketches derive their KCountArray-style minimum internal array count from Rayon's active worker count. The guarded human slice processed 2,000 reads / 300,000 bases in 0.304121s with 168,584 KB peak RSS, kept 10 reads, tossed 1,990 reads, and wrote only histogram artifacts.
- `cargo fmt --all --check`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=10000 TABLE_READS=10000 THREADS=auto ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_kcount_increment_return_10k_20260425` passed after aligning packed and atomic bounded sketch updates with KCountArray7MTA-style `incrementAndReturnUnincremented` semantics and removing an extra atomic depth-histogram allocation/copy. The guarded human slice processed 20,000 reads / 3,000,000 bases in 1.371354s with 559,336 KB peak RSS, kept 188 reads, tossed 19,812 reads, and wrote only histogram artifacts.
- `cargo fmt --all --check`, `cargo test key_locked_like_kcountarray -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all -- --test-threads=1` passed after adding BBTools' 1999-lock key-striped guard around atomic conservative read-min/raise-all updates. The matching guarded human slice (`READS=10000 TABLE_READS=10000 THREADS=auto ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_kcount_locks_10k_20260425`) processed 20,000 reads / 3,000,000 bases in 1.419841s with 525,440 KB peak RSS, kept 188 reads, tossed 19,812 reads, wrote only histogram artifacts, and produced byte-identical `hist`/`rhist` versus the prior 10k run.
- `cargo fmt --all --check`, `cargo test atomic_count_min_ -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all -- --test-threads=1` passed after splitting atomic conservative updates into a key-locked direct call and a lock-free bulk replay helper for already-merged engine chunk maps. The matching guarded human slice (`READS=10000 TABLE_READS=10000 THREADS=auto ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_kcount_lock_batch_10k_20260425`) processed 20,000 reads / 3,000,000 bases in 1.419233s with 666,788 KB peak RSS, kept 188 reads, tossed 19,812 reads, wrote only histogram artifacts, and produced byte-identical `hist`/`rhist` versus both prior 10k runs. At this scale, wall time was effectively flat versus per-key locks, so larger guarded runs are still needed before claiming a measurable speed win.
- `cargo fmt --all --check`, `cargo test atomic_count_min_allocates_locks_only_for_conservative_updates -- --test-threads=1`, `cargo test atomic_count_min_ -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all -- --test-threads=1` passed after making atomic independent count-min sketches skip the BBTools-style 1999-lock stripe allocation while preserving conservative locking by default. The matching independent-row guarded human slice (`READS=10000 TABLE_READS=10000 THREADS=auto ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t symmetricwrite=f' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_independent_no_locks_10k_20260425`) processed 20,000 reads / 3,000,000 bases in 1.368810s with 604,324 KB peak RSS, kept 188 reads, tossed 19,812 reads, and wrote only histogram artifacts.
- `cargo fmt --all --check`, `cargo test long_kmer -- --test-threads=1`, `cargo test kmer::tests -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all -- --test-threads=1` passed after replacing per-window heap vectors in the Java-shaped long-kmer hash path with stack-backed word buffers for common `Kmer.xor()` layouts, while retaining a heap fallback for unusually large `k`. Source inspection confirmed that vendored BBNorm counts scalar `Kmer.xor()` fingerprints for `k>31` rather than using the `KCountArray7MTA.long[]` overload. The matching guarded human slice (`READS=10000 TABLE_READS=10000 THREADS=auto ZIPTHREADS=1 MEM=512m K=40 WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_long_k40_stack_words_10k_20260425`) processed 20,000 reads / 3,000,000 bases in 1.725197s with 370,736 KB peak RSS, kept 126 reads, tossed 19,874 reads, and wrote only histogram artifacts.
- `cargo fmt --all --check`, `cargo test long_kmer -- --test-threads=1`, `cargo test kmer::tests -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all -- --test-threads=1` passed after replacing the remaining `k>31` full-window rescans with rolling BBTools-style Java word state that emits the same scalar `Kmer.xor()` fingerprints. The matching guarded human slice (`READS=10000 TABLE_READS=10000 THREADS=auto ZIPTHREADS=1 MEM=512m K=40 WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_long_k40_rolling_10k_20260425`) processed 20,000 reads / 3,000,000 bases in 1.372024s with 382,388 KB peak RSS, kept 126 reads, tossed 19,874 reads, produced byte-identical `hist`/`rhist` versus the prior stack-buffer encoder path, and wrote only histogram artifacts. A larger capped smoke (`READS=50000 TABLE_READS=50000 ... tmp/giant_safe_long_k40_rolling_50k_20260425`) processed 100,000 reads / 15,000,000 bases in 5.333477s with 508,136 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test prefilter_and_main_sketches_use_independent_kcountarray_mask_seeds -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all -- --test-threads=1` passed after making bounded sketches carry the Java-style `KCountArray7MTA` mask seed for each table instance. This follows vendored `KCountArray7MTA.makeMasks`, where the static counter advances by 7 per table, so two-stage prefilter/main sketches now use independent seed-0/seed-7 FastRandomXoshiro mask tables instead of reusing one global mask table. The matching guarded human slice (`READS=10000 TABLE_READS=10000 THREADS=auto ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh ... tmp/giant_safe_prefilter_independent_masks_10k_20260425`) processed 20,000 reads / 3,000,000 bases in 1.785025s with 590,128 KB peak RSS, kept 188 reads, tossed 19,812 reads, produced byte-identical `hist`/`rhist` versus `tmp/giant_safe_kcount_locks_10k_20260425`, and wrote only histogram artifacts.
- `cargo fmt --all --check`, `cargo test count_min -- --test-threads=1`, and guarded human-slice release benchmarks passed after repairing the KCountArray hot path: bounded atomic sketches now fill row buckets incrementally instead of replaying row hashes, reuse the selected mask table across hash rows, avoid per-chunk sorting during deterministic approximate sketch replay, and enable schedule-dependent parallel replay only when `deterministic=f`. On `tmp/human_benchmark_8threads/human_GRCh38_500k_R{1,2}.fq.gz` with `READS=500000 TABLE_READS=500000 THREADS=8 ZIPTHREADS=8 MEM=512m WRITE_OUTPUTS=0`, the default deterministic bounded path improved from 60.722160s / 685,036 KB RSS (`tmp/gzip_scaling_finish_pass_20260425_zip8_500k_current`) to 34.552039s / 671,584 KB RSS (`tmp/hotpath_maskreuse_nosort_500k_20260425`); explicit `deterministic=f` parallel replay completed in 23.330537s / 688,056 KB RSS (`tmp/hotpath_parallel_replay_detf_500k_20260425`) with approximate sketch output allowed to vary by schedule.
- Post-hot-path Java/Rust guarded human smoke passed with `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=8 JAVA_XMX=4g MEM=512m WRITE_OUTPUTS=0 scripts/benchmark_java_rust_human.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/java_rust_human_post_hotpath_1k_20260425`: Java completed in 0.921459s / 3,425,420 KB RSS, Rust completed in 0.153106s / 253,276 KB RSS, and `hist`/`rhist` were byte-identical.
- `cargo fmt --all --check`, `cargo test constrained_count_min_caps_wide_cells_like_kcountarray -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_kcount_maxvalue_cap_1k_20260425` passed after aligning packed bounded-sketch saturation with BBTools `KCountArray.maxValue`: wide Rust packed cells now cap at signed 32-bit max instead of allowing counts above Java's `Integer.MAX_VALUE`. The guarded human slice processed 2,000 reads / 300,000 bases in 0.606421s with 280,272 KB peak RSS, kept 10 reads, tossed 1,990 reads, and wrote only histogram artifacts.
- `cargo fmt --all --check`, `cargo test accepts_constrained_count_min_controls_as_real_sketch_settings -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_kcount_parser_guard_1k_20260425` passed after tightening Rust sketch controls to BBTools KCountArray limits: main and prefilter `bits`/`cbits`/`cellbits` now require power-of-two widths from 1 to 32, main and prefilter hash counts now require 1 to 8 rows, and programmatic sketch construction clamps to the same 8-row KCountArray mask table limit. The guarded human slice processed 2,000 reads / 300,000 bases in 0.607181s with 284,724 KB peak RSS, kept 10 reads, tossed 1,990 reads, and wrote only histogram artifacts.
- `cargo fmt --all --check`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_sizing_guard_1k_20260425` passed after adding fail-fast bounded-sketch sizing guards. `matrixbits` is now parsed strictly as a BBTools exponent (`1 << matrixbits`, rejecting out-of-range exponents), and main/prefilter count-min construction verifies requested table bytes fit configured `mem`/`sketchmemory` budgets directly, or a safe fraction of currently available memory when no explicit budget is set, before KCountArray prime sizing or allocation. The guarded human slice processed 2,000 reads / 300,000 bases in 0.454191s with 284,476 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test short_kmer_sketch -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m K=10 WRITE_OUTPUTS=0 scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_shortk_cap_k10_1k_20260425` passed after adding the BBTools short-kmer universe cap for non-prefiltered bounded sketches. For `k<32`, Rust now caps the main count-min table to at most `4^k` possible short kmers when no prefilter is active, while preserving requested/partitioned sizing when prefiltering is active. The guarded `k=10` human slice processed 2,000 reads / 300,000 bases in 0.050788s with 33,088 KB peak RSS, kept 306 reads, tossed 1,694 reads, and wrote only histogram artifacts.
- `cargo fmt --all --check`, `cargo test countup_auto_memory_budget_halves_filter_bytes_like_bbnorm -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t' TIMEOUT=20m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_countup_auto_half_1k_20260425` passed after aligning automatic bounded sketch memory with BBTools `COUNTUP ? mem/2 : mem` sizing. Auto-derived count-min budgets now halve only for `countup=t`, preventing the input and kept-count sketches from each claiming the full usable table budget while leaving explicit `cells=`/`sketchmemory=` user budgets untouched; the guarded human count-up smoke processed 1,380 reads / 207,000 bases in 0.151967s with 231,648 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test countup_kept_count -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t prefilter=t' TIMEOUT=20m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_countup_prefilter_kcaup_bits_1k_20260425` passed after porting BBTools' dedicated count-up kept-count table sizing. `countup=t` bounded output counts now use the Java `kcaup` shape: three hashes, 4/8/16-bit cells selected from the 0.95-adjusted target depth, no inherited prefilter partition or short-kmer cap, and the next KCountArray mask seed after the input main/prefilter tables; the guarded human count-up+prefilter smoke processed 1,380 reads / 207,000 bases in 3.430945s with 273,788 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test multipass_caps_wide_count_min_bits_like_bbnorm -- --test-threads=1`, `cargo test multipass -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test bounded_ -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 TIMEOUT=20m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_multipass_cbits16_1k_20260425` passed after porting BBTools' `cbits>16 && passes>1` cap into Rust multipass orchestration. Default and explicit wide multipass bounded sketches now use 16-bit cells like Java, while single-pass and explicitly narrower bit widths are preserved; the guarded human multipass smoke processed 2,000 reads / 300,000 bases in 0.101347s with 253,544 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test multipass_intermediate_pass_uses_bits1_like_bbnorm -- --test-threads=1`, `cargo test accepts_default_equivalent_sketch_controls_as_noops -- --test-threads=1`, `cargo test multipass -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test bits1 -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='bits1=8' TIMEOUT=20m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_multipass_bits1_8_1k_20260425` passed after turning BBTools `bits1`/`cbits1`/`cellbits1` into real Rust intermediate-pass bounded-sketch controls. Multipass intermediate passes now use `bits1` when supplied, otherwise they inherit the Java-capped main cell width, while the final pass keeps the main bounded-sketch width; the guarded human `bits1=8` multipass smoke processed 2,000 reads / 300,000 bases in 0.101012s with 253,940 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test prefilter_default_hashes_track_main_hashes_like_bbnorm -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test hashes -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t hashes=8' TIMEOUT=20m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_prefilter_hashes8_default_prehashes_1k_20260425` passed after porting BBTools' derived prefilter-hash default. When `prehashes`/`prefilterhashes` is unset, Rust now uses `(hashes+1)/2` for the prefilter table just like `KmerNormalize`, so `prefilter=t hashes=8` builds a four-hash prefilter while explicit prefilter hash counts still override it; the guarded human smoke processed 2,000 reads / 300,000 bases in 0.505495s with 286,432 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test accepts_prefilter_controls_with_constrained_sketch_settings -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t hashes=8 prehashes=0' TIMEOUT=20m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_prefilter_prehashes0_1k_20260425` passed after matching BBTools `prehashes=0` parsing. Rust now accepts zero prefilter hashes as "leave prefilter hashes unset" rather than rejecting the command; `prehashes=0` alone does not force a prefilter table, while `prefilter=t prehashes=0` derives the Java default from the main hash count. The guarded human smoke processed 2,000 reads / 300,000 bases in 0.504262s with 279,228 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test nondefault_kcountarray_mask_seeds_are_cached -- --test-threads=1`, `cargo test countup_kept_count_sketch_uses_next_mask_seed_after_prefilter_and_main -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t prefilter=t' TIMEOUT=20m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_countup_prefilter_mask_cache_1k_20260425` passed after caching nondefault KCountArray7MTA mask seeds. Count-up output tables after prefilter/main use seed 14; Rust now builds that BBTools FastRandomXoshiro mask table once per seed instead of leaking a fresh table on every k-mer lookup, and the guarded human count-up+prefilter smoke completed in 0.505509s with 273,492 KB peak RSS.
- `cargo fmt --all --check`, `cargo test countup_prefilter_mask_seed_uses_dedicated_hot_cache -- --test-threads=1`, `cargo test nondefault_kcountarray_mask_seeds_are_cached -- --test-threads=1`, `cargo test countup_kept_count_sketch_uses_next_mask_seed_after_prefilter_and_main -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t prefilter=t' TIMEOUT=20m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_countup_prefilter_seed14_hotcache_1k_20260425` passed after moving the common count-up+prefilter seed-14 KCountArray mask table onto a dedicated lock-free `OnceLock` cache. This keeps the bounded count-up output hot path from taking the generic mask-cache mutex on every k-mer lookup; the guarded human smoke completed in 0.503999s with 268,196 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test kcount_layout_carries_resolved_mask_table_for_bucket_fills -- --test-threads=1`, `cargo test incremental_count_min_buckets_match_row_hash_replay -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t prefilter=t' TIMEOUT=20m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_countup_prefilter_layout_masks_1k_20260425` passed after moving resolved KCountArray mask-table pointers into `KCountArrayLayout`. Bounded sketch bucket fills now use the already-resolved Java mask table directly instead of resolving it from the seed on every k-mer read/update; the guarded human count-up+prefilter smoke completed in 0.506280s with 273,300 KB peak RSS and no FASTQ outputs.
- `cargo fmt --all --check`, `cargo test accepts_prefilter_controls_with_constrained_sketch_settings -- --test-threads=1`, `cargo test zero_prefilter_fraction_does_not_force_prefilter_sketch -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilterfraction=0' TIMEOUT=20m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_prefilterfraction0_1k_20260425`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t precells=0' TIMEOUT=20m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_prefilter_precells0_1k_20260425` passed after aligning zero-valued prefilter sizing controls with BBTools. `prefilterfraction=0` now disables fraction-derived prefilter sizing instead of accidentally forcing a tiny prefilter path. `precells=0` leaves explicit cells unset, so bare commands stay direct while `prefilter=t precells=0` still derives the default prefilter partition. The two guarded human smokes completed in 0.101250s / 250,588 KB RSS (`prefilterfraction=0`) and 0.507046s / 281,276 KB RSS (`prefilter=t precells=0`).
- `cargo fmt --all --check`, `cargo test accepts_prefilter_controls_with_constrained_sketch_settings -- --test-threads=1`, `cargo test explicit_prefilter_hashes_enable_default_partition_like_bbnorm -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prehashes=1' TIMEOUT=20m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_prehashes1_default_partition_1k_20260425` passed after making explicit nonzero `prehashes`/`prefilterhashes` enable the prefilter path like BBTools. Rust now applies the same default 35% prefilter/main memory partition for `prehashes=1` instead of adding an unpartitioned default-size side table; the guarded human smoke completed in 0.456681s with 281,688 KB peak RSS and no FASTQ outputs.
- `cargo test accepts_prefilter_controls_with_constrained_sketch_settings -- --nocapture`, `cargo test forced_off_prefilter_ignores_lingering_controls_like_bbnorm -- --nocapture`, `cargo test`, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `READS=1000 TABLE_READS=1000 THREADS=8 MEM=512m TIMEOUT=3m WRITE_OUTPUTS=0 EXTRA_ARGS='prehashes=1 prefilter=f' scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_prefilter_forced_off_order_1k_20260425` passed after adding an explicit prefilter forced-off parser state. Rust now preserves BBTools command-order semantics: nonzero `prehashes`/`prefiltercells` can enable prefiltering, later `prefilter=f` disables sketch construction even though earlier values remain available, and later prefilter controls can re-enable it. The guarded human smoke processed 2,000 reads / 300,000 bases in 0.101330s with 252,516 KB peak RSS and no FASTQ outputs.
- `cargo test parses_bare_boolean_flags_like_bbnorm -- --nocapture`, `cargo fmt --all`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test`, `cargo fmt --all --check`, and `READS=1000 TABLE_READS=1000 THREADS=8 MEM=512m TIMEOUT=3m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter' scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_bare_prefilter_1k_20260425` passed after adding BBTools-style bare boolean flag parsing for high-impact normalization/sketch/ECC/runtime switches. Documented commands such as `bbnorm.sh ... prefilter` now parse as `prefilter=t` instead of being mistaken for positional input, while unknown bare tokens remain positional. The guarded human smoke processed 2,000 reads / 300,000 bases in 0.504638s with 286,196 KB peak RSS and no FASTQ outputs.
- `cargo test accepts_prefilter_controls_with_constrained_sketch_settings -- --test-threads=1`, `cargo test parses_bare_boolean_flags_like_bbnorm -- --test-threads=1`, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test`, and `READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter pbits=4' TIMEOUT=3m scripts/benchmark_giant_safe.sh tmp/human_benchmark_8threads/human_GRCh38_500k_R1.fq.gz tmp/human_benchmark_8threads/human_GRCh38_500k_R2.fq.gz tmp/giant_safe_prefilter_pbits4_1k_20260425` passed after adding the documented BBTools `pbits` alias for prefilter cell width. Rust now accepts `prefilterbits`, `prebits`, and `pbits` on the same bounded prefilter path; the guarded human smoke processed 2,000 reads / 300,000 bases in 0.303854s with 284,264 KB peak RSS and no FASTQ outputs.
- `bash -n scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh`, tiny paired FASTQ RSS-guard smokes, an intentional `MAX_RSS_KB=1` tripwire run, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test` passed after adding benchmark RSS circuit breakers. `scripts/benchmark_giant_safe.sh` now accepts `MAX_RSS_KB` and writes `rss_guard.tsv`, while `scripts/benchmark_java_rust_human.sh` accepts shared `MAX_RSS_KB` plus `JAVA_MAX_RSS_KB`/`RUST_MAX_RSS_KB`; exceeding the guard exits 125 before follow-on work can keep stressing the workstation. The passing tiny Rust-only guard recorded 15,184 KB RSS, the intentional failure recorded 15,420 KB RSS against a 1 KB cap, and the tiny Java/Rust guard recorded 204,900 KB for Java and 15,424 KB for Rust.
- `python3 -m py_compile scripts/measure_command.py`, `bash -n scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh`, direct `measure_command.py` RSS tripwires, guarded tiny paired FASTQ smokes, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all` passed after turning the post-run RSS guard into a live process-tree circuit breaker. `scripts/measure_command.py` now accepts `--max-rss-kb`, terminates the command process group as soon as sampled RSS exceeds the cap, returns exit 125, and annotates stderr with `RSS guard exceeded`; `scripts/benchmark_giant_safe.sh` and `scripts/benchmark_java_rust_human.sh` pass the cap through before large Java/Rust runs begin. Intentional 1 KB guard trips completed safely under `tmp/rss_guard_giant_live_fail_20260425` and `tmp/rss_guard_java_rust_live_fail_20260425`, while high-cap controls completed under `tmp/rss_guard_giant_live_pass_20260425` and `tmp/rss_guard_java_rust_live_pass_20260425` with Java/Rust tiny hist/rhist still identical.
- `python3 -m py_compile scripts/measure_command.py`, `bash -n scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh`, tiny paired FASTQ harness smokes, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all` passed after tightening benchmark RSS-guard semantics and summaries. `MAX_RSS_KB=0/off/none/unlimited` now consistently means no guard in both giant-safe and Java/Rust benchmark scripts, invalid nonnumeric caps fail before launching the benchmark, and `benchmark_giant_safe.sh` keeps `timed_out`, `rss_guard_limit_kb`, and `rss_guard_exceeded` in `time_summary.tsv` even when repo-owned metrics are present. Guard artifacts: `tmp/rss_summary_giant_unlimited_zero_20260425` records unlimited/false with no `rss_guard.tsv`, `tmp/rss_summary_giant_live_fail_20260425` records `rss_guard_exceeded=true` and exits 125, `tmp/rss_summary_java_rust_unlimited_zero_20260425` preserves Java/Rust tiny hist/rhist identity with zero/unlimited caps, and `tmp/rss_summary_java_rust_live_fail_20260425` still kills the Rust leg at the live RSS tripwire.
- `bash -n scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, a tiny paired FASTQ Java/Rust mode-smoke, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all`, and a guarded human-slice Java/Rust comparison passed after adding benchmark mode-argument forwarding to `scripts/benchmark_java_rust_human.sh`. The harness now accepts shared `EXTRA_ARGS` plus `JAVA_EXTRA_ARGS`/`RUST_EXTRA_ARGS`, records those knobs in `environment.tsv`, and appends them before fixed hist/rhist/null-output artifact paths so stable modes can be compared without editing the script. Tiny `EXTRA_ARGS='prefilter=t' RUST_EXTRA_ARGS='autocountmin=f'` validation completed under `tmp/java_rust_extra_args_prefilter_20260425` with identical hist/rhist. A guarded human slice using `EXTRA_ARGS='k=40 fixspikes=t' READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=512m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=1000000` completed under `tmp/java_rust_human_k40_fixspikes_extraargs_1k_20260425`: Java 0.713815s / 3,436,036 KB RSS, Rust 0.101174s / 244,156 KB RSS, with identical `hist` and `rhist`.
- `bash -n scripts/benchmark_java_rust_modes.sh`, tiny paired and single-end Java/Rust mode smokes, a guarded three-mode human-slice matrix, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all` passed after adding `scripts/benchmark_java_rust_modes.sh`. The wrapper runs stable mode cases through `scripts/benchmark_java_rust_human.sh`, supports custom `MODE_ARGS_<name>` definitions, records `config.tsv`, emits a combined `summary.tsv`, and handles single-end inputs without accidentally treating the output directory as read 2. Tiny paired validation under `tmp/java_rust_modes_tiny_20260425_040022` covered `default`, `prefilter`, `k40`, and custom `prefilter=t hashes=4`; all modes had status 0 and identical Java/Rust `hist`/`rhist`. A guarded human slice using `MODE_CASES='default prefilter k40_fixspikes' READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=512m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=1000000` completed under `tmp/java_rust_modes_human_1k_20260425_040053`: default Java 0.666584s / 3,419,816 KB RSS and Rust 0.101598s / 251,440 KB RSS; prefilter Java 0.969149s / 3,500,236 KB RSS and Rust 0.505310s / 285,656 KB RSS; k40+fixspikes Java 0.819813s / 3,430,128 KB RSS and Rust 0.101043s / 245,396 KB RSS. All three human modes produced identical Java/Rust `hist` and `rhist`.
- `bash -n scripts/benchmark_java_rust_modes.sh`, a tiny pass matrix, a tiny intentional Rust RSS-guard failure matrix, a guarded two-mode human multipass matrix, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all` passed after hardening `scripts/benchmark_java_rust_modes.sh` so mode failures or non-identical `hist`/`rhist` comparisons make the wrapper exit nonzero by default after writing `summary.tsv`. Exploratory runs can still set `ALLOW_MODE_FAILURES=1`, and approximate-sketch sweeps can set `REQUIRE_IDENTICAL_COMPARISONS=0`. The negative tiny run under `tmp/java_rust_modes_gate_tiny_fail_20260425_041607` tripped `RUST_MAX_RSS_KB=1`, recorded a status-1 wrapper exit, and preserved the failure summary. The new guarded human multipass matrix under `tmp/java_rust_modes_human_multipass_1k_20260425_041625` matched Java `hist`/`rhist` exactly for `passes=2` and `passes=2 ecc=t markuncorrectableerrors=t`; `passes=2` ran Java 0.921154s / 3,985,304 KB RSS vs Rust 0.253781s / 254,132 KB RSS, and the ECC+mark mode ran Java 1.273141s / 4,093,396 KB RSS vs Rust 0.252486s / 255,636 KB RSS.
- `bash -n scripts/benchmark_java_rust_human.sh scripts/benchmark_java_rust_modes.sh`, a tiny expected-Java-failure matrix, a guarded human count-up expected-failure matrix, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all` passed after teaching the Java/Rust harnesses to continue the Rust leg for known Java-failing modes. `scripts/benchmark_java_rust_human.sh` now accepts `ALLOW_JAVA_FAILURE=1`, records it in `environment.tsv`, and runs Rust even if the Java leg exits nonzero; `scripts/benchmark_java_rust_modes.sh` now accepts `EXPECTED_FAILURE_MODES`, forwards `ALLOW_JAVA_FAILURE=1` only for those modes, and adds `expected_failure`, `java_status`, and `rust_status` columns to `summary.tsv` so expected Java failures do not mask Rust failures. The tiny control under `tmp/java_rust_modes_expected_java_fail_tiny_20260425_043140` forced Java to exceed `JAVA_MAX_RSS_KB=1`, then verified Rust still completed. The guarded human `countup=t` probe under `tmp/java_rust_modes_human_countup_expected_1k_20260425_043151` recorded Java status 1 after `BBNorm terminated in an error state` while Rust completed successfully in 0.304005s / 235,484 KB RSS, processing 1,380 reads / 207,000 bases with no FASTQ outputs.
- `bash -n scripts/benchmark_java_rust_human.sh scripts/benchmark_java_rust_modes.sh` and `WRITE_OUTPUTS=1` tiny paired Java/Rust validation passed after making `scripts/benchmark_java_rust_human.sh` compare sequence-output payloads rather than raw gzip bytes. The harness now streams `.fq.gz` outputs through Python gzip before comparison, avoiding false differences from Java/Rust compression metadata or block layout while keeping memory bounded; the control under `tmp/java_rust_outputs_tiny_20260425_044833` reported identical Java/Rust `hist`, `rhist`, keep1/2, and toss1/2 outputs.
- `cargo test countup_prepass_requires_both_mates_bad_like_java -- --nocapture` passed after aligning Rust count-up prepass with Java's `REQUIRE_BOTH_BAD=(rbb || COUNTUP)` setup. Rust now preserves the paired “both mates must be bad” toss semantics during the relaxed count-up prepass even though the prepass temporarily disables `count_up`, preventing a single error-marked mate from dropping an otherwise usable pair before sorted count-up selection.
- `cargo test countup_ -- --nocapture` passed after adding Java-shaped final count-up `tossbadreads=t` handling. Rust count-up now computes the same sorted input-depth spike error count used by BBTools and applies the two post-keep `TOSS_ERROR_READS` rejection rules, while `keepall=t` still overrides the toss.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all`, and a guarded Rust-only human-slice count-up smoke passed after the count-up `tossbadreads=t` change. The smoke (`READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t tossbadreads=t' MAX_RSS_KB=1000000 scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_tossbadreads_1k_20260425_050747`) completed in 0.252806s with 232,128 KB max RSS, processed 156 reads / 23,400 bases after count-up prepass filtering, kept 30 reads, tossed 126 reads, and wrote only histogram artifacts.
- `bash -n scripts/benchmark_java_rust_human.sh scripts/benchmark_java_rust_modes.sh` and a guarded human expected-Java-failure matrix passed after adding `countup_tossbadreads` to `scripts/benchmark_java_rust_modes.sh` and teaching the Java/Rust harness to mark comparisons as `skipped_java_failed` when Java exits nonzero. The run (`EXPECTED_FAILURE_MODES='countup_tossbadreads' MODE_CASES='countup_tossbadreads' READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=512m TIMEOUT=3m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=1000000 scripts/benchmark_java_rust_modes.sh ... tmp/java_rust_modes_human_countup_tossbadreads_expected_1k_20260425_052125`) recorded Java status 1, Rust status 0, Rust 0.253745s / 235,576 KB RSS, 156 processed reads / 23,400 bases, 30 kept reads, 126 tossed reads, and skipped Java/Rust hist/rhist comparison because Java did not complete.
- `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh` and a two-mode guarded human matrix passed after adding first-class `countup_prefilter` and `countup_prefilter_tossbadreads` modes. The run (`EXPECTED_FAILURE_MODES='countup_prefilter countup_prefilter_tossbadreads' MODE_CASES='countup_prefilter countup_prefilter_tossbadreads' READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=512m TIMEOUT=3m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=1000000 scripts/benchmark_java_rust_modes.sh ... tmp/java_rust_modes_human_countup_prefilter_expected_1k_20260425_053609`) recorded Java status 1 and Rust status 0 for both expected-failure modes. Rust `countup=t prefilter=t` completed in 0.453259s / 270,520 KB RSS with 1,380 processed reads, 30 kept, and 1,350 tossed; Rust `countup=t prefilter=t tossbadreads=t` completed in 0.453595s / 268,980 KB RSS with 156 processed reads, 30 kept, and 126 tossed; comparisons were marked `skipped_java_failed` because Java did not complete these modes.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all`, and a guarded prefilter human slice passed after surfacing BBTools-style prefilter/main unique-kmer split estimates in `RunSummary` and CLI stderr. Rust now reports the combined approximate input cardinality plus `depth 1-N` prefilter-only and `depth N+` main-table estimates without changing normalization decisions. The guarded run (`READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='prefilter=t' TIMEOUT=3m MAX_RSS_KB=1000000 scripts/benchmark_giant_safe.sh ... tmp/giant_safe_prefilter_unique_split_1k_20260425_060528`) processed 2,000 reads / 300,000 bases in 0.807058s with 285,188 KB max RSS and reported 182,060 total estimated input kmers split into 181,437 at depth 1-3 and 623 at depth 4+.
- `python3 -m py_compile scripts/extract_unique_kmer_summary.py`, `bash -n scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_java_rust_modes.sh`, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all`, and a guarded Java/Rust prefilter human-slice matrix passed after adding machine-readable unique-kmer estimate extraction to the benchmark harnesses. `scripts/benchmark_giant_safe.sh` and `scripts/benchmark_java_rust_human.sh` now write `unique_kmers.tsv`, and `scripts/benchmark_java_rust_modes.sh` promotes Java/Rust total, low-depth, high-depth, and delta estimates into `summary.tsv` without making approximate cardinality deltas fail otherwise identical hist/rhist runs. The guarded matrix (`MODE_CASES='prefilter' READS=1000 TABLE_READS=1000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=512m TIMEOUT=3m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=1000000 scripts/benchmark_java_rust_modes.sh ... tmp/java_rust_modes_human_prefilter_unique_summary_fixed_1k_20260425_063629`) matched Java/Rust `hist` and `rhist`, with Java 0.923507s / 3,502,856 KB RSS and Rust 0.808077s / 284,164 KB RSS; the unique summary recorded Java 182,068 total / 181,445 depth 1-3 / 623 depth 4+ versus Rust 182,060 total / 181,437 depth 1-3 / 623 depth 4+.
- `python3 -m py_compile scripts/extract_unique_kmer_summary.py`, `bash -n scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_java_rust_modes.sh`, and a guarded exploratory 10k human long-kmer matrix passed after extending the benchmark unique summary with histogram-derived raw/unique kmer totals plus read-depth totals. This exposed an important reporting distinction for `k=40 fixspikes=t`: Java printed a low approximate unique estimate while its own histogram unique count stayed aligned with Rust. The guarded run (`MODE_CASES='k40_fixspikes' REQUIRE_IDENTICAL_COMPARISONS=0 READS=10000 TABLE_READS=10000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=512m TIMEOUT=5m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=1500000 scripts/benchmark_java_rust_modes.sh ... tmp/java_rust_modes_human_k40_histunique_10k_20260425_065119`) completed with Java 1.024846s / 3,464,760 KB RSS and Rust 0.605572s / 370,872 KB RSS; Java/Rust printed unique estimates differed by 1,047,862, but histogram unique totals were 2,098,272 versus 2,098,279, only +7 for Rust over ~2.1M unique kmers.
- `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, `bash -n scripts/benchmark_java_rust_human.sh scripts/benchmark_java_rust_modes.sh scripts/benchmark_giant_safe.sh`, and a guarded 10k human default-mode matrix passed after adding `histogram_diffs.tsv` to Java/Rust harness runs and promoting absolute histogram/read-depth drift columns into mode summaries. This makes approximate bounded-sketch runs debuggable without requiring byte-identical histograms. The guarded run (`MODE_CASES='default' REQUIRE_IDENTICAL_COMPARISONS=0 READS=10000 TABLE_READS=10000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=512m TIMEOUT=5m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=1500000 scripts/benchmark_java_rust_modes.sh ... tmp/java_rust_modes_human_histdiff_default_10k_20260425_070511`) completed with Java 1.069643s / 3,444,968 KB RSS and Rust 0.706808s / 526,408 KB RSS; histograms were non-identical but quantified as 1,574 raw-kmer absolute drift over 2.4M raw kmers, 1,179 unique-kmer absolute drift, and rhist drift of 4 reads / 600 bases.
- `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and guarded pass/fail 10k human default-mode matrices passed after adding optional drift gates to `scripts/benchmark_java_rust_modes.sh`. Approximate sweeps can now set `REQUIRE_IDENTICAL_COMPARISONS=0` plus `MAX_HIST_ABS_RAW_DELTA`, `MAX_HIST_ABS_UNIQUE_DELTA`, `MAX_RHIST_ABS_READS_DELTA`, and/or `MAX_RHIST_ABS_BASES_DELTA` to fail only when quantified drift exceeds explicit caps. The passing guarded run (`MAX_HIST_ABS_RAW_DELTA=2000 MAX_HIST_ABS_UNIQUE_DELTA=1500 MAX_RHIST_ABS_READS_DELTA=5 MAX_RHIST_ABS_BASES_DELTA=700 ... tmp/java_rust_modes_human_drift_gate_pass_10k_20260425_071806`) completed with Java 0.916878s / 3,451,000 KB RSS and Rust 0.657729s / 488,696 KB RSS and `drift_gate=ok`; the intentional tripwire (`MAX_HIST_ABS_RAW_DELTA=1 ... tmp/java_rust_modes_human_drift_gate_fail_10k_20260425_071822`) exited nonzero with `drift_gate=fail` and reason `hist_abs_raw_delta>1`.
- `python3 -m py_compile scripts/compare_histogram_tables.py scripts/extract_unique_kmer_summary.py`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and guarded pass/fail 10k human default-mode matrices passed after adding scale-aware parts-per-million drift reporting and gates. `histogram_diffs.tsv` now includes `col2_abs_delta_ppm` and `col3_abs_delta_ppm`, mode summaries expose `hist_raw_delta_ppm`, `hist_unique_delta_ppm`, `rhist_reads_delta_ppm`, and `rhist_bases_delta_ppm`, and approximate sweeps can fail on `MAX_HIST_RAW_DELTA_PPM`, `MAX_HIST_UNIQUE_DELTA_PPM`, `MAX_RHIST_READS_DELTA_PPM`, and/or `MAX_RHIST_BASES_DELTA_PPM`. The passing guarded run (`MAX_HIST_RAW_DELTA_PPM=700 MAX_HIST_UNIQUE_DELTA_PPM=600 MAX_RHIST_READS_DELTA_PPM=250 MAX_RHIST_BASES_DELTA_PPM=250 ... tmp/java_rust_modes_human_ppm_drift_gate_pass_10k_20260425_073231`) recorded Java 0.917199s / 3,452,516 KB RSS, Rust 0.605830s / 494,388 KB RSS, and `drift_gate=ok`; the intentional tripwire (`MAX_HIST_RAW_DELTA_PPM=10 ... tmp/java_rust_modes_human_ppm_drift_gate_fail_10k_20260425_073247`) exited nonzero with `drift_gate=fail` and reason `hist_raw_delta_ppm>10`.
- `bash -n scripts/benchmark_java_rust_modes.sh`, fake-harness pass/fail profile tripwires, and a guarded four-mode 10k human slice passed after adding `DRIFT_GATE_PROFILE` presets to `scripts/benchmark_java_rust_modes.sh`. `DRIFT_GATE_PROFILE=bounded` now opts approximate-sketch sweeps into `REQUIRE_IDENTICAL_COMPARISONS=0` plus broad ppm caps (`MAX_HIST_RAW_DELTA_PPM=5000`, `MAX_HIST_UNIQUE_DELTA_PPM=5000`, `MAX_RHIST_READS_DELTA_PPM=2000`, `MAX_RHIST_BASES_DELTA_PPM=2000`) unless the caller overrides individual caps; `DRIFT_GATE_PROFILE=strict10k` applies the tighter 10k default-mode regression caps from the prior run. The real guarded matrix (`DRIFT_GATE_PROFILE=bounded MODE_CASES='default prefilter k40_fixspikes passes2' READS=10000 TABLE_READS=10000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=512m TIMEOUT=6m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=1500000 scripts/benchmark_java_rust_modes.sh ... tmp/java_rust_modes_human_bounded_profile_10k_20260425_074722`) recorded `drift_gate=ok` for all modes: default Java 0.969776s / 3,498,636 KB RSS vs Rust 0.656455s / 492,164 KB RSS with hist raw/unique ppm 656/528 and rhist reads/bases ppm 200/200; prefilter Java 1.230379s / 3,505,668 KB RSS vs Rust 1.766079s / 566,004 KB RSS with ppm 126/102 and 200/200; k40+fixspikes Java 1.024658s / 3,464,564 KB RSS vs Rust 0.555604s / 378,464 KB RSS with ppm 11/8 and 100/100; passes2 Java 1.230056s / 4,089,776 KB RSS vs Rust 0.907711s / 491,972 KB RSS with ppm 86/69 and 0/0.
- `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh`, fake-harness profile expansion checks, a direct `SKIP_JAVA=1` harness smoke, and a guarded 10k human production profile passed after adding mode-level benchmark profiles and safe Java skipping for known-broken count-up comparisons. `MODE_PROFILE=bounded_core` now expands to the stable comparable approximate modes and defaults `DRIFT_GATE_PROFILE=bounded`; `MODE_PROFILE=countup_expected` expands to the count-up modes, marks them expected Java failures, and skips Java by default; `MODE_PROFILE=production_probe` combines both, so giant-safe probes run Java only for stable comparable modes and go directly to Rust for count-up. The lower Java/Rust harness now accepts `SKIP_JAVA=1`, records synthetic Java status 126, marks comparisons `skipped_java_failed`, and still emits Rust timing/RSS/unique-kmer artifacts. The real guarded production run (`MODE_PROFILE=production_probe READS=10000 TABLE_READS=10000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=512m TIMEOUT=8m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=1500000 scripts/benchmark_java_rust_modes.sh ... tmp/java_rust_modes_human_production_profile_skipjava_fixed_10k_20260425_080958`) completed all eight modes: comparable default/prefilter/k40+fixspikes/passes2 rows had `drift_gate=ok`, and countup/countup_prefilter/countup_tossbadreads/countup_prefilter_tossbadreads rows had Java status 126, Rust status 0, `drift_gate=skipped_java_failed`, and Rust RSS between 400,832 and 472,432 KB.
- A larger guarded 50k human production-profile probe (`MODE_PROFILE=production_probe READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 JAVA_XMX=4g MEM=768m TIMEOUT=12m WRITE_OUTPUTS=0 JAVA_MAX_RSS_KB=8000000 RUST_MAX_RSS_KB=2500000 scripts/benchmark_java_rust_modes.sh ... tmp/java_rust_modes_human_production_profile_skipjava_50k_20260425_082503`) completed all eight rows and intentionally tripped the broad bounded drift gate only on the default non-prefilter mode. Default completed with Java 1.735437s / 3,524,640 KB RSS and Rust 3.128541s / 802,148 KB RSS, but failed the gate with hist raw/unique ppm 10,613/8,532, marking default bounded-sketch drift as the next tuning target. Prefilter stayed within the gate with Java 2.202508s / 3,501,900 KB RSS versus Rust 6.708396s / 803,404 KB RSS and hist ppm 874/705; k40+fixspikes stayed within the gate at 273/205 ppm; passes2 stayed within the gate at 1,589/1,276 ppm. The Rust-only, Java-skipped count-up modes all completed under the 2,500,000 KB guard: countup 4.792380s / 629,796 KB, countup_prefilter 7.570959s / 671,908 KB, countup_tossbadreads 4.439582s / 662,172 KB, and countup_prefilter_tossbadreads 7.267240s / 697,968 KB.
- `cargo fmt --all --check`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `cargo test input_count_layout_summary_reports_prefilter_and_main_tables -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all`, and guarded Java/Rust sample benchmark smokes passed after adding machine-readable bounded-sketch layout reporting. Rust stderr now emits one `Sketch layout:` row per bounded table with table role, packed/atomic kind, cells, hashes, cell width, KCountArray shard count, cells per shard, mask seed, update mode, max cell value, data bytes, and prefilter limit where applicable; `scripts/extract_unique_kmer_summary.py` now parses both Rust layout rows and Java `Made hash table`/`Made prefilter` lines into `sketch_tables`, `sketch_total_cells`, `sketch_memory_bytes`, and full layout strings, and `scripts/benchmark_java_rust_modes.sh` promotes Java/Rust sketch cell and byte totals into `summary.tsv`. The forced Rust-sketch sample validation under `tmp/java_rust_modes_sample_forced_rust_layout_20260425` kept Java/Rust `hist` and `rhist` identical while recording Java's 169,930,000 cells / 679,707,935 bytes versus Rust's 3,999,986 cells / 16,015,936 bytes, giving the next default-drift tuning loop direct table-geometry evidence instead of only RSS and `mem=` guesses.
- `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and guarded Java/Rust sample geometry smokes passed after adding benchmark sketch-geometry ratio reporting and optional gates. `scripts/benchmark_java_rust_modes.sh` now records `sketch_cell_ratio_ppm` and `sketch_memory_ratio_ppm` in `summary.tsv`, writes the corresponding minimum-ratio controls to `config.tsv`, and can fail drift gates with `MIN_RUST_JAVA_SKETCH_CELL_PPM` or `MIN_RUST_JAVA_SKETCH_MEMORY_PPM` when Rust is too underprovisioned relative to Java's emitted KCountArray geometry. The passing forced-sketch sample under `tmp/java_rust_modes_sample_geometry_ratio_20260425` recorded identical Java/Rust `hist` and `rhist` with Rust at 23,539 ppm of Java's sketch cells and 23,563 ppm of Java's sketch bytes; the intentional gate run under `tmp/java_rust_modes_sample_geometry_gate_fail_20260425` exited 1 with `drift_gate=fail` and reason `sketch_cell_ratio_ppm<900000`, showing that future large human probes can distinguish true algorithmic drift from deliberate memory-budget mismatch.
- `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, guarded Java/Rust sample geometry classification, and a single guarded 50k human default-mode geometry run passed after adding benchmark drift classification. `scripts/benchmark_java_rust_modes.sh` now records `SKETCH_UNDERPROVISIONED_PPM` in `config.tsv` and adds `drift_classification` to `summary.tsv`, labeling low Rust-vs-Java sketch geometry as `underprovisioned_sketch_ok` when outputs still match or `underprovisioned_sketch_drift` when quantified histogram drift also fails. The tiny forced-sketch validation under `tmp/java_rust_modes_sample_geometry_classification_20260425` remained Java/Rust `hist`/`rhist` identical and classified as `underprovisioned_sketch_ok` at ~23,500 ppm of Java's table. The 50k human default probe under `tmp/java_rust_modes_human_default_geometry_classified_50k_20260425` completed with Java 1.580865s / 3,529,984 KB RSS and Rust 2.823897s / 810,780 KB RSS, but classified the bounded drift as `underprovisioned_sketch_drift`: Rust used 103,765,544 cells / 415,078,168 bytes versus Java's 757,190,000 cells / 3,027,951,944 bytes (137,040 cell ppm / 137,082 memory ppm), while hist raw/unique drift remained 10,613/8,532 ppm.
- `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and a guarded Java/Rust sample benchmark passed after adding Java-geometry-derived Rust memory recommendations to the mode harness. `scripts/benchmark_java_rust_modes.sh` now appends `rust_mem_for_java_sketch_bytes` and `rust_mem_for_java_sketch` to `summary.tsv`, using the inverse of BBTools' `max((memory-96000000)*0.73, memory*0.45)` filter-byte sizing to suggest the Rust `mem=` needed for comparable table geometry. The safe sample artifact `tmp/java_rust_modes_sample_mem_recommendation_20260425` stayed Java/Rust `hist`/`rhist` identical, classified as `underprovisioned_sketch_ok`, and recommended `mem=980m` to match Java's 679,707,935-byte table versus the deliberately tiny Rust 16,015,936-byte sketch. Applying the same calculation to the 50k human default Java table from `tmp/java_rust_modes_human_default_geometry_classified_50k_20260425` suggests roughly `mem=4048m`, confirming that the earlier `mem=768m` run was intentionally far below Java-equivalent geometry rather than a clean sketch-parity test.
- `bash -n scripts/benchmark_java_rust_human.sh scripts/benchmark_java_rust_modes.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and guarded Java/Rust sample auto-memory smokes passed after adding optional Java-derived Rust memory autotuning to the lower benchmark harness. `scripts/benchmark_java_rust_human.sh` now accepts `RUST_MEM_AUTO_FROM_JAVA=1`, parses Java's emitted sketch bytes before launching Rust, appends a final `mem=<recommended>` override when under `RUST_MEM_AUTO_MAX_BYTES`, and records `rust_mem_auto_status`, Java sketch bytes, recommended bytes, recommended `mem=`, and final Rust command in `environment.tsv`. The cap-control artifact `tmp/java_rust_human_auto_mem_cap_skip_sample_20260425` recommended `980m` but correctly skipped applying it under a 20 MB cap. The applied auto-table artifact `tmp/java_rust_human_auto_mem_applied_auto_table_sample_20260425` used Java's 288,022,856-byte table to append `mem=468m`, producing a Rust 246,410,128-byte bounded table with identical Java/Rust `hist` and `rhist` while staying under the 1,000,000 KB Rust RSS guard.
- `cargo fmt --all --check`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py` passed after splitting Rust stage telemetry for input counting into `input_exact_counting`, `input_prefilter_counting`, and `input_main_counting` while preserving aggregate `input_counting`. The tiny prefilter sample under `tmp/java_rust_modes_stage_detail_prefilter_sample_20260425` stayed Java/Rust `hist` and `rhist` identical and showed Rust split timing of 0.013184s prefilter, 0.015207s main, and 0.452368s summary scan. The guarded 50k human prefilter probe under `tmp/java_rust_modes_human_prefilter_stage_detail_50k_20260425` auto-sized Rust to Java-equivalent sketch geometry (`mem=4344m`, 3,025,556,576 Rust sketch bytes versus 3,025,498,276 Java bytes), completed with Java 2.144598s / 3,655,584 KB RSS and Rust 6.454518s / 3,389,660 KB RSS, passed the bounded drift gate, and isolated Rust time as 2.428338s in prefilter counting, 1.755222s in main counting, 1.424879s in summary-count scanning, 0.259472s in input hist, 0.239313s in input rhist, and 0.288588s in normalization. The next performance target is therefore a parallel/atomic compact prefilter update path or equivalent KCountArray-shaped prefilter replay, followed by further summary-scan reduction.
- `cargo fmt --all`, `cargo fmt --all --check`, `cargo test prefilter -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py` passed after moving the prefilter gate into per-thread k-mer collection for prefiltered main-count passes. This preserves the previous deterministic semantics by matching the old collect-then-retain result, but avoids inserting low-depth prefilter-rejected kmers into temporary chunk maps. The sample validation under `tmp/java_rust_modes_prefilter_earlygate_sample_20260425` kept Java/Rust `hist` and `rhist` identical and reduced sample Rust main counting from the prior 0.015207s to 0.011908s. The guarded 50k human prefilter probe under `tmp/java_rust_modes_human_prefilter_earlygate_50k_20260425` auto-sized Rust to Java-equivalent sketch geometry (`mem=4344m`, 3,025,556,576 Rust sketch bytes versus 3,025,498,276 Java bytes), passed the bounded drift gate, and improved Rust elapsed time from the prior 6.454518s / 3,389,660 KB RSS to 5.155066s / 3,274,504 KB RSS. Stage timing isolated the win: main counting dropped from 1.755222s to 0.393165s while prefilter counting remained the dominant 2.396684s bottleneck and summary-count scanning was 1.501501s.
- `cargo fmt --all --check`, `cargo test atomic_packed -- --test-threads=1`, `cargo test nondeterministic_input_prefilter_uses_atomic_packed_sketch -- --test-threads=1`, `cargo test prefilter_gate_during_collection_matches_post_retain -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py` passed after adding an opt-in atomic packed prefilter table for `deterministic=f`. Deterministic runs still use the stable packed prefilter path, while schedule-dependent runs can update the compact KCountArray-shaped prefilter directly with atomic packed words and key-striped conservative-update guards. The tiny sample validation under `tmp/java_rust_modes_atomicpacked_prefilter_detf_sample_20260425` kept Java/Rust `hist` and `rhist` identical. The guarded 50k human prefilter probe under `tmp/java_rust_modes_human_atomicpacked_prefilter_detf_50k_20260425` auto-sized Rust to Java-equivalent sketch geometry, passed the bounded drift gate, and improved Rust elapsed time from the deterministic early-gate run's 5.155066s / 3,274,504 KB RSS to 3.375530s / 2,983,792 KB RSS. Stage timing isolated the win: input prefilter counting dropped from 2.396684s to 0.427172s, total input counting dropped from 2.789866s to 0.797440s, and main counting stayed fast at 0.370244s. The largest remaining prefilter-mode bottleneck is now summary-count scanning at 1.733508s, plus roughly 0.52s combined input `hist`/`rhist` scanning.
- `cargo fmt --all --check`, `cargo test combined_primary_histograms_match_separate_collectors -- --test-threads=1`, `cargo test atomic_packed_count_min_matches_packed_sequential_updates -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py` passed after removing the next prefilter-mode scanning bottlenecks. Atomic packed prefilters now track newly occupied cells with per-chunk aggregation, so `deterministic=f` unique-kmer summaries no longer scan the multi-GB prefilter table; `InputCounts` also computes total and split unique estimates in one call instead of asking the prefilter twice. Input `hist`+`rhist` requests now share a single read pass and coverage analysis. On the guarded 50k human prefilter probe with Java-equivalent Rust sketch geometry (`tmp/java_rust_modes_human_prefilter_combined_hist_50k_20260425`), Rust improved from 3.375530s / 2,983,792 KB RSS (`tmp/java_rust_modes_human_atomicpacked_prefilter_detf_50k_20260425`) to 1.416485s / 2,985,024 KB RSS while Java took 2.102958s / 3,676,352 KB RSS. Stage timing shows summary-count scanning dropped from 1.733508s to 0.029741s and the former separate `input_hist`+`input_rhist` scans collapsed from about 0.519s combined to one 0.265288s pass; bounded drift stayed `ok` with hist raw/unique drift 4 ppm and rhist drift 840 ppm.
- `cargo fmt --all --check`, `cargo test nondeterministic_atomic_count_min_direct_path_matches_sequential_without_collisions -- --test-threads=1`, `cargo test atomic_count_min_chunked_parallel_matches_sequential_conservative_bits32 -- --test-threads=1`, `cargo test atomic_count_min_unique_kmers_honors_min_depth_threshold -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py` passed after extending the occupied-cell fast path to the main 32-bit atomic count-min sketch and adding a direct atomic main-count path for `deterministic=f`. Default deterministic bounded runs now avoid the summary table scan, while opt-in schedule-dependent runs also bypass the per-chunk hash-map/replay overhead and update the KCountArray-shaped table directly with per-chunk occupied-cell aggregation. On the guarded 50k human default-mode probe with Java-equivalent Rust sketch geometry, `deterministic=f` improved Rust from 1.821203s / 3,367,436 KB RSS (`tmp/java_rust_modes_human_default_atomic_occupied_detf_50k_20260425`) to 1.109446s / 2,972,652 KB RSS (`tmp/java_rust_modes_human_default_direct_atomic_detf_50k_20260425`), while Java took 1.488941s / 3,506,552 KB RSS. Stage timing shows main counting dropped from 1.189727s to 0.525058s and summary counting stayed effectively zero at 0.000005s; bounded drift stayed `ok` with hist raw/unique drift 4/3 ppm and rhist drift 840 ppm. Deterministic default mode also now has near-zero summary counting (`tmp/java_rust_modes_human_default_atomic_occupied_50k_20260425`), but still uses the stable map/replay insertion path for reproducibility.
- `cargo fmt --all --check`, `cargo test nondeterministic_atomic_output_counts_direct_path_matches_sequential_without_collisions -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py` passed after moving bounded output-kept count updates onto the same direct atomic KCountArray-shaped path when `deterministic=f`. Deterministic output-count runs still use the stable map/replay path, while schedule-dependent runs avoid building per-chunk kept-kmer maps during normalization and aggregate occupied-cell deltas directly into the output sketch. A guarded 50k human paired-read `histout`/`rhistout` probe with Java-derived Rust memory sizing (`tmp/java_rust_human_default_output_direct_detf_50k_20260425`) completed with Java 1.685934s / 3,495,012 KB RSS and Rust 1.772257s / 5,935,516 KB RSS under the 6,000,000 KB Rust guard. Rust stage timing was `input_main_counting=0.554342s`, `normalize=0.411173s`, `output_hist=0.257709s`, and `output_rhist=0.261263s`; bounded input drift stayed tiny at hist raw/unique 4/3 ppm and rhist 840 ppm. A Rust-only deterministic A/B on the same slice (`tmp/rust_human_output_counts_dett_50k_20260425`) took 4.088450s / 6,163,432 KB RSS, confirming the nondeterministic direct atomic path is the fast production choice for comparable output-hist workloads. 
Java did not emit `histout`/`rhistout` with null sequence outputs because the vendored `KmerNormalize` computes output histograms only when a real keep-output stream exists. This benchmark therefore intentionally treats Java as an input-hist/RSS/time comparator and Rust as the safe null-output output-hist stress target.
- `cargo fmt --all --check`, `cargo test output_count -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py` passed after making auto-sized output-kept sketches use a bounded side budget instead of duplicating the full input table. Explicit `cells=` and `sketchmemory=` still preserve the requested output sketch size, but automatic `mem=`/Java-derived sizing now gives `output_kept` 25% of the input filter bytes with a 64 MiB floor, and simultaneous output side tables use the next KCountArray mask seed after the input/prefilter tables. On the same guarded 50k human `histout`/`rhistout` stress (`tmp/java_rust_human_output_sidebudget_detf_50k_20260425`), Rust RSS dropped from 5,935,516 KB to 3,717,936 KB while wall time improved from 1.772257s to 1.664902s; Java took 1.581645s / 3,539,104 KB RSS. The emitted layouts show `input_main` at 757,115,432 cells / 3,028,477,728 bytes / mask seed 0 and `output_kept` at 189,278,744 cells / 757,130,976 bytes / mask seed 7. Input drift stayed within the previous noise envelope at hist raw/unique 3/3 ppm and rhist 840 ppm, while `output_hist` and `output_rhist` stages remained about 0.26s and 0.25s respectively. This closes the most immediate workstation-freeze risk from `histout`/`rhistout` null-output probes: output side reporting no longer doubles the Java-equivalent main-table allocation by default.
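
The side-budget rule in the entry above can be sketched as a small sizing helper. This is a minimal illustration, not the crate's actual code: the function name `output_kept_budget`, the constant name, and the `Option<u64>` parameter standing in for an explicit `cells=`/`sketchmemory=` request are all assumptions. It also ignores table-geometry rounding, which is presumably why the emitted `output_kept` size (757,130,976 bytes) is close to, but not exactly, a quarter of the 3,028,477,728-byte input table.

```rust
/// 64 MiB floor for the auto-sized output-kept sketch, as stated in the entry above.
/// The constant name is hypothetical.
const OUTPUT_KEPT_FLOOR_BYTES: u64 = 64 * 1024 * 1024;

/// Hypothetical helper: choose a byte budget for the output-kept sketch.
///
/// `explicit_bytes` models a user-requested size (for example from `cells=` or
/// `sketchmemory=`); when present it is preserved unchanged. Otherwise the
/// budget is 25% of the input filter bytes, never below the 64 MiB floor.
pub fn output_kept_budget(input_filter_bytes: u64, explicit_bytes: Option<u64>) -> u64 {
    match explicit_bytes {
        Some(bytes) => bytes,
        None => (input_filter_bytes / 4).max(OUTPUT_KEPT_FLOOR_BYTES),
    }
}
```

Under these assumptions, `output_kept_budget(3_028_477_728, None)` yields 757,119,432 bytes, while a 100 MiB input table falls back to the 64 MiB floor.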
- `cargo fmt --all --check`, `cargo test combined_primary_histograms -- --test-threads=1`, `cargo test nondeterministic_atomic_output_counts_direct_path_matches_sequential_without_collisions -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py` passed after reusing the combined histogram/read-depth collector for output-side `histout`+`rhistout` generation. Standard normalization now scans the reads once for both output depth histograms when both are requested, including the keep-filtered path that replays input-depth normalization decisions; standalone `histout` or `rhistout` still use the single-purpose collectors. On the same guarded 50k human side-budget output stress (`tmp/java_rust_human_output_combined_hist_detf_50k_20260425`), Rust improved from 1.664902s / 3,717,936 KB RSS (`tmp/java_rust_human_output_sidebudget_detf_50k_20260425`) to 1.416148s / 3,716,560 KB RSS, while Java took 1.641200s / 3,515,436 KB RSS. Stage timing shows `output_hist=0.253653s` and `output_rhist=0.000000s`, confirming the second read scan was eliminated; input drift stayed tiny at hist raw/unique 3/2 ppm and rhist 840 ppm. This makes the safe null-output output-reporting path faster than Java on the 50k human probe while keeping RSS under the 4.5 GB guard.
- `cargo fmt --all --check`, `cargo test countup_writes_histout_and_peaksout -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py` passed after caching the count-up kept-count depth histogram when both `histout` and `peaksout` are requested. Count-up output reporting now calls `kept_counts.depth_hist(config.hist_len)` once and reuses that vector for both the depth histogram file and peak calling; standalone outputs keep the prior behavior. The regression test exercises a real count-up run with both side outputs present. A guarded paired-human count-up smoke with null FASTQ output (`READS=10000 TABLE_READS=10000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t histout=... peaksout=...' MAX_RSS_KB=1600000 scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_hist_peaks_10k_20260425_ok`) completed in 2.639297s with 1,290,032 KB peak RSS, processed 3,036 reads / 455,400 bases after count-up prepass filtering, kept 824 reads, tossed 2,212 reads, and wrote both `histout.tsv` and `peaksout.tsv`. Stage timing showed one `output_hist=1.672309s` scan over the 114,091,144-cell kept sketch instead of two scans for histogram plus peaks.
- `cargo fmt --all`, `cargo test packed_count_min -- --test-threads=1`, `cargo test countup_writes_histout_and_peaksout -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after adding bounded sparse occupancy tracking to packed count-min sketches. Packed sketches now keep an exact occupied-cell count, retain a touched-slot list up to 8,000,000 cells, use that sparse list for `depth_hist` and threshold occupancy when it is still safe, and automatically fall back to full-table scans once the side list would exceed the cap. This makes count-up output reporting cheaper without letting the helper structure become an unbounded memory surprise. The matched guarded paired-human count-up smoke (`READS=10000 TABLE_READS=10000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t histout=... peaksout=...' MAX_RSS_KB=1600000 TIMEOUT=5m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_sparse_hist_8m_10k_20260425`) completed in 1.634044s with 919,212 KB peak RSS, processed 3,036 reads / 455,400 bases, kept 824 reads, tossed 2,212 reads, and wrote both `histout.tsv` and `peaksout.tsv`. Against the previous cached-hist baseline (`tmp/giant_safe_countup_hist_peaks_10k_20260425_ok`), wall time improved from 2.639297s to 1.634044s, peak RSS dropped from 1,290,032 KB to 919,212 KB, `output_hist` dropped from 1.672309s to 0.682785s, and `summary_counts` stayed effectively zero because unique-kmer estimation now uses the tracked occupied count instead of scanning the kept sketch.
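
The bounded sparse occupancy idea from the entry above can be sketched as follows. This is a simplified stand-in under stated assumptions: the type `TrackedCounts`, its field names, and the one-`u16`-per-cell layout are hypothetical (the real sketches are packed), and counting zero cells in bin 0 is a choice made for this illustration. The point is the shape of the technique: an exact occupied count, a touched-slot list that is dropped once it would exceed its cap, and a full-table scan as the fallback.

```rust
/// Simplified stand-in for a packed sketch: one `u16` counter per cell.
pub struct TrackedCounts {
    cells: Vec<u16>,
    /// Exact number of nonzero cells, maintained on every first touch.
    occupied: usize,
    /// Indexes of nonzero cells; `None` once the list would exceed the cap.
    touched: Option<Vec<usize>>,
    touched_cap: usize,
}

impl TrackedCounts {
    pub fn new(cell_count: usize, touched_cap: usize) -> Self {
        Self { cells: vec![0; cell_count], occupied: 0, touched: Some(Vec::new()), touched_cap }
    }

    pub fn increment(&mut self, index: usize) {
        let was_empty = self.cells[index] == 0;
        self.cells[index] = self.cells[index].saturating_add(1);
        if !was_empty {
            return;
        }
        self.occupied += 1;
        let cap = self.touched_cap;
        let mut over_cap = false;
        if let Some(list) = self.touched.as_mut() {
            if list.len() < cap {
                list.push(index);
            } else {
                over_cap = true;
            }
        }
        if over_cap {
            // Drop the helper list so it can never grow past the cap.
            self.touched = None;
        }
    }

    pub fn occupied(&self) -> usize {
        self.occupied
    }

    pub fn uses_sparse_list(&self) -> bool {
        self.touched.is_some()
    }

    /// Depth histogram over all cells; counts above the last bin are clamped into it.
    pub fn depth_hist(&self, hist_len: usize) -> Vec<u64> {
        let mut hist = vec![0u64; hist_len];
        if hist_len == 0 {
            return hist;
        }
        let last = hist_len - 1;
        match &self.touched {
            // Sparse path: visit only touched cells, derive the zero bin from the count.
            Some(list) => {
                hist[0] = (self.cells.len() - self.occupied) as u64;
                for &index in list {
                    hist[(self.cells[index] as usize).min(last)] += 1;
                }
            }
            // Fallback path: full-table scan.
            None => {
                for &count in &self.cells {
                    hist[(count as usize).min(last)] += 1;
                }
            }
        }
        hist
    }
}
```

In this sketch both paths produce the same histogram, so a caller can compare a small-cap instance against a large-cap one as a regression check; the unique-kmer style summary reads `occupied()` without scanning in either case.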
- `cargo fmt --all`, `cargo test packed_count_min_depth_hist -- --test-threads=1`, `cargo test countup_writes_histout_and_peaksout -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after compacting packed-sketch histogram reduction by the cell maximum. Packed `depth_hist` now reduces into `min(histlen, max_count+1)` bins and expands to the requested output length only once, preserving `histlen`/`printzerocoverage` compatibility while avoiding per-worker million-bin vectors for 4/8/16-bit sketches. On the same guarded count-up human smoke (`READS=10000 TABLE_READS=10000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t histout=... peaksout=...' MAX_RSS_KB=1600000 TIMEOUT=5m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_compact_histbins_10k_20260425`), Rust completed in 1.008465s with 418,144 KB peak RSS, processed 3,036 reads / 455,400 bases, kept 824 reads, tossed 2,212 reads, and wrote both output side reports. Relative to the sparse-slot baseline (`tmp/giant_safe_countup_sparse_hist_8m_10k_20260425`), wall time improved from 1.634044s to 1.008465s, peak RSS dropped from 919,212 KB to 418,144 KB, and `output_hist` dropped from 0.682785s to 0.027841s; relative to the original cached-hist baseline (`tmp/giant_safe_countup_hist_peaks_10k_20260425_ok`), `output_hist` dropped from 1.672309s to 0.027841s and overall RSS is now roughly one third of the original guarded smoke.
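
The compact reduction described in the entry above can be sketched like this. It is an illustration under assumptions, not the crate's implementation: the function name `compact_depth_hist`, the `u16` cell slice, and the explicit `max_count` parameter are hypothetical, out-of-range depths are assumed to clamp into the last bin, and plain `std::thread::scope` workers are used here only to keep the example dependency-free. Each worker reduces into `min(hist_len, max_count + 1)` bins, and the merged result is expanded to `hist_len` exactly once.

```rust
use std::thread;

/// Hypothetical compact histogram reduction over sketch cells.
/// `max_count` is the largest value a cell can hold (15 for a 4-bit sketch).
pub fn compact_depth_hist(
    cells: &[u16],
    max_count: u16,
    hist_len: usize,
    workers: usize,
) -> Vec<u64> {
    let compact_len = hist_len.min(max_count as usize + 1);
    if compact_len == 0 {
        return vec![0; hist_len];
    }
    let last = compact_len - 1;
    let workers = workers.max(1);
    let chunk_size = ((cells.len() + workers - 1) / workers).max(1);

    // Each worker owns only a `compact_len`-sized vector, not a `hist_len`-sized one.
    let partials: Vec<Vec<u64>> = thread::scope(|scope| {
        let handles: Vec<_> = cells
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    let mut bins = vec![0u64; compact_len];
                    for &count in chunk {
                        bins[(count as usize).min(last)] += 1;
                    }
                    bins
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("histogram worker panicked"))
            .collect()
    });

    // Merge the small partials, then expand to the requested length once.
    let mut merged = vec![0u64; compact_len];
    for partial in &partials {
        for (total, value) in merged.iter_mut().zip(partial) {
            *total += *value;
        }
    }
    merged.resize(hist_len, 0);
    merged
}
```

With a 4-bit sketch and a million-bin `hist_len`, each worker in this sketch allocates 16 bins instead of 1,000,000, which is the memory effect the entry attributes to the compaction.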
- `cargo fmt --all`, `cargo test countup_prepass -- --test-threads=1`, `cargo test countup_work -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after removing a duplicated count-up prepass analysis path. The relaxed count-up prepass now produces a `PairAnalysis` once, reuses that analysis to build the sorted work-source key whenever ECC has not mutated the read, and defers rollback clones until ECC correction actually needs them. This preserves the existing Java-shaped prepass inclusion, `addbadreadscountup`, and ECC rollback behavior while trimming CPU and allocation pressure in the common non-ECC count-up path. On the guarded paired-human count-up smoke (`READS=10000 TABLE_READS=10000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t histout=... peaksout=...' MAX_RSS_KB=1600000 TIMEOUT=5m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_prepass_reuse_10k_20260425`), Rust completed in 0.957672s with 403,920 KB peak RSS, processed 3,036 reads / 455,400 bases, kept 824 reads, tossed 2,212 reads, and wrote both output reports. Relative to the compact-hist baseline (`tmp/giant_safe_countup_compact_histbins_10k_20260425`), `countup_work_source` dropped from 0.263778s to 0.243902s, wall time improved from 1.008465s to 0.957672s, and RSS dropped from 418,144 KB to 403,920 KB.
- Guarded larger paired-human count-up validation passed after the recent count-up reporting and prepass changes. The deterministic safe probe (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_prepass_reuse_50k_20260425`) completed in 4.592378s with 486,452 KB peak RSS, processed 18,154 reads / 2,723,100 bases after the count-up prepass, kept 11,976 reads, tossed 6,178 reads, and wrote both `histout.tsv` and `peaksout.tsv`. Stage timing shows the post-optimization count-up output report remains negligible (`output_hist=0.027998s`), while the remaining deterministic cost is now table build and sorted work-source generation: `input_main_counting=2.703249s`, `countup_work_source=1.210830s`, and `countup_normalize=0.361349s`. A matched production-fast probe with schedule-dependent selection (`EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...'`, artifact `tmp/giant_safe_countup_prepass_reuse_detf_50k_20260425`) completed in 2.369755s with 261,972 KB peak RSS and the same 1.8 GB guard. The direct atomic `deterministic=f` path cut `input_main_counting` from 2.703249s to 0.453643s while preserving safe bounded memory; keep/toss counts differ slightly, as expected for nondeterministic read selection, so deterministic mode remains the reproducible output path and `deterministic=f` is the fast production-throughput choice.
- `cargo fmt --all`, `cargo test countup_ -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test prefilter -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after moving count-up work-source prepass analysis onto bounded Rayon batches. Count-up still reads pairs and assigns original indexes/random coins sequentially for reproducible ordering semantics, but it now processes up to 1024 candidate pairs at a time in parallel before feeding the existing external-sort spill budget. On the guarded paired-human fast count-up probe (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_chunked_prepass_detf_50k_20260425`), Rust completed in 1.361299s with 261,552 KB peak RSS, processed 18,176 reads / 2,726,400 bases after prepass filtering, kept 11,972 reads, tossed 6,204 reads, and wrote both side reports. Against the previous fast count-up probe (`tmp/giant_safe_countup_prepass_reuse_detf_50k_20260425`), wall time improved from 2.369755s to 1.361299s and `countup_work_source` dropped from 1.229158s to 0.247747s while RSS stayed flat/slightly lower (261,972 KB to 261,552 KB). The remaining fast-path time is now mostly input histogram/normalization rather than work-source construction: `input_main_counting=0.433732s`, `input_hist=0.265739s`, `countup_normalize=0.353988s`, and `output_hist=0.027755s`.
- `cargo fmt --all`, `cargo test output_pair_analysis_is_only_required_for_rename_or_depth_bins -- --test-threads=1`, `cargo test ecc_pair_rollback_restores_corrected_mate_when_partner_is_uncorrectable -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo test ecc_ -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after trimming two normalization-path overheads. ECC rollback clones are now lazy and only allocated when rollback is possible, with a regression covering the paired case where one mate is corrected and the other is uncorrectable. Count-up normalization now skips the post-decision `analyze_pair` entirely unless `rename=t` or low/mid/high depth-bin outputs require it; standard normalization also skips the depth-bin writer no-op when no depth-bin outputs exist. On the guarded paired-human fast count-up probe (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_skip_output_analysis_detf_50k_20260425`), Rust completed in 1.261725s with 261,064 KB peak RSS, processed 18,166 reads / 2,724,900 bases after prepass filtering, kept 11,972 reads, tossed 6,194 reads, and wrote both side reports. Against the previous chunked-prepass baseline (`tmp/giant_safe_countup_chunked_prepass_detf_50k_20260425`), wall time improved from 1.361299s to 1.261725s and `countup_normalize` dropped from 0.353988s to 0.301933s while RSS stayed flat/slightly lower (261,552 KB to 261,064 KB). 
The remaining fast-count-up cost on this slice is now mainly `input_main_counting=0.431409s`, `input_hist=0.251380s`, `countup_work_source=0.242277s`, and `countup_normalize=0.301933s`.
- `cargo fmt --all`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test countup_decision_plan_reuses_input_depth_gate_for_kept_updates -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after fusing count-up keep/toss decisions with kept-table update eligibility. The production count-up loop now builds a small `CountupDecisionPlan` while it is already probing input depths, then reuses the eligible key indexes when updating the kept-count table instead of probing the input sketch a second time. The old decision/update helpers remain test-only comparison paths, and the regression verifies the planned update matches the replayed old update while excluding below-`min_depth` keys. On the guarded paired-human fast count-up probe (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_decision_plan_detf_50k_20260425`), `countup_normalize` dropped from 0.301933s (`tmp/giant_safe_countup_skip_output_analysis_detf_50k_20260425`) to 0.286665s, with a repeat at 0.279166s (`tmp/giant_safe_countup_decision_plan_repeat_detf_50k_20260425`) and RSS still about 261 MB. Total nondeterministic wall time was noisy because `input_main_counting`, `input_hist`, and `countup_work_source` jittered upward in the repeats, so the conservative claim is a stage-local normalization win rather than a clean wall-clock win for `deterministic=f`. 
The current deterministic guarded count-up probe (`tmp/giant_safe_countup_decision_plan_dett_50k_20260425`) completed in 3.582812s with 508,640 KB peak RSS and reproduced the earlier deterministic baseline's counts (18,154 processed / 11,976 kept / 6,178 tossed). It shows the cumulative count-up work-source/normalization improvements clearly: `countup_work_source=0.246496s` and `countup_normalize=0.278158s` versus 1.210830s and 0.361349s in `tmp/giant_safe_countup_prepass_reuse_50k_20260425`.
- `cargo fmt --all`, `cargo test countup_work_source_collects_input_histograms_like_separate_collectors -- --test-threads=1`, `cargo test combined_primary_histograms -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after folding count-up input histogram/read-depth histogram generation into the count-up work-source pass. Count-up no longer performs a separate input read pass solely for `hist`/`rhist`; instead the bounded chunked work-source builder analyzes each candidate chunk once for both sorted count-up work and optional input-side histograms, while preserving the old standalone collectors for non-count-up paths and single-purpose callers. The regression compares the fused count-up work-source histograms against the separate primary `hist` and `rhist` collectors on the same exact-count fixture. On the guarded paired-human fast count-up probe (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_combined_input_hist_detf_50k_20260425`), wall time improved from 1.311002s / 261,076 KB RSS (`tmp/giant_safe_countup_decision_plan_detf_50k_20260425`) to 1.209808s / 261,468 KB RSS, with `input_hist` dropping from 0.261311s to 0.191490s and `countup_work_source` from 0.272297s to 0.243878s. 
The deterministic reproducibility probe (`tmp/giant_safe_countup_combined_input_hist_dett_50k_20260425`) preserved the exact prior processed/kept/tossed and unique-kmer outputs from `tmp/giant_safe_countup_decision_plan_dett_50k_20260425` (18,154 processed, 11,976 kept, 6,178 tossed; input/output unique kmers 11,142,904/34,679), while wall time stayed essentially flat at 3.579701s versus 3.582812s and RSS rose from 508,640 KB to 531,468 KB. This is a clear fast-path pass-reduction win and a wall-time-neutral deterministic change with a modest RSS increase (about 22,800 KB); the next count-up bottlenecks are the deterministic table-build path and gzip/input read throughput.
- `cargo fmt --all`, `cargo test countup_work -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after fusing the remaining count-up input-hist analysis with count-up prepass analysis. The read analyzer now separates one raw coverage lookup from config-specific interpretation, so the count-up work-source chunk builder can derive both normal input `hist`/`rhist` analysis and the relaxed count-up presort analysis from the same k-mer depth scan whenever the configs share `k`, canonical mode, and spike smoothing. The trim-after-marking path keeps the old conservative clone/reanalyze behavior to preserve ECC ordering. On the guarded paired-human fast count-up probe (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=1 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_fused_analysis_detf_50k_20260425`), wall time improved from 1.209808s / 261,468 KB RSS (`tmp/giant_safe_countup_combined_input_hist_detf_50k_20260425`) to 1.110508s / 264,224 KB RSS. Accounting moved the shared scan into `countup_work_source`: `input_hist` dropped from 0.191490s to 0.000223s, `countup_work_source` rose from 0.243878s to 0.295206s, and the combined input-hist/work-source cost fell from 0.435368s to 0.295429s. 
The deterministic reproducibility probe (`tmp/giant_safe_countup_fused_analysis_dett_50k_20260425`) improved on `tmp/giant_safe_countup_combined_input_hist_dett_50k_20260425` from 3.579701s / 531,468 KB RSS to 3.277415s / 511,260 KB RSS, with identical processed/kept/tossed counts, identical input/output unique-kmer summaries, and byte-identical `input.hist.tsv`, `input.rhist.tsv`, `histout.tsv`, and `peaksout.tsv`. This makes the current count-up path both faster and safer to validate at larger sizes; the remaining large-run bottlenecks are mostly deterministic input-table construction and compressed input throughput.
- `cargo fmt --all`, `cargo test gzip_threads_are_split_across_concurrent_gzip_streams -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after making Rust treat `zipthreads` as a total budget across concurrently open gzip streams instead of a per-stream multiplier. Paired gz inputs, paired gz output pairs, and paired input-list readers now split the budget conservatively across active `.gz` streams, so `threads=8 zipthreads=8` opens two mate decoders at about four pigz workers each instead of two eight-worker decoders plus eight Rayon workers. Single gzip streams still receive the full budget, and mixed plain/gzip pairs only count the gzip path. On the guarded paired-human fast count-up probe with external pigz available (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=8 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_zipbudget8_detf_50k_20260425`), Rust completed in 1.061903s with 261,372 KB peak RSS, processed 18,128 reads / 2,719,200 bases, kept 11,960 reads, tossed 6,168 reads, and stayed below the 1.8 GB guard. A comparison run with `ZIPTHREADS=16` (`tmp/giant_safe_countup_zipbudget16_detf_50k_20260425`), which now approximates the previous two `pigz -p 8` shape, was effectively tied at 1.060625s / 262,108 KB RSS; the safer `ZIPTHREADS=8` cap also beat the prior single-thread gzip fused-analysis probe (`tmp/giant_safe_countup_fused_analysis_detf_50k_20260425`, 1.110508s / 264,224 KB RSS). 
This closes a practical workstation-freeze risk from accidental gzip oversubscription without reducing throughput on the 50k human guard slice.
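The budget split described above can be sketched as follows. This is a minimal illustration, not the code in `bbnorm-rs`: the function name `per_stream_gzip_workers`, the `.gz` suffix check, and the floor-division rounding are assumptions chosen to reproduce the numbers quoted in this entry (two gzip mates at `zipthreads=8` get four workers each, a single gzip stream keeps the full budget, and plain files are not counted).

```rust
/// Hypothetical helper: split a total `zipthreads` budget across the gzip
/// streams that are open at the same time. Plain (non-gzip) paths are not
/// counted, and every gzip stream gets at least one worker.
fn per_stream_gzip_workers(zipthreads: usize, open_paths: &[&str]) -> usize {
    // Only `.gz` paths consume gzip workers (assumed detection rule).
    let active_gz = open_paths.iter().filter(|p| p.ends_with(".gz")).count();
    if active_gz == 0 {
        return 0; // no gzip stream is open, so no gzip workers are needed
    }
    // Conservative floor division, never dropping below one worker.
    (zipthreads / active_gz).max(1)
}
```

Under these assumptions, `per_stream_gzip_workers(8, &["r1.fq.gz", "r2.fq.gz"])` gives 4 workers per mate, while a mixed pair such as `["r1.fq.gz", "r2.fq"]` leaves all 8 workers to the one gzip stream.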
- `cargo fmt --all`, `cargo test countup_work_source_collects_input_histograms_like_separate_collectors -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after carrying primary input `SeqFormat` metadata through the count-up work-source builder. `run_countup` no longer reopens `PrimaryReaders` after `collect_countup_work_source` solely to rediscover writer formats, which removes a redundant gz input open/decompressor spawn in count-up runs while preserving output format semantics. On the guarded paired-human fast count-up probe with split gzip budget (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=8 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_reuse_format_zip8_detf_50k_20260425`), Rust completed in 1.011057s with 261,724 KB peak RSS, processed 18,124 reads / 2,718,600 bases, kept 11,938 reads, tossed 6,186 reads, and wrote histogram/peaks artifacts only. Against the prior split-gzip baseline (`tmp/giant_safe_countup_zipbudget8_detf_50k_20260425`), wall time moved from 1.061903s to 1.011057s at effectively unchanged RSS; stage timing was `input_main_counting=0.418346s`, `input_hist=0.000166s`, `countup_work_source=0.266015s`, `countup_normalize=0.286731s`, and `output_hist=0.030804s`.
- `cargo fmt --all`, `cargo test seqio -- --test-threads=1`, `cargo test reader_threaded_gzip_input_round_trips_fastq -- --test-threads=1`, `cargo test writer_parallel_gzip_output_round_trips_fastq -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after increasing FASTQ/FASTA file, gzip decoder, external pigz/unpigz pipe, and sequence-output buffers to a bounded 1 MiB per active stream. This replaces the standard 8 KiB wrappers without changing parser/writer semantics, trimming syscall/pipe churn for huge compressed reads and large FASTQ outputs while adding only small per-stream memory overhead. On the guarded paired-human fast count-up probe with split gzip budget (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=8 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_iobuf1m_zip8_detf_50k_20260425`), Rust completed in 1.010024s with 261,264 KB peak RSS, processed 18,158 reads / 2,723,700 bases, kept 11,970 reads, tossed 6,188 reads, and wrote histogram/peaks artifacts only. Against the immediate prior reuse-format guard (`tmp/giant_safe_countup_reuse_format_zip8_detf_50k_20260425`, 1.011057s / 261,724 KB RSS), the small 50k slice was effectively flat; stage timing was `input_main_counting=0.422583s`, `input_hist=0.000125s`, `countup_work_source=0.260684s`, `countup_normalize=0.286290s`, and `output_hist=0.028893s`, so larger compressed-input/output-writing probes are still needed to quantify the I/O-buffer win.
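As a rough illustration of the buffer change above, the standard library lets a caller replace the default 8 KiB `BufReader`/`BufWriter` capacity with an explicit one. The constant and helper names below are hypothetical; only `BufReader::with_capacity` and `BufWriter::with_capacity` are real `std` APIs, and the actual wrappers in `bbnorm-rs` may be structured differently.

```rust
use std::io::{BufReader, BufWriter, Read, Write};

/// Assumed name for the bounded per-stream buffer size (1 MiB).
const STREAM_BUFFER_BYTES: usize = 1 << 20;

/// Wrap any reader (file, gzip decoder, child-process pipe) in a 1 MiB buffer.
fn large_reader<R: Read>(inner: R) -> BufReader<R> {
    BufReader::with_capacity(STREAM_BUFFER_BYTES, inner)
}

/// Wrap any writer (file, gzip encoder) in a 1 MiB buffer.
fn large_writer<W: Write>(inner: W) -> BufWriter<W> {
    BufWriter::with_capacity(STREAM_BUFFER_BYTES, inner)
}
```

Because the buffer is allocated once per wrapped stream, the added memory scales with the number of active streams rather than with input size.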
- `cargo fmt --all`, `cargo test countup_work_candidate_memory_hint_tracks_payload_size -- --test-threads=1`, `cargo test countup_prepass_chunk_ready_respects_pair_and_byte_limits -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after making count-up work-source prepass chunks byte-bounded as well as pair-bounded. The retained production path keeps the proven 1024-pair Rayon chunk cap for short-read throughput, but now also tracks candidate payload size and flushes at 16 MiB so long-read count-up runs cannot accumulate arbitrarily large prepass chunks before analysis/spill. A tried 4096-pair short-read batch was not retained because it regressed the 50k guard (`tmp/giant_safe_countup_bytechunk4096_zip8_detf_50k_20260425`, 1.062258s / 264,304 KB RSS) without improving `countup_work_source`. The final guarded paired-human fast count-up probe (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=8 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_bytecap1024_zip8_detf_50k_20260425`) completed in 1.010276s with 261,392 KB peak RSS, processed 18,174 reads / 2,726,100 bases, kept 11,986 reads, tossed 6,188 reads, and stayed flat versus the immediate prior I/O-buffer guard (`tmp/giant_safe_countup_iobuf1m_zip8_detf_50k_20260425`, 1.010024s / 261,264 KB RSS). Stage timing was `input_main_counting=0.418857s`, `input_hist=0.000120s`, `countup_work_source=0.261841s`, `countup_normalize=0.289550s`, and `output_hist=0.028537s`.
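The dual bound on prepass chunks can be sketched like this. The 1024-pair cap and the 16 MiB flush threshold come from the entry above; the `PrepassChunk` type, its fields, and the idea of passing a precomputed payload size into `push` are illustrative assumptions, not the project's actual types.

```rust
/// Pair cap retained for short-read throughput (from the entry above).
const MAX_CHUNK_PAIRS: usize = 1024;
/// Byte cap so long reads cannot grow a chunk without bound (16 MiB).
const MAX_CHUNK_BYTES: usize = 16 * 1024 * 1024;

/// Hypothetical accumulator that tracks only the chunk's size counters.
#[derive(Default)]
struct PrepassChunk {
    pairs: usize,
    payload_bytes: usize,
}

impl PrepassChunk {
    /// Record one candidate pair with its approximate payload size in bytes.
    fn push(&mut self, pair_payload_bytes: usize) {
        self.pairs += 1;
        self.payload_bytes = self.payload_bytes.saturating_add(pair_payload_bytes);
    }

    /// A chunk is ready to flush when either bound is reached.
    fn ready(&self) -> bool {
        self.pairs >= MAX_CHUNK_PAIRS || self.payload_bytes >= MAX_CHUNK_BYTES
    }
}
```

With short reads the pair cap triggers first; with multi-megabyte long reads the byte cap flushes the chunk after only a handful of pairs.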
- `cargo fmt --all`, `cargo test countup_presort_tie_breaks_by_record_id_without_duplicate_key_id -- --test-threads=1`, `cargo test countup_spilled_runs_merge_like_in_memory_sort -- --test-threads=1`, `cargo test countup_compacted_run_group_preserves_sorted_order -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, `python3 -m py_compile scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py`, and `cargo fmt --all --check` passed after removing the duplicated read-ID string from count-up sort keys. Count-up work pairs already carry the first read's `SequenceRecord`, so in-memory sort and external run merge now use `pair.r1.id` for the same tie-breaker instead of cloning and serializing a second ID inside `CountupSortKey`; this reduces per-pair heap payload and temp-run bytes on large/spilled count-up runs while preserving sort order. On the guarded paired-human fast count-up probe (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=8 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_sortkey_nodupid_repeat_zip8_detf_50k_20260425`), Rust completed in 1.010596s with 260,756 KB peak RSS, processed 18,138 reads / 2,720,700 bases, kept 11,972 reads, tossed 6,166 reads, and stayed flat versus the byte-capped prepass guard (`tmp/giant_safe_countup_bytecap1024_zip8_detf_50k_20260425`, 1.010276s / 261,392 KB RSS). 
Stage timing was `input_main_counting=0.418667s`, `input_hist=0.000137s`, `countup_work_source=0.261991s`, `countup_normalize=0.276217s`, and `output_hist=0.028123s`; an earlier run (`tmp/giant_safe_countup_sortkey_nodupid_zip8_detf_50k_20260425`) showed the expected nondeterministic jitter but stayed within the same safe RSS envelope.
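The tie-break change above amounts to comparing on the key first and then borrowing the ID already stored in the first read, rather than keeping a second copy inside the key. The sketch below uses the names `SequenceRecord`, `CountupSortKey`, and `r1.id` from this entry, but the `depth` field, the `WorkPair` type, and the ascending order are placeholders; the real sort key contents are not described here.

```rust
use std::cmp::Ordering;

/// Minimal stand-in for the first read of a pair.
struct SequenceRecord {
    id: String,
}

/// Placeholder key: the real key fields are not shown in this log.
struct CountupSortKey {
    depth: u64,
}

/// Hypothetical work pair: the key carries no read ID of its own.
struct WorkPair {
    key: CountupSortKey,
    r1: SequenceRecord,
}

/// Compare by key, then break ties with the ID borrowed from `r1`.
fn compare_pairs(a: &WorkPair, b: &WorkPair) -> Ordering {
    a.key
        .depth
        .cmp(&b.key.depth)
        .then_with(|| a.r1.id.cmp(&b.r1.id))
}
```

Passing `compare_pairs` to `sort_by` gives the same total order as a key that embeds the ID, without the extra `String` clone per pair.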
- `cargo fmt --all`, `cargo test countup_run_reader_uses_large_spill_buffer -- --test-threads=1`, `cargo test countup_spilled_runs_merge_like_in_memory_sort -- --test-threads=1`, `cargo test countup_compacted_run_group_preserves_sorted_order -- --test-threads=1`, `cargo test gzip_threads_are_split_across_concurrent_gzip_streams -- --test-threads=1`, `cargo test output_gzip_threads_are_split_across_all_active_output_streams -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all -- --test-threads=1` passed after tightening two huge-run I/O safety paths. Count-up external sort run files now use explicit 1 MiB readers/writers for initial spills and compaction merges, avoiding tiny default buffers once count-up leaves memory for temp runs. Gzip output writers now share one zipthread budget across all active keep/toss/low/mid/high/outuncorrected streams at the current input-list index instead of granting each output pair the full budget; with `threads=8 zipthreads=8`, four paired keep/toss `.fq.gz` streams now receive two compressor workers each rather than allowing multiple independent pools to oversubscribe the workstation. The guarded paired-human count-up probe with real compressed outputs (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=8 MEM=512m WRITE_OUTPUTS=1 EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_output_zipbudget_global_50k_20260425`) completed safely in 1.115976s with 269,356 KB peak RSS, processed 18,168 reads / 2,725,200 bases, kept 11,956 reads, tossed 6,212 reads, and wrote keep/toss FASTQ.gz plus hist/peaks artifacts without tripping the RSS guard.
- `cargo fmt --all`, `cargo test packed_count_min_disables_slot_tracking_for_large_tables -- --test-threads=1`, `cargo test packed_count_min_layout_reports_tracked_slot_memory -- --test-threads=1`, `cargo test packed_count_min_untracked_depth_hist_uses_compact_reducers -- --test-threads=1`, `cargo test count_min -- --test-threads=1`, `cargo test sketch -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all -- --test-threads=1` passed after making large packed count-min sketches more memory-honest. Packed sketches now skip occupied-slot side-vector tracking when the table is larger than `PACKED_SKETCH_TRACKED_SLOT_LIMIT`, preventing a hidden side allocation that could approach 64 MiB per huge packed sketch before being discarded; smaller packed sketches keep the fast tracked-slot path, and packed sketch layout summaries now include actual tracked-slot vector capacity when present. This exposed and fixed a fallback histogram reducer bug: the untracked packed-table `depth_hist` path was allocating `hist_len`-sized Rayon reducer buffers instead of compact `max_count+1` buffers. The first guarded paired-human count-up probe with tracking disabled (`tmp/giant_safe_countup_packed_notrack_50k_20260425`) proved the bug by jumping to 1.722393s / 1,072,892 KB RSS with `output_hist=0.718086s`; after compact reducers, the same guarded probe (`tmp/giant_safe_countup_packed_notrack_compacthist_50k_20260425`) completed safely in 1.062552s / 262,168 KB RSS, `output_hist=0.051264s`, processed 18,172 reads / 2,725,800 bases, kept 11,978 reads, tossed 6,194 reads, and stayed below the 1.8 GB RSS guard.
- `bash -n scripts/benchmark_biological_dataset.sh`, `python3 scripts/measure_command.py --max-rss-kb 10000 --timeout 30s -- python3 -c '...'`, and a guarded biological smoke (`READS=100 TABLE_READS=100 THREAD_CASES='1' KEEP_OUTPUTS=0 MAX_RSS_KB=2200000 TIMEOUT=2m EXTRA_ARGS='zipthreads=1' scripts/benchmark_biological_dataset.sh tmp/biological_dataset_measure_command_smoke_ok_20260425`) passed after moving the biological dataset benchmark onto `scripts/measure_command.py`. The biological benchmark now uses live process-tree RSS sampling plus optional timeout enforcement like `benchmark_giant_safe.sh`, normalizes `MAX_RSS_KB=0/off/none/unlimited`, records the measure script and timeout in `dataset.tsv`, and still writes the same `results.tsv` shape for thread-scaling comparisons. The smoke completed in 0.101041s with 1,458,112 KB peak RSS on the default S. cerevisiae paired slice and preserved output comparisons for the single thread case; a direct live RSS guard probe killed an 80 MiB sleeping Python child in 0.051875s with status 125 and `RSS guard exceeded: true`, confirming the monitor terminates process trees before long-running benchmark jobs can freeze the workstation.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_biological_dataset.sh scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh`, and `python3 -m py_compile scripts/measure_command.py scripts/extract_stage_timings.py scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py` passed after adding count-up external-spill telemetry. `RunSummary` now carries `CountupSpillSummary`, count-up spill and merge helpers record initial run count, compaction merge count, final run count, total bytes written, peak live temp bytes, and final live temp bytes, and the CLI emits a machine-readable `Count-up spill:` stderr row whenever external temp runs are used. `scripts/extract_unique_kmer_summary.py` now promotes those fields into `unique_kmers.tsv`, with a forced parser probe confirming `initial_runs=3`, `merge_runs=1`, `final_runs=1`, `bytes_written=1234`, `peak_live_bytes=999`, and `final_live_bytes=456`. The guarded paired-human fast count-up probe (`READS=50000 TABLE_READS=50000 THREADS=8 ZIPTHREADS=8 MEM=512m WRITE_OUTPUTS=0 EXTRA_ARGS='countup=t deterministic=f histout=... peaksout=...' MAX_RSS_KB=1800000 TIMEOUT=8m scripts/benchmark_giant_safe.sh ... tmp/giant_safe_countup_spilltelemetry_50k_20260425`) completed safely in 1.113676s with 261,992 KB peak RSS, processed 18,144 reads / 2,721,600 bases, kept 11,986 reads, tossed 6,158 reads, and stayed below the RSS guard. That 50k probe did not spill, as expected; the focused spill/compaction regressions (`countup_spilled_runs_merge_like_in_memory_sort`, `countup_compacted_run_group_preserves_sorted_order`, and `countup_compaction_tracks_peak_and_final_temp_bytes`) cover the forced external-run path without risking a workstation-sized benchmark.
- `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, and `cargo fmt --all --check` passed after promoting count-up spill telemetry into the Java/Rust mode-matrix summary. `scripts/benchmark_java_rust_modes.sh` now copies Rust `countup_spill_initial_runs`, `countup_spill_merge_runs`, `countup_spill_final_runs`, `countup_spill_bytes_written`, `countup_spill_peak_live_bytes`, and `countup_spill_final_live_bytes` from each mode's `unique_kmers.tsv` into top-level `summary.tsv`, beside the existing count-up stage timings. A fake skipped-Java harness smoke under `tmp/fake_spill_modes_harness/out` proved nonblank spill values survive aggregation (`7/2/1` runs, `4096` bytes written, `2048` peak live bytes, `512` final live bytes). A real safe count-up smoke (`MODE_CASES='countup' EXPECTED_FAILURE_MODES='countup' SKIP_EXPECTED_FAILURE_JAVA=1 REQUIRE_IDENTICAL_COMPARISONS=0 ALLOW_MODE_FAILURES=1 READS=100 TABLE_READS=100 THREADS=2 ZIPTHREADS=1 MEM=128m ... scripts/benchmark_java_rust_modes.sh ... tmp/java_rust_modes_countup_spill_columns_sample_20260425`) completed Rust in 0.050637s with 15,256 KB RSS and showed the new columns present but blank, as expected for a non-spilling tiny run. This makes future production-profile/count-up giant runs report temp-run pressure directly in the mode summary instead of requiring manual stderr/unique-kmer inspection.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_biological_dataset.sh`, and `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py` passed after adding count-up spill guardrails to the safe benchmark harnesses. `scripts/benchmark_java_rust_modes.sh` now accepts `MAX_COUNTUP_SPILL_INITIAL_RUNS`, `MAX_COUNTUP_SPILL_MERGE_RUNS`, `MAX_COUNTUP_SPILL_FINAL_RUNS`, `MAX_COUNTUP_SPILL_BYTES_WRITTEN`, `MAX_COUNTUP_SPILL_PEAK_LIVE_BYTES`, and `MAX_COUNTUP_SPILL_FINAL_LIVE_BYTES`, records them in `config.tsv`, and fails the mode row with `drift_classification=countup_spill_guard` even for skipped-Java expected-failure count-up modes. The fake mode harness tripwire under `tmp/countup_spill_guard_tests/mode_fail` correctly failed on `countup_spill_peak_live_bytes>1024`, while `tmp/countup_spill_guard_tests/mode_pass` passed at the exact 2048-byte cap. `scripts/benchmark_giant_safe.sh` now exposes the same spill caps, writes `countup_spill_guard.tsv`, and exits 125 when a standalone giant-safe run exceeds them; the fake unique-summary tripwire under `tmp/countup_spill_guard_tests/giant_fail` exited 125 with `status=exceeded`, while `tmp/countup_spill_guard_tests/giant_pass` recorded `status=ok`. This gives future count-up giant probes a disk/temp-run circuit breaker alongside the existing live RSS and timeout guards.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_biological_dataset.sh`, and `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py` passed after moving count-up spill protection into the Rust engine. The CLI now accepts Rust safety caps `maxcountupspillbytes`/`maxcountupspilllivebytes`/`countupspillbytes`/`countupspilllimit` for peak live external-sort temp bytes and `maxcountupspillwritebytes`/`maxcountupspillwrittenbytes`/`countupspillwritebytes` for cumulative temp-run bytes written. `spill_countup_run` and count-up run compaction enforce those caps immediately after each initial spill or merge write, so an opted-in production run can abort during count-up instead of waiting for post-run benchmark summaries. New regressions cover parser acceptance/rejection plus live-byte abort on an initial spill and cumulative-write abort during compaction. `scripts/benchmark_java_rust_human.sh` and `scripts/benchmark_giant_safe.sh` now forward `MAX_COUNTUP_SPILL_PEAK_LIVE_BYTES` and `MAX_COUNTUP_SPILL_BYTES_WRITTEN` into the Rust command as engine caps. The real tiny skipped-Java count-up smoke under `tmp/java_rust_modes_countup_engine_spillcap_sample_20260425` completed in 0.050947s / 15,572 KB RSS with no spill and confirmed the final Rust command included `maxcountupspillbytes=0`; the fake mode and giant-safe pass/fail tripwires under `tmp/countup_spill_guard_tests` still verify summary-level guard behavior.
- `cargo fmt --all`, `cargo test countup_spill -- --test-threads=1`, `cargo test accepts_constrained_count_min_controls_as_real_sketch_settings -- --test-threads=1`, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, and a final `cargo fmt --all --check` passed after extending count-up spill protection from bytes to run counts. The CLI now accepts engine abort caps `maxcountupspillinitialruns`, `maxcountupspillmergeruns`, and `maxcountupspillfinalruns` (with shorter `countupspill*` aliases) in addition to the previous byte caps. `spill_countup_run`, run compaction, and final source creation enforce those caps immediately, so a huge count-up external sort can be stopped by initial run count, merge run count, or live/final temp-run count before it creates an unsafe temp-file fanout. New regressions cover parser acceptance/rejection, initial-spill run-limit aborts, and compaction run-limit aborts. `scripts/benchmark_java_rust_human.sh` and `scripts/benchmark_giant_safe.sh` now pass the run caps to Rust, and `scripts/benchmark_java_rust_modes.sh` explicitly forwards all count-up spill cap environment values into each delegated harness run instead of depending on inherited export state. 
The real tiny skipped-Java count-up smoke under `tmp/java_rust_modes_countup_engine_run_caps_sample_20260425` completed in 0.050622s / 15,280 KB RSS with no spill and confirmed the final Rust command included `maxcountupspillinitialruns=0`, `maxcountupspillmergeruns=0`, and `maxcountupspillfinalruns=0`; the zero caps are safe on that smoke because it stays in memory, while production spilling runs will abort as soon as they exceed the selected temp-run budget.
- `cargo fmt --all`, `cargo test countup_spill -- --test-threads=1`, `cargo test accepts_constrained_count_min_controls_as_real_sketch_settings -- --test-threads=1`, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, and a final `cargo fmt --all --check` passed after closing the last count-up spill cap mismatch. Rust now accepts and enforces `maxcountupspillfinallivebytes`/`maxcountupspillfinalbytes`/`countupspillfinallivebytes` as engine abort caps on current/final live external-sort temp bytes, complementing the existing peak-live and cumulative-written byte caps. `scripts/benchmark_java_rust_human.sh` and `scripts/benchmark_giant_safe.sh` now pass `MAX_COUNTUP_SPILL_FINAL_LIVE_BYTES` into Rust instead of only recording/summary-gating it, while `scripts/benchmark_java_rust_modes.sh` already forwards that environment value to delegated harness runs. A new forced spill regression proves the engine aborts on `maxcountupspillfinallivebytes=0`, and parser coverage validates acceptance/rejection beside the other spill caps. The tiny skipped-Java count-up smoke under `tmp/java_rust_modes_countup_engine_final_live_cap_sample_20260425` completed in 0.050632s / 15,504 KB RSS with no spill and confirmed the release command included `maxcountupspillfinallivebytes=0`; future spilling count-up probes can now be stopped by initial run count, merge count, live/final run count, peak live bytes, final live bytes, or total spill bytes written before they consume unsafe disk or memory.
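The engine-side caps described in the last few entries follow a check-after-every-write pattern, sketched below for three of the caps. All type, field, and function names here are hypothetical, and the strict "greater than" comparison is an assumption modeled on the harness tripwires (which pass at the exact cap); the real `spill_countup_run` logic may differ.

```rust
/// Hypothetical optional caps; `None` means the cap is not enabled.
#[derive(Clone, Copy, Default)]
struct SpillCaps {
    max_initial_runs: Option<u64>,
    max_bytes_written: Option<u64>,
    max_peak_live_bytes: Option<u64>,
}

/// Hypothetical running counters for external-sort temp runs.
#[derive(Default)]
struct SpillTelemetry {
    initial_runs: u64,
    bytes_written: u64,
    live_bytes: u64,
    peak_live_bytes: u64,
}

/// Fail when an enabled cap is exceeded (assumed strict comparison).
fn check_cap(name: &str, observed: u64, cap: Option<u64>) -> Result<(), String> {
    match cap {
        Some(limit) if observed > limit => {
            Err(format!("{} = {} exceeds cap {}", name, observed, limit))
        }
        _ => Ok(()),
    }
}

impl SpillTelemetry {
    /// Update counters for one initial spill, then enforce caps immediately.
    fn record_initial_spill(&mut self, run_bytes: u64, caps: &SpillCaps) -> Result<(), String> {
        self.initial_runs += 1;
        self.bytes_written += run_bytes;
        self.live_bytes += run_bytes;
        self.peak_live_bytes = self.peak_live_bytes.max(self.live_bytes);
        check_cap("initial_runs", self.initial_runs, caps.max_initial_runs)?;
        check_cap("bytes_written", self.bytes_written, caps.max_bytes_written)?;
        check_cap("peak_live_bytes", self.peak_live_bytes, caps.max_peak_live_bytes)
    }
}
```

Under this model a cap of 0 is harmless for a run that never spills, because the check only runs after a spill is recorded, while the first spill of a capped run returns an error at once.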
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, and a final `cargo fmt --all --check` passed after making atomic 32-bit count-min depth histograms use compact dynamic Rayon reducers. `AtomicCountMinSketch::depth_hist` no longer allocates a `histlen`-sized buffer per worker; worker-local histograms now grow only to the largest observed depth bin and are resized to the requested output length only after reduction, preserving the existing output shape while avoiding accidental multi-worker `histlen` memory blowups on huge `bits=32` sketch runs. A new regression covers sparse high-`histlen` atomic histograms, and the non-empty guarded release smoke under `tmp/giant_safe_atomic_hist_compact_keep_sample_20260425` exercised `cells=8192 bits=32 keepall=t histlen=8192` on the paired sample FASTQ in 0.050795s with 15,576 KB peak RSS, kept all 200 reads, and emitted nonzero output histogram bins through depth 5.
- `cargo fmt --all`, `cargo test packed_count_min_depth_hist -- --test-threads=1`, `cargo test atomic_count_min_depth_hist -- --test-threads=1`, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, and a final `cargo fmt --all --check` passed after extending compact dynamic depth-hist reducers to packed count-min sketches. Packed `hist`/`histout` generation now shares the atomic dynamic reducer helper, so worker-local Rayon histograms grow only to observed nonzero depth bins instead of preallocating the requested/capped histogram length in every worker; the final vector is still resized to the requested `histlen`, preserving output shape and `printzerocoverage` behavior. New regressions cover tracked and untracked packed wide-cell sketches, alongside the existing compact small-cell reducers. A guarded count-up smoke under `tmp/giant_safe_countup_packed_dynamic_hist_sample_20260425` exercised `countup=t cells=8192 target=1000 max=2000 deterministic=f histlen=1000000` on paired sample FASTQ, produced a packed `countup_kept` sketch (`bits=16`, 8,186 cells, 16,888 bytes), wrote nonzero output bins in 0.001804s, and completed in 0.050721s with 21,848 KB peak RSS under the 1 GB guard.
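The compact reducer idea in the two entries above can be shown with a sequential stand-in for the Rayon fold/reduce. Each worker-local histogram grows only to the largest depth it observes, partial histograms are merged at the longer length, and the result is resized to `histlen` once at the end. The function names are hypothetical, the loop over chunks replaces the parallel iterator, and clamping depths at `histlen - 1` is an assumed overflow rule used only to keep the sketch total.

```rust
/// Build one worker-local histogram sized to the largest observed depth.
fn local_hist(depths: &[usize], histlen: usize) -> Vec<u64> {
    let mut hist: Vec<u64> = Vec::new();
    for &depth in depths {
        // Assumed overflow rule: fold anything past the end into the last bin.
        let bin = depth.min(histlen - 1);
        if bin >= hist.len() {
            hist.resize(bin + 1, 0);
        }
        hist[bin] += 1;
    }
    hist
}

/// Merge two compact histograms, growing only to the longer of the two.
fn merge_hist(mut a: Vec<u64>, b: Vec<u64>) -> Vec<u64> {
    if b.len() > a.len() {
        a.resize(b.len(), 0);
    }
    for (slot, value) in a.iter_mut().zip(b) {
        *slot += value;
    }
    a
}

/// Reduce all chunks, then pad to the requested output length once.
fn depth_hist(chunks: &[&[usize]], histlen: usize) -> Vec<u64> {
    if histlen == 0 {
        return Vec::new();
    }
    let mut merged = chunks
        .iter()
        .map(|chunk| local_hist(chunk, histlen))
        .fold(Vec::new(), merge_hist);
    merged.resize(histlen, 0);
    merged
}
```

With `histlen=1000000` and observed depths below 10, each partial histogram holds at most ten bins, and only the single final vector reaches the full requested length.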
- `cargo fmt --all`, `cargo test write_depth_hist -- --test-threads=1`, `cargo test packed_count_min_depth_hist -- --test-threads=1`, `cargo test atomic_count_min_depth_hist -- --test-threads=1`, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_giant_safe.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, and a final `cargo fmt --all --check` passed after removing the dense clone from k-mer histogram writing. `write_depth_hist` now streams zero-bin folding from the source histogram instead of duplicating `histlen` bins, preserving `zerobin`, `printzerocoverage`, and Java-compatible row behavior. The guarded million-bin count-up smoke under `tmp/giant_safe_countup_hist_writer_noclone_sample_20260425` completed in 0.050739s with 15,568 KB RSS, down from the previous 21,848 KB packed dynamic-reducer smoke, and `output_hist` dropped from 0.001804s to 0.001016s. Java/Rust benchmark memory recommendations now reserve three dense histogram buffers instead of `(threads+1)` per-worker buffers because compact reducers and streaming writer folding eliminated the old per-worker overhead.
- A follow-up guarded no-clone histogram writer recheck under `tmp/giant_safe_countup_hist_writer_noclone_recheck_20260425` completed in 0.050828s with 15,328 KB RSS and `output_hist=0.001228s` while exercising `countup=t cells=8192 target=1000 max=2000 deterministic=f histlen=1000000` on paired sample FASTQ. The smoke stayed below the 1 GB RSS guard and confirmed that the streamed histogram writer plus compact reducers keep million-bin count-up side-output reporting within the small-memory envelope.
- `cargo fmt --all`, `cargo test write_sparse_read_depth_hist -- --test-threads=1`, `cargo test countup_work_source_collects_input_histograms_like_separate_collectors -- --test-threads=1`, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, and a final `cargo fmt --all --check` passed after keeping count-up input read-depth histograms sparse until write time. The fused count-up work-source path now merges chunk `rhist` bins into a `SparseReadDepthHist` instead of allocating the two dense `histlen`-sized `ReadDepthHistogram` vectors, and `write_sparse_read_depth_hist` streams Java-shaped `#Depth\tReads\tBases` rows directly from sparse bins while preserving zero-coverage printing and overflow-bin behavior. The guarded million-bin count-up smoke under `tmp/giant_safe_countup_sparse_input_rhist_sample_20260425` completed in 0.050654s with 15,420 KB RSS, `countup_work_source=0.001081s`, `output_hist=0.001011s`, and valid `input.rhist.tsv` output under the 1 GB guard. This removes another dense `histlen` allocation family from the huge-safe count-up reporting path; input k-mer `hist` still intentionally materializes one dense output vector for exact row-compatible `hist`/`peaks` writing.
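A sparse read-depth histogram of the kind described above can be modeled as an ordered map from depth to `(reads, bases)` that is merged per chunk and streamed at write time. The sketch below reuses the `SparseReadDepthHist` name and the `#Depth\tReads\tBases` header from this entry, but its fields and methods are assumptions, the writer is named `write_sparse_rhist_sketch` to mark it as illustrative, and zero-coverage printing and overflow-bin handling are deliberately left out.

```rust
use std::collections::BTreeMap;
use std::io::{self, Write};

/// Assumed layout: depth -> (read count, base count), kept in depth order.
#[derive(Default)]
struct SparseReadDepthHist {
    bins: BTreeMap<usize, (u64, u64)>,
}

impl SparseReadDepthHist {
    /// Count one read of `bases` length at the given depth.
    fn add(&mut self, depth: usize, bases: u64) {
        let entry = self.bins.entry(depth).or_insert((0, 0));
        entry.0 += 1;
        entry.1 += bases;
    }

    /// Fold another chunk's bins into this histogram.
    fn merge(&mut self, other: &SparseReadDepthHist) {
        for (depth, (reads, bases)) in &other.bins {
            let entry = self.bins.entry(*depth).or_insert((0, 0));
            entry.0 += reads;
            entry.1 += bases;
        }
    }
}

/// Stream the header and the occupied bins only; no dense vector is built.
fn write_sparse_rhist_sketch<W: Write>(out: &mut W, hist: &SparseReadDepthHist) -> io::Result<()> {
    writeln!(out, "#Depth\tReads\tBases")?;
    for (depth, (reads, bases)) in &hist.bins {
        writeln!(out, "{}\t{}\t{}", depth, reads, bases)?;
    }
    Ok(())
}
```

Memory here scales with the number of distinct observed depths rather than with `histlen`, which is the property the million-bin smokes in these entries exercise.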
- `cargo fmt --all`, `cargo test write_sparse_depth_hist -- --test-threads=1`, `cargo test countup_work_source_collects_input_histograms_like_separate_collectors -- --test-threads=1`, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, and a final `cargo fmt --all --check` passed after keeping count-up input k-mer histograms sparse until write time. The count-up work-source path now merges chunk depth bins into `SparseHist` instead of allocating a dense `histlen` vector; `write_sparse_depth_hist` streams Java-shaped `hist` rows directly from sparse bins while preserving zero-bin folding, `histcolumns`, zero-coverage output, and overflow-bin behavior. If `peaks=` is requested, Rust still densifies once for the existing peak caller, but the common benchmark path that requests only `hist=` avoids that allocation. The guarded million-bin smoke under `tmp/giant_safe_countup_sparse_input_hist_sample_20260425` completed in 0.050862s with 15,356 KB RSS, `countup_work_source=0.000954s`, and `output_hist=0.001277s`, writing both `input.hist.tsv` and `input.rhist.tsv` under the 1 GB guard.
- `cargo fmt --all`, `cargo test output_counts_sparse_depth_hist_matches_dense_hist -- --test-threads=1`, `cargo test write_sparse_depth_hist -- --test-threads=1`, `cargo test countup_writes_histout_and_peaksout -- --test-threads=1`, `cargo test countup_ -- --test-threads=1`, `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, and a final `cargo fmt --all --check` passed after keeping count-up output `histout` sparse when `peaksout` is not requested. `OutputCounts` can now emit sparse raw-depth histograms from exact, packed, and atomic kept-count backends, and `run_countup` streams `histout` through `write_sparse_depth_hist`; dense `histlen` vectors are still built only when `peaksout` needs indexed bins for peak calling. The guarded million-bin smoke under `tmp/giant_safe_countup_sparse_output_hist_sample_20260425` exercised `countup=t cells=8192 target=1000 max=2000 deterministic=f histout=... histlen=1000000` on paired sample FASTQ and completed in 0.050713s with 15,520 KB peak RSS, `output_hist=0.000044s`, and valid sparse `histout.tsv` bins under the 1 GB guard.
- `cargo fmt --all`, `cargo test combined_primary_histograms -- --test-threads=1`, `cargo test write_sparse_read_depth_hist -- --test-threads=1`, `cargo test countup_writes_histout_and_peaksout -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo fmt --all --check`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, and a final `cargo fmt --all --check` passed after keeping read-depth histogram side outputs sparse when dense k-mer histograms are not already being collected. The primary read-depth collector now merges Rayon chunk `SparseReadDepthHist` maps directly, so count-up `rhistout`, normal input-only `rhist`, and normal output-only `rhistout` can stream through `write_sparse_read_depth_hist` instead of allocating the dense reads/bases vectors for `histlen`. Dense read histograms remain only for the combined `hist+rhist` path where the caller already needs dense k-mer bins for peak-compatible output. Guarded million-bin count-up smokes under `tmp/giant_safe_countup_sparse_output_rhist_sample_20260425` and `tmp/giant_safe_countup_sparse_output_rhist_keepall_sample_20260425` completed in 0.050715s / 15,420 KB RSS and 0.050820s / 15,252 KB RSS respectively; the keep-all smoke wrote non-empty `rhistout.tsv` (`0\t200\t20000`) with `output_rhist=0.000982s` under the 1 GB guard.
- `cargo fmt --all`, `cargo test combined_primary_histograms -- --test-threads=1`, `cargo test write_sparse_depth_hist -- --test-threads=1`, `cargo test writes_histograms_and_keeps_all_fastq_records -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo fmt --all --check`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, and a final `cargo fmt --all --check` passed after moving normal-mode `hist`/`rhist` side-output collection onto sparse maps whenever no peak caller needs dense bins. `collect_primary_sparse_hist` and `collect_primary_sparse_hist_and_read_hist` now share the same Rayon chunk analysis as dense collectors, and `run_single_pass` streams input `hist`+`rhist` and output `histout`+`rhistout` through sparse writers unless `peaks`/`peaksout` is requested. Peak-compatible paths still materialize dense histograms by design. The guarded million-bin normal-mode smoke under `tmp/giant_safe_sparse_normal_hist_rhist_sample_20260425` exercised bounded input/output sketches with `hist`, `rhist`, `histout`, and `rhistout` plus `histlen=1000000`; it completed in 0.050633s with 15,256 KB peak RSS, `input_hist=0.001078s`, `output_hist=0.000805s`, valid non-empty `histout.tsv`/`rhistout.tsv`, and no RSS guard trip.
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test sparse_peak_dense_trims_trailing_zero_histlen_without_changing_peaks -- --test-threads=1`, `cargo test countup_writes_histout_and_peaksout -- --test-threads=1`, `cargo test representative_peak_output_matches_java_bbnorm -- --test-threads=1`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, and a final `cargo fmt --all --check` passed after compacting peak side-output histograms from sparse bins. Normal and count-up `peaks`/`peaksout` paths no longer expand sparse histograms to `histlen` just to call the peak writer; they compact to the last observed depth plus a 32-bin zero tail for smoothing/closure, while overflow bins can still force the full configured length when needed. A regression compares compact sparse peak output byte-for-byte against the old dense million-bin path. Guarded million-bin smokes under `tmp/giant_safe_countup_sparse_peaksout_sample_20260425` and `tmp/giant_safe_normal_sparse_peaksout_sample_20260425` completed in 0.051040s / 15,272 KB RSS and 0.050750s / 15,632 KB RSS respectively, both wrote valid `peaksout.tsv`, and the count-up `output_hist` stage stayed at 0.000056s under the 1 GB guard.
- `cargo fmt --all`, `cargo test gzip -- --test-threads=1`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo fmt --all --check`, `cargo test --all -- --test-threads=1`, `bash -n scripts/benchmark_java_rust_modes.sh scripts/benchmark_java_rust_human.sh scripts/benchmark_giant_safe.sh scripts/benchmark_biological_dataset.sh`, `python3 -m py_compile scripts/extract_unique_kmer_summary.py scripts/compare_histogram_tables.py scripts/extract_stage_timings.py scripts/measure_command.py`, a final `cargo fmt --all --check`, and a release CLI smoke under `tmp/home_local_pigz_discovery_smoke_20260425` passed after hardening threaded gzip input discovery. Rust now resolves external gzip decoders from normal `PATH` entries and from `$HOME/.local/bin/pigz`/`unpigz`, matching the repository installer location even when the user shell has not exported that directory. The release smoke stripped `/home/jake/.local/bin` out of `PATH`, provided a fake `$HOME/.local/bin/pigz`, and confirmed the actual `bbnorm-rs` binary invoked it while processing a gzipped-input run (`used home-local fake pigz`; 1 read / 8 bases kept). This reduces the risk that direct runs on huge compressed FASTQ inputs silently fall back to single-threaded zlib despite `threads`/`zipthreads` requesting parallel gzip workers.
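
The decoder discovery described in the last entry can be sketched roughly as follows. This is a minimal illustration, not the actual `bbnorm-rs` code: the names `decoder_candidates` and `resolve_decoder` are hypothetical, the ordering (every `PATH` entry first, then `$HOME/.local/bin`) is an assumption, and the usability check is injected as a closure so the lookup can be exercised without touching the filesystem.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Hypothetical helper: lists candidate locations for `name`, first under each
/// `PATH` entry in order, then under `<home>/.local/bin` as a fallback.
/// The search order is an assumption made for this sketch.
fn decoder_candidates(
    name: &str,
    path_var: Option<&OsStr>,
    home: Option<&Path>,
) -> Vec<PathBuf> {
    let mut out: Vec<PathBuf> = Vec::new();
    if let Some(paths) = path_var {
        for dir in env::split_paths(paths) {
            // Skip empty PATH components instead of treating them as ".".
            if dir.as_os_str().is_empty() {
                continue;
            }
            out.push(dir.join(name));
        }
    }
    if let Some(home) = home {
        let fallback = home.join(".local").join("bin").join(name);
        // Avoid a duplicate probe when PATH already contains this directory.
        if !out.contains(&fallback) {
            out.push(fallback);
        }
    }
    out
}

/// Hypothetical helper: returns the first candidate accepted by `is_usable`.
fn resolve_decoder(
    name: &str,
    path_var: Option<&OsStr>,
    home: Option<&Path>,
    is_usable: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    decoder_candidates(name, path_var, home)
        .into_iter()
        .find(|candidate| is_usable(candidate.as_path()))
}

fn main() {
    let path_var = env::var_os("PATH");
    let home = env::var_os("HOME").map(PathBuf::from);
    for name in ["pigz", "unpigz"] {
        // `is_file` only checks existence; a real check would also want the
        // executable bit, which is left out of this sketch.
        match resolve_decoder(name, path_var.as_deref(), home.as_deref(), |p| p.is_file()) {
            Some(found) => println!("{name}: {}", found.display()),
            None => println!("{name}: not found, would fall back to in-process gzip"),
        }
    }
}
```

Passing `PATH` and `HOME` as arguments, rather than reading them inside the helper, mirrors how the release smoke above stripped the home-local directory from `PATH` and supplied a fake `pigz`: a test can do the same with plain values.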