Expand description
samtools calmd port: recompute the MD and NM aux tags of every alignment
against a reference FASTA, then re-emit the BAM.
The MD/NM walk mirrors bam_fillmd1_core in samtools bam_md.c exactly,
including its nibble-level base comparison (htslib seq_nt16_table /
bam_seqi), its run-length MD string with ^-prefixed deletions, and its
“only rewrite the tag when the value differs” guard so unchanged records
stay byte-for-byte identical (matching aux ordering and integer subtype).
Reading uses the shared rsomics_bamio reader (libdeflate BGZF, parallel
at workers >= 2); records are processed on the raw BAM byte level —
seq/qual/cigar are never decoded into noodles types. MD/NM are written
directly into the raw aux tail via RawRecord::set_aux. This avoids the
full RecordBuf decode+re-encode round-trip (the former bottleneck at
-t4 and above, accounting for 67% of wall time). Output goes through the
bamio work-stealing BGZF writer — at workers >= 2 via the
WsBatchBamWriter, which frames each batch on its own thread so the
main thread’s per-record write cost no longer caps multi-thread throughput.
The reference is read from an indexed FASTA (.fai); each contig is fetched
once on first use and cached for the run. Coordinate-sorted input therefore
touches each contig once and never re-reads it — the common calmd case.
At workers >= 2 the MD/NM computation is parallelised with rayon: records
are collected into a batch, the needed contigs are fetched serially into a
shared read-only map (Arc<Vec<u8>> per contig), then par_iter_mut runs
the raw MD/NM pass on every record simultaneously. Output is written in
original batch order so the byte stream is identical to the serial path.