Streaming PCR/optical-duplicate marking: a Rust port of the default template
mode of samtools markdup (MIT-licensed).
Input is expected to be coordinate-sorted and pre-processed by samtools
fixmate -m, so that every paired record carries the MC (mate CIGAR) and ms
(mate score) tags. markdup streams in coordinate order, keeping only a
bounded window of recent reads: a buffered read is finalized and flushed once
the current read’s coordinate has advanced past it by more than max_length
bases, or once the reference changes (the buffer-trim loop in samtools
bam_mark_duplicates).
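The flush rule can be pictured as a double-ended queue drained from the front. This is a minimal sketch, not the crate's code: Held, ready_to_flush and trim are hypothetical names, and a buffered record is reduced to just its reference id and position.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a buffered record: only the fields the flush rule needs.
#[derive(Debug, Clone, PartialEq)]
struct Held {
    ref_id: i32,
    pos: i64,
}

/// True once `held` can be finalized: the reference changed, or the current
/// read is more than `max_length` bases past it.
fn ready_to_flush(held: &Held, cur_ref: i32, cur_pos: i64, max_length: i64) -> bool {
    held.ref_id != cur_ref || cur_pos - held.pos > max_length
}

/// Pops every finalized record off the front of the coordinate-ordered window.
fn trim(window: &mut VecDeque<Held>, cur_ref: i32, cur_pos: i64, max_length: i64) -> Vec<Held> {
    let mut flushed = Vec::new();
    while window
        .front()
        .map_or(false, |h| ready_to_flush(h, cur_ref, cur_pos, max_length))
    {
        flushed.push(window.pop_front().unwrap());
    }
    flushed
}

fn main() {
    let max_length = 300;
    let mut window = VecDeque::new();
    for (ref_id, pos) in [(0, 100), (0, 250), (0, 500), (1, 10)] {
        // Trim before buffering the new read, so the window never spans more than max_length.
        for done in trim(&mut window, ref_id, pos, max_length) {
            println!("flush {:?}", done);
        }
        window.push_back(Held { ref_id, pos });
    }
    // End of input: whatever is still buffered is final.
    for done in window.drain(..) {
        println!("flush {:?}", done);
    }
}
```

Because input is coordinate-sorted, the front of the queue is always the oldest read, so the loop can stop at the first read that is still within range.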
When a record leaves the window, its single/pair hash slots are removed, so
memory stays bounded by the window span rather than by the file size, which
is the whole point of this rewrite over the earlier full-buffer version.
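A sketch of slot eviction under simplifying assumptions: one HashMap stands in for the single/pair hashes, the hypothetical Key is only (reference id, position, strand), and the first record with a key keeps the slot (the real code chooses among duplicates by score). The owner check matters because, by the time an older record is evicted, a newer record may already hold the same key's slot.

```rust
use std::collections::hash_map::Entry;
use std::collections::{HashMap, VecDeque};

/// Hypothetical duplicate key: (reference id, 5' position, reverse strand).
/// Real single/pair keys carry more fields (e.g. mate position for pairs).
type Key = (i32, i64, bool);

/// Window of buffered records plus the hash of occupied key slots.
struct Window {
    buf: VecDeque<(u64, Key)>, // (record serial, key) in coordinate order
    slots: HashMap<Key, u64>,  // key -> serial of the record holding the slot
}

impl Window {
    fn new() -> Self {
        Window { buf: VecDeque::new(), slots: HashMap::new() }
    }

    /// Buffers a record; returns true if the key's slot is already taken.
    fn push(&mut self, serial: u64, key: Key) -> bool {
        self.buf.push_back((serial, key));
        match self.slots.entry(key) {
            Entry::Occupied(_) => true,
            Entry::Vacant(slot) => {
                slot.insert(serial);
                false
            }
        }
    }

    /// Flushes the oldest record and frees its slot, but only if it still owns it.
    fn evict_front(&mut self) -> Option<u64> {
        let (serial, key) = self.buf.pop_front()?;
        if self.slots.get(&key) == Some(&serial) {
            self.slots.remove(&key);
        }
        Some(serial)
    }
}

fn main() {
    let mut w = Window::new();
    let k = (0, 100, false);
    println!("{}", w.push(1, k)); // first record with this key takes the slot
    println!("{}", w.push(2, k)); // slot taken: duplicate candidate
    w.evict_front(); // record 1 leaves; its slot is freed
    println!("{}", w.push(3, k)); // record 3 now holds the slot
    w.evict_front(); // record 2 leaves; record 3's slot survives
    println!("buffered={} slots={}", w.buf.len(), w.slots.len());
}
```

Since every slot is owned by a record still in buf, the map can never hold more entries than the window does.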
Only the 0x400 duplicate flag bit is edited; seq/qual/cigar/name pass
through byte-for-byte via the rsomics_bamio::raw path.
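Editing only the duplicate bit amounts to a little-endian read-modify-write of the 16-bit FLAG field in the fixed part of a BAM record. This sketch assumes the record slice starts at refID with the 4-byte block_size already stripped; whether rsomics_bamio::raw hands out records in that form is not stated here, so FLAG_OFFSET and set_duplicate are illustrative only.

```rust
/// Offset of the 16-bit FLAG field, ASSUMING the record slice starts at refID
/// (block_size prefix already stripped). BAM fixed fields: refID(4) pos(4)
/// l_read_name(1) mapq(1) bin(2) n_cigar_op(2), then flag(2).
const FLAG_OFFSET: usize = 14;
/// SAM flag bit 0x400: PCR or optical duplicate.
const BAM_FDUP: u16 = 0x400;

/// Sets or clears the duplicate bit in place; every other byte is left untouched.
fn set_duplicate(rec: &mut [u8], dup: bool) {
    let mut flag = u16::from_le_bytes([rec[FLAG_OFFSET], rec[FLAG_OFFSET + 1]]);
    if dup {
        flag |= BAM_FDUP;
    } else {
        flag &= !BAM_FDUP;
    }
    rec[FLAG_OFFSET..FLAG_OFFSET + 2].copy_from_slice(&flag.to_le_bytes());
}

fn main() {
    // Zeroed 32-byte fixed section with FLAG = 0x63 (paired, proper pair, mate reverse, read1).
    let mut rec = vec![0u8; 32];
    rec[FLAG_OFFSET..FLAG_OFFSET + 2].copy_from_slice(&0x63u16.to_le_bytes());
    set_duplicate(&mut rec, true);
    println!("{:#x}", u16::from_le_bytes([rec[FLAG_OFFSET], rec[FLAG_OFFSET + 1]]));
}
```

Working on raw bytes avoids decoding and re-encoding seq, qual, CIGAR and name, which is what lets them pass through unchanged.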
Structs
Functions
- markdup: Stream input (coordinate-sorted, processed by samtools fixmate -m) and emit duplicate-marked records. An output_path of None writes BAM to stdout.
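The "None means stdout" convention for output_path is a common Rust pattern. A minimal sketch, using a hypothetical open_output helper (the crate's real markdup signature is not shown here) and leaving out BGZF compression, which a real BAM writer would layer on top of this sink:

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical helper: choose the byte sink for output.
/// `None` selects stdout, matching the documented output_path behaviour.
fn open_output(output_path: Option<&Path>) -> io::Result<Box<dyn Write>> {
    let sink: Box<dyn Write> = match output_path {
        Some(path) => Box::new(BufWriter::new(File::create(path)?)),
        None => Box::new(BufWriter::new(io::stdout())),
    };
    Ok(sink)
}

fn main() -> io::Result<()> {
    // A real caller would wrap this sink in a BGZF/BAM writer before emitting records.
    let mut out = open_output(None)?;
    out.write_all(b"")?;
    out.flush()
}
```

Returning Box<dyn Write> keeps the rest of the pipeline agnostic about the destination, at the cost of one dynamic dispatch per write call, which BufWriter amortizes.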