Crate rsomics_bam_markdup

Streaming PCR/optical-duplicate marking: a Rust port of the default template mode of samtools markdup (MIT).

Input must be coordinate-sorted and pre-processed by samtools fixmate -m, so that every paired record carries the MC (mate CIGAR) and ms (mate score) tags. markdup streams records in coordinate order and keeps only a bounded window of recent reads: a buffered read is finalized and flushed once the current read’s coordinate has advanced past it by more than max_length bases, or once the reference changes (the buffer-trim loop in samtools bam_mark_duplicates). When a record leaves the window, its single and pair hash slots are removed, so memory is bounded by the window span rather than by the file size. That bound is the whole point of the rewrite over the prior full-buffer version.
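The window rule above can be illustrated with a minimal sketch. It is not this crate's implementation: the Read and Window types are hypothetical, a read is reduced to its reference id and leftmost coordinate, and the hash-slot bookkeeping is left out. It only shows the flush condition (distance greater than max_length, or a change of reference).

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a buffered read: reference id and leftmost position only.
#[derive(Debug, Clone, PartialEq)]
struct Read {
    tid: i32,
    pos: i64,
}

/// Bounded window of recently seen reads, oldest at the front.
struct Window {
    buf: VecDeque<Read>,
    max_length: i64,
}

impl Window {
    fn new(max_length: i64) -> Self {
        Window { buf: VecDeque::new(), max_length }
    }

    /// Add the current read and return every buffered read that has left the window.
    /// A read leaves when the reference changed or the current position is more
    /// than `max_length` bases past it.
    fn push(&mut self, cur: Read) -> Vec<Read> {
        let mut flushed = Vec::new();
        while let Some(front) = self.buf.front() {
            let expired = front.tid != cur.tid || cur.pos - front.pos > self.max_length;
            if !expired {
                break; // input is coordinate-sorted, so later entries are still live
            }
            flushed.push(self.buf.pop_front().unwrap());
        }
        self.buf.push_back(cur);
        flushed
    }
}
```

Because the input is coordinate-sorted, the oldest entry is always at the front, so the trim loop can stop at the first read that is still inside the window. In a real implementation the flushed reads would be written out and their hash entries dropped at this point.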

Only the 0x400 duplicate flag bit is edited; seq/qual/cigar/name pass through byte-for-byte via the rsomics_bamio::raw path.
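As an illustration of a flag-only edit, the sketch below sets the duplicate bit directly in an uncompressed BAM record body. It does not use or mirror the rsomics_bamio::raw API; the function name and the convention that the slice starts right after the block_size field are assumptions. The offset follows the BAM record layout in the SAM specification, where flag is a little-endian u16 at byte 14 of the record body.

```rust
/// Duplicate flag bit from the SAM specification.
const FLAG_DUP: u16 = 0x400;
/// Offset of the little-endian u16 `flag` field in a BAM record body
/// (refID 0..4, pos 4..8, l_read_name 8, mapq 9, bin 10..12, n_cigar_op 12..14).
const FLAG_OFFSET: usize = 14;

/// Hypothetical helper: set the duplicate bit in place and touch nothing else.
/// Returns None if the buffer is too short to contain the flag field.
fn mark_duplicate(record: &mut [u8]) -> Option<()> {
    let bytes = record.get_mut(FLAG_OFFSET..FLAG_OFFSET + 2)?;
    let flag = u16::from_le_bytes([bytes[0], bytes[1]]) | FLAG_DUP;
    bytes.copy_from_slice(&flag.to_le_bytes());
    Some(())
}
```

Since only two bytes are rewritten, the name, CIGAR, sequence, and qualities never need to be decoded or re-encoded.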

Structs§

MarkdupOpts
MarkdupStats

Functions§

markdup
Streams coordinate-sorted input that has been processed by samtools fixmate -m and emits duplicate-marked records. An output_path of None writes BAM to stdout.