Crate rsomics_coverage_core

Genome-binned BAM read-coverage primitive — the per-bin read-counting core shared by rsomics-bam-signal (deeptools bamCoverage) and rsomics-bam-compare (deeptools bamCompare).

Both deeptools tools tile each chromosome into fixed-width bins and count, per bin, the reads whose reference span overlaps it (deeptools countReadsPerBin). bamCoverage then normalises one BAM’s bins; bamCompare combines two BAMs’ bins according to the selected operation. The tiling and counting are identical between them, so they live here once (Layer A; Layer B crates never depend on one another). The normalisation, run-length-encoded bedGraph output, and two-BAM combination differ between the tools, so they stay in the respective tool crates.

§Counting semantics (deeptools source parity)

A read is accepted into the count when it is mapped (FLAG 0x4 clear, refID ≥ 0) and passes the BinFilter (skip_flags / min_mapq). With deeptools’ defaults (samFlag_exclude=None, ignoreDuplicates=False, minMappingQuality=None) the filter is empty, so secondary and supplementary reads are not excluded. Pass skip_flags = 0x900 to exclude secondary (0x100) and supplementary (0x800) alignments, matching samtools-style filtering, or 0x400 to exclude duplicates only.
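The acceptance rule above can be sketched as a small predicate. This is only an illustration: `FilterSketch` and `accepts` are hypothetical names, and the field types (`u16` flags, `u8` MAPQ) are assumptions rather than the crate's actual `BinFilter` definition.

```rust
/// Hypothetical stand-in for `BinFilter`. Field names follow the text above;
/// the field types are assumptions.
#[derive(Clone, Copy, Debug, Default)]
pub struct FilterSketch {
    /// Any read with one of these FLAG bits set is skipped.
    pub skip_flags: u16,
    /// Reads with MAPQ below this value are skipped.
    pub min_mapq: u8,
}

/// SAM FLAG bit for "segment unmapped".
const FLAG_UNMAPPED: u16 = 0x4;

/// Returns true when a read would be counted: it must be mapped
/// (0x4 clear and a non-negative reference id), carry none of the
/// skipped flag bits, and meet the MAPQ floor.
pub fn accepts(filter: &FilterSketch, flag: u16, ref_id: i32, mapq: u8) -> bool {
    let mapped = (flag & FLAG_UNMAPPED) == 0 && ref_id >= 0;
    mapped && (flag & filter.skip_flags) == 0 && mapq >= filter.min_mapq
}
```

With `FilterSketch::default()` both fields are zero, so every mapped read passes, including secondary and supplementary alignments. Setting `skip_flags` to `0x900` rejects those two classes.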

Fragment extent: with extendReads=False / centerReads=False (deeptools defaults) the fragment is the read’s reference span. It begins at the alignment start and extends by the length of every reference-consuming CIGAR op (M/=/X/D/N), matching pysam get_blocks().
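A minimal sketch of the span computation described above, assuming a 0-based alignment start and a CIGAR given as `(length, op)` pairs. The function name `reference_end` and this CIGAR representation are illustrative choices, not the crate's API.

```rust
/// Exclusive reference end of an alignment.
///
/// `start` is the 0-based leftmost reference position; `cigar` is a list of
/// (length, operation) pairs. Only M, =, X, D and N advance the reference;
/// I, S, H and P do not.
pub fn reference_end(start: u64, cigar: &[(u32, char)]) -> u64 {
    let mut end = start;
    for &(len, op) in cigar {
        if matches!(op, 'M' | '=' | 'X' | 'D' | 'N') {
            end += u64::from(len);
        }
    }
    end
}
```

For example, an alignment starting at 100 with CIGAR 5S50M2I10D40M spans the half-open reference interval [100, 200): the soft clip and the insertion add nothing, while the deletion adds 10.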

Bin counting: a read contributes +1 to every bin it overlaps, that is, bins s_idx through e_idx - 1, where s_idx = floor(fragStart / binSize) and e_idx = ceil(fragEnd / binSize). The partial last bin per chromosome is retained.
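The index arithmetic can be illustrated as follows. The sketch reads `e_idx` as an exclusive upper bound over a half-open fragment [fragStart, fragEnd), which is the reading that makes the formula agree with "every bin it overlaps". The function name `add_read`, the `u32` counter type, and the clamping to the bin vector's length are assumptions for illustration.

```rust
/// Adds one read to every bin overlapped by the half-open fragment
/// [frag_start, frag_end). `bins` holds one chromosome's counts.
pub fn add_read(bins: &mut [u32], frag_start: u64, frag_end: u64, bin_size: u64) {
    assert!(bin_size > 0, "bin_size must be positive");
    if frag_end <= frag_start {
        return; // empty fragment overlaps nothing
    }
    // s_idx = floor(fragStart / binSize)
    let s_idx = (frag_start / bin_size) as usize;
    // e_idx = ceil(fragEnd / binSize), treated as exclusive and clamped
    // to the number of bins on this chromosome.
    let e_idx = (((frag_end + bin_size - 1) / bin_size) as usize).min(bins.len());
    if s_idx >= e_idx {
        return; // fragment starts beyond the last bin
    }
    for count in &mut bins[s_idx..e_idx] {
        *count += 1;
    }
}
```

With a bin size of 50, a fragment [49, 51) gives s_idx = 0 and e_idx = 2, so it increments bins 0 and 1. A fragment [0, 50) gives e_idx = 1 and touches bin 0 only.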

Structs§

BinFilter
Read-acceptance predicate. A read counts when none of skip_flags are set in its FLAG and its MAPQ is at least min_mapq. Both default to “accept all mapped reads”, matching deeptools’ empty default filter.
BinnedCoverage
The result of binning one BAM: per-chromosome bin counts plus two read totals that downstream normalisation needs.
ChromBins
One chromosome’s bin counts. bins[i] holds the read count for the half-open reference window [i*bin_size, min((i+1)*bin_size, chrom_len)).
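
As a sketch of the ChromBins window arithmetic, the helpers below compute how many bins a chromosome needs when the partial last bin is retained, and the window covered by bin `i`. The names `bin_count` and `bin_window` are hypothetical and are not part of the crate.

```rust
/// Number of bins needed to tile `chrom_len` bases, keeping the partial
/// last bin: ceil(chrom_len / bin_size).
pub fn bin_count(chrom_len: u64, bin_size: u64) -> u64 {
    assert!(bin_size > 0, "bin_size must be positive");
    (chrom_len + bin_size - 1) / bin_size
}

/// Half-open reference window [start, end) covered by bin `i`:
/// [i * bin_size, min((i + 1) * bin_size, chrom_len)).
pub fn bin_window(i: u64, bin_size: u64, chrom_len: u64) -> (u64, u64) {
    let start = i * bin_size;
    let end = ((i + 1) * bin_size).min(chrom_len);
    (start, end)
}
```

For a 1050-base chromosome and a bin size of 100 this gives 11 bins, the last of which covers [1000, 1050).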

Functions§

compute_coverage
Scan input, tiling each reference at bin_size and counting reads per bin.