Genome-binned BAM read-coverage primitive — the per-bin read-counting core
shared by rsomics-bam-signal (deeptools bamCoverage) and
rsomics-bam-compare (deeptools bamCompare).
Both deeptools tools tile each chromosome into fixed-width bins and count,
per bin, the reads whose reference span overlaps it (deeptools
countReadsPerBin). bamCoverage then normalises one BAM’s bins; bamCompare
combines two BAMs’ bins per operation. The tiling + counting is identical
between them, so it lives here once (Layer A; a Layer B crate never depends on another Layer B crate). The
normalisation, bedGraph run-length emit, and two-BAM combination stay in the
respective tool crates — those differ.
§Counting semantics (deeptools source parity)
A read is accepted into the count when it is mapped (FLAG 0x4 clear, refID
≥ 0) and passes the BinFilter (skip_flags / min_mapq). With
deeptools’ defaults (samFlag_exclude=None, ignoreDuplicates=False,
minMappingQuality=None) the filter is empty, so secondary and supplementary
reads are not excluded — pass skip_flags = 0x900 to match samtools-style
filtering, 0x400 for duplicate-only.
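The acceptance rule above can be sketched as a small predicate. This is a minimal illustration, not the crate's code: `SketchFilter` and `accepts` are hypothetical names, and the field types (`u16`, `u8`) are assumptions; only `skip_flags` and `min_mapq` come from the text.

```rust
/// Hypothetical stand-in for `BinFilter`. Field names follow the text;
/// the concrete types are assumptions made for this sketch.
#[derive(Debug, Clone, Copy, Default)]
pub struct SketchFilter {
    /// A read is skipped if any of these FLAG bits is set.
    pub skip_flags: u16,
    /// A read is skipped if its MAPQ is below this value.
    pub min_mapq: u8,
}

/// SAM FLAG bit for "segment unmapped".
const FLAG_UNMAPPED: u16 = 0x4;

impl SketchFilter {
    /// Returns true when the read would be counted: it is mapped
    /// (0x4 clear and ref_id >= 0), carries none of `skip_flags`,
    /// and has MAPQ of at least `min_mapq`.
    pub fn accepts(&self, flag: u16, ref_id: i32, mapq: u8) -> bool {
        let mapped = (flag & FLAG_UNMAPPED) == 0 && ref_id >= 0;
        mapped && (flag & self.skip_flags) == 0 && mapq >= self.min_mapq
    }
}
```

With the default (empty) filter a secondary read (FLAG 0x100) is accepted; setting `skip_flags` to 0x900 rejects both secondary and supplementary reads.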
Fragment extent: with extendReads=False / centerReads=False (deeptools
defaults) the reference span is the alignment start plus every
reference-consuming CIGAR op (M/=/X/D/N), matching pysam get_blocks().
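As a rough sketch of the fragment-extent rule, the reference span can be derived by summing the lengths of the reference-consuming operations. The `CigarKind` enum and `reference_span` function below are hypothetical and written for illustration only; the crate's actual CIGAR representation is not shown in this page.

```rust
/// Hypothetical CIGAR operation kinds (the nine SAM operations).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CigarKind {
    Match,    // M
    Ins,      // I
    Del,      // D
    RefSkip,  // N
    SoftClip, // S
    HardClip, // H
    Pad,      // P
    Equal,    // =
    Diff,     // X
}

/// True for the operations that advance the reference position: M, =, X, D, N.
pub fn consumes_reference(kind: CigarKind) -> bool {
    matches!(
        kind,
        CigarKind::Match
            | CigarKind::Equal
            | CigarKind::Diff
            | CigarKind::Del
            | CigarKind::RefSkip
    )
}

/// Half-open reference span [start, end) of an alignment that begins at the
/// 0-based position `start` and has the given CIGAR.
pub fn reference_span(start: u64, cigar: &[(CigarKind, u32)]) -> (u64, u64) {
    let ref_len: u64 = cigar
        .iter()
        .filter(|(kind, _)| consumes_reference(*kind))
        .map(|(_, len)| u64::from(*len))
        .sum();
    (start, start + ref_len)
}
```

For example, an alignment starting at 100 with CIGAR 10M2I5D100N20M5S consumes 10 + 5 + 100 + 20 = 135 reference bases, so its span is [100, 235).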
Bin counting: a read contributes +1 to every bin it overlaps —
s_idx = floor(fragStart / binSize), e_idx = ceil(fragEnd / binSize). The
partial last bin per chromosome is retained.
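The bin-index arithmetic above might look like the following. This is a sketch under assumptions: `add_read` is a hypothetical helper, the `u32` counter type is a guess, and clamping `e_idx` to the number of bins is a defensive choice made here rather than something the text specifies.

```rust
/// Adds +1 to every bin overlapped by the half-open fragment
/// [frag_start, frag_end) on one chromosome.
///
/// s_idx = floor(frag_start / bin_size), e_idx = ceil(frag_end / bin_size);
/// bins s_idx..e_idx are incremented.
pub fn add_read(bins: &mut [u32], frag_start: u64, frag_end: u64, bin_size: u64) {
    assert!(bin_size > 0, "bin_size must be positive");
    let s_idx = (frag_start / bin_size) as usize;
    // Integer ceiling division without relying on newer std helpers.
    let e_raw = ((frag_end + bin_size - 1) / bin_size) as usize;
    // Assumption: never index past the chromosome's last (possibly partial) bin.
    let e_idx = e_raw.min(bins.len());
    if s_idx < e_idx {
        for count in &mut bins[s_idx..e_idx] {
            *count += 1;
        }
    }
}
```

On a 250 bp chromosome with `bin_size` 100 (three bins, the last one partial), a fragment [95, 205) increments bins 0, 1 and 2, while a fragment [100, 200) increments only bin 1.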
Structs§
- BinFilter
- Read-acceptance predicate. A read counts when none of skip_flags are set in its FLAG and its MAPQ is at least min_mapq. Both default to “accept all mapped reads”, matching deeptools’ empty default filter.
- BinnedCoverage
- The result of binning one BAM: per-chromosome bin counts plus two read totals that downstream normalisation needs.
- ChromBins
- One chromosome’s bin counts. bins[i] holds the read count for the half-open reference window [i*bin_size, min((i+1)*bin_size, chrom_len)).
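To make the window definition concrete, here is a minimal sketch of the tiling arithmetic. `n_bins` and `bin_window` are hypothetical helpers introduced for illustration; they are not part of the documented API.

```rust
/// Number of bins needed to tile a chromosome, keeping the partial last bin.
pub fn n_bins(chrom_len: u64, bin_size: u64) -> u64 {
    assert!(bin_size > 0, "bin_size must be positive");
    (chrom_len + bin_size - 1) / bin_size
}

/// Half-open reference window covered by bin `i`:
/// [i*bin_size, min((i+1)*bin_size, chrom_len)).
pub fn bin_window(i: u64, bin_size: u64, chrom_len: u64) -> (u64, u64) {
    let start = i * bin_size;
    let end = ((i + 1) * bin_size).min(chrom_len);
    (start, end)
}
```

For a 250 bp chromosome and `bin_size` 100 this gives three bins, with the last one covering only [200, 250).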
Functions§
- compute_coverage
- Scan input, tiling each reference at bin_size and counting reads per bin.