Crate rsomics_compute_matrix

bigWig signal → per-region score matrix, matching deeptools computeMatrix reference-point and scale-regions output.

Output format (deeptools heatmapper.save_matrix)

A gzipped file whose first line is @ followed by a JSON dict of the parameters, written without spaces and with keys in deeptools’ fixed order. The per-sample “special” parameters (upstream, downstream, body, bin size, ref point, unscaled 5 prime, unscaled 3 prime) are emitted as one-element lists. Every subsequent line is one region: chrom, comma-joined exon starts, comma-joined exon ends, name, score, strand, then the per-bin signal values formatted with Python’s %f (six decimal places; missing values are written as nan).
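
A minimal sketch of how one region line of the body could be assembled. It assumes tab-separated fields (as deeptools writes them), and region_line and fmt_value are hypothetical helpers, not this crate’s API. The JSON header and the gzip layer are not shown. Rust’s {:.6} agrees with Python’s %f for finite values, but NaN must be special-cased because Rust would print it as NaN rather than nan.

```rust
/// Format one per-bin value like Python's "%f": six decimals, NaN as "nan".
fn fmt_value(v: f64) -> String {
    if v.is_nan() {
        "nan".to_string()
    } else {
        format!("{:.6}", v)
    }
}

/// Hypothetical helper: build one region line of the matrix body.
/// Assumes tab-separated fields, with exon starts/ends comma-joined.
fn region_line(
    chrom: &str,
    exon_starts: &[u64],
    exon_ends: &[u64],
    name: &str,
    score: &str,
    strand: char,
    values: &[f64],
) -> String {
    // Comma-join a list of coordinates, e.g. [100, 500] -> "100,500".
    let join = |xs: &[u64]| {
        xs.iter()
            .map(|x| x.to_string())
            .collect::<Vec<_>>()
            .join(",")
    };
    let mut fields = vec![
        chrom.to_string(),
        join(exon_starts),
        join(exon_ends),
        name.to_string(),
        score.to_string(),
        strand.to_string(),
    ];
    // Per-bin signal values follow the six BED-like columns.
    fields.extend(values.iter().map(|&v| fmt_value(v)));
    fields.join("\t")
}

fn main() {
    let line = region_line("chr1", &[100], &[200], "geneA", ".", '+', &[1.5, f64::NAN, 0.0]);
    println!("{}", line);
}
```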

Per-region binning (deeptools coverage_from_big_wig + coverage_from_array)

For each region:

  • a reference point is chosen according to the mode and strand, and two flank spans are laid out around it;
  • the bigWig is read per base, giving NaN wherever the file carries no data or the span runs off the chromosome;
  • each flank is partitioned into bins whose edges are numpy.linspace(start, end, nbins, endpoint=False), truncated to int;
  • each bin’s value is the NaN-masked mean of its bases (or the statistic selected by BinAvg).

Minus-strand regions read the two flanks swapped and reverse the final row. When missing data is treated as zero, NaN bases are set to 0 before averaging.
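
The binning step can be sketched as follows, assuming the per-base values of one flank are already in a slice (NaN for missing). bin_edges reproduces linspace(0, len, nbins, endpoint=False) cast to int, with len appended as the closing edge; nan_mean and bin_flank are hypothetical names, and only the mean statistic is shown.

```rust
/// Bin edges mimicking numpy.linspace(0, len, nbins, endpoint=False)
/// cast to int, with `len` appended as the final edge.
fn bin_edges(len: usize, nbins: usize) -> Vec<usize> {
    let step = len as f64 / nbins as f64;
    let mut edges: Vec<usize> = (0..nbins)
        .map(|i| (i as f64 * step) as usize) // `as` truncates toward zero
        .collect();
    edges.push(len);
    edges
}

/// NaN-masked mean: NaN bases are ignored; an all-NaN bin stays NaN.
fn nan_mean(xs: &[f64]) -> f64 {
    let (sum, n) = xs
        .iter()
        .filter(|v| !v.is_nan())
        .fold((0.0, 0usize), |(s, n), &v| (s + v, n + 1));
    if n == 0 { f64::NAN } else { sum / n as f64 }
}

/// Bin one flank of per-base values into `nbins` averages.
/// With `missing_as_zero`, NaN bases become 0 before averaging.
fn bin_flank(values: &[f64], nbins: usize, missing_as_zero: bool) -> Vec<f64> {
    let vals: Vec<f64> = if missing_as_zero {
        values.iter().map(|&v| if v.is_nan() { 0.0 } else { v }).collect()
    } else {
        values.to_vec()
    };
    bin_edges(vals.len(), nbins)
        .windows(2)
        .map(|w| nan_mean(&vals[w[0]..w[1]])) // bin covers [edge_i, edge_i+1)
        .collect()
}

fn main() {
    let flank = [1.0, f64::NAN, 3.0, 4.0, f64::NAN, f64::NAN];
    println!("{:?}", bin_flank(&flank, 3, false)); // [1.0, 3.5, NaN]
    println!("{:?}", bin_flank(&flank, 3, true)); // [0.5, 3.5, 0.0]
}
```

When nbins does not exceed the flank length the truncated edges are strictly increasing, so no bin is empty; uneven lengths simply give some bins one extra base.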

reference-point spans (b = upstream, a = downstream, rp = reference point)

  • plus strand: left flank [rp-b, rp] (b/binSize bins); right flank [rp, rp+a] (a/binSize bins).
  • minus strand: left flank [rp-a, rp] (a/binSize bins); right flank [rp, rp+b] (b/binSize bins); the row is then reversed.

rp is start (TSS), end (TES) or (start+end)/2 (center) for the plus strand; end (TSS), start (TES) or (start+end)/2 (center) for minus.
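
A sketch of the reference-point layout above, in genomic coordinates. RefAnchor, Strand, Flank and both functions are hypothetical stand-ins (the variants of the crate’s RefPoint are not listed on this page), and the center is assumed to use integer division. Coordinates are i64 so a flank that runs off the chromosome start shows up as a negative position, which the reader would then fill with NaN.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Strand { Plus, Minus }

/// Hypothetical stand-in for the crate's RefPoint enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum RefAnchor { Tss, Tes, Center }

/// A genomic span [start, end] and the number of bins it is split into.
#[derive(Debug, PartialEq)]
struct Flank { start: i64, end: i64, nbins: i64 }

/// Pick rp: TSS/TES swap ends on the minus strand; center is (start+end)/2.
fn ref_point(start: i64, end: i64, strand: Strand, anchor: RefAnchor) -> i64 {
    match (anchor, strand) {
        (RefAnchor::Center, _) => (start + end) / 2, // integer division (assumption)
        (RefAnchor::Tss, Strand::Plus) | (RefAnchor::Tes, Strand::Minus) => start,
        (RefAnchor::Tes, Strand::Plus) | (RefAnchor::Tss, Strand::Minus) => end,
    }
}

/// Left and right flanks in genomic order. On the minus strand the
/// upstream (b) and downstream (a) lengths swap sides; the caller
/// reverses the binned row afterwards.
fn reference_point_flanks(rp: i64, b: i64, a: i64, bin_size: i64, strand: Strand) -> (Flank, Flank) {
    let (left, right) = match strand {
        Strand::Plus => (b, a),
        Strand::Minus => (a, b),
    };
    (
        Flank { start: rp - left, end: rp, nbins: left / bin_size },
        Flank { start: rp, end: rp + right, nbins: right / bin_size },
    )
}

fn main() {
    // Minus-strand TSS is the region end.
    let rp = ref_point(1000, 2000, Strand::Minus, RefAnchor::Tss);
    let (left, right) = reference_point_flanks(rp, 100, 50, 10, Strand::Minus);
    println!("rp={} left={:?} right={:?}", rp, left, right);
}
```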

scale-regions spans

The row is laid out as the upstream flank [start-b, start] (b/binSize bins), the region body [start, end] scaled to body/binSize bins, and the downstream flank [end, end+a] (a/binSize bins). On the minus strand the upstream and downstream flanks are swapped and the row is reversed.
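
A sketch of the scale-regions layout, assuming integer bin counts from plain division and ignoring the unscaled 5 prime / 3 prime segments. Span, scale_regions_spans and orient_row are hypothetical names. The body span keeps its real genomic length but always receives body/binSize bins, which is what “scaled” means here.

```rust
/// A genomic span [start, end] and its bin count.
#[derive(Debug, PartialEq)]
struct Span { start: i64, end: i64, nbins: i64 }

/// Upstream flank, scaled body, downstream flank, in genomic order.
/// On the minus strand the b (upstream) and a (downstream) lengths swap sides.
fn scale_regions_spans(start: i64, end: i64, b: i64, a: i64, body: i64, bin_size: i64, minus: bool) -> [Span; 3] {
    let (left, right) = if minus { (a, b) } else { (b, a) };
    [
        Span { start: start - left, end: start, nbins: left / bin_size },
        Span { start, end, nbins: body / bin_size }, // region length is rescaled to a fixed bin count
        Span { start: end, end: end + right, nbins: right / bin_size },
    ]
}

/// Reverse the finished row for minus-strand regions so it reads 5' to 3'.
fn orient_row(mut row: Vec<f64>, minus: bool) -> Vec<f64> {
    if minus {
        row.reverse();
    }
    row
}

fn main() {
    // A 2 kb region scaled to 1 kb of body bins (10 bins of 100 bp).
    for span in scale_regions_spans(1000, 3000, 500, 200, 1000, 100, true).iter() {
        println!("{:?}", span);
    }
    println!("{:?}", orient_row(vec![1.0, 2.0, 3.0], true));
}
```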

Structs

MatrixParams
Knobs that drive matrix layout and value computation, mirroring the deeptools parameter dict that ends up in the gzipped header.
Region
One BED6 region. score holds the literal BED field, so a . is written back as . while a numeric value is re-emitted as deeptools’ float (e.g. 0 → 0.0).
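
A sketch of the score round-trip described for Region, with emit_score as a hypothetical helper. Rust’s {:?} for f64 always keeps a decimal point (0 → 0.0, 12 → 12.0), which matches Python’s str(float(x)) for ordinary magnitudes; exponent formatting for very large or very small values differs between the two (Rust writes 1e16, Python 1e+16), so this is an approximation rather than a byte-exact port.

```rust
use std::num::ParseFloatError;

/// Hypothetical helper: "." passes through untouched; a numeric score
/// is parsed and re-emitted with a decimal point, like Python's float repr.
fn emit_score(field: &str) -> Result<String, ParseFloatError> {
    if field == "." {
        return Ok(".".to_string());
    }
    let v: f64 = field.parse()?;
    Ok(format!("{:?}", v)) // Debug keeps ".0" on integral values
}

fn main() {
    for f in [".", "0", "12", "2.5"] {
        println!("{} -> {:?}", f, emit_score(f));
    }
}
```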

Enums

BinAvg
The averaging statistic applied within each bin (deeptools --averageTypeBins).
Mode
Which subcommand layout to build.
RefPoint
Which point of each region anchors the flanks (reference-point mode).

Functions

compute_matrix
Compute the matrix and write the gzipped deeptools-format file.
read_bed
Parse a BED file into BED6 regions. Multi-group BEDs delimited by # lines are not supported (only the single “genes” group is); any # line is a hard error, so regions are never silently mis-grouped.
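
A minimal sketch of the parsing rule above, not the crate’s read_bed: it works on an in-memory string, and the tab splitting, the six-column minimum, skipping blank lines, and the error messages are all assumptions. Any line starting with # aborts the parse instead of being treated as a group boundary; track or browser header lines are not handled here.

```rust
/// Hypothetical BED6 record; score is kept as the literal field text.
#[derive(Debug, PartialEq)]
struct Bed6 {
    chrom: String,
    start: u64,
    end: u64,
    name: String,
    score: String,
    strand: char,
}

/// Parse tab-separated BED6 text; a '#' line is a hard error.
fn parse_bed(text: &str) -> Result<Vec<Bed6>, String> {
    let mut out = Vec::new();
    for (i, line) in text.lines().enumerate() {
        let line = line.trim_end(); // also drops a trailing '\r'
        if line.is_empty() {
            continue;
        }
        if line.starts_with('#') {
            return Err(format!("line {}: '#' group lines are not supported", i + 1));
        }
        let f: Vec<&str> = line.split('\t').collect();
        if f.len() < 6 {
            return Err(format!("line {}: expected at least 6 columns, found {}", i + 1, f.len()));
        }
        let start = f[1].parse::<u64>().map_err(|e| format!("line {}: bad start: {}", i + 1, e))?;
        let end = f[2].parse::<u64>().map_err(|e| format!("line {}: bad end: {}", i + 1, e))?;
        out.push(Bed6 {
            chrom: f[0].to_string(),
            start,
            end,
            name: f[3].to_string(),
            score: f[4].to_string(),
            strand: f[5].chars().next().unwrap_or('.'),
        });
    }
    Ok(out)
}

fn main() {
    let bed = "chr1\t10\t20\tg1\t0\t+\nchr2\t5\t8\tg2\t.\t-\n";
    println!("{:?}", parse_bed(bed));
    println!("{:?}", parse_bed("#genes\nchr1\t10\t20\tg1\t0\t+\n"));
}
```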