Crate rsomics_junction_saturation


Subsample-based splice-junction saturation analysis.

Algorithm (reimplemented from RSeQC junction_saturation.py):

1. Parse BED12 gene models; extract all annotated splice sites (junction donor and acceptor positions) and annotated junctions ((donor, acceptor) pairs) into hash sets.
2. Load all mapped, primary reads from the BAM.
3. Shuffle all read indices once with a seeded ChaCha12 RNG.
4. For each fraction F in [lower..upper] step S:
   a. Take the first ⌊F% × total_reads⌋ indices from the shuffled order.
   b. For each selected read, extract introns from the CIGAR N operations.
   c. Classify each observed junction as:
      - known: both donor and acceptor in the annotated junction set
      - partial novel: exactly one of donor/acceptor is annotated
      - complete novel: neither is annotated
5. Write one TSV file <prefix>.junction_saturation.txt with columns: pct\tknown\tpartial_novel\tcomplete_novel
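
The intron-extraction and classification steps above can be sketched as follows. This is a minimal, dependency-free illustration and not the crate's actual code: the names CigarOp, introns_from_cigar, JunctionClass and classify are hypothetical, coordinates are assumed to be 0-based with the intron start treated as the donor and the intron end as the acceptor (strand is ignored), and a junction whose two sites are both annotated but never paired in the annotation is assumed to count as partial novel.

```rust
use std::collections::HashSet;

/// One CIGAR operation as (operation character, length in bases).
/// Hypothetical type for this sketch; a real BAM reader has its own.
type CigarOp = (char, u64);

/// Return the (start, end) reference span of every `N` operation.
/// `start` is the 0-based leftmost reference position of the alignment.
fn introns_from_cigar(start: u64, cigar: &[CigarOp]) -> Vec<(u64, u64)> {
    let mut pos = start;
    let mut introns = Vec::new();
    for &(op, len) in cigar {
        match op {
            // Skipped region: record the intron, then move past it.
            'N' => {
                introns.push((pos, pos + len));
                pos += len;
            }
            // Other operations that consume the reference.
            'M' | 'D' | '=' | 'X' => pos += len,
            // I, S, H and P do not consume the reference.
            _ => {}
        }
    }
    introns
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JunctionClass {
    Known,
    PartialNovel,
    CompleteNovel,
}

/// Classify one observed junction against the annotation hash sets.
/// Assumption: a pair missing from `junctions` is partial novel as soon
/// as at least one of its two sites is annotated.
fn classify(
    junction: (u64, u64),
    sites: &HashSet<u64>,
    junctions: &HashSet<(u64, u64)>,
) -> JunctionClass {
    if junctions.contains(&junction) {
        JunctionClass::Known
    } else if sites.contains(&junction.0) || sites.contains(&junction.1) {
        JunctionClass::PartialNovel
    } else {
        JunctionClass::CompleteNovel
    }
}
```

For a read starting at position 1000 with CIGAR 10M100N5M, introns_from_cigar yields the single intron (1010, 1110), which classify then looks up in the two sets.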

Using a single shuffle (prefix-based sampling) guarantees monotonicity: the read set at fraction F1 is always a subset of the read set at any larger fraction F2 > F1, so the set of observed junctions can only grow as the fraction increases. RSeQC’s subsampling is non-deterministic (Python random). We use a seedable ChaCha12 RNG so results are reproducible when --seed is given.
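
To illustrate the subset property, here is a small sketch of single-shuffle prefix sampling. It is an assumption-laden stand-in, not the crate's implementation: SplitMix64 replaces the ChaCha12 RNG purely to keep the example free of external dependencies, and shuffled_indices and prefix_for_percent are hypothetical helper names.

```rust
/// SplitMix64: a tiny deterministic PRNG, used here ONLY as a
/// dependency-free stand-in for the seeded ChaCha12 RNG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Fisher-Yates shuffle of the indices 0..n; the same seed always
/// produces the same order.
fn shuffled_indices(n: usize, seed: u64) -> Vec<usize> {
    let mut rng = SplitMix64(seed);
    let mut idx: Vec<usize> = (0..n).collect();
    for i in (1..n).rev() {
        // Modulo reduction is slightly biased; acceptable for a sketch.
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        idx.swap(i, j);
    }
    idx
}

/// The first floor(pct% x total) indices of the shuffled order.
fn prefix_for_percent(shuffled: &[usize], pct: u64) -> &[usize] {
    let n = (shuffled.len() as u64 * pct / 100) as usize;
    &shuffled[..n.min(shuffled.len())]
}
```

Because every fraction is a prefix of the same shuffled order, the sample at 5% is literally the first half of the sample at 10%, which is what makes the per-fraction junction sets nested. Drawing a fresh random sample per fraction would not give this guarantee.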

Structs§

FractionResult: Per-fraction junction counts.
JunctionSaturationOpts: Options for junction saturation analysis.

Functions§

run: Run junction saturation analysis.
