Crate rsomics_junction_saturation

Subsample-based splice-junction saturation analysis.

Algorithm (reimplemented from RSeQC junction_saturation.py):

  1. Parse BED12 gene models; extract all annotated splice sites (individual donor and acceptor positions) and all annotated junctions ((donor, acceptor) pairs) into hash sets.
  2. Load all mapped, primary reads from the BAM.
  3. Shuffle all read indices once with a seeded ChaCha12 RNG.
  4. For each fraction F in [lower..upper] step S:
     a. Take the first ⌊F% × total_reads⌋ indices from the shuffled order.
     b. For each selected read, extract introns from the CIGAR N operations.
     c. Classify each observed junction as:
        • known: both donor and acceptor in the annotated junction set
        • partial novel: one of donor/acceptor is annotated
        • complete novel: neither is annotated
  5. Write one TSV file <prefix>.junction_saturation.txt with columns: pct\tknown\tpartial_novel\tcomplete_novel
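
Steps 4b and 4c can be sketched as follows. This is a minimal, std-only illustration, not the crate's actual code: `CigarOp`, `JunctionClass`, `introns`, and `classify` are hypothetical names, chromosome and strand are omitted, and a junction is represented by its 0-based intron start and end rather than by strand-aware donor and acceptor. The sketch also assumes that a junction whose two sites are both annotated, but never as a pair, counts as partial novel; the description above does not settle that case.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a CIGAR operation (not the crate's API).
#[derive(Clone, Copy, Debug)]
pub enum CigarOp {
    Match(u64),    // M, =, X: consumes reference
    Ins(u64),      // I: does not consume reference
    Del(u64),      // D: consumes reference
    RefSkip(u64),  // N: intron, consumes reference
    SoftClip(u64), // S: does not consume reference
}

#[derive(Debug, PartialEq, Eq)]
pub enum JunctionClass {
    Known,
    PartialNovel,
    CompleteNovel,
}

/// Walk the CIGAR from the alignment start and emit one
/// (intron_start, intron_end) pair per N operation.
pub fn introns(start: u64, cigar: &[CigarOp]) -> Vec<(u64, u64)> {
    let mut pos = start;
    let mut out = Vec::new();
    for op in cigar {
        match *op {
            CigarOp::Match(n) | CigarOp::Del(n) => pos += n,
            CigarOp::RefSkip(n) => {
                out.push((pos, pos + n));
                pos += n;
            }
            CigarOp::Ins(_) | CigarOp::SoftClip(_) => {}
        }
    }
    out
}

/// Classify one observed junction against the annotation hash sets.
/// Assumption: an unannotated pair with at least one annotated site
/// is treated as partial novel.
pub fn classify(
    junction: (u64, u64),
    known_junctions: &HashSet<(u64, u64)>,
    known_sites: &HashSet<u64>,
) -> JunctionClass {
    if known_junctions.contains(&junction) {
        JunctionClass::Known
    } else if known_sites.contains(&junction.0) || known_sites.contains(&junction.1) {
        JunctionClass::PartialNovel
    } else {
        JunctionClass::CompleteNovel
    }
}
```

A real implementation would key both sets by chromosome as well, since the same coordinate can occur on different references.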

Using a single shuffle (prefix-based sampling) guarantees monotonicity: the read set at fraction F1 is always a subset of the set at any F2 > F1, so junction counts cannot decrease as the fraction grows. RSeQC’s subsampling is non-deterministic (Python random). We use a seedable ChaCha12 RNG so results are reproducible when --seed is given.
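
The prefix-sampling idea can be illustrated with a short std-only sketch. To stay dependency-free it swaps the ChaCha12 RNG for a tiny SplitMix64 generator, so the permutations it produces will not match the crate's. `SplitMix64`, `shuffled_indices`, and `prefix_len` are hypothetical names introduced here, and the percentage is assumed to be an integer.

```rust
/// SplitMix64: a small deterministic PRNG used here only as a
/// std-only stand-in for the seeded ChaCha12 RNG described above.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Shuffle the read indices 0..n once (Fisher-Yates) with a fixed seed.
/// The modulo introduces a slight bias, which this sketch ignores.
pub fn shuffled_indices(n: usize, seed: u64) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..n).collect();
    let mut rng = SplitMix64::new(seed);
    for i in (1..n).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        idx.swap(i, j);
    }
    idx
}

/// Number of reads in the subsample at `pct` percent: floor(pct% * total).
pub fn prefix_len(pct: usize, total: usize) -> usize {
    pct * total / 100
}
```

Because every fraction takes a prefix of the same shuffled order, the subsample at 20% is by construction contained in the subsample at 50%, and rerunning with the same seed gives the same prefixes.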

Structs

FractionResult
Per-fraction junction counts.
JunctionSaturationOpts
Options for junction saturation analysis.

Functions

run
Run junction saturation analysis.