Format-agnostic sequence-statistics primitives shared by the
rsomics-*-stats tools. The N50/L50/quartile math is a port of
shenwei356/bio util/length-stats.go; the alphabet guess mirrors
seqkit’s seq.GuessAlphabet. Sharing this code verbatim (rather than
re-deriving it per format) is what lets --all --tabular output match
seqkit stats -a -T byte for byte, for both FASTA and FASTQ.
Structs
- LengthStats - Port of bio/util/length-stats.go. seqkit’s L50 counts unique-length buckets, not records; this is reproduced so that --tabular --all agrees with seqkit.
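The bucket-versus-record distinction for L50 is easiest to see on a tiny input. The following is a minimal Rust sketch, not the crate's actual LengthStats: the `LengthHistogram` type and its methods are hypothetical, and it assumes N50 is reached once the running sum of bases is at least half the total (the exact tie rule used in length-stats.go is an assumption here).

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for LengthStats: a histogram mapping
/// sequence length -> number of records with that length.
#[derive(Default)]
struct LengthHistogram {
    counts: BTreeMap<u64, u64>,
    sum: u64, // total bases across all records
}

impl LengthHistogram {
    fn add(&mut self, len: u64) {
        *self.counts.entry(len).or_insert(0) += 1;
        self.sum += len;
    }

    /// Returns (N50, L50). Walks the unique lengths from longest to
    /// shortest, adding len * count bases per bucket. L50 is the number
    /// of buckets visited, not the number of records.
    /// Assumption: the threshold is `acc >= sum / 2`, tested as 2 * acc >= sum.
    fn n50_l50(&self) -> (u64, u64) {
        let mut acc = 0u64;
        for (buckets, (&len, &n)) in self.counts.iter().rev().enumerate() {
            acc += len * n;
            if 2 * acc >= self.sum {
                return (len, buckets as u64 + 1);
            }
        }
        (0, 0) // empty input
    }
}

fn main() {
    let mut h = LengthHistogram::default();
    for len in [10, 10, 10, 2] {
        h.add(len);
    }
    println!("{:?}", h.n50_l50()); // (10, 1)
}
```

With lengths 10, 10, 10 and 2, the three length-10 records share one bucket, so this sketch reports L50 = 1; counting records instead would give L50 = 2 (10 + 10 already reaches half of 32), which is exactly the kind of divergence from seqkit that the bucket counting avoids.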
Enums
Functions
- classify - seqkit’s alphabet guess: any protein-only residue ⇒ Protein; else U-without-T ⇒ RNA; else DNA. Ambiguity codes and gaps do not decide the type.
- count_any_of - Count every byte of haystack equal to any byte in needles, deduping needles so that overlapping classes (e.g. b"GCgc") are not double-counted.
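Both function descriptions map onto short byte loops. The Rust sketch below is illustrative only: the `Alphabet` enum, the exact signatures and return types are hypothetical, the set of "protein-only" residues is assumed to be any ASCII letter outside the IUPAC nucleotide codes ACGTURYSWKMBDHVN, and the needle dedup in count_any_of is done with a 256-entry lookup table, which may not be how the crate does it.

```rust
/// Hypothetical result type for the alphabet guess.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Alphabet {
    Dna,
    Rna,
    Protein,
}

/// Assumption: IUPAC nucleotide codes, including ambiguity codes.
/// Any other ASCII letter is treated as a protein-only residue.
const NUCLEOTIDE_CODES: &[u8] = b"ACGTURYSWKMBDHVN";

/// Sketch of the rule: any protein-only residue => Protein;
/// else U present without T => Rna; else Dna.
fn classify(seq: &[u8]) -> Alphabet {
    let (mut has_t, mut has_u) = (false, false);
    for &b in seq {
        let up = b.to_ascii_uppercase();
        if !up.is_ascii_alphabetic() {
            continue; // gaps ('-', '.') and other symbols never decide the type
        }
        if !NUCLEOTIDE_CODES.contains(&up) {
            return Alphabet::Protein;
        }
        has_t |= up == b'T';
        has_u |= up == b'U';
    }
    if has_u && !has_t {
        Alphabet::Rna
    } else {
        Alphabet::Dna
    }
}

/// Counts bytes of `haystack` that equal any byte of `needles`.
/// Each distinct needle is marked once in a lookup table, so repeated
/// needles cannot make a haystack byte count twice.
fn count_any_of(haystack: &[u8], needles: &[u8]) -> u64 {
    let mut wanted = [false; 256];
    for &n in needles {
        wanted[n as usize] = true;
    }
    haystack.iter().filter(|&&b| wanted[b as usize]).count() as u64
}

fn main() {
    println!("{:?}", classify(b"ACGTN-")); // Dna
    println!("{:?}", classify(b"acgu")); // Rna
    println!("{:?}", classify(b"MKLV")); // Protein: L is not a nucleotide code
    // GC count; the repeated 'G' in the needles is harmless.
    println!("{}", count_any_of(b"GGCCAATTgc", b"GCgcG")); // 6
}
```

Note that ambiguity codes such as M, K and N fall inside the nucleotide set in this sketch, so on their own they leave the guess at DNA, matching the rule that ambiguity codes do not decide the type.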