Crate rsomics_seqstats

Format-agnostic sequence-statistics primitives shared by the rsomics-*-stats tools. The N50/L50/quartile math is a port of shenwei356/bio util/length-stats.go; the alphabet guess mirrors seqkit’s seq.GuessAlphabet. Sharing one implementation of both (rather than re-deriving them per format) is what lets --all --tabular output match seqkit stats -a -T byte for byte, for both FASTA and FASTQ.

Structs

LengthStats
Port of bio/util/length-stats.go. seqkit’s L50 counts unique-length buckets, not records — reproduced so --tabular --all agrees with seqkit.
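
The difference between bucket-counted and record-counted L50 is easiest to see on a small input. The following is a minimal sketch, not this crate's implementation: the function names are hypothetical, and the "at least half of all bases" stopping rule is an assumption rather than something confirmed against length-stats.go.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not the crate's API): returns N50 together with an
/// L50 that counts unique-length buckets rather than records.
fn n50_and_bucket_l50(lengths: &[u64]) -> (u64, usize) {
    // Group records by length: length -> number of records of that length.
    let mut buckets: BTreeMap<u64, u64> = BTreeMap::new();
    for &len in lengths {
        *buckets.entry(len).or_insert(0) += 1;
    }
    let total: u64 = lengths.iter().sum();

    // Walk buckets from longest to shortest, accumulating bases.
    let mut acc: u64 = 0;
    for (i, (&len, &count)) in buckets.iter().rev().enumerate() {
        acc += len * count;
        // Assumed threshold: stop once at least half of all bases are covered.
        if acc * 2 >= total {
            return (len, i + 1); // i + 1 is the number of buckets visited
        }
    }
    (0, 0) // empty input
}

/// Conventional L50 for contrast: counts individual records.
fn record_l50(lengths: &[u64]) -> usize {
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // longest first
    let total: u64 = sorted.iter().sum();
    let mut acc: u64 = 0;
    for (i, &len) in sorted.iter().enumerate() {
        acc += len;
        if acc * 2 >= total {
            return i + 1;
        }
    }
    0 // empty input
}
```

For lengths 10, 10, 10, 10, 2 the single bucket of length 10 already covers half of the 42 bases, so the bucket-counted L50 is 1, while the record-counted L50 is 3.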

Enums

SeqType

Functions

classify
seqkit’s alphabet guess: any protein-only residue ⇒ Protein; else U-without-T ⇒ RNA; else DNA. Ambiguity codes and gaps do not decide the type.
count_any_of
Count every byte of haystack equal to any byte in needles, deduping needles so overlapping classes (e.g. b"GCgc") are not double-counted.
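
The two functions above can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: the `_sketch` names, the `GuessedType` enum, and the signatures are hypothetical, and the set of "protein-only" residues (letters that are not IUPAC nucleotide codes) is a guess, not taken from seqkit.

```rust
/// Hypothetical stand-in for the crate's SeqType; the real variants may differ.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum GuessedType {
    Dna,
    Rna,
    Protein,
}

/// Counts bytes of `haystack` that equal any byte in `needles`.
/// A 256-entry lookup table dedupes `needles`, so a needle listed twice
/// cannot make a haystack byte count twice.
fn count_any_of_sketch(haystack: &[u8], needles: &[u8]) -> usize {
    let mut wanted = [false; 256];
    for &n in needles {
        wanted[n as usize] = true;
    }
    haystack.iter().filter(|&&b| wanted[b as usize]).count()
}

/// Assumption: amino-acid letters that are not IUPAC nucleotide codes.
const PROTEIN_ONLY: &[u8] = b"EFILPQZefilpqz";

/// Protein if any protein-only residue is present; else RNA if U appears
/// without T; else DNA. Ambiguity codes and gaps never decide the result.
fn classify_sketch(seq: &[u8]) -> GuessedType {
    if count_any_of_sketch(seq, PROTEIN_ONLY) > 0 {
        return GuessedType::Protein;
    }
    let has_u = count_any_of_sketch(seq, b"Uu") > 0;
    let has_t = count_any_of_sketch(seq, b"Tt") > 0;
    if has_u && !has_t {
        GuessedType::Rna
    } else {
        GuessedType::Dna
    }
}
```

With this sketch, `count_any_of_sketch(b"GGCCaatt", b"GCgc")` counts the four G/C bytes once each, and a sequence made only of ambiguity codes and gaps, such as `b"NNNN--"`, falls through to the DNA default.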