Module debruijn::msp
Methods for minimum substring partitioning (MSP) of a DNA string.
The simple_scan method is based on: Li, Yang. “MSPKmerCounter: a fast and memory efficient approach for k-mer counting.” arXiv preprint arXiv:1505.06550 (2015).
Structs
Represents a sequence interval composed of successive k-mers that share a common minimizer p-mer.
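To make "successive k-mers that share a common minimizer p-mer" concrete, here is a minimal sketch of finding the minimizer of a single k-mer. It assumes a 2-bit encoding (A=0, C=1, G=2, T=3) with the first base in the most significant bits, so that numeric order matches lexicographic order, and it breaks ties toward the leftmost p-mer. The helpers encode_base, pmer_value and minimizer are hypothetical and are not part of the debruijn API.

```rust
/// Hypothetical 2-bit encoding: A=0, C=1, G=2, T=3 (assumed, not taken from the crate).
fn encode_base(b: u8) -> u64 {
    match b {
        b'A' => 0,
        b'C' => 1,
        b'G' => 2,
        b'T' => 3,
        _ => panic!("unexpected base {}", b as char),
    }
}

/// Packs a p-mer into an integer, first base in the most significant bits,
/// so numeric order equals lexicographic order of the bases.
fn pmer_value(pmer: &[u8]) -> u64 {
    pmer.iter().fold(0, |acc, &b| (acc << 2) | encode_base(b))
}

/// Returns (value, offset) of the lexicographically smallest p-mer in `kmer`.
/// Ties go to the leftmost occurrence.
fn minimizer(kmer: &[u8], p: usize) -> (u64, usize) {
    // A u64 holds at most 32 bases in this 2-bit packing.
    assert!(p >= 1 && p <= kmer.len() && p <= 32);
    let mut best = (pmer_value(&kmer[..p]), 0);
    for i in 1..=kmer.len() - p {
        let v = pmer_value(&kmer[i..i + p]);
        if v < best.0 {
            best = (v, i);
        }
    }
    best
}
```

For example, in GATTACAG with k = 7 and p = 3, both k-mers (GATTACA and ATTACAG) have ACA at sequence position 4 as their minimizer, so under this sketch they belong to the same interval.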
Determine MSP substrings of a sequence, for given k and p. The scan() method returns the substrings and their p-mer values as a vector of MspIntervalP<P> values. A user-supplied score function is used to rank p-mers when finding the minimizer.
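The role of the user-supplied score function can be sketched as follows: for each k-mer, the p-mer with the lowest score is taken as the minimizer. The function minimizer_positions and its closure signature Fn(&[u8]) -> u64 are assumptions for illustration, not the crate's actual scoring interface; lower scores are assumed to win, with ties going to the leftmost p-mer.

```rust
/// Hypothetical sketch: for each k-mer start in `seq`, returns the position of
/// the p-mer with the lowest score. `score` is a user-supplied ranking function
/// (assumed signature; lower score wins, leftmost p-mer on ties).
fn minimizer_positions<F>(seq: &[u8], k: usize, p: usize, score: F) -> Vec<usize>
where
    F: Fn(&[u8]) -> u64,
{
    assert!(p >= 1 && p <= k);
    if seq.len() < k {
        return Vec::new();
    }
    (0..=seq.len() - k)
        .map(|start| {
            // Scan every p-mer inside the k-mer [start, start + k).
            (start..=start + k - p)
                .min_by_key(|&i| (score(&seq[i..i + p]), i))
                .unwrap()
        })
        .collect()
}
```

Plugging in a different closure (for example one that looks up a precomputed rank table) changes which p-mer is chosen without touching the scan itself.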
The permutation argument is a permutation of the lexicographically sorted set of all p-mers. A permutation that sorts p-mers by their inverse frequency in the dataset gives the most even bucketing of MSPs over p-mers.
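One way to build such a frequency-based permutation is to count every p-mer in the dataset and rank the rarest first. This sketch rests on several assumptions: lower rank wins the minimizer comparison (so rare p-mers should get low ranks), p-mers are indexed by the same 2-bit lexicographic packing, windows containing non-ACGT characters are skipped, and count ties are broken lexicographically. inverse_frequency_permutation and pack_index are hypothetical names, and the crate may store its permutation in the opposite direction (rank to p-mer).

```rust
/// Packs a p-mer into an index (A=0, C=1, G=2, T=3, first base most significant).
/// Returns None if the window contains a non-ACGT byte (assumed policy: skip it).
fn pack_index(window: &[u8]) -> Option<usize> {
    window.iter().try_fold(0usize, |acc, &b| {
        let code = match b {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        Some((acc << 2) | code)
    })
}

/// Hypothetical sketch: rank[v] is the rank of p-mer value v when all 4^p p-mers
/// are ordered by ascending count (rarest first), ties broken lexicographically.
fn inverse_frequency_permutation(reads: &[&[u8]], p: usize) -> Vec<usize> {
    // A dense table of 4^p counters; keep p small.
    assert!((1..=12).contains(&p), "p must be in 1..=12 for a dense 4^p table");
    let n = 1usize << (2 * p);
    let mut counts = vec![0u64; n];
    for read in reads {
        // windows(p) yields nothing for reads shorter than p.
        for w in read.windows(p) {
            if let Some(v) = pack_index(w) {
                counts[v] += 1;
            }
        }
    }
    // Order p-mer values by (count, value): rarest first, lexicographic tie-break.
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by_key(|&v| (counts[v], v));
    // Invert the ordering so that rank[v] gives the position of v.
    let mut rank = vec![0usize; n];
    for (r, &v) in order.iter().enumerate() {
        rank[v] = r;
    }
    rank
}
```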
Functions
Determine MSP substrings of seq, for given k and p. Returns a vector of tuples indicating the substrings and their p-mer values, in the form (p-mer value, min p-mer position, start position, end position). The permutation argument is a permutation of the lexicographically sorted set of all p-mers; a permutation that sorts p-mers by their inverse frequency in the dataset gives the most even bucketing of MSPs over p-mers.
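A naive version of this scan can be sketched as below. It is O(len × k) rather than a streaming implementation, and several details are assumptions: permutation[v] is read as the rank of p-mer value v (lower rank wins), end positions are exclusive, and a new substring starts whenever the minimizer occurrence changes. naive_msp is a hypothetical name, not the signature of the crate's function.

```rust
/// Hypothetical naive MSP scan, O(len * k); not the crate's implementation.
/// permutation[v] is assumed to be the rank of p-mer value v (lower rank wins)
/// and must have 4^p entries.
/// Returns (p-mer value, minimizer position, start, end) with end exclusive (assumed).
fn naive_msp(
    seq: &[u8],
    k: usize,
    p: usize,
    permutation: &[usize],
) -> Vec<(u64, usize, usize, usize)> {
    assert!(p >= 1 && p <= k && p <= 32);
    // 2-bit packing, first base most significant (A=0, C=1, G=2, T=3).
    let pack = |w: &[u8]| -> u64 {
        w.iter().fold(0u64, |acc, &b| {
            let code = match b {
                b'A' => 0,
                b'C' => 1,
                b'G' => 2,
                b'T' => 3,
                _ => panic!("non-ACGT base: {}", b as char),
            };
            (acc << 2) | code
        })
    };
    let mut out = Vec::new();
    if seq.len() < k {
        return out;
    }
    // (minimizer value, minimizer position, substring start) of the open substring.
    let mut current: Option<(u64, usize, usize)> = None;
    for start in 0..=seq.len() - k {
        // Lowest-ranked p-mer in the k-mer [start, start + k); leftmost on ties.
        let (value, pos) = (start..=start + k - p)
            .map(|i| (pack(&seq[i..i + p]), i))
            .min_by_key(|&(v, i)| (permutation[v as usize], i))
            .unwrap();
        match current {
            // Same minimizer occurrence: the current substring grows by one base.
            Some((_, cpos, _)) if cpos == pos => {}
            // Minimizer changed: close the previous substring, which ends with
            // the k-mer starting at start - 1.
            Some((cv, cpos, cstart)) => {
                out.push((cv, cpos, cstart, start - 1 + k));
                current = Some((value, pos, start));
            }
            None => current = Some((value, pos, start)),
        }
    }
    if let Some((cv, cpos, cstart)) = current {
        out.push((cv, cpos, cstart, seq.len()));
    }
    out
}
```

With an identity permutation (plain lexicographic order), ACGTTT with k = 4 and p = 2 splits into ACGT, CGTT and GTTT, each with a different minimizer; consecutive substrings overlap by k - 1 bases.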