Module debruijn::msp

Methods for minimum substring partitioning of a DNA string

simple_scan method is based on: Li, Yang. "MSPKmerCounter: a fast and memory efficient approach for k-mer counting." arXiv preprint arXiv:1505.06550 (2015).

Structs

MspInterval
MspIntervalP

Represents a sequence interval composed of successive k-mers that share a common minimizer p-mer.

Scanner

Determines the MSP substrings of a sequence for a given k and p. The scan() method returns a vector of MspIntervalP<P> values, each indicating a substring and its p-mer value. A user-supplied score function is used to rank p-mers for the purpose of finding the minimizer. Here, a permutation is a reordering of the lexicographically sorted set of all p-mers. A permutation of p-mers sorted by their inverse frequency in the dataset will give the most even bucketing of MSPs over p-mers.
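To illustrate the idea of a frequency-based score, the following is a minimal standalone sketch. It does not use this crate's Scanner API: the names rank_by_rarity and minimizer_pos are hypothetical, and it assumes that "sorted by inverse frequency" means the rarest p-mers are ranked first and therefore preferred as minimizers.

```rust
use std::collections::HashMap;

/// Hypothetical helper: count every p-mer in `reads` and assign each a rank,
/// rarest first. Ties are broken lexicographically so the result is deterministic.
pub fn rank_by_rarity(reads: &[&[u8]], p: usize) -> HashMap<Vec<u8>, usize> {
    assert!(p >= 1, "p must be at least 1");
    let mut counts: HashMap<Vec<u8>, usize> = HashMap::new();
    for read in reads {
        for w in read.windows(p) {
            *counts.entry(w.to_vec()).or_insert(0) += 1;
        }
    }
    let mut pmers: Vec<(Vec<u8>, usize)> = counts.into_iter().collect();
    // Sort by ascending count, then by p-mer bytes.
    pmers.sort_by(|a, b| a.1.cmp(&b.1).then_with(|| a.0.cmp(&b.0)));
    pmers
        .into_iter()
        .enumerate()
        .map(|(rank, (pmer, _))| (pmer, rank))
        .collect()
}

/// Hypothetical helper: position of the lowest-ranked p-mer inside one k-mer.
/// P-mers missing from the table get the worst possible score.
pub fn minimizer_pos(kmer: &[u8], p: usize, rank: &HashMap<Vec<u8>, usize>) -> usize {
    assert!(p >= 1 && p <= kmer.len(), "require 1 <= p <= k");
    (0..=kmer.len() - p)
        .min_by_key(|&j| rank.get(&kmer[j..j + p]).copied().unwrap_or(usize::MAX))
        .unwrap()
}
```

With the reads AAAC and AAGT and p = 2, the p-mer AA occurs three times and receives the worst rank, so the minimizer of AAAC becomes AC at position 2 rather than the lexicographically smallest AA at position 0.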

Functions

msp_sequence
simple_scan (Deprecated)

Determine MSP substrings of seq, for given k and p. Returns a vector of tuples indicating the substrings and the p-mer values: (p-mer value, min p-mer position, start position, end position). The permutation parameter is a permutation of the lexicographically sorted set of all p-mers. A permutation of p-mers sorted by their inverse frequency in the dataset will give the most even bucketing of MSPs over p-mers.
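The tuple layout described above can be illustrated with a naive standalone sketch. This is not the crate's simple_scan: the function name naive_msp is hypothetical, p-mers are ranked in plain lexicographic order instead of by a permutation, the end position is assumed to be exclusive, and a new interval starts whenever the position of the minimum p-mer changes.

```rust
/// Naive minimum substring partitioning (illustrative sketch only).
/// Returns (min p-mer, min p-mer position, start position, end position) tuples,
/// where `end` is exclusive and each interval covers successive k-mers that
/// share the same minimum p-mer occurrence.
pub fn naive_msp(seq: &[u8], k: usize, p: usize) -> Vec<(Vec<u8>, usize, usize, usize)> {
    assert!(p >= 1 && p <= k, "require 1 <= p <= k");
    let mut out = Vec::new();
    if seq.len() < k {
        return out; // no complete k-mer, so nothing to partition
    }

    // Position of the lexicographically smallest p-mer in the k-mer starting at `i`.
    // `min_by_key` returns the first minimum, so ties go to the leftmost p-mer.
    let min_pos = |i: usize| -> usize {
        (i..=i + k - p).min_by_key(|&j| &seq[j..j + p]).unwrap()
    };

    let mut start = 0; // first k-mer of the current interval
    let mut cur = min_pos(0); // minimizer position shared by the current interval
    for i in 1..=seq.len() - k {
        let m = min_pos(i);
        if m != cur {
            // K-mers start..i share the minimizer at `cur`; the last of them
            // begins at i - 1 and therefore ends at i - 1 + k.
            out.push((seq[cur..cur + p].to_vec(), cur, start, i - 1 + k));
            start = i;
            cur = m;
        }
    }
    // Close the final interval at the end of the sequence.
    out.push((seq[cur..cur + p].to_vec(), cur, start, seq.len()));
    out
}
```

For the sequence GATTACA with k = 4 and p = 2, the k-mers GATT and ATTA share the minimizer AT at position 1, and TTAC and TACA share AC at position 4, so the sketch yields the intervals (AT, 1, 0, 5) and (AC, 4, 2, 7). Recomputing the minimum for every k-mer costs O(k) per position; a production scanner would typically track the minimum incrementally.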