K-mers and associated operations.
This library provides functionality for extracting k-mers from sequences
and manipulating them in useful ways. The underlying representation is a
64-bit integer (u64), so k > 32 is not supported by this library.
K-mers (or q-grams in some computer science contexts) are k-length sequences of DNA/RNA “letters” represented as unsigned integers. Following usual practice, each base is encoded in two bits:
- “A” -> b00
- “C” -> b01
- “G” -> b10
- “T” or “U” -> b11
which has the nice property that the complementary bases are bitwise complements.
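The encoding above can be sketched in a few lines of Rust. This is an illustration only, not this library's API: the function names `encode_base`, `encode_kmer`, and `reverse_complement` are hypothetical, and placing the first base in the most significant position is an assumption. It shows why two bits per base limit k to 32 in a u64, and how the complement property reduces the reverse complement to a bitwise NOT followed by reversing the 2-bit groups.

```rust
/// Hypothetical helper: encode one nucleotide as two bits
/// (A=00, C=01, G=10, T/U=11). Returns None for any other byte.
fn encode_base(b: u8) -> Option<u64> {
    match b.to_ascii_uppercase() {
        b'A' => Some(0b00),
        b'C' => Some(0b01),
        b'G' => Some(0b10),
        b'T' | b'U' => Some(0b11),
        _ => None,
    }
}

/// Hypothetical helper: pack up to 32 bases into a u64.
/// Assumption: the first base ends up in the most significant position.
fn encode_kmer(seq: &[u8]) -> Option<u64> {
    if seq.len() > 32 {
        return None; // 2 bits per base, so a u64 holds at most 32 bases
    }
    let mut x = 0u64;
    for &b in seq {
        x = (x << 2) | encode_base(b)?;
    }
    Some(x)
}

/// Hypothetical helper: reverse complement of a k-mer of length k (k <= 32).
fn reverse_complement(x: u64, k: usize) -> u64 {
    let mut c = !x; // complementary bases are bitwise complements
    let mut r = 0u64;
    for _ in 0..k {
        r = (r << 2) | (c & 0b11); // move the lowest base of c to the end of r
        c >>= 2;
    }
    r
}
```

With these definitions, `encode_kmer(b"ACGT")` yields `Some(0b00011011)`, and since ACGT is its own reverse complement, `reverse_complement(0b00011011, 4)` returns the same value.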
Structs
- A sparse k-mer accumulator for large sets of k-mers.
- Canonicalisation based on hashing.
- A simple canonicalisation based on lexicographic ordering
- A compressed representation for a sorted list of k-mer and count pairs.
- Iterate over a compressed k-mer list
- A compressed representation for a sorted list of k-mers.
- Iterate over a compressed k-mer list
- Take a sorted iterator and yield k-mer frequencies.
- Accumulate k-mer frequencies for small to medium k.
- Convert an iterator over bytes to a Read.
- A k-length nucleotide sequence represented as a 64-bit integer.
- An iterator producing k-mer frequencies.
- An iterator over the k-mers drawn from a sequence.
- Merge two k-mer frequency iterators.
- A simple position index for k-mers.
- A simple index allowing multiple sequences to be distinguished in the index.
- A simple sparse accumulator
- An iterator producing k-mer frequencies.
Traits
- A Trait for capturing k-mer canonicalisation.
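As a rough sketch of what a canonicalisation trait and a lexicographic implementation might look like, consider the following. The names `Canonicaliser`, `Lexicographic`, and `rev_comp` are hypothetical and are not taken from this library. The sketch assumes the 2-bit encoding described above and that the canonical form is the smaller of a k-mer and its reverse complement, which is the common convention.

```rust
/// Hypothetical trait: map a k-mer to a canonical representative so that
/// a k-mer and its reverse complement are treated as the same item.
trait Canonicaliser {
    fn canonical(&self, x: u64, k: usize) -> u64;
}

/// Hypothetical helper: reverse complement under the 2-bit encoding
/// (A=00, C=01, G=10, T=11), for k <= 32.
fn rev_comp(x: u64, k: usize) -> u64 {
    let mut c = !x; // complement every base at once
    let mut r = 0u64;
    for _ in 0..k {
        r = (r << 2) | (c & 0b11);
        c >>= 2;
    }
    r
}

/// Hypothetical implementation: pick the numerically smaller of the two
/// strands. With A < C < G < T encoded as 0..3 and a fixed k, numeric
/// order on the integers matches lexicographic order on the bases.
struct Lexicographic;

impl Canonicaliser for Lexicographic {
    fn canonical(&self, x: u64, k: usize) -> u64 {
        x.min(rev_comp(x, k))
    }
}
```

For example, AAC (encoded as 1) and its reverse complement GTT (encoded as 47) both map to 1 under this sketch. A hashing-based variant would compare hashes of the two strands instead of the raw integers.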
Functions
- Take a sorted iterator and yield k-mer frequencies.
- Compute the dot product between two k-mer frequency spectra.
- Construct a k-mer frequency iterator from an iterator over frequencies.
- Compute the Jaccard coefficient from two sets of k-mers.
- Merge two iterators
- An adaptor that converts an iterator over k-mers into a k-mer frequency iterator
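Several of the operations listed above rely on sorted input, which allows them to run in a single linear pass. The sketch below illustrates the idea with plain vectors and slices. It is not this library's API: `count_sorted`, `jaccard`, and `dot` are hypothetical names, and the choice of `Vec` and slice types is an assumption made to keep the example self-contained.

```rust
use std::cmp::Ordering;

/// Hypothetical: collapse a sorted stream of k-mers into (k-mer, count) pairs.
fn count_sorted<I: IntoIterator<Item = u64>>(kmers: I) -> Vec<(u64, usize)> {
    let mut out: Vec<(u64, usize)> = Vec::new();
    for x in kmers {
        if let Some(last) = out.last_mut() {
            if last.0 == x {
                last.1 += 1; // same k-mer as the previous one: bump its count
                continue;
            }
        }
        out.push((x, 1));
    }
    out
}

/// Hypothetical: Jaccard coefficient |A ∩ B| / |A ∪ B| of two sorted,
/// duplicate-free k-mer slices. Returns 0.0 when both are empty.
fn jaccard(a: &[u64], b: &[u64]) -> f64 {
    let (mut i, mut j, mut inter) = (0, 0, 0usize);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                inter += 1;
                i += 1;
                j += 1;
            }
        }
    }
    let union = a.len() + b.len() - inter;
    if union == 0 { 0.0 } else { inter as f64 / union as f64 }
}

/// Hypothetical: dot product of two frequency spectra, each given as
/// (k-mer, count) pairs sorted by k-mer.
fn dot(a: &[(u64, usize)], b: &[(u64, usize)]) -> usize {
    let (mut i, mut j, mut sum) = (0, 0, 0usize);
    while i < a.len() && j < b.len() {
        match a[i].0.cmp(&b[j].0) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                sum += a[i].1 * b[j].1; // only shared k-mers contribute
                i += 1;
                j += 1;
            }
        }
    }
    sum
}
```

All three functions use the same two-pointer merge over sorted data, so no hash table is needed and memory use stays proportional to the output.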