Crate kmers

K-mers and associated operations.

This library provides functionality for extracting k-mers from sequences and manipulating them in useful ways. K-mers are stored as 64-bit integers (u64) at two bits per base, so k > 32 is not supported by this library.

K-mers (or q-grams in some computer science contexts) are k-length sequences of DNA/RNA “letters” represented as unsigned integers. Following usual practice, each base is encoded in two bits:

  • “A” -> b00
  • “C” -> b01
  • “G” -> b10
  • “T” or “U” -> b11

This encoding has the nice property that complementary bases are bitwise complements of each other (A/T are b00/b11 and C/G are b01/b10).
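As an illustration of the encoding above, here is a minimal, standalone sketch of packing bases into a u64, extracting k-mers with a sliding window, and computing a reverse complement. The function names (`encode_base`, `encode_kmer`, `kmers`, `reverse_complement`) and the choice to put the first base in the most significant position are assumptions made for this example; they are not taken from this crate's API, which may differ.

```rust
/// Encode one nucleotide as a 2-bit value (hypothetical helper, not the crate's API).
/// Returns None for anything other than A, C, G, T or U (either case).
fn encode_base(b: u8) -> Option<u64> {
    match b {
        b'A' | b'a' => Some(0b00),
        b'C' | b'c' => Some(0b01),
        b'G' | b'g' => Some(0b10),
        b'T' | b't' | b'U' | b'u' => Some(0b11),
        _ => None,
    }
}

/// Pack up to 32 bases into a u64, first base in the most significant position.
/// Returns None if the sequence is too long or contains an unknown letter.
fn encode_kmer(seq: &[u8]) -> Option<u64> {
    if seq.len() > 32 {
        return None; // 2 bits per base: at most 32 bases fit in 64 bits
    }
    let mut x = 0u64;
    for &b in seq {
        x = (x << 2) | encode_base(b)?;
    }
    Some(x)
}

/// Collect every k-mer of `seq` using a rolling 2-bit window.
/// Windows that would span an unknown letter are skipped.
fn kmers(seq: &[u8], k: usize) -> Vec<u64> {
    assert!((1..=32).contains(&k), "k must be in 1..=32");
    // Mask keeping the low 2k bits; k == 32 needs special care to avoid overflow.
    let mask = if k == 32 { u64::MAX } else { (1u64 << (2 * k)) - 1 };
    let mut out = Vec::new();
    let mut x = 0u64;
    let mut valid = 0usize; // consecutive valid bases seen so far
    for &b in seq {
        match encode_base(b) {
            Some(v) => {
                x = ((x << 2) | v) & mask;
                valid += 1;
                if valid >= k {
                    out.push(x);
                }
            }
            None => {
                x = 0;
                valid = 0;
            }
        }
    }
    out
}

/// Reverse complement of a k-mer: bitwise NOT complements every base,
/// then the k two-bit groups are emitted in reverse order.
fn reverse_complement(x: u64, k: usize) -> u64 {
    let mut y = !x;
    let mut r = 0u64;
    for _ in 0..k {
        r = (r << 2) | (y & 0b11);
        y >>= 2;
    }
    r
}
```

Under these assumptions, `encode_kmer(b"ACG")` packs to `0b000110`, and its reverse complement for k = 3 is the packed form of "CGT". The complement step is a single bitwise NOT precisely because of the property noted above.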

Structs§

BigSparseAccumulator
A sparse k-mer accumulator for large sets of k-mers.
CanonicalHash
Canonicalisation based on hashing.
CanonicalLex
A simple canonicalisation based on lexicographic ordering.
CompressedKmerFrequencyList
A compressed representation for a sorted list of k-mer and count pairs.
CompressedKmerFrequencyListIterator
Iterate over a compressed k-mer frequency list.
CompressedKmerList
A compressed representation for a sorted list of k-mers.
CompressedKmerListIterator
Iterate over a compressed k-mer list.
CountingIterator
Take a sorted iterator and yield k-mer frequencies.
DenseAccumulator
Accumulate k-mer frequencies for small to medium k.
IteratorReadAdaptor
Convert an iterator over bytes to a Read.
Kmer
A k-length nucleotide sequence represented as a 64-bit integer.
KmerFrequencyIterator
An iterator producing k-mer frequencies.
KmerIterator
An iterator over the k-mers drawn from a sequence.
MergeIterator
Merge two k-mer frequency iterators.
SimplePosIndex
A simple position index for k-mers.
SimpleSeqPosIndex
A simple index allowing multiple sequences to be distinguished in the index.
SimpleSparseAccumulator
A simple sparse accumulator.
Unique
An iterator producing k-mer frequencies.

Traits§

Canonical
A trait for capturing k-mer canonicalisation.

Functions§

counting_kmer_frequency_iterator
Take a sorted iterator and yield k-mer frequencies.
dot
Compute the dot product between two k-mer frequency spectra.
frequency_vector_iter
Construct a k-mer frequency iterator from an iterator over frequencies.
jaccard
Compute the Jaccard coefficient from two sets of k-mers.
merge
Merge two iterators.
unique
An adaptor that converts an iterator over k-mers into a k-mer frequency iterator.
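To make the ideas behind `jaccard` and `dot` concrete, the following sketch computes both quantities with a linear merge over sorted inputs. It is an independent illustration only: the names `jaccard_sorted` and `dot_sorted`, the slice-based signatures, and the convention of returning 0.0 for two empty sets are assumptions for this example, and the crate's own functions may take different arguments (for instance iterators) and handle edge cases differently.

```rust
use std::cmp::Ordering;

/// Jaccard coefficient |A ∩ B| / |A ∪ B| for two sorted, deduplicated
/// slices of packed k-mers (hypothetical helper, not the crate's API).
fn jaccard_sorted(a: &[u64], b: &[u64]) -> f64 {
    let (mut i, mut j) = (0usize, 0usize);
    let mut inter = 0usize;
    // Walk both lists in step, counting k-mers present in both.
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                inter += 1;
                i += 1;
                j += 1;
            }
        }
    }
    let union = a.len() + b.len() - inter;
    if union == 0 {
        0.0 // assumed convention for two empty sets
    } else {
        inter as f64 / union as f64
    }
}

/// Dot product of two k-mer frequency spectra, each given as a list of
/// (k-mer, count) pairs sorted by k-mer with no repeated k-mers.
fn dot_sorted(a: &[(u64, u64)], b: &[(u64, u64)]) -> u64 {
    let (mut i, mut j) = (0usize, 0usize);
    let mut sum = 0u64;
    // Only k-mers present in both spectra contribute to the sum.
    while i < a.len() && j < b.len() {
        match a[i].0.cmp(&b[j].0) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                sum += a[i].1 * b[j].1;
                i += 1;
                j += 1;
            }
        }
    }
    sum
}
```

Both functions run in time linear in the combined input length, which is the main reason sorted k-mer lists are convenient for this kind of comparison.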