Module bio::data_structures

source ·
Expand description

Various useful data structures.

Modules

Efficient container for locations annotated across a set of named reference sequences.
BIT-tree (Binary Indexed Trees, aka Fenwick Tree) maintains a prefix-sum or prefix-max that can be efficiently queried and updated. From: Peter M. Fenwick (1994). “A new data structure for cumulative frequency tables”. Software: Practice and Experience. 24 (3): 327–336. Implementation outlined here: https://www.topcoder.com/community/data-science/data-science-tutorials/binary-indexed-trees/
A fixed-width bit encoding implementation. This allows to store a sequence of values over a reduced alphabet by packing them bit-encoded into a sequence of bytes.
The Burrows-Wheeler-Transform and related data structures. The implementation is based on the lecture notes “Algorithmen auf Sequenzen”, Kopczynski, Marschall, Martin and Rahmann, 2008 - 2015.
FM-Index and FMD-Index for finding suffix array intervals matching a given pattern in linear time.
Interval tree, a data structure for efficiently storing and searching intervals.
A classical, flexible, q-gram index implementation.
Rank/Select data structure based on Gonzalez, Grabowski, Mäkinen, Navarro (2005). This implementation uses only a single level of blocks, and performs well for large n.
A data structure for a sequence of small integers with a few big integers. Small ints are stored in type S (e.g. a byte), big ints are stored separately (in type B) in a BTree. The implementation provides vector-like operations on the data structure (e.g. retrieve a position, add an integer, etc.).
Suffix arrays and related algorithms. The implementation is based on the lecture notes “Algorithmen auf Sequenzen”, Kopczynski, Marschall, Martin and Rahmann, 2008 - 2015.