Crate genedex

Crate genedex 

Source
Expand description

This library contains an implementation of the FM-Index data structure (original paper).

It is based on an encoding for the text with rank support data structure (a.k.a. occurrence table) by Simon Gene Gottlieb. This encoding attemps to provide a good trade-off between memory usage and running time of queries. Another traditional encoding is provided with higher memory usage, but faster query running times.

The library supports creating an FM-Index for a set of texts over an alphabet. The index construction is based on the libsais-rs crate and parallelized.

§Usage

The following is a basic example of how to use this library:

use genedex::{FmIndexConfig, alphabet};

let dna_n_alphabet = alphabet::ascii_dna_with_n();
let texts = [b"aACGT", b"acGtn"];

let index = FmIndexConfig::<i32>::new().construct_index(texts, dna_n_alphabet);

let query = b"GT";
assert_eq!(index.count(query), 2);

for hit in index.locate(query) {
    println!(
        "Found query in text {} at position {}.",
        hit.text_id, hit.position
    );
}

More information about the flexible cursor API, build configuration and variants of the FM-Index can be found in the module-level and struct-level documentation.

Optimized functions such as FmIndex::locate_many exist for searching multiple queries at once. They do not use multi-threading, but can still be significantly faster (around 2x) than calling the respective functions for single queries in a loop. The reason for the improved performance is that the queries are searched in batches, which allows different kinds of parallelism inside the CPU to be used. An example of how such a function is used can be found here.

Modules§

alphabet
Contains functions to create various commonly used alphabets.
text_with_rank_support
Different implementations of the text with rank support (a.k.a. occurrence table) data structure that powers the FM-Index.

Structs§

Alphabet
An alphabet that represents the set of valid symbols of a text.
Cursor
A cursor to the FM-Index.
FmIndex
The FM-Index data structure.
FmIndexConfig
A builder-like API to configure and construct the FM-Index.
Hit
Represents an occurrence of a searched query in the set of indexed texts.

Enums§

PerformancePriority
This enum can be supplied to the FmIndexConfig to select different sub-algorithms during the construction of the FM-Index.

Traits§

IndexStorage
Types that can be used to store indices inside the FM-Index.

Type Aliases§

FmIndexCondensed64
A little faster than FmIndexCondensed512, and still space efficient for larger alphabets. This is the default version.
FmIndexCondensed512
The most space efficient version. Currently, this is experimental and the *64 usually provide better trade-offs.
FmIndexFlat64
The fastest version.
FmIndexFlat512
A little smaller and slower than FmIndexFlat64. FmIndexCondensed64 should be a better trade-off for most applications.