Expand description
This library contains an implementation of the FM-Index data structure (original paper).
It is based on an encoding for the text with rank support data structure (a.k.a. occurrence table) by Simon Gene Gottlieb. This encoding attemps to provide a good trade-off between memory usage and running time of queries. Another traditional encoding is provided with higher memory usage, but faster query running times.
The library supports creating an FM-Index for a set of texts over an alphabet. The index construction
is based on the libsais-rs
crate and parallelized.
§Usage
The following is a basic example of how to use this library:
use genedex::{FmIndexConfig, alphabet};
let dna_n_alphabet = alphabet::ascii_dna_with_n();
let texts = [b"aACGT", b"acGtn"];
let index = FmIndexConfig::<i32>::new().construct_index(texts, dna_n_alphabet);
let query = b"GT";
assert_eq!(index.count(query), 2);
for hit in index.locate(query) {
println!(
"Found query in text {} at position {}.",
hit.text_id, hit.position
);
}
More information about the flexible cursor API, build configuration and variants of the FM-Index can be found in the module-level and struct-level documentation.
Optimized functions such as FmIndex::locate_many
exist for searching multiple queries at once. They do not use
multi-threading, but can still be significantly faster (around 2x) than calling the respective functions for single
queries in a loop. The reason for the improved performance is that the queries are searched in batches, which allows
different kinds of parallelism inside the CPU to be used. An example of how such a function is used can be found
here.
Modules§
- alphabet
- Contains functions to create various commonly used alphabets.
- text_
with_ rank_ support - Different implementations of the text with rank support (a.k.a. occurrence table) data structure that powers the FM-Index.
Structs§
- Alphabet
- An alphabet that represents the set of valid symbols of a text.
- Cursor
- A cursor to the FM-Index.
- FmIndex
- The FM-Index data structure.
- FmIndex
Config - A builder-like API to configure and construct the FM-Index.
- Hit
- Represents an occurrence of a searched query in the set of indexed texts.
Enums§
- Performance
Priority - This enum can be supplied to the
FmIndexConfig
to select different sub-algorithms during the construction of the FM-Index.
Traits§
- Index
Storage - Types that can be used to store indices inside the FM-Index.
Type Aliases§
- FmIndex
Condensed64 - A little faster than
FmIndexCondensed512
, and still space efficient for larger alphabets. This is the default version. - FmIndex
Condensed512 - The most space efficient version. Currently, this is experimental and the
*64
usually provide better trade-offs. - FmIndex
Flat64 - The fastest version.
- FmIndex
Flat512 - A little smaller and slower than
FmIndexFlat64
.FmIndexCondensed64
should be a better trade-off for most applications.