Expand description
An idiomatic and mostly safe API wrapper for the awesome and very fast C library
libsais
by Ilya Grebnov.
libsais
provides highly optimized implementation for the features listed below.
These implementations are widely used in the analysis of very large sequences, such as genome analysis
in bioinformatics. For example, libsais
contains the fastest implementation of a suffix array construction
algorithm to date. Comparison to libdivsufsort
| Benchmark crates.io
This crate is not yet battle-tested, there might be bugs. The API is still subject to small changes. Any kind of feedback and suggestions via the issue tracker is highly appreciated!
§Features
This crate exposes the whole functionality of libsais
.
It might be useful to also check out the documentation of the original library.
suffix_array
: Construct suffix arrays foru8
/u16
/i32
/i64
input texts andi32
/i64
output texts, with support for generalized suffix arrays.bwt
: Construct the Burrows-Wheeler-Transform (BWT) foru8
/u16
texts.unbwt
: Recover the original text from a BWT.plcp
: Construct the permuted longest common prefix array (PLCP) for a suffix array and text.lcp
: Construct the longest common prefix array from a PLCP for a suffix array and text.context
: Use a memory allocation optimization for repeated calls on small inputs.
§Usage
This crate provides generic builder-like APIs for all of the features listed above.
The primary entry points to this library are SuffixArrayConstruction
and BwtConstruction
. Obtaining
an LcpConstruction
, PlcpConstruction
or UnBwt
can only be done via an unsafe
constructor or by
using the returned types of the primary operations. Passing logically wrong input to these secondary
functions of libsais
can result in undefined behavior of the underlying C library.
For further details on the individual features, please refer to the module-level documentation. The API is a bit noisy due to lifetimes and typestate, so it is recommended to start with the module-level documentation and examples.
The following is a simple example of how to use this library to construct a suffix array in parallel:
use libsais::{SuffixArrayConstruction, ThreadCount};
let text = b"barnabasbabblesaboutbananas";
let suffix_array: Vec<i32> = SuffixArrayConstruction::for_text(text)
.in_owned_buffer()
.multi_threaded(ThreadCount::openmp_default())
.run()
.expect("The example on the front page should really work")
.into_vec();
§Examples
There are examples for multiple non-trivial use-cases of this library.
§Multithreading
This library supports the multithreaded implementations of libsais
via the optional openmp
feature,
which is enabled by default.
Modules§
- bwt
- Construct the Burrows-Wheeler-Transform (BWT) for a text using
BwtConstruction
. - context
- Use an optimization for repeated calls of
libsais
functions on small inputs. - lcp
- Construct the longest common prefix array (LCP) from a permuted LCP array (PLCP) for a suffix array.
- plcp
- Construct the permuted longest common prefix array (PLCP) for a suffix array and text.
- suffix_
array - Construct the (generalized) suffix array for a text using
SuffixArrayConstruction
. - typestate
- Typestate model for builder APIs, most likely not relevant to you.
- unbwt
- Recover the text from a Burrows-Wheeler-Transform.
Structs§
- BwtConstruction
- One of the two main entry points of this library, for constructing BWTs.
- LcpConstruction
- Construct the permuted longest common prefix array for a suffix array and PLCP.
- Plcp
Construction - Construct the permuted longest common prefix array for a suffix array and text.
- Suffix
Array Construction - One of the two main entry points of this library, for constructing suffix arrays.
- Thread
Count - A wrapper type for passing the desired number of threads. Only useful when the
openmp
feature is activated. - UnBwt
- Recover the text from a BWT
Enums§
- Libsais
Error - The error type of this crate, used for all functions that run a
libsais
algorithm.
Constants§
- LIBSAIS_
I32_ OUTPUT_ MAXIMUM_ SIZE - The maximum text size that this library can handle when using
i32
-based buffers - LIBSAIS_
I64_ OUTPUT_ MAXIMUM_ SIZE - The maximum text size that this library can handle when using
i64
-based buffers - LIBSAIS_
VERSION_ STRING - The version of the original C library
libsais
wrapped by this crate
Traits§
- Input
Element - Possible element types of input texts and output data structures storing text elements implement this trait. You cannot implement it and don’t need to.
- IsValid
Output For - Information about whether an
OutputElement
type can be used with a givenInputElement
type. - Large
Alphabet - When using
i32
/i64
-based texts, the API for suffix array construction changes slightly. Seesuffix_array
for details. - Output
Element - Possible element types of output data structures storing indices implement this trait. You cannot implement it and don’t need to.
- Small
Alphabet - Certain features such as the Burrows-Wheeler-Transform are only available for
u8
/u16
-based texts. - Supports
Plcp Output For - Information about whether an
OutputElement
type supports the construction of PLCP arrays.