Crate libsais

Crate libsais 

Source
Expand description

An idiomatic and mostly safe API wrapper for the awesome and very fast C library libsais by Ilya Grebnov.

libsais provides highly optimized implementation for the features listed below. These implementations are widely used in the analysis of very large sequences, such as genome analysis in bioinformatics. For example, libsais contains the fastest implementation of a suffix array construction algorithm to date. Comparison to libdivsufsort | Benchmark crates.io

This crate is not yet battle-tested, there might be bugs. The API is still subject to small changes. Any kind of feedback and suggestions via the issue tracker is highly appreciated!

§Features

This crate exposes the whole functionality of libsais. It might be useful to also check out the documentation of the original library.

  • suffix_array: Construct suffix arrays for u8/u16/i32/i64 input texts and i32/i64 output texts, with support for generalized suffix arrays.
  • bwt: Construct the Burrows-Wheeler-Transform (BWT) for u8/u16 texts.
  • unbwt: Recover the original text from a BWT.
  • plcp: Construct the permuted longest common prefix array (PLCP) for a suffix array and text.
  • lcp: Construct the longest common prefix array from a PLCP for a suffix array and text.
  • context: Use a memory allocation optimization for repeated calls on small inputs.

§Usage

This crate provides generic builder-like APIs for all of the features listed above.

The primary entry points to this library are SuffixArrayConstruction and BwtConstruction. Obtaining an LcpConstruction, PlcpConstruction or UnBwt can only be done via an unsafe constructor or by using the returned types of the primary operations. Passing logically wrong input to these secondary functions of libsais can result in undefined behavior of the underlying C library.

For further details on the individual features, please refer to the module-level documentation. The API is a bit noisy due to lifetimes and typestate, so it is recommended to start with the module-level documentation and examples.

The following is a simple example of how to use this library to construct a suffix array in parallel:

use libsais::{SuffixArrayConstruction, ThreadCount};

let text = b"barnabasbabblesaboutbananas";
let suffix_array: Vec<i32> = SuffixArrayConstruction::for_text(text)
    .in_owned_buffer()
    .multi_threaded(ThreadCount::openmp_default())
    .run()
    .expect("The example on the front page should really work")
    .into_vec();

§Examples

There are examples for multiple non-trivial use-cases of this library.

§Multithreading

This library supports the multithreaded implementations of libsais via the optional openmp feature, which is enabled by default.

Modules§

bwt
Construct the Burrows-Wheeler-Transform (BWT) for a text using BwtConstruction.
context
Use an optimization for repeated calls of libsais functions on small inputs.
lcp
Construct the longest common prefix array (LCP) from a permuted LCP array (PLCP) for a suffix array.
plcp
Construct the permuted longest common prefix array (PLCP) for a suffix array and text.
suffix_array
Construct the (generalized) suffix array for a text using SuffixArrayConstruction.
typestate
Typestate model for builder APIs, most likely not relevant to you.
unbwt
Recover the text from a Burrows-Wheeler-Transform.

Structs§

BwtConstruction
One of the two main entry points of this library, for constructing BWTs.
LcpConstruction
Construct the permuted longest common prefix array for a suffix array and PLCP.
PlcpConstruction
Construct the permuted longest common prefix array for a suffix array and text.
SuffixArrayConstruction
One of the two main entry points of this library, for constructing suffix arrays.
ThreadCount
A wrapper type for passing the desired number of threads. Only useful when the openmp feature is activated.
UnBwt
Recover the text from a BWT

Enums§

LibsaisError
The error type of this crate, used for all functions that run a libsais algorithm.

Constants§

LIBSAIS_I32_OUTPUT_MAXIMUM_SIZE
The maximum text size that this library can handle when using i32-based buffers
LIBSAIS_I64_OUTPUT_MAXIMUM_SIZE
The maximum text size that this library can handle when using i64-based buffers
LIBSAIS_VERSION_STRING
The version of the original C library libsais wrapped by this crate

Traits§

InputElement
Possible element types of input texts and output data structures storing text elements implement this trait. You cannot implement it and don’t need to.
IsValidOutputFor
Information about whether an OutputElement type can be used with a given InputElement type.
LargeAlphabet
When using i32/i64-based texts, the API for suffix array construction changes slightly. See suffix_array for details.
OutputElement
Possible element types of output data structures storing indices implement this trait. You cannot implement it and don’t need to.
SmallAlphabet
Certain features such as the Burrows-Wheeler-Transform are only available for u8/u16-based texts.
SupportsPlcpOutputFor
Information about whether an OutputElement type supports the construction of PLCP arrays.