divsufsort-rs 0.5.0

Pure Rust port of libdivsufsort suffix array construction library
Documentation

divsufsort-rs

Crates.io Documentation build codecov

Pure Rust port of libdivsufsort — a fast suffix array construction library based on the induced-sorting (IS) algorithm.

What it does

Constructs the suffix array of a byte string in O(n log n) time and 5n + O(1) memory space. A suffix array is a sorted array of all suffixes of a string, and is a fundamental data structure for string search, data compression (BWT), and bioinformatics.

The implementation closely follows the original C library by Yuta Mori.

The B*-bucket sorting step is parallelised with rayon when the std feature is enabled (default).

no_std support

This crate supports no_std environments (with alloc). Disable the default std feature:

[dependencies]
divsufsort-rs = { version = "...", default-features = false }

When std is disabled, rayon parallelism is unavailable and B*-bucket sorting runs sequentially.

[!IMPORTANT] This crate uses unsafe Rust internally for performance. Specifically, raw pointer aliasing is used to allow the suffix array and its read-only PA view to share the same allocation (mirroring the original C code), and bounds checks are elided in hot inner loops where invariants can be proven statically. The public API is fully safe.

Usage

[dependencies]
divsufsort-rs = "0.4"
use divsufsort_rs::divsufsort;

fn main() {
    let text = b"banana";
    let mut sa = vec![0i32; text.len()];
    divsufsort(text, &mut sa).unwrap();
    // sa == [5, 3, 1, 0, 4, 2]  (indices of sorted suffixes)
    println!("{:?}", sa);
}

BWT construction is also available:

use divsufsort_rs::divbwt;

let text = b"banana";
let mut bwt = vec![0u8; text.len()];
let primary_index = divbwt(text, &mut bwt, None).unwrap();
// bwt == b"nnbaaa", primary_index == 3

Benchmark

Benchmarks run with criterion (sample_size = 10).

Environment

Item Value
CPU Apple M4 Max (16 logical cores)
OS macOS 15.5
Rust 1.92.0

Corpora

Name Description
random_binary LCG pseudo-random bytes (alphabet size 256)
text_26 LCG pseudo-random lowercase ASCII (alphabet size 26)
fibonacci Fibonacci string over {a, b} (highly repetitive)

Results — Rust vs C libdivsufsort (1,000,000 bytes)

Compared against the original C libdivsufsort compiled at -O3.

Corpus Rust (this crate) C libdivsufsort Ratio
random_binary 11.2 ms (84.9 MiB/s) 13.7 ms (69.4 MiB/s) 1.22× faster
text_26 13.2 ms (72.4 MiB/s) 23.8 ms (40.1 MiB/s) 1.80× faster
fibonacci 30.1 ms (31.7 MiB/s) 27.4 ms (34.8 MiB/s) 0.91×

The parallel B*-bucket sort drives the speedup for random_binary and text_26. For fibonacci the input produces only 1–2 non-trivial buckets, so parallelism provides no benefit and C is slightly faster due to lower single-thread overhead.

Results — std (parallel) vs no_std (serial)

Shows the effect of rayon parallelism. std is the default; no_std disables rayon and runs single-threaded.

Corpus Size std (rayon) no_std (serial) Ratio
random_binary 100K 1.29 ms 1.44 ms 1.12×
random_binary 1M 11.8 ms 17.8 ms 1.51×
text_26 100K 1.29 ms 2.07 ms 1.60×
text_26 1M 14.0 ms 28.3 ms 2.02×
fibonacci 100K 2.53 ms 2.47 ms 0.98×
fibonacci 1M 31.5 ms 31.0 ms 0.98×

For corpora with many distinct B*-buckets (random_binary, text_26), rayon parallelism provides 1.5–2× speedup at 1M scale. Highly repetitive inputs (fibonacci) show no difference as they produce too few buckets to benefit from parallelism.

Running the benchmarks

# Rust vs C comparison (bench_compare)
# requires the vendored C submodule — initialize it first:
git submodule update --init
cargo bench --bench bench_compare --features c-bench

# Lightweight benchmark — 3 corpora × 2 sizes, completes in ~2–3 minutes (bench_light)
cargo bench --bench bench_light

# Same benchmark in no_std (serial) mode
cargo bench --bench bench_light --no-default-features

# Full benchmark — larger sizes and more corpora, takes significantly longer (bench)
cargo bench --bench bench