libsais-rs
libsais-rs is a Rust translation of IlyaGrebnov/libsais for suffix array construction and related transforms.
This crate currently tracks upstream libsais version 2.10.4.
- 2026-04-24: Appears to be a functional translation on par with speed. More testing needed, compare with original version before you consider swapping it out
This is an LLM-mediated faithful (hopefully) translation, not the original code!
Most users should probably first see if the existing original code works for them, unless they have reason otherwise. The original source may have newer features and it has had more love in terms of fixing bugs. In fact, we aim to replicate bugs if they are present, for the sake of reproducibility! (but then we might have added a few more in the process)
There are however cases when you might prefer this Rust version. We generally agree with this manifesto but more specifically:
- We have had many issues with ensuring that our software works using existing containers (Docker, PodMan, Singularity). One size does not fit all and it eats our resources trying to keep up with every way of delivering software
- Common package managers do not work well. It was great when we had a few Linux distributions with stable procedures, but now there are just too many ecosystems (Homebrew, Conda). Conda has an NP-complete resolver which does not scale. Homebrew is only so-stable. And our dependencies in Python still break. These can no longer be considered professional serious options. Meanwhile, Cargo enables multiple versions of packages to be available, even within the same program(!)
- The future is the web. We deploy software in the web browser, and until now that has meant Javascript. This is a language where even the == operator is broken. Typescript is one step up, but a game changer is the ability to compile Rust code into webassembly, enabling performance and sharing of code with the backend. Translating code to Rust enables new ways of deployment and running code in the browser has especial benefits for science - researchers do not have deep pockets to run servers, so pushing compute to the user enables deployment that otherwise would be impossible
- Old CLI-based utilities are bad for the environment(!). A large amount of compute resources are spent creating and communicating via small files, which we can bypass by using code as libraries. Even better, we can avoid frequent reloading of databases by hoisting this stage, with up to 100x speedups in some cases. Less compute means faster compute and less electricity wasted
- LLM-mediated translations may actually be safer to use than the original code. This article shows that running the same code on different operating systems can give somewhat different answers. This is a gap that Rust+Cargo can reduce. Typesafe interfaces also reduce coding mistakes and error handling, as opposed to typical command-line scripting
But:
- This approach should still be considered experimental. The LLM technology is immature and has sharp corners. But there are opportunities to reap, and the genie is not going back into the bottle. This translation is as much aimed to learn how to improve the technology and get feedback on the results.
- Translations are not endorsed by the original authors unless otherwise noted. Do not send bug reports to the original developers. Use our Github issues page instead.
- Do not trust the benchmarks on this page. They are used to help evaluate the translation. If you want improved performance, you generally have to use this code as a library, and use the additional tricks it offers. We generally accept performance losses in order to reduce our dependency issues
- Check the original Github pages for information about the package. This README is kept sparse on purpose. It is not meant to be the primary source of information
- If you are the author of the original code and wish to move to Rust, you can obtain ownership of this repository and crate. Until then, our commitment is to offer an as-faithful-as-possible translation of a snapshot of your code. If we find serious bugs, we will report them to you. Otherwise we will just replicate them, to ensure comparability across studies that claim to use package XYZ v.666. Think of this like a fancy Ubuntu .deb-package of your software - that is how we treat it
This blurb might be out of date. Go to this page for the latest information and further information about how we approach translation
Usage
[]
= "0.1.1"
use ;
Notes:
sa.len()must be at leasttext.len() + fsfsis extra scratch space made available at the tail ofsafreq, when used, must have length at least256
Development
Run tests with:
Run the local Rust-vs-C benchmark example with:
Performance
The repository includes examples/bench_vs_c.rs, which compares the current Rust translation against the vendored upstream C implementation in a single-threaded suffix-array-construction configuration.
Latest local snapshot:
README.md len= 6797 iter=200 rust= 0.284 ms c= 0.295 ms ratio= 0.96x
libsais/src/libsais.c len= 388397 iter= 40 rust= 11.770 ms c= 11.213 ms ratio= 1.05x
generated/mixed-1MiB len= 1048576 iter= 10 rust= 40.196 ms c= 38.582 ms ratio= 1.04x
Command used:
These numbers are a local snapshot, not a stability guarantee. Repeated runs can move by a few percent depending on machine, compiler, and system load.
Upstream Sources
The repository vendors the upstream C sources under libsais/ for reference and parity testing.
Upstream project:
License
Apache License 2.0.