# simd-minimizers
[crates.io](https://crates.io/crates/simd-minimizers) | [docs.rs](https://docs.rs/simd-minimizers)
A SIMD-accelerated library to compute random minimizers.
It can compute all the minimizers of a human genome in 4 seconds using a single thread.
It also provides a *canonical* version that ensures that a sequence and its reverse-complement always select the same positions, which takes 6 seconds on a human genome.
This crate builds on [`packed_seq`](https://github.com/rust-seq/packed-seq) and
[`seq-hash`](https://github.com/rust-seq/seq-hash).
The underlying algorithm is described in the following
[**paper**](https://doi.org/10.4230/LIPIcs.SEA.2025.20):
- SimdMinimizers: Computing random minimizers, fast.
  Ragnar Groot Koerkamp, Igor Martayan.
  SEA 2025. [doi.org/10.4230/LIPIcs.SEA.2025.20](https://doi.org/10.4230/LIPIcs.SEA.2025.20)
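For intuition: a random minimizer scheme hashes every k-mer and, in each window of `w` consecutive k-mers, selects the position of the k-mer with the smallest hash. The scalar sketch below illustrates that definition only. It is not the SIMD algorithm from the paper and does not use this crate's API. The function names, the 2-bit encoding, the SplitMix64-style mixer, and the leftmost tie-breaking are all assumptions made for illustration.

```rust
/// Hypothetical stand-in hash (SplitMix64-style mixer); not the crate's hasher.
fn mix(x: u64) -> u64 {
    let mut z = x.wrapping_add(0x9e3779b97f4a7c15);
    z = (z ^ (z >> 30)).wrapping_mul(0xbf58476d1ce4e5b9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94d049bb133111eb);
    z ^ (z >> 31)
}

/// Hash every k-mer of `seq` (ACGT only, k <= 32) after packing it into 2 bits per base.
fn kmer_hashes(seq: &[u8], k: usize) -> Vec<u64> {
    assert!(k >= 1 && k <= 32, "k must be in 1..=32");
    seq.windows(k)
        .map(|kmer| {
            let packed = kmer.iter().fold(0u64, |acc, &b| {
                let code = match b {
                    b'A' => 0,
                    b'C' => 1,
                    b'G' => 2,
                    b'T' => 3,
                    _ => panic!("this sketch only handles ACGT"),
                };
                (acc << 2) | code
            });
            mix(packed)
        })
        .collect()
}

/// For each window of `w` consecutive k-mers, select the leftmost k-mer with
/// the smallest hash, and return the distinct selected positions in order.
fn naive_minimizer_positions(seq: &[u8], k: usize, w: usize) -> Vec<usize> {
    assert!(w >= 1, "w must be at least 1");
    let hashes = kmer_hashes(seq, k);
    let mut positions = Vec::new();
    for (start, window) in hashes.windows(w).enumerate() {
        // Strict `<` keeps the leftmost minimum on ties.
        let mut best = 0;
        for (i, &h) in window.iter().enumerate() {
            if h < window[best] {
                best = i;
            }
        }
        let pos = start + best;
        // Consecutive windows often share a minimizer; record it once.
        if positions.last() != Some(&pos) {
            positions.push(pos);
        }
    }
    positions
}
```

For example, `naive_minimizer_positions(b"AAAAAAAAAAAA", 5, 7)` returns `[0, 1]`: all k-mers tie, so the leftmost one wins in each of the two windows. This sketch takes O(w) time per window, which is the kind of cost a fast implementation avoids.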
## Requirements
This library supports AVX2 and NEON instruction sets.
Make sure to set `RUSTFLAGS="-C target-cpu=native"` when compiling to use the instruction sets available on your architecture:
```sh
RUSTFLAGS="-C target-cpu=native" cargo build --release
```
Alternatively, set it in your project-level or system-wide `.cargo/config.toml`:
```toml
[build]
rustflags = ["-C", "target-cpu=native"]
```
Enable the `scalar` feature (`-F scalar` on the cargo command line) to fall back to a
scalar implementation with reduced performance.
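To check whether the `target-cpu=native` flag actually enabled one of these instruction sets in your build, a small standalone check along the following lines can help. This is a sketch using only the Rust standard library; the function names are hypothetical and nothing here is part of this crate's API.

```rust
/// Reports which of the two SIMD target features was enabled at compile time.
/// (Hypothetical helper, not part of simd-minimizers.)
fn compiled_simd_backend() -> &'static str {
    if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        "avx2"
    } else if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
        "neon"
    } else {
        "none"
    }
}

/// Reports whether the CPU running this program supports AVX2 (x86_64)
/// or NEON (aarch64), using runtime feature detection.
#[cfg(target_arch = "x86_64")]
fn cpu_supports_simd() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

#[cfg(target_arch = "aarch64")]
fn cpu_supports_simd() -> bool {
    std::arch::is_aarch64_feature_detected!("neon")
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn cpu_supports_simd() -> bool {
    false
}
```

If `cpu_supports_simd()` is `true` but `compiled_simd_backend()` returns `"none"`, the `RUSTFLAGS` setting most likely did not reach the compiler.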
## Usage example
Full documentation can be found on [docs.rs](https://docs.rs/simd-minimizers).
```rust
use packed_seq::{PackedSeqVec, SeqVec};
// Assumed import path: both functions are taken to be exported at the crate root.
use simd_minimizers::{canonical_minimizer_positions, canonical_minimizers};

// Pack the ASCII sequence into 2 bits per base.
let seq = b"ACGTGCTCAGAGACTCAGAGGA";
let packed_seq = PackedSeqVec::from_ascii(seq);

let k = 5; // k-mer length
let w = 7; // window size, in k-mers
let hasher = <seq_hash::NtHasher>::new(k);

// Simple usage with default hasher, returning only positions.
let minimizer_positions = canonical_minimizer_positions(packed_seq.as_slice(), k, w);
assert_eq!(minimizer_positions, vec![0, 7, 9, 15]);

// Advanced usage with custom hasher, super-kmer positions, and minimizer values as well.
let mut minimizer_positions = Vec::new();
let mut super_kmers = Vec::new();
let minimizer_vals: Vec<u64> = canonical_minimizers(k, w)
    .hasher(&hasher)
    .super_kmers(&mut super_kmers)
    .run(packed_seq.as_slice(), &mut minimizer_positions)
    .values_u64()
    .collect();
```
## Benchmarks
Benchmarks can be found in the `bench` directory in the GitHub repository.
`bench/benches/bench.rs` contains benchmarks used in [this blogpost](https://curiouscoding.nl/posts/fast-minimizers/).
`bench/src/bin/paper.rs` contains benchmarks used in the paper.
Note that the benchmarks require some nightly features. You can install the latest nightly toolchain with:
```sh
rustup install nightly
```
To replicate the results from the paper, change into the `bench` directory and run:
```sh
RUSTFLAGS="-C target-cpu=native" cargo +nightly run --release
python eval.py
```
The human genome we use is from the T2T consortium and is available via the first link
in [the CHM13 repository](https://github.com/marbl/CHM13?tab=readme-ov-file#t2t-chm13v20-t2t-chm13y).