
# simd-minimizers

[![crates.io](https://img.shields.io/crates/v/simd-minimizers)](https://crates.io/crates/simd-minimizers)
[![docs](https://img.shields.io/docsrs/simd-minimizers)](https://docs.rs/simd-minimizers)

A SIMD-accelerated library to compute random minimizers.

It can compute all the minimizers of a human genome in 4 seconds using a single thread.
It also provides a *canonical* version that ensures that a sequence and its reverse-complement always select the same positions, which takes 6 seconds on a human genome.
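To make the canonical property concrete, here is a naive scalar sketch using only the standard library. It is not the crate's implementation: the hash (a SplitMix64-style mixer), the strand rule (majority of G/T bases in the window, which requires `k + w - 1` to be odd), and all function names are assumptions chosen for illustration, so the positions it returns will generally differ from the crate's output.

```rust
/// Encode a base as 2 bits; panics on anything other than A, C, G, T.
fn enc(b: u8) -> u64 {
    match b {
        b'A' => 0,
        b'C' => 1,
        b'G' => 2,
        b'T' => 3,
        _ => panic!("non-ACGT base"),
    }
}

/// SplitMix64-style mixer, used here as a stand-in "random" hash (assumption).
fn mix(x: u64) -> u64 {
    let mut z = x.wrapping_add(0x9E3779B97F4A7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    z ^ (z >> 31)
}

/// Strand-independent hash: the minimum over the k-mer and its reverse complement.
fn canonical_hash(kmer: &[u8]) -> u64 {
    let fwd = kmer.iter().fold(0u64, |acc, &b| (acc << 2) | enc(b));
    let rev = kmer.iter().rev().fold(0u64, |acc, &b| (acc << 2) | (3 - enc(b)));
    mix(fwd).min(mix(rev))
}

/// Reverse complement of an ASCII DNA sequence.
fn revcomp(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            _ => panic!("non-ACGT base"),
        })
        .collect()
}

/// Naive canonical minimizers: sorted, deduplicated k-mer start positions.
fn naive_canonical_minimizers(seq: &[u8], k: usize, w: usize) -> Vec<usize> {
    assert!((1..=32).contains(&k) && w >= 1);
    let l = k + w - 1;
    assert!(l % 2 == 1, "k + w - 1 must be odd so the strand is well defined");
    let mut out = Vec::new();
    if seq.len() < l {
        return out;
    }
    let hashes: Vec<u64> = seq.windows(k).map(canonical_hash).collect();
    for start in 0..=seq.len() - l {
        // Decide the window's strand by a majority vote of G/T bases.
        let gt = seq[start..start + l]
            .iter()
            .filter(|&&b| b == b'G' || b == b'T')
            .count();
        let window = &hashes[start..start + w];
        let min = *window.iter().min().unwrap();
        // Break ties leftmost on one strand and rightmost on the other.
        let offset = if 2 * gt > l {
            window.iter().position(|&h| h == min).unwrap()
        } else {
            window.iter().rposition(|&h| h == min).unwrap()
        };
        out.push(start + offset);
    }
    out.sort_unstable();
    out.dedup();
    out
}
```

In this sketch, a position `p` selected on a sequence of length `n` corresponds to position `n - k - p` on its reverse complement, so both strands select the same k-mers.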

This crate builds on [`packed_seq`](https://github.com/rust-seq/packed-seq) and
[`seq-hash`](https://github.com/rust-seq/seq-hash).
 
The underlying algorithm is described in the following
[**paper**](https://doi.org/10.4230/LIPIcs.SEA.2025.20): 

- SimdMinimizers: Computing random minimizers, fast.
  Ragnar Groot Koerkamp, Igor Martayan
  SEA 2025 [doi.org/10.4230/LIPIcs.SEA.2025.20](https://doi.org/10.4230/LIPIcs.SEA.2025.20)

## Requirements

This library supports AVX2 and NEON instruction sets.
Make sure to set `RUSTFLAGS="-C target-cpu=native"` when compiling to use the instruction sets available on your architecture:

```sh
RUSTFLAGS="-C target-cpu=native" cargo build --release
```

Or set it in your project or system-wide `.cargo/config.toml`:

```toml
[build]
rustflags = ["-C", "target-cpu=native"]
```

Enable the `scalar` feature (for example, `cargo build --release -F scalar`) to fall back to a scalar implementation with
reduced performance.
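To check whether the compiler flags took effect, a small sketch like the following can help. It only uses the standard `cfg!` macro to report which of the two instruction sets was enabled at compile time. The function name is hypothetical and not part of this crate.

```rust
/// Report which supported SIMD instruction set was enabled at compile time.
/// Returns `None` when neither AVX2 nor NEON is enabled, which is the case
/// where a scalar fallback would be needed.
fn compiled_simd_target() -> Option<&'static str> {
    if cfg!(target_feature = "avx2") {
        Some("avx2")
    } else if cfg!(target_feature = "neon") {
        Some("neon")
    } else {
        None
    }
}
```

On x86-64, this typically returns `None` unless `target-cpu=native` (or an explicit `-C target-feature=+avx2`) was passed, assuming the build machine supports AVX2.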

## Usage example

Full documentation can be found on [docs.rs](https://docs.rs/simd-minimizers).

```rust
use packed_seq::{PackedSeqVec, SeqVec};
use simd_minimizers::{canonical_minimizer_positions, canonical_minimizers};

let seq = b"ACGTGCTCAGAGACTCAGAGGA";
let packed_seq = PackedSeqVec::from_ascii(seq);

let k = 5;
let w = 7;
let hasher = <seq_hash::NtHasher>::new(k);

// Simple usage with default hasher, returning only positions.
let minimizer_positions = canonical_minimizer_positions(packed_seq.as_slice(), k, w);
assert_eq!(minimizer_positions, vec![0, 7, 9, 15]);

// Advanced usage with custom hasher, super-kmer positions, and minimizer values as well.
let mut minimizer_positions = Vec::new();
let mut super_kmers = Vec::new();
let minimizer_vals: Vec<u64> = canonical_minimizers(k, w)
    .hasher(&hasher)
    .super_kmers(&mut super_kmers)
    .run(packed_seq.as_slice(), &mut minimizer_positions)
    .values_u64()
    .collect();
```
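The returned positions are start indices of k-mers in the input sequence. As a minimal sketch of how to turn them back into k-mer strings, the helper below slices the original ASCII sequence. The helper name and the `u32` position type are assumptions for illustration, and only the standard library is used.

```rust
/// Return the k-mers of `seq` that start at the given positions.
/// Assumes `seq` is plain ASCII DNA and every position satisfies `p + k <= seq.len()`.
fn kmers_at<'a>(seq: &'a [u8], positions: &[u32], k: usize) -> Vec<&'a str> {
    positions
        .iter()
        .map(|&p| {
            let p = p as usize;
            std::str::from_utf8(&seq[p..p + k]).expect("ASCII DNA is valid UTF-8")
        })
        .collect()
}
```

For the sequence `ACGTGCTCAGAGACTCAGAGGA` with positions `[0, 7, 9, 15]` and `k = 5`, this yields `ACGTG`, `CAGAG`, `GAGAC`, and `CAGAG`.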

## Benchmarks

Benchmarks can be found in the `bench` directory in the GitHub repository.

`bench/benches/bench.rs` contains benchmarks used in [this blogpost](https://curiouscoding.nl/posts/fast-minimizers/).

`bench/src/bin/paper.rs` contains benchmarks used in the paper.

Note that the benchmarks require some nightly features. You can install the latest nightly toolchain with:

```sh
rustup install nightly
```

To replicate the results from the paper, go into the `bench` directory and run:

```sh
RUSTFLAGS="-C target-cpu=native" cargo +nightly run --release
python eval.py
```

The human genome we use is from the T2T consortium and is available by following
the first link [here](https://github.com/marbl/CHM13?tab=readme-ov-file#t2t-chm13v20-t2t-chm13y).