// simd_minimizers/lib.rs

//! A library to quickly compute (canonical) minimizers of DNA and text sequences.
//!
//! The main functions are:
//! - [`minimizer_positions`]: compute the positions of all minimizers of a sequence.
//! - [`canonical_minimizer_positions`]: compute the positions of all _canonical_ minimizers of a sequence.
//!
//! Adjacent equal positions are deduplicated, but since the canonical minimizer is _not_ _forward_, a position can appear more than once.
//!
//! The implementation uses SIMD by splitting each sequence into 8 chunks and processing those in parallel.
//!
//! When using super-k-mers, use the `_and_superkmer` variants to additionally return a vector containing the index of the first window in which each minimizer is minimal.
//!
//! The minimizer of a single window can be found using [`one_minimizer`] and [`one_canonical_minimizer`], but note that these functions are not nearly as efficient.
//!
//! The `scalar` versions exist mostly for testing, and are nearly always slower.
//!
//! ## Minimizers
//!
//! The code is explained in detail in our [paper](https://doi.org/10.4230/LIPIcs.SEA.2025.20):
//!
//! > SimdMinimizers: Computing random minimizers, fast.
//! > Ragnar Groot Koerkamp, Igor Martayan, SEA 2025
//!
//! Briefly, minimizers are defined using two parameters `k` and `w`.
//! Given a sequence of characters, all k-mers (substrings of length `k`) are hashed,
//! and for each _window_ of `w` consecutive k-mers (of length `l = w + k - 1` characters),
//! (the position of) the smallest k-mer is sampled.
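//!
//! As an illustration of this definition only (a hedged sketch, not this crate's implementation, and with a made-up hash in place of ntHash), the minimizer of a single window can be computed by brute force:
//!
//! ```
//! // `toy_hash` is a hypothetical stand-in for ntHash/mulHash.
//! fn toy_hash(kmer: &[u8]) -> u64 {
//!     kmer.iter().fold(0u64, |h, &c| h.wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ c as u64)
//! }
//!
//! /// Position of the smallest k-mer in a window of `l = w + k - 1` characters.
//! /// `Iterator::min_by_key` returns the first minimum, i.e. ties break leftmost.
//! fn one_window_minimizer(window: &[u8], k: usize) -> usize {
//!     (0..=window.len() - k)
//!         .min_by_key(|&i| toy_hash(&window[i..i + k]))
//!         .unwrap()
//! }
//!
//! let window = b"ACGTGCTCAGA"; // k = 5, w = 7, l = 11
//! let pos = one_window_minimizer(window, 5);
//! assert!(pos <= window.len() - 5);
//! ```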
//!
//! Minimizers are found as follows:
//! 1. Split the input into 8 chunks that are processed in parallel using SIMD.
//! 2. Compute a 32-bit ntHash rolling hash of the k-mers.
//! 3. Use the 'two stacks' sliding window minimum on the top 16 bits of each hash.
//! 4. Break ties towards the leftmost position by storing the position in the bottom 16 bits.
//! 5. Compute 8 consecutive minimizer positions, and dedup them.
//! 6. Collect the deduplicated minimizer positions from all 8 chunks into a single vector.
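//!
//! A scalar sketch of steps 3 and 4 (an illustration of the 'two stacks' idea only, not the library's SIMD code, and using a 32/32 rather than 16/16 bit split):
//!
//! ```
//! /// Sliding-window minimum: for every window of `w` values, return the
//! /// position of the minimum, with ties broken towards the left.
//! fn sliding_min(vals: &[u32], w: usize) -> Vec<u32> {
//!     // Pack (value, position) so that comparing packed words also breaks ties.
//!     let packed: Vec<u64> = vals.iter().enumerate()
//!         .map(|(i, &v)| ((v as u64) << 32) | i as u64)
//!         .collect();
//!     // The window is a queue simulated by two stacks; `front` stores
//!     // running minima, so the window minimum is min(front.last(), back_min).
//!     let (mut front, mut back) = (Vec::new(), Vec::new());
//!     let mut back_min = u64::MAX;
//!     let mut out = Vec::new();
//!     for (i, &p) in packed.iter().enumerate() {
//!         back.push(p);
//!         back_min = back_min.min(p);
//!         if i + 1 >= w {
//!             let min = front.last().copied().unwrap_or(u64::MAX).min(back_min);
//!             out.push(min as u32); // the low bits hold the position
//!             // Pop the element that leaves the window.
//!             if front.is_empty() {
//!                 let mut m = u64::MAX;
//!                 while let Some(x) = back.pop() {
//!                     m = m.min(x);
//!                     front.push(m);
//!                 }
//!                 back_min = u64::MAX;
//!             }
//!             front.pop();
//!         }
//!     }
//!     out
//! }
//!
//! assert_eq!(sliding_min(&[3, 1, 4, 1, 5], 3), vec![1, 1, 3]);
//! ```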
//!
//! ## Canonical minimizers
//!
//! _Canonical_ minimizers have the property that the sampled k-mers of a DNA sequence are the same as those sampled from the _reverse complement_ sequence.
//!
//! This works as follows:
//! 1. ntHash is replaced by its canonical version, which XORs the hashes of the forward and reverse-complement k-mer.
//! 2. Compute the leftmost and rightmost minimal k-mer.
//! 3. Compute the 'preferred' strand of the current window as the one with more `TG` characters. This requires `l = w + k - 1` to be odd for proper tie-breaking.
//! 4. Return either the leftmost or rightmost smallest k-mer, depending on the preferred strand.
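//!
//! Step 3 can be sketched as follows (an illustration only, not the library's SIMD code; which strand is called 'preferred' is a convention assumed here):
//!
//! ```
//! /// A base is in {T, G} exactly when its complement is not, so the forward
//! /// and reverse-complement `TG`-counts sum to the window length `l`.
//! /// With odd `l`, the two counts can therefore never tie.
//! fn forward_strand_preferred(window: &[u8]) -> bool {
//!     assert!(window.len() % 2 == 1, "l = w + k - 1 must be odd");
//!     let fwd_tg = window.iter().filter(|&&c| c == b'T' || c == b'G').count();
//!     2 * fwd_tg > window.len()
//! }
//!
//! assert!(forward_strand_preferred(b"TGTGA"));
//! assert!(!forward_strand_preferred(b"ACACA"));
//! ```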
//!
//! ## Syncmers
//!
//! _Syncmers_ are (in our notation) windows of length `l = w + k - 1` characters where the minimizer k-mer is a prefix or suffix.
//! (Or, in classical notation, `k`-mers with the smallest `s`-mer as prefix or suffix.)
//! These can be computed by using [`closed_syncmers`]/[`open_syncmers`], or their [`canonical_closed_syncmers`]/[`canonical_open_syncmers`] variants, instead of [`minimizers`] or [`canonical_minimizers`].
//!
//! Note that canonical syncmers are chosen as the minimum of the forward and reverse-complement k-mer representation.
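//!
//! The syncmer conditions reduce to a check on the minimizer's offset within the window. A sketch (the exact offsets are this doc's reading of the definitions above):
//!
//! ```
//! /// Closed syncmer: the minimizer is the first or last k-mer of the window,
//! /// i.e. at offset 0 or w - 1 among the w k-mers.
//! fn is_closed_syncmer(minimizer_offset: usize, w: usize) -> bool {
//!     minimizer_offset == 0 || minimizer_offset == w - 1
//! }
//!
//! /// Open syncmer: the minimizer is the middle k-mer (requires odd `w`).
//! fn is_open_syncmer(minimizer_offset: usize, w: usize) -> bool {
//!     minimizer_offset == w / 2
//! }
//!
//! assert!(is_closed_syncmer(0, 7));
//! assert!(is_closed_syncmer(6, 7));
//! assert!(!is_closed_syncmer(3, 7));
//! assert!(is_open_syncmer(3, 7));
//! ```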
//!
//! ## Input types
//!
//! This crate depends on [`packed_seq`] to handle generic types of input sequences.
//! Most commonly, one should use [`packed_seq::PackedSeqVec`] for packed DNA sequences, but one can also simply wrap a sequence of `ACTGactg` characters in [`packed_seq::AsciiSeqVec`].
//! Additionally, `simd_minimizers` works on general (ASCII) `&[u8]` text.
//!
//! The main function provided by [`packed_seq`] is [`packed_seq::Seq::iter_bp`], which splits the input into 8 chunks and iterates them in parallel using SIMD.
//!
//! When dealing with ASCII input, use the `AsciiSeq` and `AsciiSeqVec` types.
//!
//! ## Hash function
//!
//! By default, the library uses the `ntHash` hash function, which maps each DNA base `ACTG` to a pseudo-random value using a table lookup.
//! This hash function is specifically designed to be fast for hashing DNA sequences with input type [`packed_seq::PackedSeq`] and [`packed_seq::AsciiSeq`].
//!
//! For general ASCII sequences (`&[u8]`), `mulHash` is used instead, which multiplies each character value by a pseudo-random constant.
//! The `mul_hash` module provides functions that _always_ use mulHash, also for DNA sequences.
//!
//! ## Performance
//!
//! This library depends on AVX2 or NEON SIMD instructions to achieve good performance.
//! Make sure to compile with `-C target-cpu=native` to enable these instructions.
//! See the [ensure_simd](https://github.com/ragnargrootkoerkamp/ensure_simd) crate for more details.
//!
//! The `run` methods take a `min_pos: &mut Vec<u32>` parameter to which positions are _appended_.
//! For best performance, re-use the same vector between invocations, and [`Vec::clear`] it before or after each call.
//!
//! ## Examples
//!
//! #### Scalar `AsciiSeq`
//!
//! ```
//! // Scalar ASCII version.
//! use packed_seq::{SeqVec, AsciiSeq};
//!
//! let seq = b"ACGTGCTCAGAGACTCAG";
//! let ascii_seq = AsciiSeq(seq);
//!
//! let k = 5;
//! let w = 7;
//!
//! let positions = simd_minimizers::minimizer_positions(ascii_seq, k, w);
//! assert_eq!(positions, vec![4, 5, 8, 13]);
//! ```
//!
//! #### SIMD `PackedSeq`
//!
//! ```
//! // Packed SIMD version.
//! use packed_seq::{PackedSeqVec, SeqVec, Seq};
//!
//! let seq = b"ACGTGCTCAGAGACTCAGAGGA";
//! let packed_seq = PackedSeqVec::from_ascii(seq);
//!
//! let k = 5;
//! let w = 7;
//!
//! // Unfortunately, `PackedSeqVec` cannot `Deref` into a `PackedSeq`, so `as_slice` is needed.
//! // Since we also need the values, this uses the Builder API.
//! let mut fwd_pos = vec![];
//! let fwd_vals: Vec<_> = simd_minimizers::canonical_minimizers(k, w).run(packed_seq.as_slice(), &mut fwd_pos).values_u64().collect();
//! assert_eq!(fwd_pos, vec![0, 7, 9, 15]);
//! assert_eq!(fwd_vals, vec![
//!     // T  G  C  A  C, CACGT is rc of ACGTG at pos 0
//!     0b10_11_01_00_01,
//!     // G  A  G  A  C, CAGAG is at pos 7
//!     0b11_00_11_00_01,
//!     // C  A  G  A  G, GAGAC is at pos 9
//!     0b01_00_11_00_11,
//!     // G  A  G  A  C, CAGAG is at pos 15
//!     0b11_00_11_00_01
//! ]);
//!
//! // Check that the reverse complement sequence has minimizers at 'reversed' positions.
//! let rc_packed_seq = packed_seq.as_slice().to_revcomp();
//! let mut rc_pos = Vec::new();
//! let mut rc_vals: Vec<_> = simd_minimizers::canonical_minimizers(k, w).run(rc_packed_seq.as_slice(), &mut rc_pos).values_u64().collect();
//! assert_eq!(rc_pos, vec![2, 8, 10, 17]);
//! for (fwd, &rc) in std::iter::zip(fwd_pos, rc_pos.iter().rev()) {
//!     assert_eq!(fwd as usize, seq.len() - k - rc as usize);
//! }
//! rc_vals.reverse();
//! assert_eq!(rc_vals, fwd_vals);
//! ```
//!
//! #### Seeded hasher
//!
//! ```
//! // Packed SIMD version with seeded hashes.
//! use packed_seq::{PackedSeqVec, SeqVec};
//!
//! let seq = b"ACGTGCTCAGAGACTCAG";
//! let packed_seq = PackedSeqVec::from_ascii(seq);
//!
//! let k = 5;
//! let w = 7;
//! let seed = 101010;
//! // Canonical by default. Use `NtHasher<false>` for forward-only.
//! let hasher = <seq_hash::NtHasher>::new_with_seed(k, seed);
//!
//! let fwd_pos = simd_minimizers::canonical_minimizers(k, w).hasher(&hasher).run_once(packed_seq.as_slice());
//! ```

#![allow(clippy::missing_transmute_annotations)]

mod canonical;
pub mod collect;
mod minimizers;
mod sliding_min;
pub mod syncmers;
mod intrinsics {
    mod dedup;
    pub use dedup::{append_filtered_vals, append_unique_vals, append_unique_vals_2};
}

#[cfg(test)]
mod test;

/// Re-exported internals. Used for benchmarking, and not part of the semver-compatible stable API.
pub mod private {
    pub mod canonical {
        pub use crate::canonical::*;
    }
    pub mod minimizers {
        pub use crate::minimizers::*;
    }
    pub mod sliding_min {
        pub use crate::sliding_min::*;
    }
    pub use packed_seq::u32x8 as S;
}

use collect::CollectAndDedup;
use collect::collect_and_dedup_into_scalar;
use collect::collect_and_dedup_with_index_into_scalar;
use minimizers::canonical_minimizers_skip_ambiguous_windows;
/// Re-export of the `packed-seq` crate.
pub use packed_seq;
use packed_seq::PackedNSeq;
use packed_seq::PackedSeq;
/// Re-export of the `seq-hash` crate.
pub use seq_hash;

use minimizers::{
    canonical_minimizers_seq_scalar, canonical_minimizers_seq_simd, minimizers_seq_scalar,
    minimizers_seq_simd,
};
use packed_seq::Seq;
use packed_seq::u32x8 as S;
use seq_hash::KmerHasher;

pub use minimizers::one_minimizer;
use seq_hash::NtHasher;
pub use sliding_min::Cache;
use syncmers::CollectSyncmers;
use syncmers::collect_syncmers_scalar;

thread_local! {
    static CACHE: std::cell::RefCell<(Cache, Vec<S>, Vec<S>)> = std::cell::RefCell::new(Default::default());
}

/// `CANONICAL`: true for canonical minimizers.
/// `H`: the kmer hasher to use.
/// `SkPos`: type of super-k-mer position storage. Use `()` to disable super-k-mers.
/// `SYNCMER`: 0 for minimizers, 1 for closed syncmers, 2 for open syncmers.
pub struct Builder<'h, const CANONICAL: bool, H: KmerHasher, SkPos, const SYNCMER: u8> {
    k: usize,
    w: usize,
    hasher: Option<&'h H>,
    sk_pos: SkPos,
}

pub struct Output<'o, const CANONICAL: bool, S> {
    /// k for minimizers, k+w-1 for syncmers
    len: usize,
    seq: S,
    min_pos: &'o Vec<u32>,
}

#[must_use]
pub const fn minimizers(k: usize, w: usize) -> Builder<'static, false, NtHasher<false>, (), 0> {
    Builder {
        k,
        w,
        hasher: None,
        sk_pos: (),
    }
}

#[must_use]
pub const fn canonical_minimizers(
    k: usize,
    w: usize,
) -> Builder<'static, true, NtHasher<true>, (), 0> {
    Builder {
        k,
        w,
        hasher: None,
        sk_pos: (),
    }
}

/// Return positions/values of *closed* syncmers of length `k+w-1`.
///
/// These are windows with the minimizer at the start or end of the window.
///
/// `k` here corresponds to `s` in original syncmer notation: the minimizer length.
/// `k+w-1` corresponds to `k` in original syncmer notation: the length of the extracted string.
#[must_use]
pub const fn closed_syncmers(
    k: usize,
    w: usize,
) -> Builder<'static, false, NtHasher<false>, (), 1> {
    Builder {
        k,
        w,
        hasher: None,
        sk_pos: (),
    }
}

#[must_use]
pub const fn canonical_closed_syncmers(
    k: usize,
    w: usize,
) -> Builder<'static, true, NtHasher<true>, (), 1> {
    Builder {
        k,
        w,
        hasher: None,
        sk_pos: (),
    }
}

/// Return positions/values of *open* syncmers of length `k+w-1`.
///
/// These are windows with the minimizer in the middle of the window. This requires `w` to be odd.
///
/// `k` here corresponds to `s` in original syncmer notation: the minimizer length.
/// `k+w-1` corresponds to `k` in original syncmer notation: the length of the extracted string.
#[must_use]
pub const fn open_syncmers(k: usize, w: usize) -> Builder<'static, false, NtHasher<false>, (), 2> {
    Builder {
        k,
        w,
        hasher: None,
        sk_pos: (),
    }
}

#[must_use]
pub const fn canonical_open_syncmers(
    k: usize,
    w: usize,
) -> Builder<'static, true, NtHasher<true>, (), 2> {
    Builder {
        k,
        w,
        hasher: None,
        sk_pos: (),
    }
}

impl<const CANONICAL: bool, const SYNCMERS: u8>
    Builder<'static, CANONICAL, NtHasher<CANONICAL>, (), SYNCMERS>
{
    #[must_use]
    pub const fn hasher<'h, H2: KmerHasher>(
        &self,
        hasher: &'h H2,
    ) -> Builder<'h, CANONICAL, H2, (), SYNCMERS> {
        Builder {
            k: self.k,
            w: self.w,
            sk_pos: (),
            hasher: Some(hasher),
        }
    }
}
impl<'h, const CANONICAL: bool, H: KmerHasher> Builder<'h, CANONICAL, H, (), 0> {
    #[must_use]
    pub const fn super_kmers<'o2>(
        &self,
        sk_pos: &'o2 mut Vec<u32>,
    ) -> Builder<'h, CANONICAL, H, &'o2 mut Vec<u32>, 0> {
        Builder {
            k: self.k,
            w: self.w,
            hasher: self.hasher,
            sk_pos,
        }
    }
}

/// Without-superkmer version
impl<'h, const CANONICAL: bool, H: KmerHasher, const SYNCMERS: u8>
    Builder<'h, CANONICAL, H, (), SYNCMERS>
{
    pub fn run_scalar_once<'s, SEQ: Seq<'s>>(&self, seq: SEQ) -> Vec<u32> {
        let mut min_pos = vec![];
        self.run_impl::<false, _>(seq, &mut min_pos);
        min_pos
    }

    pub fn run_once<'s, SEQ: Seq<'s>>(&self, seq: SEQ) -> Vec<u32> {
        let mut min_pos = vec![];
        self.run_impl::<true, _>(seq, &mut min_pos);
        min_pos
    }

    pub fn run_scalar<'s, 'o, SEQ: Seq<'s>>(
        &self,
        seq: SEQ,
        min_pos: &'o mut Vec<u32>,
    ) -> Output<'o, CANONICAL, SEQ> {
        self.run_impl::<false, _>(seq, min_pos)
    }

    pub fn run<'s, 'o, SEQ: Seq<'s>>(
        &self,
        seq: SEQ,
        min_pos: &'o mut Vec<u32>,
    ) -> Output<'o, CANONICAL, SEQ> {
        self.run_impl::<true, _>(seq, min_pos)
    }

    fn run_impl<'s, 'o, const SIMD: bool, SEQ: Seq<'s>>(
        &self,
        seq: SEQ,
        min_pos: &'o mut Vec<u32>,
    ) -> Output<'o, CANONICAL, SEQ> {
        let default_hasher = self.hasher.is_none().then(|| H::new(self.k));
        let hasher = self
            .hasher
            .unwrap_or_else(|| default_hasher.as_ref().unwrap());

        CACHE.with_borrow_mut(|cache| match (SIMD, CANONICAL, SYNCMERS) {
            (false, false, 0) => collect_and_dedup_into_scalar(
                minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
            ),
            (false, false, 1) => collect_syncmers_scalar::<false>(
                self.w,
                minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
            ),
            (false, false, 2) => collect_syncmers_scalar::<true>(
                self.w,
                minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
            ),
            (false, true, 0) => collect_and_dedup_into_scalar(
                canonical_minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
            ),
            (false, true, 1) => collect_syncmers_scalar::<false>(
                self.w,
                canonical_minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
            ),
            (false, true, 2) => collect_syncmers_scalar::<true>(
                self.w,
                canonical_minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
            ),
            (true, false, 0) => minimizers_seq_simd(seq, hasher, self.w, &mut cache.0)
                .collect_and_dedup_into::<false>(min_pos),
            (true, false, 1) => minimizers_seq_simd(seq, hasher, self.w, &mut cache.0)
                .collect_syncmers_into::<false>(self.w, min_pos),
            (true, false, 2) => minimizers_seq_simd(seq, hasher, self.w, &mut cache.0)
                .collect_syncmers_into::<true>(self.w, min_pos),
            (true, true, 0) => canonical_minimizers_seq_simd(seq, hasher, self.w, &mut cache.0)
                .collect_and_dedup_into::<false>(min_pos),
            (true, true, 1) => canonical_minimizers_seq_simd(seq, hasher, self.w, &mut cache.0)
                .collect_syncmers_into::<false>(self.w, min_pos),
            (true, true, 2) => canonical_minimizers_seq_simd(seq, hasher, self.w, &mut cache.0)
                .collect_syncmers_into::<true>(self.w, min_pos),
            _ => unreachable!("SYNCMERS generic must be 0 (no syncmers), 1 (closed syncmers), or 2 (open syncmers)."),
        });
        Output {
            len: if SYNCMERS != 0 {
                self.k + self.w - 1
            } else {
                self.k
            },
            seq,
            min_pos,
        }
    }
}

impl<'h, H: KmerHasher, const SYNCMERS: u8> Builder<'h, true, H, (), SYNCMERS> {
    pub fn run_skip_ambiguous_windows_once<'s>(&self, nseq: PackedNSeq<'s>) -> Vec<u32> {
        let mut min_pos = vec![];
        self.run_skip_ambiguous_windows(nseq, &mut min_pos);
        min_pos
    }
    pub fn run_skip_ambiguous_windows<'s, 'o>(
        &self,
        nseq: PackedNSeq<'s>,
        min_pos: &'o mut Vec<u32>,
    ) -> Output<'o, true, PackedSeq<'s>> {
        CACHE
            .with_borrow_mut(|cache| self.run_skip_ambiguous_windows_with_buf(nseq, min_pos, cache))
    }
    pub fn run_skip_ambiguous_windows_with_buf<'s, 'o>(
        &self,
        nseq: PackedNSeq<'s>,
        min_pos: &'o mut Vec<u32>,
        cache: &mut (Cache, Vec<S>, Vec<S>),
    ) -> Output<'o, true, PackedSeq<'s>> {
        let default_hasher = self.hasher.is_none().then(|| H::new(self.k));
        let hasher = self
            .hasher
            .unwrap_or_else(|| default_hasher.as_ref().unwrap());
        match SYNCMERS {
            0 => canonical_minimizers_skip_ambiguous_windows(nseq, hasher, self.w, cache)
                .collect_and_dedup_into::<true>(min_pos),
            1 => canonical_minimizers_skip_ambiguous_windows(nseq, hasher, self.w, cache)
                .collect_syncmers_into::<false>(self.w, min_pos),
            2 => canonical_minimizers_skip_ambiguous_windows(nseq, hasher, self.w, cache)
                .collect_syncmers_into::<true>(self.w, min_pos),
            _ => panic!(
                "SYNCMERS generic must be 0 (no syncmers), 1 (closed syncmers), or 2 (open syncmers)."
            ),
        }
        Output {
            len: if SYNCMERS != 0 {
                self.k + self.w - 1
            } else {
                self.k
            },
            seq: nseq.seq,
            min_pos,
        }
    }
}

/// With-superkmer version
///
/// (does not work in combination with syncmers)
impl<'h, 'o2, const CANONICAL: bool, H: KmerHasher>
    Builder<'h, CANONICAL, H, &'o2 mut Vec<u32>, 0>
{
    pub fn run_scalar_once<'s, SEQ: Seq<'s>>(self, seq: SEQ) -> Vec<u32> {
        let mut min_pos = vec![];
        self.run_scalar(seq, &mut min_pos);
        min_pos
    }

    pub fn run_scalar<'s, 'o, SEQ: Seq<'s>>(
        self,
        seq: SEQ,
        min_pos: &'o mut Vec<u32>,
    ) -> Output<'o, CANONICAL, SEQ> {
        let default_hasher = self.hasher.is_none().then(|| H::new(self.k));
        let hasher = self
            .hasher
            .unwrap_or_else(|| default_hasher.as_ref().unwrap());

        CACHE.with_borrow_mut(|cache| match CANONICAL {
            false => collect_and_dedup_with_index_into_scalar(
                minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
                self.sk_pos,
            ),
            true => collect_and_dedup_with_index_into_scalar(
                canonical_minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
                self.sk_pos,
            ),
        });
        Output {
            len: self.k,
            seq,
            min_pos,
        }
    }

    pub fn run_once<'s, SEQ: Seq<'s>>(self, seq: SEQ) -> Vec<u32> {
        let mut min_pos = vec![];
        self.run(seq, &mut min_pos);
        min_pos
    }

    pub fn run<'s, 'o, SEQ: Seq<'s>>(
        self,
        seq: SEQ,
        min_pos: &'o mut Vec<u32>,
    ) -> Output<'o, CANONICAL, SEQ> {
        CACHE.with_borrow_mut(|cache| self.run_with_buf(seq, min_pos, &mut cache.0))
    }

    #[inline(always)]
    fn run_with_buf<'s, 'o, SEQ: Seq<'s>>(
        self,
        seq: SEQ,
        min_pos: &'o mut Vec<u32>,
        cache: &mut Cache,
    ) -> Output<'o, CANONICAL, SEQ> {
        let default_hasher = self.hasher.is_none().then(|| H::new(self.k));
        let hasher = self
            .hasher
            .unwrap_or_else(|| default_hasher.as_ref().unwrap());

        match CANONICAL {
            false => minimizers_seq_simd(seq, hasher, self.w, cache)
                .collect_and_dedup_with_index_into(min_pos, self.sk_pos),
            true => canonical_minimizers_seq_simd(seq, hasher, self.w, cache)
                .collect_and_dedup_with_index_into(min_pos, self.sk_pos),
        };
        Output {
            len: self.k,
            seq,
            min_pos,
        }
    }
}

impl<'s, 'o, const CANONICAL: bool, SEQ: Seq<'s>> Output<'o, CANONICAL, SEQ> {
    /// Iterator over (canonical) u64 kmer-values associated with all minimizer positions.
    #[must_use]
    pub fn values_u64(&self) -> impl ExactSizeIterator<Item = u64> {
        self.pos_and_values_u64().map(|(_pos, val)| val)
    }
    /// Iterator over (canonical) u128 kmer-values associated with all minimizer positions.
    #[must_use]
    pub fn values_u128(&self) -> impl ExactSizeIterator<Item = u128> {
        self.pos_and_values_u128().map(|(_pos, val)| val)
    }
    /// Iterator over positions and (canonical) u64 kmer-values associated with all minimizer positions.
    #[must_use]
    pub fn pos_and_values_u64(&self) -> impl ExactSizeIterator<Item = (u32, u64)> {
        self.min_pos.iter().map(
            #[inline(always)]
            move |&pos| {
                let val = if CANONICAL {
                    let a = self.seq.read_kmer(self.len, pos as usize);
                    let b = self.seq.read_revcomp_kmer(self.len, pos as usize);
                    core::cmp::min(a, b)
                } else {
                    self.seq.read_kmer(self.len, pos as usize)
                };
                (pos, val)
            },
        )
    }
    /// Iterator over positions and (canonical) u128 kmer-values associated with all minimizer positions.
    #[must_use]
    pub fn pos_and_values_u128(&self) -> impl ExactSizeIterator<Item = (u32, u128)> {
        self.min_pos.iter().map(
            #[inline(always)]
            move |&pos| {
                let val = if CANONICAL {
                    let a = self.seq.read_kmer_u128(self.len, pos as usize);
                    let b = self.seq.read_revcomp_kmer_u128(self.len, pos as usize);
                    core::cmp::min(a, b)
                } else {
                    self.seq.read_kmer_u128(self.len, pos as usize)
                };
                (pos, val)
            },
        )
    }
}

/// Positions of all minimizers in the sequence.
///
/// See [`minimizers`], [`canonical_minimizers`], and [`Builder`] for more
/// configurations supporting a custom hasher, super-kmer positions, and
/// returning kmer-values.
///
/// Returns a freshly allocated vector. To append positions into a reusable
/// buffer instead, use the [`Builder`] API's `run` method.
pub fn minimizer_positions<'s>(seq: impl Seq<'s>, k: usize, w: usize) -> Vec<u32> {
    minimizers(k, w).run_once(seq)
}

/// Positions of all canonical minimizers in the sequence.
///
/// See [`minimizers`], [`canonical_minimizers`], and [`Builder`] for more
/// configurations supporting a custom hasher, super-kmer positions, and
/// returning kmer-values.
///
/// `l = w + k - 1` must be odd to determine the strand of each window.
///
/// Returns a freshly allocated vector. To append positions into a reusable
/// buffer instead, use the [`Builder`] API's `run` method.
pub fn canonical_minimizer_positions<'s>(seq: impl Seq<'s>, k: usize, w: usize) -> Vec<u32> {
    canonical_minimizers(k, w).run_once(seq)
}