// simd_minimizers/lib.rs
//! A library to quickly compute (canonical) minimizers of DNA and text sequences.
//!
//! The main functions are:
//! - [`minimizer_positions`]: compute the positions of all minimizers of a sequence.
//! - [`canonical_minimizer_positions`]: compute the positions of all _canonical_ minimizers of a sequence.
//!
//! Adjacent equal positions are deduplicated, but since the canonical minimizer is not _forward_, a position may still appear more than once.
//!
//! The implementation uses SIMD by splitting each sequence into 8 chunks and processing those in parallel.
//!
//! When using super-k-mers, use the `_and_superkmer` variants to additionally return a vector containing the index of the first window in which each minimizer is minimal.
//!
//! The minimizer of a single window can be found using [`one_minimizer`] and [`one_canonical_minimizer`], but note that these functions are not nearly as efficient.
//!
//! The `scalar` versions are mostly for testing, and nearly always slower.
//!
//! ## Minimizers
//!
//! The code is explained in detail in our [paper](https://doi.org/10.4230/LIPIcs.SEA.2025.20):
//!
//! > SimdMinimizers: Computing random minimizers, fast.
//! > Ragnar Groot Koerkamp, Igor Martayan, SEA 2025
//!
//! Briefly, minimizers are defined using two parameters `k` and `w`.
//! Given a sequence of characters, all k-mers (substrings of length `k`) are hashed,
//! and for each _window_ of `w` consecutive k-mers (`l = w + k - 1` characters),
//! (the position of) the smallest k-mer is sampled.
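//!
//! As an illustration only (not the library's implementation), the definition can be written as a brute-force scan over one window, with a caller-supplied stand-in `hash`:
//!
//! ```
//! /// Position of the smallest k-mer in a window of `l = w + k - 1` characters,
//! /// under the given hash function, with ties broken towards the left.
//! fn window_minimizer(window: &[u8], k: usize, hash: impl Fn(&[u8]) -> u32) -> usize {
//!     (0..=window.len() - k)
//!         .min_by_key(|&i| (hash(&window[i..i + k]), i))
//!         .unwrap()
//! }
//! ```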
//!
//! Minimizers are found as follows:
//! 1. Split the input into 8 chunks that are processed in parallel using SIMD.
//! 2. Compute a 32-bit ntHash rolling hash of the k-mers.
//! 3. Use the 'two stacks' sliding window minimum on the top 16 bits of each hash.
//! 4. Break ties towards the leftmost position by storing the position in the bottom 16 bits.
//! 5. Compute 8 consecutive minimizer positions, and dedup them.
//! 6. Collect the deduplicated minimizer positions from all 8 chunks into a single vector.
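//!
//! Steps 3 and 4 can be sketched in scalar code. This is a naive `O(n*w)` window scan rather than the amortized-constant-time two-stacks structure, and it assumes positions fit in 16 bits:
//!
//! ```
//! /// For each window of `w` hashes, the position of the smallest one.
//! /// Keeping only the top 16 bits of each hash and packing the position into
//! /// the low 16 bits makes the natural `u32` order break ties towards the left.
//! fn sliding_min_positions(hashes: &[u32], w: usize) -> Vec<u32> {
//!     let keyed: Vec<u32> = hashes
//!         .iter()
//!         .enumerate()
//!         .map(|(pos, &h)| (h & 0xffff_0000) | pos as u32)
//!         .collect();
//!     keyed
//!         .windows(w)
//!         .map(|win| win.iter().copied().min().unwrap() & 0xffff)
//!         .collect()
//! }
//! ```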
//!
//! ## Canonical minimizers
//!
//! _Canonical_ minimizers have the property that the k-mers sampled from a DNA sequence are the same as those sampled from its _reverse complement_.
//!
//! This works as follows:
//! 1. ntHash is replaced by its canonical version, which xors the hashes of the forward and reverse-complement k-mer.
//! 2. Compute the leftmost and rightmost minimal k-mer.
//! 3. Compute the 'preferred' strand of the current window as the one with more `TG` characters. This requires `l = w + k - 1` to be odd for proper tie-breaking.
//! 4. Return either the leftmost or rightmost smallest k-mer, depending on the preferred strand.
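//!
//! Step 3, the strand decision, can be sketched on ASCII characters (the library computes this on packed representations):
//!
//! ```
//! /// The forward strand is preferred iff it contains strictly more `TG`
//! /// characters than the reverse complement, which contains `l - tg` of them.
//! /// With odd `l = w + k - 1` the two counts can never tie.
//! fn prefers_forward_strand(window: &[u8]) -> bool {
//!     let l = window.len();
//!     debug_assert!(l % 2 == 1, "l = w + k - 1 must be odd");
//!     let tg = window.iter().filter(|&&c| c == b'T' || c == b'G').count();
//!     2 * tg > l
//! }
//! ```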
//!
//! ## Syncmers
//!
//! _Syncmers_ are (in our notation) windows of length `l = w + k - 1` characters where the minimizer k-mer is a prefix or suffix (_closed_ syncmers), or lies at the middle position (_open_ syncmers).
//! (Or, in classical notation, `k`-mers with the smallest `s`-mer as prefix or suffix.)
//! These can be computed by using [`closed_syncmers`] and [`open_syncmers`], or [`canonical_closed_syncmers`] and [`canonical_open_syncmers`], instead of [`minimizers`] and [`canonical_minimizers`].
//!
//! Note that canonical syncmers are chosen as the minimum of the forward and reverse-complement k-mer representation.
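//!
//! The closed syncmer condition can be checked by brute force (again with a stand-in `hash`; the library reuses its sliding-window machinery instead):
//!
//! ```
//! /// A window of `w + k - 1` characters is a closed syncmer iff its smallest
//! /// k-mer (ties broken towards the left) is at the first or last position.
//! fn is_closed_syncmer(window: &[u8], k: usize, hash: impl Fn(&[u8]) -> u32) -> bool {
//!     let w = window.len() - k + 1;
//!     let min_pos = (0..w)
//!         .min_by_key(|&i| (hash(&window[i..i + k]), i))
//!         .unwrap();
//!     min_pos == 0 || min_pos == w - 1
//! }
//! ```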
//!
//! ## Input types
//!
//! This crate depends on [`packed_seq`] to handle generic types of input sequences.
//! Most commonly, one should use [`packed_seq::PackedSeqVec`] for packed DNA sequences, but one can also simply wrap a sequence of `ACTGactg` characters in [`packed_seq::AsciiSeqVec`].
//! Additionally, `simd_minimizers` works on general (ASCII) `&[u8]` text.
//!
//! The main function provided by [`packed_seq`] is [`packed_seq::Seq::iter_bp`], which splits the input into 8 chunks and iterates them in parallel using SIMD.
//!
//! When dealing with ASCII input, use the `AsciiSeq` and `AsciiSeqVec` types.
//!
//! ## Hash function
//!
//! By default, the library uses the `ntHash` hash function, which maps each DNA base `ACTG` to a pseudo-random value using a table lookup.
//! This hash function is specifically designed to be fast for hashing DNA sequences with input types [`packed_seq::PackedSeq`] and [`packed_seq::AsciiSeq`].
//!
//! For general ASCII sequences (`&[u8]`), `mulHash` is used instead, which multiplies each character value by a pseudo-random constant.
//! The `mul_hash` module provides functions that _always_ use mulHash, also for DNA sequences.
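//!
//! The mulHash idea can be sketched as follows; the constant and the rotate-xor mixing are illustrative, not necessarily the exact scheme used here:
//!
//! ```
//! /// Per-character value: multiply by a pseudo-random odd constant.
//! fn mul_hash_char(c: u8) -> u32 {
//!     const C: u32 = 0x9e37_79b1; // illustrative constant, not the library's
//!     (c as u32).wrapping_mul(C)
//! }
//!
//! /// Combine per-character values into a k-mer hash, ntHash-style.
//! fn kmer_hash(kmer: &[u8]) -> u32 {
//!     kmer.iter()
//!         .fold(0u32, |h, &c| h.rotate_left(1) ^ mul_hash_char(c))
//! }
//! ```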
//!
//! ## Performance
//!
//! This library depends on AVX2 or NEON SIMD instructions to achieve good performance.
//! Make sure to compile with `-C target-cpu=native` to enable these instructions.
//! See the [ensure_simd](https://github.com/ragnargrootkoerkamp/ensure_simd) crate for more details.
//!
//! The `run` functions take a `min_pos: &mut Vec<u32>` parameter to which positions are _appended_.
//! For best performance, re-use the same vector between invocations, and [`Vec::clear`] it before or after each call.
//!
//! ## Examples
//!
//! #### Scalar `AsciiSeq`
//!
//! ```
//! // Scalar ASCII version.
//! use packed_seq::{SeqVec, AsciiSeq};
//!
//! let seq = b"ACGTGCTCAGAGACTCAG";
//! let ascii_seq = AsciiSeq(seq);
//!
//! let k = 5;
//! let w = 7;
//!
//! let positions = simd_minimizers::minimizer_positions(ascii_seq, k, w);
//! assert_eq!(positions, vec![4, 5, 8, 13]);
//! ```
//!
//! #### SIMD `PackedSeq`
//!
//! ```
//! // Packed SIMD version.
//! use packed_seq::{PackedSeqVec, SeqVec, Seq};
//!
//! let seq = b"ACGTGCTCAGAGACTCAGAGGA";
//! let packed_seq = PackedSeqVec::from_ascii(seq);
//!
//! let k = 5;
//! let w = 7;
//!
//! // Unfortunately, `PackedSeqVec` cannot `Deref` into a `PackedSeq`, so `as_slice` is needed.
//! // Since we also need the values, this uses the Builder API.
//! let mut fwd_pos = vec![];
//! let fwd_vals: Vec<_> = simd_minimizers::canonical_minimizers(k, w).run(packed_seq.as_slice(), &mut fwd_pos).values_u64().collect();
//! assert_eq!(fwd_pos, vec![0, 7, 9, 15]);
//! assert_eq!(fwd_vals, vec![
//!     // T G C A C, CACGT is rc of ACGTG at pos 0
//!     0b10_11_01_00_01,
//!     // G A G A C, CAGAG is at pos 7
//!     0b11_00_11_00_01,
//!     // C A G A G, GAGAC is at pos 9
//!     0b01_00_11_00_11,
//!     // G A G A C, CAGAG is at pos 15
//!     0b11_00_11_00_01
//! ]);
//!
//! // Check that the reverse complement sequence has minimizers at 'reverse' positions.
//! let rc_packed_seq = packed_seq.as_slice().to_revcomp();
//! let mut rc_pos = Vec::new();
//! let mut rc_vals: Vec<_> = simd_minimizers::canonical_minimizers(k, w).run(rc_packed_seq.as_slice(), &mut rc_pos).values_u64().collect();
//! assert_eq!(rc_pos, vec![2, 8, 10, 17]);
//! for (fwd, &rc) in std::iter::zip(fwd_pos, rc_pos.iter().rev()) {
//!     assert_eq!(fwd as usize, seq.len() - k - rc as usize);
//! }
//! rc_vals.reverse();
//! assert_eq!(rc_vals, fwd_vals);
//! ```
//!
//! #### Seeded hasher
//!
//! ```
//! // Packed SIMD version with seeded hashes.
//! use packed_seq::{PackedSeqVec, SeqVec};
//!
//! let seq = b"ACGTGCTCAGAGACTCAG";
//! let packed_seq = PackedSeqVec::from_ascii(seq);
//!
//! let k = 5;
//! let w = 7;
//! let seed = 101010;
//! // Canonical by default. Use `NtHasher<false>` for forward-only.
//! let hasher = <seq_hash::NtHasher>::new_with_seed(k, seed);
//!
//! let fwd_pos = simd_minimizers::canonical_minimizers(k, w).hasher(&hasher).run_once(packed_seq.as_slice());
//! ```

#![allow(clippy::missing_transmute_annotations)]

mod canonical;
pub mod collect;
mod minimizers;
mod sliding_min;
pub mod syncmers;
mod intrinsics {
    mod dedup;
    pub use dedup::{append_filtered_vals, append_unique_vals, append_unique_vals_2};
}

#[cfg(test)]
mod test;

/// Re-exported internals. Used for benchmarking, and not part of the semver-compatible stable API.
pub mod private {
    pub mod canonical {
        pub use crate::canonical::*;
    }
    pub mod minimizers {
        pub use crate::minimizers::*;
    }
    pub mod sliding_min {
        pub use crate::sliding_min::*;
    }
    pub use packed_seq::u32x8 as S;
}

use collect::CollectAndDedup;
use collect::collect_and_dedup_into_scalar;
use collect::collect_and_dedup_with_index_into_scalar;
use minimizers::canonical_minimizers_skip_ambiguous_windows;
/// Re-export of the `packed-seq` crate.
pub use packed_seq;
use packed_seq::PackedNSeq;
use packed_seq::PackedSeq;
/// Re-export of the `seq-hash` crate.
pub use seq_hash;

use minimizers::{
    canonical_minimizers_seq_scalar, canonical_minimizers_seq_simd, minimizers_seq_scalar,
    minimizers_seq_simd,
};
use packed_seq::Seq;
use packed_seq::u32x8 as S;
use seq_hash::KmerHasher;

pub use minimizers::one_minimizer;
use seq_hash::NtHasher;
pub use sliding_min::Cache;
use syncmers::CollectSyncmers;
use syncmers::collect_syncmers_scalar;

thread_local! {
    static CACHE: std::cell::RefCell<(Cache, Vec<S>, Vec<S>)> = std::cell::RefCell::new(Default::default());
}

/// `CANONICAL`: true for canonical minimizers.
/// `H`: the kmer hasher to use.
/// `SkPos`: type of super-k-mer position storage. Use `()` to disable super-k-mers.
/// `SYNCMER`: 0 for minimizers, 1 for closed syncmers, 2 for open syncmers.
pub struct Builder<'h, const CANONICAL: bool, H: KmerHasher, SkPos, const SYNCMER: u8> {
    k: usize,
    w: usize,
    hasher: Option<&'h H>,
    sk_pos: SkPos,
}

pub struct Output<'o, const CANONICAL: bool, S> {
    /// k for minimizers, k+w-1 for syncmers
    len: usize,
    seq: S,
    min_pos: &'o Vec<u32>,
}

#[must_use]
pub const fn minimizers(k: usize, w: usize) -> Builder<'static, false, NtHasher<false>, (), 0> {
    Builder {
        k,
        w,
        hasher: None,
        sk_pos: (),
    }
}

#[must_use]
pub const fn canonical_minimizers(
    k: usize,
    w: usize,
) -> Builder<'static, true, NtHasher<true>, (), 0> {
    Builder {
        k,
        w,
        hasher: None,
        sk_pos: (),
    }
}

/// Return positions/values of *closed* syncmers of length `k+w-1`.
///
/// These are windows with the minimizer at the start or end of the window.
///
/// `k` here corresponds to `s` in original syncmer notation: the minimizer length.
/// `k+w-1` corresponds to `k` in original syncmer notation: the length of the extracted string.
#[must_use]
pub const fn closed_syncmers(
    k: usize,
    w: usize,
) -> Builder<'static, false, NtHasher<false>, (), 1> {
    Builder {
        k,
        w,
        hasher: None,
        sk_pos: (),
    }
}

#[must_use]
pub const fn canonical_closed_syncmers(
    k: usize,
    w: usize,
) -> Builder<'static, true, NtHasher<true>, (), 1> {
    Builder {
        k,
        w,
        hasher: None,
        sk_pos: (),
    }
}

/// Return positions/values of *open* syncmers of length `k+w-1`.
///
/// These are windows with the minimizer in the middle of the window. This requires `w` to be odd.
///
/// `k` here corresponds to `s` in original syncmer notation: the minimizer length.
/// `k+w-1` corresponds to `k` in original syncmer notation: the length of the extracted string.
#[must_use]
pub const fn open_syncmers(k: usize, w: usize) -> Builder<'static, false, NtHasher<false>, (), 2> {
    Builder {
        k,
        w,
        hasher: None,
        sk_pos: (),
    }
}

#[must_use]
pub const fn canonical_open_syncmers(
    k: usize,
    w: usize,
) -> Builder<'static, true, NtHasher<true>, (), 2> {
    Builder {
        k,
        w,
        hasher: None,
        sk_pos: (),
    }
}

impl<const CANONICAL: bool, const SYNCMERS: u8>
    Builder<'static, CANONICAL, NtHasher<CANONICAL>, (), SYNCMERS>
{
    #[must_use]
    pub const fn hasher<'h, H2: KmerHasher>(
        &self,
        hasher: &'h H2,
    ) -> Builder<'h, CANONICAL, H2, (), SYNCMERS> {
        Builder {
            k: self.k,
            w: self.w,
            sk_pos: (),
            hasher: Some(hasher),
        }
    }
}
impl<'h, const CANONICAL: bool, H: KmerHasher> Builder<'h, CANONICAL, H, (), 0> {
    #[must_use]
    pub const fn super_kmers<'o2>(
        &self,
        sk_pos: &'o2 mut Vec<u32>,
    ) -> Builder<'h, CANONICAL, H, &'o2 mut Vec<u32>, 0> {
        Builder {
            k: self.k,
            w: self.w,
            hasher: self.hasher,
            sk_pos,
        }
    }
}

/// Without-superkmer version
impl<'h, const CANONICAL: bool, H: KmerHasher, const SYNCMERS: u8>
    Builder<'h, CANONICAL, H, (), SYNCMERS>
{
    pub fn run_scalar_once<'s, SEQ: Seq<'s>>(&self, seq: SEQ) -> Vec<u32> {
        let mut min_pos = vec![];
        self.run_impl::<false, _>(seq, &mut min_pos);
        min_pos
    }

    pub fn run_once<'s, SEQ: Seq<'s>>(&self, seq: SEQ) -> Vec<u32> {
        let mut min_pos = vec![];
        self.run_impl::<true, _>(seq, &mut min_pos);
        min_pos
    }

    pub fn run_scalar<'s, 'o, SEQ: Seq<'s>>(
        &self,
        seq: SEQ,
        min_pos: &'o mut Vec<u32>,
    ) -> Output<'o, CANONICAL, SEQ> {
        self.run_impl::<false, _>(seq, min_pos)
    }

    pub fn run<'s, 'o, SEQ: Seq<'s>>(
        &self,
        seq: SEQ,
        min_pos: &'o mut Vec<u32>,
    ) -> Output<'o, CANONICAL, SEQ> {
        self.run_impl::<true, _>(seq, min_pos)
    }

    fn run_impl<'s, 'o, const SIMD: bool, SEQ: Seq<'s>>(
        &self,
        seq: SEQ,
        min_pos: &'o mut Vec<u32>,
    ) -> Output<'o, CANONICAL, SEQ> {
        let default_hasher = self.hasher.is_none().then(|| H::new(self.k));
        let hasher = self
            .hasher
            .unwrap_or_else(|| default_hasher.as_ref().unwrap());

        CACHE.with_borrow_mut(|cache| match (SIMD, CANONICAL, SYNCMERS) {
            (false, false, 0) => collect_and_dedup_into_scalar(
                minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
            ),
            (false, false, 1) => collect_syncmers_scalar::<false>(
                self.w,
                minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
            ),
            (false, false, 2) => collect_syncmers_scalar::<true>(
                self.w,
                minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
            ),
            (false, true, 0) => collect_and_dedup_into_scalar(
                canonical_minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
            ),
            (false, true, 1) => collect_syncmers_scalar::<false>(
                self.w,
                canonical_minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
            ),
            (false, true, 2) => collect_syncmers_scalar::<true>(
                self.w,
                canonical_minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
            ),
            (true, false, 0) => minimizers_seq_simd(seq, hasher, self.w, &mut cache.0)
                .collect_and_dedup_into::<false>(min_pos),
            (true, false, 1) => minimizers_seq_simd(seq, hasher, self.w, &mut cache.0)
                .collect_syncmers_into::<false>(self.w, min_pos),
            (true, false, 2) => minimizers_seq_simd(seq, hasher, self.w, &mut cache.0)
                .collect_syncmers_into::<true>(self.w, min_pos),
            (true, true, 0) => canonical_minimizers_seq_simd(seq, hasher, self.w, &mut cache.0)
                .collect_and_dedup_into::<false>(min_pos),
            (true, true, 1) => canonical_minimizers_seq_simd(seq, hasher, self.w, &mut cache.0)
                .collect_syncmers_into::<false>(self.w, min_pos),
            (true, true, 2) => canonical_minimizers_seq_simd(seq, hasher, self.w, &mut cache.0)
                .collect_syncmers_into::<true>(self.w, min_pos),
            _ => unreachable!(
                "SYNCMERS generic must be 0 (no syncmers), 1 (closed syncmers), or 2 (open syncmers)."
            ),
        });
        Output {
            len: if SYNCMERS != 0 {
                self.k + self.w - 1
            } else {
                self.k
            },
            seq,
            min_pos,
        }
    }
}

impl<'h, H: KmerHasher, const SYNCMERS: u8> Builder<'h, true, H, (), SYNCMERS> {
    pub fn run_skip_ambiguous_windows_once<'s>(&self, nseq: PackedNSeq<'s>) -> Vec<u32> {
        let mut min_pos = vec![];
        self.run_skip_ambiguous_windows(nseq, &mut min_pos);
        min_pos
    }
    pub fn run_skip_ambiguous_windows<'s, 'o>(
        &self,
        nseq: PackedNSeq<'s>,
        min_pos: &'o mut Vec<u32>,
    ) -> Output<'o, true, PackedSeq<'s>> {
        CACHE
            .with_borrow_mut(|cache| self.run_skip_ambiguous_windows_with_buf(nseq, min_pos, cache))
    }
    pub fn run_skip_ambiguous_windows_with_buf<'s, 'o>(
        &self,
        nseq: PackedNSeq<'s>,
        min_pos: &'o mut Vec<u32>,
        cache: &mut (Cache, Vec<S>, Vec<S>),
    ) -> Output<'o, true, PackedSeq<'s>> {
        let default_hasher = self.hasher.is_none().then(|| H::new(self.k));
        let hasher = self
            .hasher
            .unwrap_or_else(|| default_hasher.as_ref().unwrap());
        match SYNCMERS {
            0 => canonical_minimizers_skip_ambiguous_windows(nseq, hasher, self.w, cache)
                .collect_and_dedup_into::<true>(min_pos),
            1 => canonical_minimizers_skip_ambiguous_windows(nseq, hasher, self.w, cache)
                .collect_syncmers_into::<false>(self.w, min_pos),
            2 => canonical_minimizers_skip_ambiguous_windows(nseq, hasher, self.w, cache)
                .collect_syncmers_into::<true>(self.w, min_pos),
            _ => panic!(
                "SYNCMERS generic must be 0 (no syncmers), 1 (closed syncmers), or 2 (open syncmers)."
            ),
        }
        Output {
            len: if SYNCMERS != 0 {
                self.k + self.w - 1
            } else {
                self.k
            },
            seq: nseq.seq,
            min_pos,
        }
    }
}

/// With-superkmer version
///
/// (does not work in combination with syncmers)
impl<'h, 'o2, const CANONICAL: bool, H: KmerHasher>
    Builder<'h, CANONICAL, H, &'o2 mut Vec<u32>, 0>
{
    pub fn run_scalar_once<'s, SEQ: Seq<'s>>(self, seq: SEQ) -> Vec<u32> {
        let mut min_pos = vec![];
        self.run_scalar(seq, &mut min_pos);
        min_pos
    }

    pub fn run_scalar<'s, 'o, SEQ: Seq<'s>>(
        self,
        seq: SEQ,
        min_pos: &'o mut Vec<u32>,
    ) -> Output<'o, CANONICAL, SEQ> {
        let default_hasher = self.hasher.is_none().then(|| H::new(self.k));
        let hasher = self
            .hasher
            .unwrap_or_else(|| default_hasher.as_ref().unwrap());

        CACHE.with_borrow_mut(|cache| match CANONICAL {
            false => collect_and_dedup_with_index_into_scalar(
                minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
                self.sk_pos,
            ),
            true => collect_and_dedup_with_index_into_scalar(
                canonical_minimizers_seq_scalar(seq, hasher, self.w, &mut cache.0),
                min_pos,
                self.sk_pos,
            ),
        });
        Output {
            len: self.k,
            seq,
            min_pos,
        }
    }

    pub fn run_once<'s, SEQ: Seq<'s>>(self, seq: SEQ) -> Vec<u32> {
        let mut min_pos = vec![];
        self.run(seq, &mut min_pos);
        min_pos
    }

    pub fn run<'s, 'o, SEQ: Seq<'s>>(
        self,
        seq: SEQ,
        min_pos: &'o mut Vec<u32>,
    ) -> Output<'o, CANONICAL, SEQ> {
        CACHE.with_borrow_mut(|cache| self.run_with_buf(seq, min_pos, &mut cache.0))
    }

    #[inline(always)]
    fn run_with_buf<'s, 'o, SEQ: Seq<'s>>(
        self,
        seq: SEQ,
        min_pos: &'o mut Vec<u32>,
        cache: &mut Cache,
    ) -> Output<'o, CANONICAL, SEQ> {
        let default_hasher = self.hasher.is_none().then(|| H::new(self.k));
        let hasher = self
            .hasher
            .unwrap_or_else(|| default_hasher.as_ref().unwrap());

        match CANONICAL {
            false => minimizers_seq_simd(seq, hasher, self.w, cache)
                .collect_and_dedup_with_index_into(min_pos, self.sk_pos),
            true => canonical_minimizers_seq_simd(seq, hasher, self.w, cache)
                .collect_and_dedup_with_index_into(min_pos, self.sk_pos),
        };
        Output {
            len: self.k,
            seq,
            min_pos,
        }
    }
}

impl<'s, 'o, const CANONICAL: bool, SEQ: Seq<'s>> Output<'o, CANONICAL, SEQ> {
    /// Iterator over (canonical) u64 kmer-values associated with all minimizer positions.
    #[must_use]
    pub fn values_u64(&self) -> impl ExactSizeIterator<Item = u64> {
        self.pos_and_values_u64().map(|(_pos, val)| val)
    }
    /// Iterator over (canonical) u128 kmer-values associated with all minimizer positions.
    #[must_use]
    pub fn values_u128(&self) -> impl ExactSizeIterator<Item = u128> {
        self.pos_and_values_u128().map(|(_pos, val)| val)
    }
    /// Iterator over positions and (canonical) u64 kmer-values associated with all minimizer positions.
    #[must_use]
    pub fn pos_and_values_u64(&self) -> impl ExactSizeIterator<Item = (u32, u64)> {
        self.min_pos.iter().map(
            #[inline(always)]
            move |&pos| {
                let val = if CANONICAL {
                    let a = self.seq.read_kmer(self.len, pos as usize);
                    let b = self.seq.read_revcomp_kmer(self.len, pos as usize);
                    core::cmp::min(a, b)
                } else {
                    self.seq.read_kmer(self.len, pos as usize)
                };
                (pos, val)
            },
        )
    }
    /// Iterator over positions and (canonical) u128 kmer-values associated with all minimizer positions.
    #[must_use]
    pub fn pos_and_values_u128(&self) -> impl ExactSizeIterator<Item = (u32, u128)> {
        self.min_pos.iter().map(
            #[inline(always)]
            move |&pos| {
                let val = if CANONICAL {
                    let a = self.seq.read_kmer_u128(self.len, pos as usize);
                    let b = self.seq.read_revcomp_kmer_u128(self.len, pos as usize);
                    core::cmp::min(a, b)
                } else {
                    self.seq.read_kmer_u128(self.len, pos as usize)
                };
                (pos, val)
            },
        )
    }
}

/// Positions of all minimizers in the sequence.
///
/// See [`minimizers`], [`canonical_minimizers`], and [`Builder`] for more
/// configurations supporting a custom hasher, super-kmer positions, and
/// returning kmer-values.
///
/// Returns a freshly allocated vector; to append into a reusable vector
/// instead, use [`Builder::run`].
pub fn minimizer_positions<'s>(seq: impl Seq<'s>, k: usize, w: usize) -> Vec<u32> {
    minimizers(k, w).run_once(seq)
}

/// Positions of all canonical minimizers in the sequence.
///
/// See [`minimizers`], [`canonical_minimizers`], and [`Builder`] for more
/// configurations supporting a custom hasher, super-kmer positions, and
/// returning kmer-values.
///
/// `l = w + k - 1` must be odd to determine the strand of each window.
///
/// Returns a freshly allocated vector; to append into a reusable vector
/// instead, use [`Builder::run`].
pub fn canonical_minimizer_positions<'s>(seq: impl Seq<'s>, k: usize, w: usize) -> Vec<u32> {
    canonical_minimizers(k, w).run_once(seq)
}