A crate for streaming hashing of k-mers via `KmerHasher`.
This builds on `packed_seq` and is used by e.g. `simd_minimizers`.
The default `NtHasher` is canonical: a k-mer and its reverse complement hash to the same value.
If that's not needed, `NtHasher<false>` will be slightly faster.
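To make "canonical" concrete, here is a minimal standalone sketch of an ntHash-style hash using only the standard library. It is not this crate's implementation: the `SEEDS` values, the `A=0, C=1, G=2, T=3` encoding, the function names, and the choice of `wrapping_add` to combine the two strands are all assumptions made for illustration.

```rust
/// Arbitrary per-base constants, indexed by 2-bit code (hypothetical values).
const SEEDS: [u32; 4] = [0x9E37_79B9, 0x85EB_CA6B, 0xC2B2_AE35, 0x27D4_EB2F];

/// Map an ASCII base to a 2-bit code whose complement is `3 - code`.
/// This A=0, C=1, G=2, T=3 encoding is chosen for the sketch only.
fn encode(base: u8) -> usize {
    match base {
        b'A' => 0,
        b'C' => 1,
        b'G' => 2,
        b'T' => 3,
        _ => panic!("not a DNA base"),
    }
}

/// Forward hash: xor of the seeds, each rotated by its distance from the k-mer's end.
fn forward_hash(kmer: &[u8]) -> u32 {
    let k = kmer.len() as u32;
    kmer.iter().enumerate().fold(0u32, |h, (i, &b)| {
        h ^ SEEDS[encode(b)].rotate_left(k - 1 - i as u32)
    })
}

/// Forward hash of the reverse complement, computed without materializing it.
fn revcomp_hash(kmer: &[u8]) -> u32 {
    kmer.iter().enumerate().fold(0u32, |h, (i, &b)| {
        h ^ SEEDS[3 - encode(b)].rotate_left(i as u32)
    })
}

/// Canonical hash: a symmetric combination of both strands,
/// so a k-mer and its reverse complement get the same value.
fn canonical_hash(kmer: &[u8]) -> u32 {
    forward_hash(kmer).wrapping_add(revcomp_hash(kmer))
}
```

In this sketch, `canonical_hash(b"ACG")` equals `canonical_hash(b"CGT")`, while `forward_hash` distinguishes the two. The forward-only variant skips one of the two strand computations, which is one plausible reason a non-canonical hasher can be cheaper.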
For non-DNA sequences with >2-bit alphabets, use `MulHasher` instead.
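The `MulHasher` struct is documented as multiplying each character by a constant and xor-ing the results together under rotations. A rough standalone sketch of that idea follows. The multiplier `MUL`, the function name `mul_hash`, and the exact rotation amounts are assumptions for illustration and may differ from what the crate does.

```rust
/// Arbitrary odd multiplier (hypothetical; not taken from the crate).
const MUL: u32 = 0x9E37_79B1;

/// Hash a k-mer over a byte alphabet: multiply each character by `MUL`,
/// rotate by its distance from the end of the k-mer, and xor everything together.
fn mul_hash(kmer: &[u8]) -> u32 {
    let k = kmer.len() as u32;
    kmer.iter().enumerate().fold(0u32, |h, (i, &c)| {
        h ^ (c as u32).wrapping_mul(MUL).rotate_left(k - 1 - i as u32)
    })
}
```

Because no per-character lookup table is involved, a scheme like this works for any byte value, which fits the advice to use it for alphabets that do not fit in 2 bits.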
Note that `KmerHasher` objects take `k` at construction, so that they can precompute required constants.
Prefer reusing the same `KmerHasher` across calls.
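As an illustration of why `k` is useful at construction time, consider a generic rolling hash: the term that leaves the window must be removed with a rotation by `k`, which can be tabulated once. The sketch below is standalone and hypothetical (`RollingHasher`, `rot_out`, and `SEEDS` are not names or values from this crate), and it assumes the input is already given as 2-bit codes.

```rust
/// Arbitrary per-base constants, indexed by 2-bit code (hypothetical values).
const SEEDS: [u32; 4] = [0x9E37_79B9, 0x85EB_CA6B, 0xC2B2_AE35, 0x27D4_EB2F];

/// Hypothetical stand-in for a hasher that is constructed once per `k`.
struct RollingHasher {
    k: usize,
    /// `SEEDS[c]` rotated by `k`: the term to xor out when base `c` leaves the window.
    rot_out: [u32; 4],
}

impl RollingHasher {
    fn new(k: usize) -> Self {
        assert!(k > 0, "k must be positive");
        Self {
            k,
            rot_out: SEEDS.map(|s| s.rotate_left(k as u32)),
        }
    }

    /// Hash every k-mer of `seq`, given as 2-bit codes in `0..4`.
    fn hash_kmers(&self, seq: &[u8]) -> Vec<u32> {
        if seq.len() < self.k {
            return Vec::new();
        }
        // Hash the first window directly.
        let mut h = seq[..self.k]
            .iter()
            .fold(0u32, |h, &c| h.rotate_left(1) ^ SEEDS[c as usize]);
        let mut out = vec![h];
        // Slide the window: rotate, drop the outgoing base, add the incoming one.
        for i in self.k..seq.len() {
            let outgoing = seq[i - self.k] as usize;
            let incoming = seq[i] as usize;
            h = h.rotate_left(1) ^ self.rot_out[outgoing] ^ SEEDS[incoming];
            out.push(h);
        }
        out
    }
}
```

Each slide costs a constant number of operations regardless of `k`, and the `rot_out` table is computed once in `new`, so constructing one hasher and reusing it for many sequences avoids repeating that setup.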
This crate also includes `AntiLexHasher`; see this blogpost.
§Typical usage
Construct a default `NtHasher` via `let hasher = <NtHasher>::new(k)`.
Then call either `hasher.hash_kmers_simd(seq, context)`, or use the underlying 'mapper' via `hasher.in_out_mapper_simd(seq)`.
use packed_seq::{AsciiSeqVec, PackedSeqVec, SeqVec};
use seq_hash::{KmerHasher, NtHasher};
let k = 3;
// Default `NtHasher` is canonical.
let hasher = <NtHasher>::new(k);
let kmer = PackedSeqVec::from_ascii(b"ACG");
let kmer_rc = PackedSeqVec::from_ascii(b"CGT");
// Normally, prefer `hash_kmers_simd` over `hash_seq`.
assert_eq!(
    hasher.hash_seq(kmer.as_slice()),
    hasher.hash_seq(kmer_rc.as_slice())
);
let fwd_hasher = NtHasher::<false>::new(k);
assert_ne!(
    fwd_hasher.hash_seq(kmer.as_slice()),
    fwd_hasher.hash_seq(kmer_rc.as_slice())
);
let seq = b"ACGGCAGCGCATATGTAGT";
let ascii_seq = AsciiSeqVec::from_ascii(seq);
let packed_seq = PackedSeqVec::from_ascii(seq);
// hasher.hash_kmers_scalar(seq.as_slice()); // Panics since `NtHasher` does not support ASCII.
let hashes_1: Vec<_> = hasher.hash_kmers_scalar(ascii_seq.as_slice()).collect();
let hashes_2: Vec<_> = hasher.hash_kmers_scalar(packed_seq.as_slice()).collect();
// Hashes are equal for [`packed_seq::AsciiSeq`] and [`packed_seq::PackedSeq`].
assert_eq!(hashes_1, hashes_2);
assert_eq!(hashes_1.len(), seq.len() - (k-1));
// Consider a 'context' of a single kmer.
let hashes_3: Vec<_> = hasher.hash_kmers_simd(ascii_seq.as_slice(), 1).collect();
let hashes_4: Vec<_> = hasher.hash_kmers_simd(packed_seq.as_slice(), 1).collect();
assert_eq!(hashes_1, hashes_3);
assert_eq!(hashes_1, hashes_4);
Re-exports§
pub use packed_seq;
Structs§
- AntiLexHasher - A hash function that compares strings reverse-lexicographically, with the last (most significant) character inverted.
- MulHasher - `MulHasher` multiplies each character by a constant and xors them together under rotations.
- NtHasher - `u32` variant of NtHash.
Traits§
- KmerHasher - A hasher that can hash all k-mers in a string.