Crate seq_hash


A crate for streaming hashing of k-mers via KmerHasher.

This crate builds on packed_seq and is used by, e.g., simd_minimizers.

The default NtHasher is canonical. If that’s not needed, NtHasher<false> will be slightly faster. For non-DNA sequences with >2-bit alphabets, use MulHasher instead.
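To illustrate what "canonical" means here: a canonical hash assigns the same value to a k-mer and to its reverse complement. The following is a minimal standalone sketch of that property, not the crate's implementation. The helpers `complement`, `toy_hash`, and `canonical_hash` are hypothetical, and the mixing constant is arbitrary.

```rust
/// Complement of a DNA base (hypothetical helper; only A, C, G, T are handled).
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        other => panic!("unsupported base: {}", other as char),
    }
}

/// Toy order-sensitive hash over a stream of bases. This is NOT ntHash;
/// the multiplier is an arbitrary odd constant chosen for illustration.
fn toy_hash(bases: impl Iterator<Item = u8>) -> u32 {
    bases.fold(0u32, |h, b| {
        h.rotate_left(5) ^ (b as u32).wrapping_mul(0x9E37_79B1)
    })
}

/// Canonical hash: hash the k-mer and its reverse complement, then combine
/// the two values symmetrically so that both strands give the same result.
fn canonical_hash(kmer: &[u8]) -> u32 {
    let fwd = toy_hash(kmer.iter().copied());
    let rc = toy_hash(kmer.iter().rev().map(|&b| complement(b)));
    fwd.wrapping_add(rc)
}
```

With these definitions, `canonical_hash(b"ACG")` and `canonical_hash(b"CGT")` agree, because CGT is the reverse complement of ACG. A forward-only hasher skips the second pass, which is the intuition behind NtHasher<false> being slightly faster.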

Note that KmerHasher objects take k at construction time, so that they can precompute the required constants. Prefer reusing the same KmerHasher over constructing a new one for each call.
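As a rough illustration of why k is needed up front, consider a rolling hash in the style of the ntHash update rule, where the character leaving the window must be removed with a rotation that depends on k. The sketch below is an assumption-laden toy, not this crate's code: `RollingHasher`, `BASE_CONSTANTS`, and the constant values are all hypothetical, and the input is assumed to be 2-bit codes in 0..4.

```rust
/// Hypothetical per-base constants for the codes 0..4 (arbitrary values).
const BASE_CONSTANTS: [u32; 4] = [0x3C8B_FBB3, 0x3193_C185, 0x2032_3ED0, 0x2955_49F5];

/// Toy forward-only rolling hasher, constructed once for a fixed `k`.
struct RollingHasher {
    k: usize,
    /// `BASE_CONSTANTS[c]` rotated left by `k`, precomputed at construction.
    removal: [u32; 4],
}

impl RollingHasher {
    fn new(k: usize) -> Self {
        assert!(k > 0, "k must be positive");
        // This table is the k-dependent state that makes reuse worthwhile.
        let removal = BASE_CONSTANTS.map(|c| c.rotate_left(k as u32));
        Self { k, removal }
    }

    /// Hash every k-mer of `seq`, whose elements are 2-bit codes in 0..4.
    fn hash_kmers(&self, seq: &[u8]) -> Vec<u32> {
        if seq.len() < self.k {
            return Vec::new();
        }
        // Hash of the first window, computed character by character.
        let mut h = 0u32;
        for &c in &seq[..self.k] {
            h = h.rotate_left(1) ^ BASE_CONSTANTS[c as usize];
        }
        let mut out = vec![h];
        // Slide the window: rotate, drop the leaving base, add the entering one.
        for i in self.k..seq.len() {
            let leaving = seq[i - self.k] as usize;
            let entering = seq[i] as usize;
            h = h.rotate_left(1) ^ self.removal[leaving] ^ BASE_CONSTANTS[entering];
            out.push(h);
        }
        out
    }
}
```

A single `RollingHasher::new(k)` can then be shared across many calls to `hash_kmers`, so the `removal` table is computed only once.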

This crate also includes AntiLexHasher; see this blogpost.
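The struct listing describes AntiLexHasher as comparing strings reverse-lexicographically, with the last (most significant) character inverted. The sketch below shows one literal reading of that description and may differ from what the crate actually does. The helpers `code` and `anti_lex_value`, the encoding A=0, C=1, G=2, T=3, and the limit k <= 32 are all assumptions made for illustration.

```rust
/// Map an ASCII base to a 2-bit code (assumed encoding: A=0, C=1, G=2, T=3).
fn code(base: u8) -> u64 {
    match base {
        b'A' => 0,
        b'C' => 1,
        b'G' => 2,
        b'T' => 3,
        other => panic!("unsupported base: {}", other as char),
    }
}

/// Toy "anti-lexicographic" value of a k-mer with 1 <= k <= 32.
fn anti_lex_value(kmer: &[u8]) -> u64 {
    assert!(!kmer.is_empty() && kmer.len() <= 32);
    let mut value = 0u64;
    // Later characters land in more significant bits, so comparing values
    // compares the strings from the last character backwards.
    for (j, &b) in kmer.iter().enumerate() {
        value |= code(b) << (2 * j);
    }
    // Invert the last (most significant) character by flipping its two bits.
    value ^ (3u64 << (2 * (kmer.len() - 1)))
}
```

Under this reading, a k-mer ending in T sorts before one ending in A, while the remaining characters are still compared in the usual order from the back.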

§Typical usage

Construct a default NtHasher via let hasher = <NtHasher>::new(k). Then call either hasher.hash_kmers_simd(seq, context), or use the underlying ‘mapper’ via hasher.in_out_mapper_simd(seq).

use packed_seq::{AsciiSeqVec, PackedSeqVec, SeqVec};
use seq_hash::{KmerHasher, NtHasher};
let k = 3;

// Default `NtHasher` is canonical.
let hasher = <NtHasher>::new(k);
let kmer = PackedSeqVec::from_ascii(b"ACG");
let kmer_rc = PackedSeqVec::from_ascii(b"CGT");
// Normally, prefer `hash_kmers_simd` over `hash_seq`.
assert_eq!(
    hasher.hash_seq(kmer.as_slice()),
    hasher.hash_seq(kmer_rc.as_slice())
);

let fwd_hasher = NtHasher::<false>::new(k);
assert_ne!(
    fwd_hasher.hash_seq(kmer.as_slice()),
    fwd_hasher.hash_seq(kmer_rc.as_slice())
);

let seq = b"ACGGCAGCGCATATGTAGT";
let ascii_seq = AsciiSeqVec::from_ascii(seq);
let packed_seq = PackedSeqVec::from_ascii(seq);

// hasher.hash_kmers_scalar(seq.as_slice()); // Panics since `NtHasher` does not support ASCII.
let hashes_1: Vec<_> = hasher.hash_kmers_scalar(ascii_seq.as_slice()).collect();
let hashes_2: Vec<_> = hasher.hash_kmers_scalar(packed_seq.as_slice()).collect();
// Hashes are equal for [`packed_seq::AsciiSeq`] and [`packed_seq::PackedSeq`].
assert_eq!(hashes_1, hashes_2);
assert_eq!(hashes_1.len(), seq.len() - (k - 1));

// Consider a 'context' of a single kmer.
let hashes_3: Vec<_> = hasher.hash_kmers_simd(ascii_seq.as_slice(), 1).collect();
let hashes_4: Vec<_> = hasher.hash_kmers_simd(packed_seq.as_slice(), 1).collect();
assert_eq!(hashes_1, hashes_3);
assert_eq!(hashes_1, hashes_4);

Re-exports§

pub use packed_seq;

Structs§

AntiLexHasher
A hash function that compares strings reverse-lexicographically, with the last (most significant) character inverted.
MulHasher
MulHasher multiplies each character by a constant and XORs the products together under rotations.
NtHasher
u32 variant of NtHash.

Traits§

KmerHasher
A hasher that can hash all k-mers in a string.