strobemers 0.1.1

A toolkit for generating strobemers
Owner: mttmartin

Strobemers

A Rust crate for generating strobemers. Strobemers are a type of fuzzy seed, originally designed for bioinformatics, that tolerate substitutions and especially insertions and deletions. For more information, see the paper: Kristoffer Sahlin, Effective sequence similarity detection with strobemers, Genome Res. November 2021 31: 2080-2094.
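To make the idea concrete, here is a minimal, std-only sketch of order-2 minstrobes as described in the paper: the first strobe is the k-mer at position i, and the second is the k-mer with the smallest hash whose start lies in the window [i + w_min, i + w_max]. This is an illustration only, not this crate's implementation. The function names `fnv1a` and `minstrobes` are hypothetical, FNV-1a is an arbitrary stand-in hash, and positions whose window does not fit in the sequence are simply skipped.

```rust
/// FNV-1a, used here only as a simple deterministic stand-in hash (an assumption,
/// not the hash used by this crate or by the original implementation).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical order-2 minstrobe sketch. Returns (first, second) start positions.
/// The second strobe is the k-mer with the smallest hash starting in
/// `[i + w_min, i + w_max]`. Positions whose full window does not fit are skipped.
fn minstrobes(seq: &[u8], k: usize, w_min: usize, w_max: usize) -> Vec<(usize, usize)> {
    if k == 0 || w_min > w_max || seq.len() < k {
        return Vec::new();
    }
    // One hash per k-mer start position: seq.len() - k + 1 values.
    let hashes: Vec<u64> = seq.windows(k).map(fnv1a).collect();
    let mut out = Vec::new();
    for i in 0..hashes.len() {
        let (lo, hi) = (i + w_min, i + w_max);
        if hi >= hashes.len() {
            break; // window would run past the last k-mer
        }
        // The range is non-empty because w_min <= w_max; ties pick the first minimum.
        let j = (lo..=hi).min_by_key(|&j| hashes[j]).unwrap();
        out.push((i, j));
    }
    out
}

fn main() {
    let seq = b"ACGCGTACGAATCACGCCGGGTGTGTGTGATCG";
    for (first, second) in minstrobes(seq, 3, 4, 8) {
        println!("strobes start at {first} and {second}");
    }
}
```

Because the second strobe can start anywhere in the window, the gap between the two strobes varies from position to position, which is what separates strobemers from fixed-spacing seeds.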

This crate aims to provide a toolkit for reproducing existing strobemer implementations while allowing individual components (e.g. the hash function, window generator, or strobe selector) to be swapped out easily. The StrobeHasher implements the hash function used when generating strobemers, the WindowGenerator creates the windows in which strobes are selected, and the StrobeSelector selects a strobe within each window.
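The split into three components can be pictured with the following std-only sketch. The traits `KmerHasher`, `WindowMaker`, and `Selector`, and all types implementing them, are hypothetical names invented for illustration; they are not the crate's StrobeHasher, WindowGenerator, or StrobeSelector, whose real signatures may differ. The point is only that the selection rule can be replaced without touching the hashing or windowing code.

```rust
// Hypothetical traits for illustration; the crate's real trait signatures may differ.
trait KmerHasher {
    fn hash(&self, seq: &[u8], k: usize) -> Vec<u64>;
}
trait WindowMaker {
    /// Inclusive range of candidate start positions for the strobe following `prev`.
    fn window(&self, prev: usize, n_hashes: usize) -> Option<(usize, usize)>;
}
trait Selector {
    fn select(&self, hashes: &[u64], prev: usize, lo: usize, hi: usize) -> usize;
}

/// Stand-in hasher: FNV-1a over each k-mer (an arbitrary choice).
struct Fnv1a;
impl KmerHasher for Fnv1a {
    fn hash(&self, seq: &[u8], k: usize) -> Vec<u64> {
        if k == 0 {
            return Vec::new();
        }
        seq.windows(k)
            .map(|kmer| {
                kmer.iter().fold(0xcbf29ce484222325u64, |h, &b| {
                    (h ^ b as u64).wrapping_mul(0x100000001b3)
                })
            })
            .collect()
    }
}

/// Window at a fixed offset range from the previous strobe.
struct FixedWindow {
    w_min: usize,
    w_max: usize,
}
impl WindowMaker for FixedWindow {
    fn window(&self, prev: usize, n_hashes: usize) -> Option<(usize, usize)> {
        let (lo, hi) = (prev + self.w_min, prev + self.w_max);
        (lo <= hi && hi < n_hashes).then_some((lo, hi))
    }
}

/// Picks the smallest hash in the window, independent of the previous strobe.
struct MinHash;
impl Selector for MinHash {
    fn select(&self, hashes: &[u64], _prev: usize, lo: usize, hi: usize) -> usize {
        (lo..=hi).min_by_key(|&j| hashes[j]).unwrap_or(lo)
    }
}

/// Picks the candidate whose hash, XORed with the previous strobe's hash, is smallest.
/// A toy rule that depends on the previous strobe; not claimed to match any published method.
struct XorWithPrev;
impl Selector for XorWithPrev {
    fn select(&self, hashes: &[u64], prev: usize, lo: usize, hi: usize) -> usize {
        (lo..=hi).min_by_key(|&j| hashes[prev] ^ hashes[j]).unwrap_or(lo)
    }
}

/// Order-2 generation: hash once, then ask for a window and a selection per position.
fn strobemers(
    seq: &[u8],
    k: usize,
    hasher: &dyn KmerHasher,
    windows: &dyn WindowMaker,
    selector: &dyn Selector,
) -> Vec<(usize, usize)> {
    let hashes = hasher.hash(seq, k);
    let mut out = Vec::new();
    for i in 0..hashes.len() {
        let Some((lo, hi)) = windows.window(i, hashes.len()) else { break };
        out.push((i, selector.select(&hashes, i, lo, hi)));
    }
    out
}

fn main() {
    let seq = b"ACGCGTACGAATCACGCCGGGTGTGTGTGATCG";
    let win = FixedWindow { w_min: 4, w_max: 8 };
    // Swapping the selector changes the output; hasher and windows stay the same.
    println!("{:?}", strobemers(seq, 3, &Fnv1a, &win, &MinHash));
    println!("{:?}", strobemers(seq, 3, &Fnv1a, &win, &XorWithPrev));
}
```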

Currently, the only supported pre-made implementations are intended to produce strobemers identical to those of the original C++ implementation. The randstrobe implementation is RandstrobeSahlin2021 and the minstrobe implementation is MinstrobeSahlin2021.

Example using RandstrobeSahlin2021

use strobemers::StrobemerBuilder;
use strobemers::implementations::RandstrobeSahlin2021;
let reference = b"ACGCGTACGAATCACGCCGGGTGTGTGTGATCG";
let n: usize = 2;
let k: usize = 15;
let w_min: usize = 16;
let w_max: usize = 30;

let mut randstrobe_iter = StrobemerBuilder::from_implementation(RandstrobeSahlin2021)
    .reference(reference)
    .n(n)
    .k(k)
    .w_min(w_min)
    .build()
    .unwrap();
for strobe in randstrobe_iter {
    println!("randstrobe start positions: {:?}", strobe);
}

Example starting with RandstrobeSahlin2021 and replacing the hash function

use strobemers::StrobemerBuilder;
use strobemers::implementations::{RandstrobeSahlin2021, StrobeHasher};
use wyhash::wyhash;

let reference = b"ACGCGTACGAATCACGCCGGGTGTGTGTGATCG";
let n: usize = 2;
let k: usize = 15;
let w_min: usize = 16;
let w_max: usize = 30;

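// Custom hasher that hashes each k-mer with wyhash.
struct WyHasher;
impl StrobeHasher for WyHasher {
    fn hash(&self, input: &[u8], k: usize) -> Vec<u64> {
        // saturating_sub avoids integer underflow when the input is shorter than k.
        let n_hashes = input.len().saturating_sub(k);
        let mut input_hashes = Vec::with_capacity(n_hashes);
        for i in 0..n_hashes {
            // Hash the k-mer starting at position i with a fixed seed of 42.
            input_hashes.push(wyhash(&input[i..i + k], 42));
        }
        input_hashes
    }
}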

let mut randstrobe_iter = StrobemerBuilder::from_implementation(RandstrobeSahlin2021)
    .reference(reference)
    .n(n)
    .k(k)
    .w_min(w_min)
    .hasher(Box::new(WyHasher))
    .build()
    .unwrap();
for strobe in randstrobe_iter {
    println!("randstrobe start positions: {:?}", strobe);
}