Bit-packed and well-typed biological sequences
- seq: A Seq is a heap-allocated sequence of variable length that owns its own data. A SeqSlice is a read-only window into a Seq.
- kmer: Kmers are short fixed-length sequences. They generally implement Copy and can be passed efficiently on the stack.
- codec: Encodings of genomic data types to be packed into sequences.
- translation: Amino acid translation tables
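To make "bit-packed" and "passed on the stack" concrete, here is a minimal, std-only sketch of storing a short DNA k-mer in a single u64 at two bits per base. It is an illustration under stated assumptions, not bio-seq's implementation: the names encode_base, pack_kmer and unpack_kmer are hypothetical, and the bit codes (A=00, C=01, G=10, T=11) and the low-bits-first ordering are chosen only for this example.

```rust
// Hypothetical illustration only: not bio-seq's actual types or bit layout.

/// Map a nucleotide to an assumed 2-bit code (A=00, C=01, G=10, T=11).
fn encode_base(b: u8) -> Option<u64> {
    match b {
        b'A' => Some(0b00),
        b'C' => Some(0b01),
        b'G' => Some(0b10),
        b'T' => Some(0b11),
        _ => None,
    }
}

/// Pack exactly K bases (K <= 32) into one u64, first base in the lowest bits.
/// Returns None for a wrong length or a character outside A, C, G, T.
fn pack_kmer<const K: usize>(s: &str) -> Option<u64> {
    if s.len() != K || K > 32 {
        return None;
    }
    let mut bits = 0u64;
    for (i, b) in s.bytes().enumerate() {
        bits |= encode_base(b)? << (2 * i);
    }
    Some(bits)
}

/// Unpack a packed K-mer back into text.
fn unpack_kmer<const K: usize>(bits: u64) -> String {
    (0..K)
        .map(|i| match (bits >> (2 * i)) & 0b11 {
            0b00 => 'A',
            0b01 => 'C',
            0b10 => 'G',
            _ => 'T',
        })
        .collect()
}

fn main() {
    let packed = pack_kmer::<8>("ACGGATCG").expect("valid 8-mer");
    // A u64 is Copy, so the packed k-mer can be handed around by value.
    let copy = packed;
    assert_eq!(unpack_kmer::<8>(copy), "ACGGATCG");
    println!("{packed:#018b}");
}
```

Because the length K is a compile-time constant and the payload is a plain integer, a value like this needs no heap allocation, which is the property the kmer entry above describes.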
This crate is designed to facilitate common bioinformatics tasks, including amino acid translation, k-mer minimisation and hashing, and nucleotide sequence manipulation.
Add bio-seq to Cargo.toml:
[dependencies]
bio-seq = "0.12"
use bio_seq::prelude::*;
let seq = dna!("ATACGATCGATCGATCGATCCGT");
// iterate over the 8-mers of the reverse complement
for kmer in seq.revcomp().kmers::<8>() {
println!("{kmer}");
}
// ACGGATCG
// CGGATCGA
// GGATCGAT
// GATCGATC
// ATCGATCG
// ...

The 4-bit encoding of IUPAC nucleotide ambiguity codes naturally represents a set of bases for each position (0001: A, 1111: N, 0000: *, …):
use bio_seq::prelude::*;
let seq = iupac!("AGCTNNCAGTCGACGTATGTA");
let pattern = iupac!("AYG");
for slice in seq.windows(pattern.len()) {
if pattern.contains(slice) {
println!("{slice} matches pattern");
}
}
// ACG matches pattern
// ATG matches pattern

Logical or is the union:
assert_eq!(iupac!("AS-GYTNA") | iupac!("ANTGCAT-"), iupac!("ANTGYWNA"));

Logical and is the intersection of two IUPAC sequences:
assert_eq!(iupac!("ACGTSWKM") & iupac!("WKMSTNNA"), iupac!("A----WKA"));

Modules
- codec: Coding/Decoding trait for bit-packable enums representing biological alphabets
- kmer: Kmers
- seq: Arbitrary length sequences of bit-packed genomic data, stored on the heap.
- translation: Genetic Code Translation
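
The set interpretation of the 4-bit IUPAC encoding described earlier can be reproduced with plain integers. The following std-only sketch is not the crate's implementation: encode, decode, combine and is_subset are hypothetical names, and only A = 0001, N = 1111 and the empty set 0000 come from the text; the bits used here for C, G and T are an assumption. The subset test is likewise one plausible reading of pattern matching, not a statement of how the crate's contains is defined.

```rust
// Hypothetical sketch: each IUPAC code as a 4-bit set of bases.
// Assumed bit assignment (only A = 0001 is taken from the documentation).
const A: u8 = 0b0001;
const C: u8 = 0b0010;
const G: u8 = 0b0100;
const T: u8 = 0b1000;

/// Encode one IUPAC character as a set of bases; '-' is the empty set.
fn encode(c: char) -> Option<u8> {
    Some(match c {
        '-' => 0,
        'A' => A, 'C' => C, 'G' => G, 'T' => T,
        'R' => A | G, 'Y' => C | T, 'S' => C | G, 'W' => A | T,
        'K' => G | T, 'M' => A | C,
        'B' => C | G | T, 'D' => A | G | T, 'H' => A | C | T, 'V' => A | C | G,
        'N' => A | C | G | T,
        _ => return None,
    })
}

/// Decode a 4-bit set back to its IUPAC character (table indexed by the bits).
fn decode(bits: u8) -> char {
    const TABLE: [char; 16] = [
        '-', 'A', 'C', 'M', 'G', 'R', 'S', 'V',
        'T', 'W', 'Y', 'H', 'K', 'D', 'B', 'N',
    ];
    TABLE[(bits & 0b1111) as usize]
}

/// Apply a bitwise operation position by position to two equal-length strings.
fn combine(x: &str, y: &str, op: fn(u8, u8) -> u8) -> Option<String> {
    if x.len() != y.len() {
        return None;
    }
    x.chars()
        .zip(y.chars())
        .map(|(a, b)| Some(decode(op(encode(a)?, encode(b)?))))
        .collect()
}

/// True if, at every position, the window's base set lies within the pattern's.
fn is_subset(pattern: &str, window: &str) -> Option<bool> {
    if pattern.len() != window.len() {
        return None;
    }
    for (p, w) in pattern.chars().zip(window.chars()) {
        let (p, w) = (encode(p)?, encode(w)?);
        if w & p != w {
            return Some(false);
        }
    }
    Some(true)
}

fn main() {
    // Bitwise OR is set union, bitwise AND is set intersection.
    let union = combine("AS-GYTNA", "ANTGCAT-", |p, q| p | q);
    let inter = combine("ACGTSWKM", "WKMSTNNA", |p, q| p & q);
    assert_eq!(union.as_deref(), Some("ANTGYWNA"));
    assert_eq!(inter.as_deref(), Some("A----WKA"));
    assert_eq!(is_subset("AYG", "ACG"), Some(true));
    assert_eq!(is_subset("AYG", "AGG"), Some(false));
}
```

With this representation, the union and intersection examples from the documentation fall out of a single OR or AND per position, which is why a one-bit-per-base layout suits ambiguity codes.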