nucs
This is a personal experiment with an API for working with nucleotide and amino acid sequences. Its design is based heavily on my experience using and helping maintain https://github.com/SecureDNA/quickdna. My goals were to design an API that...
- ...is solely focused on Rust.
- ...integrates with the Rust `std` library (e.g. representing codons as `[Nuc; 3]` allows the `std` library to understand that codons can be cheaply flattened into nucleotides).
- ...is (largely) collection-agnostic.
- ...tries to be consistent with Zipf's law of abbreviation when naming things.
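To illustrate the `std` integration goal, here is a minimal sketch that does not use the crate itself. `Nuc` below is a hypothetical stand-in enum and `flatten_codons` is an illustrative helper, not part of nucs. Because a codon is a plain `[Nuc; 3]`, the standard library's `as_flattened` (stable since Rust 1.80) can view a slice of codons as a slice of nucleotides without copying:

```rust
/// Hypothetical stand-in for a concrete nucleotide type (not the crate's definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Nuc {
    A,
    C,
    G,
    T,
}

/// View a slice of codons as a flat slice of nucleotides.
/// No allocation or copying happens: the returned slice borrows the same memory.
pub fn flatten_codons(codons: &[[Nuc; 3]]) -> &[Nuc] {
    codons.as_flattened()
}
```

Calling `flatten_codons` on two codons yields a six-element slice in the same order as the underlying nucleotides.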
use ;
let mut dna: Dna = "ACACACATATCTTACGCTTAGGAAATCTGACCCGAACCAACCATTGATGAG".parse().unwrap();
let codons = dna.as_codons_mut;
// Selects this: v---------------------------------------------------------v
// ACAC ACA TAT CTT ACG CTT AGG AAA TCT GAC CCG AAC CAA CCA TTG ATG AG
codons.revcomp;
// Reverse complements this: v-----------------v
// ACAC ACA TAT CTT ACG CTT AGG AAA TCT GAC CCG AAC CAA CCA TTG ATG AG
// Changing it to: AGA TTT CCT AAG CGT
let peptide: Peptide = dna.translate.collect;
assert_eq!;
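For readers unfamiliar with the operation, an in-place reverse complement over a run of codons can be sketched with the standard library alone. This is an assumption-laden illustration, not the crate's implementation: `Nuc`, `complement`, and `revcomp_codons` are hypothetical names introduced here.

```rust
/// Hypothetical stand-in for a concrete nucleotide type (not the crate's definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Nuc {
    A,
    C,
    G,
    T,
}

impl Nuc {
    /// Watson-Crick complement: A pairs with T, C pairs with G.
    pub fn complement(self) -> Nuc {
        match self {
            Nuc::A => Nuc::T,
            Nuc::C => Nuc::G,
            Nuc::G => Nuc::C,
            Nuc::T => Nuc::A,
        }
    }
}

/// Reverse complement a run of codons in place.
/// The codons are viewed as one flat nucleotide slice (Rust 1.80+),
/// reversed, and then each nucleotide is complemented.
pub fn revcomp_codons(codons: &mut [[Nuc; 3]]) {
    let nucs = codons.as_flattened_mut();
    nucs.reverse();
    for nuc in nucs.iter_mut() {
        *nuc = nuc.complement();
    }
}
```

Applied to the five codons `ACG CTT AGG AAA TCT`, this sketch produces `AGA TTT CCT AAG CGT`, matching the comment in the example above.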
Non-Vec containers are supported too, and it's possible to work with DNA non-destructively
via iterators:
use std::collections::VecDeque;
use ;
let mut dna: = "ACTCTATCACCTACTCAGAGCGCTCCACCGCGCGTGT".parse.unwrap;
// Prepend things to the `VecDeque`; it's no longer stored contiguously.
for _ in 0..4
let immutable_dna = dna;
// Apply reverse complement and NCBI1 non-destructively.
let peptide: Peptide = immutable_dna
.iter
.revcomped
.translate
.collect;
assert_eq!;
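The non-destructive, iterator-based approach can be sketched without the crate. Everything named here (`Nuc`, `STANDARD_TABLE`, `translate`, `revcomp_translate`) is hypothetical and only illustrates the idea: the reverse complement is computed lazily while iterating, so the `VecDeque` is never modified and does not need to be contiguous. The table is the standard genetic code (NCBI translation table 1) in the conventional T, C, A, G codon order.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a concrete nucleotide type (not the crate's definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Nuc {
    A,
    C,
    G,
    T,
}

impl Nuc {
    /// Watson-Crick complement.
    pub fn complement(self) -> Nuc {
        match self {
            Nuc::A => Nuc::T,
            Nuc::C => Nuc::G,
            Nuc::G => Nuc::C,
            Nuc::T => Nuc::A,
        }
    }

    /// Position in the T, C, A, G ordering used by the table below.
    fn tcag_index(self) -> usize {
        match self {
            Nuc::T => 0,
            Nuc::C => 1,
            Nuc::A => 2,
            Nuc::G => 3,
        }
    }
}

/// Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; `*` is a stop.
const STANDARD_TABLE: &[u8; 64] =
    b"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/// Translate a stream of nucleotides, three at a time.
/// A trailing partial codon (one or two nucleotides) is ignored.
pub fn translate(nucs: impl IntoIterator<Item = Nuc>) -> String {
    let mut nucs = nucs.into_iter();
    let mut peptide = String::new();
    while let (Some(a), Some(b), Some(c)) = (nucs.next(), nucs.next(), nucs.next()) {
        let index = a.tcag_index() * 16 + b.tcag_index() * 4 + c.tcag_index();
        peptide.push(STANDARD_TABLE[index] as char);
    }
    peptide
}

/// Translate the reverse complement of `dna` without mutating or copying it.
pub fn revcomp_translate(dna: &VecDeque<Nuc>) -> String {
    translate(dna.iter().rev().map(|nuc| nuc.complement()))
}
```

Returning a `String` keeps the sketch short; a dedicated amino acid type would be the more faithful analogue of the `Peptide` used above.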
Ambiguous nucleotides and amino acids are supported:
use ;
use ;
// `lit` returns an array without allocating
let mut dna = lit;
// Because `dna` contains ambiguous nucleotides, translating it produces an ambiguous peptide
let peptide: AmbiPeptide = dna.translate.collect;
assert_eq!;
dna |= A | C;
dna |= A;
dna |= A;
assert_eq!;
let peptide: AmbiPeptide = dna.translate.collect;
assert_eq!;
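One common way to model ambiguous nucleotides, and a plausible reading of the `|=` operations above, is a 4-bit set with one bit per concrete base. The sketch below is an assumption about the general technique, not a description of how nucs stores ambiguity; `AmbiNuc` here is a hypothetical type defined only for this example. The characters are the standard IUPAC ambiguity codes.

```rust
use std::ops::{BitOr, BitOrAssign};

/// Hypothetical ambiguous nucleotide: a non-empty set of bases stored as 4 bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AmbiNuc(u8);

impl AmbiNuc {
    pub const A: AmbiNuc = AmbiNuc(0b0001);
    pub const C: AmbiNuc = AmbiNuc(0b0010);
    pub const G: AmbiNuc = AmbiNuc(0b0100);
    pub const T: AmbiNuc = AmbiNuc(0b1000);

    /// IUPAC code for this set of bases, indexed by the bit pattern.
    /// Index 0 (the empty set) cannot be built from the constants above;
    /// `-` is only a placeholder for it.
    pub fn to_char(self) -> char {
        const CODES: &[u8; 16] = b"-ACMGRSVTWYHKDBN";
        CODES[usize::from(self.0 & 0b1111)] as char
    }
}

impl BitOr for AmbiNuc {
    type Output = AmbiNuc;

    /// Union of the two sets of possible bases.
    fn bitor(self, rhs: AmbiNuc) -> AmbiNuc {
        AmbiNuc(self.0 | rhs.0)
    }
}

impl BitOrAssign for AmbiNuc {
    fn bitor_assign(&mut self, rhs: AmbiNuc) {
        self.0 |= rhs.0;
    }
}
```

With this layout, widening a `G` by `A | C` gives the set {A, C, G}, whose IUPAC code is `V`.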
Planned functionality
- Packing
- FASTA parsing
- `serde` integration
- Expansion of ambiguous k-mers into concrete k-mers
- Base canonicalization
- Unsafe casts for `Vec` and `Arc`
- Better efficiency
Incompatibility with quickdna
Note that while nucs is heavily inspired by https://github.com/SecureDNA/quickdna,
there are subtle-yet-important incompatibilities with the order and representation of
nucleotides and amino acids. In particular, nucleotides are ordered alphabetically
in nucs, both to keep the ordering identical to that of strings and (hopefully) to make
future bit-packing work easier.
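To show why alphabetical ordering is convenient, here is a sketch under stated assumptions: `Nuc`, `pack`, and `unpack` are hypothetical and do not reflect the crate's actual or planned representation. With discriminants assigned alphabetically, the derived `Ord` agrees with string ordering, and a 2-bit packing that places the first nucleotide in the highest bits makes integer comparison of equal-length k-mers agree with it too.

```rust
/// Hypothetical nucleotide type with alphabetical discriminants
/// (not the crate's definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
#[repr(u8)]
pub enum Nuc {
    A = 0,
    C = 1,
    G = 2,
    T = 3,
}

impl Nuc {
    /// Decode the low two bits of `bits` back into a nucleotide.
    fn from_bits(bits: u8) -> Nuc {
        match bits & 0b11 {
            0 => Nuc::A,
            1 => Nuc::C,
            2 => Nuc::G,
            _ => Nuc::T,
        }
    }
}

/// Pack up to 32 nucleotides into a `u64`, two bits each.
/// The first nucleotide ends up in the most significant used bits.
/// Returns `None` if the input is too long to fit.
pub fn pack(nucs: &[Nuc]) -> Option<u64> {
    if nucs.len() > 32 {
        return None;
    }
    Some(nucs.iter().fold(0u64, |acc, &nuc| (acc << 2) | nuc as u64))
}

/// Unpack `len` nucleotides from a value produced by `pack`.
/// Panics if `len` exceeds 32, the most a `u64` can hold.
pub fn unpack(packed: u64, len: usize) -> Vec<Nuc> {
    assert!(len <= 32, "a u64 holds at most 32 nucleotides");
    (0..len)
        .rev()
        .map(|i| Nuc::from_bits((packed >> (2 * i)) as u8))
        .collect()
}
```

The length has to be tracked separately in this sketch, since leading `A`s pack to zero bits and are otherwise indistinguishable from unused space.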
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.