Encoding/decoding trait for bit-packable enums representing sets of genomic symbols.
The dna, iupac, text, and amino alphabets are built in.
This trait defines the translation between the UTF-8 representation of an alphabet and its compact bit-packed encoding.
The BITS associated constant stores the number of bits used to represent each symbol.
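BITS determines how densely a sequence can be stored. The standalone sketch below makes that concrete using the per-symbol widths listed for the built-in modules (dna 2, iupac 4, amino 6, text 8). Its helpers symbols_per_word and unused_bits are hypothetical and not part of bio_seq. The sketch also assumes, purely for illustration, that symbols are never split across 64-bit word boundaries. That is not a claim about how bio_seq actually lays out its storage.

```rust
/// Hypothetical helper: number of whole symbols of width `bits` that fit in one 64-bit word.
fn symbols_per_word(bits: u8) -> u32 {
    64 / u32::from(bits)
}

/// Hypothetical helper: bits left over in a 64-bit word when only whole symbols are stored.
fn unused_bits(bits: u8) -> u32 {
    64 % u32::from(bits)
}

fn main() {
    // Per-symbol widths as listed for the built-in codec modules.
    for (name, bits) in [("dna", 2u8), ("iupac", 4), ("amino", 6), ("text", 8)] {
        println!(
            "{name}: {} symbols per word, {} bits unused",
            symbols_per_word(bits),
            unused_bits(bits)
        );
    }
}
```

Under this assumption, only the 6-bit amino acid width leaves bits unused in each word, because 64 is not a multiple of 6.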
```rust
use bio_seq::prelude::{Dna, Codec};
use bio_seq::codec::text;

// The packed DNA codec uses 2 bits per base; the text codec uses a full byte.
assert_eq!(Dna::BITS, 2);
assert_eq!(text::Dna::BITS, 8);
```

Deriving custom Codecs
Custom encodings can be defined on enums by deriving the Codec trait.

```rust
use bio_seq::prelude::Codec;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, Codec)]
pub enum Dna {
    A = 0b00,
    C = 0b01,
    G = 0b10,
    T = 0b11,
}
```

Implementing custom Codecs
Custom encodings can be defined on enums by implementing the Codec trait.
```rust
use bio_seq::prelude::Codec;

// A 2-bit DNA alphabet; each discriminant is the base's bit pattern.
#[derive(Copy, Clone, Eq, PartialEq, Hash, Debug)]
pub enum Dna {
    A = 0b00,
    C = 0b01,
    G = 0b10,
    T = 0b11,
}

// Conversion from a symbol to its raw bit pattern.
impl From<Dna> for u8 {
    fn from(base: Dna) -> u8 {
        match base {
            Dna::A => 0b00,
            Dna::C => 0b01,
            Dna::G => 0b10,
            Dna::T => 0b11,
        }
    }
}

impl Codec for Dna {
    // Each symbol occupies 2 bits.
    const BITS: u8 = 2;

    // Decode a bit pattern, panicking if it is not a valid symbol.
    fn unsafe_from_bits(bits: u8) -> Self {
        if let Some(base) = Self::try_from_bits(bits) {
            base
        } else {
            panic!("Unrecognised bit pattern!")
        }
    }

    // Decode a bit pattern, returning None if it is not a valid symbol.
    fn try_from_bits(bits: u8) -> Option<Self> {
        match bits {
            0b00 => Some(Dna::A),
            0b01 => Some(Dna::C),
            0b10 => Some(Dna::G),
            0b11 => Some(Dna::T),
            _ => None,
        }
    }

    // Decode an ASCII byte, panicking if it is not a valid symbol.
    fn unsafe_from_ascii(chr: u8) -> Self {
        if let Some(base) = Self::try_from_ascii(chr) {
            base
        } else {
            panic!("Unrecognised ASCII character!")
        }
    }

    // Decode an uppercase ASCII byte; any other byte yields None.
    fn try_from_ascii(chr: u8) -> Option<Self> {
        match chr {
            b'A' => Some(Dna::A),
            b'C' => Some(Dna::C),
            b'G' => Some(Dna::G),
            b'T' => Some(Dna::T),
            _ => None,
        }
    }

    // Render the symbol as a character.
    fn to_char(self) -> char {
        match self {
            Dna::A => 'A',
            Dna::C => 'C',
            Dna::G => 'G',
            Dna::T => 'T',
        }
    }

    // The enum discriminant doubles as the bit pattern.
    fn to_bits(self) -> u8 {
        self as u8
    }

    // Enumerate every symbol in the alphabet.
    fn items() -> impl Iterator<Item = Self> {
        vec![Dna::A, Dna::C, Dna::G, Dna::T].into_iter()
    }
}
```
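The implementation above maps each base to a 2-bit pattern (A: 00, C: 01, G: 10, T: 11). The standalone sketch below shows how those patterns let up to 32 bases share one u64. It is a minimal illustration, not bio_seq's API: pack, unpack, base_to_bits and bits_to_base are hypothetical helpers. The choice to store the first base in the lowest two bits is also an assumption, and bio_seq's internal bit order may differ.

```rust
/// Hypothetical helper: 2-bit code for an uppercase ASCII base (A=00, C=01, G=10, T=11).
fn base_to_bits(b: u8) -> Option<u64> {
    match b {
        b'A' => Some(0b00),
        b'C' => Some(0b01),
        b'G' => Some(0b10),
        b'T' => Some(0b11),
        _ => None,
    }
}

/// Hypothetical helper: decode the lowest two bits of `bits` back to a base.
fn bits_to_base(bits: u64) -> char {
    match bits & 0b11 {
        0b00 => 'A',
        0b01 => 'C',
        0b10 => 'G',
        _ => 'T',
    }
}

/// Packs up to 32 bases into a u64, first base in the lowest two bits.
/// Returns None if the sequence is too long or contains a byte other than A, C, G, T.
fn pack(seq: &[u8]) -> Option<u64> {
    if seq.len() > 32 {
        return None;
    }
    let mut word = 0u64;
    for (i, &b) in seq.iter().enumerate() {
        word |= base_to_bits(b)? << (2 * i);
    }
    Some(word)
}

/// Unpacks the first `len` bases (len <= 32) from a word produced by `pack`.
fn unpack(word: u64, len: usize) -> String {
    (0..len).map(|i| bits_to_base(word >> (2 * i))).collect()
}

fn main() {
    let word = pack(b"GATTACA").expect("only A, C, G, T are accepted");
    println!("packed: {word:#b}");
    println!("unpacked: {}", unpack(word, 7));
}
```

Because the sequence length is not recorded in the word itself, the caller must keep it alongside the packed data. Otherwise trailing A bases (00) could not be distinguished from unused bits.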
Modules
- amino
- 6-bit representation of amino acids
- degenerate
- Experimental encodings for degenerate representations (e.g. 1-bit)
- dna
- 2-bit DNA representation: A: 00, C: 01, G: 10, T: 11
- iupac
- 4-bit IUPAC nucleotide ambiguity codes
- masked
- Experimental encodings with maskable bases
- text
- 8-bit ASCII representation of nucleotides
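Four bits are enough for IUPAC ambiguity codes because each code stands for a non-empty subset of {A, C, G, T}, and there are exactly 15 such subsets. The standalone sketch below assumes a one-hot layout (A=0001, C=0010, G=0100, T=1000). Under that layout, set operations such as "could these two codes match the same base?" become a single bitwise AND. The layout and the names iupac_to_set and compatible are hypothetical. The actual bit assignments in the iupac module may differ.

```rust
// Assumed one-hot bit assignment for illustration only.
const A: u8 = 0b0001;
const C: u8 = 0b0010;
const G: u8 = 0b0100;
const T: u8 = 0b1000;

/// Hypothetical helper: maps an IUPAC nucleotide code to the set of bases it stands for.
fn iupac_to_set(code: u8) -> Option<u8> {
    Some(match code {
        b'A' => A,
        b'C' => C,
        b'G' => G,
        b'T' => T,
        b'R' => A | G, // purine
        b'Y' => C | T, // pyrimidine
        b'S' => G | C,
        b'W' => A | T,
        b'K' => G | T,
        b'M' => A | C,
        b'B' => C | G | T, // not A
        b'D' => A | G | T, // not C
        b'H' => A | C | T, // not G
        b'V' => A | C | G, // not T
        b'N' => A | C | G | T,
        _ => return None,
    })
}

/// Hypothetical helper: true if two codes share at least one possible base.
fn compatible(x: u8, y: u8) -> bool {
    x & y != 0
}

fn main() {
    let r = iupac_to_set(b'R').unwrap();
    let y = iupac_to_set(b'Y').unwrap();
    let n = iupac_to_set(b'N').unwrap();
    // Purines (A/G) and pyrimidines (C/T) share no base; every code overlaps N.
    println!("R~Y: {}, R~N: {}", compatible(r, y), compatible(r, n));
}
```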
Traits
- Codec
- The binary encoding of an alphabet’s symbols can be represented with any type. Encoding from ASCII bytes and decoding the representation is implemented through the Codec trait.
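Because the ASCII conversions live on a trait, a parser can be written once and reused for any alphabet. The standalone sketch below illustrates this. The AsciiCodec trait, the decode/encode helpers and the local Dna enum are all hypothetical; they mirror only the try_from_ascii and to_char method names and are not bio_seq's API. The decoder reports the index of the first byte the alphabet rejects.

```rust
/// Hypothetical stand-in for the ASCII half of a codec trait (not bio_seq's Codec).
trait AsciiCodec: Sized + Copy {
    fn try_from_ascii(chr: u8) -> Option<Self>;
    fn to_char(self) -> char;
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Dna {
    A,
    C,
    G,
    T,
}

impl AsciiCodec for Dna {
    // Only uppercase A, C, G, T are accepted.
    fn try_from_ascii(chr: u8) -> Option<Self> {
        match chr {
            b'A' => Some(Dna::A),
            b'C' => Some(Dna::C),
            b'G' => Some(Dna::G),
            b'T' => Some(Dna::T),
            _ => None,
        }
    }

    fn to_char(self) -> char {
        match self {
            Dna::A => 'A',
            Dna::C => 'C',
            Dna::G => 'G',
            Dna::T => 'T',
        }
    }
}

/// Decodes ASCII bytes into symbols, stopping at the first invalid byte and returning its index.
fn decode<C: AsciiCodec>(bytes: &[u8]) -> Result<Vec<C>, usize> {
    bytes
        .iter()
        .enumerate()
        .map(|(i, &b)| C::try_from_ascii(b).ok_or(i))
        .collect()
}

/// Encodes symbols back into a String.
fn encode<C: AsciiCodec>(symbols: &[C]) -> String {
    symbols.iter().map(|&s| s.to_char()).collect()
}

fn main() {
    let seq: Vec<Dna> = decode(b"ACGT").expect("valid DNA");
    println!("round trip: {}", encode(&seq));
    println!("invalid input: {:?}", decode::<Dna>(b"ACNT"));
}
```

Collecting an iterator of Result values into Result<Vec<_>, _> short-circuits on the first error, so no partial sequence is built when the input is invalid.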