Module codec

Coding/Decoding trait for bit-packable enums representing sets of genomic symbols

The dna, iupac, text, and amino alphabets are built in.

This trait implements the translation between the text (ASCII) representation of an alphabet's symbols and their bit-packed encoding. The associated constant BITS gives the number of bits used to represent each symbol.

use bio_seq::prelude::{Dna, Codec};
use bio_seq::codec::text;
assert_eq!(Dna::BITS, 2);
assert_eq!(text::Dna::BITS, 8);
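
To make the role of a 2-bit width concrete, here is a minimal standalone sketch of packing bases into a single machine word. It does not use bio_seq and is not the crate's internal implementation: the names `encode_base`, `pack_2bit` and `unpack_2bit` are hypothetical, and placing the first base in the lowest bits is an assumption of this sketch.

```rust
/// Bits per symbol in this sketch (the same width as `Dna::BITS == 2`).
const BITS: u32 = 2;

/// Map an ASCII base to its 2-bit code (A: 00, C: 01, G: 10, T: 11).
fn encode_base(chr: u8) -> Option<u64> {
    match chr {
        b'A' => Some(0b00),
        b'C' => Some(0b01),
        b'G' => Some(0b10),
        b'T' => Some(0b11),
        _ => None,
    }
}

/// Pack up to 32 bases into one u64, first base in the lowest bits.
/// Returns None for unknown characters or inputs longer than 32 bases.
fn pack_2bit(seq: &[u8]) -> Option<u64> {
    if seq.len() > (u64::BITS / BITS) as usize {
        return None;
    }
    let mut word = 0u64;
    for (i, &chr) in seq.iter().enumerate() {
        word |= encode_base(chr)? << (i as u32 * BITS);
    }
    Some(word)
}

/// Unpack `len` bases (capped at 32) from a word produced by `pack_2bit`.
fn unpack_2bit(word: u64, len: usize) -> String {
    (0..len.min(32))
        .map(|i| match (word >> (i as u32 * BITS)) & 0b11 {
            0b00 => 'A',
            0b01 => 'C',
            0b10 => 'G',
            _ => 'T',
        })
        .collect()
}
```

With this layout, four bases occupy a single byte, whereas an 8-bit text encoding of the same sequence needs four bytes.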

§Deriving custom Codecs

Custom encodings can be easily defined on enums using the derivable Codec trait.

use bio_seq::prelude;
use bio_seq::prelude::Codec;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, Codec)]
pub enum Dna {
    A = 0b00,
    C = 0b01,
    G = 0b10,
    T = 0b11,
}

§Implementing custom Codecs

Custom encodings can also be defined on enums by implementing the Codec trait manually.

use bio_seq::prelude;
use bio_seq::prelude::Codec;

#[derive(Copy, Clone, Eq, PartialEq, Hash, Debug)]
pub enum Dna {
    A = 0b00,
    C = 0b01,
    G = 0b10,
    T = 0b11,
}

impl From<Dna> for u8 {
    fn from(base: Dna) -> u8 {
        match base {
            Dna::A => 0b00,
            Dna::C => 0b01,
            Dna::G => 0b10,
            Dna::T => 0b11,
        }
    }
}

impl Codec for Dna {
    const BITS: u8 = 2;

    fn unsafe_from_bits(bits: u8) -> Self {
        if let Some(base) = Self::try_from_bits(bits) {
            base
        } else {
            panic!("Unrecognised bit pattern!")
        }
    }

    fn try_from_bits(bits: u8) -> Option<Self> {
        match bits {
            0b00 => Some(Dna::A),
            0b01 => Some(Dna::C),
            0b10 => Some(Dna::G),
            0b11 => Some(Dna::T),
            _ => None,
        }
    }

    fn unsafe_from_ascii(chr: u8) -> Self {
        if let Some(base) = Self::try_from_ascii(chr) {
            base
        } else {
            panic!("Unrecognised ASCII character!")
        }
    }

    fn try_from_ascii(chr: u8) -> Option<Self> {
        match chr {
            b'A' => Some(Dna::A),
            b'C' => Some(Dna::C),
            b'G' => Some(Dna::G),
            b'T' => Some(Dna::T),
            _ => None,
        }
    }

    fn to_char(self) -> char {
        match self {
            Dna::A => 'A',
            Dna::C => 'C',
            Dna::G => 'G',
            Dna::T => 'T',
        }
    }

    fn to_bits(self) -> u8 {
        self as u8
    }

    fn items() -> impl Iterator<Item = Self> {
        [Dna::A, Dna::C, Dna::G, Dna::T].into_iter()
    }
}
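
A hand-written implementation like the one above is easy to get subtly wrong, so it can help to check that every symbol survives a round trip. The following is a minimal sketch under stated assumptions: `MiniCodec` is a hypothetical local stand-in that mirrors a subset of the Codec methods shown above (with `items` returning a `Vec` for simplicity), so that the example compiles without the bio_seq crate. `round_trips` is likewise a hypothetical helper, not part of the library.

```rust
/// Hypothetical stand-in mirroring a subset of the `Codec` methods,
/// so that this sketch does not depend on the bio_seq crate.
trait MiniCodec: Copy + PartialEq {
    const BITS: u8;
    fn try_from_bits(bits: u8) -> Option<Self>;
    fn try_from_ascii(chr: u8) -> Option<Self>;
    fn to_char(self) -> char;
    fn to_bits(self) -> u8;
    fn items() -> Vec<Self>;
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Dna {
    A = 0b00,
    C = 0b01,
    G = 0b10,
    T = 0b11,
}

impl MiniCodec for Dna {
    const BITS: u8 = 2;

    fn try_from_bits(bits: u8) -> Option<Self> {
        Self::items().into_iter().find(|sym| sym.to_bits() == bits)
    }

    fn try_from_ascii(chr: u8) -> Option<Self> {
        Self::items().into_iter().find(|sym| sym.to_char() as u8 == chr)
    }

    fn to_char(self) -> char {
        match self {
            Dna::A => 'A',
            Dna::C => 'C',
            Dna::G => 'G',
            Dna::T => 'T',
        }
    }

    fn to_bits(self) -> u8 {
        self as u8
    }

    fn items() -> Vec<Self> {
        vec![Dna::A, Dna::C, Dna::G, Dna::T]
    }
}

/// True if every symbol fits in `BITS` bits and survives both the
/// bits round trip and the ASCII round trip.
fn round_trips<C: MiniCodec>() -> bool {
    C::items().into_iter().all(|sym| {
        let bits = sym.to_bits();
        let fits = u32::from(bits) < (1u32 << C::BITS);
        let via_bits = C::try_from_bits(bits) == Some(sym);
        let via_ascii = C::try_from_ascii(sym.to_char() as u8) == Some(sym);
        fits && via_bits && via_ascii
    })
}
```

The same generic check could be written against the real Codec trait in a unit test, since the methods it relies on appear in the implementation above.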

Modules§

amino
6-bit representation of amino acids
degenerate
Experimental encodings for degenerate representations (e.g. 1-bit)
dna
2-bit DNA representation: A: 00, C: 01, G: 10, T: 11
iupac
4-bit IUPAC nucleotide ambiguity codes
masked
Experimental encodings with maskable bases
text
8-bit ASCII representation of nucleotides
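
The bit widths listed above translate directly into storage cost. As a rough sketch of the arithmetic only (the function `packed_bytes` is hypothetical, and the crate's actual storage may round up to whole machine words rather than bytes):

```rust
/// Bytes needed to hold `len` symbols at `bits` bits per symbol,
/// rounded up to a whole byte.
fn packed_bytes(len: usize, bits: u8) -> usize {
    let total_bits = len * bits as usize;
    (total_bits + 7) / 8
}
```

For a sequence of 1000 symbols this gives 250 bytes at 2 bits (dna), 500 at 4 bits (iupac), 750 at 6 bits (amino) and 1000 at 8 bits (text).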

Traits§

Codec
The binary encoding of an alphabet’s symbols can be represented with any type. Encoding from ASCII bytes and decoding the representation is implemented through the Codec trait.

Derive Macros§

Codec