Module bio_seq::codec


Coding/Decoding trait for bit-packable enums representing biological alphabets

The dna, iupac, text, and amino alphabets are built in.

This trait implements the translation between the UTF-8 representation of an alphabet and its efficient bit-packing. The BITS associated constant stores the number of bits used by the representation.

use bio_seq::prelude::{Dna, Codec};
use bio_seq::codec::text;
assert_eq!(Dna::BITS, 2);
assert_eq!(text::Dna::BITS, 8);
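
To make the difference between a 2-bit and an 8-bit representation concrete, the following standalone sketch packs an ASCII DNA string into a single u64 using the A: 00, C: 01, G: 10, T: 11 layout. It uses only the standard library. The helper `pack_2bit` is hypothetical and not part of bio_seq, and placing the first base in the lowest bits is an assumption made for this example rather than a statement about the crate's layout.

```rust
/// Pack up to 32 ASCII nucleotides into a u64, two bits per base.
/// Hypothetical helper for illustration; not part of bio_seq.
fn pack_2bit(seq: &[u8]) -> Option<u64> {
    if seq.len() > 32 {
        return None; // 32 bases * 2 bits = 64 bits, the capacity of a u64
    }
    let mut word = 0u64;
    for (i, &base) in seq.iter().enumerate() {
        let bits: u64 = match base {
            b'A' => 0b00,
            b'C' => 0b01,
            b'G' => 0b10,
            b'T' => 0b11,
            _ => return None, // not a valid DNA character
        };
        // Assumption: the first base occupies the least significant bits.
        word |= bits << (2 * i);
    }
    Some(word)
}
```

With this layout four bases fit in one byte, whereas an 8-bit text encoding spends a full byte on each base.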

Deriving custom Codecs

Custom encodings can be defined on enums by deriving the Codec trait.

use bio_seq::prelude;
use bio_seq::prelude::Codec;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, Codec)]
pub enum Dna {
    A = 0b00,
    C = 0b01,
    G = 0b10,
    T = 0b11,
}
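
As a rough picture of the kind of conversions a derive like this saves writing by hand, here is a standalone sketch of the same 2-bit enum with hand-written conversions. It is not the code the macro generates. The method names (`try_from_ascii`, `try_from_bits`, `to_bits`, `to_char`) are chosen for illustration and may not match the trait's actual names or signatures.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
#[repr(u8)]
pub enum Dna {
    A = 0b00,
    C = 0b01,
    G = 0b10,
    T = 0b11,
}

// Hand-written, hypothetical conversions; not the derive's real output.
impl Dna {
    /// Decode an ASCII byte; fails for anything outside the alphabet.
    pub fn try_from_ascii(c: u8) -> Option<Self> {
        match c {
            b'A' => Some(Dna::A),
            b'C' => Some(Dna::C),
            b'G' => Some(Dna::G),
            b'T' => Some(Dna::T),
            _ => None,
        }
    }

    /// Decode a raw bit pattern; fails if it does not fit in two bits.
    pub fn try_from_bits(b: u8) -> Option<Self> {
        match b {
            0b00 => Some(Dna::A),
            0b01 => Some(Dna::C),
            0b10 => Some(Dna::G),
            0b11 => Some(Dna::T),
            _ => None,
        }
    }

    /// The enum discriminant is the bit encoding.
    pub fn to_bits(self) -> u8 {
        self as u8
    }

    /// Map back to the printable character.
    pub fn to_char(self) -> char {
        match self {
            Dna::A => 'A',
            Dna::C => 'C',
            Dna::G => 'G',
            Dna::T => 'T',
        }
    }
}
```

Both decoding directions return an Option because an arbitrary byte is not guaranteed to belong to the alphabet, while encoding a valid enum value cannot fail.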

Modules

  • amino: 6-bit representation of amino acids
  • dna: 2-bit DNA representation (A: 00, C: 01, G: 10, T: 11)
  • iupac: 4-bit IUPAC nucleotide ambiguity codes
  • text: 8-bit UTF-8/ASCII representation of nucleotides

Traits

  • Codec: The binary encodings of an alphabet’s characters are represented with u8s. Encoding from UTF-8 or from a raw u8 is always fallible, although it can often be assumed to succeed when the input is already known to be valid.
  • Complement: Nucleotide alphabets that can be complemented implement Complement
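
To illustrate what complementing a nucleotide alphabet involves, the sketch below defines a small stand-in trait and uses it to build a reverse complement. It uses only the standard library. The names `Base`, `Comp`, `comp`, and `revcomp` are hypothetical and do not describe the Complement trait's actual interface.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Base {
    A = 0b00,
    C = 0b01,
    G = 0b10,
    T = 0b11,
}

/// Hypothetical stand-in for a complement operation; not the bio_seq trait.
trait Comp {
    fn comp(self) -> Self;
}

impl Comp for Base {
    fn comp(self) -> Self {
        // Watson-Crick pairing: A pairs with T, C pairs with G.
        match self {
            Base::A => Base::T,
            Base::C => Base::G,
            Base::G => Base::C,
            Base::T => Base::A,
        }
    }
}

/// Reverse the sequence and complement each base.
fn revcomp(seq: &[Base]) -> Vec<Base> {
    seq.iter().rev().map(|&b| b.comp()).collect()
}
```

With the A: 00, C: 01, G: 10, T: 11 encoding used here, the complement of a base is also its bit pattern XORed with 0b11, which is one reason this particular 2-bit layout is convenient.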

Derive Macros