bitnuc

Function as_2bit

pub fn as_2bit(seq: &[u8]) -> Result<u64, NucleotideError>

Converts a nucleotide sequence into a 2-bit packed representation.

Each nucleotide is encoded using 2 bits:

  • A/a = 00
  • C/c = 01
  • G/g = 10
  • T/t = 11

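Bases are packed starting at the least significant bits: the first base occupies bits 0-1, the second bits 2-3, and so on. For example, “ACGT” becomes 0b11100100 (reading from the most significant bit down: T = 11, G = 10, C = 01, A = 00).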
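The layout described above can be sketched in a few lines of plain Rust. This is an illustration only, not bitnuc's implementation: `pack_sketch` is a hypothetical helper, and it returns `Option` instead of the crate's `Result<u64, NucleotideError>` to keep the example self-contained.

```rust
/// Hypothetical helper (not part of bitnuc): packs bases using the
/// 2-bit codes listed above, first base in the lowest bits.
/// Returns None for a non-ACGT byte or for more than 32 bases.
fn pack_sketch(seq: &[u8]) -> Option<u64> {
    if seq.len() > 32 {
        return None;
    }
    let mut packed = 0u64;
    for (i, &base) in seq.iter().enumerate() {
        let code: u64 = match base {
            b'A' | b'a' => 0b00,
            b'C' | b'c' => 0b01,
            b'G' | b'g' => 0b10,
            b'T' | b't' => 0b11,
            _ => return None,
        };
        // Base i occupies bits 2i and 2i + 1.
        packed |= code << (2 * i);
    }
    Some(packed)
}
```

With this sketch, `pack_sketch(b"ACGT")` yields `Some(0b11100100)`. Because A encodes as 00, `b"A"` and `b"AA"` both pack to 0 under this layout, so the packed value alone does not record the sequence length.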

§Arguments

  • seq - A byte slice containing ASCII nucleotides (A,C,G,T, case insensitive)

§Returns

Returns a u64 containing the packed representation.

§Errors

Returns NucleotideError::InvalidBase if the sequence contains any characters other than A,C,G,T (case insensitive).

Returns NucleotideError::SequenceTooLong if the input sequence is longer than 32 bases: a u64 holds 64 bits, which is enough for at most 32 bases at 2 bits each.
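The two error conditions can be illustrated with a stand-alone validation sketch. `SketchError` and `validate_sketch` are hypothetical names that only mirror the variant shapes used in the examples (the offending byte and the sequence length); the real `NucleotideError` may differ, and the order in which bitnuc performs the two checks is an assumption here.

```rust
/// Hypothetical stand-in for the crate's error type; only the two
/// variants mentioned above are modelled.
#[derive(Debug, PartialEq)]
enum SketchError {
    InvalidBase(u8),
    SequenceTooLong(usize),
}

/// Hypothetical validator: checks the length first (an assumption),
/// then reports the first byte that is not A, C, G, or T.
fn validate_sketch(seq: &[u8]) -> Result<(), SketchError> {
    // A u64 has 64 bits, so at most 64 / 2 = 32 bases fit.
    const MAX_BASES: usize = (u64::BITS / 2) as usize;
    if seq.len() > MAX_BASES {
        return Err(SketchError::SequenceTooLong(seq.len()));
    }
    for &base in seq {
        if !matches!(base.to_ascii_uppercase(), b'A' | b'C' | b'G' | b'T') {
            return Err(SketchError::InvalidBase(base));
        }
    }
    Ok(())
}
```

In this sketch a 32-base input passes, a 33-base input is rejected with its length, and `b"ACGN"` is rejected with the byte `b'N'`.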

§Examples

Basic packing:

use bitnuc::as_2bit;

let packed = as_2bit(b"ACGT")?;
assert_eq!(packed, 0b11100100);

Case insensitivity:

use bitnuc::as_2bit;

assert_eq!(as_2bit(b"acgt")?, as_2bit(b"ACGT")?);

Error handling:

use bitnuc::{as_2bit, NucleotideError};

// Invalid base
assert!(matches!(
    as_2bit(b"ACGN"),
    Err(NucleotideError::InvalidBase(b'N'))
));

// Sequence too long
let long_seq = vec![b'A'; 33];
assert!(matches!(
    as_2bit(&long_seq),
    Err(NucleotideError::SequenceTooLong(33))
));