pub fn as_2bit(seq: &[u8]) -> Result<u64, NucleotideError>
Converts a nucleotide sequence into a 2-bit packed representation.
Each nucleotide is encoded using 2 bits:
- A/a = 00
- C/c = 01
- G/g = 10
- T/t = 11
Bases are packed starting from the least significant bits: the first base occupies bits 0-1, the second bits 2-3, and so on. For example, “ACGT” becomes 0b11100100.
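The packing order described above can be sketched as follows. This is only an illustration of the documented bit layout, not the crate's actual implementation; `pack_2bit_sketch` is a hypothetical name, and it returns `Option<u64>` rather than the crate's `NucleotideError` to stay self-contained.

```rust
/// Hypothetical helper (not part of bitnuc): packs up to 32 ASCII bases
/// into a u64 using the layout documented above.
/// Returns None for an invalid base or an input longer than 32 bases.
fn pack_2bit_sketch(seq: &[u8]) -> Option<u64> {
    if seq.len() > 32 {
        return None; // 32 bases * 2 bits = 64 bits, the capacity of a u64
    }
    let mut packed = 0u64;
    for (i, &base) in seq.iter().enumerate() {
        let code: u64 = match base {
            b'A' | b'a' => 0b00,
            b'C' | b'c' => 0b01,
            b'G' | b'g' => 0b10,
            b'T' | b't' => 0b11,
            _ => return None, // anything other than A, C, G, T
        };
        // Base i lands in bits 2*i and 2*i + 1.
        packed |= code << (2 * i);
    }
    Some(packed)
}
```

With this layout, the last base of the input ends up in the highest occupied bits, which is why “ACGT” reads as T, G, C, A from left to right in the binary literal.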
§Arguments
seq - A byte slice containing ASCII nucleotides (A, C, G, T, case insensitive)
§Returns
Returns a u64 containing the packed representation.
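To show how the returned value can be read back, here is a minimal decoding sketch under the bit layout documented on this page. Both function names are hypothetical and not part of bitnuc. Note that, given this encoding, the u64 alone does not record the sequence length (unused high bits are indistinguishable from trailing `A`s), so the sketch assumes the caller supplies it.

```rust
/// Hypothetical helper (not part of bitnuc): returns the uppercase ASCII
/// base stored at position `i` (0-based) of a packed value.
fn base_at_sketch(packed: u64, i: usize) -> u8 {
    debug_assert!(i < 32, "a u64 holds at most 32 bases");
    // Shift the wanted 2-bit field down to bits 0-1, then mask it out.
    match (packed >> (2 * i)) & 0b11 {
        0b00 => b'A',
        0b01 => b'C',
        0b10 => b'G',
        _ => b'T',
    }
}

/// Hypothetical helper (not part of bitnuc): unpacks the first `len`
/// bases of a packed value into uppercase ASCII.
fn unpack_2bit_sketch(packed: u64, len: usize) -> Vec<u8> {
    (0..len).map(|i| base_at_sketch(packed, i)).collect()
}
```

For instance, `unpack_2bit_sketch(0b11100100, 4)` yields the bytes of "ACGT" in this sketch.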
§Errors
Returns NucleotideError::InvalidBase if the sequence contains any characters
other than A,C,G,T (case insensitive).
Returns NucleotideError::SequenceTooLong if the input sequence is longer
than 32 bases (as a u64 can only store 32 * 2 bits).
§Examples
Basic packing:
use bitnuc::as_2bit;
let packed = as_2bit(b"ACGT")?;
assert_eq!(packed, 0b11100100);
Case insensitivity:
use bitnuc::as_2bit;
assert_eq!(as_2bit(b"acgt")?, as_2bit(b"ACGT")?);
Error handling:
use bitnuc::{as_2bit, NucleotideError};
// Invalid base
assert!(matches!(
as_2bit(b"ACGN"),
Err(NucleotideError::InvalidBase(b'N'))
));
// Sequence too long
let long_seq = vec![b'A'; 33];
assert!(matches!(
as_2bit(&long_seq),
Err(NucleotideError::SequenceTooLong(33))
));