4-bit IUPAC nucleotide ambiguity codes
Each IUPAC nucleotide ambiguity code is represented with 4 bits, one per base. A set bit means the symbol can stand for that base:
| | A | C | G | T |
|---|---|---|---|---|
| A | 1 | 0 | 0 | 0 |
| C | 0 | 1 | 0 | 0 |
| G | 0 | 0 | 1 | 0 |
| T | 0 | 0 | 0 | 1 |
| Y | 0 | 1 | 0 | 1 |
| R | 1 | 0 | 1 | 0 |
| W | 1 | 0 | 0 | 1 |
| S | 0 | 1 | 1 | 0 |
| K | 0 | 0 | 1 | 1 |
| M | 1 | 1 | 0 | 0 |
| D | 1 | 0 | 1 | 1 |
| V | 1 | 1 | 1 | 0 |
| H | 1 | 1 | 0 | 1 |
| B | 0 | 1 | 1 | 1 |
| N | 1 | 1 | 1 | 1 |
| X/- | 0 | 0 | 0 | 0 |
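The table can be sketched as a plain lookup from symbol to bitmask. This is a minimal illustration, not the crate's implementation: `iupac_bits` is a hypothetical helper, and the bit order (A as the highest of the four bits, T as the lowest) is an assumption that simply follows the column order above.

```rust
// Assumed bit order, following the table columns left to right.
const A: u8 = 0b1000;
const C: u8 = 0b0100;
const G: u8 = 0b0010;
const T: u8 = 0b0001;

// Hypothetical helper (not part of bio_seq): map an IUPAC symbol to its
// 4-bit mask, or None if the character is not an IUPAC nucleotide code.
fn iupac_bits(symbol: char) -> Option<u8> {
    let bits = match symbol {
        'A' => A,
        'C' => C,
        'G' => G,
        'T' => T,
        'Y' => C | T,
        'R' => A | G,
        'W' => A | T,
        'S' => C | G,
        'K' => G | T,
        'M' => A | C,
        'D' => A | G | T,
        'V' => A | C | G,
        'H' => A | C | T,
        'B' => C | G | T,
        'N' => A | C | G | T,
        // Both spellings of the empty set share the all-zero mask.
        'X' | '-' => 0,
        _ => return None,
    };
    Some(bits)
}
```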
This means that each symbol can be treated as a set of bases, so bitwise operations on symbols have meaningful set semantics:
use bio_seq::prelude::*;
// Set union:
let union = iupac!("AS-GYTNAN") | iupac!("ANTGCAT-N");
assert_eq!(union, iupac!("ANTGYWNAN"));
// Set intersection:
let intersection = iupac!("ACGTSWKMN") & iupac!("WKMSTNNAN");
assert_eq!(intersection, iupac!("A----WKAN"));

These set operations can be used to implement pattern matching:
use bio_seq::prelude::*;
let seq = iupac!("AGCTNNCAGTCGACGTATGTA");
let pattern = iupac!("AYG");
for slice in seq.windows(pattern.len()) {
    if pattern.contains(slice) {
        println!("{slice} matches pattern");
    }
}
// ACG matches pattern
// ATG matches pattern
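
To make the connection between the bitwise operations and matching concrete, here is a dependency-free sketch on raw 4-bit masks. It assumes that containment means a position-wise subset test (`pattern & window == window`); that is one plausible reading of `contains`, not a statement of how bio_seq implements it. The names `SYMBOLS`, `encode`, `decode`, `matches` and `find_matches` are hypothetical, and the bit order (A = 8, C = 4, G = 2, T = 1) is an assumption taken from the table columns.

```rust
// Symbols ordered so that each symbol's index equals its assumed 4-bit mask.
const SYMBOLS: &[u8; 16] = b"-TGKCYSBAWRDMHVN";

// Hypothetical helper: encode an ASCII string into masks.
// Panics on characters outside SYMBOLS (the 'X' alias is not handled here).
fn encode(s: &str) -> Vec<u8> {
    s.bytes()
        .map(|b| SYMBOLS.iter().position(|&x| x == b).expect("not an IUPAC symbol") as u8)
        .collect()
}

// Turn masks back into their symbols.
fn decode(masks: &[u8]) -> String {
    masks.iter().map(|&m| SYMBOLS[(m & 0b1111) as usize] as char).collect()
}

// True when every window symbol is a subset of the pattern symbol at the
// same position. Under this test the empty symbol '-' is a subset of anything.
fn matches(pattern: &[u8], window: &[u8]) -> bool {
    pattern.len() == window.len()
        && pattern.iter().zip(window).all(|(&p, &w)| p & w == w)
}

// Collect the windows of `seq` matched by `pattern`, decoded back to text.
// `pattern` must be non-empty, because slice::windows panics on size 0.
fn find_matches(seq: &str, pattern: &str) -> Vec<String> {
    let (seq, pattern) = (encode(seq), encode(pattern));
    seq.windows(pattern.len())
        .filter(|w| matches(&pattern, w))
        .map(decode)
        .collect()
}
```

Calling `find_matches("AGCTNNCAGTCGACGTATGTA", "AYG")` on this sketch yields `ACG` and `ATG`, the same two windows reported by the example above.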