Function needletail::sequence::normalize

pub fn normalize(seq: &[u8], allow_iupac: bool) -> Option<Vec<u8>>

Transform a nucleic acid sequence into its “normalized” form.

The normalized form is:

  • the output contains only A, G, C, T and N, and possibly - (for gaps)
  • whitespace and line endings are stripped out
  • lowercase versions of these bases are uppercased
  • U is converted to T (everything becomes a DNA sequence)
  • some other punctuation is converted to gaps
  • IUPAC bases are converted to N unless allow_iupac is true
  • everything else is converted to N
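
The rules above can be sketched as a small standalone function. This is an illustrative re-implementation, not the crate's source: the name normalize_sketch is hypothetical, the choice of '.' and '~' as the punctuation that becomes a gap is an assumption, and the sketch always returns a Vec<u8> rather than the Option<Vec<u8>> of the real function, because the meaning of the None case is not described here.

```rust
/// Hypothetical sketch of the normalization rules listed above.
/// Not the needletail implementation; see the stated assumptions.
fn normalize_sketch(seq: &[u8], allow_iupac: bool) -> Vec<u8> {
    let mut out = Vec::with_capacity(seq.len());
    for &byte in seq {
        // Whitespace and line endings are dropped entirely.
        if matches!(byte, b' ' | b'\t' | b'\r' | b'\n') {
            continue;
        }
        // Uppercase first so each rule only has to handle one case.
        let upper = byte.to_ascii_uppercase();
        let normalized = match upper {
            // Canonical bases, N and the gap character pass through.
            b'A' | b'C' | b'G' | b'T' | b'N' | b'-' => upper,
            // RNA uracil becomes DNA thymine.
            b'U' => b'T',
            // Assumption: '.' and '~' are the punctuation treated as gaps.
            b'.' | b'~' => b'-',
            // IUPAC ambiguity codes are kept only when allowed.
            b'B' | b'D' | b'H' | b'V' | b'R' | b'Y' | b'S' | b'W' | b'K' | b'M'
                if allow_iupac =>
            {
                upper
            }
            // Everything else, including disallowed IUPAC codes, becomes N.
            _ => b'N',
        };
        out.push(normalized);
    }
    out
}
```

Under these assumptions, normalize_sketch(b"acgu r\n.x", false) yields the bytes of ACGTN-N, and the same input with allow_iupac set to true yields ACGTR-N, since only the R is affected by the flag.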