Trait bio::pattern_matching::pssm::Motif
pub trait Motif {
    const LK: [u8; 127] = _;
    const MONOS: &'static [u8] = b"";
    const MONO_CT: usize = 0usize;

    fn rev_lk(idx: usize) -> u8;
    fn len(&self) -> usize;
    fn degenerate_consensus(&self) -> Vec<u8>;
    fn get_scores(&self) -> &Array2<f32>;
    fn get_min_score(&self) -> f32;
    fn get_max_score(&self) -> f32;
    fn get_bits() -> f32;

    fn seqs_to_weights(
        seqs: &Vec<Vec<u8>>,
        _pseudos: Option<&[f32]>
    ) -> Result<Array2<f32>, PSSMError> { ... }
    fn lookup(mono: u8) -> Result<usize, PSSMError> { ... }
    fn raw_score<C, T>(
        &self,
        seq_it: T
    ) -> Result<(usize, f32, Vec<f32>), PSSMError>
    where
        C: Borrow<u8>,
        T: IntoIterator<Item = C>,
    { ... }
    fn score<C, T>(&self, seq_it: T) -> Result<ScoredPos, PSSMError>
    where
        C: Borrow<u8>,
        T: IntoIterator<Item = C>,
    { ... }
    fn info_content(&self) -> f32 { ... }
}
Trait containing code shared between DNA and protein implementations of the position-specific scoring matrix.
Provided Associated Constants
Required Methods
fn rev_lk(idx: usize) -> u8
Returns the monomer associated with the given index; the reverse of lookup.
Returns INVALID_MONO if the index isn’t associated with a monomer.
Arguments
idx - the index in question
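The round trip between lookup and rev_lk can be illustrated with a small standalone sketch. It does not use the crate: the alphabet order, the value of INVALID_MONO, and the function bodies are assumptions made for illustration, and lookup returns an Option here instead of the trait's Result<usize, PSSMError> so that the block stays self-contained.

```rust
// Hypothetical stand-ins; the crate's actual constants and bodies may differ.
const MONOS: &[u8] = b"ATGC";
const INVALID_MONO: u8 = 255;

/// Monomer -> index, or None if the byte is not part of the alphabet.
fn lookup(mono: u8) -> Option<usize> {
    MONOS.iter().position(|&m| m == mono)
}

/// Index -> monomer; the reverse of `lookup`.
/// Falls back to INVALID_MONO for an index with no associated monomer.
fn rev_lk(idx: usize) -> u8 {
    MONOS.get(idx).copied().unwrap_or(INVALID_MONO)
}
```

With this alphabet, rev_lk(lookup(b'G').unwrap()) gives back b'G', and an out-of-range index yields INVALID_MONO.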
fn degenerate_consensus(&self) -> Vec<u8>
Returns a representation of the motif using ambiguous codes. Primarily useful for DNA motifs, where ambiguous codes are common (e.g., ‘M’ for ‘A or C’); less so for proteins, where we represent any position without a dominant amino acid as an ‘X’.
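One way to derive ambiguous codes for a DNA motif is sketched below. This is not the crate's implementation: the A, C, G, T column order, the threshold parameter, and the rule "every base at or above the threshold contributes" are all assumptions chosen to keep the example short. Only the IUPAC code table itself is standard.

```rust
// IUPAC codes indexed by a 4-bit mask: bit 0 = A, bit 1 = C, bit 2 = G, bit 3 = T.
// An empty set and the full set both map to 'N'.
const IUPAC: &[u8; 16] = b"NACMGRSVTWYHKDBN";

/// `weights[pos]` holds the probabilities of A, C, G, T (a hypothetical column order).
/// Every base whose probability reaches `threshold` contributes to the code.
fn degenerate_consensus(weights: &[[f32; 4]], threshold: f32) -> Vec<u8> {
    weights
        .iter()
        .map(|row| {
            let mut mask = 0usize;
            for (i, &p) in row.iter().enumerate() {
                if p >= threshold {
                    mask |= 1 << i;
                }
            }
            IUPAC[mask]
        })
        .collect()
}
```

A position split evenly between A and C comes out as ‘M’, matching the example in the description above; a position with no preference comes out as ‘N’.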
fn get_scores(&self) -> &Array2<f32>
Accessor; returns the scores matrix.
fn get_min_score(&self) -> f32
Returns the sum of the scores of the “worst” (lowest-scoring) base at each position.
fn get_max_score(&self) -> f32
Returns the sum of the scores of the “best” (highest-scoring) base at each position.
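The two bounds can be sketched as a per-position minimum or maximum summed over the motif. The sketch below assumes a matrix with one row per position and one column per monomer, stored as nested Vecs instead of the Array2<f32> the trait exposes; the function names are hypothetical.

```rust
/// Sum of the lowest score in each row (one row per motif position).
fn min_score(scores: &[Vec<f32>]) -> f32 {
    scores
        .iter()
        .map(|row| row.iter().copied().fold(f32::INFINITY, f32::min))
        .sum()
}

/// Sum of the highest score in each row (one row per motif position).
fn max_score(scores: &[Vec<f32>]) -> f32 {
    scores
        .iter()
        .map(|row| row.iter().copied().fold(f32::NEG_INFINITY, f32::max))
        .sum()
}
```

These bounds are the lowest and highest raw score any sequence of the motif's length can reach under this layout, which makes them natural endpoints for rescaling a raw score.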
Provided Methods
fn seqs_to_weights(
    seqs: &Vec<Vec<u8>>,
    _pseudos: Option<&[f32]>
) -> Result<Array2<f32>, PSSMError>
Returns a weight matrix representing the sequences provided.
This code is shared by implementations of from_seqs.
Arguments
seqs - sequences incorporated into the motif
pseudos - array slice with a pseudocount for each monomer; defaults to DEF_PSEUDO for all if None is supplied
FIXME: pseudos should be an array of size MONO_CT, but that is currently unsupported
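As a rough illustration of how pseudocounts enter a weight matrix, the standalone sketch below seeds every cell with a pseudocount and then adds one per observed monomer. The alphabet order, the DEF_PSEUDO value, the String error type, and the choice to return raw counts without normalizing are assumptions; the crate's own function returns an Array2<f32> and a PSSMError and may differ in all of these details.

```rust
// Hypothetical alphabet and default pseudocount; the crate's values may differ.
const MONOS: &[u8] = b"ATGC";
const DEF_PSEUDO: f32 = 0.5;

/// Builds a position-by-monomer count matrix, seeding every cell with a pseudocount.
fn seqs_to_weights(seqs: &[Vec<u8>], pseudos: Option<&[f32]>) -> Result<Vec<Vec<f32>>, String> {
    let len = match seqs.first() {
        Some(s) => s.len(),
        None => return Err("no sequences supplied".to_string()),
    };
    // Fall back to the default pseudocount for every monomer when None is given.
    let default = vec![DEF_PSEUDO; MONOS.len()];
    let pseudos = pseudos.unwrap_or(&default[..]);
    if pseudos.len() != MONOS.len() {
        return Err("need one pseudocount per monomer".to_string());
    }
    let mut counts = vec![pseudos.to_vec(); len];
    for seq in seqs {
        if seq.len() != len {
            return Err("sequences differ in length".to_string());
        }
        for (pos, &mono) in seq.iter().enumerate() {
            let idx = MONOS
                .iter()
                .position(|&m| m == mono)
                .ok_or_else(|| format!("invalid monomer {}", mono as char))?;
            counts[pos][idx] += 1.0;
        }
    }
    Ok(counts)
}
```

The pseudocount keeps unobserved monomers from receiving a weight of zero, which matters once the counts are turned into probabilities or log-odds.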
fn raw_score<C, T>(&self, seq_it: T) -> Result<(usize, f32, Vec<f32>), PSSMError>
where
    C: Borrow<u8>,
    T: IntoIterator<Item = C>,
fn score<C, T>(&self, seq_it: T) -> Result<ScoredPos, PSSMError>
where
    C: Borrow<u8>,
    T: IntoIterator<Item = C>,
Returns a ScoredPos struct representing the best match within the query sequence.
see:
MATCH™: a tool for searching transcription factor binding sites in DNA sequences
Nucleic Acids Res. 2003 Jul 1; 31(13): 3576–3579
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC169193/
Arguments
seq_it - iterator representing the query sequence
Errors
PSSMError::InvalidMonomer(mono) - sequence seq_it contained invalid monomer mono
PSSMError::QueryTooShort - sequence seq_it was too short
Example
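use bio::pattern_matching::pssm::{DNAMotif, Motif};

// Build a motif from four aligned sequences, using the default pseudocounts.
let pssm = DNAMotif::from_seqs(
    vec![
        b"AAAA".to_vec(),
        b"AATA".to_vec(),
        b"AAGA".to_vec(),
        b"AAAA".to_vec(),
    ]
    .as_ref(),
    None,
)
.unwrap();
// Offset of the best-scoring window within the query.
let start_pos = pssm.score(b"CCCCCAATA").unwrap().loc;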
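The idea of finding the best match can be sketched without the crate as a sliding window over the query, followed by a rescaling of the best raw score between the motif's minimum and maximum possible scores, in the spirit of the MATCH paper cited above. The Hit struct, the best_hit function, the alphabet order, and the String errors are hypothetical; whether the crate normalizes in exactly this way is an assumption.

```rust
const MONOS: &[u8] = b"ATGC"; // hypothetical column order

/// Hypothetical result type: start offset of the best window and its rescaled score.
struct Hit {
    loc: usize,
    sum: f32,
}

/// `scores` has one row per motif position and one column per monomer in MONOS.
fn best_hit(scores: &[Vec<f32>], query: &[u8]) -> Result<Hit, String> {
    let len = scores.len();
    if len == 0 {
        return Err("empty motif".to_string());
    }
    if query.len() < len {
        return Err("query too short".to_string());
    }
    // Translate each byte to a column index up front so invalid monomers fail early.
    let idx: Vec<usize> = query
        .iter()
        .map(|&m| {
            MONOS
                .iter()
                .position(|&x| x == m)
                .ok_or_else(|| format!("invalid monomer {}", m as char))
        })
        .collect::<Result<_, _>>()?;
    // Score every window of the motif's length and keep the first best one.
    let mut best = (0usize, f32::NEG_INFINITY);
    for (start, window) in idx.windows(len).enumerate() {
        let raw: f32 = window.iter().zip(scores).map(|(&col, row)| row[col]).sum();
        if raw > best.1 {
            best = (start, raw);
        }
    }
    let min: f32 = scores.iter().map(|r| r.iter().copied().fold(f32::INFINITY, f32::min)).sum();
    let max: f32 = scores.iter().map(|r| r.iter().copied().fold(f32::NEG_INFINITY, f32::max)).sum();
    // Rescale to [0, 1]; this is NaN if every cell of the matrix is identical.
    Ok(Hit { loc: best.0, sum: (best.1 - min) / (max - min) })
}
```

With a count matrix built from the four example sequences and a pseudocount of 0.5, the query CCCCCAATA has its best window at offset 5, where AATA begins.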
fn info_content(&self) -> f32
Returns a float representing the information content of a motif; roughly the inverse of Shannon entropy. Adapted from the information content described here: https://en.wikipedia.org/wiki/Sequence_logo#Logo_creation
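A minimal sketch of that calculation, assuming each row of the matrix holds per-monomer probabilities that sum to 1: the information at a position is the maximum number of bits for the alphabet minus the Shannon entropy of the row, and the motif's total is the sum over positions. The function signature is hypothetical, and the small-sample correction from the sequence logo formula is left out.

```rust
/// `probs` has one row per position; `bits` is the alphabet's maximum,
/// e.g. 2.0 (log2 of 4) for a four-letter DNA alphabet.
fn info_content(probs: &[Vec<f32>], bits: f32) -> f32 {
    probs
        .iter()
        .map(|row| {
            // Shannon entropy of the row; zero-probability cells contribute nothing.
            let entropy: f32 = row
                .iter()
                .filter(|&&p| p > 0.0)
                .map(|&p| -p * p.log2())
                .sum();
            bits - entropy
        })
        .sum()
}
```

A position that always shows the same base contributes the full 2 bits, while a position with four equally likely bases contributes none.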