Struct bio::pattern_matching::pssm::DNAMotif

pub struct DNAMotif {
    pub scores: Array2<f32>,
    pub min_score: f32,
    pub max_score: f32,
}

Position-specific scoring matrix for DNA sequences

Fields

scores: Array2<f32>

matrix holding weights at each position, indexed by [position, base]

min_score: f32

sum of “worst” base at each position

max_score: f32

sum of “best” base at each position
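As a rough illustration of how min_score and max_score relate to scores, the sketch below sums the smallest and the largest weight in each row. It uses a slice of fixed-size arrays in place of Array2<f32> so that it runs without the ndarray crate. The name min_max_scores is hypothetical and not part of the bio API.

```rust
/// Sums the lowest and the highest weight in each row of a [position][base] matrix.
/// Hypothetical helper that mirrors the documented meaning of min_score / max_score.
fn min_max_scores(scores: &[[f32; 4]]) -> (f32, f32) {
    let mut min_sum = 0.0;
    let mut max_sum = 0.0;
    for row in scores {
        // "Worst" base at this position: the smallest weight in the row.
        min_sum += row.iter().cloned().fold(f32::INFINITY, f32::min);
        // "Best" base at this position: the largest weight in the row.
        max_sum += row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    }
    (min_sum, max_sum)
}
```

An empty matrix yields (0.0, 0.0) in this sketch.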

Implementations

impl DNAMotif

pub fn from_seqs(
seqs: &Vec<Vec<u8>>,
pseudos: Option<&[f32]>
) -> Result<Self, PSSMError>

Returns a Motif representing the sequences provided.

Arguments

seqs - sequences incorporated into motif
pseudos - array slice with a pseudocount for each monomer; defaults to pssm::DEF_PSEUDO for all if None is supplied

FIXME: pseudos should be an array of size MONO_CT, but that is currently impossible - see https://github.com/rust-lang/rust/issues/42863
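The general idea of building a motif from aligned sequences with pseudocounts can be sketched as follows. This is a std-only illustration, not the bio implementation: weights_from_seqs is a hypothetical name, the pseudocounts are passed as a fixed-size array rather than an Option<&[f32]>, and normalizing each row to frequencies is an assumption, since this page does not describe how the library scales its weights.

```rust
/// Counts bases per position, adds a pseudocount per base, and normalizes
/// each row to frequencies. Returns None for empty input, sequences of
/// unequal length, or bytes outside "ATGC". Hypothetical helper.
fn weights_from_seqs(seqs: &[Vec<u8>], pseudos: [f32; 4]) -> Option<Vec<[f32; 4]>> {
    // Column order follows the MONOS constant documented for DNAMotif.
    const MONOS: &[u8] = b"ATGC";

    let len = seqs.first()?.len();
    // Every position starts from the pseudocounts instead of zero.
    let mut weights = vec![pseudos; len];
    for seq in seqs {
        if seq.len() != len {
            return None;
        }
        for (pos, base) in seq.iter().enumerate() {
            let col = MONOS.iter().position(|m| m == base)?;
            weights[pos][col] += 1.0;
        }
    }
    // Assumed normalization step: turn counts into per-position frequencies.
    for row in weights.iter_mut() {
        let total: f32 = row.iter().sum();
        for w in row.iter_mut() {
            *w /= total;
        }
    }
    Some(weights)
}
```

With non-zero pseudocounts no entry of the resulting matrix is zero, which matters for any later step that takes logarithms of the weights.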

Trait Implementations

impl Clone for DNAMotif

fn clone(&self) -> DNAMotif

Returns a copy of the value.

1.0.0 · source

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for DNAMotif

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl From<ArrayBase<OwnedRepr<f32>, Dim<[usize; 2]>>> for DNAMotif

Returns a DNAMotif wrapping an Array2 of nucleotide weights at each position. The dimensions and contents of this array are unchecked; it is up to the user to ensure that the dimensions are correct (i.e., SEQ_LEN x 4) and that no zeros appear in the array.

fn from(scores: Array2<f32>) -> Self

Converts to this type from the input type.

impl Motif for DNAMotif

const LK: [u8; 127] = _

Lookup table mapping monomer -> index

const MONOS: &'static [u8] = b"ATGC"

All monomers, in order corresponding to lookup table

const MONO_CT: usize = 4usize

Monomer count - equal to length of MONOS

fn rev_lk(idx: usize) -> u8

Returns the monomer associated with the given index; the reverse of lookup. Returns INVALID_MONO if the index isn’t associated with a monomer.
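The relationship between LK, MONOS and rev_lk can be sketched with a small std-only example. The table below is built at run time, returns Option rather than Result<usize, PSSMError>, and only recognizes upper-case bases; the sentinel value 255 is a placeholder, because the actual value of INVALID_MONO is not shown on this page. All function names here are illustrative stand-ins, not the library's code.

```rust
const MONOS: &[u8] = b"ATGC";
// Placeholder sentinel (assumption); the real INVALID_MONO value is not shown here.
const INVALID: u8 = 255;

/// Builds a 127-entry table mapping an ASCII byte to its column index,
/// with INVALID in every slot that is not a monomer.
fn build_lk() -> [u8; 127] {
    let mut lk = [INVALID; 127];
    for (idx, &mono) in MONOS.iter().enumerate() {
        lk[mono as usize] = idx as u8;
    }
    lk
}

/// Forward lookup: monomer byte -> column index in the scores matrix.
fn lookup(lk: &[u8; 127], mono: u8) -> Option<usize> {
    match lk.get(mono as usize) {
        Some(&idx) if idx != INVALID => Some(idx as usize),
        _ => None, // byte outside the table, or not a monomer
    }
}

/// Reverse lookup: column index -> monomer byte, or INVALID when out of range.
fn rev_lk(idx: usize) -> u8 {
    MONOS.get(idx).copied().unwrap_or(INVALID)
}
```

A table lookup keeps the per-base cost constant, which is the likely motivation for a precomputed LK constant over searching MONOS on every call.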

fn len(&self) -> usize

Returns the length of motif

fn get_scores(&self) -> &Array2<f32>

Accessor - returns scores matrix

fn get_min_score(&self) -> f32

Return sum of “worst” base at each position

fn get_max_score(&self) -> f32

Return sum of “best” base at each position

fn get_bits() -> f32

Returns the information content of a single position. Used by the info_content method. FIXME: this should be replaced with a CTFE … or maybe just a constant

fn degenerate_consensus(&self) -> Vec<u8>

Returns a representation of the motif using ambiguous codes. Primarily useful for DNA motifs, where ambiguous codes are common (e.g., ‘M’ for ‘A or C’); less so for proteins, where any position without a dominant amino acid is represented as an ‘X’.
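To make the idea of ambiguity codes concrete, here is a minimal sketch that maps each position to an IUPAC nucleotide code. The selection rule (a base counts as present when its weight is at least threshold) is an assumption chosen for illustration; this page does not state the rule the library uses. Both function names are hypothetical stand-ins.

```rust
/// Maps a set of bases to its IUPAC code. Bits follow the "ATGC" column
/// order: A = 1, T = 2, G = 4, C = 8. An empty set is reported as 'N'.
fn iupac(mask: u8) -> u8 {
    // Index into this table is the bitmask described above.
    const CODES: &[u8; 16] = b"NATWGRKDCMYHSVBN";
    CODES[(mask & 0x0f) as usize]
}

/// Builds a degenerate consensus from a [position][base] matrix, treating
/// every base whose weight reaches `threshold` as present (assumed rule).
fn degenerate_consensus(scores: &[[f32; 4]], threshold: f32) -> Vec<u8> {
    scores
        .iter()
        .map(|row| {
            let mut mask = 0u8;
            for (col, &w) in row.iter().enumerate() {
                if w >= threshold {
                    mask |= 1u8 << col;
                }
            }
            iupac(mask)
        })
        .collect()
}
```

For example, a position where only A and C reach the threshold produces ‘M’, matching the example in the description above.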

fn seqs_to_weights(
seqs: &Vec<Vec<u8>>,
_pseudos: Option<&[f32]>
) -> Result<Array2<f32>, PSSMError>

Returns a weight matrix representing the sequences provided. This code is shared by implementations of from_seqs.

fn lookup(mono: u8) -> Result<usize, PSSMError>

Returns the index of the given monomer in the scores matrix, using the lookup table LK.

fn raw_score<C, T>(&self, seq_it: T) -> Result<(usize, f32, Vec<f32>), PSSMError>
where
C: Borrow<u8>,
T: IntoIterator<Item = C>,

Returns the un-normalized sum of matching bases, useful for comparing matches from motifs of different lengths.
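The shape of the return value, (usize, f32, Vec<f32>), suggests an offset, a summed score and the per-position contributions. A minimal sliding-window sketch under that reading is shown below. It is std-only, takes a byte slice instead of a generic iterator, and returns Option instead of Result; best_raw_score is a hypothetical name, and the interpretation of the tuple is an assumption.

```rust
/// Slides the motif along `seq` and returns (start offset, summed weight,
/// per-position weights) for the highest-scoring window. Returns None if
/// the motif is empty, the sequence is shorter than the motif, or a byte
/// is not one of "ATGC". Hypothetical helper.
fn best_raw_score(scores: &[[f32; 4]], seq: &[u8]) -> Option<(usize, f32, Vec<f32>)> {
    const MONOS: &[u8] = b"ATGC";

    if scores.is_empty() || seq.len() < scores.len() {
        return None;
    }
    let mut best: Option<(usize, f32, Vec<f32>)> = None;
    for (start, window) in seq.windows(scores.len()).enumerate() {
        // Weight contributed by each base of this window.
        let mut per_pos = Vec::with_capacity(window.len());
        for (row, base) in scores.iter().zip(window) {
            let col = MONOS.iter().position(|m| m == base)?;
            per_pos.push(row[col]);
        }
        let total: f32 = per_pos.iter().sum();
        // Keep the first window that strictly beats the current best.
        if best.as_ref().map_or(true, |b| total > b.1) {
            best = Some((start, total, per_pos));
        }
    }
    best
}
```

On ties this sketch keeps the earliest window; the library's tie-breaking behavior is not documented on this page.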

fn score<C, T>(&self, seq_it: T) -> Result<ScoredPos, PSSMError>
where
C: Borrow<u8>,
T: IntoIterator<Item = C>,

Returns a ScoredPos struct representing the best match within the query sequence. See: MATCH™: a tool for searching transcription factor binding sites in DNA sequences, Nucleic Acids Res. 2003 Jul 1; 31(13): 3576–3579, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC169193/
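One common way to make scores comparable is min-max rescaling of the raw window sum against the motif's extreme sums, in the spirit of the cited MATCH paper. The sketch below shows that rescaling only. Whether score applies exactly this formula is an assumption, and normalized_score is a hypothetical name.

```rust
/// Rescales a raw window sum onto [0, 1] using the motif's extreme sums:
/// (raw - min) / (max - min). Returns None when max does not exceed min
/// (a flat motif), since the ratio is then undefined.
fn normalized_score(raw: f32, min_score: f32, max_score: f32) -> Option<f32> {
    let range = max_score - min_score;
    if range <= 0.0 {
        return None;
    }
    Some((raw - min_score) / range)
}
```

Under this rescaling a window made of the “best” base at every position scores 1.0 and a window made of the “worst” bases scores 0.0.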

fn info_content(&self) -> f32

Returns a float representing the information content of a motif; roughly the inverse of Shannon Entropy. Adapted from the information content described here: https://en.wikipedia.org/wiki/Sequence_logo#Logo_creation
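The calculation described in the linked article can be sketched as the per-position maximum (log2 of the alphabet size, which is 2 bits for DNA) minus the Shannon entropy of that position, summed over the motif. The version below is a simplified illustration: it assumes each row holds frequencies that sum to 1, omits the small-sample correction term mentioned in the article, and is a free function rather than the library's method.

```rust
/// Sums, over all positions, the maximum bits per position (2 for DNA)
/// minus the Shannon entropy of that position's base frequencies.
/// Assumes each row sums to 1; no small-sample correction is applied.
fn info_content(freqs: &[[f32; 4]]) -> f32 {
    let max_bits = 4.0_f32.log2();
    freqs
        .iter()
        .map(|row| {
            let entropy: f32 = row
                .iter()
                .filter(|&&p| p > 0.0) // skip zeros: p * log2(p) tends to 0
                .map(|&p| -p * p.log2())
                .sum();
            max_bits - entropy
        })
        .sum()
}
```

A position with uniform frequencies contributes 0 bits and a position with a single possible base contributes the full 2 bits, so higher totals indicate a more specific motif.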
