Module bio::stats::pairhmm [−][src]
This module contains the implementation of a classic PairHMM
as described in
Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). Biological Sequence Analysis.
Current Topics in Genome Analysis 2008. http://doi.org/10.1017/CBO9780511790492.
It also contains a modified variant HomopolyPairHMM
with additional homopolymer states suited
for dealing with homopolymer runs in sequencing as often encountered in Oxford Nanopore
sequencing data.
Traits defined in this module apply to both PairHMM
and HomopolyPairHMM
.
Examples
use approx::assert_relative_eq; use bio::stats::pairhmm::{ EmissionParameters, GapParameters, PairHMM, StartEndGapParameters, XYEmission, }; use bio::stats::{LogProb, Prob}; use num_traits::Zero; // Two sequences for which we'd like to know if they are likely related. let x = b"AAAA"; let y = b"AAAT"; // For this example, we disallow gaps, so all probabilities are zero here. struct GapParams; impl GapParameters for GapParams { fn prob_gap_x(&self) -> LogProb { LogProb::zero() } fn prob_gap_y(&self) -> LogProb { LogProb::zero() } fn prob_gap_x_extend(&self) -> LogProb { LogProb::zero() } fn prob_gap_y_extend(&self) -> LogProb { LogProb::zero() } } let gap_params = GapParams; // The PairHMM instance stores the gap params, since these are constant. let mut pairhmm = PairHMM::new(&gap_params); // However, emission parameters depend on the actual sequences struct EmissionParams { x: &'static [u8], y: &'static [u8], } const PROB_SUBSTITUTION: f64 = 0.1; const PROB_NO_SUBSTITUION: f64 = 1. - PROB_SUBSTITUTION; impl EmissionParameters for EmissionParams { fn prob_emit_xy(&self, i: usize, j: usize) -> XYEmission { if self.x[i] == self.y[j] { // if two bases match, emit a Match! XYEmission::Match(LogProb::from(Prob(PROB_NO_SUBSTITUION))) } else { // otherwise emit a Mismatch! // Note that the probability here is `mismatch / 3`, since probabilities should sum // to 1 and there are 3 possible mismatch configurations XYEmission::Mismatch(LogProb::from(Prob(PROB_SUBSTITUTION / 3.))) } } // In this example, emitting x[i] is as likely as not observing a mismatch. // In more complex cases, this might e.g. depend on base qualities reported by the sequencer fn prob_emit_x(&self, i: usize) -> LogProb { LogProb::from(Prob(PROB_NO_SUBSTITUION)) } fn prob_emit_y(&self, j: usize) -> LogProb { LogProb::from(Prob(PROB_NO_SUBSTITUION)) } fn len_x(&self) -> usize { self.x.len() } fn len_y(&self) -> usize { self.y.len() } } // Since we want to do global alignment here, disallow free start and end gaps in x. struct GlobalAlignmentMode; impl StartEndGapParameters for GlobalAlignmentMode { fn free_start_gap_x(&self) -> bool { false } fn free_end_gap_x(&self) -> bool { false } } // Finally calculate the probability of relatedness between x and y! let prob_related = pairhmm.prob_related(&EmissionParams { x, y }, &GlobalAlignmentMode, None); // … and compare it to a rough estimation let prob_expected = LogProb::from(Prob(PROB_NO_SUBSTITUION.powi(3) * PROB_SUBSTITUTION / 3.)); assert_relative_eq!(*prob_related, *prob_expected, epsilon = 1e-5);
Structs
HomopolyPairHMM | A pair Hidden Markov Model for comparing sequences x and y as described by Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). Biological Sequence Analysis. Current Topics in Genome Analysis 2008. http://doi.org/10.1017/CBO9780511790492. The default model has been extended to consider homopolymer errors, at the cost of more states and transitions. |
PairHMM | A pair Hidden Markov Model for comparing sequences x and y as described by Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). Biological Sequence Analysis. Current Topics in Genome Analysis 2008. http://doi.org/10.1017/CBO9780511790492. |
Enums
XYEmission |
Traits
Emission | Trait needed for the |
EmissionParameters | Trait for parametrization of |
GapParameters | Trait for parametrization of |
HopParameters | Trait for parametrization of |
StartEndGapParameters | Trait for parametrization of |