Module bio::stats::pairhmm[][src]

This module contains the implementation of a classic PairHMM as described in Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). Biological Sequence Analysis. Current Topics in Genome Analysis 2008. http://doi.org/10.1017/CBO9780511790492. It also contains a modified variant HomopolyPairHMM with additional homopolymer states suited for dealing with homopolymer runs in sequencing as often encountered in Oxford Nanopore sequencing data.

Traits defined in this module apply to both PairHMM and HomopolyPairHMM.

Examples

use approx::assert_relative_eq;
use bio::stats::pairhmm::{
    EmissionParameters, GapParameters, PairHMM, StartEndGapParameters, XYEmission,
};
use bio::stats::{LogProb, Prob};
use num_traits::Zero;

// Two sequences for which we'd like to know if they are likely related.
let x = b"AAAA";
let y = b"AAAT";

// For this example, we disallow gaps, so all probabilities are zero here.
struct GapParams;
impl GapParameters for GapParams {
    fn prob_gap_x(&self) -> LogProb {
        LogProb::zero()
    }
    fn prob_gap_y(&self) -> LogProb {
        LogProb::zero()
    }
    fn prob_gap_x_extend(&self) -> LogProb {
        LogProb::zero()
    }
    fn prob_gap_y_extend(&self) -> LogProb {
        LogProb::zero()
    }
}
let gap_params = GapParams;

// The PairHMM instance stores the gap params, since these are constant.
let mut pairhmm = PairHMM::new(&gap_params);

// However, emission parameters depend on the actual sequences
struct EmissionParams {
    x: &'static [u8],
    y: &'static [u8],
}

const PROB_SUBSTITUTION: f64 = 0.1;
const PROB_NO_SUBSTITUION: f64 = 1. - PROB_SUBSTITUTION;
impl EmissionParameters for EmissionParams {
    fn prob_emit_xy(&self, i: usize, j: usize) -> XYEmission {
        if self.x[i] == self.y[j] {
            // if two bases match, emit a Match!
            XYEmission::Match(LogProb::from(Prob(PROB_NO_SUBSTITUION)))
        } else {
            // otherwise emit a Mismatch!
            // Note that the probability here is `mismatch / 3`, since probabilities should sum
            // to 1 and there are 3 possible mismatch configurations
            XYEmission::Mismatch(LogProb::from(Prob(PROB_SUBSTITUTION / 3.)))
        }
    }

    // In this example, emitting x[i] is as likely as not observing a mismatch.
    // In more complex cases, this might e.g. depend on base qualities reported by the sequencer
    fn prob_emit_x(&self, i: usize) -> LogProb {
        LogProb::from(Prob(PROB_NO_SUBSTITUION))
    }
    fn prob_emit_y(&self, j: usize) -> LogProb {
        LogProb::from(Prob(PROB_NO_SUBSTITUION))
    }

    fn len_x(&self) -> usize {
        self.x.len()
    }
    fn len_y(&self) -> usize {
        self.y.len()
    }
}

// Since we want to do global alignment here, disallow free start and end gaps in x.
struct GlobalAlignmentMode;
impl StartEndGapParameters for GlobalAlignmentMode {
    fn free_start_gap_x(&self) -> bool {
        false
    }
    fn free_end_gap_x(&self) -> bool {
        false
    }
}

// Finally calculate the probability of relatedness between x and y!
let prob_related = pairhmm.prob_related(&EmissionParams { x, y }, &GlobalAlignmentMode, None);

// … and compare it to a rough estimation
let prob_expected = LogProb::from(Prob(PROB_NO_SUBSTITUION.powi(3) * PROB_SUBSTITUTION / 3.));
assert_relative_eq!(*prob_related, *prob_expected, epsilon = 1e-5);

Structs

HomopolyPairHMM

A pair Hidden Markov Model for comparing sequences x and y as described by Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). Biological Sequence Analysis. Current Topics in Genome Analysis 2008. http://doi.org/10.1017/CBO9780511790492. The default model has been extended to consider homopolymer errors, at the cost of more states and transitions.

PairHMM

A pair Hidden Markov Model for comparing sequences x and y as described by Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). Biological Sequence Analysis. Current Topics in Genome Analysis 2008. http://doi.org/10.1017/CBO9780511790492.

Enums

XYEmission

Traits

Emission

Trait needed for the HomopolyPairHMM, because its implementation details depend on the actual bases to distinguish between Match states.

EmissionParameters

Trait for parametrization of PairHMM emission behavior.

GapParameters

Trait for parametrization of PairHMM gap behavior.

HopParameters

Trait for parametrization of PairHMM hop behavior.

StartEndGapParameters

Trait for parametrization of PairHMM start and end gap behavior. This trait can be used to implement global and semiglobal alignments.