Skip to main content

Crate hmmer_pure_rs

Crate hmmer_pure_rs 

Source
Expand description

HMMER - Biological sequence analysis using profile hidden Markov models.

This is a Rust port of HMMER 3.4, a C tool for searching sequence databases for homologous sequences using profile HMMs.

§Quick Start

use hmmer_pure_rs::{Alphabet, Bg, Hmm, Profile, Pipeline, TopHits, OProfile};
use hmmer_pure_rs::hmmfile;
use hmmer_pure_rs::profile::{profile_config, reconfig_length, P7_LOCAL};
use hmmer_pure_rs::sequence::Sequence;
use std::path::Path;

// Load an HMM
let hmms = hmmfile::read_hmm_file(Path::new("query.hmm")).unwrap();
let hmm = &hmms[0];

// Set up alphabet, background model, and scoring profile
let abc = Alphabet::new(hmm.abc_type);
let bg = Bg::new(&abc);
let mut gm = Profile::new(hmm.m, &abc);
profile_config(hmm, &bg, &mut gm, 400, P7_LOCAL);
let mut om = OProfile::convert(&gm);

// Create pipeline and hits collector
let mut pli = Pipeline::new();
pli.new_model(&gm);
let mut th = TopHits::new();

// Search a sequence programmatically
let dsq = abc.digitize(b"ACDEFGHIKLMNPQRSTVWY");
let sq = Sequence { name: "query".into(), acc: String::new(), desc: String::new(), dsq, n: 20, l: 20 };
pli.run(&mut gm, &mut om, &bg, hmm, &sq, &mut th);

// Access results
th.sort_by_sortkey();
for hit in &th.hits {
    println!("{}: score={:.1} bits", hit.name, hit.score);
}

Re-exports§

pub use alphabet::Alphabet;
pub use bg::Bg;
pub use hmm::Hmm;
pub use pipeline::Pipeline;
pub use profile::Profile;
pub use sequence::Sequence;
pub use simd::oprofile::OProfile;
pub use tophits::TopHits;

Modules§

alphabet
Biological sequence alphabets (DNA, RNA, amino acid). Direct port of Easel’s esl_alphabet.
bg
P7_BG - Null/background model for statistical significance. Direct port of p7_bg.c.
builder
HMM builder — construct profile HMMs from multiple sequence alignments. Simplified port of p7_builder.c and build.c.
calibrate
E-value calibration by simulation. Port of evalues.c p7_Calibrate().
domaindef
Domain definition using posterior probabilities. Port of p7_domaindef.c p7_domaindef_ByPosteriorHeuristics().
dp
dsqdata
Binary sequence database format for fast reading. Simplified port of esl_dsqdata — stores digital sequences in a compact binary format.
errors
HMMER error types, mirroring Easel’s eslOK/eslFAIL/eslERANGE system.
eweight
Effective sequence number estimation. Port of eweight.c — entropy-based Neff computation.
fm_index
FM-index for fast pattern searching in DNA sequences. Used by nhmmer for long-target scanning.
hmm
P7_HMM - Core profile HMM model. Direct port of HMMER’s P7_HMM data structure.
hmmfile
HMM file I/O: reading and writing HMMER3 format HMM files. Direct port of p7_hmmfile.c.
hmmfile_binary
Binary HMM file I/O — reading C HMMER’s .h3m format. Enables interoperability with C hmmpress output.
logsum
Fast table-driven log-sum approximation for the Forward algorithm.
msa
Multiple Sequence Alignment (MSA) I/O — Stockholm format.
output
Output formatting utilities to match C printf behavior.
pipeline
Search pipeline using SIMD MSV + Viterbi + Forward filters.
pressed
Read/write C HMMER pressed database format (.h3f, .h3p, .h3i). Enables reading databases created by C hmmpress.
prior
Dirichlet mixture priors for HMM parameterization. Simplified port of p7_prior.c.
profile
P7_PROFILE - Scoring profile for sequence comparison. Direct port of p7_profile.c and modelconfig.c.
seqmodel
Build an HMM from a single sequence using a substitution matrix. Simplified port of seqmodel.c p7_Seqmodel().
sequence
Biological sequence type and FASTA I/O. Simplified port of esl_sq and esl_sqio_ascii for FASTA format.
simd
spensemble
Segment pair ensemble clustering for multidomain regions. Simplified port of p7_spensemble.c.
ssi
Simple sequence/HMM index for fast random access by name. Simplified alternative to Easel’s esl_ssi.c.
stats
tophits
P7_TOPHITS - Ranked hit list for search results. Simplified port focused on what hmmsearch needs.
trace
P7_TRACE — Viterbi traceback path through the HMM. Port of p7_trace.c and generic_vtrace.c.
util