Crate salmon_infer

Collapsed EM / VBEM abundance estimation over equivalence classes.

Ports salmon’s CollapsedEMOptimizer (src/inference/CollapsedEMOptimizer.cpp): given a finalized set of equivalence classes (each a transcript label, a count, and per-transcript combined_weights), iteratively estimate the expected number of fragments originating from each transcript.

The update rules match the C++ exactly:

EM: alphaOut[t] += count * (alphaIn[t] * w_t) / sum_j(alphaIn[j] * w_j), with single-transcript classes assigned their full count.
VBEM: replaces alphaIn[t] with expTheta[t] = exp(digamma(alphaIn[t] + prior_t) - digamma(sum_j(alphaIn[j] + prior_j))).
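
The two update rules above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the EqClass struct, the function names em_step, exp_theta and digamma, and the series-based digamma approximation are all assumptions made for the example.

```rust
/// Hypothetical equivalence class for illustration (not the crate's own type):
/// transcript ids, their combined weights, and the class's fragment count.
pub struct EqClass {
    pub txps: Vec<usize>,
    pub weights: Vec<f64>,
    pub count: f64,
}

/// One EM pass: each class distributes its count in proportion to alpha_in[t] * w_t.
pub fn em_step(classes: &[EqClass], alpha_in: &[f64], alpha_out: &mut [f64]) {
    alpha_out.iter_mut().for_each(|a| *a = 0.0);
    for ec in classes {
        // Single-transcript classes are assigned their full count.
        if ec.txps.len() == 1 {
            alpha_out[ec.txps[0]] += ec.count;
            continue;
        }
        let denom: f64 = ec
            .txps
            .iter()
            .zip(&ec.weights)
            .map(|(&t, &w)| alpha_in[t] * w)
            .sum();
        if denom <= 0.0 {
            continue; // nothing to distribute against (a choice made for this sketch)
        }
        for (&t, &w) in ec.txps.iter().zip(&ec.weights) {
            alpha_out[t] += ec.count * (alpha_in[t] * w) / denom;
        }
    }
}

/// Digamma for x > 0: shift x upward with the recurrence psi(x) = psi(x + 1) - 1/x,
/// then apply the standard asymptotic series. Accuracy is adequate for a sketch.
pub fn digamma(mut x: f64) -> f64 {
    let mut acc = 0.0;
    while x < 6.0 {
        acc -= 1.0 / x;
        x += 1.0;
    }
    let inv = 1.0 / x;
    let inv2 = inv * inv;
    acc + x.ln() - 0.5 * inv - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
}

/// VBEM substitution: expTheta[t] = exp(digamma(alpha_t + prior_t) - digamma(sum_j(alpha_j + prior_j))).
pub fn exp_theta(alpha_in: &[f64], prior: &[f64]) -> Vec<f64> {
    let total: f64 = alpha_in.iter().zip(prior).map(|(a, p)| a + p).sum();
    let log_norm = digamma(total);
    alpha_in
        .iter()
        .zip(prior)
        .map(|(a, p)| (digamma(a + p) - log_norm).exp())
        .collect()
}
```

Under these assumptions a VBEM pass is em_step called with exp_theta(alpha_in, prior) in place of alpha_in. As a worked case: with alphaIn = [1, 1], a single-transcript class of count 10 on transcript 0, and a two-transcript class of count 20 with weights 3 and 1, one EM pass gives alphaOut = [25, 5].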

Parallelization with rayon and SQUAREM acceleration are deferred; plain iteration converges to the same fixpoint.
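
Plain (unaccelerated) iteration can be sketched as a loop that applies an update step until the abundances stop changing. The function name iterate_to_fixpoint, the maximum-relative-change stopping rule, and the rel_tol and max_iter parameters are assumptions for illustration; the crate's actual convergence test may differ.

```rust
/// Repeatedly apply `step(alpha_in, alpha_out)` until the largest relative change
/// in any component drops below `rel_tol`, or `max_iter` passes have run.
/// Returns the final abundances and the number of passes performed.
pub fn iterate_to_fixpoint<F>(
    mut alpha: Vec<f64>,
    mut step: F,
    rel_tol: f64,
    max_iter: usize,
) -> (Vec<f64>, usize)
where
    F: FnMut(&[f64], &mut [f64]),
{
    let mut next = vec![0.0; alpha.len()];
    for iter in 1..=max_iter {
        step(&alpha[..], &mut next[..]);
        // Largest relative difference between the old and new estimates.
        let max_rel = alpha
            .iter()
            .zip(&next)
            .map(|(&old, &new)| {
                let scale = old.abs().max(new.abs());
                if scale > 0.0 { (new - old).abs() / scale } else { 0.0 }
            })
            .fold(0.0_f64, f64::max);
        // The new estimates become the input of the next pass.
        std::mem::swap(&mut alpha, &mut next);
        if max_rel < rel_tol {
            return (alpha, iter);
        }
    }
    (alpha, max_iter)
}
```

The step closure would be one EM or VBEM pass. An accelerated scheme such as SQUAREM changes how quickly the loop reaches the fixpoint, not the fixpoint itself.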

Re-exports§

pub use uncertainty::ambiguity_counts;
pub use uncertainty::bootstrap;
pub use uncertainty::gibbs_sample;
pub use uncertainty::GibbsOptions;

Modules§

uncertainty: Posterior uncertainty: multinomial bootstrap (CollapsedEMOptimizer::gatherBootstraps) and the non-collapsed Gibbs sampler (CollapsedGibbsSampler).
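
A multinomial bootstrap replicate resamples the per-class counts: the same total number of fragments is redrawn, each landing in a class with probability proportional to that class's observed count. The sketch below illustrates the idea only. SplitMix64 is a small stand-in generator so the example has no dependencies, and resample_counts is a hypothetical name; neither is claimed to match the crate's code.

```rust
/// Minimal SplitMix64 generator, used here only to keep the sketch dependency-free.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Draw one multinomial replicate of `counts`: the total is preserved and each
/// draw selects class i with probability counts[i] / total.
pub fn resample_counts(counts: &[u64], rng: &mut SplitMix64) -> Vec<u64> {
    // Running totals let each draw be located with a binary search.
    let mut cumulative = Vec::with_capacity(counts.len());
    let mut total = 0u64;
    for &c in counts {
        total += c;
        cumulative.push(total);
    }
    let mut out = vec![0u64; counts.len()];
    if total == 0 {
        return out;
    }
    for _ in 0..total {
        // The modulo introduces a negligible bias, accepted for this sketch.
        let r = rng.next_u64() % total;
        // First class whose running total exceeds r; zero-count classes are never chosen.
        let idx = cumulative.partition_point(|&c| c <= r);
        out[idx] += 1;
    }
    out
}
```

Each replicate's counts would then be run through the sequential optimizer, and the spread of the resulting abundance estimates across replicates gives the uncertainty.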

Structs§

EmOptions: Optimizer configuration. Defaults mirror salmon’s command-line defaults.
EmResult: Result of an optimization run.
OnlineInference: Shared online-inference state, updated concurrently by the mapping workers.
PackedEqClasses: Flat CSR equivalence classes (only valid groups are retained).
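
A flat CSR layout stores all classes in a few contiguous arrays plus an offsets array, so class i occupies the range offsets[i]..offsets[i + 1]. The struct below is a hypothetical sketch of that idea; its name, field names, integer widths, and the validity rule (non-empty label, matching weights, nonzero count) are assumptions and may not match PackedEqClasses.

```rust
/// Hypothetical CSR-style container for equivalence classes.
pub struct PackedSketch {
    /// offsets[i]..offsets[i + 1] indexes class i; length is num_classes + 1.
    pub offsets: Vec<usize>,
    /// Transcript ids of all classes, concatenated.
    pub txp_ids: Vec<u32>,
    /// Combined weights, parallel to `txp_ids`.
    pub weights: Vec<f64>,
    /// One fragment count per retained class.
    pub counts: Vec<u64>,
}

impl PackedSketch {
    /// Pack (ids, weights, count) triples, skipping groups this sketch treats as invalid.
    pub fn pack(classes: &[(Vec<u32>, Vec<f64>, u64)]) -> Self {
        let mut packed = PackedSketch {
            offsets: vec![0],
            txp_ids: Vec::new(),
            weights: Vec::new(),
            counts: Vec::new(),
        };
        for (ids, ws, count) in classes {
            if ids.is_empty() || ids.len() != ws.len() || *count == 0 {
                continue;
            }
            packed.txp_ids.extend_from_slice(ids);
            packed.weights.extend_from_slice(ws);
            packed.counts.push(*count);
            packed.offsets.push(packed.txp_ids.len());
        }
        packed
    }

    /// Number of retained classes.
    pub fn num_classes(&self) -> usize {
        self.counts.len()
    }

    /// Borrow class i as (transcript ids, weights, count).
    pub fn class(&self, i: usize) -> (&[u32], &[f64], u64) {
        let (start, end) = (self.offsets[i], self.offsets[i + 1]);
        (&self.txp_ids[start..end], &self.weights[start..end], self.counts[i])
    }
}
```

Keeping ids and weights contiguous avoids a heap allocation per class and makes the inner loop of each pass a linear scan over two slices.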

Functions§

optimize: Run the optimizer to convergence (parallel EM/VBEM over the packed layout).
optimize_packed: Core convergence loop over a PackedEqClasses. The parallel argument selects the rayon M-step (for the single main run) or the sequential one (used by bootstrap, which parallelizes across replicates instead). The per-class counts are the packed structure’s own; bootstrap passes resampled counts through run_em_counts.
optimize_packed_with_init: As optimize_packed, but seeds the abundances from init_alphas (a warm start, e.g. salmon’s online-estimate-blended-with-uniform initialization) when supplied; otherwise starts uniform.
optimize_with_init: As optimize, but warm-starts the abundances from init_alphas (per transcript id) when its length matches num_txps. This seeds the EM with salmon’s count-blended initialization (online estimates blended with uniform), which reduces the number of iterations needed to converge.
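
The warm start described above can be sketched as a per-transcript blend of the online estimate with a uniform share, falling back to uniform when no usable estimate is supplied. The function name blended_init and the total_frags and frac_online parameters are hypothetical; how salmon chooses the blend fraction is not covered here.

```rust
/// Build initial abundances for `num_txps` transcripts.
/// If `online` is present and has the right length, each entry is
/// frac_online * online[t] + (1 - frac_online) * uniform; otherwise all entries are uniform.
pub fn blended_init(
    online: Option<&[f64]>,
    num_txps: usize,
    total_frags: f64,
    frac_online: f64,
) -> Vec<f64> {
    // Uniform share: every transcript gets an equal part of the fragments.
    let uniform = if num_txps > 0 {
        total_frags / num_txps as f64
    } else {
        0.0
    };
    match online {
        Some(est) if est.len() == num_txps => est
            .iter()
            .map(|&a| frac_online * a + (1.0 - frac_online) * uniform)
            .collect(),
        // Missing or mismatched estimates: start uniform.
        _ => vec![uniform; num_txps],
    }
}
```

For example, with online estimates [30, 0, 0], 30 total fragments and frac_online = 0.5, the uniform share is 10 and the blended start is [20, 5, 5]. The uniform part keeps every transcript strictly positive, which matters because a transcript that starts at zero stays at zero under the multiplicative EM update.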