Collapsed EM / VBEM abundance estimation over equivalence classes.
Ports salmon’s CollapsedEMOptimizer (src/inference/CollapsedEMOptimizer.cpp):
given a finalized set of equivalence classes (each holding a label of transcript
ids, a fragment count, and per-transcript combined_weights), iteratively estimate
the expected number of fragments originating from each transcript.
The update rules match the C++ exactly:
- EM: alphaOut[t] += count * (alphaIn[t] * w_t) / sum_j(alphaIn[j] * w_j), with j ranging over the class's transcripts; single-transcript classes are assigned their full count.
- VBEM: replaces alphaIn[t] with expTheta[t] = exp(digamma(alphaIn[t] + prior_t) - digamma(sum_j(alphaIn[j] + prior_j))), with j ranging over all transcripts.
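A minimal sketch of the two update rules, assuming a hypothetical EqClass struct with plain Vec fields (the module itself works over the packed CSR layout). Because the Rust standard library has no digamma, the sketch hand-rolls one from the recurrence psi(x) = psi(x + 1) - 1/x plus the asymptotic series; the prior value in main is illustrative. A VBEM iteration is simply the EM step fed expTheta in place of alphaIn.

```rust
/// Hypothetical equivalence class: transcript ids, fragment count, weights.
struct EqClass {
    labels: Vec<usize>,
    count: f64,
    weights: Vec<f64>,
}

/// One EM update: split each class's count over its transcripts in
/// proportion to alpha_in[t] * w_t.
fn em_step(classes: &[EqClass], alpha_in: &[f64]) -> Vec<f64> {
    let mut alpha_out = vec![0.0; alpha_in.len()];
    for ec in classes {
        if ec.labels.len() == 1 {
            // Single-transcript class: the whole count goes to that transcript.
            alpha_out[ec.labels[0]] += ec.count;
            continue;
        }
        let denom: f64 = ec.labels.iter().zip(&ec.weights).map(|(&t, &w)| alpha_in[t] * w).sum();
        if denom > 0.0 {
            for (&t, &w) in ec.labels.iter().zip(&ec.weights) {
                alpha_out[t] += ec.count * alpha_in[t] * w / denom;
            }
        }
    }
    alpha_out
}

/// Digamma for x >= 0: shift x up past 6 via the recurrence, then apply
/// the asymptotic series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
fn digamma(mut x: f64) -> f64 {
    let mut result = 0.0;
    while x < 6.0 {
        result -= 1.0 / x;
        x += 1.0;
    }
    let inv = 1.0 / x;
    let inv2 = inv * inv;
    result + x.ln() - 0.5 * inv - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
}

/// VBEM weights: exp(digamma(alpha_t + prior_t) - digamma(sum over all t)).
/// This sketch omits any small-value cutoff an implementation may apply.
fn vbem_exp_theta(alpha_in: &[f64], prior: &[f64]) -> Vec<f64> {
    let total: f64 = alpha_in.iter().zip(prior).map(|(a, p)| a + p).sum();
    let log_norm = digamma(total);
    alpha_in.iter().zip(prior).map(|(a, p)| (digamma(a + p) - log_norm).exp()).collect()
}

fn main() {
    // Toy input: class {0} seen 10 times, class {0, 1} seen 10 times.
    let classes = vec![
        EqClass { labels: vec![0], count: 10.0, weights: vec![1.0] },
        EqClass { labels: vec![0, 1], count: 10.0, weights: vec![1.0, 1.0] },
    ];
    let alpha = vec![1.0, 1.0];
    let prior = vec![0.01, 0.01]; // illustrative prior, not a salmon default
    let em = em_step(&classes, &alpha);
    let vb = em_step(&classes, &vbem_exp_theta(&alpha, &prior));
    println!("EM: {em:?}  VBEM: {vb:?}");
}
```

Both updates preserve the total fragment count, since every class distributes exactly its own count.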
SQUAREM acceleration is deferred; plain iteration converges to the same fixpoint. The M-step of the main run is parallelized with rayon (see optimize_packed).
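Plain iteration to a fixpoint can be sketched as a loop that reapplies the update until every abundance stops moving in relative terms. The convergence test (relative change, skipping abundances below a cutoff), the thresholds, and the toy_step map are illustrative assumptions, not salmon's exact criterion or defaults.

```rust
/// Repeatedly apply `step` until every abundance above `check_cutoff`
/// changes by less than `rel_tol` relative to its new value, or until
/// `max_iter` iterations. Returns the final abundances and iterations used.
fn iterate_to_fixpoint<F>(
    mut alpha: Vec<f64>,
    mut step: F,
    rel_tol: f64,
    check_cutoff: f64,
    max_iter: usize,
) -> (Vec<f64>, usize)
where
    F: FnMut(&[f64]) -> Vec<f64>,
{
    for iter in 1..=max_iter {
        let next = step(alpha.as_slice());
        // Tiny abundances are skipped: their relative change is noisy.
        let converged = next
            .iter()
            .zip(&alpha)
            .all(|(&n, &o)| n <= check_cutoff || (n - o).abs() / n < rel_tol);
        alpha = next;
        if converged {
            return (alpha, iter);
        }
    }
    (alpha, max_iter)
}

/// Hypothetical toy EM map: classes {0} x10, {1} x5, {0,1} x15, unit weights.
fn toy_step(a: &[f64]) -> Vec<f64> {
    let s = a[0] + a[1];
    vec![10.0 + 15.0 * a[0] / s, 5.0 + 15.0 * a[1] / s]
}

fn main() {
    let (alpha, iters) = iterate_to_fixpoint(vec![1.0, 1.0], toy_step, 1e-10, 1e-8, 1000);
    // The toy map's fixpoint is [20, 10]; the error halves each iteration.
    println!("{alpha:?} after {iters} iterations");
}
```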
Re-exports
pub use uncertainty::ambiguity_counts;
pub use uncertainty::bootstrap;
pub use uncertainty::gibbs_sample;
pub use uncertainty::GibbsOptions;
Modules
- uncertainty: Posterior uncertainty via the multinomial bootstrap (CollapsedEMOptimizer::gatherBootstraps) and the non-collapsed Gibbs sampler (CollapsedGibbsSampler).
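A sketch of one multinomial bootstrap replicate over equivalence-class counts: redraw the same total number of fragments with replacement, each landing in class i with probability count_i / total. bootstrap_counts and the inline SplitMix64 generator are hypothetical stand-ins chosen so the sketch needs no external crates; the resampled counts would then be fed back through the EM.

```rust
/// Minimal SplitMix64 PRNG (stand-in for a real RNG crate).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Resample class counts: draw `total` fragments, each picking class i
/// with probability counts[i] / total.
fn bootstrap_counts(counts: &[u64], rng: &mut SplitMix64) -> Vec<u64> {
    let total: u64 = counts.iter().sum();
    // Running totals: class i owns fragment indices [cdf[i-1], cdf[i]).
    let mut cdf = Vec::with_capacity(counts.len());
    let mut acc = 0u64;
    for &c in counts {
        acc += c;
        cdf.push(acc);
    }
    let mut out = vec![0u64; counts.len()];
    for _ in 0..total {
        // Uniform fragment index; modulo bias is negligible for a sketch.
        let r = rng.next_u64() % total;
        // First class whose running total exceeds r (empty classes are skipped).
        let i = cdf.partition_point(|&c| c <= r);
        out[i] += 1;
    }
    out
}

fn main() {
    let counts = [5u64, 0, 12, 3];
    let mut rng = SplitMix64(42); // fixed seed for reproducibility
    for rep in 0..3 {
        let resampled = bootstrap_counts(&counts, &mut rng);
        // A full bootstrap would re-run the EM on `resampled` here.
        println!("replicate {rep}: {resampled:?}");
    }
}
```

Because each replicate is an independent EM run, replicates parallelize naturally, which matches the description of bootstrap parallelizing across replicates rather than within the M-step.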
Structs
- EmOptions: Optimizer configuration. Defaults mirror salmon’s command-line defaults.
- EmResult: Result of an optimization run.
- OnlineInference: Shared online-inference state, updated concurrently by the mapping workers.
- PackedEqClasses: Flat CSR equivalence classes (only valid groups are retained).
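The flat CSR layout can be sketched as one offsets array indexing into concatenated label and weight arrays, with invalid groups dropped while packing. PackedSketch, its field names, and the valid flag on the hypothetical input type are assumptions that only illustrate the layout PackedEqClasses describes.

```rust
/// Hypothetical unpacked input class.
struct EqClass {
    labels: Vec<u32>,
    weights: Vec<f64>,
    count: u64,
    valid: bool,
}

/// Hypothetical CSR layout: class i owns labels[offsets[i]..offsets[i + 1]]
/// and the matching slice of weights.
struct PackedSketch {
    offsets: Vec<usize>,
    labels: Vec<u32>,
    weights: Vec<f64>,
    counts: Vec<u64>,
}

impl PackedSketch {
    fn from_classes(classes: &[EqClass]) -> Self {
        let mut p = PackedSketch { offsets: vec![0], labels: vec![], weights: vec![], counts: vec![] };
        // Only valid classes are packed.
        for ec in classes.iter().filter(|ec| ec.valid) {
            p.labels.extend_from_slice(&ec.labels);
            p.weights.extend_from_slice(&ec.weights);
            p.counts.push(ec.count);
            p.offsets.push(p.labels.len());
        }
        p
    }

    fn num_classes(&self) -> usize {
        self.counts.len()
    }

    /// Borrow class i's labels, weights, and count without copying.
    fn class(&self, i: usize) -> (&[u32], &[f64], u64) {
        let r = self.offsets[i]..self.offsets[i + 1];
        (&self.labels[r.clone()], &self.weights[r], self.counts[i])
    }
}

fn main() {
    let classes = vec![
        EqClass { labels: vec![0, 2], weights: vec![0.5, 1.0], count: 7, valid: true },
        EqClass { labels: vec![1], weights: vec![1.0], count: 3, valid: false },
        EqClass { labels: vec![2], weights: vec![0.9], count: 4, valid: true },
    ];
    let packed = PackedSketch::from_classes(&classes);
    for i in 0..packed.num_classes() {
        println!("class {i}: {:?}", packed.class(i));
    }
}
```

Keeping every class in three contiguous arrays avoids one heap allocation per class and keeps the EM inner loop cache-friendly.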
Functions
- optimize: Run the optimizer to convergence (parallel EM/VBEM over the packed layout).
- optimize_packed: Core convergence loop over a PackedEqClasses. The parallel argument selects the rayon M-step (for the single main run) or the sequential one (used by bootstrap, which parallelizes across replicates instead). The per-class counts are the packed structure’s own (bootstrap passes resampled counts through run_em_counts).
- optimize_packed_with_init: As optimize_packed, but seeds the abundances from init_alphas when supplied (a warm start, e.g. salmon’s initialization that blends online estimates with uniform); otherwise starts uniform.
- optimize_with_init: As optimize, but warm-starts the abundances from init_alphas (indexed by transcript id) when its length matches num_txps. This seeds the EM with salmon’s count-blended initialization (online estimates blended with uniform), which reduces the number of iterations to convergence.
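A sketch of the warm-start rule: use the supplied per-transcript estimates only when their length matches num_txps, blending them with a uniform split of the total fragment count; otherwise start uniform. initial_alphas and its mix parameter are hypothetical; the exact blend salmon applies is not reproduced here.

```rust
/// Hypothetical warm-start helper. `mix` weights the supplied estimate
/// against a uniform split of `total_frags` across `num_txps` transcripts.
fn initial_alphas(num_txps: usize, total_frags: f64, init: Option<&[f64]>, mix: f64) -> Vec<f64> {
    let uniform = total_frags / num_txps as f64;
    match init {
        // Warm start only when the estimate covers every transcript.
        Some(a) if a.len() == num_txps => {
            a.iter().map(|&x| mix * x + (1.0 - mix) * uniform).collect()
        }
        // Missing or mis-sized estimate: fall back to uniform.
        _ => vec![uniform; num_txps],
    }
}

fn main() {
    let online = [40.0, 30.0, 20.0, 10.0];
    println!("{:?}", initial_alphas(4, 100.0, Some(&online), 0.5));
    println!("{:?}", initial_alphas(4, 100.0, Some(&online[..2]), 0.5));
}
```

Blending with uniform keeps every transcript's starting abundance positive, so the EM can still move mass to transcripts the online pass underestimated.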