Expand description
mixt provides implementations of several optimization algorithms that
infer the K mixture model weights for an N x K log-likelihood matrix.
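As a sketch of the problem being solved, the classical EM update for mixture weights can be written in a few lines of plain Rust. This is illustrative only; it is not the crate's em or rcg implementation, which add priors, observation weights, and burn tensor backends on top of this idea:

```rust
/// Minimal EM update for mixture weights `pi` given an N x K
/// log-likelihood matrix. Illustrative sketch, not the mixt implementation.
fn em_weights(logl: &[Vec<f64>], iters: usize) -> Vec<f64> {
    let n = logl.len();
    let k = logl[0].len();
    let mut pi = vec![1.0 / k as f64; k]; // start from uniform weights
    for _ in 0..iters {
        let mut new_pi = vec![0.0; k];
        for row in logl {
            // responsibilities via log-sum-exp for numerical stability
            let terms: Vec<f64> = row.iter().zip(&pi).map(|(l, p)| l + p.ln()).collect();
            let m = terms.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
            let lse = m + terms.iter().map(|t| (t - m).exp()).sum::<f64>().ln();
            for (j, t) in terms.iter().enumerate() {
                new_pi[j] += (t - lse).exp();
            }
        }
        for p in new_pi.iter_mut() {
            *p /= n as f64;
        }
        pi = new_pi;
    }
    pi
}
```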
§Features
- GPU support
- Rust library
- C++ bindings
- Minimal CLI
§Installation
By default, mixt is available as a Rust library with support for the Wgpu and NdArray backends for burn. Several other options are available.
§Command-line client
A minimal CLI for testing purposes can be compiled with --features cli.
§The burn backend
The Wgpu backend can be switched to the alternative vulkan or webgpu
implementations by adding --features vulkan or --features webgpu.
More burn backends can be implemented if needed: open a request in the source repository, or implement one yourself by extending BurnBackend and optimize_flat and adding the appropriate feature to Cargo.toml. The rest of the code is designed in a backend-agnostic manner.
§C++ bindings
mixt provides bindings for running inference with 32-bit floating point
inputs. These can be compiled by adding --features cxxbridge.
The C++ bindings create the libmixt.a, mixt_cxx.cpp, and
mixt_cxx.h files that can be included in a C++ project to call the
mixt C++ API.
A CMake file is provided to configure the flags passed to cargo when building the bindings.
§API
The library provides several high-level functions to run on 2D vector inputs, flattened vectors, or tensor data.
Low-level functions are available in the optimizer module.
Returns the mixing proportions that best fit the model corresponding to
log_likelihood with integer weights for each column given in counts.
Typically, counts is the number of times the likelihood vector in each
column was observed, but it can be any weight vector.
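To illustrate why integer counts are equivalent to repeating observations, consider the (unnormalized) weighted log-likelihood of the mixing proportions: a count of c for a row contributes the same term as c identical rows. The sketch below is illustrative, not the crate's code:

```rust
/// Weighted log-likelihood of mixing proportions `pi`:
/// sum_i counts[i] * ln(sum_k pi[k] * exp(logl[i][k])).
/// Illustrative sketch; a count of c_i is equivalent to c_i repeated rows.
fn weighted_loglik(logl: &[Vec<f64>], counts: &[f64], pi: &[f64]) -> f64 {
    logl.iter()
        .zip(counts)
        .map(|(row, c)| {
            let mix: f64 = row.iter().zip(pi).map(|(l, p)| p * l.exp()).sum();
            c * mix.ln()
        })
        .sum()
}
```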
§Inputs and outputs
See the functions optimize, optimize_flat, and optimize_tensor, in order of suggested preference.
§C++ API
The C++ API provides four functions to perform inference:
- rcg_optl_cpu: run rcg with the NdArray backend.
- rcg_optl_gpu: run rcg with the Wgpu backend.
- em_optl_cpu: run em with the NdArray backend.
- em_optl_gpu: run em with the Wgpu backend.
An additional convenience function mixture_components is provided to
convert the inference results to mixing proportions.
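The exact computation performed by mixture_components is not spelled out above. As one plausible sketch (an assumption for illustration, not the crate's code), mixing proportions can be recovered as the count-weighted average of each cluster's probabilities:

```rust
/// Hypothetical sketch of converting per-observation cluster probabilities
/// into mixing proportions: count-weighted column means, normalized to one.
/// Not the mixt `mixture_components` implementation.
fn mixing_proportions(probs: &[Vec<f64>], counts: &[f64]) -> Vec<f64> {
    let k = probs[0].len();
    let total: f64 = counts.iter().sum();
    let mut out = vec![0.0; k];
    for (row, c) in probs.iter().zip(counts) {
        for (j, p) in row.iter().enumerate() {
            out[j] += c * p; // accumulate weighted probability mass per cluster
        }
    }
    for v in out.iter_mut() {
        *v /= total;
    }
    out
}
```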
§Inputs
The C++ API main functions all take the following inputs:
- logl: flattened column-major n_cols x n_rows log-likelihood matrix.
- log_times_observed: n_rows vector of the natural logarithm of the weights for the logl rows.
- alpha0: n_cols vector of prior counts for the Dirichlet model.
- tol: optimizer tolerance for convergence checking.
- max_iters: maximum number of iterations to run the optimizer for.
The first 3 arguments expect a std::vector<float>; the tolerance is given
as a double and the maximum number of iterations as a size_t.
§Outputs
All four functions return a single Rust vector containing the flattened
n_cols x n_rows column-major matrix of inferred probabilities that
row i was generated from cluster j.
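Assuming the column-major layout described above (n_cols rows by n_rows columns), element (cluster j, observation i) sits at flat index i * n_cols + j. A hypothetical accessor, not part of the mixt API, makes the indexing explicit:

```rust
/// Hypothetical accessor for the flattened column-major n_cols x n_rows
/// output: probability that observation `i` came from cluster `j`.
/// Indexing assumption based on the layout described above.
fn prob(probs: &[f32], n_cols: usize, i: usize, j: usize) -> f32 {
    probs[i * n_cols + j]
}
```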
The output can be converted to a std::vector by using, for example, the following code:
std::vector<float> probs_cpp;
auto probs_rs = mixt::rcg_optl_gpu(loglls, log_counts, alpha0, (double)0.00001, (size_t)1000);
probs_cpp.reserve((uint64_t)n_groups * (uint64_t)n_obs);
for (auto &val : probs_rs) {
    probs_cpp.push_back(val);
}
§Using the optimizers
The high-level API can be customized with several options and prior counts, detailed below.
§Options
Use OptimizerOpts to change the following:
- Tolerance for convergence checking via opts.tolerance.
- Maximum number of iterations via opts.max_iters.
- Run on CPU or GPU using opts.device (see BurnBackend for details).
- Set floating point precision to 32 or 64 bits via opts.device.
See OptimizerOpts for more details.
§Prior
The prior for the Dirichlet model mixing proportions is given via prior.
Values in prior can be interpreted as the observation counts from each
category that were observed before generating the log-likelihood matrix
logl for the current data.
mixt assumes a conjugate Dirichlet model, meaning that the mixing proportions from a previously fitted model, weighted by the total observation count, can be used as a prior when estimating a new dataset.
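As a sketch of this interpretation (hypothetical helpers, not the mixt API): scaling a previous fit's proportions by its total observation count yields Dirichlet pseudo-counts, and under conjugacy the posterior-mean proportions combine those pseudo-counts with the new expected counts:

```rust
/// Hypothetical helper: Dirichlet pseudo-counts from a previously fitted
/// model, i.e. its proportions scaled by its total observation count.
fn prior_from_previous_fit(prev_proportions: &[f64], prev_total: f64) -> Vec<f64> {
    prev_proportions.iter().map(|p| p * prev_total).collect()
}

/// Posterior-mean mixing proportions under a conjugate Dirichlet prior:
/// pi_k = (n_k + alpha_k) / (N + sum(alpha)). Illustrative sketch only.
fn posterior_mean(counts: &[f64], alpha: &[f64]) -> Vec<f64> {
    let total: f64 = counts.iter().sum::<f64>() + alpha.iter().sum::<f64>();
    counts.iter().zip(alpha).map(|(n, a)| (n + a) / total).collect()
}
```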
§Reading
The mixt variational inference algorithm rcg was originally a part of the mSWEEP software described in:
- Mäklin et al. (2020) “High-resolution sweep metagenomics using fast probabilistic inference”, Wellcome open research. doi: 10.12688/wellcomeopenres.15639.2.
- Mäklin (2022) “Probabilistic methods for high-resolution metagenomics” chapter 2.3.7, Series of publications A / Department of Computer Science, University of Helsinki. ISBN: 978-951-51-8695-9.
The expectation-maximization algorithm em and the original rcg GPU implementations are described in:
- Pietiläinen (2025) “Accelerating mixture model inference for bacterial community estimation using GPU computing”, University of Helsinki. urn: hulib-202501301212.
Modules§
- optimizer
Structs§
- OptimizerOpts - Options for optimizer algorithms.
Enums§
- BurnBackend - Backend type for burn.
Functions§
- optimize - Run optimizer on 2D f32 log-likelihoods and integer weights.
- optimize_flat - Run on flattened f32 vector inputs.
- optimize_tensor - Run on Tensor inputs.
- run_optimizer - Helper function to run on a generic backend.