Crate mixt

mixt provides implementations of several optimization algorithms that infer the K mixture model weights from an N x K log-likelihood matrix.

§Features

  • GPU support
  • Rust library
  • C++ bindings
  • Minimal CLI

§Installation

By default, mixt is available as a Rust library with support for the Wgpu and NdArray backends for burn. Several other options are available.

§Command-line client

A minimal CLI for testing purposes can be compiled with --features cli.

§The burn backend

The Wgpu backend can be changed to use the alternative vulkan or webgpu implementations by adding --features vulkan or --features webgpu.

More backends from burn can be added if needed: open a request in the source repository, or implement one yourself by extending BurnBackend and optimize_flat and adding the appropriate feature to Cargo.toml. The rest of the code is written in a backend-agnostic manner.

§C++ bindings

mixt provides bindings for running inference with 32-bit floating point inputs. These can be compiled by adding --features cxxbridge.

Building the C++ bindings produces the libmixt.a, mixt_cxx.cpp, and mixt_cxx.h files, which can be included in a C++ project to call the mixt C++ API.

A CMake file is provided to configure the flags passed to cargo when building the bindings.

§API

The library provides several high-level functions to run on 2D vector inputs, flattened vectors, or tensor data.

Low-level functions are available in the optimizer module.

These functions return the mixing proportions that best fit the model corresponding to the log-likelihood matrix log_likelihood, with integer weights for each column given in counts. Typically, counts holds the number of times the likelihood vector in each column was observed, but it can be any weight vector.
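To make the model concrete, the following self-contained sketch (independent of mixt; all names are illustrative) performs a single EM-style update of the mixing proportions on a small log-likelihood matrix. The crate's optimizers implement refined versions of this idea, with convergence checking and Dirichlet prior counts.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One EM-style update of the mixing proportions `weights` given an
// N x K matrix of per-observation log-likelihoods. Responsibilities
// r[k] ∝ weights[k] * exp(logl[i][k]) are normalized per observation,
// and the new weights are the mean responsibility per component.
std::vector<double> em_update(const std::vector<std::vector<double>>& logl,
                              const std::vector<double>& weights) {
    std::size_t n = logl.size(), k = weights.size();
    std::vector<double> next(k, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<double> r(k);
        double norm = 0.0;
        for (std::size_t j = 0; j < k; ++j) {
            r[j] = weights[j] * std::exp(logl[i][j]); // unnormalized responsibility
            norm += r[j];
        }
        for (std::size_t j = 0; j < k; ++j) {
            next[j] += r[j] / norm / (double)n;
        }
    }
    return next; // sums to 1 by construction
}
```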

§Inputs and outputs

See one of the following functions, in order of suggested preference: optimize, optimize_flat, optimize_tensor.

§C++ API

The C++ API provides four functions to perform inference:

  • rcg_optl_cpu: run rcg with the NdArray backend.
  • rcg_optl_gpu: run rcg with the Wgpu backend.
  • em_optl_cpu: run em with the NdArray backend.
  • em_optl_gpu: run em with the Wgpu backend.

An additional convenience function mixture_components is provided to convert the inference results to mixing proportions.

§Inputs

All four main functions of the C++ API take the following inputs:

  • logl: flattened column-major n_cols x n_rows log-likelihood matrix.
  • log_times_observed: n_rows vector of the natural logarithm of the weights for the rows of logl.
  • alpha0: n_cols vector of prior counts for the Dirichlet model.
  • tol: optimizer tolerance for convergence checking.
  • max_iters: maximum number of iterations to run the optimizer for.

The first three arguments expect a std::vector<float>; the tolerance is given as a double and the maximum number of iterations as a size_t.
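Assuming the caller's data starts out as a row-major matrix of observations by components, the inputs above can be prepared with helpers like these (the helper names are illustrative, not part of the mixt API):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Flatten a row-major n_rows x n_cols matrix into the column-major
// layout expected by the `logl` argument: element (i, j) lands at
// index j * n_rows + i.
std::vector<float> to_column_major(const std::vector<std::vector<float>>& m) {
    std::size_t n_rows = m.size(), n_cols = m[0].size();
    std::vector<float> flat(n_rows * n_cols);
    for (std::size_t j = 0; j < n_cols; ++j)
        for (std::size_t i = 0; i < n_rows; ++i)
            flat[j * n_rows + i] = m[i][j];
    return flat;
}

// `log_times_observed` is the natural logarithm of the observation counts.
std::vector<float> log_counts(const std::vector<std::size_t>& counts) {
    std::vector<float> out(counts.size());
    for (std::size_t i = 0; i < counts.size(); ++i)
        out[i] = std::log((float)counts[i]);
    return out;
}
```

The resulting vectors, together with alpha0, tol, and max_iters, would then be passed to one of the four functions, e.g. rcg_optl_cpu.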

§Outputs

All four functions return a single Rust vector containing the flattened n_cols x n_rows column-major matrix of inferred probabilities that row i was generated from cluster j.

The output can be converted to a std::vector using, for example, the following code:

auto probs_rs = mixt::rcg_optl_gpu(loglls, log_counts, alpha0, (double)0.00001, (size_t)1000);
// Copy the returned Rust vector into a std::vector<float>.
std::vector<float> probs_cpp;
probs_cpp.reserve((size_t)n_groups * (size_t)n_obs);
for (auto &val : probs_rs) {
    probs_cpp.push_back(val);
}

§Using the optimizers

The high-level API can be customized with several options and prior counts, detailed below.

§Options

Use OptimizerOpts to change the following:

  • Tolerance for convergence checking via opts.tolerance.
  • Maximum number of iterations via opts.max_iters.
  • Run on CPU or GPU and set the floating point precision to 32 or 64 bits via opts.device (see BurnBackend for details).

See OptimizerOpts for more details.

§Prior

The prior for the mixing proportions of the Dirichlet model is given via prior. The values in prior can be interpreted as the counts of observations from each category that were seen before generating the log-likelihood matrix logl for the current data.

The implementation assumes a conjugate Dirichlet model, meaning that the mixing proportions from a previously fitted model (weighted by the total observation count) can be used as a prior when estimating a new dataset.
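Under this assumption, prior counts for a new run can be derived from a previous fit by scaling its mixing proportions with the total observation count. A minimal sketch (the helper name is illustrative):

```cpp
#include <cstddef>
#include <vector>

// Turn mixing proportions from a previously fitted model into Dirichlet
// prior counts by weighting them with the total observation count.
std::vector<float> proportions_to_prior(const std::vector<float>& props,
                                        float total_count) {
    std::vector<float> prior(props.size());
    for (std::size_t i = 0; i < props.size(); ++i)
        prior[i] = props[i] * total_count;
    return prior;
}
```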

§Reading

The mixt variational inference algorithm rcg was originally a part of the mSWEEP software described in:

  • Mäklin et al. (2020) “High-resolution sweep metagenomics using fast probabilistic inference”, Wellcome open research. doi: 10.12688/wellcomeopenres.15639.2.
  • Mäklin (2022) “Probabilistic methods for high-resolution metagenomics” chapter 2.3.7, Series of publications A / Department of Computer Science, University of Helsinki. ISBN: 978-951-51-8695-9.

The expectation-maximization algorithm em and the original rcg GPU implementations are described in:

  • Pietiläinen (2025) “Accelerating mixture model inference for bacterial community estimation using GPU computing”, University of Helsinki. urn: hulib-202501301212.

Modules§

  • math: Tensor math used in the optimizer algorithms.
  • optimizer: Algorithm implementations.

Structs§

  • OptimizerOpts: Options for the optimizer algorithms.

Enums§

  • BurnBackend: Backend type for burn.

Functions§

  • optimize: Run an optimizer on 2D f32 log-likelihoods and integer weights.
  • optimize_flat: Run an optimizer on flattened f32 vector inputs.
  • optimize_tensor: Run an optimizer on Tensor inputs.
  • run_optimizer: Helper function for running on a generic backend.