flowmatch
Flow matching.
Problem
You have a set of target points -- protein backbone angles, earthquake epicenters, token embeddings -- and want to train a vector field that transforms Gaussian noise into samples from the same distribution. Flow matching [1] does this by regressing a conditional vector field along straight (or geodesic) interpolation paths, then sampling via ODE integration.
This library provides the training loop, OT-based coupling, ODE integration, and evaluation metrics. It works on flat spaces and on Riemannian manifolds.
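The regression target on a straight (Euclidean) path can be sketched in a few lines. This is an illustration only: `interpolate`, `target_velocity`, and `fm_loss` are hypothetical names, not this crate's API, and the geodesic case is not covered.

```rust
/// Point on the straight path from noise `x0` to target `x1` at time `t` in [0, 1].
fn interpolate(x0: &[f64], x1: &[f64], t: f64) -> Vec<f64> {
    x0.iter().zip(x1).map(|(a, b)| (1.0 - t) * a + t * b).collect()
}

/// Conditional velocity of that path: d/dt x_t = x1 - x0 (constant in t).
fn target_velocity(x0: &[f64], x1: &[f64]) -> Vec<f64> {
    x0.iter().zip(x1).map(|(a, b)| b - a).collect()
}

/// Mean squared error between a model field `v(x, t)` and the conditional
/// velocity, evaluated at one (x0, x1, t) triple.
fn fm_loss<F>(model: F, x0: &[f64], x1: &[f64], t: f64) -> f64
where
    F: Fn(&[f64], f64) -> Vec<f64>,
{
    let xt = interpolate(x0, x1, t);
    let u = target_velocity(x0, x1);
    let v = model(&xt, t);
    v.iter().zip(&u).map(|(p, q)| (p - q).powi(2)).sum::<f64>() / u.len() as f64
}
```

Training averages this loss over sampled noise, targets, and times; a model that outputs exactly `x1 - x0` along the path has zero loss.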
Examples
Transport noise to discrete targets (simplest case). Semidiscrete FM pairs Gaussian noise with fixed target points via optimal transport, trains a linear conditional field, and integrates an ODE to produce samples:
n=16 d=8
pot_cfg: steps=2000 batch=1024 seed=7
fm_cfg: steps=800 batch=256 lr=0.008 seed=9 euler_steps=40
sample_mse_to_assigned_y = 0.0367
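For orientation, the sampling and scoring steps behind a number like this can be sketched as fixed-step Euler integration followed by a mean squared error against the assigned target. The function names below are hypothetical and the field is passed as a plain closure; the crate's own integrator and metric may differ.

```rust
/// Integrate dx/dt = field(x, t) from t = 0 to t = 1 with `steps` Euler steps.
fn euler_integrate<F>(field: F, x0: &[f64], steps: usize) -> Vec<f64>
where
    F: Fn(&[f64], f64) -> Vec<f64>,
{
    let dt = 1.0 / steps as f64;
    let mut x = x0.to_vec();
    for k in 0..steps {
        let t = k as f64 * dt;
        let v = field(&x, t);
        // Explicit Euler update: x <- x + dt * v(x, t).
        for (xi, vi) in x.iter_mut().zip(&v) {
            *xi += dt * vi;
        }
    }
    x
}

/// Mean squared error between a generated sample and its assigned target.
fn mse(sample: &[f64], target: &[f64]) -> f64 {
    let sq: f64 = sample.iter().zip(target).map(|(s, y)| (s - y).powi(2)).sum();
    sq / sample.len() as f64
}
```

With a perfectly learned straight-line field the sample lands on its target and the error is zero up to rounding; a nonzero value reflects both model error and discretization error.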
Straighter trajectories via minibatch OT. Rectified flow matching [7] uses Sinkhorn coupling within each minibatch so that noise-to-data paths cross less, reducing integration error:
sample_mse_to_assigned_y = 0.0684
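A minimal sketch of Sinkhorn coupling inside one minibatch, assuming uniform marginals and a squared Euclidean cost. All names are hypothetical, and the final argmax pairing is a simple deterministic stand-in for sampling pairs from the plan; the crate's own coupling code is not shown here.

```rust
/// Squared Euclidean cost between every noise point and every target point.
fn cost_matrix(xs: &[Vec<f64>], ys: &[Vec<f64>]) -> Vec<Vec<f64>> {
    xs.iter()
        .map(|x| {
            ys.iter()
                .map(|y| x.iter().zip(y).map(|(a, b)| (a - b).powi(2)).sum::<f64>())
                .collect()
        })
        .collect()
}

/// Entropic OT plan with uniform marginals via Sinkhorn scaling iterations.
/// Small `eps` relative to the costs can underflow; no log-domain
/// stabilization is attempted in this sketch.
fn sinkhorn(cost: &[Vec<f64>], eps: f64, iters: usize) -> Vec<Vec<f64>> {
    let n = cost.len();
    let m = cost[0].len();
    let k: Vec<Vec<f64>> = cost
        .iter()
        .map(|row| row.iter().map(|c| (-c / eps).exp()).collect())
        .collect();
    let (a, b) = (1.0 / n as f64, 1.0 / m as f64);
    let mut u = vec![1.0_f64; n];
    let mut v = vec![1.0_f64; m];
    for _ in 0..iters {
        for i in 0..n {
            let s: f64 = (0..m).map(|j| k[i][j] * v[j]).sum();
            u[i] = a / s;
        }
        for j in 0..m {
            let s: f64 = (0..n).map(|i| k[i][j] * u[i]).sum();
            v[j] = b / s;
        }
    }
    (0..n)
        .map(|i| (0..m).map(|j| u[i] * k[i][j] * v[j]).collect())
        .collect()
}

/// Pair each noise point with the target that carries the most plan mass.
fn pair_by_plan(plan: &[Vec<f64>]) -> Vec<usize> {
    plan.iter()
        .map(|row| {
            let mut best = 0;
            for (j, p) in row.iter().enumerate() {
                if *p > row[best] {
                    best = j;
                }
            }
            best
        })
        .collect()
}
```

Pairing nearby noise and data points within a batch is what keeps the interpolation paths from crossing as often as they do under independent pairing.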
Protein torsion angles on a torus. Backbone phi/psi angles live on S^1 × S^1. This example trains on real angles from PDB 1BPI (BPTI), then measures sample quality by the Jensen-Shannon (JS) divergence between generated and observed Ramachandran histograms:
PDB 1BPI φ/ψ (n=56) as a torus via R^4 embedding
Ramachandran histogram JS divergence (lower is better):
- baseline (Gaussian decode): 0.6391
- trained (RFM+minibatch OT): 0.4105
- ratio trained/baseline: 0.642
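The metric itself can be sketched as follows: bin the (phi, psi) pairs on a periodic grid, then compare the two histograms with the JS divergence. The function names and the bin layout are assumptions made for illustration. This sketch uses the natural logarithm, so values lie in [0, ln 2]; the log base and bin count used by the example above are not stated here.

```rust
/// 2D histogram of (phi, psi) angles in degrees on a `bins` x `bins` grid,
/// flattened row-major. Angles are wrapped into [-180, 180).
fn torus_histogram(angles_deg: &[(f64, f64)], bins: usize) -> Vec<f64> {
    let mut h = vec![0.0; bins * bins];
    let bin = |a: f64| -> usize {
        let frac = (a + 180.0).rem_euclid(360.0) / 360.0; // in [0, 1)
        ((frac * bins as f64) as usize).min(bins - 1)
    };
    for &(phi, psi) in angles_deg {
        h[bin(phi) * bins + bin(psi)] += 1.0;
    }
    h
}

/// Normalize nonnegative counts into a probability vector.
fn normalize(counts: &[f64]) -> Vec<f64> {
    let s: f64 = counts.iter().sum();
    counts.iter().map(|c| c / s).collect()
}

/// KL(p || q) in nats, skipping bins where p is zero.
fn kl(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .filter(|(pi, _)| **pi > 0.0)
        .map(|(pi, qi)| pi * (pi / qi).ln())
        .sum()
}

/// Jensen-Shannon divergence between two histograms given as raw counts.
fn js_divergence(p_counts: &[f64], q_counts: &[f64]) -> f64 {
    let p = normalize(p_counts);
    let q = normalize(q_counts);
    let m: Vec<f64> = p.iter().zip(&q).map(|(a, b)| 0.5 * (a + b)).collect();
    0.5 * kl(&p, &m) + 0.5 * kl(&q, &m)
}
```

Identical histograms score 0 and histograms with no shared bins score ln 2 ≈ 0.693 under this convention.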
Earthquake locations on a sphere. USGS M6+ earthquake epicenters (2024) mapped to S^2. Evaluation uses entropic OT cost between generated and observed locations:
USGS earthquakes (n=50), embedding=R^3 with S^2 projection
OT cost (lower is better):
- baseline (near-noise): 0.6496
- trained (RFM+minibatch OT): 0.3129
- ratio trained/baseline: 0.482
Some generated samples (lat, lon):
0: lat= 12.63°, lon= -104.96°
1: lat= 58.20°, lon= 169.16°
2: lat= -13.11°, lon= -167.62°
3: lat= -35.47°, lon= -79.28°
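The mapping between (lat, lon) and the R^3 embedding can be sketched as below. The axis convention (z toward the north pole, x through lon = 0) and the function names are assumptions for illustration; the example's own convention is not stated here.

```rust
/// Unit vector on S^2 for a latitude/longitude given in degrees.
fn latlon_to_unit(lat_deg: f64, lon_deg: f64) -> [f64; 3] {
    let (lat, lon) = (lat_deg.to_radians(), lon_deg.to_radians());
    [lat.cos() * lon.cos(), lat.cos() * lon.sin(), lat.sin()]
}

/// Project a nonzero point of R^3 back onto the unit sphere.
/// The zero vector has no projection and yields NaN components.
fn project_to_sphere(p: [f64; 3]) -> [f64; 3] {
    let norm = (p[0] * p[0] + p[1] * p[1] + p[2] * p[2]).sqrt();
    [p[0] / norm, p[1] / norm, p[2] / norm]
}

/// Latitude/longitude in degrees for a unit vector on S^2.
fn unit_to_latlon(p: [f64; 3]) -> (f64, f64) {
    let lat = p[2].clamp(-1.0, 1.0).asin().to_degrees();
    let lon = p[1].atan2(p[0]).to_degrees();
    (lat, lon)
}
```

Projecting after each integration step (or at the end) keeps samples on the sphere even when the learned field has a small radial component.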
Geodesics on the Poincare ball. Riemannian ODE integration on hyperbolic space, using the skel::Manifold trait implemented by hyperball.
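As background, geodesics on the Poincare ball (curvature -1) have a closed form via Mobius addition and the exponential map. The sketch below is standalone and does not use the skel::Manifold trait or hyperball, whose APIs are not reproduced here; all function names are hypothetical.

```rust
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(p, q)| p * q).sum()
}

/// Mobius addition x (+) y on the unit ball.
fn mobius_add(x: &[f64], y: &[f64]) -> Vec<f64> {
    let (xy, xx, yy) = (dot(x, y), dot(x, x), dot(y, y));
    let denom = 1.0 + 2.0 * xy + xx * yy;
    x.iter()
        .zip(y)
        .map(|(a, b)| ((1.0 + 2.0 * xy + yy) * a + (1.0 - xx) * b) / denom)
        .collect()
}

/// Exponential map at `x` applied to tangent vector `v`:
/// exp_x(v) = x (+) tanh(lambda_x * |v| / 2) * v / |v|, lambda_x = 2 / (1 - |x|^2).
fn exp_map(x: &[f64], v: &[f64]) -> Vec<f64> {
    let v_norm = dot(v, v).sqrt();
    if v_norm == 0.0 {
        return x.to_vec();
    }
    let lambda = 2.0 / (1.0 - dot(x, x));
    let scale = (0.5 * lambda * v_norm).tanh() / v_norm;
    let step: Vec<f64> = v.iter().map(|c| scale * c).collect();
    mobius_add(x, &step)
}

/// Point at time `t` on the geodesic that starts at `x` with velocity `v`.
fn geodesic_point(x: &[f64], v: &[f64], t: f64) -> Vec<f64> {
    let tv: Vec<f64> = v.iter().map(|c| t * c).collect();
    exp_map(x, &tv)
}
```

A Riemannian Euler step replaces `x + dt * v` with `exp_map(x, dt * v)`, which keeps iterates inside the ball.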
See examples/ for more, including discrete CTMC, two-moons, conditional 2D, Burn backend variants (--features burn), Riemannian Poincare disk (--features riemannian), and cluster/community evaluation (--features sheaf-evals).
What it provides
Training: Semidiscrete FM, rectified flow matching with minibatch OT coupling, time schedules (uniform, U-shaped, logit-normal).
Sampling: Fixed-step ODE integrators (Euler, Heun) for Euclidean and Riemannian manifolds.
Coupling: Sinkhorn OT pairing, greedy matching, partial/selective pairing for outlier handling.
Discrete FM: CTMC generator scaffolding with cosine-squared schedule [3], conditional probability paths, conditional rate matrices.
Evaluation: JS divergence on histograms, entropic OT cost.
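To illustrate the difference between the two fixed-step integrators, here is a minimal Euclidean sketch of a Heun (explicit trapezoidal) step. `heun_integrate` is a hypothetical name and the field is a plain closure; the crate's integrator signatures and its Riemannian variants are not shown.

```rust
/// Integrate dx/dt = field(x, t) from t = 0 to t = 1 with `steps` Heun steps.
/// Each step averages the slope at the start and at an Euler-predicted end
/// point, which gives second-order accuracy at two field evaluations per step.
fn heun_integrate<F>(field: F, x0: &[f64], steps: usize) -> Vec<f64>
where
    F: Fn(&[f64], f64) -> Vec<f64>,
{
    let dt = 1.0 / steps as f64;
    let mut x = x0.to_vec();
    for k in 0..steps {
        let t = k as f64 * dt;
        let k1 = field(&x, t);
        // Euler predictor for the end of the step.
        let pred: Vec<f64> = x.iter().zip(&k1).map(|(xi, vi)| xi + dt * vi).collect();
        let k2 = field(&pred, t + dt);
        // Corrector: advance with the average of the two slopes.
        for ((xi, a), b) in x.iter_mut().zip(&k1).zip(&k2) {
            *xi += 0.5 * dt * (a + b);
        }
    }
    x
}
```

For the same step count Heun costs twice as many field evaluations as Euler, so it pays off mainly when trajectories are curved enough that Euler's first-order error dominates.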
Status
MSRV: 1.80.
Tests
References
- [1] Lipman et al., Flow Matching for Generative Modeling (2022)
- [2] Lipman et al., Flow Matching Guide and Code (2024) -- comprehensive tutorial
- [3] Gat et al., Discrete Flow Matching (NeurIPS 2024) -- CTMC-based discrete FM
- [4] Chen & Lipman, Riemannian Flow Matching on General Geometries (2023)
- [5] de Kruiff et al., Pullback Flow Matching on Data Manifolds (2024) -- FM on implicit manifolds without closed-form exp/log maps
- [6] Sherry & Smets, Flow Matching on Lie Groups (2025) -- specialization to SO(3) and SE(3)
- [7] Liu et al., Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (2022) -- rectified flow
License
MIT OR Apache-2.0