Crate wass

Expand description

§wass

Optimal transport: move mass from one distribution to another at minimum cost.

§The Problem

Given two probability distributions (piles of sand), find the cheapest way to transform one into the other. The “cost” is how much mass moves times how far it moves.

§Key Functions

Function	Use Case	Complexity
`wasserstein_1d`	1D distributions	O(n log n)
`sinkhorn`	General transport (dense)	O(n² × iterations)
`sparse::solve_semidual_l2`	Sparse transport (L2)	O(n² × L-BFGS iter)
`earth_mover_distance`	Exact transport	O(n³)

§Quick Start

use wass::{wasserstein_1d, sinkhorn, sinkhorn_divergence};
use ndarray::array;

// 1D Wasserstein (fast, closed-form)
let a = [0.0, 0.25, 0.5, 0.25];
let b = [0.25, 0.5, 0.25, 0.0];
let w1 = wasserstein_1d(&a, &b);

// General transport with Sinkhorn
let cost = array![[0.0, 1.0], [1.0, 0.0]];
let a = array![0.5, 0.5];
let b = array![0.5, 0.5];
let (plan, distance) = sinkhorn(&a, &b, &cost, 0.1, 100);

// Sinkhorn Divergence (Robust Distribution Metric)
let div = sinkhorn_divergence(&a, &b, &cost, 0.1, 100);

§Why Optimal Transport?

Better distance metric: Unlike KL divergence, OT compares distributions with different supports (no divide-by-zero issues)
Geometry-aware: Respects the underlying metric space
Interpolation: Can create meaningful “in-between” distributions

§Applications in ML

Wasserstein GAN: More stable training via W1 critic
Domain adaptation: Align source/target feature distributions
Embedding comparison: Compare document/image embeddings as distributions
Fair ML: Measure/minimize distribution shift across groups

§Connections

rkhs: MMD vs Wasserstein—both compare distributions
logp: KL divergence vs optimal transport
fynch: Differentiable sorting/ranking (includes a Sinkhorn-based sorter)
sparse: Sparse OT with L2 regularization (interpretable alignments)

§What Can Go Wrong

Sinkhorn not converging: Decrease epsilon or increase iterations.
Numerical overflow: Large cost/small epsilon → exp overflow. Scale costs.
Marginal mismatch: Sinkhorn assumes both margins sum to 1. Normalize inputs.
Sliced approximation bias: Fewer projections = noisier estimate.
1D vs general: wasserstein_1d only works for 1D histograms on same bins.

§References

Kantorovich (1942). “On the Translocation of Masses”
Cuturi (2013). “Sinkhorn Distances: Lightspeed Computation of Optimal Transport”
Peyré & Cuturi (2019). “Computational Optimal Transport”
Memoli (2011). “Gromov-Wasserstein Distances and Metric Measure Spaces”
Blondel, Seguy, Rolet (2018). “Smooth and Sparse OT” (AISTATS)
Chizat et al. (2018). “Scaling Algorithms for Unbalanced OT Problems”
Sejourne et al. (2023). “Unbalanced OT Meets Sliced-Wasserstein”
Rabin et al. (2012). “Wasserstein Barycenter and Its Application to Texture Mixing”

Re-exports§

pub use flow::flow_drift;
pub use flow::VectorField;

Modules§

flow: Flow / drift primitives.
gromov: Entropic Gromov-Wasserstein optimal transport.
semidiscrete: Semidiscrete optimal transport (SD-OT) primitives.
sparse: Sparse optimal transport via L2 regularization.

Enums§

Error: Optimal Transport error variants.

Functions§

earth_mover_distance: Approximate Earth Mover’s Distance (EMD) via log-domain Sinkhorn.
euclidean_cost_matrix: Create Euclidean cost matrix from point positions.
sinkhorn: Sinkhorn algorithm for entropic-regularized optimal transport (matrix scaling).
sinkhorn_divergenceDeprecated: Sinkhorn Divergence (de-biased entropic OT).
sinkhorn_divergence_general: Correct Sinkhorn divergence for different supports.
sinkhorn_divergence_same_support: De-biased Sinkhorn divergence for distributions on the same support.
sinkhorn_log: Log-domain Sinkhorn algorithm for entropic optimal transport.
sinkhorn_log_with_convergence: Sinkhorn in log-space with a convergence check on marginal constraints.
sinkhorn_with_convergence: Sinkhorn with convergence check.
sliced_wasserstein: Sliced Wasserstein distance – a scalable approximation for high dimensions.
sq_euclidean_cost_matrix: Create squared Euclidean cost matrix from point positions.
unbalanced_sinkhorn_divergence_general: Unbalanced Sinkhorn divergence for different supports.
unbalanced_sinkhorn_divergence_same_support: Unbalanced Sinkhorn divergence for measures on the same support.
unbalanced_sinkhorn_log_with_convergence: Unbalanced Sinkhorn in log-space (KL-penalized marginals), scaling form.
wasserstein_1d: 1D Wasserstein distance (Earth Mover’s Distance) via CDF integration.

Type Aliases§

Result: Result type for Optimal Transport operations.

Crate wass

Crate wass Copy item path

§wass

§The Problem

§Key Functions

§Quick Start

§Why Optimal Transport?

§Applications in ML

§Connections

§What Can Go Wrong

§References

Re-exports§

Modules§

Enums§

Functions§

Type Aliases§

Crate wass