Skip to main content

Crate wass

Crate wass 

Source
Expand description

§wass

Optimal transport: move mass from one distribution to another at minimum cost.

§The Problem

Given two probability distributions (piles of sand), find the cheapest way to transform one into the other. The “cost” is how much mass moves times how far it moves.

§Key Functions

FunctionUse CaseComplexity
wasserstein_1d1D distributionsO(n log n)
sinkhornGeneral transport (dense)O(n² × iterations)
sparse::solve_semidual_l2Sparse transport (L2)O(n² × L-BFGS iter)
earth_mover_distanceExact transportO(n³)

§Quick Start

use wass::{wasserstein_1d, sinkhorn, sinkhorn_divergence};
use ndarray::array;

// 1D Wasserstein (fast, closed-form)
let a = [0.0, 0.25, 0.5, 0.25];
let b = [0.25, 0.5, 0.25, 0.0];
let w1 = wasserstein_1d(&a, &b);

// General transport with Sinkhorn
let cost = array![[0.0, 1.0], [1.0, 0.0]];
let a = array![0.5, 0.5];
let b = array![0.5, 0.5];
let (plan, distance) = sinkhorn(&a, &b, &cost, 0.1, 100);

// Sinkhorn Divergence (Robust Distribution Metric)
let div = sinkhorn_divergence(&a, &b, &cost, 0.1, 100);

§Why Optimal Transport?

  • Better distance metric: Unlike KL divergence, OT compares distributions with different supports (no divide-by-zero issues)
  • Geometry-aware: Respects the underlying metric space
  • Interpolation: Can create meaningful “in-between” distributions

§Applications in ML

  • Wasserstein GAN: More stable training via W1 critic
  • Domain adaptation: Align source/target feature distributions
  • Embedding comparison: Compare document/image embeddings as distributions
  • Fair ML: Measure/minimize distribution shift across groups

§Connections

  • rkhs: MMD vs Wasserstein—both compare distributions
  • logp: KL divergence vs optimal transport
  • fynch: Differentiable sorting/ranking (includes a Sinkhorn-based sorter)
  • sparse: Sparse OT with L2 regularization (interpretable alignments)

§What Can Go Wrong

  1. Sinkhorn not converging: Decrease epsilon or increase iterations.
  2. Numerical overflow: Large cost/small epsilon → exp overflow. Scale costs.
  3. Marginal mismatch: Sinkhorn assumes both margins sum to 1. Normalize inputs.
  4. Sliced approximation bias: Fewer projections = noisier estimate.
  5. 1D vs general: wasserstein_1d only works for 1D histograms on same bins.

§References

  • Kantorovich (1942). “On the Translocation of Masses”
  • Cuturi (2013). “Sinkhorn Distances: Lightspeed Computation of Optimal Transport”
  • Peyré & Cuturi (2019). “Computational Optimal Transport”
  • Memoli (2011). “Gromov-Wasserstein Distances and Metric Measure Spaces”
  • Blondel, Seguy, Rolet (2018). “Smooth and Sparse OT” (AISTATS)
  • Chizat et al. (2018). “Scaling Algorithms for Unbalanced OT Problems”
  • Sejourne et al. (2023). “Unbalanced OT Meets Sliced-Wasserstein”
  • Rabin et al. (2012). “Wasserstein Barycenter and Its Application to Texture Mixing”

Re-exports§

pub use flow::flow_drift;
pub use flow::VectorField;

Modules§

flow
Flow / drift primitives.
gromov
Entropic Gromov-Wasserstein optimal transport.
semidiscrete
Semidiscrete optimal transport (SD-OT) primitives.
sparse
Sparse optimal transport via L2 regularization.

Enums§

Error
Optimal Transport error variants.

Functions§

earth_mover_distance
Approximate Earth Mover’s Distance (EMD) via log-domain Sinkhorn.
euclidean_cost_matrix
Create Euclidean cost matrix from point positions.
sinkhorn
Sinkhorn algorithm for entropic-regularized optimal transport (matrix scaling).
sinkhorn_divergenceDeprecated
Sinkhorn Divergence (de-biased entropic OT).
sinkhorn_divergence_general
Correct Sinkhorn divergence for different supports.
sinkhorn_divergence_same_support
De-biased Sinkhorn divergence for distributions on the same support.
sinkhorn_log
Log-domain Sinkhorn algorithm for entropic optimal transport.
sinkhorn_log_with_convergence
Sinkhorn in log-space with a convergence check on marginal constraints.
sinkhorn_with_convergence
Sinkhorn with convergence check.
sliced_wasserstein
Sliced Wasserstein distance – a scalable approximation for high dimensions.
sq_euclidean_cost_matrix
Create squared Euclidean cost matrix from point positions.
unbalanced_sinkhorn_divergence_general
Unbalanced Sinkhorn divergence for different supports.
unbalanced_sinkhorn_divergence_same_support
Unbalanced Sinkhorn divergence for measures on the same support.
unbalanced_sinkhorn_log_with_convergence
Unbalanced Sinkhorn in log-space (KL-penalized marginals), scaling form.
wasserstein_1d
1D Wasserstein distance (Earth Mover’s Distance) via CDF integration.

Type Aliases§

Result
Result type for Optimal Transport operations.