Crate wass

Crate wass 

Source
Expand description

§wass

Optimal transport: move mass from one distribution to another at minimum cost.

§The Problem

Given two probability distributions (piles of sand), find the cheapest way to transform one into the other. The “cost” is how much mass moves times how far it moves.

§Key Functions

FunctionUse CaseComplexity
wasserstein_1d1D distributionsO(n)
sinkhornGeneral transport (dense)O(n² × iterations)
sparse::solve_semidual_l2Sparse transport (L2)O(n² × L-BFGS iter)
earth_mover_distanceExact transportO(n³)

§Quick Start

use wass::{wasserstein_1d, sinkhorn, sinkhorn_divergence_same_support};
use ndarray::array;

// 1D Wasserstein (fast, closed-form)
let a = [0.0, 0.25, 0.5, 0.25];
let b = [0.25, 0.5, 0.25, 0.0];
let w1 = wasserstein_1d(&a, &b);

// General transport with Sinkhorn
let cost = array![[0.0, 1.0], [1.0, 0.0]];
let a = array![0.5, 0.5];
let b = array![0.5, 0.5];
let (plan, distance) = sinkhorn(&a, &b, &cost, 0.1, 100);

// Sinkhorn Divergence (debiased; requires same-support cost matrix)
let div = sinkhorn_divergence_same_support(&a, &b, &cost, 0.1, 1000, 1e-4).unwrap();

§Why Optimal Transport?

  • Better distance metric: Unlike KL divergence, OT compares distributions with different supports (no divide-by-zero issues)
  • Geometry-aware: Respects the underlying metric space
  • Interpolation: Can create meaningful “in-between” distributions

§Applications in ML

  • Wasserstein GAN: More stable training via W1 critic
  • Domain adaptation: Align source/target feature distributions
  • Embedding comparison: Compare document/image embeddings as distributions
  • Fair ML: Measure/minimize distribution shift across groups

§Connections

  • rkhs: MMD vs Wasserstein—both compare distributions
  • logp: KL divergence vs optimal transport
  • fynch: Differentiable sorting/ranking (includes a Sinkhorn-based sorter)
  • sparse: Sparse OT with L2 regularization (interpretable alignments)

§What Can Go Wrong

  1. Sinkhorn not converging: Decrease epsilon or increase iterations.
  2. Numerical overflow: Large cost/small epsilon → exp overflow. Scale costs.
  3. Marginal mismatch: Sinkhorn assumes both margins sum to 1. Normalize inputs.
  4. Sliced approximation bias: Fewer projections = noisier estimate.
  5. 1D vs general: wasserstein_1d only works for 1D histograms on same bins.

§References

  • Kantorovich (1942). “On the Translocation of Masses”
  • Cuturi (2013). “Sinkhorn Distances: Lightspeed Computation of Optimal Transport”
  • Peyré & Cuturi (2019). “Computational Optimal Transport”

Re-exports§

pub use flow::flow_drift;
pub use flow::VectorField;

Modules§

flow
Flow / drift primitives.
gromov
Entropic Gromov-Wasserstein Optimal Transport.
sparse
Sparse optimal transport via L2 regularization.

Structs§

AndersonConfig
Anderson acceleration configuration for Sinkhorn fixed-point iterations.
SinkhornLogState
State for warm-starting balanced Sinkhorn iterations in log-space.

Enums§

Error
Optimal Transport error variants.

Functions§

earth_mover_distance
Earth mover’s distance with custom ground metric.
euclidean_cost_matrix
Create Euclidean cost matrix from point positions.
sinkhorn
Sinkhorn algorithm for entropic regularized optimal transport.
sinkhorn_divergenceDeprecated
Sinkhorn Divergence (De-biased Entropic OT).
sinkhorn_divergence_general
Correct Sinkhorn divergence for different supports.
sinkhorn_divergence_same_support
Correct Sinkhorn divergence when a and b live on the same support.
sinkhorn_log
Sinkhorn algorithm in log-space for numerical stability.
sinkhorn_log_progressive_with_convergence
Progressive-(\varepsilon) Sinkhorn (balanced OT), warm-starting each stage.
sinkhorn_log_with_convergence
Sinkhorn in log-space with a convergence check on marginal constraints.
sinkhorn_log_with_convergence_state
Sinkhorn in log-space with convergence checks, optionally warm-started.
sinkhorn_log_with_convergence_state_accel
Same as sinkhorn_log_with_convergence_state, but with optional Anderson acceleration.
sinkhorn_with_convergence
Sinkhorn with convergence check.
sliced_wasserstein
Sliced Wasserstein distance (fast approximation for high dimensions).
unbalanced_sinkhorn_divergence_general
Unbalanced Sinkhorn divergence for different supports.
unbalanced_sinkhorn_divergence_same_support
Unbalanced Sinkhorn divergence for measures on the same support.
unbalanced_sinkhorn_log_with_convergence
Unbalanced Sinkhorn in log-space (KL-penalized marginals), scaling form.
wasserstein_1d
1D Wasserstein distance (Earth Mover’s Distance).

Type Aliases§

Result
Result type for Optimal Transport operations.