# wass
Optimal transport primitives for geometry-aware distribution comparison. Implements the Sinkhorn algorithm for entropy-regularized OT, including unbalanced transport for robust partial matching.
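For reference, the standard entropy-regularized problem that Sinkhorn solves (notation as in the contract below: marginals $a$, $b$, cost $C$, regularization $\varepsilon$):

$$
\min_{P \in U(a, b)} \langle P, C \rangle - \varepsilon H(P),
\qquad
U(a, b) = \{ P \ge 0 : P \mathbf{1} = a,\ P^\top \mathbf{1} = b \},
$$

where $H(P) = -\sum_{ij} P_{ij} (\log P_{ij} - 1)$ is the entropy; larger $\varepsilon$ gives faster convergence but more diffuse transport plans.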
## Contract

- Invariants (must never change):
  - Numerics: All probability inputs are treated as `f32`.
  - Divergence: `sinkhorn_divergence_same_support` is the canonical debiased divergence; it guarantees non-negativity (modulo float noise) and symmetry.
  - Convergence: `*_with_convergence` methods return `Err` if the residual does not fall below `tol` within `max_iter`; they do not return a partial result silently.
- Support / Dependencies:
  - Input: Expects `ndarray::Array1` (masses) and `Array2` (costs).
  - Geometry: The cost matrix $C$ encodes the ground geometry.
  - Unbalanced: $\rho$ (rho) controls the penalty for mass destruction; $\rho \to \infty$ recovers balanced OT (a standard objective is sketched after this list).
- Exports:
  - `sinkhorn`: Standard dense OT.
  - `sinkhorn_divergence_same_support`: MMD-like geometric divergence.
  - `wasserstein_1d`: $O(n)$ exact solver for 1D.
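For the unbalanced case, a standard formulation (the crate's exact parameterization may differ) replaces the hard marginal constraints with KL penalties weighted by $\rho$:

$$
\min_{P \ge 0} \langle P, C \rangle - \varepsilon H(P)
+ \rho\, \mathrm{KL}(P \mathbf{1} \,\|\, a)
+ \rho\, \mathrm{KL}(P^\top \mathbf{1} \,\|\, b),
$$

so creating or destroying mass costs $\rho$ per unit (in KL), and $\rho \to \infty$ enforces the balanced constraints exactly.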
Dual-licensed under MIT or Apache-2.0.
```rust
// Argument values and exact call signatures below are illustrative
// reconstructions; consult the crate docs for the precise API.
use wass::{sinkhorn, sinkhorn_divergence_same_support, wasserstein_1d};
use ndarray::array;

// 1D Wasserstein (fast, closed-form)
let a = array![0.5, 0.5];
let b = array![0.25, 0.75];
let w1 = wasserstein_1d(&a, &b);

// General transport with Sinkhorn
let cost = array![[0.0, 1.0], [1.0, 0.0]];
let a = array![0.5, 0.5];
let b = array![0.25, 0.75];
let (plan, total_cost) = sinkhorn(&a, &b, &cost, 0.1, 100);

// Sinkhorn Divergence (debiased; requires same-support cost matrix)
let div = sinkhorn_divergence_same_support(&a, &b, &cost, 0.1, 100).unwrap();
```
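Per the contract above, the `*_with_convergence` variants surface non-convergence as an `Err` rather than returning a partial result. A hedged sketch of handling that (the exact signature and error type of `sinkhorn_with_convergence` are assumptions here, not checked against the crate):

```rust
// Assumed shape: sinkhorn_with_convergence(a, b, cost, epsilon, max_iter, tol)
// returning Result<(plan, cost), _> -- adjust to the crate's actual API.
match sinkhorn_with_convergence(&a, &b, &cost, 0.1, 500, 1e-6) {
    Ok((plan, total_cost)) => println!("converged: cost = {total_cost}"),
    Err(e) => eprintln!("residual did not reach tol within max_iter: {e:?}"),
}
```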
## Key Features
- Balanced OT: Standard Sinkhorn for probability distributions.
- Unbalanced OT: Robust transport for partial matches, outliers, and unnormalized measures (e.g. document alignment).
- Sparse OT: L2-regularized transport for interpretable, sparse alignments (via the `sparse` module).
- Log-domain stabilization: Numerically stable implementations for small epsilon / large costs (sketched below).
- Divergences: Proper debiased Sinkhorn divergences (positive, definite) for metric use.
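Log-domain stabilization avoids overflow/underflow in the Gibbs kernel $\exp(-C/\varepsilon)$ by carrying dual potentials and evaluating updates with log-sum-exp. A minimal sketch of the technique itself (not the crate's internal code; `logsumexp` and `log_sinkhorn_sweep` are local helpers written for this README):

```rust
use ndarray::{Array1, Array2};

/// Numerically stable log(sum(exp(v))).
fn logsumexp(v: &Array1<f32>) -> f32 {
    let m = v.fold(f32::NEG_INFINITY, |acc, &x| acc.max(x));
    if m == f32::NEG_INFINITY {
        return m; // empty or all -inf: exp-sum is 0
    }
    m + v.mapv(|x| (x - m).exp()).sum().ln()
}

/// One log-domain Sinkhorn sweep, updating dual potentials f, g in place.
/// Parameterization: P_ij = a_i * b_j * exp((f_i + g_j - C_ij) / eps).
fn log_sinkhorn_sweep(
    f: &mut Array1<f32>,
    g: &mut Array1<f32>,
    log_a: &Array1<f32>,
    log_b: &Array1<f32>,
    cost: &Array2<f32>,
    eps: f32,
) {
    let (n, m) = cost.dim();
    for i in 0..n {
        // f_i = -eps * LSE_j( log b_j + (g_j - C_ij) / eps )
        let row = Array1::from_iter((0..m).map(|j| log_b[j] + (g[j] - cost[(i, j)]) / eps));
        f[i] = -eps * logsumexp(&row);
    }
    for j in 0..m {
        let col = Array1::from_iter((0..n).map(|i| log_a[i] + (f[i] - cost[(i, j)]) / eps));
        g[j] = -eps * logsumexp(&col);
    }
}
```

Because the kernel is never materialized in linear space, this stays finite even when $C_{ij}/\varepsilon$ is in the hundreds, which is exactly the regime where naive Sinkhorn underflows to zero.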
## Examples
Run these to see OT in action:
- Robust Document Alignment: Shows how unbalanced OT aligns core topics while ignoring outliers (headers/footers/typos).
- Mass Mismatch: Shows how divergence scales with the unbalanced penalty parameter.
- Balanced Divergence: Demonstrates the debiased divergence on same-support distributions.
## Functions
| Function | Use Case | Complexity |
|---|---|---|
| `wasserstein_1d` | 1D distributions | O(n) |
| `sinkhorn` | General transport (dense) | O(n^2 x iter) |
| `sinkhorn_with_convergence` | With early stopping | O(n^2 x iter) |
| `sinkhorn_divergence_same_support` | Debiased divergence (same support) | O(n^2 x iter) |
| `sinkhorn_divergence_general` | Debiased divergence (different supports) | O(mn x iter) |
| `unbalanced_sinkhorn_divergence_general` | Robust comparison (different supports) | O(mn x iter) |
| `sparse::solve_semidual_l2` | Sparse transport (L2) | O(n^2 x iter) |
| `sliced_wasserstein` | High-dim approx | O(n_proj x n log n) |
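For intuition on the O(n) 1D case: on a shared grid, $W_1$ is the integral of the CDF gap, $W_1 = \int |F_a(x) - F_b(x)|\,dx$. A minimal sketch of that formula under the assumption of a uniform grid with spacing `dx` (this illustrates the math, not necessarily `wasserstein_1d`'s exact interface):

```rust
use ndarray::Array1;

/// W1 between two histograms on the same uniform grid:
/// accumulate |running CDF difference| times the cell width.
fn w1_same_grid(a: &Array1<f32>, b: &Array1<f32>, dx: f32) -> f32 {
    let (mut cdf_gap, mut total) = (0.0f32, 0.0f32);
    for (&pa, &pb) in a.iter().zip(b.iter()) {
        cdf_gap += pa - pb;          // F_a(x_i) - F_b(x_i)
        total += cdf_gap.abs() * dx; // |F_a - F_b| over this cell
    }
    total
}
```

A single pass over sorted support suffices, which is why the 1D solver needs no Sinkhorn iterations at all.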
Note: `sinkhorn_divergence` is deprecated; it only computes a true divergence when the cost is square. Use the explicit `*_same_support` / `*_general` variants instead.
## Why Optimal Transport?
- No support issues: Unlike KL divergence, OT compares distributions with disjoint supports.
- Geometry-aware: Respects the underlying metric space (e.g. word embedding distance).
- Robustness: Unbalanced OT handles outliers and noise ("pizza" vs "sushi") without breaking the alignment of the signal ("AI" vs "ML").
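As a one-line worked example of the disjoint-support point: for point masses $\mu = \delta_0$ and $\nu = \delta_1$,

$$
\mathrm{KL}(\mu \,\|\, \nu) = +\infty, \qquad W_1(\mu, \nu) = |0 - 1| = 1,
$$

so OT returns a finite, geometry-respecting distance exactly where KL blows up.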