wass 0.1.1

Optimal transport: Wasserstein distance, Sinkhorn algorithm, and Sinkhorn divergence

Optimal transport primitives for geometry-aware distribution comparison. Implements the Sinkhorn algorithm for entropy-regularized OT, including unbalanced transport for robust partial matching.

Contract

  • Invariants (must never change):

    • Numerics: All probability inputs are treated as f32.
    • Divergence: sinkhorn_divergence_same_support is the canonical de-biased divergence; it guarantees non-negativity (modulo float noise) and symmetry.
    • Convergence: *_with_convergence methods return Err if the residual does not fall below tol within max_iter; they do not return a partial result silently.
  • Support / Dependencies:

    • Input: Expects ndarray::Array1 (masses) and Array2 (costs).
    • Geometry: The cost matrix $C$ encodes the ground geometry.
    • Unbalanced: $\rho$ (rho) controls the penalty for mass destruction; $\rho \to \infty$ recovers balanced OT.
  • Exports:

    • sinkhorn: Standard dense OT.
    • sinkhorn_divergence_same_support: MMD-like geometric divergence.
    • wasserstein_1d: $O(n)$ exact solver for 1D.

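The contract above can be made concrete with one common formulation of the underlying objectives (background notation, not code from the crate). Balanced entropic OT solves, for a transport plan $P$:

```latex
% Entropy-regularized (balanced) OT between masses a, b with ground cost C:
\min_{P \ge 0,\; P\mathbf{1} = a,\; P^\top \mathbf{1} = b}
  \langle P, C \rangle + \varepsilon \, \mathrm{KL}\!\left(P \,\|\, a b^\top\right)

% Unbalanced OT relaxes the hard marginal constraints into KL penalties
% weighted by \rho, so mass may be created or destroyed at a price;
% as \rho \to \infty the constraints become exact and balanced OT is recovered:
\min_{P \ge 0}
  \langle P, C \rangle + \varepsilon \, \mathrm{KL}\!\left(P \,\|\, a b^\top\right)
  + \rho \, \mathrm{KL}\!\left(P\mathbf{1} \,\|\, a\right)
  + \rho \, \mathrm{KL}\!\left(P^\top \mathbf{1} \,\|\, b\right)
```

This is why the contract states that $\rho \to \infty$ recovers balanced OT: the penalty terms dominate and force the marginals of $P$ back onto $a$ and $b$.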
Dual-licensed under MIT or Apache-2.0.

crates.io | docs.rs

use wass::{wasserstein_1d, sinkhorn, sinkhorn_divergence_same_support};
use ndarray::array;

// 1D Wasserstein (fast, closed-form)
let a = [0.0, 0.25, 0.5, 0.25];
let b = [0.25, 0.5, 0.25, 0.0];
let w1 = wasserstein_1d(&a, &b);

// General transport with Sinkhorn
let cost = array![[0.0, 1.0], [1.0, 0.0]];
let a = array![0.5, 0.5];
let b = array![0.5, 0.5];
let (plan, distance) = sinkhorn(&a, &b, &cost, 0.1, 100);

// Sinkhorn Divergence (debiased; requires same-support cost matrix)
let div = sinkhorn_divergence_same_support(&a, &b, &cost, 0.1, 1000, 1e-4).unwrap();
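For intuition about what `sinkhorn` computes, the classic matrix-scaling iterations can be sketched in a few lines of plain Rust. This is a simplified, `Vec`-based illustration only, not the crate's implementation (which takes `ndarray` types and offers log-domain stabilization):

```rust
// Minimal balanced Sinkhorn sketch: iterate u = a / (K v), v = b / (K^T u)
// on the Gibbs kernel K = exp(-C / eps), then return P = diag(u) K diag(v).
fn sinkhorn_sketch(a: &[f64], b: &[f64], cost: &[Vec<f64>], eps: f64, iters: usize) -> Vec<Vec<f64>> {
    let (n, m) = (a.len(), b.len());
    // Gibbs kernel: small eps concentrates mass on low-cost entries.
    let k: Vec<Vec<f64>> = cost
        .iter()
        .map(|row| row.iter().map(|&c| (-c / eps).exp()).collect())
        .collect();
    let mut u = vec![1.0; n];
    let mut v = vec![1.0; m];
    for _ in 0..iters {
        for i in 0..n {
            let kv: f64 = (0..m).map(|j| k[i][j] * v[j]).sum();
            u[i] = a[i] / kv;
        }
        for j in 0..m {
            let ktu: f64 = (0..n).map(|i| k[i][j] * u[i]).sum();
            v[j] = b[j] / ktu;
        }
    }
    (0..n)
        .map(|i| (0..m).map(|j| u[i] * k[i][j] * v[j]).collect())
        .collect()
}

fn main() {
    let a = [0.5, 0.5];
    let b = [0.5, 0.5];
    let cost = vec![vec![0.0, 1.0], vec![1.0, 0.0]];
    let plan = sinkhorn_sketch(&a, &b, &cost, 0.1, 100);
    // Transport cost <P, C>: near zero here, since mass stays on the diagonal.
    let dist: f64 = plan
        .iter()
        .zip(&cost)
        .map(|(pr, cr)| pr.iter().zip(cr).map(|(p, c)| p * c).sum::<f64>())
        .sum();
    println!("distance = {dist:.6}");
}
```

Note that naive `exp(-C / eps)` underflows for small `eps` or large costs, which is exactly why the crate provides log-domain stabilized variants.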

Key Features

  • Balanced OT: Standard Sinkhorn for probability distributions.
  • Unbalanced OT: Robust transport for partial matches, outliers, and unnormalized measures (e.g. document alignment).
  • Sparse OT: L2-regularized transport for interpretable, sparse alignments (via sparse module).
  • Log-domain stabilization: Numerically stable implementations for small epsilon / large costs.
  • Divergences: Proper debiased Sinkhorn divergences (positive definite) for metric use.
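
The log-domain stabilization mentioned above is the standard log-sum-exp form of the Sinkhorn updates (a background sketch, not the crate's exact code): instead of multiplying scaling vectors $u, v$, one iterates dual potentials $f, g$:

```latex
f_i \leftarrow \varepsilon \log a_i
  - \varepsilon \log \sum_j \exp\!\big((g_j - C_{ij}) / \varepsilon\big),
\qquad
g_j \leftarrow \varepsilon \log b_j
  - \varepsilon \log \sum_i \exp\!\big((f_i - C_{ij}) / \varepsilon\big)
```

Subtracting the running maximum inside each log-sum-exp keeps every exponent bounded, so the iteration remains finite even when $\exp(-C_{ij}/\varepsilon)$ would underflow to zero in the naive kernel form.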

Examples

Run these to see OT in action:

  • Robust Document Alignment: Shows how unbalanced OT aligns core topics while ignoring outliers (headers/footers/typos).

    cargo run -p wass --example noisy_ocr_matching
    
  • Mass Mismatch: Shows how divergence scales with the unbalanced penalty parameter.

    cargo run -p wass --example unbalanced_sinkhorn_mass_mismatch
    
  • Balanced Divergence:

    cargo run -p wass --example sinkhorn_divergence_same_support
    

Functions

Function                                Use Case                                  Complexity
wasserstein_1d                          1D distributions                          O(n)
sinkhorn                                General transport (dense)                 O(n^2 · iter)
sinkhorn_with_convergence               With early stopping                       O(n^2 · iter)
sinkhorn_divergence_same_support        Debiased divergence (same support)        O(n^2 · iter)
sinkhorn_divergence_general             Debiased divergence (different supports)  O(mn · iter)
unbalanced_sinkhorn_divergence_general  Robust comparison (different supports)    O(mn · iter)
sparse::solve_semidual_l2               Sparse transport (L2)                     O(n^2 · iter)
sliced_wasserstein                      High-dim approximation                    O(n_proj · n log n)
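
The O(n) bound for the 1D case comes from a closed form: for two histograms on the same evenly spaced grid (unit spacing assumed here), $W_1$ is the L1 distance between their CDFs. A minimal self-contained sketch of that idea, not the crate's `wasserstein_1d` implementation:

```rust
// W1 between two histograms on a shared unit-spaced grid:
// a single pass accumulating |CDF_a - CDF_b| at each grid point.
fn wasserstein_1d_sketch(a: &[f64], b: &[f64]) -> f64 {
    let (mut cdf_a, mut cdf_b, mut total) = (0.0, 0.0, 0.0);
    for (&wa, &wb) in a.iter().zip(b) {
        cdf_a += wa;
        cdf_b += wb;
        total += (cdf_a - cdf_b).abs();
    }
    total
}

fn main() {
    // Same masses as the quickstart: b is a shifted by one bin,
    // so all mass moves distance 1 and W1 = 1.0.
    let a = [0.0, 0.25, 0.5, 0.25];
    let b = [0.25, 0.5, 0.25, 0.0];
    println!("{}", wasserstein_1d_sketch(&a, &b));
}
```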

Note: sinkhorn_divergence is deprecated; it only computes a true divergence when the cost matrix is square (i.e. both measures share the same support). Use the explicit *_same_support / *_general variants instead.

Why Optimal Transport?

  • No support issues: Unlike KL divergence, OT compares distributions with disjoint supports.
  • Geometry-aware: Respects the underlying metric space (e.g. word embedding distance).
  • Robustness: Unbalanced OT handles outliers and noise ("pizza" vs "sushi") without breaking the alignment of the signal ("AI" vs "ML").