# wass
Optimal transport primitives for geometry-aware distribution comparison.
Implements the Sinkhorn algorithm for entropy-regularized OT, including **unbalanced** transport for robust partial matching.
## Contract
- **Invariants (must never change)**:
  - **Numerics**: All probability inputs are treated as `f32`.
  - **Divergence**: `sinkhorn_divergence_same_support` is the canonical debiased divergence; it guarantees non-negativity (modulo float noise) and symmetry (see the formulas below).
  - **Convergence**: `*_with_convergence` methods return `Err` if the residual does not fall below `tol` within `max_iter`; they never silently return a partial result.
- **Support / Dependencies**:
  - **Input**: Expects `ndarray::Array1` (masses) and `Array2` (costs).
  - **Geometry**: The cost matrix $C$ encodes the ground geometry.
  - **Unbalanced**: $\rho$ (rho) controls the penalty for mass destruction; $\rho \to \infty$ recovers balanced OT (see the formulas after this list).
- **Exports**:
  - `sinkhorn`: Standard dense OT.
  - `sinkhorn_divergence_same_support`: MMD-like geometric divergence.
  - `wasserstein_1d`: $O(n)$ exact solver for 1D.
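For reference, here is what the contract above pins down, in standard entropic-OT notation (these restate textbook definitions, not crate internals): the entropic cost, its debiased divergence, and the $\rho$-relaxed unbalanced objective.

$$
\mathrm{OT}_\varepsilon(a, b) \;=\; \min_{P \ge 0,\; P\mathbf{1} = a,\; P^\top\mathbf{1} = b} \;\langle C, P \rangle + \varepsilon\,\mathrm{KL}\!\left(P \,\|\, a b^\top\right)
$$

$$
S_\varepsilon(a, b) \;=\; \mathrm{OT}_\varepsilon(a, b) - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(a, a) - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(b, b)
$$

$$
\mathrm{OT}_{\varepsilon,\rho}(a, b) \;=\; \min_{P \ge 0} \;\langle C, P \rangle + \varepsilon\,\mathrm{KL}\!\left(P \,\|\, a b^\top\right) + \rho\,\mathrm{KL}(P\mathbf{1} \,\|\, a) + \rho\,\mathrm{KL}(P^\top\mathbf{1} \,\|\, b)
$$

The debiasing terms make $S_\varepsilon(a, a) = 0$, which is what buys non-negativity and symmetry; as $\rho \to \infty$ the marginal penalties harden into equality constraints and the balanced problem is recovered.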
Dual-licensed under MIT or Apache-2.0.
## Quick Start
```rust
use ndarray::array;
use wass::{sinkhorn, sinkhorn_divergence_same_support, wasserstein_1d};

fn main() {
    // 1D Wasserstein (fast, closed form)
    let a = [0.0, 0.25, 0.5, 0.25];
    let b = [0.25, 0.5, 0.25, 0.0];
    let w1 = wasserstein_1d(&a, &b);
    println!("W1 = {w1}");

    // General transport with Sinkhorn (epsilon = 0.1, up to 100 iterations);
    // the masses shadow the slices above with `Array1` inputs.
    let cost = array![[0.0, 1.0], [1.0, 0.0]];
    let a = array![0.5, 0.5];
    let b = array![0.5, 0.5];
    let (plan, distance) = sinkhorn(&a, &b, &cost, 0.1, 100);
    println!("distance = {distance}, plan =\n{plan}");

    // Sinkhorn divergence (debiased; requires a same-support cost matrix).
    // Returns `Err` if the residual does not reach 1e-4 within 1000 iterations.
    let div = sinkhorn_divergence_same_support(&a, &b, &cost, 0.1, 1000, 1e-4).unwrap();
    println!("divergence = {div}");
}
```
## Key Features
- **Balanced OT**: Standard Sinkhorn for probability distributions.
- **Unbalanced OT**: Robust transport for partial matches, outliers, and unnormalized measures (e.g. document alignment).
- **Sparse OT**: L2-regularized transport for interpretable, sparse alignments (via `sparse` module).
- **Log-domain stabilization**: Numerically stable updates for small $\varepsilon$ / large costs; a sketch of the technique follows this list.
- **Divergences**: Properly debiased Sinkhorn divergences (positive definite) for metric use.
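To make the log-domain point concrete, here is a self-contained sketch of the standard log-sum-exp formulation of Sinkhorn. It illustrates the technique only; `log_sinkhorn` and `logsumexp` are hypothetical names, not the crate's internals.

```rust
use ndarray::{Array1, Array2};

// Numerically stable log(sum(exp(v))): subtract the max before exponentiating.
fn logsumexp(v: &Array1<f64>) -> f64 {
    let max = v.fold(f64::NEG_INFINITY, |acc, &x| acc.max(x));
    if !max.is_finite() {
        return max; // all entries -inf (zero mass) stay -inf
    }
    max + v.mapv(|x| (x - max).exp()).sum().ln()
}

// Log-domain Sinkhorn: iterate on dual potentials f, g instead of the
// kernel exp(-C/eps), so nothing under/overflows even for tiny eps.
fn log_sinkhorn(
    a: &Array1<f64>,
    b: &Array1<f64>,
    cost: &Array2<f64>,
    eps: f64,
    iters: usize,
) -> Array2<f64> {
    let (n, m) = cost.dim();
    let (log_a, log_b) = (a.mapv(f64::ln), b.mapv(f64::ln));
    let mut f = Array1::<f64>::zeros(n);
    let mut g = Array1::<f64>::zeros(m);
    for _ in 0..iters {
        for i in 0..n {
            // f_i <- eps * (log a_i - LSE_j((g_j - C_ij) / eps))
            let row = Array1::from_iter((0..m).map(|j| (g[j] - cost[[i, j]]) / eps));
            f[i] = eps * (log_a[i] - logsumexp(&row));
        }
        for j in 0..m {
            // g_j <- eps * (log b_j - LSE_i((f_i - C_ij) / eps))
            let col = Array1::from_iter((0..n).map(|i| (f[i] - cost[[i, j]]) / eps));
            g[j] = eps * (log_b[j] - logsumexp(&col));
        }
    }
    // Recover the plan from the potentials: P_ij = exp((f_i + g_j - C_ij) / eps).
    Array2::from_shape_fn((n, m), |(i, j)| ((f[i] + g[j] - cost[[i, j]]) / eps).exp())
}
```

Because every `exp` happens after subtracting the running maximum, the loop stays finite even when $C/\varepsilon$ has entries in the hundreds, exactly the regime where naive kernel-domain Sinkhorn underflows to zero.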
## Examples
Run these to see OT in action:
- **Robust Document Alignment**: Shows how unbalanced OT aligns core topics while ignoring outliers (headers/footers/typos).
```bash
cargo run -p wass --example noisy_ocr_matching
```
- **Mass Mismatch**: Shows how divergence scales with the unbalanced penalty parameter.
```bash
cargo run -p wass --example unbalanced_sinkhorn_mass_mismatch
```
- **Balanced Divergence**:
```bash
cargo run -p wass --example sinkhorn_divergence_same_support
```
## Functions
| Function | Use case | Complexity |
|---|---|---|
| `wasserstein_1d` | 1D distributions | $O(n)$ |
| `sinkhorn` | General transport (dense) | $O(n^2 \cdot \text{iter})$ |
| `sinkhorn_with_convergence` | With early stopping | $O(n^2 \cdot \text{iter})$ |
| `sinkhorn_divergence_same_support` | Debiased divergence (same support) | $O(n^2 \cdot \text{iter})$ |
| `sinkhorn_divergence_general` | Debiased divergence (different supports) | $O(mn \cdot \text{iter})$ |
| `unbalanced_sinkhorn_divergence_general` | Robust comparison (different supports) | $O(mn \cdot \text{iter})$ |
| `sparse::solve_semidual_l2` | Sparse transport (L2) | $O(n^2 \cdot \text{iter})$ |
| `sliced_wasserstein` | High-dimensional approximation | $O(n_{\text{proj}} \cdot n \log n)$ |
Note: `sinkhorn_divergence` is deprecated; it computes a true divergence only when the cost matrix is square (the same-support case). Use the explicit `*_same_support` / `*_general` variants instead.
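For intuition about the $O(n)$ 1D solver: on a shared, unit-spaced grid, $W_1$ reduces to the $L_1$ distance between the two CDFs. A minimal sketch of that closed form (hypothetical helper; the crate's `wasserstein_1d` may differ in signature and grid handling):

```rust
// W1 on a shared, unit-spaced grid: accumulate both CDFs in one pass and
// sum the absolute gap at each grid point. O(n), no cost matrix needed.
fn wasserstein_1d_grid(a: &[f64], b: &[f64]) -> f64 {
    let (mut cdf_a, mut cdf_b, mut total) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (&pa, &pb) in a.iter().zip(b) {
        cdf_a += pa;
        cdf_b += pb;
        total += (cdf_a - cdf_b).abs();
    }
    total
}
```

On the quickstart histograms `[0.0, 0.25, 0.5, 0.25]` vs `[0.25, 0.5, 0.25, 0.0]`, this yields 1.0 under the unit-spacing assumption: every unit of mass moves exactly one grid step.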
## Why Optimal Transport?
- **No support issues**: Unlike KL divergence, which is infinite on disjoint supports, OT yields a finite, meaningful distance between non-overlapping distributions.
- **Geometry-aware**: Respects the underlying metric space (e.g. word embedding distance).
- **Robustness**: Unbalanced OT handles outliers and noise ("pizza" vs "sushi") without breaking the alignment of the signal ("AI" vs "ML").