# wass

Optimal transport primitives for geometry-aware distribution comparison.
Implements the Sinkhorn algorithm for entropy-regularized OT, including **unbalanced** transport for robust partial matching.
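
Concretely, the entropy-regularized problem that Sinkhorn solves is, in its standard form,

$$\mathrm{OT}_\varepsilon(a, b) = \min_{P \in U(a, b)} \langle P, C \rangle + \varepsilon\, \mathrm{KL}(P \,\|\, a \otimes b),$$

where $U(a, b)$ is the set of coupling matrices with row sums $a$ and column sums $b$. The algorithm alternates row and column rescalings of $e^{-C/\varepsilon}$ until both marginals match.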

## Contract

- **Invariants (must never change)**:
  - **Numerics**: All probability inputs are treated as `f32`.
  - **Divergence**: `sinkhorn_divergence_same_support` is the canonical debiased divergence; it guarantees non-negativity (modulo float noise) and symmetry.
  - **Convergence**: `*_with_convergence` methods return `Err` if the residual does not fall below `tol` within `max_iter`; they do not return a partial result silently.

- **Support / Dependencies**:
  - **Input**: Expects `ndarray::Array1` (masses) and `Array2` (costs).
  - **Geometry**: The cost matrix $C$ encodes the ground geometry.
  - **Unbalanced**: $\rho$ (rho) controls the penalty for mass destruction; $\rho \to \infty$ recovers balanced OT (see the objective after this list).

- **Exports**:
  - `sinkhorn`: Standard dense OT.
  - `sinkhorn_divergence_same_support`: MMD-like geometric divergence.
  - `wasserstein_1d`: $O(n)$ exact solver for 1D.
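
For reference, the usual unbalanced objective replaces the hard marginal constraints with $\rho$-weighted KL penalties (shown two-sided here; the crate's exact normalization may differ):

$$\mathrm{OT}_{\varepsilon,\rho}(a, b) = \min_{P \ge 0} \; \langle P, C \rangle + \varepsilon\, \mathrm{KL}(P \,\|\, a \otimes b) + \rho\, \mathrm{KL}(P\mathbf{1} \,\|\, a) + \rho\, \mathrm{KL}(P^{\top}\mathbf{1} \,\|\, b)$$

As $\rho \to \infty$ the penalties harden into exact marginal constraints, recovering the balanced problem.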

Dual-licensed under MIT or Apache-2.0.

[crates.io](https://crates.io/crates/wass) | [docs.rs](https://docs.rs/wass)

```rust
use wass::{wasserstein_1d, sinkhorn, sinkhorn_divergence_same_support};
use ndarray::array;

// 1D Wasserstein (fast, closed-form)
let a = [0.0, 0.25, 0.5, 0.25];
let b = [0.25, 0.5, 0.25, 0.0];
let w1 = wasserstein_1d(&a, &b);

// General transport with Sinkhorn
let cost = array![[0.0, 1.0], [1.0, 0.0]];
let a = array![0.5, 0.5];
let b = array![0.5, 0.5];
let (plan, distance) = sinkhorn(&a, &b, &cost, 0.1, 100);

// Sinkhorn Divergence (debiased; requires same-support cost matrix)
let div = sinkhorn_divergence_same_support(&a, &b, &cost, 0.1, 1000, 1e-4).unwrap();
```
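
The divergence in the last call follows the standard debiasing construction: subtracting the self-transport terms removes the entropic bias,

$$S_\varepsilon(a, b) = \mathrm{OT}_\varepsilon(a, b) - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(a, a) - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(b, b),$$

so $S_\varepsilon$ is symmetric and non-negative (up to float noise), with $S_\varepsilon(a, b) = 0$ iff $a = b$, as stated in the contract above.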

## Key Features

- **Balanced OT**: Standard Sinkhorn for probability distributions.
- **Unbalanced OT**: Robust transport for partial matches, outliers, and unnormalized measures (e.g. document alignment).
- **Sparse OT**: L2-regularized transport for interpretable, sparse alignments (via `sparse` module).
- **Log-domain stabilization**: Numerically stable implementations for small epsilon / large costs (see the sketch after this list).
- **Divergences**: Properly debiased Sinkhorn divergences (non-negative, and zero only when the inputs coincide) for metric use.
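
To see why the log-domain form is stable, here is a minimal, self-contained sketch of log-domain Sinkhorn over `ndarray`. It is not the crate's internal code; `sinkhorn_log` and `log_sum_exp` are illustrative names, and it uses `f64` for clarity:

```rust
use ndarray::{Array1, Array2};

/// Log-domain Sinkhorn sketch: iterate dual potentials (f, g) so that the
/// plan P_ij = exp((f_i + g_j - C_ij) / eps) has marginals a and b.
/// Keeping f and g, rather than the scaling vectors u = exp(f/eps) and
/// v = exp(g/eps), avoids overflow/underflow when entries of C/eps are large.
fn sinkhorn_log(
    a: &Array1<f64>,
    b: &Array1<f64>,
    c: &Array2<f64>,
    eps: f64,
    iters: usize,
) -> Array2<f64> {
    let (n, m) = c.dim();
    let (mut f, mut g) = (Array1::<f64>::zeros(n), Array1::<f64>::zeros(m));
    for _ in 0..iters {
        // f_i <- eps*ln(a_i) - eps*LSE_j[(g_j - C_ij)/eps], then the
        // symmetric update for g: each half-step restores one marginal.
        for i in 0..n {
            let lse = log_sum_exp((0..m).map(|j| (g[j] - c[[i, j]]) / eps));
            f[i] = eps * (a[i].ln() - lse);
        }
        for j in 0..m {
            let lse = log_sum_exp((0..n).map(|i| (f[i] - c[[i, j]]) / eps));
            g[j] = eps * (b[j].ln() - lse);
        }
    }
    // Recover the primal transport plan.
    Array2::from_shape_fn((n, m), |(i, j)| ((f[i] + g[j] - c[[i, j]]) / eps).exp())
}

/// Numerically stable ln(sum(exp(x))): subtract the max before exponentiating.
fn log_sum_exp(xs: impl Iterator<Item = f64> + Clone) -> f64 {
    let max = xs.clone().fold(f64::NEG_INFINITY, f64::max);
    max + xs.map(|x| (x - max).exp()).sum::<f64>().ln()
}
```

The naive exponential-domain recursion multiplies scaling vectors by entries of `exp(-C/eps)` directly, which underflows to zero (and then divides by it) once entries of `C/eps` are large enough that `exp` flushes to zero.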

## Examples

Run these to see OT in action:

- **Robust Document Alignment**: Shows how unbalanced OT aligns core topics while ignoring outliers (headers/footers/typos).
  ```bash
  cargo run -p wass --example noisy_ocr_matching
  ```

- **Mass Mismatch**: Shows how divergence scales with the unbalanced penalty parameter.
  ```bash
  cargo run -p wass --example unbalanced_sinkhorn_mass_mismatch
  ```

- **Balanced Divergence**:
  ```bash
  cargo run -p wass --example sinkhorn_divergence_same_support
  ```

## Functions

| Function | Use Case | Complexity |
|----------|----------|------------|
| `wasserstein_1d` | 1D distributions | O(n) |
| `sinkhorn` | General transport (dense) | O(n² · iter) |
| `sinkhorn_with_convergence` | With early stopping | O(n² · iter) |
| `sinkhorn_divergence_same_support` | Debiased divergence (same support) | O(n² · iter) |
| `sinkhorn_divergence_general` | Debiased divergence (different supports) | O(mn · iter) |
| `unbalanced_sinkhorn_divergence_general` | Robust comparison (different supports) | O(mn · iter) |
| `sparse::solve_semidual_l2` | Sparse transport (L2) | O(n² · iter) |
| `sliced_wasserstein` | High-dim approximation | O(n_proj · n log n) |

Note: `sinkhorn_divergence` is deprecated; it only computes a true divergence when the
cost matrix is square (i.e. both measures live on the same support). Use the explicit
`*_same_support` / `*_general` variants instead.
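
For intuition on the O(n) claim for `wasserstein_1d`: on a shared, uniformly spaced grid, the 1D $W_1$ distance is the L1 distance between the two CDFs, computable in one prefix-sum pass. A minimal sketch (not the crate's implementation; assumes unit grid spacing and equal total mass):

```rust
/// W1 between two histograms on the same unit-spaced grid:
/// the L1 distance between their CDFs (running prefix sums).
fn wasserstein_1d_grid(a: &[f64], b: &[f64]) -> f64 {
    let (mut cdf_a, mut cdf_b, mut w1) = (0.0, 0.0, 0.0);
    for (&x, &y) in a.iter().zip(b.iter()) {
        // Accumulate both CDFs and the absolute gap between them.
        cdf_a += x;
        cdf_b += y;
        w1 += (cdf_a - cdf_b).abs();
    }
    w1
}
```

For non-uniform support points, each gap term is additionally weighted by the spacing between consecutive points.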

## Why Optimal Transport?

- **No support issues**: KL divergence is infinite (or undefined) for distributions with disjoint supports; OT still returns a finite, meaningful distance.
- **Geometry-aware**: Respects the underlying metric space (e.g. word embedding distance).
- **Robustness**: Unbalanced OT handles outliers and noise ("pizza" vs "sushi") without breaking the alignment of the signal ("AI" vs "ML").