# umap-rs

Fast, parallel, memory-efficient Rust implementation of the core UMAP algorithm, focused on performance and correctness.

Scales to 264 million samples in 2 hours on a 126-core machine (see [Performance](#performance)).

## What this is

- Core UMAP dimensionality reduction algorithm
- Fully parallelized via Rayon with memory-efficient sparse matrix construction
- Extensible metric system (Euclidean + custom metrics)
- Checkpointing and fault-tolerant training
- Dense arrays only (no sparse matrix support)
- Fit only (transform for new points not yet implemented)

See [DIVERGENCES.md](DIVERGENCES.md) for detailed comparison to Python umap-learn.

## Usage

```rust
use ndarray::Array2;
use umap_rs::{GraphParams, Umap, UmapConfig};

// Configure UMAP parameters
let config = UmapConfig {
  n_components: 2,
  graph: GraphParams {
    n_neighbors: 15,
    ..Default::default()
  },
  ..Default::default()
};

// Create UMAP instance
let umap = Umap::new(config);

// Precompute KNN (use your favorite ANN library: pynndescent, hnswlib, etc.)
let knn_indices: Array2<u32> = /* shape (n_samples, n_neighbors) */;
let knn_dists: Array2<f32> = /* shape (n_samples, n_neighbors) */;

// Provide initialization (see Initialization section below)
// Common choices: random, PCA, or your own custom embedding
let init: Array2<f32> = /* shape (n_samples, n_components) */;

// Fit UMAP to data
let model = umap.fit(
  data.view(),
  knn_indices.view(),
  knn_dists.view(),
  init.view(),
);

// Get the embedding
let embedding = model.embedding();  // Returns ArrayView2<f32>

// Or take ownership of the embedding
let embedding = model.into_embedding();  // Returns Array2<f32>
```

### Checkpointing

UMAP training has two phases: **learning the manifold** (building the graph) and **optimizing the embedding** (running gradient descent). The first is deterministic and expensive, the second is iterative and can be interrupted.

For long training runs, you can checkpoint the optimization and resume if interrupted:

```rust
use umap_rs::{EuclideanMetric, Metric, Optimizer};

// Phase 1: Learn the manifold structure from your data
// This builds the fuzzy topological graph. It's slow but deterministic -
// same inputs always give the same manifold.
let manifold = umap.learn_manifold(data.view(), knn_indices.view(), knn_dists.view());

// Phase 2: Optimize the embedding via gradient descent
// Create an optimizer that will run 500 epochs of SGD
let metric = EuclideanMetric;
let mut opt = Optimizer::new(manifold, init, 500, &config, metric.metric_type());

// Train in chunks of 10 epochs at a time
while opt.remaining_epochs() > 0 {
  opt.step_epochs(10, &metric);  // Run 10 more epochs
  
  // Periodically save a checkpoint (embedding + all optimization state)
  if opt.current_epoch() % 50 == 0 {
    std::fs::write(
      format!("checkpoint_{}.bin", opt.current_epoch()),
      bincode::serialize(&opt)?
    )?;
  }
}

// Training done - convert to final lightweight model
let fitted = opt.into_fitted(config);
```

If your process is interrupted, load the checkpoint and continue:

```rust
// Deserialize the optimizer state from disk
let mut opt: Optimizer = bincode::deserialize(&std::fs::read("checkpoint_250.bin")?)?;

// Continue from epoch 250 to 500
let metric = umap_rs::EuclideanMetric;
while opt.remaining_epochs() > 0 {
  opt.step_epochs(10, &metric);
}

let fitted = opt.into_fitted(config);
```

The checkpoint contains everything: current embedding, epoch counters, and the manifold. When training completes, convert to a final model:

```rust
// Training done - drop the heavy optimization state
let fitted = opt.into_fitted(config);

// Access the embedding
let embedding = fitted.embedding();  // Zero-copy view

// Or take ownership
let embedding = fitted.into_embedding();

// Save the final model (much smaller than checkpoints)
std::fs::write("model.bin", bincode::serialize(&fitted)?)?;
```

The serialized `FittedUmap` contains just the manifold and embedding, not the optimization state, making it lightweight for long-term storage.

## Initialization

**You must provide your own initialization.** This library is designed to be minimal and focused on the core UMAP optimization - initialization is left to the caller.

### Recommended Approaches

**Random initialization** (simplest):
```rust
use ndarray::Array2;
use rand::Rng;

fn random_init(n_samples: usize, n_components: usize) -> Array2<f32> {
  let mut rng = rand::thread_rng();
  Array2::from_shape_fn((n_samples, n_components), |_| {
    rng.gen_range(-10.0..10.0)
  })
}
```

**PCA initialization** (recommended for better convergence):
```rust
// Use any PCA library (e.g., linfa-reduction, ndarray-stats, etc.)
// Project data to first n_components principal components
// Scale to roughly [-10, 10] range
```
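If you would rather not pull in a full PCA crate, a power-iteration sketch in pure std Rust is good enough for initialization purposes. Everything below (`pca_init`, the fixed iteration count, the [-10, 10] rescale) is illustrative, not part of this library's API:

```rust
// Hypothetical PCA-style initialization via power iteration, std only.
// Finds the top principal directions one at a time and deflates the data.
fn pca_init(data: &[Vec<f32>], n_components: usize) -> Vec<Vec<f32>> {
    let n = data.len();
    let d = data[0].len();

    // Center the data column-wise.
    let mut mean = vec![0.0f32; d];
    for row in data {
        for (m, &v) in mean.iter_mut().zip(row) { *m += v; }
    }
    for m in &mut mean { *m /= n as f32; }
    let mut x: Vec<Vec<f32>> = data.iter()
        .map(|row| row.iter().zip(&mean).map(|(v, m)| v - m).collect())
        .collect();

    let mut embedding = vec![vec![0.0f32; n_components]; n];
    for c in 0..n_components {
        // Power iteration on X^T X to find the leading direction.
        let mut v = vec![1.0f32; d];
        for _ in 0..100 {
            let xv: Vec<f32> = x.iter().map(|row| dot(row, &v)).collect();
            let mut w = vec![0.0f32; d];
            for (row, &s) in x.iter().zip(&xv) {
                for (wj, &rj) in w.iter_mut().zip(row) { *wj += s * rj; }
            }
            let norm = dot(&w, &w).sqrt().max(1e-12);
            for wj in &mut w { *wj /= norm; }
            v = w;
        }
        // Project onto the component, then deflate X.
        for (i, row) in x.iter_mut().enumerate() {
            let score = dot(row, &v);
            embedding[i][c] = score;
            for (rj, &vj) in row.iter_mut().zip(&v) { *rj -= score * vj; }
        }
    }

    // Rescale each output dimension to roughly [-10, 10].
    for c in 0..n_components {
        let max_abs = embedding.iter().map(|r| r[c].abs()).fold(1e-12f32, f32::max);
        for r in &mut embedding { r[c] *= 10.0 / max_abs; }
    }
    embedding
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

For real datasets a randomized or sparse PCA implementation will be far faster; this is only meant to show the shape of the computation.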

**Custom initialization**:
- Spectral embedding (use sparse eigensolvers like arpack-ng for large datasets)
- t-SNE initialization
- Pre-trained neural network embeddings
- Domain-specific embeddings

## Configuration

UMAP parameters are grouped logically:

### Basic

```rust
use umap_rs::UmapConfig;

let config = UmapConfig {
    n_components: 2,  // Output dimensions
    ..Default::default()
};
```

### Manifold parameters

```rust
use umap_rs::config::ManifoldParams;

let manifold = ManifoldParams {
    min_dist: 0.1,   // Minimum distance in embedding
    spread: 1.0,     // Scale of embedding
    a: None,         // Auto-computed from min_dist/spread
    b: None,         // Auto-computed from min_dist/spread
};
```
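When `a` and `b` are `None`, they are fitted so that the low-dimensional membership curve `1 / (1 + a * d^(2b))` approximates an exponential decay shaped by `min_dist` and `spread`. A standalone sketch of that curve; the `a`/`b` values shown are roughly what umap-learn fits for `min_dist = 0.1`, `spread = 1.0` (an assumption for illustration, not this crate's output):

```rust
// The low-dimensional membership curve UMAP optimizes against.
// At d = 0 the membership is 1; it decays smoothly with distance,
// with min_dist/spread controlling how quickly (via the fitted a, b).
fn membership(d: f32, a: f32, b: f32) -> f32 {
    1.0 / (1.0 + a * d.powf(2.0 * b))
}
```

Smaller `min_dist` produces a sharper curve, which is why it packs similar points more tightly in the embedding.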

### Graph construction

```rust
use umap_rs::config::GraphParams;

let graph = GraphParams {
    n_neighbors: 15,              // Number of nearest neighbors
    local_connectivity: 1.0,      // Local neighborhood connectivity
    set_op_mix_ratio: 1.0,        // Fuzzy union (1.0) vs intersection (0.0)
    disconnection_distance: None, // Auto-computed from metric
    symmetrize: true,             // Symmetrize graph (set false to save memory)
};
```

The `symmetrize` option controls whether the fuzzy graph is symmetrized via fuzzy set union. For very large datasets, setting `symmetrize: false` roughly halves memory usage with minimal impact on 2D visualization quality.
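The combination rule itself is simple to state. A standalone sketch of how two directed memberships `w_ij` and `w_ji` are merged, with `set_op_mix_ratio` interpolating between fuzzy union and fuzzy intersection (this mirrors the umap-learn formula; it is illustrative, not this crate's internal code):

```rust
// Combine the two directed edge weights between points i and j.
// set_op_mix_ratio = 1.0 gives the fuzzy union (default),
// set_op_mix_ratio = 0.0 gives the fuzzy intersection.
fn combine(w_ij: f32, w_ji: f32, set_op_mix_ratio: f32) -> f32 {
    let union = w_ij + w_ji - w_ij * w_ji; // fuzzy set union
    let intersection = w_ij * w_ji;        // fuzzy set intersection
    set_op_mix_ratio * union + (1.0 - set_op_mix_ratio) * intersection
}
```

Note that the union keeps one-sided edges (`combine(0.8, 0.0, 1.0) = 0.8`) while the intersection discards them, which is why the mix ratio changes how aggressively weak connections are pruned.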

### Optimization

```rust
use umap_rs::config::OptimizationParams;

let optimization = OptimizationParams {
    n_epochs: None,           // Auto-determined from dataset size
    learning_rate: 1.0,       // SGD learning rate
    negative_sample_rate: 5,  // Negative samples per positive
    repulsion_strength: 1.0,  // Weight for negative samples
};
```
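For intuition about how `n_epochs` interacts with edge weights: in umap-learn's sampling scheme, an edge of weight `w` is visited roughly every `max_weight / w` epochs, so strong edges are optimized most often. A standalone sketch of that schedule (illustrative, not necessarily this crate's exact internals):

```rust
// Compute how many epochs pass between updates of each edge.
// The heaviest edge is updated every epoch; an edge at half the
// maximum weight every 2 epochs; zero-weight edges never (-1 marker,
// as in umap-learn).
fn epochs_per_sample(weights: &[f32], n_epochs: usize) -> Vec<f32> {
    let max_w = weights.iter().cloned().fold(f32::MIN, f32::max);
    weights.iter()
        .map(|&w| {
            let n_samples = n_epochs as f32 * w / max_w;
            if n_samples > 0.0 { n_epochs as f32 / n_samples } else { -1.0 }
        })
        .collect()
}
```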

### Complete example

```rust
use umap_rs::UmapConfig;
use umap_rs::config::{GraphParams, ManifoldParams, OptimizationParams};

let config = UmapConfig {
    n_components: 3,
    manifold: ManifoldParams {
        min_dist: 0.05,
        ..Default::default()
    },
    graph: GraphParams {
        n_neighbors: 30,
        ..Default::default()
    },
    optimization: OptimizationParams {
        n_epochs: Some(500),
        ..Default::default()
    },
};
```

## Custom distance metrics

Implement the `Metric` trait:

```rust
use umap_rs::{EuclideanMetric, Metric};
use ndarray::{Array1, ArrayView1};

#[derive(Debug)]
struct MyMetric;

impl Metric for MyMetric {
    fn distance(&self, a: ArrayView1<f32>, b: ArrayView1<f32>) -> (f32, Array1<f32>) {
        // Return (distance, gradient)
        // gradient = ∂distance/∂a
        todo!()
    }

    // Optional: provide fast squared distance for optimization
    fn squared_distance(&self, a: ArrayView1<f32>, b: ArrayView1<f32>) -> Option<f32> {
        None  // Return Some(dist_sq) if available
    }
}

// Use custom metric
let umap = Umap::with_metrics(
    config,
    Box::new(MyMetric),       // Input space metric
    Box::new(EuclideanMetric), // Output space metric
);
```

See `src/distances.rs` for the Euclidean implementation.

## Build

```bash
cargo build --release
```

## Performance

This implementation is designed for large-scale datasets (100M+ samples) on high-core-count machines.

### Real-world benchmark

264 million samples embedded to 2D in 2 hours on a 126-core AMD EPYC 9J45 with 1.4 TB RAM:
- Precomputed KNN (n_neighbors=100)
- Precomputed PCA initialization
- Symmetrization disabled
- ~1 TB peak memory

### Parallelization

Every phase is fully parallelized via Rayon:

- **Graph construction**: Parallel smooth KNN distance, parallel CSR matrix construction
- **Set operations**: Parallel CSC structure building, parallel symmetrization
- **Optimizer initialization**: Parallel edge filtering, parallel epoch scheduling
- **SGD optimization**: Lock-free Hogwild! algorithm for parallel gradient descent
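To illustrate the last point: Hogwild! lets threads read and write a shared parameter vector with no locks, accepting that a concurrent update to the same slot may occasionally be lost. A self-contained, pure-std sketch (illustrative only, not this crate's optimizer) minimizing f(x) = Σ (xᵢ − 1)²:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Hogwild!-style lock-free SGD sketch. Parameters are stored as f32
// bit patterns in AtomicU32 so threads can share them without locks.
// Each relaxed read-modify-write may lose a concurrent update to the
// same slot - the Hogwild! trade-off that works because updates are
// sparse and all pushes point toward the same minimum.
fn hogwild_sgd(n_params: usize, n_threads: usize, steps: usize, lr: f32) -> Vec<f32> {
    let params: Vec<AtomicU32> = (0..n_params)
        .map(|_| AtomicU32::new(0.0f32.to_bits()))
        .collect();

    thread::scope(|s| {
        for t in 0..n_threads {
            let params = &params;
            s.spawn(move || {
                let mut i = t % n_params; // each thread starts elsewhere
                for _ in 0..steps {
                    let x = f32::from_bits(params[i].load(Ordering::Relaxed));
                    let grad = 2.0 * (x - 1.0); // d/dx of (x - 1)^2
                    let new = x - lr * grad;
                    // Plain store: a simultaneous update to this slot can
                    // be overwritten, and that is deliberately tolerated.
                    params[i].store(new.to_bits(), Ordering::Relaxed);
                    i = (i + 1) % n_params;
                }
            });
        }
    });
    params.into_iter().map(|a| f32::from_bits(a.into_inner())).collect()
}
```

Despite the races, every parameter converges to 1 because each write moves it monotonically toward the minimum; the real optimizer applies the same idea to embedding coordinates.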

### Memory Efficiency

Optimized for minimal memory footprint at scale:

- **Direct CSR construction**: Builds sparse matrices in-place without intermediate COO/triplet format. Avoids O(nnz) temporary allocations and O(nnz log nnz) global sorting.
- **u32 indices**: Uses 4-byte indices instead of 8-byte, halving index memory for datasets up to ~4B samples.
- **CSC structure-only**: Transpose operations store only the structure (indptr + indices), looking up values in the original CSR via O(log k) binary search.
- **Sequential array allocation**: Large arrays are allocated one at a time to avoid memory spikes.
- **No cloning**: Avoids sequential `.clone()` on large arrays; uses parallel copies when needed.
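The structure-only transpose relies on recovering any value from the original CSR on demand. A minimal standalone sketch of that lookup (the hand-rolled `Csr` struct here is purely illustrative; the crate's actual graph type is `sprs`'s `CsMatI`):

```rust
// Toy CSR matrix: row i's nonzeros live at indices[indptr[i]..indptr[i+1]],
// with column indices kept sorted within each row.
struct Csr {
    indptr: Vec<usize>, // row start offsets, len = n_rows + 1
    indices: Vec<u32>,  // column indices, sorted within each row
    values: Vec<f32>,
}

impl Csr {
    // Fetch the value at (row, col) by binary-searching the row's
    // sorted column indices: O(log k), k = nonzeros in the row.
    fn get(&self, row: usize, col: u32) -> Option<f32> {
        let (start, end) = (self.indptr[row], self.indptr[row + 1]);
        self.indices[start..end]
            .binary_search(&col)
            .ok()
            .map(|offset| self.values[start + offset])
    }
}
```

This is why the transpose never needs to copy the value array: any `(row, col)` it enumerates can be resolved against the original matrix in logarithmic time.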

### Scaling Guidelines

Memory scales with `n_samples × n_neighbors`:

| n_samples | n_neighbors | Approx. Memory |
|-----------|-------------|----------------|
| 10M       | 30          | ~10 GB         |
| 100M      | 30          | ~100 GB        |
| 250M      | 30          | ~250 GB        |
| 250M      | 256         | ~2 TB          |
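As a rough rule of thumb, the table above corresponds to about 33 bytes per stored edge. The constant below is our fit to the table, not an exact accounting, and the real footprint varies with configuration (symmetrization, n_components, checkpointing):

```rust
// Back-of-the-envelope memory estimate in GB. BYTES_PER_EDGE is an
// assumed constant fitted to the scaling table, not a measured value.
fn approx_memory_gb(n_samples: u64, n_neighbors: u64) -> f64 {
    const BYTES_PER_EDGE: f64 = 33.0;
    (n_samples as f64 * n_neighbors as f64 * BYTES_PER_EDGE) / 1e9
}
```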

To reduce memory:
- Use smaller `n_neighbors` (15-50 is typical for visualization)
- Disable symmetrization: `config.graph.symmetrize = false`
- Slice KNN arrays to use fewer neighbors than computed

### Configuration for Large Datasets

```rust
let config = UmapConfig {
    graph: GraphParams {
        n_neighbors: 30,      // Lower = less memory
        symmetrize: false,    // Skip symmetrization to save memory
        ..Default::default()
    },
    ..Default::default()
};
```

### Timing Logs

The library emits structured logs via the `tracing` crate. Enable a subscriber to see timing for each phase:

```rust
tracing_subscriber::fmt::init();  // or your preferred subscriber
```

Example output:
```
INFO umap_rs::umap::fuzzy_simplicial_set: smooth_knn_dist complete duration_ms=52033
INFO umap_rs::umap::fuzzy_simplicial_set: csr row_counts complete duration_ms=48495
INFO umap_rs::umap::fuzzy_simplicial_set: csr indptr complete duration_ms=560 nnz=62586367074
INFO umap_rs::optimizer: optimizer edge filtering complete duration_ms=725 total_edges=23276942679
```

### Advanced: Accessing the Graph

The fuzzy simplicial set graph is exposed as `SparseMat` (a `CsMatI<f32, u32, usize>`):

```rust
use umap_rs::SparseMat;

let manifold = umap.learn_manifold(data.view(), knn_indices.view(), knn_dists.view());
let graph: &SparseMat = manifold.graph();
// graph uses u32 column indices (memory efficient) and usize row pointers (handles large nnz)
```

## Documentation

- [UMAP.md](UMAP.md) - How UMAP works (algorithm explanation)
- [DIVERGENCES.md](DIVERGENCES.md) - Differences from Python umap-learn
- [AGENTS.md](AGENTS.md) - Developer notes

Run `cargo doc --open` to browse the API documentation.

## Design principles

1. **Minimal** - Core algorithm only, no feature creep
2. **Fast** - Parallel by default, zero-copy where possible
3. **Explicit** - Caller provides KNN, initialization, etc.
4. **Rust-native** - Idiomatic patterns, not Python translations

## Limitations

- **Maximum ~4 billion samples**: Uses `u32` indices internally for memory efficiency
- No input validation (assumes clean data)
- Transform not yet implemented
- Dense arrays only
- Panics on invalid input (not Result-based errors)
- Requires external KNN computation and initialization

### KNN Sentinel Values

If your KNN search couldn't find `k` neighbors for some points (e.g., isolated points), use `u32::MAX` as a sentinel index and any distance value (commonly `f32::INFINITY`). These entries are automatically skipped during graph construction:

```rust
// Point 5 only has 2 real neighbors, rest are sentinels
knn_indices[[5, 0]] = 10;           // real neighbor
knn_indices[[5, 1]] = 23;           // real neighbor  
knn_indices[[5, 2]] = u32::MAX;     // sentinel - skipped
knn_indices[[5, 3]] = u32::MAX;     // sentinel - skipped
```

This is a specialized tool for the core algorithm. Wrap it in validation/error handling for production use.

## License

BSD-3-Clause (see LICENSE file)

## References

- Original paper: McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426
- Hogwild! SGD: Recht, B., et al. (2011). Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. NIPS 2011
- Python umap-learn: https://github.com/lmcinnes/umap