Expand description
Fast, parallel Rust implementation of the UMAP dimensionality reduction algorithm.
This library provides a streamlined implementation of UMAP (Uniform Manifold Approximation and Projection) focused on performance and correctness. It requires precomputed k-nearest neighbors and initialization, allowing you to use your preferred libraries for these steps.
§Example
ⓘ
use umap::{Umap, UmapConfig};
use umap::EuclideanMetric;
// Configure UMAP
let config = UmapConfig::default();
let umap = Umap::new(config);
// Fit to data (requires precomputed KNN and initialization)
let model = umap.fit(
data.view(),
knn_indices.view(),
knn_dists.view(),
init.view(),
);
// Get the embedding
let embedding = model.embedding();§Features
- Parallel optimization: Rayon’s parallel SGD (Hogwild! algorithm)
- Extensible metrics: Custom distance functions via the
Metrictrait - Zero-copy views: Efficient array handling with
ndarray
§Limitations
- Dense arrays only (no sparse matrix support)
- Fit only (transform for new points not yet implemented)
- Panics on invalid input (no Result-based error handling)
- Requires external KNN computation and initialization
§Public API
The library exposes a minimal, well-defined API:
Umap- Main algorithm structFittedUmap- Fitted model with embeddingsUmapConfig- Configuration parametersMetric- Distance metric traitEuclideanMetric- Euclidean distance implementation
Re-exports§
pub use config::GraphParams;pub use config::ManifoldParams;pub use config::OptimizationParams;pub use config::UmapConfig;pub use manifold::LearnedManifold;pub use metric::Metric;pub use metric::MetricType;pub use optimizer::Optimizer;
Modules§
Structs§
- Euclidean
Metric - Euclidean (L2) distance metric.
- Fitted
Umap - A fitted UMAP model containing the learned manifold and embedding.
- Umap
- UMAP dimensionality reduction algorithm.
Type Aliases§
- Sparse
Mat - Sparse matrix with u32 column indices to save memory (4 bytes vs 8 bytes per index). Uses usize for indptr since nnz can exceed u32::MAX for very large datasets. Valid for n_samples < 2^32 (~4 billion).