manifolds-rs
High-performance manifold learning and dimensionality reduction algorithms implemented in Rust. Contains as for now:
- UMAP
- Has different optimisers: SGD (traditional), Adam and a parallelised version of Adam for very fast fitting.
- Optional GPU-accelerated kNN search via
cubecl.
- Parametric UMAP (optional feature)
- tSNE
- Barnes Hut tSNE (With a
O(n log n)complexity). - Fast Fourier Transform-accelerated Interpolation-based t-SNE (Flt-SNE)
(optional feature; with a
O(n)complexity for large datasets). - Optional GPU-accelerated kNN search.
- Barnes Hut tSNE (With a
- PHATE
- PaCMAP
- Diffusion Maps
- Classical diffusion maps (Coifman & Lafon, 2006) with anisotropic (alpha) normalisation for density correction.
- Optional landmark-based approximation with Nystroem extension for larger datasets.
Description
Rust implementations of various methods to project data onto two dimensions,
i.e, learn low dimensional manifolds from the data. The current crate contains
the big classic UMAP and tSNE (with the
Barnes-Hut implementation and optionally the
FFT-acceleration version).
These are typically used methods for visualising high-dimensional biological
data, but not without controversy.
Moreover, the crate also provides via the Burn DL framework optionally
parametric UMAP that can be optionally be
used via the prospective feature flag. Since release 0.1.8, we also have
PHATE. With 0.1.9,
PaCMAP has been also implemented. More
recently, classical diffusion maps
have been added as well.
Changelog can be found here)
Features
- UMAP algorithm: Complete implementation of the UMAP dimensionality reduction algorithm with several optimisations: SGD, Adam and a parallelised version of ADAM for increased optimisation speed.
- tSNE algorithm: Implementation of the Barnes-Hut accelerated version and the FFT-accelerated version (optional).
- PHATE: Implementation of Potential of Heat-diffusion for Affinity-based Trajectory Embedding with different landmark methods.
- Diffusion Maps: Classical diffusion maps with anisotropic normalisation (alpha in [0, 1] controlling density correction from the normalised graph Laplacian to the Laplace-Beltrami operator), Von Neumann entropy-based diffusion time selection, and an optional landmark variant with Nystroem extension for larger datasets.
- GPU acceleration (optional feature
gpu): The most expensive part of UMAP and tSNE for large datasets is the nearest neighbour search. With thegpufeature enabled, kNN search runs on the GPU viacubecl, with backends for Vulkan, Metal, DirectX 12 (through wgpu) and CUDA. Graph construction and optimisation remain on CPU. - Multiple ANN backends via
ann-search-rs:- Exhaustive - If you want precise results and have a small data set in which the approximate nearest neighbour index building is actually slower.
- KmKnn - An exact nearest neighbour search algorithm, leveraging k-means clustering under the hood for speed. If you need exact results.
- BallTree - A small, fast index for smaller data sets with lower dimensions.
- Annoy (Approximate Nearest Neighbours Oh Yeah) - Good for medium low- dimensionality datasets.
- NNDescent (Nearest Neighbour Descent) - good for larger datasets with higher dimensionality.
- HNSW (Hierarchical Navigable Small World) - good for (very) larger datasets with higher dimensionality.
- IVF (inversted file index) - Another fast and optimised index.
- GPU ANN backends (with
gpufeature):- Exhaustive GPU - Brute-force kNN on the GPU; deterministic and accurate.
- IVF GPU - Inverted-file index on the GPU; good default for larger data.
- NNDescent GPU - CAGRA-style graph construction on the GPU.
- Distance metrics:
- Euclidean
- Cosine
- Maybe more to come over time ... ?
- Multiple initialisations:
- Graph Laplacian eigenvector-based initialisation using Lanczos iteration
- Random initialisation
- PCA-based initialisation
- Customisable parameters: Full control over fuzzy simplicial set construction, graph symmetrisation, and optimisation parameters for tSNE, UMAP and PHATE.
- High performance: Parallel processing with Rayon, efficient sparse matrix operations, cache-friendly structures and optimised SGD and Adam optimisers for UMAP (for the latter also a parallelised version...) and fast optimisers for tSNE and also PHATE.
- Synthetic datasets: Some synthetic datasets are available for testing and experimentation: Swiss role, clustered data and a trajectory-like structure.
Installation
Add this to your Cargo.toml:
[]
= "*"
If you want to enable parametric UMAP, please use:
[]
= { = "*", = [ "parametric" ] }
If you want to enable the FFT-accelerated version of tSNE, please use:
[]
= { = "*", = [ "fft_tsne" ] }
If you want to enable GPU-accelerated kNN search, please use:
[]
= { = "*", = [ "gpu" ] }
Feature flags can be combined, e.g. features = [ "gpu", "fft_tsne" ].
Notes
Please use version 0.1.3 and higher. These ones are not extensively tested
against real data.
Note on GPU support: kNN runs on GPU through wgpu (Vulkan, Metal, DX12) or
CUDA via cubecl. Computations on the GPU side are performed in f32; this
is a limitation of WGSL (the wgpu shader language) which has no f64, and
also reflects the fact that f64 throughput on consumer GPUs is typically
1/32 to 1/64 of f32. If you need double precision, stick to the CPU path.
GPU results are not bit-reproducible across runs (parallel reductions do not
have a fixed accumulation order), but structural quality is consistent.
Usage
R package
This crate powers manifoldsR, an R package leveraging the incredible speed that Rust offers.
UMAP Example
Below are examples of how to use UMAP.
use *;
// Generate synthetic clustered data
let = generate_clustered_data;
// Configure UMAP parameters
let params = default_2d;
// Run UMAP
let embedding = umap;
// embedding[0] contains x-coordinates
// embedding[1] contains y-coordinates
t-SNE Example
Below are examples of how to use t-SNE.
use *;
// Generate synthetic clustered data
let = generate_clustered_data;
// Configure t-SNE parameters
let params = new;
// Run t-SNE (Barnes-Hut)
let embedding = tsne;
// embedding[0] contains x-coordinates
// embedding[1] contains y-coordinates
Using Precomputed k-NN
Both algorithms support precomputed k-nearest neighbour graphs for efficiency when running multiple embeddings:
use *;
let = generate_clustered_data;
// Compute k-NN once
let nn_params = default;
let = run_ann_search;
// Use precomputed k-NN for UMAP
let params = default_2d;
let embedding = umap;
GPU-Accelerated UMAP and tSNE (requires gpu feature)
With the gpu feature, the nearest neighbour search runs on the GPU while
graph construction and optimisation stay on CPU. This is typically the
bottleneck for larger datasets (say, 50k+ samples).
use *;
use ;
// Generate synthetic clustered data (use f32 for GPU compatibility)
let = generate_clustered_data;
// Configure GPU UMAP parameters
let params = default_2d;
// Run GPU UMAP. ann_type defaults to "ivf_gpu"; alternatives are
// "exhaustive_gpu" (deterministic, brute force) and "nndescent_gpu".
let device = default;
let embedding = ;
GPU t-SNE works analogously:
use *;
use ;
let = generate_clustered_data;
let params = new;
let device = default;
let embedding = ;
To use CUDA instead of wgpu, swap WgpuRuntime/WgpuDevice for
CudaRuntime/CudaDevice from cubecl::cuda. On Linux CI, Vulkan via
mesa-vulkan-drivers (lavapipe) is the simplest path for headless testing.
Parametric UMAP Example (requires parametric feature)
Parametric UMAP learns a neural network encoder that can transform new data points:
use *;
use ;
use Autodiff;
type Backend = ;
// Generate synthetic clustered data
let = generate_clustered_data;
// Configure parametric UMAP
let fit_params = from_min_dist_spread;
let params = new;
// Set up device
let device = Cpu;
// Train parametric UMAP
let embedding = ;
// embedding[0] contains x-coordinates
// embedding[1] contains y-coordinates
PHATE Example
PHATE is well-suited for data with continuous structure and branching trajectories, such as single-cell differentiation data.
use *;
// Generate a synthetic branching trajectory
let branches = generate_example_branches;
let = generate_trajectory;
// Configure PHATE parameters
let params = new;
// Run PHATE
let embedding = phate;
// embedding[0] contains x-coordinates
// embedding[1] contains y-coordinates
PaCMAP Example
PaCMAP preserves both local and global structure via three pair types (near, mid-near, and further pairs) and a phased optimisation schedule.
use *;
// Generate synthetic clustered data
let = generate_clustered_data;
// Configure PaCMAP parameters
let params = default;
// Run PaCMAP
let embedding = pacmap;
// embedding[0] contains x-coordinates
// embedding[1] contains y-coordinates
Diffusion Maps Example
Classical diffusion maps embed the data via the top eigenvectors of a
row-stochastic diffusion operator, optionally with anisotropic normalisation
to correct for non-uniform sampling density. Setting alpha_norm = 1.0
recovers the Laplace-Beltrami operator; 0.5 gives the Fokker-Planck
operator; 0.0 gives the unnormalised graph Laplacian.
use *;
// Generate synthetic clustered data
let = generate_clustered_data;
// Configure diffusion maps parameters
let params = new;
// Run diffusion maps
let embedding = diffusion_maps;
// embedding[0] contains the first non-trivial diffusion component
// embedding[1] contains the second non-trivial diffusion component
Licence
MIT Licence
Copyright (c) 2025 Gregor Alexander Lueg
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.