Crate fast_umap

§fast-umap

GPU-accelerated parametric UMAP (Uniform Manifold Approximation and Projection) in Rust, built on burn + CubeCL.

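Up to 4.7× faster than umap-rs on datasets of 10 000 samples or more (umap-rs remains faster at 5 000 samples; see Performance). Because the embedding is produced by a trained network, transform() can also project new, unseen data, which classical, non-parametric UMAP cannot do directly because it learns no explicit mapping.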

§Quick start

use fast_umap::prelude::*;

// 100 samples × 10 features
let data: Vec<Vec<f64>> = generate_test_data(100, 10)
    .chunks(10)
    .map(|c| c.iter().map(|&x: &f32| x as f64).collect())
    .collect();

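// Configure and fit UMAP
let config = UmapConfig::default();
// The commented-out steps need a concrete burn autodiff backend in place of the
// placeholder MyAutodiffBackend, plus caller-supplied new_data, input_size and
// device; the ? operator calls must run inside a function that returns a Result.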
// let umap = Umap::<MyAutodiffBackend>::new(config);
// let fitted = umap.fit(data.clone(), None);
// let embedding = fitted.embedding();
// let new_embedding = fitted.transform(new_data);
// fitted.save("model.umap")?; // Save trained model
// let loaded = FittedUmap::<MyAutodiffBackend>::load("model.umap", config, input_size, device)?;
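The conversion at the top of the quick start turns the flat f32 buffer returned by generate_test_data into one Vec<f64> per sample. The closure's &f32 pattern and the chunks(10) call suggest that the buffer is row-major (one sample after another). Assuming that layout, the same reshaping can be written as a standalone, dependency-free helper. flat_to_rows is a hypothetical name and is not part of fast_umap.

```rust
/// Hypothetical helper (not part of fast_umap): reshape a flat, row-major
/// buffer of f32 values into one Vec<f64> per sample.
fn flat_to_rows(flat: &[f32], n_features: usize) -> Vec<Vec<f64>> {
    assert!(n_features > 0, "n_features must be positive");
    assert_eq!(
        flat.len() % n_features,
        0,
        "buffer length must be a multiple of n_features"
    );
    flat.chunks(n_features)
        .map(|row| row.iter().map(|&x| x as f64).collect())
        .collect()
}

fn main() {
    // Stand-in for a generated buffer: 3 samples × 2 features, row-major.
    let flat: Vec<f32> = vec![0.0, 1.0, 2.0, 3.0, 4.0, 5.0];
    let rows = flat_to_rows(&flat, 2);
    assert_eq!(rows.len(), 3);
    assert_eq!(rows[1], vec![2.0, 3.0]);
    println!("{:?}", rows);
}
```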

§Interface

The public API mirrors the umap-rs crate:

§Performance

| Dataset (samples × features) | fast-umap | umap-rs | Speedup |
|---|---|---|---|
| 5 000 × 100 | 6.75 s | 2.31 s | 0.34× (umap-rs faster) |
| 10 000 × 100 | 5.93 s | 8.68 s | 1.5× faster |
| 20 000 × 100 | 7.32 s | 34.10 s | 4.7× faster |

Benchmarked on Apple M3 Max. Reproduce with cargo run --release --example crate_comparison.
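The Speedup column is the umap-rs time divided by the fast-umap time, so a value below 1 means umap-rs wins. The short sketch below recomputes the column from the table's timings. It also shows where the widening gap comes from: fast-umap stays roughly flat (5.93 to 7.32 s) across the three sizes, while umap-rs grows from 2.31 s to 34.10 s.

```rust
/// Speedup as used in the table: umap-rs time divided by fast-umap time.
/// Values above 1.0 mean fast-umap is faster.
fn speedup(fast_umap_s: f64, umap_rs_s: f64) -> f64 {
    umap_rs_s / fast_umap_s
}

fn main() {
    // (dataset, fast-umap seconds, umap-rs seconds), copied from the table above.
    let rows = [
        ("5 000 × 100", 6.75, 2.31),
        ("10 000 × 100", 5.93, 8.68),
        ("20 000 × 100", 7.32, 34.10),
    ];
    for (name, fast, rs) in rows {
        let s = speedup(fast, rs);
        let verdict = if s >= 1.0 { "fast-umap faster" } else { "umap-rs faster" };
        println!("{name}: {s:.2}× ({verdict})");
    }
}
```

Run as written, it prints 0.34×, 1.46× and 4.66×. The table rounds these to 0.34×, 1.5× and 4.7×.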

§Architecture

The dimensionality reduction is performed by a small feed-forward neural network (UMAPModel) trained with the UMAP cross-entropy loss using sparse edge subsampling and negative sampling:

attraction  =  mean_{sampled k-NN edges}   [ −log q_ij ]
repulsion   =  mean_{negative samples}     [ −log (1 − q_ij) ]
loss        =  attraction  +  repulsion_strength × repulsion

where q_ij = 1 / (1 + a · d_ij^(2b)) is the UMAP kernel applied to the embedding distance d_ij (a and b are fitted from min_dist and spread). With n samples and k nearest neighbours, the per-epoch cost is O(min(n·k, 50K)), so it stops growing with dataset size once n·k exceeds the 50K cap.
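As a rough illustration of the loss above, the following dependency-free sketch computes it on the CPU for a tiny 2-D embedding. Several details are assumptions for illustration only: the function names, the xorshift generator, the with-replacement edge subsampling, the five negatives per edge and the 1e-12 clamp inside ln(). fast-umap itself evaluates this with burn tensors on the GPU and may sample edges and negatives differently.

```rust
/// UMAP low-dimensional kernel q_ij = 1 / (1 + a * d_ij^(2b)).
/// Takes the squared distance d2, because d^(2b) = (d^2)^b.
fn umap_kernel(d2: f64, a: f64, b: f64) -> f64 {
    1.0 / (1.0 + a * d2.powf(b))
}

/// Squared Euclidean distance between two embedding points.
fn dist_sq(p: &[f64], q: &[f64]) -> f64 {
    p.iter().zip(q).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Mean of term(q_ij) over (i, j) index pairs; 0.0 for an empty list.
fn mean_over<F: Fn(f64) -> f64>(emb: &[Vec<f64>], pairs: &[(usize, usize)], a: f64, b: f64, term: F) -> f64 {
    if pairs.is_empty() {
        return 0.0;
    }
    let total: f64 = pairs
        .iter()
        .map(|&(i, j)| term(umap_kernel(dist_sq(&emb[i], &emb[j]), a, b)))
        .sum();
    total / pairs.len() as f64
}

/// Minimal xorshift64 generator so the sketch has no dependencies.
struct XorShift(u64);

impl XorShift {
    fn new(seed: u64) -> Self {
        XorShift(seed.max(1)) // xorshift never leaves the all-zero state
    }
    /// Pseudo-random index in 0..n (slight modulo bias is fine for a sketch).
    fn next_below(&mut self, n: usize) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % n as u64) as usize
    }
}

/// Keep at most cap edges per epoch (sampled with replacement), so the work
/// is O(min(edges.len(), cap)) however large the k-NN graph is.
fn subsample_edges(edges: &[(usize, usize)], cap: usize, rng: &mut XorShift) -> Vec<(usize, usize)> {
    if edges.len() <= cap {
        return edges.to_vec();
    }
    (0..cap).map(|_| edges[rng.next_below(edges.len())]).collect()
}

/// For every sampled edge (i, _), draw per_edge random partners j != i as negatives.
fn sample_negatives(edges: &[(usize, usize)], n: usize, per_edge: usize, rng: &mut XorShift) -> Vec<(usize, usize)> {
    let mut out = Vec::with_capacity(edges.len() * per_edge);
    if n < 2 {
        return out; // no valid partner exists
    }
    for &(i, _) in edges {
        for _ in 0..per_edge {
            let mut j = rng.next_below(n);
            while j == i {
                j = rng.next_below(n);
            }
            out.push((i, j));
        }
    }
    out
}

/// loss = mean(-ln q) over edges + repulsion_strength * mean(-ln(1 - q)) over negatives.
fn umap_loss(emb: &[Vec<f64>], edges: &[(usize, usize)], negatives: &[(usize, usize)], a: f64, b: f64, repulsion_strength: f64) -> f64 {
    const EPS: f64 = 1e-12; // keeps ln() finite when q is exactly 0 or 1
    let attraction = mean_over(emb, edges, a, b, |q| -q.max(EPS).ln());
    let repulsion = mean_over(emb, negatives, a, b, |q| -(1.0 - q).max(EPS).ln());
    attraction + repulsion_strength * repulsion
}

fn main() {
    let (a, b) = (1.577, 0.895); // values umap-learn derives for min_dist = 0.1, spread = 1.0
    let emb = vec![vec![0.0, 0.0], vec![0.1, 0.0], vec![3.0, 3.0], vec![3.1, 3.0]];
    let knn_edges = vec![(0, 1), (1, 0), (2, 3), (3, 2)];
    let mut rng = XorShift::new(42);
    let edges = subsample_edges(&knn_edges, 50_000, &mut rng);
    let negatives = sample_negatives(&edges, emb.len(), 5, &mut rng);
    println!("loss = {:.4}", umap_loss(&emb, &edges, &negatives, a, b, 1.0));
}
```

Sampling edges with replacement keeps the per-epoch work bounded by the cap without copying the full k-NN edge list. This mirrors the O(min(n·k, 50K)) bound stated above.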
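The parenthetical about a and b refers to fitting the kernel 1 / (1 + a · d^(2b)) to a target curve. That curve is 1 below min_dist and decays as exp(-(d - min_dist) / spread) beyond it. The sketch below follows the procedure of the reference Python library umap-learn (its find_ab_params function), but replaces that library's nonlinear least-squares solver with a brute-force grid search. Whether fast-umap fits a and b exactly this way is an assumption, and fit_ab is a hypothetical name.

```rust
/// Target curve from umap-learn's find_ab_params: 1 below min_dist,
/// exponential decay with scale spread beyond it.
fn target(d: f64, min_dist: f64, spread: f64) -> f64 {
    if d < min_dist {
        1.0
    } else {
        (-(d - min_dist) / spread).exp()
    }
}

/// Fit (a, b) of q(d) = 1 / (1 + a * d^(2b)) to the target curve by brute-force
/// grid search on squared error (umap-learn uses a least-squares solver instead).
fn fit_ab(min_dist: f64, spread: f64) -> (f64, f64) {
    // 300 evenly spaced sample distances in [0, 3 * spread], as in umap-learn.
    let n = 300;
    let xs: Vec<f64> = (0..n).map(|i| 3.0 * spread * i as f64 / (n - 1) as f64).collect();
    let ys: Vec<f64> = xs.iter().map(|&x| target(x, min_dist, spread)).collect();

    let mut best = (f64::INFINITY, 0.0, 0.0); // (squared error, a, b)
    for ia in 0..=280 {
        let a = 0.2 + 0.01 * ia as f64; // a in [0.2, 3.0]
        for ib in 0..=200 {
            let b = 0.5 + 0.005 * ib as f64; // b in [0.5, 1.5]
            let sse: f64 = xs
                .iter()
                .zip(&ys)
                .map(|(&x, &y)| {
                    let q = 1.0 / (1.0 + a * x.powf(2.0 * b));
                    (q - y) * (q - y)
                })
                .sum();
            if sse < best.0 {
                best = (sse, a, b);
            }
        }
    }
    (best.1, best.2)
}

fn main() {
    let (a, b) = fit_ab(0.1, 1.0);
    println!("a = {a:.3}, b = {b:.3}");
}
```

For min_dist = 0.1 and spread = 1.0, umap-learn reports a ≈ 1.577 and b ≈ 0.895, and the grid search should land close to those values. The grid ranges are chosen for common min_dist values and may need widening for extreme settings.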

§Modules

| Module | Description |
|---|---|
| model | UMAPModel neural network and config builder |
| train | Training loop, UmapConfig, sparse training, loss computation |
| chart | 2-D scatter plots and loss curves (plotters, optional) |
| utils | Data generation, tensor conversion, normalisation |
| kernels | Custom CubeCL GPU kernels (Euclidean distance, k-NN) |
| backend | Backend trait extension for custom kernel dispatch |
| distances | CPU-side distance functions (Euclidean, cosine, Minkowski…) |
| serialize | Model weight serialization/deserialization |
| prelude | Re-exports of the most commonly used items |
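The distances module provides CPU-side distance functions, and the kernels module provides GPU k-NN. Neither module's exact API is shown here. The sketch below therefore uses hypothetical, dependency-free functions to show what the named metrics compute and how a brute-force k-NN search over them works. It does not reproduce fast_umap's signatures or its GPU implementation.

```rust
/// Euclidean distance (hypothetical helper, not fast_umap's API).
fn euclidean(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum::<f64>().sqrt()
}

/// Cosine distance 1 - cos(theta); returns 1.0 if either vector is all zeros.
fn cosine(x: &[f64], y: &[f64]) -> f64 {
    let dot: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let nx = x.iter().map(|a| a * a).sum::<f64>().sqrt();
    let ny = y.iter().map(|b| b * b).sum::<f64>().sqrt();
    if nx == 0.0 || ny == 0.0 { 1.0 } else { 1.0 - dot / (nx * ny) }
}

/// Minkowski distance of order p (p = 1: Manhattan, p = 2: Euclidean).
fn minkowski(x: &[f64], y: &[f64], p: f64) -> f64 {
    x.iter().zip(y).map(|(a, b)| (a - b).abs().powf(p)).sum::<f64>().powf(1.0 / p)
}

/// Brute-force k-NN: for each row, the indices of its k nearest other rows,
/// closest first. This is O(n^2 · d) work, the part a GPU kernel would parallelise.
fn knn<F: Fn(&[f64], &[f64]) -> f64>(data: &[Vec<f64>], k: usize, dist: F) -> Vec<Vec<usize>> {
    data.iter()
        .enumerate()
        .map(|(i, xi)| {
            // Distance from row i to every other row, sorted ascending.
            let mut cand: Vec<(f64, usize)> = data
                .iter()
                .enumerate()
                .filter(|&(j, _)| j != i)
                .map(|(j, xj)| (dist(&xi[..], &xj[..]), j))
                .collect();
            cand.sort_by(|p, q| p.0.total_cmp(&q.0));
            cand.into_iter().take(k).map(|(_, j)| j).collect()
        })
        .collect()
}

fn main() {
    let data = vec![vec![0.0, 0.0], vec![1.0, 0.0], vec![5.0, 5.0], vec![5.0, 6.0]];
    let by_euclid = knn(&data, 2, euclidean);
    let by_manhattan = knn(&data, 2, |x, y| minkowski(x, y, 1.0));
    println!("{by_euclid:?} {by_manhattan:?}");
    println!("cosine = {:.3}", cosine(&data[1], &data[2]));
}
```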

§Re-exports

pub use train::EpochProgress;
pub use train::GraphParams;
pub use train::LossReduction;
pub use train::ManifoldParams;
pub use train::Metric;
pub use train::OptimizationParams;
pub use train::TrainingConfig;
pub use train::TrainingConfigBuilder;
pub use train::UmapConfig;

§Modules

backend
chart
cpu_backend: CPU backend implementation
distances
kernels
macros
model
normalizer
prelude
serialize
train
utils

§Macros

print_if: Conditionally print a formatted message to stdout.
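A minimal sketch of how a macro with this description might be written follows. It assumes the macro takes a boolean condition followed by println!-style format arguments. print_if_sketch is a hypothetical name, and the real macro's parameters may differ. Because the format arguments sit inside the if, they are only evaluated when the condition holds.

```rust
/// Hypothetical re-creation of a print_if!-style macro: prints with println!
/// formatting only when the condition is true. The argument order is assumed.
macro_rules! print_if_sketch {
    ($cond:expr, $($arg:tt)*) => {
        if $cond {
            println!($($arg)*);
        }
    };
}

fn main() {
    let verbose = true;
    let epoch = 3;
    print_if_sketch!(verbose, "epoch {} finished", epoch);
    print_if_sketch!(!verbose, "this line is never printed");
}
```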

§Structs

FittedUmap: A fitted UMAP model containing the trained neural network and embedding.
UMAP: Legacy UMAP struct. Deprecated: use Umap and FittedUmap instead.
Umap: UMAP dimensionality reduction algorithm (GPU-accelerated, parametric).