fast-umap
UMAP (Uniform Manifold Approximation and Projection) in Rust
This repository contains a Rust implementation of UMAP (Uniform Manifold Approximation and Projection), a dimensionality reduction algorithm that preserves both the local and global structure of data. UMAP is widely used for visualizing high-dimensional data in 2D or 3D space.
This implementation leverages the burn machine learning framework, which provides automatic differentiation and GPU support, allowing you to train and apply UMAP models on high-dimensional datasets efficiently.
Features
- Dimensionality Reduction: Reduces high-dimensional data to a lower-dimensional space (e.g., 2D or 3D) for visualization or further analysis.
- Customizable UMAP Model: The model architecture can be configured with different numbers of input features, hidden layer sizes, and output dimensions.
- GPU Support: Powered by the burn framework, with support for training on CPU and GPU using the wgpu backend.
- Flexible Data Handling: Functions for converting between vectors and tensors, generating synthetic data, and more.
Roadmap
- Add k-neighbors element and add precalculated KNN tree to do faster calculations on high-dimensional data.
- Add MNIST dataset example
- Add different distance functions, not just euclidean
- Create testbench to compare different hyperparameters (patience vs n_features vs epochs etc.)
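To make the k-neighbors and distance-function items concrete, here is a minimal standalone sketch of a brute-force nearest-neighbor search with a pluggable distance function. It is not part of fast-umap: the names `euclidean`, `manhattan`, and `k_nearest` are hypothetical, and a precalculated KNN tree would replace the linear scan shown here.

```rust
/// Euclidean (L2) distance between two points of equal length.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Manhattan (L1) distance between two points of equal length.
fn manhattan(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Indices of the `k` points closest to `data[query]`, nearest first,
/// excluding the query point itself. Brute force: one distance
/// evaluation per other point, followed by a sort.
fn k_nearest(
    data: &[Vec<f64>],
    query: usize,
    k: usize,
    dist: fn(&[f64], &[f64]) -> f64,
) -> Vec<usize> {
    let mut scored: Vec<(usize, f64)> = data
        .iter()
        .enumerate()
        .filter(|(i, _)| *i != query)
        .map(|(i, p)| (i, dist(&data[query], p)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

Passing the distance as a function pointer means the same search works with any metric, for example `k_nearest(&data, 0, 5, manhattan)`.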
Installation
- Clone the repository:
git clone https://github.com/eugenehp/fast-umap.git
cd fast-umap
- Build the project:
cargo build --release
Example Usage
1. Fitting a UMAP Model
You can fit a UMAP model to your data using the UMAP::fit function. Here's how to do it:
use Autodiff;
use ;
use *;
2. Transforming Data
Once the UMAP model is trained, you can transform new high-dimensional data into its low-dimensional representation:
let transformed_data = model.transform(new_data);
This function will take new_data in the form of Vec<Vec<f64>> and return its 2D or 3D representation, suitable for visualization.
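Tensor backends usually store data as one contiguous buffer plus a shape, so nested vectors such as `Vec<Vec<f64>>` have to be flattened on the way in and split again on the way out. The sketch below illustrates that round trip with the standard library only. The helpers `flatten` and `unflatten` are hypothetical names, not the crate's own conversion functions.

```rust
/// Flattens row-major nested vectors into one contiguous buffer plus its
/// shape `(rows, cols)`. Returns `None` if the rows have unequal lengths.
fn flatten(rows: &[Vec<f64>]) -> Option<(Vec<f64>, (usize, usize))> {
    let cols = rows.first().map_or(0, |r| r.len());
    if rows.iter().any(|r| r.len() != cols) {
        return None;
    }
    let flat: Vec<f64> = rows.iter().flatten().copied().collect();
    Some((flat, (rows.len(), cols)))
}

/// Splits a flat row-major buffer back into rows of `cols` values each.
fn unflatten(flat: &[f64], cols: usize) -> Vec<Vec<f64>> {
    if cols == 0 {
        // `chunks(0)` would panic, so an empty shape yields no rows.
        return Vec::new();
    }
    flat.chunks(cols).map(|c| c.to_vec()).collect()
}
```

Rejecting ragged input up front keeps shape errors out of the tensor code, where they are harder to diagnose.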
3. Generating Test Data
You can generate synthetic test data to experiment with the UMAP model using the generate_test_data function:
let data = generate_test_data(100, 50); // 100 samples, each with 50 features
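For readers who want to see what such a generator might look like, here is a dependency-free sketch that produces uniform values in [0, 1). It is an illustration only: `synthetic_data` and `XorShift64` are hypothetical names, the `Vec<Vec<f64>>` return type is an assumption, and the crate's own `generate_test_data` may use a different random source and layout.

```rust
/// Tiny deterministic generator (xorshift64), used here only to avoid
/// external crates. Not suitable for anything security-related.
struct XorShift64(u64);

impl XorShift64 {
    /// Next pseudo-random value in the half-open range [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Keep the top 53 bits so the value fits an f64 mantissa exactly.
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Builds `num_samples` rows of `num_features` uniform values each.
/// The same seed always yields the same data.
fn synthetic_data(num_samples: usize, num_features: usize, seed: u64) -> Vec<Vec<f64>> {
    // xorshift must not start at zero, or it stays at zero forever.
    let mut rng = XorShift64(seed.max(1));
    (0..num_samples)
        .map(|_| (0..num_features).map(|_| rng.next_f64()).collect())
        .collect()
}
```

A fixed seed makes experiments repeatable, which helps when comparing runs with different hyperparameters.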
4. Visualizing Data
After transforming the data to a 2D or 3D space, you can use external charting libraries (e.g., plotters in Rust or matplotlib in Python) to visualize the results.
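One simple way to hand the embedding to an external plotting tool is to write it out as CSV. The sketch below assumes the transformed data is available as `Vec<Vec<f64>>` (one row per point), which is an assumption about the output layout. `embedding_to_csv` is a hypothetical helper, not part of the crate.

```rust
/// Formats each point as one comma-separated line, so a 2D embedding
/// yields lines of the form `x,y` and a 3D embedding `x,y,z`.
fn embedding_to_csv(points: &[Vec<f64>]) -> String {
    points
        .iter()
        .map(|p| {
            p.iter()
                .map(|v| v.to_string())
                .collect::<Vec<_>>()
                .join(",")
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

The resulting string can be saved with `std::fs::write("embedding.csv", csv)` and loaded from Python or any spreadsheet tool.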
Model Configuration
The UMAP model configuration is customizable through the UMAPModelConfigBuilder. You can set the following parameters:
- input_size: Number of input features (i.e., the dimensionality of the data).
- hidden_size: The number of neurons in the hidden layers.
- output_size: The target number of dimensions (typically 2 or 3 for visualization).
Here's how to configure and build the model:
let model_config = UMAPModelConfigBuilder::default()
    .input_size(50) // Input features: 50 dimensions
    .hidden_size(100) // Hidden layer size: 100 neurons
    .output_size(2) // Output size: 2 (for 2D visualization)
    .build()
    .unwrap();
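The chained setters followed by a build step and an unwrap follow the common Rust builder pattern, where the build step returns a `Result` so that a missing field is reported instead of silently defaulted. The standalone sketch below shows the idea with hypothetical stand-in types (`SketchConfig`, `SketchConfigBuilder`). It is not the crate's `UMAPModelConfigBuilder`, whose validation rules may differ.

```rust
/// Hypothetical stand-in for a model configuration; not the crate's type.
#[derive(Debug, Clone, PartialEq)]
struct SketchConfig {
    input_size: usize,
    hidden_size: usize,
    output_size: usize,
}

/// Collects optional fields and validates them in `build`.
#[derive(Default)]
struct SketchConfigBuilder {
    input_size: Option<usize>,
    hidden_size: Option<usize>,
    output_size: Option<usize>,
}

impl SketchConfigBuilder {
    fn input_size(mut self, n: usize) -> Self {
        self.input_size = Some(n);
        self
    }

    fn hidden_size(mut self, n: usize) -> Self {
        self.hidden_size = Some(n);
        self
    }

    fn output_size(mut self, n: usize) -> Self {
        self.output_size = Some(n);
        self
    }

    /// Fails with a message naming the first field that was never set.
    fn build(self) -> Result<SketchConfig, String> {
        Ok(SketchConfig {
            input_size: self.input_size.ok_or("input_size must be set")?,
            hidden_size: self.hidden_size.ok_or("hidden_size must be set")?,
            output_size: self.output_size.ok_or("output_size must be set")?,
        })
    }
}
```

In application code, matching on the `Result` rather than unwrapping gives a clearer error when a field is forgotten.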
Training the UMAP Model
You can train the UMAP model on your dataset by calling the fit method on the UMAP struct. The training process will optimize the model's weights to reduce the data's dimensionality.
Training configuration parameters include:
- epochs: The number of epochs to train the model.
- batch_size: The number of samples per training batch.
- learning_rate: The learning rate for gradient descent.
- beta1, beta2: Optimizer hyperparameters for Adam optimization.
For example:
let model = UMAP::<Autodiff<Wgpu>>::fit(data, WgpuDevice::default());
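To show what learning_rate, beta1, and beta2 control, here is a minimal scalar version of the Adam update rule, applied to the toy objective f(x) = x^2. This is a textbook sketch for intuition only: the `Adam` struct and `minimize_square` function are hypothetical, the epsilon value is an assumed default, and the crate itself relies on the burn framework's optimizer rather than this code.

```rust
/// State and hyperparameters for a scalar Adam optimizer.
struct Adam {
    learning_rate: f64,
    beta1: f64,   // decay rate for the running mean of gradients
    beta2: f64,   // decay rate for the running mean of squared gradients
    epsilon: f64, // small constant that avoids division by zero
    m: f64,
    v: f64,
    t: i32, // number of steps taken so far
}

impl Adam {
    fn new(learning_rate: f64, beta1: f64, beta2: f64) -> Self {
        Adam { learning_rate, beta1, beta2, epsilon: 1e-8, m: 0.0, v: 0.0, t: 0 }
    }

    /// Returns the parameter value after one update with gradient `grad`.
    fn step(&mut self, param: f64, grad: f64) -> f64 {
        self.t += 1;
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * grad;
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grad * grad;
        // Bias correction compensates for m and v starting at zero.
        let m_hat = self.m / (1.0 - self.beta1.powi(self.t));
        let v_hat = self.v / (1.0 - self.beta2.powi(self.t));
        param - self.learning_rate * m_hat / (v_hat.sqrt() + self.epsilon)
    }
}

/// Minimizes f(x) = x^2, whose gradient is 2x, for `epochs` steps.
fn minimize_square(start: f64, epochs: usize, opt: &mut Adam) -> f64 {
    let mut x = start;
    for _ in 0..epochs {
        x = opt.step(x, 2.0 * x);
    }
    x
}
```

Because the update divides by the root of the squared-gradient average, the first step has a size close to learning_rate regardless of how large the gradient is.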
Examples
Simple
cargo run --example simple
Sample code:
use *;
use Rng;
Generates this plot:

Advanced
cargo run --example advanced
Sample code:
use AutodiffModule;
use ;
use ;
It also generates a plot and a loss chart:

License
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright
2024, Eugene Hauptmann