§PaCMAP: Pairwise Controlled Manifold Approximation
This crate provides a Rust implementation of PaCMAP (Pairwise Controlled Manifold Approximation), a dimensionality reduction technique that preserves both the local and global structure of high-dimensional data.
PaCMAP transforms high-dimensional data into a lower-dimensional representation while preserving important relationships between points. This is useful for visualization, analysis, and as a preprocessing step for other algorithms.
§Key Features
PaCMAP preserves both local and global structure through three types of point relationships:
- Nearest neighbor pairs preserve local structure
- Mid-near pairs preserve intermediate structure
- Far pairs prevent collapse and maintain separation
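To make the three roles concrete, the sketch below writes out the per-pair loss terms described in the PaCMAP paper (Wang et al., 2021), where d̃ is the squared low-dimensional distance plus one. It illustrates those formulas only; it is not code from this crate, whose internal loss and gradient code may be organized differently, and the names `d_tilde`, `nn_loss`, `mn_loss`, and `fp_loss` are hypothetical.

```rust
/// Squared Euclidean distance between two low-dimensional points, plus one
/// (the "d-tilde" term in the PaCMAP loss).
fn d_tilde(a: &[f32], b: &[f32]) -> f32 {
    1.0 + a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>()
}

/// Nearest-neighbor pair loss: rises with distance but saturates,
/// so minimizing it pulls neighbors together.
fn nn_loss(a: &[f32], b: &[f32]) -> f32 {
    let d = d_tilde(a, b);
    d / (10.0 + d)
}

/// Mid-near pair loss: same shape with a much larger constant,
/// so the attraction acts over a longer range.
fn mn_loss(a: &[f32], b: &[f32]) -> f32 {
    let d = d_tilde(a, b);
    d / (10_000.0 + d)
}

/// Far pair loss: falls as distance grows, so minimizing it pushes pairs apart.
fn fp_loss(a: &[f32], b: &[f32]) -> f32 {
    1.0 / (1.0 + d_tilde(a, b))
}

fn main() {
    // Squared distance 25, so d_tilde = 26.
    let (a, b) = ([0.0_f32, 0.0], [3.0_f32, 4.0]);
    println!("NN loss: {}", nn_loss(&a, &b)); // 26 / 36
    println!("MN loss: {}", mn_loss(&a, &b)); // 26 / 10026
    println!("FP loss: {}", fp_loss(&a, &b)); // 1 / 27
}
```

Because the near and mid-near terms saturate while the far term decays, no single pair type can dominate the total loss at large distances.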
The implementation provides:
- Configurable optimization with adaptive learning rates via Adam optimization
- Phase-based weight schedules to balance local and global preservation
- Multiple initialization options including PCA and random seeding
- Optional snapshot capture of intermediate states
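The phase-based weight schedule can be sketched as follows. The constants (a near-pair weight of 2, then 3, then 1; a mid-near weight decaying linearly from 1000 to 3, held at 3, then switched off; a constant far-pair weight of 1) follow the original Python implementation. Whether this crate uses exactly these values is an assumption, and `Weights` and `weights_at` are hypothetical names.

```rust
/// Pair-type weights (nearest-neighbor, mid-near, far) for one iteration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Weights {
    nn: f32,
    mn: f32,
    fp: f32,
}

/// Three-phase schedule modeled on the Python reference implementation.
/// `num_iters` holds the length of each phase; `w_mn_init` is the starting
/// mid-near weight (assumed to be 1000.0, as in the reference code).
fn weights_at(itr: usize, num_iters: (usize, usize, usize), w_mn_init: f32) -> Weights {
    let (phase1, phase2, _phase3) = num_iters;
    if itr < phase1 {
        // Phase 1: mid-near weight decays linearly from w_mn_init toward 3.0,
        // emphasizing global layout early on.
        let t = itr as f32 / phase1 as f32;
        Weights { nn: 2.0, mn: (1.0 - t) * w_mn_init + t * 3.0, fp: 1.0 }
    } else if itr < phase1 + phase2 {
        // Phase 2: fixed, balanced weights.
        Weights { nn: 3.0, mn: 3.0, fp: 1.0 }
    } else {
        // Phase 3: mid-near pairs switched off to refine local structure.
        Weights { nn: 1.0, mn: 0.0, fp: 1.0 }
    }
}

fn main() {
    let schedule = (100, 100, 250); // the documented default `num_iters`
    for itr in [0, 50, 150, 300] {
        println!("iter {itr:>3}: {:?}", weights_at(itr, schedule, 1000.0));
    }
}
```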
§Examples
Basic usage with default parameters:
use ndarray::Array2;
use pacmap::{Configuration, fit_transform};
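// Placeholder input: 1,000 points in 50 dimensions. Replace with your own data.
let data: Array2<f32> = Array2::from_shape_fn((1000, 50), |(i, j)| ((i * 7 + j * 13) % 101) as f32);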
let config = Configuration::default();
let (embedding, _) = fit_transform(data.view(), config).unwrap();
Customized embedding:
use pacmap::{Configuration, Initialization};
let config = Configuration::builder()
.embedding_dimensions(3)
.initialization(Initialization::Random(Some(42)))
.learning_rate(0.8)
.num_iters((50, 50, 100))
.mid_near_ratio(0.3)
.far_pair_ratio(2.0)
.build();
Capturing intermediate states:
use pacmap::Configuration;
let config = Configuration::builder()
.snapshots(vec![100, 200, 300])
.build();
§Configuration
Core parameters:
- embedding_dimensions: Output dimensionality (default: 2)
- initialization: How to initialize coordinates:
  - Pca - Project data using PCA (default)
  - Value(array) - Use provided coordinates
  - Random(seed) - Random initialization with optional seed
- learning_rate: Learning rate for Adam optimizer (default: 1.0)
- num_iters: Iteration counts for three optimization phases (default: (100, 100, 250))
- snapshots: Optional vector of iterations at which to save embedding states
- approx_threshold: Number of points above which approximate neighbor search is used
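learning_rate scales each Adam step. For reference, a textbook Adam update over a flat coordinate vector is sketched below. The hyperparameters beta1 = 0.9, beta2 = 0.999 and eps = 1e-7 are assumed values, and the `Adam` type is hypothetical rather than part of this crate's API.

```rust
/// Minimal Adam optimizer state for a flat parameter vector
/// (for example, embedding coordinates laid out row by row).
struct Adam {
    lr: f32,
    beta1: f32,
    beta2: f32,
    eps: f32,
    m: Vec<f32>, // running first-moment (mean) estimate of the gradient
    v: Vec<f32>, // running second-moment (uncentered variance) estimate
    t: i32,      // step counter used for bias correction
}

impl Adam {
    fn new(lr: f32, n: usize) -> Self {
        // Assumed default hyperparameters, not read from the crate.
        Adam { lr, beta1: 0.9, beta2: 0.999, eps: 1e-7, m: vec![0.0; n], v: vec![0.0; n], t: 0 }
    }

    /// One in-place step: params -= lr * m_hat / (sqrt(v_hat) + eps).
    fn step(&mut self, params: &mut [f32], grad: &[f32]) {
        self.t += 1;
        let bc1 = 1.0 - self.beta1.powi(self.t);
        let bc2 = 1.0 - self.beta2.powi(self.t);
        for i in 0..params.len() {
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * grad[i];
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * grad[i] * grad[i];
            let m_hat = self.m[i] / bc1;
            let v_hat = self.v[i] / bc2;
            params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
        }
    }
}

fn main() {
    // Take Adam steps on f(x) = x^2 (gradient 2x), starting from x = 5.
    let mut x = vec![5.0_f32];
    let mut opt = Adam::new(1.0, 1); // 1.0 mirrors the documented default learning rate
    for _ in 0..200 {
        let g = vec![2.0 * x[0]];
        opt.step(&mut x, &g);
    }
    println!("x after 200 steps: {}", x[0]);
}
```

Because Adam normalizes by the gradient's running magnitude, the first step moves each coordinate by roughly `lr` regardless of gradient scale, which is why a learning rate of 1.0 is reasonable in embedding units.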
Pair sampling parameters:
- mid_near_ratio: Ratio of mid-near to nearest neighbor pairs (default: 0.5)
- far_pair_ratio: Ratio of far to nearest neighbor pairs (default: 2.0)
- override_neighbors: Optional fixed neighbor count override
- seed: Optional random seed for reproducible sampling
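Both ratios are relative to the nearest-neighbor count. A minimal sketch of how per-point pair counts could be derived is shown below, assuming the rule from the original Python implementation (10 neighbors for up to 10,000 points, otherwise 10 + 15 * (log10(n) - 4), rounded) applies when override_neighbors is not set. This crate may compute the default differently, and `default_neighbors` and `pair_counts` are hypothetical names.

```rust
/// Default neighbor count as in the original Python PaCMAP (assumed here).
fn default_neighbors(n_samples: usize) -> usize {
    if n_samples <= 10_000 {
        10
    } else {
        (10.0 + 15.0 * ((n_samples as f64).log10() - 4.0)).round() as usize
    }
}

/// Returns (nearest-neighbor, mid-near, far) pair counts per point.
fn pair_counts(
    n_samples: usize,
    override_neighbors: Option<usize>,
    mid_near_ratio: f32,
    far_pair_ratio: f32,
) -> (usize, usize, usize) {
    // A fixed override wins; otherwise fall back to the size-based default.
    let n_nn = override_neighbors.unwrap_or_else(|| default_neighbors(n_samples));
    let n_mn = (n_nn as f32 * mid_near_ratio).round() as usize;
    let n_fp = (n_nn as f32 * far_pair_ratio).round() as usize;
    (n_nn, n_mn, n_fp)
}

fn main() {
    // Documented defaults: mid_near_ratio = 0.5, far_pair_ratio = 2.0.
    for n in [5_000, 1_000_000] {
        println!("{n} points -> (nn, mn, fp) = {:?}", pair_counts(n, None, 0.5, 2.0));
    }
}
```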
§Feature Flags
§BLAS/LAPACK Backends
Only one BLAS/LAPACK backend feature should be enabled at a time. A backend is required for PCA operations, except on macOS, which uses Accelerate by default.
- intel-mkl-static - Static linking with Intel MKL
- intel-mkl-system - Dynamic linking with system Intel MKL
- openblas-static - Static linking with OpenBLAS
- openblas-system - Dynamic linking with system OpenBLAS
- netlib-static - Static linking with Netlib
- netlib-system - Dynamic linking with system Netlib
For more details on BLAS/LAPACK configuration, see the ndarray-linalg documentation.
§Performance Features
- simsimd - Enable SIMD optimizations in USearch for faster approximate nearest neighbor search. Requires GCC 13+ for compilation and a recent glibc at runtime.
§Implementation Notes
- Supports both exact and approximate nearest neighbor search
- Uses Euclidean distances for pair relationships
- Leverages ndarray for efficient matrix operations
- Employs parallel iterators via rayon for performance
- Provides detailed error handling with custom error types
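As an illustration of the exact, Euclidean end of the neighbor search, the sketch below finds each point's k nearest neighbors by brute force. It is a plain O(n² · d) example, not the crate's `knn` module; `exact_knn` is a hypothetical name, and large inputs are the case where the approximate (USearch-based) path would be used instead.

```rust
/// Exact k-nearest-neighbor search by brute force with Euclidean distance.
/// Returns, for each point, the indices of its `k` closest other points,
/// nearest first (ties broken by index).
fn exact_knn(points: &[Vec<f32>], k: usize) -> Vec<Vec<usize>> {
    points
        .iter()
        .enumerate()
        .map(|(i, p)| {
            // Squared distances are enough for ranking because sqrt is monotonic.
            let mut dists: Vec<(f32, usize)> = points
                .iter()
                .enumerate()
                .filter(|&(j, _)| j != i)
                .map(|(j, q)| {
                    let d2: f32 = p.iter().zip(q).map(|(a, b)| (a - b) * (a - b)).sum();
                    (d2, j)
                })
                .collect();
            dists.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
            dists.into_iter().take(k).map(|(_, j)| j).collect()
        })
        .collect()
}

fn main() {
    let points: Vec<Vec<f32>> = vec![
        vec![0.0, 0.0],
        vec![1.0, 0.0],
        vec![0.0, 2.0],
        vec![5.0, 5.0],
    ];
    for (i, nbrs) in exact_knn(&points, 2).iter().enumerate() {
        println!("point {i}: neighbors {nbrs:?}");
    }
}
```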
§References
Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMap, and PaCMAP for Data Visualization. Wang, Y., Huang, H., Rudin, C., & Shaposhnik, Y. (2021). Journal of Machine Learning Research, 22(201), 1-73.
Original Python implementation: https://github.com/YingfanWang/PaCMAP
Modules§
- knn - K-nearest neighbor computation for PaCMAP dimensionality reduction.
Structs§
- Configuration - Configuration options for the PaCMAP embedding process.
- ConfigurationBuilder - Use builder syntax to set the inputs and finish with build().
Enums§
- Initialization - Methods for initializing the embedding coordinates.
- PaCMapError - Errors that can occur during PaCMAP embedding.
- PairConfiguration - Strategy for sampling pairs during optimization.
Functions§
- fit_transform - Reduces dimensionality of input data using PaCMAP.