pacmap 0.2.4

Pairwise Controlled Manifold Approximation (PaCMAP) for dimensionality reduction
Documentation
# pacmap

[![Crates.io](https://img.shields.io/crates/v/pacmap)](https://crates.io/crates/pacmap)
[![Documentation](https://docs.rs/pacmap/badge.svg)](https://docs.rs/pacmap)
[![CI](https://github.com/beamform/pacmap-rs/actions/workflows/ci.yml/badge.svg)](https://github.com/beamform/pacmap-rs/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)

A Rust implementation of PaCMAP (Pairwise Controlled Manifold Approximation) for dimensionality reduction based on the
original [Python implementation](https://github.com/YingfanWang/PaCMAP).

## Overview

Dimensionality reduction transforms high-dimensional data into a lower-dimensional representation while preserving
important relationships between points. This is useful for visualization, analysis, and as preprocessing for other
algorithms.

PaCMAP is a relatively recent dimensionality reduction technique that preserves both local and global structure through
three types of point relationships:

- Nearest neighbor pairs preserve local structure
- Mid-near pairs preserve intermediate structure
- Far pairs prevent collapse and maintain separation

For details on the algorithm, see the [original paper](https://jmlr.org/papers/v22/20-1061.html).

## Features

- Fast approximate nearest neighbors for large datasets using USearch
- SIMD-optimized distance calculations
- Parallel processing with Rayon
- Optional PCA initialization using various BLAS backends
- Reproducible results with optional seeding

## Usage

Basic usage with default parameters:

```rust
use anyhow::Result;
use ndarray::Array2;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;
use pacmap::{Configuration, fit_transform};

fn main() -> Result<()> {
    // Your high-dimensional data as an n × d array
    let n_samples = 1000;
    let n_features = 1000;
    let mut data = Array2::random((n_samples, n_features), Uniform::new(-1.0, 1.0));

    let config = Configuration::default();
    let (embedding, _) = fit_transform(data.view(), config)?;

    // embedding is now an n × 2 array
    Ok(())
}
```

Customized embedding:

```rust
use anyhow::Result;
use pacmap::{Configuration, Initialization};

fn main() -> Result<()> {
    let config = Configuration::builder()
        .embedding_dimensions(3)
        .initialization(Initialization::Random(Some(42)))
        .learning_rate(0.8)
        .num_iters((50, 50, 100))
        .mid_near_ratio(0.3)
        .far_pair_ratio(2.0)
        .approx_threshold(8_000)  // Use approximate neighbors above this size
        .build();

    let (embedding, _) = fit_transform(data.view(), config)?;
    Ok(())
}
```

Capturing intermediate states:

```rust
use anyhow::Result;
use pacmap::Configuration;

fn main() -> Result<()> {
    let config = Configuration::builder()
        .snapshots(vec![100, 200, 300])
        .build();

    let (embedding, Some(states)) = fit_transform(data.view(), config)?;

    // states is now an s × n × d array where s is the number of snapshots
    Ok(())
}
```

For a standalone example, see the [pacmap-rs-example repository](https://github.com/beamform/pacmap-rs-example).

## Configuration

### Core Parameters

- `embedding_dimensions`: Output dimensionality (default: 2)
- `initialization`: How to initialize coordinates:
    - `Pca` - Project data using PCA (default)
    - `Value(array)` - Use provided coordinates
    - `Random(seed)` - Random initialization with optional seed
- `learning_rate`: Learning rate for Adam optimizer (default: 1.0)
- `num_iters`: Iteration counts for three optimization phases (default: (100, 100, 250)):
    1. Mid-near weight reduction phase
    2. Balanced weight phase
    3. Local structure focus phase
- `snapshots`: Optional vector of iterations at which to save embedding states
- `approx_threshold`: Number of samples above which to use approximate nearest neighbors (default: 8,000)

### Pair Sampling Parameters

- `mid_near_ratio`: Ratio of mid-near to nearest neighbor pairs (default: 0.5)
- `far_pair_ratio`: Ratio of far to nearest neighbor pairs (default: 2.0)
- `override_neighbors`: Optional fixed neighbor count override (default: None, auto-scaled with dataset size)
- `seed`: Optional random seed for reproducible sampling and initialization

### Pair Configuration

- `PairConfiguration::Generate` - Generate all pairs from scratch (default)
- `PairConfiguration::NeighborsProvided { pair_neighbors }` - Use provided nearest neighbors, generate remaining pairs
- `PairConfiguration::AllProvided { pair_neighbors, pair_mn, pair_fp }` - Use all provided pairs

## Feature Flags

### BLAS/LAPACK Backends

Only one BLAS/LAPACK backend feature should be enabled at a time.
These are required for PCA operations except on macOS which uses Accelerate by default.

- `intel-mkl-static` - Static linking with Intel MKL
- `intel-mkl-system` - Dynamic linking with system Intel MKL
- `openblas-static` - Static linking with OpenBLAS
- `openblas-system` - Dynamic linking with system OpenBLAS
- `netlib-static` - Static linking with Netlib
- `netlib-system` - Dynamic linking with system Netlib

For example:

```toml
[dependencies]
pacmap = { version = "0.2", features = ["openblas-static"] }
```

See [ndarray-linalg's documentation](https://github.com/rust-ndarray/ndarray-linalg#backend-features) for detailed
information about BLAS/LAPACK configuration and performance considerations.

### Performance Features

- `simsimd` - Enable SIMD optimizations in USearch for faster approximate nearest neighbor search.
  Requires GCC 13+ for compilation and a recent glibc at runtime.

## Limitations

This implementation currently:

- Only supports Euclidean distances
- Does not support incremental transform

## References

[Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMap, and PaCMAP for Data Visualization](https://jmlr.org/papers/v22/20-1061.html).
Wang, Y., Huang, H., Rudin, C., & Shaposhnik, Y. (2021). Journal of Machine Learning Research, 22(201), 1-73.

## License

Apache License, Version 2.0