evoc-rs 0.1.1

Rust port of the EVoC clustering algorithm for high dimensional data
Documentation

CI Crates.io License: MIT

evoc-rs

Rust implementation of EVoC (Embedding Vector Oriented Clustering): efficient density-based clustering of high-dimensional embedding vectors (CLIP, sentence transformers, and the like).

EVoC combines a UMAP-like node embedding with HDBSCAN-style density estimation and multi-layer persistence analysis to produce stable, hierarchical clusterings without requiring you to specify the number of clusters up front.

This crate is a port of the original Python implementation by Leland McInnes at the Tutte Institute, extended with support for multiple approximate nearest neighbour backends via ann-search-rs.

How it works

  1. kNN graph — approximate nearest-neighbour search via a selectable ANN backend (NNDescent, HNSW, or a precomputed graph).
  2. Fuzzy simplicial set — the kNN graph is smoothed and symmetrised into a weighted undirected graph.
  3. Node embedding — a low-dimensional layout is optimised using the EVoC gradient, a modified UMAP repulsion term controlled by a noise_level parameter.
  4. MST — a minimum spanning tree is built over the embedding using mutual reachability distances.
  5. Cluster layers — a hierarchy of clusterings is extracted via persistence analysis, returning the most stable granularities. Alternatively, a target number of clusters can be specified and the algorithm will binary-search for the closest match.

Installation

Just add the dependency to your cargo.toml file:

[dependencies]
evoc-rs = "*"

Usage

Here is how you would use it.

use evoc_rs::{evoc, EvocParams};
use manifolds_rs::data::nearest_neighbours::NearestNeighbourParams;

// data is a faer MatRef<f32> with shape (n_points, n_features)
let params = EvocParams::default();
let nn_params = NearestNeighbourParams::default();

let result = evoc(
    data.as_ref(),
    "nndescent".to_string(),
    None,          // precomputed kNN (None = build from data)
    &params,
    &nn_params,
    42,            // seed
    true,          // verbose
);

// Best clustering by persistence score
let labels = result.best_labels();
println!("Found {} clusters.", result.n_clusters());

// Full layer hierarchy, finest first
for (i, layer) in result.cluster_layers.iter().enumerate() {
    println!("Layer {}: {} labels", i, layer.len());
}

If you already know roughly how many clusters you want:

let params = EvocParams {
    approx_n_clusters: Some(10),
    ..EvocParams::default()
};

Credits

Based on the original EVoC algorithm and Python implementation by Leland McInnes at the Tutte Institute for Mathematics and Computing.

Licence

This project is licensed under the MIT Licence for original contributions – see LICENSE for details.

Portions of this work are derived from the EVoC project by the Tutte Institute for Mathematics and Computing, licensed under the BSD 2-Clause Licence; see LICENSE-THIRD-PARTY for details.