evoc 0.0.1

Embedding Vector Oriented Clustering — fast clustering of high-dimensional embedding vectors (Rust port of EVōC)
docs.rs failed to build evoc-0.0.1
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

evoc (Rust)

License: BSD-2-Clause Rust

evoc is a Rust implementation of EVōCEmbedding Vector Oriented Clustering for fast clustering of high-dimensional vectors (CLIP, sentence-transformers, document embeddings, images, etc.).

The API mirrors the Python library: build a kNN graph, learn a low-dimensional node embedding on that graph, then run density-based clustering (Borůvka MST + HDBSCAN-style condensed tree) with optional multi-resolution cluster layers.

Documentation

Document Description
ARCHITECTURE.md Pipeline, module map, parity, backends
examples/README.md Runnable examples and figures
CONTRIBUTING.md Build, test, parity workflow
CHANGELOG.md Version history
AUTHORS.md Upstream and Rust port attribution
CITATION.md How to cite PLSCAN, EVōC, and this crate
LICENSE BSD-2-Clause
NOTICE.md Third-party dependency licenses

Installation

Add to Cargo.toml:

[dependencies]
evoc = "0.0.1"
ndarray = "0.16"

Or from git:

evoc = { git = "https://github.com/eugenehp/evoc-rs" }

Build from source:

cargo build --release

Quick start

use evoc::Evoc;
use ndarray::Array2;

// One row per vector; L2-normalize rows for cosine kNN on f32 data.
let data: Array2<f32> = /* ... */ Array2::zeros((10_000, 128));

let mut clusterer = Evoc {
    random_state: Some(42),
    ..Evoc::default()
};
let labels = clusterer.fit_predict(data)?;

let n_clusters = labels
    .iter()
    .filter(|&&l| l >= 0)
    .collect::<std::collections::HashSet<_>>()
    .len();
println!("clusters: {n_clusters}, layers: {}", clusterer.cluster_layers_.len());

// Learned graph embedding (after fit_predict):
let _embedding = clusterer.embedding_.clone();

Examples

Command Description
cargo run --release --example cluster_in_memory Synthetic blobs, no files
cargo run --release --example user_clustering -- data.npy 42 Your .npy matrix
cargo run --release --example bbc_news_clustering BBC News (5 topics)
cargo run --release --example news_clustering -- 3000 42 20 Newsgroups
cargo run --release --example fashion_mnist_clustering -- 3000 42 Fashion-MNIST
cargo run --release --bin mnist_labels -- --mnist 3000 42 labels.npy MNIST download + cluster
Example Figure
Synthetic blobs synthetic
MNIST mnist
Fashion-MNIST fashion
BBC News (5 topics) bbc
20 Newsgroups news

Regenerate figures: python3 examples/render_readme_figures.py (requires matplotlib + scikit-learn).

Binaries

Binary Purpose
bench Time fit_predict on a float32 .npy matrix
bench_backends Compare wall time across strict + RLX backends — see benches/README.md
bench_huge Synthetic scale benchmarks
mnist_fetch / fashion_mnist_fetch Download, subsample, write .npy
mnist_labels Cluster from .npy, --mnist, or --fashion-mnist
emb_epoch_diff UMAP epoch parity vs Python dumps

Supported input types

Type Distance API
f32 (L2-normalized rows) Cosine Evoc::fit_predict(Array2<f32>)
i8 Quantized cosine knn_graph(EmbeddingData::Int8(...))
u8 (packed bits/row) Bitwise Jaccard knn_graph(EmbeddingData::Binary(...))

Features

Cargo feature Description
full (default) cluster + npy + datasets
cluster Full Evoc API (implies embedinitgraphknn)
knn kNN graph (rlx-cpu fast path; C reference when deterministic)
graph Fuzzy neighbor graph from kNN
init Label-propagation initialization
embed UMAP-style node embedding
npy .npy / .npz load helpers (parity, benchmarks)
datasets MNIST, Fashion-MNIST, BBC News, 20 Newsgroups download helpers
rlx-cpu RLX CPU backend (no extra rlx crate)
rlx-cuda RLX CUDA backend (NVIDIA)
rlx-mlx RLX MLX backend (Apple Silicon)
rlx-rocm RLX ROCm backend (AMD)
rlx-wgpu RLX wgpu backend (cross-platform GPU)
rlx-all Convenience: all rlx-* backends
rlx_metal Alias for rlx-mlx

Minimal dependency (clustering only):

evoc = { version = "0.0.1", default-features = false, features = ["cluster"] }

Enable only the backend you need (each feature is independent):

evoc = { version = "0.0.1", default-features = false, features = ["cluster", "rlx-cuda"] }

Run one backend at a time:

EVOC_BACKEND=cuda cargo run --release --bin bench --features "cluster,npy,rlx-cuda" -- tests/fixtures/small_200/data.npy
./scripts/bench_backends.sh large_2000   # table: strict + each compiled backend
cargo run --release --bin bench_backends --features "cluster,npy,bench-json,rlx-all" -- large_2000 --json
cargo bench --features "cluster,npy,rlx-all" --bench evoc_bench
EVOC_BACKEND=cuda cargo bench --features "cluster,npy,rlx-cuda" --bench evoc_bench

Set EVOC_BACKEND=strict|cpu|cuda|mlx|metal|rocm|wgpu|gpu or pass compute_backend on Evoc. Requesting a backend without its Cargo feature returns an error (no silent fallback).

Parity with Python EVōC

Golden parity fixtures live in the git repository under tests/fixtures/ (not included in the crates.io package):

export EVOC_ROOT=/path/to/python/evoc
.venv-parity/bin/python3 scripts/generate_parity_fixtures.py
cargo test --release
Test Status
graph_parity Exact fuzzy graph from fixture kNN
parity_knn_* Bit-exact kNN
parity_intermediates Staged init/embedding tolerance
parity_labels_* 0 label mismatches on goldens

With Evoc.parity_graph_coo, graph/init checkpoints match Python on reference fixtures. Ad-hoc large datasets may still differ slightly without those intermediates (see CHANGELOG.md).

Python reference

Upstream: TutteInstitute/evoc · Read the Docs

Set EVOC_ROOT (or EVOC_PARITY_ROOT) for parity scripts in scripts/.

Citation

If you use this software in research, cite:

  1. PLSCAN — algorithm paper (arXiv:2512.16558)
  2. EVōC (Python) — reference implementation
  3. evoc-rs (Rust) — this crate when you run the Rust implementation

Full text, BibTeX (evoc_rs2026), and CITATION.cff: CITATION.md · CITATION.bib

Authors

License

BSD-2-Clause — see LICENSE. Same license family as upstream EVōC.

Third-party crate licenses: NOTICE.md.