EVoC - Embedding Vector Oriented Clustering
Efficient clustering of high-dimensional embedding vectors (e.g. from CLIP or sentence transformers) by combining a UMAP-like node embedding with HDBSCAN-style density-based clustering and multi-layer persistence analysis. This is the Rust port, which supports different approximate nearest neighbour search algorithms (for details, see ann-search-rs). It is based on the original Python implementation by Leland McInnes: evoc
Modules§
- clustering
- This module contains the clustering-related submodules, functions and utilities, namely the generation of minimum spanning trees (MST), KD trees and the core functions for density-based clustering.
- graph
- This module contains the graph-related functions: kNN graph generation, fuzzy graph generation, label propagation and embedding optimisation.
- prelude
- Re-exports of commonly used types, traits, structures and functions across the crate.
- utils
- Utilities such as shared traits, disjoint sets, sparse structures and matrix multiplications.
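
The module summaries mention minimum spanning trees and disjoint sets. As a rough illustration of how the two fit together, here is a minimal sketch of a union-find structure driving Kruskal's algorithm. This is not the crate's implementation (which may well use a different MST algorithm, for instance one accelerated by the KD trees); the names `DisjointSet` and `kruskal_mst` are hypothetical and chosen only for this example.

```rust
use std::cmp::Ordering;

/// Hypothetical disjoint-set (union-find) with path compression and union by rank.
struct DisjointSet {
    parent: Vec<usize>,
    rank: Vec<u8>,
}

impl DisjointSet {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect(), rank: vec![0; n] }
    }

    /// Returns the representative of the set containing `x`.
    fn find(&mut self, mut x: usize) -> usize {
        // First pass: walk up to the root.
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        // Second pass: point every node on the path directly at the root.
        while self.parent[x] != root {
            let next = self.parent[x];
            self.parent[x] = root;
            x = next;
        }
        root
    }

    /// Merges the sets of `a` and `b`; returns false if they were already joined.
    fn union(&mut self, a: usize, b: usize) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        match self.rank[ra].cmp(&self.rank[rb]) {
            Ordering::Less => self.parent[ra] = rb,
            Ordering::Greater => self.parent[rb] = ra,
            Ordering::Equal => {
                self.parent[rb] = ra;
                self.rank[ra] += 1;
            }
        }
        true
    }
}

/// Kruskal's algorithm: sort edges by weight and keep each edge that
/// connects two previously separate components.
fn kruskal_mst(n: usize, mut edges: Vec<(usize, usize, f32)>) -> Vec<(usize, usize, f32)> {
    edges.sort_by(|a, b| a.2.total_cmp(&b.2));
    let mut sets = DisjointSet::new(n);
    let mut mst = Vec::with_capacity(n.saturating_sub(1));
    for (u, v, w) in edges {
        if sets.union(u, v) {
            mst.push((u, v, w));
        }
    }
    mst
}
```

In density-based clustering the edge weights would typically be mutual reachability distances rather than raw distances; the sketch leaves the weights abstract.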
Structs§
- EvocParams
- Parameters for EVoC clustering.
- EvocResult
- Result of EVoC clustering.
Functions§
- evoc
- Run EVoC clustering on high-dimensional embedding data.
- search_for_n_clusters
- Binary-searches over min_cluster_size to find approximately target_k clusters.
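
The idea of binary-searching over min_cluster_size can be sketched as follows. This is a simplified illustration, not the crate's actual signature or logic: the function name `search_min_cluster_size`, its parameters, and the `count_clusters` closure (standing in for a full clustering run) are all assumptions. It also assumes the number of clusters does not increase as min_cluster_size grows.

```rust
/// Hypothetical helper: finds a `min_cluster_size` in `[lo, hi]` whose cluster
/// count is as close as possible to `target_k`.
///
/// `count_clusters` stands in for a full clustering run and returns the number
/// of clusters obtained for a given `min_cluster_size`. The search assumes that
/// this count is non-increasing in `min_cluster_size`.
///
/// Returns `(min_cluster_size, n_clusters)` for the best candidate seen,
/// or `None` if the range is empty.
fn search_min_cluster_size<F>(
    mut lo: usize,
    mut hi: usize,
    target_k: usize,
    mut count_clusters: F,
) -> Option<(usize, usize)>
where
    F: FnMut(usize) -> usize,
{
    let mut best: Option<(usize, usize)> = None;
    while lo <= hi {
        let mid = lo + (hi - lo) / 2;
        let k = count_clusters(mid);

        // Remember the candidate whose cluster count is closest to the target.
        let is_better = match best {
            None => true,
            Some((_, best_k)) => k.abs_diff(target_k) < best_k.abs_diff(target_k),
        };
        if is_better {
            best = Some((mid, k));
        }

        if k == target_k {
            break;
        }
        if k > target_k {
            // Too many clusters: demand larger clusters.
            lo = mid + 1;
        } else {
            // Too few clusters: allow smaller clusters.
            if mid == 0 {
                break;
            }
            hi = mid - 1;
        }
    }
    best
}
```

Because an exact match may not exist, the sketch returns the closest count it encountered, which is one way to read "approximately" in the description above.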