Expand description
§fastkmeans-rs
A fast and efficient k-means clustering implementation in Rust, compatible with ndarray.
§Features
- Double-chunking algorithm: Processes both data and centroids in chunks to minimize memory usage while maintaining efficiency
- Parallel computation: Uses rayon for multi-threaded processing
- ndarray compatible: Works seamlessly with ndarray arrays
- FAISS/scikit-learn compatible API: Familiar
train(),fit(),predict()interface - Optional BLAS acceleration: Enable
accelerate(macOS) oropenblasfeatures for faster matrix operations
§Example
use fastkmeans_rs::{FastKMeans, KMeansConfig};
use ndarray::Array2;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;
// Generate random data
let data = Array2::random((1000, 128), Uniform::new(-1.0f32, 1.0));
// Create and train the model
let mut kmeans = FastKMeans::new(128, 10);
kmeans.train(&data.view()).unwrap();
// Get cluster assignments
let labels = kmeans.predict(&data.view()).unwrap();
assert_eq!(labels.len(), 1000);§Custom Configuration
use fastkmeans_rs::{FastKMeans, KMeansConfig};
use ndarray::Array2;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;
let data = Array2::random((5000, 64), Uniform::new(-1.0f32, 1.0));
let config = KMeansConfig {
k: 50,
max_iters: 100,
tol: 1e-6,
seed: 42,
max_points_per_centroid: None, // Disable subsampling
chunk_size_data: 10_000,
chunk_size_centroids: 1_000,
verbose: false,
};
let mut kmeans = FastKMeans::with_config(config);
let labels = kmeans.fit_predict(&data.view()).unwrap();§BLAS Acceleration
For improved performance on large datasets, enable a BLAS backend:
# macOS (recommended - uses Apple Accelerate)
fastkmeans-rs = { version = "0.1", features = ["accelerate"] }
# Linux/Windows (requires OpenBLAS installed)
fastkmeans-rs = { version = "0.1", features = ["openblas"] }§CUDA GPU Acceleration
For maximum performance on large datasets, enable CUDA support:
fastkmeans-rs = { version = "0.1", features = ["cuda"] }This requires the CUDA toolkit to be installed. Then use FastKMeansCuda:
ⓘ
use fastkmeans_rs::cuda::FastKMeansCuda;
use fastkmeans_rs::KMeansConfig;
use ndarray::Array2;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;
let data = Array2::random((100000, 128), Uniform::new(-1.0f32, 1.0));
let config = KMeansConfig::new(1024)
.with_max_iters(50)
.with_verbose(true);
let mut kmeans = FastKMeansCuda::with_config(config).unwrap();
kmeans.train(&data.view()).unwrap();
let labels = kmeans.predict(&data.view()).unwrap();Re-exports§
pub use algorithm::kmeans_double_chunked;
Modules§
Structs§
- FastK
Means - Fast k-means clustering implementation compatible with ndarray.
- KMeans
Config - Configuration for the FastKMeans algorithm
Enums§
- KMeans
Error - Error types for the FastKMeans library