# rust-imbalanced-learn

High-performance resampling techniques for imbalanced datasets in Rust.
A Rust implementation of resampling techniques for handling imbalanced datasets in machine learning. This library provides a high-performance alternative to Python's imbalanced-learn, designed for the Rust ML community with performance, safety, and modern Rust idioms in mind.
## Features
- Zero-cost abstractions with compile-time state checking
- SIMD-accelerated algorithms for maximum performance
- Parallel processing using Rayon for multi-core utilization
- Memory safety with Rust's ownership system
- GPU acceleration on Apple Silicon via Metal (optional)
- Comprehensive metrics for model evaluation
- Seamless integration with the Rust ML ecosystem
## Supported Algorithms

### Resampling Techniques
- SMOTE (Synthetic Minority Over-sampling Technique)
- ADASYN (Adaptive Synthetic Sampling) - Coming Soon
- Random Under/Over Sampling - Coming Soon
- BorderlineSMOTE - Coming Soon
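The core SMOTE idea — synthesize a new minority sample by interpolating between a minority point and one of its nearest minority-class neighbors — can be sketched in plain Rust. This is an illustrative standalone sketch, not the library's implementation: it uses a single nearest neighbor and a fixed interpolation factor, where real SMOTE draws both the neighbor (from the k nearest) and the factor at random.

```rust
/// Squared Euclidean distance between two feature vectors.
fn dist2(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the nearest neighbor of `minority[i]` among the other minority points.
fn nearest_neighbor(minority: &[Vec<f64>], i: usize) -> usize {
    (0..minority.len())
        .filter(|&j| j != i)
        .min_by(|&a, &b| {
            dist2(&minority[i], &minority[a])
                .partial_cmp(&dist2(&minority[i], &minority[b]))
                .unwrap()
        })
        .unwrap()
}

/// Synthesize one sample on the segment x + gap * (neighbor - x), gap in [0, 1].
fn smote_sample(minority: &[Vec<f64>], i: usize, gap: f64) -> Vec<f64> {
    let j = nearest_neighbor(minority, i);
    minority[i]
        .iter()
        .zip(&minority[j])
        .map(|(x, n)| x + gap * (n - x))
        .collect()
}

fn main() {
    let minority = vec![vec![0.0, 0.0], vec![1.0, 1.0], vec![10.0, 10.0]];
    // Nearest neighbor of point 0 is point 1; interpolate halfway between them:
    let s = smote_sample(&minority, 0, 0.5);
    println!("{:?}", s); // [0.5, 0.5]
}
```

Because synthetic points lie on segments between existing minority samples, SMOTE densifies the minority region instead of merely duplicating rows.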
### Ensemble Methods
- Balanced Random Forest - In Development
- EasyEnsemble - Coming Soon
- RUSBoost - Coming Soon
### Evaluation Metrics
- Classification Report with per-class metrics
- Confusion Matrix with parallel computation
- F1 Score (macro, micro, weighted)
- Balanced Accuracy
- Precision, Recall, Support
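As a concrete reference for why balanced accuracy matters on skewed data, here is a small self-contained sketch (not the library's API) computing per-class recall and balanced accuracy — the unweighted mean of per-class recalls:

```rust
/// Per-class recall: correct predictions for `class` / true occurrences of it.
/// (Assumes `class` occurs at least once in `y_true`.)
fn recall(y_true: &[usize], y_pred: &[usize], class: usize) -> f64 {
    let mut tp = 0.0;
    let mut total = 0.0;
    for (&t, &p) in y_true.iter().zip(y_pred) {
        if t == class {
            total += 1.0;
            if p == class {
                tp += 1.0;
            }
        }
    }
    tp / total
}

/// Balanced accuracy: unweighted mean of per-class recalls,
/// so the majority class cannot dominate the score.
fn balanced_accuracy(y_true: &[usize], y_pred: &[usize], n_classes: usize) -> f64 {
    (0..n_classes).map(|c| recall(y_true, y_pred, c)).sum::<f64>() / n_classes as f64
}

fn main() {
    // 8 majority (class 0) samples, 2 minority (class 1) samples.
    let y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1];
    // A degenerate model that always predicts the majority class:
    let y_pred = [0; 10];
    // Plain accuracy would be 0.8, but balanced accuracy exposes the failure:
    println!("{}", balanced_accuracy(&y_true, &y_pred, 2)); // 0.5
}
```

The same averaging idea underlies macro-averaged F1: compute the metric per class, then take the unweighted mean.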
## Quick Start

Add the crates to your `Cargo.toml`:

```toml
# Crate names below follow the workspace layout; check crates.io for exact names.
[dependencies]
imbalanced-core = "0.1"
imbalanced-sampling = "0.1"
imbalanced-metrics = "0.1"
ndarray = "0.15"
```
### Basic Usage

```rust
// Import paths are illustrative; see each crate's docs for the exact prelude.
use imbalanced_core::prelude::*;
use imbalanced_sampling::prelude::*;
use imbalanced_metrics::prelude::*;
use ndarray::Array2;
```
## Architecture

The library is organized as a Cargo workspace of focused crates:

```text
rust-imbalanced-learn/
├── imbalanced-core/      # Core traits and abstractions
├── imbalanced-sampling/  # Resampling algorithms (SMOTE, etc.)
├── imbalanced-ensemble/  # Ensemble methods
└── imbalanced-metrics/   # Evaluation metrics
```
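A layout like this usually means the core crate defines shared traits that the algorithm crates implement. A hypothetical minimal form of such a trait, with a deliberately trivial implementation to show the shape (names and signatures are illustrative, not `imbalanced-core`'s actual API):

```rust
/// A resampler consumes a labeled dataset and returns a rebalanced copy.
/// (Hypothetical trait; the real crate's API may differ.)
trait Resampler {
    fn fit_resample(&self, x: &[Vec<f64>], y: &[usize]) -> (Vec<Vec<f64>>, Vec<usize>);
}

/// Trivial oversampler: duplicate minority (class 1) rows until classes match.
struct NaiveOversampler;

impl Resampler for NaiveOversampler {
    fn fit_resample(&self, x: &[Vec<f64>], y: &[usize]) -> (Vec<Vec<f64>>, Vec<usize>) {
        let ones: Vec<usize> = y
            .iter()
            .enumerate()
            .filter(|(_, &c)| c == 1)
            .map(|(i, _)| i)
            .collect();
        let zeros = y.len() - ones.len();
        let (mut xs, mut ys) = (x.to_vec(), y.to_vec());
        let mut k = 0;
        // Cycle through minority rows, cloning until the classes are balanced.
        while ys.iter().filter(|&&c| c == 1).count() < zeros {
            let i = ones[k % ones.len()];
            xs.push(x[i].clone());
            ys.push(1);
            k += 1;
        }
        (xs, ys)
    }
}

fn main() {
    let x = vec![vec![0.0], vec![1.0], vec![2.0], vec![9.0]];
    let y = vec![0, 0, 0, 1];
    let (xs, ys) = NaiveOversampler.fit_resample(&x, &y);
    println!("{} {}", xs.len(), ys.iter().filter(|&&c| c == 1).count()); // 6 3
}
```

Keeping the trait in a core crate lets `imbalanced-ensemble` and `imbalanced-metrics` accept any resampler without depending on specific algorithms.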
### Type-Safe State Management

```rust
use imbalanced_sampling::prelude::*;

// Compile-time state checking prevents misuse (type names illustrative):
let resampler = Smote::new()            // Uninitialized
    .configure(Default::default());     // Configured
// Type-safe transitions ensure correct usage: only a configured
// sampler exposes resampling methods.
```
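Compile-time state checking of this kind is the classic Rust typestate pattern: the sampler's state is a type parameter, and `resample` is only defined on the `Configured` state, so calling it on an unconfigured sampler fails to compile. A minimal self-contained sketch (type and method names are illustrative):

```rust
use std::marker::PhantomData;

/// Zero-sized typestate markers.
struct Uninitialized;
struct Configured;

/// A sampler whose state is tracked in the type system.
struct Sampler<State> {
    k_neighbors: usize,
    _state: PhantomData<State>,
}

impl Sampler<Uninitialized> {
    fn new() -> Self {
        Sampler { k_neighbors: 0, _state: PhantomData }
    }

    /// Consuming `self` transitions Uninitialized -> Configured.
    fn configure(self, k_neighbors: usize) -> Sampler<Configured> {
        Sampler { k_neighbors, _state: PhantomData }
    }
}

impl Sampler<Configured> {
    /// Only a configured sampler exposes `resample`.
    fn resample(&self) -> usize {
        self.k_neighbors // stand-in for real work
    }
}

fn main() {
    let sampler = Sampler::new().configure(5);
    println!("{}", sampler.resample()); // 5
    // Sampler::new().resample(); // compile error: no such method in this state
}
```

Because the markers are zero-sized and `PhantomData` occupies no space, the state tracking is erased at runtime — a zero-cost abstraction.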
## Integration with Rust ML Ecosystem

### Linfa Integration

```rust
use linfa::prelude::*;
use linfa_trees::DecisionTree;

// Train on the resampled data via a linfa Dataset (variable names illustrative).
let dataset = Dataset::new(resampled_records, resampled_targets);
let model = DecisionTree::params()
    .max_depth(Some(5))
    .fit(&dataset)?;
```
### SmartCore Integration

```rust
use smartcore::linalg::basic::matrix::DenseMatrix;
use smartcore::tree::decision_tree_classifier::DecisionTreeClassifier;

// Feed the resampled rows into SmartCore (variable names illustrative).
let x_matrix = DenseMatrix::from_2d_array(&x_rows)?;
let model = DecisionTreeClassifier::fit(&x_matrix, &y, Default::default())?;
```
### Polars Integration

```rust
use polars::prelude::*;

// Load a DataFrame, then rebalance it directly
// (file name is a placeholder; `resample_dataframe` as in the original snippet).
let df = CsvReader::from_path("data.csv")?.finish()?;
let balanced_df = smote.resample_dataframe(&df)?;
```
## Performance

The Rust implementation provides significant speedups over Python:

| Algorithm | Dataset Size | Rust Time | Python Time | Speedup |
|-----------|--------------|-----------|-------------|---------|
| SMOTE     | 10K samples  | 15 ms     | 180 ms      | 12x     |
| SMOTE     | 100K samples | 120 ms    | 2.1 s       | 17.5x   |

*Benchmarks run on an M1 MacBook Pro with optimized release builds.*
## Advanced Features

### SIMD Acceleration

```rust
// Opt in to SIMD-accelerated kernels (argument type illustrative).
let smote = Smote::new()
    .with_performance_hints(PerformanceHints::default());
```
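Independently of explicit intrinsics, much of the SIMD win in distance-heavy code comes from writing kernels the compiler can auto-vectorize: flat slices, fixed-width lane accumulators, and a branch-free inner loop. A standalone sketch of that shape (not the library's internals):

```rust
/// Squared Euclidean distance over flat f32 slices, written with a fixed-width
/// accumulator array and a branch-free inner loop -- a shape LLVM readily
/// auto-vectorizes in release builds.
fn dist2_simd_friendly(a: &[f32], b: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];
    let chunks = a.len() / LANES;
    for c in 0..chunks {
        for l in 0..LANES {
            let d = a[c * LANES + l] - b[c * LANES + l];
            acc[l] += d * d;
        }
    }
    // Scalar tail for lengths not divisible by LANES.
    let mut sum: f32 = acc.iter().sum();
    for i in chunks * LANES..a.len() {
        let d = a[i] - b[i];
        sum += d * d;
    }
    sum
}

fn main() {
    let a = vec![1.0f32; 10];
    let b = vec![0.0f32; 10];
    println!("{}", dist2_simd_friendly(&a, &b)); // 10
}
```

Keeping per-lane partial sums independent (rather than one running scalar) is what lets the compiler map the loop onto vector registers.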
### GPU Acceleration (Apple Silicon)

Enable the optional Metal feature:

```toml
[dependencies]
imbalanced-sampling = { version = "0.1", features = ["metal-acceleration"] }
```

```rust
use imbalanced_sampling::MetalKNN;

// Offload k-NN search to the GPU (arguments illustrative).
let knn = MetalKNN::new()?;
let neighbors = knn.find_neighbors(&data, k).await?;
```
## Examples

Run the included examples with `cargo run --release --example <name>`. They cover:

- Basic SMOTE usage
- A comprehensive end-to-end pipeline
- Performance benchmarks
## Development

### Building

```sh
cargo build --release
```

### Testing

```sh
cargo test --workspace
```

### Benchmarking

```sh
cargo bench
```
## Contributing

Contributions are welcome! Please read our Contributing Guide and Code of Conduct.

### Priority Areas
- Additional resampling algorithms (ADASYN, BorderlineSMOTE)
- More ensemble methods (EasyEnsemble, RUSBoost)
- Advanced GPU acceleration
- Integration with more ML frameworks
- Performance optimizations
## Citation

If you use rust-imbalanced-learn in your research, please cite:
## License

Licensed under either of

- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)

at your option.
## Acknowledgements

- imbalanced-learn - the original Python implementation
- scikit-learn - machine learning fundamentals
- Rust ML community - ecosystem foundation

*Built for the Rust ML community.*