rust-imbalanced-learn 0.1.0

High-performance resampling techniques for imbalanced datasets in Rust

A Rust implementation of resampling techniques for handling imbalanced datasets in machine learning. This library provides a high-performance alternative to Python's imbalanced-learn, designed for the Rust ML community with performance, safety, and modern Rust idioms in mind.

Features

  • Zero-cost abstractions with compile-time state checking
  • SIMD-accelerated algorithms for maximum performance
  • Parallel processing using Rayon for multi-core utilization
  • Memory safety with Rust's ownership system
  • GPU acceleration on Apple Silicon via Metal (optional)
  • Comprehensive metrics for model evaluation
  • Seamless integration with the Rust ML ecosystem

Supported Algorithms

Resampling Techniques

  • SMOTE (Synthetic Minority Over-sampling Technique)
  • ADASYN (Adaptive Synthetic Sampling) - Coming Soon
  • Random Under/Over Sampling - Coming Soon
  • BorderlineSMOTE - Coming Soon
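The core idea behind SMOTE can be sketched with the standard library alone: pick a minority sample, find its k nearest minority neighbours, and place a synthetic point at a random position on the segment between the sample and one of those neighbours. The sketch below is illustrative only and is not this crate's implementation; `smote_sketch`, `Lcg`, and `squared_distance` are hypothetical names, the neighbour search is brute force, and a toy linear congruential generator stands in for a real random number generator.

```rust
/// Tiny deterministic linear congruential generator (illustration only),
/// used so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    /// Returns a pseudo-random value in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Use the top 53 bits to build a float in [0, 1).
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Squared Euclidean distance between two equally long feature vectors.
fn squared_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Generates `n_new` synthetic points from `minority` samples, SMOTE style.
fn smote_sketch(minority: &[Vec<f64>], k: usize, n_new: usize, seed: u64) -> Vec<Vec<f64>> {
    assert!(minority.len() >= 2, "need at least two minority samples");
    assert!(k >= 1, "k must be at least 1");
    // A sample cannot have more neighbours than there are other samples.
    let k = k.min(minority.len() - 1);
    let mut rng = Lcg(seed);
    let mut synthetic = Vec::with_capacity(n_new);

    for i in 0..n_new {
        // Cycle through the minority samples as base points.
        let base_idx = i % minority.len();
        let base = &minority[base_idx];

        // Brute-force k nearest neighbours of the base point, excluding itself.
        let mut neighbours: Vec<usize> =
            (0..minority.len()).filter(|&j| j != base_idx).collect();
        neighbours.sort_by(|&a, &b| {
            squared_distance(base, &minority[a])
                .partial_cmp(&squared_distance(base, &minority[b]))
                .unwrap()
        });
        neighbours.truncate(k);

        // Pick one neighbour at random and interpolate towards it.
        let pick = (rng.next_f64() * neighbours.len() as f64) as usize;
        let neighbour = &minority[neighbours[pick]];
        let lambda = rng.next_f64();
        let point: Vec<f64> = base
            .iter()
            .zip(neighbour)
            .map(|(b, n)| b + lambda * (n - b))
            .collect();
        synthetic.push(point);
    }
    synthetic
}
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies.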

Ensemble Methods

  • Balanced Random Forest - In Development
  • EasyEnsemble - Coming Soon
  • RUSBoost - Coming Soon

Evaluation Metrics

  • Classification Report with per-class metrics
  • Confusion Matrix with parallel computation
  • F1 Score (macro, micro, weighted)
  • Balanced Accuracy
  • Precision, Recall, Support
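To make the relationship between these metrics concrete, here is a minimal standard-library sketch that derives per-class precision, recall, F1, and support from a confusion matrix, then averages them into balanced accuracy and macro F1. The function names are hypothetical and do not reflect the `imbalanced-metrics` API. The sketch also assumes that a class with no predictions or no true samples scores 0 for the affected ratio, which is one common convention.

```rust
/// Builds a confusion matrix where `matrix[t][p]` counts samples
/// whose true class is `t` and whose predicted class is `p`.
fn confusion_matrix(y_true: &[usize], y_pred: &[usize], n_classes: usize) -> Vec<Vec<usize>> {
    assert_eq!(y_true.len(), y_pred.len(), "label slices must match in length");
    let mut matrix = vec![vec![0usize; n_classes]; n_classes];
    for (&t, &p) in y_true.iter().zip(y_pred) {
        matrix[t][p] += 1;
    }
    matrix
}

/// Returns (precision, recall, f1, support) for every class.
fn per_class_metrics(matrix: &[Vec<usize>]) -> Vec<(f64, f64, f64, usize)> {
    (0..matrix.len())
        .map(|c| {
            let tp = matrix[c][c] as f64;
            // Row sum: how many samples truly belong to class `c`.
            let support: usize = matrix[c].iter().sum();
            // Column sum: how many samples were predicted as class `c`.
            let predicted: usize = matrix.iter().map(|row| row[c]).sum();
            let precision = if predicted > 0 { tp / predicted as f64 } else { 0.0 };
            let recall = if support > 0 { tp / support as f64 } else { 0.0 };
            let f1 = if precision + recall > 0.0 {
                2.0 * precision * recall / (precision + recall)
            } else {
                0.0
            };
            (precision, recall, f1, support)
        })
        .collect()
}

/// Balanced accuracy: the unweighted mean of per-class recall.
fn balanced_accuracy(matrix: &[Vec<usize>]) -> f64 {
    let metrics = per_class_metrics(matrix);
    metrics.iter().map(|m| m.1).sum::<f64>() / metrics.len() as f64
}

/// Macro F1: the unweighted mean of per-class F1 scores.
fn macro_f1(matrix: &[Vec<usize>]) -> f64 {
    let metrics = per_class_metrics(matrix);
    metrics.iter().map(|m| m.2).sum::<f64>() / metrics.len() as f64
}
```

For a dataset with eight majority samples and two minority samples, a classifier that gets nine out of ten predictions right but misses one minority sample has 90% plain accuracy and only 75% balanced accuracy, which is why the averaged per-class metrics are more informative on imbalanced data.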

Quick Start

Add to your Cargo.toml:

[dependencies]
imbalanced-core = "0.1"
imbalanced-sampling = "0.1"  
imbalanced-metrics = "0.1"
ndarray = "0.15"

Basic Usage

use imbalanced_core::prelude::*;
use imbalanced_sampling::prelude::*;
use imbalanced_metrics::prelude::*;
use ndarray::{Array1, Array2};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load your imbalanced dataset.
    // `load_imbalanced_data` is a placeholder for your own loading function.
    let (x, y) = load_imbalanced_data();

    // Apply SMOTE resampling
    let smote = SmoteStrategy::new(5); // k = 5 neighbors
    let config = SmoteConfig::default();
    let (x_balanced, y_balanced) = smote.resample(x.view(), y.view(), &config)?;

    // Train your model on the balanced data
    // ... (using linfa, smartcore, or candle)

    // Evaluate with specialized metrics.
    // `model`, `x_test`, and `y_test` are placeholders for your trained model and held-out test set.
    let y_pred = model.predict(&x_test);
    let report = classification_report(y_test.view(), y_pred.view());
    println!("{}", report);
    
    Ok(())
}

Architecture

The library is organized as a workspace with focused crates:

rust-imbalanced-learn/
├── imbalanced-core/      # Core traits and abstractions
├── imbalanced-sampling/  # Resampling algorithms (SMOTE, etc.)
├── imbalanced-ensemble/  # Ensemble methods
└── imbalanced-metrics/   # Evaluation metrics

Type-Safe State Management

use imbalanced_core::prelude::*;

// Compile-time state checking prevents misuse: each transition consumes
// the previous state and returns a value of a new type.
let config = SmoteConfig::default();
let resampler = Resampler::new(SmoteStrategy::new(5))  // Uninitialized
    .configure(config);                                // Configured
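The typestate pattern behind this kind of compile-time checking can be sketched in a few lines. The example below is a generic illustration, not the crate's actual `Resampler`: `SketchResampler`, `SketchConfig`, and the two state types are hypothetical names. Each state is a distinct type, `configure` consumes the unconfigured value, and methods that need a configuration exist only on the configured type.

```rust
/// Hypothetical configuration carried by the configured state.
#[derive(Debug, Clone, PartialEq)]
struct SketchConfig {
    k_neighbors: usize,
}

/// State marker: nothing has been configured yet.
struct Uninitialized;

/// State marker that owns the configuration once it has been supplied.
struct Configured {
    config: SketchConfig,
}

/// A resampler whose type parameter records which state it is in.
struct SketchResampler<S> {
    state: S,
}

impl SketchResampler<Uninitialized> {
    /// Every resampler starts out unconfigured.
    fn new() -> Self {
        SketchResampler { state: Uninitialized }
    }

    /// Consumes the unconfigured resampler and returns a configured one,
    /// so the old value can no longer be used afterwards.
    fn configure(self, config: SketchConfig) -> SketchResampler<Configured> {
        SketchResampler {
            state: Configured { config },
        }
    }
}

impl SketchResampler<Configured> {
    /// Only available after `configure`; calling it on an
    /// `Uninitialized` resampler is a compile error, not a runtime check.
    fn k_neighbors(&self) -> usize {
        self.state.config.k_neighbors
    }
}
```

With this layout, `SketchResampler::new().k_neighbors()` does not compile, while `SketchResampler::new().configure(SketchConfig { k_neighbors: 5 }).k_neighbors()` does. The state markers add no runtime branching, which is the sense in which the check is zero-cost.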

Integration with Rust ML Ecosystem

Linfa Integration

use linfa::prelude::*;
use linfa_trees::DecisionTree;

let dataset = Dataset::new(x_balanced, y_balanced);
let model = DecisionTree::params()
    .max_depth(Some(10))
    .fit(&dataset)?;

SmartCore Integration

use smartcore::linalg::basic::matrix::DenseMatrix;
use smartcore::tree::decision_tree_classifier::DecisionTreeClassifier;

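// Copy the ndarray data into SmartCore's dense matrix, one Vec per row.
// Written against the smartcore 0.3 API; 0.4 returns a Result from the matrix constructor.
let rows: Vec<Vec<_>> = x_balanced.outer_iter().map(|row| row.to_vec()).collect();
let x_matrix = DenseMatrix::from_2d_vec(&rows);
let labels = y_balanced.to_vec();
let model = DecisionTreeClassifier::fit(&x_matrix, &labels, Default::default())?;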

Polars Integration

use polars::prelude::*;

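// CsvReader::from_path is the CSV entry point in older polars releases;
// newer releases read CSV files through CsvReadOptions instead.
let df = CsvReader::from_path("imbalanced_data.csv")?.finish()?;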
let balanced_df = smote.resample_dataframe(&df, "target_column")?;

Performance

In the benchmarks below, the Rust implementation of SMOTE runs 12x to 17.5x faster than the Python baseline:

Algorithm  Dataset Size   Rust Time  Python Time  Speedup
SMOTE      10K samples    15 ms      180 ms       12x
SMOTE      100K samples   120 ms     2.1 s        17.5x

Benchmarks run on M1 MacBook Pro with optimized release builds

Advanced Features

SIMD Acceleration

let smote = SmoteStrategy::new(5)
    .with_performance_hints(
        PerformanceHints::new()
            .with_hint(PerformanceHint::Vectorize)
            .with_hint(PerformanceHint::Parallel)
    );
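What a vectorization hint might enable can be illustrated with a distance kernel written so the compiler has an easy time auto-vectorizing it. This is a generic sketch, not the crate's internals: `squared_distance_chunked` and the lane count of 8 are assumptions, and whether SIMD instructions are actually emitted depends on the target CPU and optimization level.

```rust
/// Squared Euclidean distance written in a shape that compilers can
/// usually auto-vectorize: fixed-size chunks and independent accumulators.
fn squared_distance_chunked(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    const LANES: usize = 8;

    // One accumulator per lane removes the loop-carried dependency
    // that a single running sum would create.
    let mut acc = [0.0f32; LANES];

    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are handled separately below.
    let a_tail = a_chunks.remainder();
    let b_tail = b_chunks.remainder();

    for (ca, cb) in a_chunks.zip(b_chunks) {
        for lane in 0..LANES {
            let d = ca[lane] - cb[lane];
            acc[lane] += d * d;
        }
    }

    // Reduce the lane accumulators, then add the scalar tail.
    let mut total: f32 = acc.iter().sum();
    for (x, y) in a_tail.iter().zip(b_tail) {
        total += (x - y) * (x - y);
    }
    total
}
```

Note that summing in lanes changes the order of floating-point additions, so results can differ from a plain left-to-right sum in the last few bits.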

GPU Acceleration (Apple Silicon)

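Enable the optional feature in Cargo.toml:

[dependencies]
imbalanced-core = { version = "0.1", features = ["metal-acceleration"] }

Then run the Metal-backed k-NN search from an async context:

#[cfg(target_os = "macos")]
use imbalanced_core::platform::MetalKNN;

// `device`, `x`, `query`, and `k` are placeholders supplied by your code
let knn = MetalKNN::new(device)?;
let neighbors = knn.find_neighbors(&x, &query, k).await?;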
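For orientation, the operation a GPU k-NN search accelerates is the brute-force neighbour lookup sketched below on the CPU. This is a reference sketch under stated assumptions, not the `MetalKNN` implementation: `find_neighbors_cpu` is a hypothetical name, rows are plain `Vec<f32>` values, and ties are broken by index to keep the output deterministic.

```rust
/// Returns the indices of the `k` rows of `data` closest to `query`
/// under Euclidean distance, nearest first.
fn find_neighbors_cpu(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    // Score every row: this O(n * d) pass is the part that parallel
    // hardware can speed up, since each distance is independent.
    let mut scored: Vec<(f32, usize)> = data
        .iter()
        .enumerate()
        .map(|(i, row)| {
            let d: f32 = row
                .iter()
                .zip(query)
                .map(|(a, b)| (a - b) * (a - b))
                .sum();
            (d, i)
        })
        .collect();

    // Sort by distance, breaking ties by index so the result is deterministic.
    scored.sort_by(|x, y| x.0.total_cmp(&y.0).then(x.1.cmp(&y.1)));

    // Keep only the k best indices (fewer if the dataset is smaller than k).
    scored.into_iter().take(k).map(|(_, i)| i).collect()
}
```

A CPU version like this is also useful as a correctness check for an accelerated implementation, since both should return the same neighbours on the same input.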

Examples

Run the included examples:

# Basic SMOTE usage
cargo run --example basic_usage -p imbalanced-sampling

# Comprehensive pipeline
cargo run --example pipeline -p imbalanced-sampling

# Performance benchmarks  
cargo run --example benchmarks -p imbalanced-sampling --release

Development

Building

git clone https://github.com/yourusername/rust-imbalanced-learn
cd rust-imbalanced-learn
cargo build --release

Testing

cargo test --all

Benchmarking

cargo bench --all

Contributing

Contributions are welcome! Please read our Contributing Guide and Code of Conduct before opening a pull request.

Priority Areas

  • Additional resampling algorithms (ADASYN, BorderlineSMOTE)
  • More ensemble methods (EasyEnsemble, RUSBoost)
  • Advanced GPU acceleration
  • Integration with more ML frameworks
  • Performance optimizations

Citation

If you use rust-imbalanced-learn in your research, please cite:

@software{rust_imbalanced_learn,
  title={rust-imbalanced-learn: High-performance resampling for imbalanced datasets},
  author={Tim},
  year={2024},
  url={https://github.com/yourusername/rust-imbalanced-learn}
}

License

Licensed under either of

at your option.

Acknowledgements


Built for the Rust ML community