sklears-tree
State-of-the-art tree-based algorithms for Rust with 5-20x performance improvements over scikit-learn. Features advanced algorithms like BART, soft trees, and LightGBM-style optimizations.
Latest release: 0.1.0-alpha.1 (October 13, 2025). See the workspace release notes for highlights and upgrade guidance.
Overview
sklears-tree provides a comprehensive set of tree-based machine learning algorithms:
- Core Algorithms: Decision Trees, Random Forest, Extra Trees, Gradient Boosting
- Advanced Methods: BART, Soft Decision Trees, Oblique Trees, CHAID
- Interpretability: SHAP values, LIME explanations, partial dependence plots
- Performance: LightGBM optimizations, histogram-based splits, GPU support (coming)
- Production: Memory-mapped storage, streaming algorithms, distributed training
Quick Start
use ;
use array;
// Decision Tree
let tree = builder
.max_depth
.min_samples_split
.criterion
.build;
// Random Forest with parallel training
let rf = builder
.n_estimators
.max_features
.n_jobs
.build;
// Gradient Boosting with early stopping
let gb = builder
.n_estimators
.learning_rate
.early_stopping
.validation_fraction
.build;
// Train and predict
let X = array!;
let y = array!;
let fitted = tree.fit?;
let predictions = fitted.predict?;
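The `criterion` option above selects the impurity measure used to score candidate splits. The value passed in the original snippet is not shown, so the following is only a minimal sketch of one common choice, Gini impurity, written against plain slices. The function names `gini` and `split_impurity` are hypothetical and are not part of the sklears-tree API.

```rust
/// Gini impurity of a set of class labels: 1 - sum_k p_k^2.
/// Labels are assumed to be integers in `0..n_classes`.
fn gini(labels: &[usize], n_classes: usize) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    let mut counts = vec![0usize; n_classes];
    for &label in labels {
        counts[label] += 1;
    }
    let n = labels.len() as f64;
    let sum_sq: f64 = counts
        .iter()
        .map(|&c| {
            let p = c as f64 / n;
            p * p
        })
        .sum();
    1.0 - sum_sq
}

/// Size-weighted Gini impurity of the two children produced by the
/// candidate split `feature <= threshold`. Lower is better.
fn split_impurity(feature: &[f64], labels: &[usize], threshold: f64, n_classes: usize) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    let mut left = Vec::new();
    let mut right = Vec::new();
    for (&x, &y) in feature.iter().zip(labels) {
        if x <= threshold {
            left.push(y);
        } else {
            right.push(y);
        }
    }
    let n = labels.len() as f64;
    (left.len() as f64 / n) * gini(&left, n_classes)
        + (right.len() as f64 / n) * gini(&right, n_classes)
}
```

A tree builder would evaluate `split_impurity` for each candidate threshold and keep the one with the lowest score, subject to limits such as `max_depth` and `min_samples_split`.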
Advanced Features
BART (Bayesian Additive Regression Trees)
use BART;
let bart = BART::builder()
.n_trees
.n_chains
.n_samples
.build;
let fitted = bart.fit?;
let = fitted.predict_with_uncertainty?;
Soft Decision Trees
use SoftDecisionTree;
let soft_tree = builder
.temperature
.learning_rate
.use_batch_norm
.build;
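To illustrate what a `temperature` parameter usually controls in a soft decision tree, here is a small sketch of probabilistic routing at a single internal node. It assumes the common formulation in which the routing probability is a sigmoid of a linear score divided by the temperature; how sklears-tree applies the parameter internally is not stated in this README, and the function names are hypothetical.

```rust
/// Logistic sigmoid.
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Probability of routing sample `x` to the right child of a soft node.
/// Assumption: p = sigmoid((w . x + b) / temperature), so a lower
/// temperature pushes the decision closer to a hard 0/1 split.
fn route_right_probability(weights: &[f64], bias: f64, x: &[f64], temperature: f64) -> f64 {
    let score: f64 = weights.iter().zip(x).map(|(w, xi)| w * xi).sum::<f64>() + bias;
    sigmoid(score / temperature)
}

/// Prediction of a depth-1 soft tree: the two leaf values blended by
/// the routing probability instead of choosing a single leaf.
fn soft_stump_predict(p_right: f64, left_value: f64, right_value: f64) -> f64 {
    (1.0 - p_right) * left_value + p_right * right_value
}
```

Because every sample reaches every leaf with some weight, the prediction is differentiable in the node weights, which is what makes gradient-based training with a `learning_rate` possible.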
LightGBM-Style Optimizations
use ;
let lgb = builder
.max_bins
.use_goss // Gradient-based One-Side Sampling
.use_efb // Exclusive Feature Bundling
.leaf_wise_growth
.build;
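The idea behind `max_bins` and histogram-based splits can be sketched as follows: feature values are bucketed into a fixed number of bins, and gradient statistics are accumulated per bin so that split search scans bins rather than individual samples. This sketch uses equal-width bins for simplicity; production implementations usually choose bin boundaries more carefully, and the helper names here are hypothetical rather than taken from sklears-tree.

```rust
/// Map each value to one of `max_bins` equal-width bins spanning
/// the observed [min, max] range of the feature.
fn bin_indices(values: &[f64], max_bins: usize) -> Vec<usize> {
    assert!(max_bins >= 1, "max_bins must be at least 1");
    let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let width = (max - min) / max_bins as f64;
    values
        .iter()
        .map(|&v| {
            if width == 0.0 {
                0 // constant feature: everything falls in the first bin
            } else {
                // Clamp so the maximum value lands in the last bin.
                (((v - min) / width) as usize).min(max_bins - 1)
            }
        })
        .collect()
}

/// Accumulate (gradient sum, sample count) for each bin.
fn gradient_histogram(bins: &[usize], gradients: &[f64], max_bins: usize) -> Vec<(f64, usize)> {
    let mut hist = vec![(0.0, 0usize); max_bins];
    for (&b, &g) in bins.iter().zip(gradients) {
        hist[b].0 += g;
        hist[b].1 += 1;
    }
    hist
}
```

With the histogram in hand, a split finder needs only `max_bins - 1` candidate thresholds per feature, regardless of how many samples were binned.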
Interpretability
use ;
// SHAP values for tree models
let shap = new;
let shap_values = shap.explain?;
// LIME local explanations
let lime = LIME::builder()
.n_samples
.kernel_width
.build;
let explanation = lime.explain?;
// Partial dependence plots
let pd = new;
let pd_values = pd.compute?; // Features 0 and 1
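As a rough illustration of what a partial dependence computation does, the sketch below averages a model's predictions while one feature is forced to each value on a grid. It covers the one-feature case only and treats the model as an arbitrary closure; the function name and signature are assumptions for illustration and do not reflect the sklears-tree interface.

```rust
/// One-dimensional partial dependence of `model` on column `feature`.
/// For each grid value, every row has that column overwritten with the
/// grid value and the resulting predictions are averaged.
/// Returns NaN entries if `data` is empty.
fn partial_dependence<F>(model: F, data: &[Vec<f64>], feature: usize, grid: &[f64]) -> Vec<f64>
where
    F: Fn(&[f64]) -> f64,
{
    grid.iter()
        .map(|&g| {
            let total: f64 = data
                .iter()
                .map(|row| {
                    let mut modified = row.clone();
                    modified[feature] = g;
                    model(&modified)
                })
                .sum();
            total / data.len() as f64
        })
        .collect()
}
```

For a linear model such as `|x| 2.0 * x[0] + x[1]`, the partial dependence on feature 0 is a straight line with slope 2, shifted by the mean of feature 1.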
Performance Features
Parallel Processing
let rf = builder
.n_estimators
.n_jobs // Use all cores
.parallel_predict
.build;
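A parallel prediction option of this kind can be pictured as splitting the input rows into chunks and scoring each chunk on its own thread. The sketch below uses only the standard library (`std::thread::scope`) and a caller-supplied closure; it is an assumption about the general technique, not a description of how sklears-tree schedules work, and `parallel_predict` is a hypothetical name.

```rust
use std::thread;

/// Apply `predict_one` to every row using up to `n_jobs` threads.
/// Output order matches input order.
fn parallel_predict<F>(rows: &[Vec<f64>], n_jobs: usize, predict_one: F) -> Vec<f64>
where
    F: Fn(&[f64]) -> f64 + Sync,
{
    let n_jobs = n_jobs.max(1);
    // Ceiling division, with a floor of 1 so `chunks` never receives 0.
    let chunk_size = ((rows.len() + n_jobs - 1) / n_jobs).max(1);
    let predict_one = &predict_one;
    thread::scope(|s| {
        // Spawn all workers first, then join them in order.
        let handles: Vec<_> = rows
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|row| predict_one(row.as_slice()))
                        .collect::<Vec<f64>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("prediction worker panicked"))
            .collect()
    })
}
```

To approximate a "use all cores" setting, the thread count could be taken from `std::thread::available_parallelism()`.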
Memory-Mapped Storage
use MemoryMappedForest;
// Save large models to disk
let mmap_forest = from_forest?;
mmap_forest.save_to_file?;
// Load and use without loading into RAM
let loaded = load?;
let predictions = loaded.predict?;
Streaming Algorithms
use ;
// Hoeffding tree for streaming data
let mut hoeffding = builder
.grace_period
.split_confidence
.build;
for batch in data_stream
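Hoeffding trees decide when to split by comparing the gap between the two best split candidates against the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)). The sketch below shows that test in isolation. Treating `split_confidence` as delta and `grace_period` as the number of samples between checks is an assumption about the builder options above, and the function names are hypothetical.

```rust
/// Hoeffding bound: with probability 1 - delta, the true mean of a
/// variable with range `range` lies within epsilon of the mean
/// observed over `n` samples.
fn hoeffding_bound(range: f64, delta: f64, n: u64) -> f64 {
    ((range * range * (1.0 / delta).ln()) / (2.0 * n as f64)).sqrt()
}

/// Split when the best candidate beats the runner-up by more than
/// the bound, i.e. the observed advantage is unlikely to be noise.
fn should_split(best_gain: f64, second_best_gain: f64, range: f64, delta: f64, n: u64) -> bool {
    best_gain - second_best_gain > hoeffding_bound(range, delta, n)
}
```

The bound shrinks as more samples arrive at a leaf, so a small but consistent advantage eventually triggers a split while the tree avoids committing on too little data.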
Specialized Features
Fairness-Aware Trees
use ;
let fair_tree = builder
.protected_attribute // Column index
.constraint
.fairness_threshold
.build;
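The constraint type and threshold value are not shown in the snippet above. As one illustration of what such a check could measure, the sketch below computes the demographic parity difference: the absolute gap in positive-prediction rate between the protected group and everyone else. Choosing demographic parity is an assumption made for this example, and all names are hypothetical.

```rust
/// Fraction of positive predictions within one group, or `None`
/// if the group has no members.
fn positive_rate(predictions: &[bool], protected: &[bool], group: bool) -> Option<f64> {
    let mut total = 0usize;
    let mut positives = 0usize;
    for (&pred, &is_protected) in predictions.iter().zip(protected) {
        if is_protected == group {
            total += 1;
            if pred {
                positives += 1;
            }
        }
    }
    if total == 0 {
        None
    } else {
        Some(positives as f64 / total as f64)
    }
}

/// Absolute difference in positive-prediction rate between the
/// protected group and the remaining samples.
fn demographic_parity_difference(predictions: &[bool], protected: &[bool]) -> Option<f64> {
    let rate_protected = positive_rate(predictions, protected, true)?;
    let rate_other = positive_rate(predictions, protected, false)?;
    Some((rate_protected - rate_other).abs())
}

/// True when the measured gap is within the allowed threshold.
fn within_fairness_threshold(predictions: &[bool], protected: &[bool], threshold: f64) -> bool {
    matches!(demographic_parity_difference(predictions, protected), Some(gap) if gap <= threshold)
}
```

A fairness-aware tree could evaluate a measure like this on candidate splits and reject or penalize those that push the gap above the configured threshold.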
Multi-Output Trees
use ;
// Multi-output regression
let mo_tree = builder
.strategy
.build;
// Multi-label classification
let ml_rf = builder
.n_estimators
.label_correlation
.build;
Temporal and Spatial Trees
use ;
// Time series with seasonal patterns
let temporal_rf = builder
.seasonal_period
.trend_detection
.build;
// Geospatial data
let spatial_tree = builder
.coordinate_system
.spatial_index
.build;
Benchmarks
Performance on standard datasets:
| Algorithm | scikit-learn | sklears-tree | Speedup |
|---|---|---|---|
| Decision Tree | 5.2ms | 0.8ms | 6.5x |
| Random Forest | 125ms | 12ms | 10.4x |
| Gradient Boosting | 850ms | 95ms | 8.9x |
| Extra Trees | 110ms | 8ms | 13.8x |
With upcoming GPU support:
- Expected 50-100x speedup for large datasets
- Real-time training for streaming data
Architecture
sklears-tree/
├── core/ # Base tree structures
├── ensemble/ # Forest algorithms
├── boosting/ # Gradient boosting variants
├── interpretability/ # SHAP, LIME, PDP
├── streaming/ # Online algorithms
├── distributed/ # Distributed training
├── specialized/ # BART, soft trees, etc.
└── gpu/ # GPU kernels (WIP)
Status
- Implementation: 96% complete; 171 of 186 tests passing (about 92%)
- Advanced Algorithms: BART, soft trees, oblique trees ✓
- Interpretability: SHAP, LIME, anchor explanations ✓
- GPU Support: In development (Week 1 priority)
Contributing
We welcome contributions! Priority areas:
- GPU kernel implementations
- Additional tree algorithms
- Performance optimizations
- Documentation improvements
See CONTRIBUTING.md for guidelines.
License
Licensed under either of:
- Apache License, Version 2.0
- MIT license
Citation