sklears-tree 0.1.0-alpha.1

Decision tree algorithms for sklears: CART, ID3, C4.5
Documentation

sklears-tree

Crates.io Documentation License Minimum Rust Version

State-of-the-art tree-based algorithms for Rust with 5-20x performance improvements over scikit-learn. Features advanced algorithms like BART, soft trees, and LightGBM-style optimizations.

Latest release: 0.1.0-alpha.1 (October 13, 2025). See the workspace release notes for highlights and upgrade guidance.

Overview

sklears-tree provides comprehensive tree-based ML algorithms:

  • Core Algorithms: Decision Trees, Random Forest, Extra Trees, Gradient Boosting
  • Advanced Methods: BART, Soft Decision Trees, Oblique Trees, CHAID
  • Interpretability: SHAP values, LIME explanations, partial dependence plots
  • Performance: LightGBM optimizations, histogram-based splits, GPU support (coming)
  • Production: Memory-mapped storage, streaming algorithms, distributed training

Quick Start

use sklears_tree::{DecisionTreeClassifier, RandomForestClassifier, GradientBoostingRegressor};
use ndarray::array;

// Decision Tree
let tree = DecisionTreeClassifier::builder()
    .max_depth(5)
    .min_samples_split(2)
    .criterion(Criterion::Entropy)
    .build();

// Random Forest with parallel training
let rf = RandomForestClassifier::builder()
    .n_estimators(100)
    .max_features(MaxFeatures::Sqrt)
    .n_jobs(4)
    .build();

// Gradient Boosting with early stopping
let gb = GradientBoostingRegressor::builder()
    .n_estimators(1000)
    .learning_rate(0.1)
    .early_stopping(true)
    .validation_fraction(0.2)
    .build();

// Train and predict
let X = array![[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]];
let y = array![0, 1, 0];
let fitted = tree.fit(&X, &y)?;
let predictions = fitted.predict(&X)?;

Advanced Features

BART (Bayesian Additive Regression Trees)

use sklears_tree::BART;

let bart = BART::builder()
    .n_trees(200)
    .n_chains(4)
    .n_samples(1000)
    .build();

let fitted = bart.fit(&X, &y)?;
let (predictions, lower, upper) = fitted.predict_with_uncertainty(&X_test, 0.95)?;

Soft Decision Trees

use sklears_tree::SoftDecisionTree;

let soft_tree = SoftDecisionTree::builder()
    .temperature(0.5)
    .learning_rate(0.01)
    .use_batch_norm(true)
    .build();

LightGBM-Style Optimizations

use sklears_tree::{HistogramGradientBoosting, GOSS, EFB};

let lgb = HistogramGradientBoosting::builder()
    .max_bins(255)
    .use_goss(true)  // Gradient-based One-Side Sampling
    .use_efb(true)   // Exclusive Feature Bundling
    .leaf_wise_growth(true)
    .build();

Interpretability

use sklears_tree::{TreeSHAP, LIME, PartialDependence};

// SHAP values for tree models
let shap = TreeSHAP::new(&fitted_model);
let shap_values = shap.explain(&X)?;

// LIME local explanations
let lime = LIME::builder()
    .n_samples(1000)
    .kernel_width(0.75)
    .build();
let explanation = lime.explain(&fitted_model, &instance)?;

// Partial dependence plots
let pd = PartialDependence::new(&fitted_model);
let pd_values = pd.compute(&X, &[0, 1])?; // Features 0 and 1

Performance Features

Parallel Processing

let rf = RandomForestClassifier::builder()
    .n_estimators(1000)
    .n_jobs(-1)  // Use all cores
    .parallel_predict(true)
    .build();

Memory-Mapped Storage

use sklears_tree::MemoryMappedForest;

// Save large models to disk
let mmap_forest = MemoryMappedForest::from_forest(&rf)?;
mmap_forest.save_to_file("model.mmap")?;

// Load and use without loading into RAM
let loaded = MemoryMappedForest::load("model.mmap")?;
let predictions = loaded.predict(&X)?;

Streaming Algorithms

use sklears_tree::{HoeffdingTree, StreamingGradientBoosting};

// Hoeffding tree for streaming data
let mut hoeffding = HoeffdingTree::builder()
    .grace_period(200)
    .split_confidence(0.95)
    .build();

for batch in data_stream {
    hoeffding.partial_fit(&batch.X, &batch.y)?;
}

Specialized Features

Fairness-Aware Trees

use sklears_tree::{FairDecisionTree, FairnessConstraint};

let fair_tree = FairDecisionTree::builder()
    .protected_attribute(2)  // Column index
    .constraint(FairnessConstraint::DemographicParity)
    .fairness_threshold(0.8)
    .build();

Multi-Output Trees

use sklears_tree::{MultiOutputTree, MultiLabelRandomForest};

// Multi-output regression
let mo_tree = MultiOutputTree::builder()
    .strategy(MultiOutputStrategy::Chained)
    .build();

// Multi-label classification
let ml_rf = MultiLabelRandomForest::builder()
    .n_estimators(100)
    .label_correlation(true)
    .build();

Temporal and Spatial Trees

use sklears_tree::{TemporalRandomForest, SpatialDecisionTree};

// Time series with seasonal patterns
let temporal_rf = TemporalRandomForest::builder()
    .seasonal_period(12)
    .trend_detection(true)
    .build();

// Geospatial data
let spatial_tree = SpatialDecisionTree::builder()
    .coordinate_system(CoordinateSystem::Geographic)
    .spatial_index(SpatialIndex::QuadTree)
    .build();

Benchmarks

Performance on standard datasets:

Algorithm scikit-learn sklears-tree Speedup
Decision Tree 5.2ms 0.8ms 6.5x
Random Forest 125ms 12ms 10.4x
Gradient Boosting 850ms 95ms 8.9x
Extra Trees 110ms 8ms 13.8x

With upcoming GPU support:

  • Expected 50-100x speedup for large datasets
  • Real-time training for streaming data

Architecture

sklears-tree/
├── core/              # Base tree structures
├── ensemble/          # Forest algorithms
├── boosting/          # Gradient boosting variants
├── interpretability/  # SHAP, LIME, PDP
├── streaming/         # Online algorithms
├── distributed/       # Distributed training
├── specialized/       # BART, soft trees, etc.
└── gpu/              # GPU kernels (WIP)

Status

  • Implementation: 96% complete (171/186 tests passing)
  • Advanced Algorithms: BART, soft trees, oblique trees ✓
  • Interpretability: SHAP, LIME, anchor explanations ✓
  • GPU Support: In development (Week 1 priority)

Contributing

We welcome contributions! Priority areas:

  • GPU kernel implementations
  • Additional tree algorithms
  • Performance optimizations
  • Documentation improvements

See CONTRIBUTING.md for guidelines.

License

Licensed under either of:

  • Apache License, Version 2.0
  • MIT license

Citation

@software{sklears_tree,
  title = {sklears-tree: High-Performance Tree Algorithms for Rust},
  author = {Cool Japan Team},
  year = {2025},
  url = {https://github.com/cool-japan/sklears}
}