§Irithyll
Streaming Gradient Boosted Trees for evolving data streams.
Irithyll implements the SGBT algorithm (Gunasekara et al., 2024) in pure Rust, providing incremental gradient boosted tree ensembles that learn one sample at a time. Trees are built using Hoeffding-bound split decisions and automatically replaced when concept drift is detected, making the model suitable for non-stationary environments where the data distribution shifts over time.
§Key Capabilities
- True online learning – `train_one()` processes samples individually, no batching
- Concept drift adaptation – automatic tree replacement via Page-Hinkley, ADWIN, or DDM
- Async streaming – tokio-native `AsyncSGBT` with bounded channels and concurrent prediction
- Pluggable losses – squared, logistic, softmax, Huber, or custom via the `Loss` trait
- Multi-class – one-vs-rest committees with softmax normalization (`MulticlassSGBT`)
- Serialization – checkpoint/restore via JSON or bincode for zero-downtime deployments
- Production-grade – SIMD acceleration, parallel training, Arrow/Parquet I/O, ONNX export
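The drift detectors listed above follow standard streaming formulations. As an illustration of the idea (this is a minimal self-contained sketch, not Irithyll's actual `DriftDetector` API), a Page-Hinkley test tracks the cumulative deviation of the error stream from its running mean and signals drift when that deviation rises sharply above its historical minimum:

```rust
/// Minimal Page-Hinkley test for upward drift in a stream of errors.
/// Illustrative sketch only; Irithyll exposes this via `drift::DriftDetector`.
struct PageHinkley {
    delta: f64,   // tolerance for small fluctuations
    lambda: f64,  // drift threshold
    n: u64,       // samples seen
    mean: f64,    // running mean of the input
    cum: f64,     // cumulative deviation m_t
    min_cum: f64, // minimum of m_t observed so far
}

impl PageHinkley {
    fn new(delta: f64, lambda: f64) -> Self {
        Self { delta, lambda, n: 0, mean: 0.0, cum: 0.0, min_cum: 0.0 }
    }

    /// Feed one error value; returns true when drift is signalled.
    fn update(&mut self, x: f64) -> bool {
        self.n += 1;
        self.mean += (x - self.mean) / self.n as f64;
        self.cum += x - self.mean - self.delta;
        self.min_cum = self.min_cum.min(self.cum);
        self.cum - self.min_cum > self.lambda
    }
}

/// Runs the detector on a synthetic stream whose error jumps at t = 100,
/// returning the first step at which drift is signalled.
fn detect_drift() -> Option<usize> {
    let mut ph = PageHinkley::new(0.005, 5.0);
    for t in 0..200 {
        let err = if t < 100 { 0.1 } else { 1.0 };
        if ph.update(err) {
            return Some(t);
        }
    }
    None
}

fn main() {
    println!("drift detected at t = {:?}", detect_drift());
}
```

The detector fires a few steps after the error level jumps, which is the mechanism that triggers tree replacement in the boosting loop.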
§Feature Flags
| Feature | Default | Description |
|---|---|---|
| `serde-json` | Yes | JSON model serialization |
| `serde-bincode` | No | Bincode serialization (compact, fast) |
| `parallel` | No | Rayon-based parallel tree training (`ParallelSGBT`) |
| `simd` | No | AVX2 histogram acceleration |
| `kmeans-binning` | No | K-means histogram binning strategy |
| `arrow` | No | Apache Arrow RecordBatch integration |
| `parquet` | No | Parquet file I/O |
| `onnx` | No | ONNX model export |
| `neural-leaves` | No | Experimental MLP leaf models |
| `full` | No | Enable all features |
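Non-default features are opted into in `Cargo.toml` in the usual way (the version number below is a placeholder; check crates.io for the current release):

```toml
[dependencies]
irithyll = { version = "0.1", features = ["parallel", "simd"] }
```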
§Quick Start
```rust
use irithyll::{SGBTConfig, SGBT, Sample};

let config = SGBTConfig::builder()
    .n_steps(100)
    .learning_rate(0.0125)
    .build()
    .unwrap();

let mut model = SGBT::new(config);

// Stream samples one at a time
let sample = Sample::new(vec![1.0, 2.0, 3.0], 0.5);
model.train_one(&sample);
let prediction = model.predict(&sample.features);
```

§Algorithm
The ensemble maintains `n_steps` boosting stages, each owning a streaming
Hoeffding tree and a drift detector. For each sample (x, y):
- Compute the ensemble prediction F(x) = base + lr * sum(tree_s(x))
- For each boosting step, compute gradient/hessian of the loss at the residual
- Update the tree’s histogram accumulators and evaluate splits via Hoeffding bound
- Feed the standardized error to the drift detector
- If drift is detected, replace the tree with a fresh alternate
This enables continuous learning without storing past data, with statistically sound split decisions and automatic adaptation to distribution shifts.
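The Hoeffding bound that gates split decisions has a closed form: for a statistic with range R, after n observations the sample mean is within ε = sqrt(R² ln(1/δ) / (2n)) of the true mean with probability 1 − δ, so a split is committed once the gain gap between the two best candidates exceeds ε. A minimal sketch (illustrative only; the tie-breaking threshold `tie_tau` is a common addition, not necessarily Irithyll's exact internal logic):

```rust
/// Hoeffding bound: with probability 1 - delta, the true mean of a
/// random variable with range `r` lies within `epsilon` of the sample
/// mean after `n` observations.
fn hoeffding_bound(r: f64, delta: f64, n: u64) -> f64 {
    (r * r * (1.0 / delta).ln() / (2.0 * n as f64)).sqrt()
}

/// Split when the gain gap between the best and second-best candidate
/// exceeds the bound, or when the bound has shrunk below a tie threshold.
fn should_split(best_gain: f64, second_gain: f64, r: f64, delta: f64, n: u64, tie_tau: f64) -> bool {
    let eps = hoeffding_bound(r, delta, n);
    best_gain - second_gain > eps || eps < tie_tau
}

fn main() {
    // With few samples the bound is wide, so the tree waits for more data...
    println!("eps @ n=50:   {:.3}", hoeffding_bound(1.0, 1e-7, 50));
    // ...and with more samples it tightens enough to commit the split.
    println!("eps @ n=5000: {:.3}", hoeffding_bound(1.0, 1e-7, 5000));
    println!("split? {}", should_split(0.30, 0.25, 1.0, 1e-7, 5000, 0.05));
}
```

Because the bound shrinks as O(1/sqrt(n)), each tree delays splitting only until the gain ranking is statistically reliable, which is what makes the per-sample split decisions sound without revisiting past data.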
Re-exports§
pub use drift::DriftDetector;
pub use drift::DriftSignal;
pub use ensemble::bagged::BaggedSGBT;
pub use ensemble::config::FeatureType;
pub use ensemble::config::SGBTConfig;
pub use ensemble::multi_target::MultiTargetSGBT;
pub use ensemble::multiclass::MulticlassSGBT;
pub use ensemble::quantile_regressor::QuantileRegressorSGBT;
pub use ensemble::DynSGBT;
pub use ensemble::SGBT;
pub use error::ConfigError;
pub use error::IrithyllError;
pub use histogram::BinnerKind;
pub use histogram::BinningStrategy;
pub use loss::Loss;
pub use loss::LossType;
pub use sample::Observation;
pub use sample::Sample;
pub use sample::SampleRef;
pub use tree::StreamingTree;
pub use explain::importance_drift::ImportanceDriftMonitor;
pub use explain::streaming::StreamingShap;
pub use explain::treeshap::ShapValues;
pub use stream::AsyncSGBT;
pub use stream::Prediction;
pub use stream::PredictionStream;
pub use stream::Predictor;
pub use stream::SampleSender;
pub use metrics::conformal::AdaptiveConformalInterval;
pub use metrics::ewma::EwmaClassificationMetrics;
pub use metrics::ewma::EwmaRegressionMetrics;
pub use metrics::rolling::RollingClassificationMetrics;
pub use metrics::rolling::RollingRegressionMetrics;
pub use metrics::ClassificationMetrics;
pub use metrics::FeatureImportance;
pub use metrics::MetricSet;
pub use metrics::RegressionMetrics;
Modules§
- drift
- Concept drift detection algorithms.
- ensemble
- SGBT ensemble orchestrator — the core boosting loop.
- error
- Error types for Irithyll.
- explain
- TreeSHAP explanations for streaming gradient boosted trees.
- histogram
- Histogram-based feature binning for streaming tree construction.
- loss
- Loss functions for gradient boosting.
- metrics
- Online metric tracking for streaming model evaluation.
- sample
- Core data types for streaming samples.
- serde_support
- Model serialization and deserialization support.
- stream
- Async streaming infrastructure for tokio-native sample ingestion.
- tree
- Streaming decision trees with Hoeffding-bound split decisions.