§Irithyll
Streaming machine learning in Rust – gradient boosted trees, kernel methods, linear models, and composable pipelines, all learning one sample at a time.
Irithyll provides 12+ streaming algorithms under one unified StreamingLearner trait. The core algorithm is SGBT (Gunasekara et al., 2024), but the library extends to kernel regression, RLS with confidence intervals, Naive Bayes, Mondrian forests, streaming PCA, and composable pipelines. Every algorithm processes samples one at a time with O(1) memory per model.
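To illustrate the one-sample-at-a-time contract behind the unified trait, here is a minimal, self-contained sketch. The `OnlineLearner` trait and `SgdLinear` struct are hypothetical stand-ins written for this example, not Irithyll's actual API:

```rust
// Hypothetical one-sample-at-a-time learner interface; Irithyll's real
// `StreamingLearner` trait may differ in shape and naming.
trait OnlineLearner {
    fn train_one(&mut self, x: &[f64], y: f64);
    fn predict(&self, x: &[f64]) -> f64;
}

/// Linear model trained by SGD on squared loss: O(d) state, O(1) work per sample.
struct SgdLinear {
    weights: Vec<f64>,
    bias: f64,
    lr: f64,
}

impl SgdLinear {
    fn new(dim: usize, lr: f64) -> Self {
        Self { weights: vec![0.0; dim], bias: 0.0, lr }
    }
}

impl OnlineLearner for SgdLinear {
    fn predict(&self, x: &[f64]) -> f64 {
        self.bias + self.weights.iter().zip(x).map(|(w, xi)| w * xi).sum::<f64>()
    }

    fn train_one(&mut self, x: &[f64], y: f64) {
        // Gradient of 0.5 * (pred - y)^2 w.r.t. the weights is (pred - y) * x.
        let err = self.predict(x) - y;
        for (w, xi) in self.weights.iter_mut().zip(x) {
            *w -= self.lr * err * xi;
        }
        self.bias -= self.lr * err;
    }
}

fn main() {
    // Learn y = 2x + 1 from a stream, one sample at a time.
    let mut model = SgdLinear::new(1, 0.01);
    for i in 0..20_000 {
        let x = (i % 10) as f64;
        model.train_one(&[x], 2.0 * x + 1.0);
    }
    println!("f(3.0) = {:.3}", model.predict(&[3.0]));
}
```

The state is just the weight vector, so memory stays constant no matter how many samples flow through, which is the property the crate claims for every learner.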
§Key Capabilities
- 12+ streaming algorithms – SGBT, KRLS, RLS, linear SGD, Gaussian NB, Mondrian forests, and more
- Composable pipelines – chain preprocessors and learners: pipe(normalizer()).learner(sgbt(50, 0.01))
- Concept drift adaptation – automatic tree replacement via Page-Hinkley, ADWIN, or DDM
- Kernel methods – KRLS with RBF, polynomial, and linear kernels + ALD sparsification
- Confidence intervals – RecursiveLeastSquares::predict_interval for prediction uncertainty
- Streaming PCA – CCIPCA for O(kd) dimensionality reduction without covariance matrices
- Async streaming – tokio-native AsyncSGBT with bounded channels and concurrent prediction
- Pluggable losses – squared, logistic, softmax, Huber, or custom via the Loss trait
- Serialization – checkpoint/restore via JSON or bincode for zero-downtime deployments
- Production-grade – SIMD acceleration, parallel training, Arrow/Parquet I/O, ONNX export
§Feature Flags
| Feature | Default | Description |
|---|---|---|
| serde-json | Yes | JSON model serialization |
| serde-bincode | No | Bincode serialization (compact, fast) |
| parallel | No | Rayon-based parallel tree training (ParallelSGBT) |
| simd | No | AVX2 histogram acceleration |
| kmeans-binning | No | K-means histogram binning strategy |
| arrow | No | Apache Arrow RecordBatch integration |
| parquet | No | Parquet file I/O |
| onnx | No | ONNX model export |
| neural-leaves | No | Experimental MLP leaf models |
| full | No | Enable all features |
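Non-default features are enabled through Cargo in the usual way; for example (the version number below is a placeholder, not a real release):

```toml
[dependencies]
# Enable parallel tree training and SIMD histograms.
irithyll = { version = "x.y", features = ["parallel", "simd"] }

# Or turn everything on:
# irithyll = { version = "x.y", features = ["full"] }
```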
§Quick Start
```rust
use irithyll::{SGBTConfig, SGBT, Sample};

let config = SGBTConfig::builder()
    .n_steps(100)
    .learning_rate(0.0125)
    .build()
    .unwrap();

let mut model = SGBT::new(config);

// Stream samples one at a time
let sample = Sample::new(vec![1.0, 2.0, 3.0], 0.5);
model.train_one(&sample);
let prediction = model.predict(&sample.features);
```

Or use factory functions for quick construction:
```rust
use irithyll::{pipe, normalizer, sgbt, StreamingLearner};

let mut model = pipe(normalizer()).learner(sgbt(50, 0.01));
model.train(&[100.0, 0.5], 42.0);
let pred = model.predict(&[100.0, 0.5]);
```

§Algorithm
The ensemble maintains n_steps boosting stages, each owning a streaming
Hoeffding tree and a drift detector. For each sample (x, y):
- Compute the ensemble prediction F(x) = base + lr * sum(tree_s(x))
- For each boosting step, compute gradient/hessian of the loss at the residual
- Update the tree’s histogram accumulators and evaluate splits via Hoeffding bound
- Feed the standardized error to the drift detector
- If drift is detected, replace the tree with a fresh alternate
This enables continuous learning without storing past data, with statistically sound split decisions and automatic adaptation to distribution shifts.
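To make the drift-detection step concrete, below is a minimal, self-contained Page-Hinkley test for an upward shift in a stream's mean (such as a rising standardized error). This is a sketch of the standard technique, not Irithyll's DriftDetector implementation; the parameter names `delta` and `lambda` follow common usage in the drift-detection literature.

```rust
/// Minimal Page-Hinkley test: flags drift when the cumulative deviation of
/// the stream from its running mean rises `lambda` above its historical minimum.
struct PageHinkley {
    n: u64,
    mean: f64,    // running mean of observed values
    cum: f64,     // cumulative deviation m_t
    min_cum: f64, // minimum of m_t seen so far
    delta: f64,   // magnitude tolerance (ignores shifts smaller than this)
    lambda: f64,  // detection threshold
}

impl PageHinkley {
    fn new(delta: f64, lambda: f64) -> Self {
        Self { n: 0, mean: 0.0, cum: 0.0, min_cum: 0.0, delta, lambda }
    }

    /// Feed one value; returns true if drift is detected.
    fn update(&mut self, x: f64) -> bool {
        self.n += 1;
        self.mean += (x - self.mean) / self.n as f64;
        self.cum += x - self.mean - self.delta;
        self.min_cum = self.min_cum.min(self.cum);
        self.cum - self.min_cum > self.lambda
    }
}

fn main() {
    let mut ph = PageHinkley::new(0.005, 5.0);
    let mut detected_at = None;
    // 200 in-control errors around 0.1, then a shift to 1.5.
    for t in 0..400 {
        let err = if t < 200 { 0.1 } else { 1.5 };
        if ph.update(err) {
            detected_at = Some(t);
            break;
        }
    }
    println!("drift detected at t = {:?}", detected_at);
}
```

In the ensemble loop described above, a detection like this is what triggers replacing a boosting stage's tree with a fresh alternate.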
Re-exports§
pub use drift::{DriftDetector, DriftSignal};
pub use ensemble::adaptive::AdaptiveSGBT;
pub use ensemble::bagged::BaggedSGBT;
pub use ensemble::config::{FeatureType, SGBTConfig};
pub use ensemble::distributional::{DistributionalSGBT, GaussianPrediction};
pub use ensemble::multi_target::MultiTargetSGBT;
pub use ensemble::multiclass::MulticlassSGBT;
pub use ensemble::quantile_regressor::QuantileRegressorSGBT;
pub use ensemble::{DynSGBT, SGBT};
pub use error::{ConfigError, IrithyllError};
pub use histogram::{BinnerKind, BinningStrategy};
pub use loss::{Loss, LossType};
pub use sample::{Observation, Sample, SampleRef};
pub use tree::StreamingTree;
pub use explain::importance_drift::ImportanceDriftMonitor;
pub use explain::streaming::StreamingShap;
pub use explain::treeshap::ShapValues;
pub use ensemble::parallel::ParallelSGBT;
pub use stream::{AsyncSGBT, Prediction, PredictionStream, Predictor, SampleSender};
pub use metrics::conformal::AdaptiveConformalInterval;
pub use metrics::ewma::{EwmaClassificationMetrics, EwmaRegressionMetrics};
pub use metrics::rolling::{RollingClassificationMetrics, RollingRegressionMetrics};
pub use metrics::{ClassificationMetrics, FeatureImportance, MetricSet, RegressionMetrics};
pub use anomaly::hst::{AnomalyScore, HSTConfig, HalfSpaceTree};
pub use learner::{SGBTLearner, StreamingLearner};
pub use pipeline::{Pipeline, PipelineBuilder, StreamingPreprocessor};
pub use preprocessing::{IncrementalNormalizer, OnlineFeatureSelector, CCIPCA};
pub use ensemble::lr_schedule::LRScheduler;
pub use learners::{GaussianNB, Kernel, LinearKernel, LocallyWeightedRegression, MondrianForest, PolynomialKernel, RBFKernel, RecursiveLeastSquares, StreamingLinearModel, StreamingPolynomialRegression, KRLS};
Modules§
- anomaly – Streaming anomaly detection algorithms.
- arrow_support – Arrow and Parquet integration for zero-copy data ingestion.
- drift – Concept drift detection algorithms.
- ensemble – SGBT ensemble orchestrator: the core boosting loop.
- error – Error types for Irithyll.
- explain – TreeSHAP explanations for streaming gradient boosted trees.
- histogram – Histogram-based feature binning for streaming tree construction.
- learner – Unified streaming learner trait for polymorphic model composition.
- learners – Streaming learner implementations for polymorphic model composition.
- loss – Loss functions for gradient boosting.
- metrics – Online metric tracking for streaming model evaluation.
- onnx_export – Export trained SGBT models to ONNX format.
- pipeline – Composable streaming pipelines for preprocessing → learning chains.
- preprocessing – Streaming preprocessing utilities for feature transformation.
- sample – Core data types for streaming samples.
- serde_support – Model serialization and deserialization support.
- stream – Async streaming infrastructure for tokio-native sample ingestion.
- tree – Streaming decision trees with Hoeffding-bound split decisions.
Functions§
- adaptive_sgbt – Create an adaptive SGBT with a learning rate scheduler.
- ccipca – Create a CCIPCA preprocessor for streaming dimensionality reduction.
- gaussian_nb – Create a Gaussian Naive Bayes classifier.
- krls – Create a kernel recursive least squares model with an RBF kernel.
- linear – Create a streaming linear model with the given learning rate.
- mondrian – Create a Mondrian forest with the given number of trees.
- normalizer – Create an incremental normalizer for streaming standardization.
- pipe – Start building a pipeline with the first preprocessor.
- rls – Create a recursive least squares model with the given forgetting factor.
- sgbt – Create an SGBT learner with squared loss from minimal parameters.