§Irithyll
Streaming machine learning in Rust – gradient boosted trees, kernel methods, linear models, and composable pipelines, all learning one sample at a time.
Irithyll provides 12+ streaming algorithms under one unified
StreamingLearner trait. The core is SGBT
(Gunasekara et al., 2024),
but the library extends to kernel regression, RLS with confidence intervals,
Naive Bayes, Mondrian forests, streaming PCA, and composable pipelines.
Every algorithm processes samples one at a time with O(1) memory per model.
§Key Capabilities
- 12+ streaming algorithms – SGBT, KRLS, RLS, linear SGD, Gaussian NB, Mondrian forests, and more
- Composable pipelines – chain preprocessors and learners: pipe(normalizer()).learner(sgbt(50, 0.01))
- Concept drift adaptation – automatic tree replacement via Page-Hinkley, ADWIN, or DDM
- Kernel methods – KRLS with RBF, polynomial, and linear kernels + ALD sparsification
- Confidence intervals – RecursiveLeastSquares::predict_interval for prediction uncertainty
- Streaming PCA – CCIPCA for O(kd) dimensionality reduction without covariance matrices
- Async streaming – tokio-native AsyncSGBT with bounded channels and concurrent prediction
- Pluggable losses – squared, logistic, softmax, Huber, or custom via the Loss trait
- Serialization – checkpoint/restore via JSON or bincode for zero-downtime deployments
- Production-grade – SIMD acceleration, parallel training, Arrow/Parquet I/O, ONNX export
§Feature Flags
| Feature | Default | Description |
|---|---|---|
| serde-json | Yes | JSON model serialization |
| serde-bincode | No | Bincode serialization (compact, fast) |
| parallel | No | Rayon-based parallel tree training (ParallelSGBT) |
| simd | No | AVX2 histogram acceleration |
| kmeans-binning | No | K-means histogram binning strategy |
| arrow | No | Apache Arrow RecordBatch integration |
| parquet | No | Parquet file I/O |
| onnx | No | ONNX model export |
| neural-leaves | No | Experimental MLP leaf models |
| full | No | Enable all features |
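Non-default features are enabled through Cargo as usual; for example (the version number here is illustrative, pick the latest release):

```toml
[dependencies]
irithyll = { version = "0.1", features = ["parallel", "simd"] }
```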
§Quick Start
```rust
use irithyll::{SGBTConfig, SGBT, Sample};

let config = SGBTConfig::builder()
    .n_steps(100)
    .learning_rate(0.0125)
    .build()
    .unwrap();
let mut model = SGBT::new(config);

// Stream samples one at a time
let sample = Sample::new(vec![1.0, 2.0, 3.0], 0.5);
model.train_one(&sample);
let prediction = model.predict(&sample.features);
```

Or use factory functions for quick construction:

```rust
use irithyll::{pipe, normalizer, sgbt, StreamingLearner};

let mut model = pipe(normalizer()).learner(sgbt(50, 0.01));
model.train(&[100.0, 0.5], 42.0);
let pred = model.predict(&[100.0, 0.5]);
```

§Algorithm
The ensemble maintains n_steps boosting stages, each owning a streaming
Hoeffding tree and a drift detector. For each sample (x, y):
- Compute the ensemble prediction F(x) = base + lr * sum(tree_s(x))
- For each boosting step, compute gradient/hessian of the loss at the residual
- Update the tree’s histogram accumulators and evaluate splits via Hoeffding bound
- Feed the standardized error to the drift detector
- If drift is detected, replace the tree with a fresh alternate
This enables continuous learning without storing past data, with statistically sound split decisions and automatic adaptation to distribution shifts.
Re-exports§
```rust
pub use ensemble::adaptive::AdaptiveSGBT;
pub use ensemble::bagged::BaggedSGBT;
pub use ensemble::config::SGBTConfig;
pub use ensemble::config::ScaleMode;
pub use ensemble::distributional::DecomposedPrediction;
pub use ensemble::distributional::DistributionalSGBT;
pub use ensemble::distributional::GaussianPrediction;
pub use ensemble::distributional::ModelDiagnostics;
pub use ensemble::distributional::TreeDiagnostic;
pub use ensemble::moe_distributional::MoEDistributionalSGBT;
pub use ensemble::multi_target::MultiTargetSGBT;
pub use ensemble::multiclass::MulticlassSGBT;
pub use ensemble::quantile_regressor::QuantileRegressorSGBT;
pub use ensemble::DynSGBT;
pub use ensemble::SGBT;
pub use error::IrithyllError;
pub use sample::Sample;
pub use explain::importance_drift::ImportanceDriftMonitor;
pub use explain::streaming::StreamingShap;
pub use explain::treeshap::ShapValues;
pub use ensemble::parallel::ParallelSGBT;
pub use stream::AsyncSGBT;
pub use stream::Prediction;
pub use stream::PredictionStream;
pub use stream::Predictor;
pub use stream::SampleSender;
pub use metrics::auc::StreamingAUC;
pub use metrics::conformal::AdaptiveConformalInterval;
pub use metrics::ewma::EwmaClassificationMetrics;
pub use metrics::ewma::EwmaRegressionMetrics;
pub use metrics::kappa::CohenKappa;
pub use metrics::kappa::KappaM;
pub use metrics::kappa::KappaT;
pub use metrics::rolling::RollingClassificationMetrics;
pub use metrics::rolling::RollingRegressionMetrics;
pub use metrics::ClassificationMetrics;
pub use metrics::FeatureImportance;
pub use metrics::MetricSet;
pub use metrics::RegressionMetrics;
pub use evaluation::HoldoutStrategy;
pub use evaluation::PrequentialConfig;
pub use evaluation::PrequentialEvaluator;
pub use evaluation::ProgressiveValidator;
pub use clustering::CluStream;
pub use clustering::CluStreamConfig;
pub use clustering::ClusterFeature;
pub use clustering::DBStream;
pub use clustering::DBStreamConfig;
pub use clustering::MicroCluster;
pub use clustering::StreamingKMeans;
pub use clustering::StreamingKMeansConfig;
pub use ensemble::adaptive_forest::AdaptiveRandomForest;
pub use learners::BernoulliNB;
pub use learners::MultinomialNB;
pub use anomaly::hst::AnomalyScore;
pub use anomaly::hst::HSTConfig;
pub use anomaly::hst::HalfSpaceTree;
pub use learner::SGBTLearner;
pub use pipeline::Pipeline;
pub use pipeline::PipelineBuilder;
pub use pipeline::StreamingPreprocessor;
pub use preprocessing::FeatureHasher;
pub use preprocessing::IncrementalNormalizer;
pub use preprocessing::MinMaxScaler;
pub use preprocessing::OneHotEncoder;
pub use preprocessing::OnlineFeatureSelector;
pub use preprocessing::PolynomialFeatures;
pub use preprocessing::TargetEncoder;
pub use preprocessing::CCIPCA;
pub use ensemble::lr_schedule::LRScheduler;
pub use learners::GaussianNB;
pub use learners::Kernel;
pub use learners::LinearKernel;
pub use learners::LocallyWeightedRegression;
pub use learners::MondrianForest;
pub use learners::PolynomialKernel;
pub use learners::RBFKernel;
pub use learners::RecursiveLeastSquares;
pub use learners::StreamingLinearModel;
pub use learners::StreamingPolynomialRegression;
pub use learners::KRLS;
pub use time_series::DecomposedPoint;
pub use time_series::DecompositionConfig;
pub use time_series::HoltWinters;
pub use time_series::HoltWintersConfig;
pub use time_series::SNARIMAXCoefficients;
pub use time_series::SNARIMAXConfig;
pub use time_series::Seasonality;
pub use time_series::StreamingDecomposition;
pub use time_series::SNARIMAX;
pub use bandits::Bandit;
pub use bandits::ContextualBandit;
pub use bandits::DiscountedThompsonSampling;
pub use bandits::EpsilonGreedy;
pub use bandits::LinUCB;
pub use bandits::ThompsonSampling;
pub use bandits::UCBTuned;
pub use bandits::UCB1;
pub use reservoir::ESNConfig;
pub use reservoir::ESNConfigBuilder;
pub use reservoir::ESNPreprocessor;
pub use reservoir::EchoStateNetwork;
pub use reservoir::NGRCConfig;
pub use reservoir::NGRCConfigBuilder;
pub use reservoir::NextGenRC;
pub use ssm::MambaConfig;
pub use ssm::MambaConfigBuilder;
pub use ssm::MambaPreprocessor;
pub use ssm::StreamingMamba;
pub use snn::SpikeNet;
pub use snn::SpikeNetConfig;
pub use snn::SpikeNetConfigBuilder;
pub use snn::SpikePreprocessor;
pub use ttt::StreamingTTT;
pub use ttt::TTTConfig;
pub use ttt::TTTConfigBuilder;
pub use kan::KANConfig;
pub use kan::KANConfigBuilder;
pub use kan::StreamingKAN;
pub use attention::AttentionPreprocessor;
pub use attention::StreamingAttentionConfig;
pub use attention::StreamingAttentionConfigBuilder;
pub use attention::StreamingAttentionModel;
pub use moe::NeuralMoE;
pub use moe::NeuralMoEBuilder;
pub use moe::NeuralMoEConfig;
pub use automl::Algorithm;
pub use automl::AttentionFactory; // Deprecated
pub use automl::EsnFactory; // Deprecated
pub use automl::Factory;
pub use automl::MambaFactory; // Deprecated
pub use automl::SgbtFactory; // Deprecated
pub use automl::SpikeNetFactory; // Deprecated
pub use automl::AutoMetric;
pub use automl::AutoTuner;
pub use automl::AutoTunerBuilder;
pub use automl::AutoTunerConfig;
pub use automl::ModelFactory;
pub use automl::ConfigSampler;
pub use automl::ConfigSpace;
pub use automl::HyperConfig;
pub use automl::HyperParam;
pub use automl::RewardNormalizer;
pub use irithyll_core;
```
Modules§
- anomaly – Streaming anomaly detection algorithms.
- arrow_support – Arrow and Parquet integration for zero-copy data ingestion.
- attention – Streaming linear attention models.
- automl – Streaming AutoML: champion-challenger racing with bandit-guided hyperparameter search.
- bandits – Multi-armed bandit algorithms for online decision-making.
- clustering – Streaming clustering algorithms.
- continual – Continual learning wrappers for streaming neural models.
- drift – Concept drift detection algorithms.
- ensemble – SGBT ensemble orchestrator – the core boosting loop.
- error – Error types for Irithyll.
- evaluation – Streaming evaluation protocols for online machine learning.
- explain – TreeSHAP explanations for streaming gradient boosted trees.
- export_embedded – Export trained SGBT models to the irithyll-core packed binary format.
- histogram – Histogram binning and accumulation for streaming tree construction.
- kan – Streaming Kolmogorov-Arnold Networks (KAN).
- learner – Unified streaming learner trait for polymorphic model composition.
- learners – Streaming learner implementations for polymorphic model composition.
- loss – Loss functions for gradient boosting.
- metrics – Online metric tracking for streaming model evaluation.
- moe – Streaming Neural Mixture of Experts.
- onnx_export – Export trained SGBT models to ONNX format.
- pipeline – Composable streaming pipelines for preprocessing → learning chains.
- preprocessing – Streaming preprocessing utilities for feature transformation.
- reservoir – Reservoir computing models for streaming temporal learning.
- sample – Core data types for streaming samples.
- serde_support – Model serialization and deserialization support.
- snn – Spiking Neural Networks for streaming machine learning.
- ssm – Streaming Mamba (selective state space model) for temporal ML pipelines.
- stream – Async streaming infrastructure for tokio-native sample ingestion.
- time_series – Time series models for streaming forecasting.
- tree – Streaming decision trees with Hoeffding-bound split decisions.
- ttt – Streaming Test-Time Training (TTT) layers.
Structs§
- EnsembleView – Zero-copy view over a packed ensemble binary.
- HoeffdingTreeClassifier – A streaming decision tree classifier based on the VFDT algorithm.
- PackedNode – 12-byte packed decision tree node. AoS layout for cache-optimal inference.
- PackedNodeI16 – 8-byte quantized decision tree node. Integer-only traversal for FPU-less targets.
- QuantizedEnsembleHeader – Header for quantized ensemble binary. 16 bytes, 4-byte aligned.
- QuantizedEnsembleView – Zero-copy view over a quantized (int16) ensemble binary.
- SampleRef – A borrowed observation that avoids Vec<f64> allocation.
Enums§
- BinnerKind – Concrete binning strategy enum, eliminating Box<dyn BinningStrategy> heap allocations per feature per leaf.
- ConfigError – Structured error for configuration validation failures.
- DriftSignal – Signal emitted by a drift detector after observing a value.
- FeatureType – Declares whether a feature is continuous (default) or categorical.
- FormatError – Errors that can occur when parsing or validating a packed ensemble binary.
- LeafModelType – Describes which leaf model architecture to use.
- LossType – Tag identifying a loss function for serialization and reconstruction.
Traits§
- BinningStrategy – A strategy for computing histogram bin edges from a stream of values.
- DriftDetector – A sequential drift detector that monitors a stream of values.
- Loss – A differentiable loss function for gradient boosting.
- Observation – Trait for anything that can be used as a training observation.
- StreamingLearner – Object-safe trait for any streaming (online) machine learning model.
- StreamingTree – A streaming decision tree that trains incrementally.
Functions§
- adaptive_sgbt – Create an adaptive SGBT with a learning rate scheduler.
- auto_tune – Create an auto-tuning streaming learner with default settings.
- ccipca – Create a CCIPCA preprocessor for streaming dimensionality reduction.
- delta_net – Create a Gated DeltaNet model (strongest retrieval, NVIDIA 2024).
- epsilon_greedy – Create an epsilon-greedy bandit with the given number of arms and exploration rate.
- esn – Create an Echo State Network with cycle topology.
- esn_preprocessor – Create an ESN preprocessor for pipeline composition.
- feature_hasher – Create a feature hasher for fixed-size dimensionality reduction.
- gaussian_nb – Create a Gaussian Naive Bayes classifier.
- gla – Create a Gated Linear Attention model (SOTA streaming attention).
- hawk – Create a Hawk model (lightest streaming attention, vector state).
- krls – Create a kernel recursive least squares model with an RBF kernel.
- lin_ucb – Create a LinUCB contextual bandit.
- linear – Create a streaming linear model with the given learning rate.
- mamba – Create a streaming Mamba (selective SSM) model.
- mamba_preprocessor – Create a Mamba preprocessor for pipeline composition.
- min_max_scaler – Create a min-max scaler that normalizes features to [0, 1].
- mondrian – Create a Mondrian forest with the given number of trees.
- ngrc – Create a Next Generation Reservoir Computer.
- normalizer – Create an incremental normalizer for streaming standardization.
- one_hot – Create a one-hot encoder for the given categorical feature indices.
- pipe – Start building a pipeline with the first preprocessor.
- polynomial_features – Create a degree-2 polynomial feature generator (interactions + squares).
- ret_net – Create a RetNet model (simplest, fixed decay).
- rls – Create a recursive least squares model with the given forgetting factor.
- sgbt – Create an SGBT learner with squared loss from minimal parameters.
- spikenet – Create a spiking neural network with e-prop learning.
- streaming_attention – Create a streaming attention model with any mode.
- streaming_kan – Create a streaming KAN with the given layer sizes and learning rate.
- streaming_ttt – Create a streaming TTT (Test-Time Training) model.
- target_encoder – Create a target encoder with Bayesian smoothing for categorical features.
- thompson – Create a Thompson Sampling bandit with Beta(1,1) prior.
- ucb1 – Create a UCB1 bandit with the given number of arms.
- ucb_tuned – Create a UCB-Tuned bandit with the given number of arms.