Crate irithyll

§Irithyll

Streaming machine learning in Rust – gradient boosted trees, kernel methods, linear models, and composable pipelines, all learning one sample at a time.

Irithyll provides 12+ streaming algorithms under one unified StreamingLearner trait. The core is Streaming Gradient Boosted Trees (SGBT; Gunasekara et al., 2024), but the library extends to kernel regression, RLS with confidence intervals, Naive Bayes, Mondrian forests, streaming PCA, and composable pipelines. Every algorithm processes samples one at a time with O(1) memory per model.

§Key Capabilities

  • 12+ streaming algorithms – SGBT, KRLS, RLS, linear SGD, Gaussian NB, Mondrian forests, and more
  • Composable pipelines – chain preprocessors and learners: pipe(normalizer()).learner(sgbt(50, 0.01))
  • Concept drift adaptation – automatic tree replacement via Page-Hinkley, ADWIN, or DDM
  • Kernel methods – KRLS with RBF, polynomial, and linear kernels + ALD sparsification
  • Confidence intervals – RecursiveLeastSquares::predict_interval for prediction uncertainty
  • Streaming PCA – CCIPCA for O(kd) dimensionality reduction without covariance matrices
  • Async streaming – tokio-native AsyncSGBT with bounded channels and concurrent prediction
  • Pluggable losses – squared, logistic, softmax, Huber, or custom via the Loss trait
  • Serialization – checkpoint/restore via JSON or bincode for zero-downtime deployments
  • Production-grade – SIMD acceleration, parallel training, Arrow/Parquet I/O, ONNX export
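The drift-adaptation bullet above relies on sequential change detectors. As a point of reference, a Page-Hinkley test can be sketched in a few lines; this is a generic, self-contained implementation of the test, not the crate's DriftDetector internals:

```rust
/// Page-Hinkley test for detecting an upward shift in a stream's mean.
struct PageHinkley {
    delta: f64,   // tolerated magnitude of change
    lambda: f64,  // detection threshold
    mean: f64,    // running mean of observed values
    n: u64,
    cum: f64,     // cumulative deviation m_t
    min_cum: f64, // running minimum M_t of m_t
}

impl PageHinkley {
    fn new(delta: f64, lambda: f64) -> Self {
        Self { delta, lambda, mean: 0.0, n: 0, cum: 0.0, min_cum: 0.0 }
    }

    /// Observe one value; returns true when an upward drift is signalled.
    fn update(&mut self, x: f64) -> bool {
        self.n += 1;
        self.mean += (x - self.mean) / self.n as f64;
        self.cum += x - self.mean - self.delta;
        self.min_cum = self.min_cum.min(self.cum);
        self.cum - self.min_cum > self.lambda
    }
}

/// Feed a stream whose mean jumps from 0.0 to 1.0 at t = 100;
/// returns the index at which drift is first signalled.
fn detect() -> Option<usize> {
    let mut ph = PageHinkley::new(0.005, 5.0);
    (0..200).find(|&t| ph.update(if t < 100 { 0.0 } else { 1.0 }))
}

fn main() {
    // The detector fires a handful of samples after the shift at t = 100.
    println!("drift detected at t = {:?}", detect());
}
```

The same shape (observe a scalar error, emit a drift signal) is what the crate's DriftSignal enum and DriftDetector trait describe.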

§Feature Flags

Feature          Default  Description
serde-json       Yes      JSON model serialization
serde-bincode    No       Bincode serialization (compact, fast)
parallel         No       Rayon-based parallel tree training (ParallelSGBT)
simd             No       AVX2 histogram acceleration
kmeans-binning   No       K-means histogram binning strategy
arrow            No       Apache Arrow RecordBatch integration
parquet          No       Parquet file I/O
onnx             No       ONNX model export
neural-leaves    No       Experimental MLP leaf models
full             No       Enable all features
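Non-default features are enabled through Cargo as usual. A minimal sketch (the version number is a placeholder; substitute the latest release):

```toml
[dependencies]
# Default build pulls in serde-json only.
# Opt into extras, e.g. parallel training plus SIMD histograms:
irithyll = { version = "0.1", features = ["parallel", "simd"] }
```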

§Quick Start

use irithyll::{SGBTConfig, SGBT, Sample};

let config = SGBTConfig::builder()
    .n_steps(100)
    .learning_rate(0.0125)
    .build()
    .unwrap();

let mut model = SGBT::new(config);

// Stream samples one at a time
let sample = Sample::new(vec![1.0, 2.0, 3.0], 0.5);
model.train_one(&sample);
let prediction = model.predict(&sample.features);

Or use factory functions for quick construction:

use irithyll::{pipe, normalizer, sgbt, StreamingLearner};

let mut model = pipe(normalizer()).learner(sgbt(50, 0.01));
model.train(&[100.0, 0.5], 42.0);
let pred = model.predict(&[100.0, 0.5]);
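In streaming settings, models are typically scored prequentially: predict on each sample first, then train on it. The snippet below is a self-contained sketch of that protocol against a tiny hand-rolled linear SGD model; the Learner trait and LinearSgd struct here are illustrative stand-ins, not the crate's StreamingLearner or StreamingLinearModel:

```rust
/// Illustrative stand-in for a streaming learner interface.
trait Learner {
    fn predict(&self, x: &[f64]) -> f64;
    fn train(&mut self, x: &[f64], y: f64);
}

/// Linear regression trained by per-sample SGD on squared loss.
struct LinearSgd { w: Vec<f64>, b: f64, lr: f64 }

impl Learner for LinearSgd {
    fn predict(&self, x: &[f64]) -> f64 {
        self.b + self.w.iter().zip(x).map(|(w, x)| w * x).sum::<f64>()
    }
    fn train(&mut self, x: &[f64], y: f64) {
        let err = self.predict(x) - y; // squared-loss gradient
        for (w, xi) in self.w.iter_mut().zip(x) {
            *w -= self.lr * err * xi;
        }
        self.b -= self.lr * err;
    }
}

/// Prequential (predict-then-train) evaluation; returns the running MAE.
fn run() -> f64 {
    let mut model = LinearSgd { w: vec![0.0; 2], b: 0.0, lr: 0.05 };
    let (mut abs_err_sum, mut n) = (0.0, 0u32);
    for t in 0..1000usize {
        let x = [(t % 10) as f64 / 10.0, ((t / 10) % 10) as f64 / 10.0];
        let y = 3.0 * x[0] - 2.0 * x[1] + 1.0;
        // Score on the sample *before* learning from it.
        abs_err_sum += (model.predict(&x) - y).abs();
        n += 1;
        model.train(&x, y);
    }
    abs_err_sum / n as f64
}

fn main() {
    println!("prequential MAE = {:.4}", run());
}
```

This predict-then-train loop is what the crate's PrequentialEvaluator in the evaluation module wraps, with metric tracking handled for you.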

§Algorithm

The ensemble maintains n_steps boosting stages, each owning a streaming Hoeffding tree and a drift detector. For each sample (x, y):

  1. Compute the ensemble prediction F(x) = base + lr * sum(tree_s(x))
  2. For each boosting step, compute gradient/hessian of the loss at the residual
  3. Update the tree’s histogram accumulators and evaluate splits via Hoeffding bound
  4. Feed the standardized error to the drift detector
  5. If drift is detected, replace the tree with a fresh alternate

This enables continuous learning without storing past data, with statistically sound split decisions and automatic adaptation to distribution shifts.
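The five steps above can be sketched in plain Rust. A single SGD-updated leaf stands in for each streaming Hoeffding tree, and the drift-detection steps (4-5) are elided; all names here are illustrative, not the crate's internals:

```rust
/// Stub stage model: a single leaf that chases the negative gradient.
struct StubTree { leaf: f64 }

impl StubTree {
    fn predict(&self, _x: &[f64]) -> f64 { self.leaf }
    fn train(&mut self, _x: &[f64], grad: f64, eta: f64) {
        // Boosting convention: each stage fits the negative gradient.
        self.leaf += eta * (-grad - self.leaf);
    }
}

struct Ensemble { base: f64, lr: f64, trees: Vec<StubTree> }

impl Ensemble {
    /// Step 1: F(x) = base + lr * sum(tree_s(x))
    fn predict(&self, x: &[f64]) -> f64 {
        self.base + self.lr * self.trees.iter().map(|t| t.predict(x)).sum::<f64>()
    }

    fn train_one(&mut self, x: &[f64], y: f64) {
        let mut f = self.base; // running prediction through the stages
        for tree in &mut self.trees {
            // Step 2: squared-loss gradient at the running prediction.
            let grad = f - y;
            // Step 3: update this stage (histogram accumulation and
            // Hoeffding-bound splits replaced by one SGD-updated leaf).
            tree.train(x, grad, 0.2);
            f += self.lr * tree.predict(x);
            // Steps 4-5 (drift detection, tree replacement) omitted here.
        }
    }
}

/// Stream a constant target; the ensemble converges toward it.
fn run_demo() -> f64 {
    let mut model = Ensemble {
        base: 0.0,
        lr: 0.5,
        trees: (0..5).map(|_| StubTree { leaf: 0.0 }).collect(),
    };
    for _ in 0..500 {
        model.train_one(&[1.0], 2.0);
    }
    model.predict(&[1.0])
}

fn main() {
    println!("prediction after 500 samples: {:.3}", run_demo());
}
```

Note the residual shrinks geometrically across stages (each stage removes a lr fraction of what remains), which is why even a handful of boosting steps with a modest learning rate tracks the target closely.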

Re-exports§

pub use ensemble::adaptive::AdaptiveSGBT;
pub use ensemble::bagged::BaggedSGBT;
pub use ensemble::config::SGBTConfig;
pub use ensemble::config::ScaleMode;
pub use ensemble::distributional::DecomposedPrediction;
pub use ensemble::distributional::DistributionalSGBT;
pub use ensemble::distributional::GaussianPrediction;
pub use ensemble::distributional::ModelDiagnostics;
pub use ensemble::distributional::TreeDiagnostic;
pub use ensemble::moe_distributional::MoEDistributionalSGBT;
pub use ensemble::multi_target::MultiTargetSGBT;
pub use ensemble::multiclass::MulticlassSGBT;
pub use ensemble::quantile_regressor::QuantileRegressorSGBT;
pub use ensemble::DynSGBT;
pub use ensemble::SGBT;
pub use error::IrithyllError;
pub use sample::Sample;
pub use explain::importance_drift::ImportanceDriftMonitor;
pub use explain::streaming::StreamingShap;
pub use explain::treeshap::ShapValues;
pub use ensemble::parallel::ParallelSGBT;
pub use stream::AsyncSGBT;
pub use stream::Prediction;
pub use stream::PredictionStream;
pub use stream::Predictor;
pub use stream::SampleSender;
pub use metrics::auc::StreamingAUC;
pub use metrics::conformal::AdaptiveConformalInterval;
pub use metrics::ewma::EwmaClassificationMetrics;
pub use metrics::ewma::EwmaRegressionMetrics;
pub use metrics::kappa::CohenKappa;
pub use metrics::kappa::KappaM;
pub use metrics::kappa::KappaT;
pub use metrics::rolling::RollingClassificationMetrics;
pub use metrics::rolling::RollingRegressionMetrics;
pub use metrics::ClassificationMetrics;
pub use metrics::FeatureImportance;
pub use metrics::MetricSet;
pub use metrics::RegressionMetrics;
pub use evaluation::HoldoutStrategy;
pub use evaluation::PrequentialConfig;
pub use evaluation::PrequentialEvaluator;
pub use evaluation::ProgressiveValidator;
pub use clustering::CluStream;
pub use clustering::CluStreamConfig;
pub use clustering::ClusterFeature;
pub use clustering::DBStream;
pub use clustering::DBStreamConfig;
pub use clustering::MicroCluster;
pub use clustering::StreamingKMeans;
pub use clustering::StreamingKMeansConfig;
pub use ensemble::adaptive_forest::AdaptiveRandomForest;
pub use learners::BernoulliNB;
pub use learners::MultinomialNB;
pub use anomaly::hst::AnomalyScore;
pub use anomaly::hst::HSTConfig;
pub use anomaly::hst::HalfSpaceTree;
pub use learner::SGBTLearner;
pub use pipeline::Pipeline;
pub use pipeline::PipelineBuilder;
pub use pipeline::StreamingPreprocessor;
pub use preprocessing::FeatureHasher;
pub use preprocessing::IncrementalNormalizer;
pub use preprocessing::MinMaxScaler;
pub use preprocessing::OneHotEncoder;
pub use preprocessing::OnlineFeatureSelector;
pub use preprocessing::PolynomialFeatures;
pub use preprocessing::TargetEncoder;
pub use preprocessing::CCIPCA;
pub use ensemble::lr_schedule::LRScheduler;
pub use learners::GaussianNB;
pub use learners::Kernel;
pub use learners::LinearKernel;
pub use learners::LocallyWeightedRegression;
pub use learners::MondrianForest;
pub use learners::PolynomialKernel;
pub use learners::RBFKernel;
pub use learners::RecursiveLeastSquares;
pub use learners::StreamingLinearModel;
pub use learners::StreamingPolynomialRegression;
pub use learners::KRLS;
pub use time_series::DecomposedPoint;
pub use time_series::DecompositionConfig;
pub use time_series::HoltWinters;
pub use time_series::HoltWintersConfig;
pub use time_series::SNARIMAXCoefficients;
pub use time_series::SNARIMAXConfig;
pub use time_series::Seasonality;
pub use time_series::StreamingDecomposition;
pub use time_series::SNARIMAX;
pub use bandits::Bandit;
pub use bandits::ContextualBandit;
pub use bandits::DiscountedThompsonSampling;
pub use bandits::EpsilonGreedy;
pub use bandits::LinUCB;
pub use bandits::ThompsonSampling;
pub use bandits::UCBTuned;
pub use bandits::UCB1;
pub use reservoir::ESNConfig;
pub use reservoir::ESNConfigBuilder;
pub use reservoir::ESNPreprocessor;
pub use reservoir::EchoStateNetwork;
pub use reservoir::NGRCConfig;
pub use reservoir::NGRCConfigBuilder;
pub use reservoir::NextGenRC;
pub use ssm::MambaConfig;
pub use ssm::MambaConfigBuilder;
pub use ssm::MambaPreprocessor;
pub use ssm::StreamingMamba;
pub use snn::SpikeNet;
pub use snn::SpikeNetConfig;
pub use snn::SpikeNetConfigBuilder;
pub use snn::SpikePreprocessor;
pub use ttt::StreamingTTT;
pub use ttt::TTTConfig;
pub use ttt::TTTConfigBuilder;
pub use kan::KANConfig;
pub use kan::KANConfigBuilder;
pub use kan::StreamingKAN;
pub use attention::AttentionPreprocessor;
pub use attention::StreamingAttentionConfig;
pub use attention::StreamingAttentionConfigBuilder;
pub use attention::StreamingAttentionModel;
pub use moe::NeuralMoE;
pub use moe::NeuralMoEBuilder;
pub use moe::NeuralMoEConfig;
pub use automl::Algorithm;
pub use automl::AttentionFactory; (Deprecated)
pub use automl::EsnFactory; (Deprecated)
pub use automl::Factory;
pub use automl::MambaFactory; (Deprecated)
pub use automl::SgbtFactory; (Deprecated)
pub use automl::SpikeNetFactory; (Deprecated)
pub use automl::AutoMetric;
pub use automl::AutoTuner;
pub use automl::AutoTunerBuilder;
pub use automl::AutoTunerConfig;
pub use automl::ModelFactory;
pub use automl::ConfigSampler;
pub use automl::ConfigSpace;
pub use automl::HyperConfig;
pub use automl::HyperParam;
pub use automl::RewardNormalizer;
pub use irithyll_core;

Modules§

anomaly
Streaming anomaly detection algorithms.
arrow_support
Arrow and Parquet integration for zero-copy data ingestion.
attention
Streaming linear attention models.
automl
Streaming AutoML: champion-challenger racing with bandit-guided hyperparameter search.
bandits
Multi-armed bandit algorithms for online decision-making.
clustering
Streaming clustering algorithms.
continual
Continual learning wrappers for streaming neural models.
drift
Concept drift detection algorithms.
ensemble
SGBT ensemble orchestrator – the core boosting loop.
error
Error types for Irithyll.
evaluation
Streaming evaluation protocols for online machine learning.
explain
TreeSHAP explanations for streaming gradient boosted trees.
export_embedded
Export trained SGBT models to the irithyll-core packed binary format.
histogram
Histogram binning and accumulation for streaming tree construction.
kan
Streaming Kolmogorov-Arnold Networks (KAN).
learner
Unified streaming learner trait for polymorphic model composition.
learners
Streaming learner implementations for polymorphic model composition.
loss
Loss functions for gradient boosting.
metrics
Online metric tracking for streaming model evaluation.
moe
Streaming Neural Mixture of Experts.
onnx_export
Export trained SGBT models to ONNX format.
pipeline
Composable streaming pipelines for preprocessing → learning chains.
preprocessing
Streaming preprocessing utilities for feature transformation.
reservoir
Reservoir computing models for streaming temporal learning.
sample
Core data types for streaming samples.
serde_support
Model serialization and deserialization support.
snn
Spiking Neural Networks for streaming machine learning.
ssm
Streaming Mamba (selective state space model) for temporal ML pipelines.
stream
Async streaming infrastructure for tokio-native sample ingestion.
time_series
Time series models for streaming forecasting.
tree
Streaming decision trees with Hoeffding-bound split decisions.
ttt
Streaming Test-Time Training (TTT) layers.

Structs§

EnsembleView
Zero-copy view over a packed ensemble binary.
HoeffdingTreeClassifier
A streaming decision tree classifier based on the VFDT algorithm.
PackedNode
12-byte packed decision tree node. AoS layout for cache-optimal inference.
PackedNodeI16
8-byte quantized decision tree node. Integer-only traversal for FPU-less targets.
QuantizedEnsembleHeader
Header for quantized ensemble binary. 16 bytes, 4-byte aligned.
QuantizedEnsembleView
Zero-copy view over a quantized (int16) ensemble binary.
SampleRef
A borrowed observation that avoids Vec<f64> allocation.

Enums§

BinnerKind
Concrete binning strategy enum, eliminating Box<dyn BinningStrategy> heap allocations per feature per leaf.
ConfigError
Structured error for configuration validation failures.
DriftSignal
Signal emitted by a drift detector after observing a value.
FeatureType
Declares whether a feature is continuous (default) or categorical.
FormatError
Errors that can occur when parsing or validating a packed ensemble binary.
LeafModelType
Describes which leaf model architecture to use.
LossType
Tag identifying a loss function for serialization and reconstruction.

Traits§

BinningStrategy
A strategy for computing histogram bin edges from a stream of values.
DriftDetector
A sequential drift detector that monitors a stream of values.
Loss
A differentiable loss function for gradient boosting.
Observation
Trait for anything that can be used as a training observation.
StreamingLearner
Object-safe trait for any streaming (online) machine learning model.
StreamingTree
A streaming decision tree that trains incrementally.

Functions§

adaptive_sgbt
Create an adaptive SGBT with a learning rate scheduler.
auto_tune
Create an auto-tuning streaming learner with default settings.
ccipca
Create a CCIPCA preprocessor for streaming dimensionality reduction.
delta_net
Create a Gated DeltaNet model (strongest retrieval, NVIDIA 2024).
epsilon_greedy
Create an epsilon-greedy bandit with the given number of arms and exploration rate.
esn
Create an Echo State Network with cycle topology.
esn_preprocessor
Create an ESN preprocessor for pipeline composition.
feature_hasher
Create a feature hasher for fixed-size dimensionality reduction.
gaussian_nb
Create a Gaussian Naive Bayes classifier.
gla
Create a Gated Linear Attention model (SOTA streaming attention).
hawk
Create a Hawk model (lightest streaming attention, vector state).
krls
Create a kernel recursive least squares model with an RBF kernel.
lin_ucb
Create a LinUCB contextual bandit.
linear
Create a streaming linear model with the given learning rate.
mamba
Create a streaming Mamba (selective SSM) model.
mamba_preprocessor
Create a Mamba preprocessor for pipeline composition.
min_max_scaler
Create a min-max scaler that normalizes features to [0, 1].
mondrian
Create a Mondrian forest with the given number of trees.
ngrc
Create a Next Generation Reservoir Computer.
normalizer
Create an incremental normalizer for streaming standardization.
one_hot
Create a one-hot encoder for the given categorical feature indices.
pipe
Start building a pipeline with the first preprocessor.
polynomial_features
Create a degree-2 polynomial feature generator (interactions + squares).
ret_net
Create a RetNet model (simplest, fixed decay).
rls
Create a recursive least squares model with the given forgetting factor.
sgbt
Create an SGBT learner with squared loss from minimal parameters.
spikenet
Create a spiking neural network with e-prop learning.
streaming_attention
Create a streaming attention model with any mode.
streaming_kan
Create a streaming KAN with the given layer sizes and learning rate.
streaming_ttt
Create a streaming TTT (Test-Time Training) model.
target_encoder
Create a target encoder with Bayesian smoothing for categorical features.
thompson
Create a Thompson Sampling bandit with Beta(1,1) prior.
ucb1
Create a UCB1 bandit with the given number of arms.
ucb_tuned
Create a UCB-Tuned bandit with the given number of arms.