.__ .__ __ .__ .__ .__
│__│______│__│╱ │_│ │__ ___.__.│ │ │ │
│ ╲_ __ ╲ ╲ __╲ │ < │ ││ │ │ │
│ ││ │ ╲╱ ││ │ │ Y ╲___ ││ │_│ │__
│__││__│ │__││__│ │___│ ╱ ____││____╱____╱
╲╱╲╱
Streaming machine learning in Rust -- gradient boosted trees, neural streaming architectures (reservoir computing, state space models, spiking networks), kernel methods, linear models, and composable pipelines, all learning one sample at a time.
```rust
// Imports elided in this copy of the README; argument lists are illustrative.
let mut model = pipe(/* preprocessor */).learner(/* learner */);
model.train(&features, target);
let prediction = model.predict(&features);
```
Workspace
irithyll is structured as a Cargo workspace with three crates:
| Crate | Description | no_std | Allocator |
|---|---|---|---|
| `irithyll` | Training, streaming algorithms, pipelines, I/O, async | No | std |
| `irithyll-core` | Packed inference engine (12-byte nodes, branch-free traversal) | Yes | Zero-alloc |
| `irithyll-python` | PyO3 Python bindings for `irithyll` | No | std |
irithyll-core cross-compiles for bare-metal targets (verified: cargo check --target thumbv6m-none-eabi) and has zero dependencies.
Why irithyll?
- 30+ streaming algorithms under one unified `StreamingLearner` trait
- One sample at a time -- O(1) memory per model, no batches, no windows, no retraining
- Embedded deployment -- train with `irithyll`, export to packed binary, infer with `irithyll-core` on bare metal
- Composable pipelines -- chain preprocessors and learners with a builder API
- Concept drift adaptation -- automatic model replacement when the data distribution shifts
- Confidence intervals -- prediction uncertainty from RLS and conformal methods
- Production-grade -- async streaming, SIMD acceleration, Arrow/Parquet I/O, ONNX export
- Neural streaming architectures -- reservoir computing (NG-RC, ESN), state space models (Mamba), spiking neural networks (e-prop)
- Streaming AutoML -- champion-challenger hyperparameter racing with bandit-guided search
- Pure Rust -- zero unsafe in `irithyll`, deterministic, serializable, 2,500+ tests
Algorithms
Every algorithm implements StreamingLearner -- train and predict with the same two-method interface.
| Algorithm | Type | Use Case | Per-Sample Cost |
|---|---|---|---|
| `SGBT` | Gradient boosted trees | General regression/classification | O(n_steps * depth) |
| `AdaptiveSGBT` | SGBT + LR scheduling | Decaying/cycling learning rates | O(n_steps * depth) |
| `MulticlassSGBT` | One-vs-rest SGBT | Multi-class classification | O(classes * n_steps * depth) |
| `MultiTargetSGBT` | Independent SGBTs | Multi-output regression | O(targets * n_steps * depth) |
| `DistributionalSGBT` | Mean + variance SGBT | Prediction uncertainty | O(2 * n_steps * depth) |
| `KRLS` | Kernel recursive LS | Nonlinear regression (sin, exp, ...) | O(budget^2) |
| `RecursiveLeastSquares` | RLS with confidence | Linear regression + uncertainty | O(d^2) |
| `StreamingLinearModel` | SGD linear model | Fast linear baseline | O(d) |
| `StreamingPolynomialRegression` | Polynomial SGD | Polynomial curve fitting | O(d * degree) |
| `GaussianNB` | Naive Bayes | Text/categorical classification | O(d * classes) |
| `MondrianForest` | Random forest variant | Streaming ensemble regression | O(n_trees * depth) |
| `LocallyWeightedRegression` | Memory-based | Locally varying relationships | O(window) |
Preprocessing (implements StreamingPreprocessor):
| Preprocessor | Description |
|---|---|
| `IncrementalNormalizer` | Welford's online standardization |
| `OnlineFeatureSelector` | Streaming mutual-information feature selection |
| `CCIPCA` | O(kd) streaming PCA without covariance matrices |
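The Welford recurrence behind `IncrementalNormalizer` can be sketched as a minimal standalone standardizer (illustrative only -- this is not the crate's implementation, just the classic one-pass update it is described as using):

```rust
/// Minimal Welford-style online standardizer: running mean and variance
/// in O(1) memory, one sample at a time. Illustrative sketch.
pub struct OnlineStandardizer {
    count: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean
}

impl OnlineStandardizer {
    pub fn new() -> Self {
        Self { count: 0, mean: 0.0, m2: 0.0 }
    }

    /// Welford's update: numerically stable single pass.
    pub fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    pub fn mean(&self) -> f64 {
        self.mean
    }

    /// Population variance; 0.0 until two samples have arrived.
    pub fn variance(&self) -> f64 {
        if self.count < 2 { 0.0 } else { self.m2 / self.count as f64 }
    }

    /// Standardize a value with the statistics seen so far.
    pub fn transform(&self, x: f64) -> f64 {
        let sd = self.variance().sqrt();
        if sd > 0.0 { (x - self.mean) / sd } else { 0.0 }
    }
}
```

The point of the recurrence is that mean and variance are exact after every sample without storing the stream, which is what makes the preprocessor O(1) memory.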
Neural Streaming Architectures
v9 introduces three families of neural architectures, all implementing StreamingLearner -- train and predict one sample at a time, compose in pipelines, no batching required.
Reservoir Computing
| Model | Description | Per-Sample Cost |
|---|---|---|
| `NextGenRC` | Polynomial features of time-delayed observations + RLS readout. No random matrices. Trains in <10 samples. Based on Gauthier et al. 2021 (Nature Communications). | O(k * s * d^degree) |
| `EchoStateNetwork` | Deterministic cycle/ring reservoir (O(N) weights, not O(N^2)) with leaky integration + RLS readout. Based on Rodan & Tino 2010, Martinuzzi 2025. | O(N^2) readout |
| `ESNPreprocessor` | Use ESN as a pipeline preprocessor feeding any downstream learner. | O(N) reservoir step |
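The NG-RC feature construction (delay embedding plus polynomial monomials, per Gauthier et al. 2021) can be sketched as follows; the function name and layout here are illustrative, not `NextGenRC`'s actual internals, and the degree is fixed at 2 for clarity:

```rust
/// Build an NG-RC style feature vector from the last k observations
/// (each of dimension d): a constant term, the k*d linear (time-delayed)
/// terms, and the unique quadratic monomials of those terms.
pub fn ngrc_features(delayed: &[Vec<f64>]) -> Vec<f64> {
    // Linear part: concatenation of the delayed observations.
    let lin: Vec<f64> = delayed.iter().flatten().copied().collect();
    let mut feats = Vec::with_capacity(1 + lin.len() + lin.len() * (lin.len() + 1) / 2);
    feats.push(1.0); // constant/bias term
    feats.extend_from_slice(&lin);
    // Quadratic part: unique products lin[i] * lin[j] with i <= j,
    // so no duplicated monomials.
    for i in 0..lin.len() {
        for j in i..lin.len() {
            feats.push(lin[i] * lin[j]);
        }
    }
    feats
}
```

Because the features are deterministic functions of the recent inputs, the only trained component is the linear readout -- which is why an RLS readout can fit in a handful of samples.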
State Space Models (SSM / Mamba)
| Model | Description | Per-Sample Cost |
|---|---|---|
| `StreamingMamba` | Selective state space model with input-dependent B, C, Delta. ZOH discretization, diagonal A matrix. First streaming SSM implementation in Rust. Based on Gu & Dao 2023. | O(D * N) |
| `MambaPreprocessor` | SSM temporal features feeding SGBT or other learners in a pipeline. | O(D * N) |
Spiking Neural Networks (SNN / e-prop)
| Model | Description | Per-Sample Cost |
|---|---|---|
| `SpikeNet` | LIF neurons with e-prop online learning rule. Delta spike encoding for continuous features. Based on Bellec et al. 2020 (Nature Communications), Neftci et al. 2019. | O(N_hidden^2) |
| `SpikeNetFixed` | Full no_std training in Q1.14 integer arithmetic. 64 neurons fit in 22KB (Cortex-M0+ 32KB SRAM). Lives in `irithyll-core`. | O(N_hidden^2) |
All neural models also have preprocessor variants (ESNPreprocessor, MambaPreprocessor, SpikePreprocessor) that implement StreamingPreprocessor for pipeline composition.
Streaming AutoML
v9.5 introduces online hyperparameter optimization via champion-challenger racing -- the first streaming AutoML framework in Rust.
| Component | Description |
|---|---|
| `AutoTuner` | Top-level orchestrator (implements `StreamingLearner`). Champion always predicts; challengers race in parallel. |
| `ModelFactory` | Trait for creating model instances from hyperparameter configs. Built-in factories for SGBT, ESN, Mamba, Attention, SpikeNet. |
| `ConfigSpace` | Hyperparameter search space with float (linear/log), integer, and categorical params. |
| `DiscountedThompsonSampling` | Non-stationary bandit with exponential forgetting for config-space exploration. |
```rust
// Imports elided in this copy of the README; arguments are illustrative.
// One-liner: auto-tune SGBT hyperparameters online
let mut tuner = auto_tune(/* model factory + config space */);
for i in 0..1000 {
    tuner.train(/* features */, /* target */);
    let pred = tuner.predict(/* features */);
}
```
Based on Wu et al. (2021) ChaCha, Qi et al. (2023) Discounted Thompson Sampling.
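The discounting idea behind `DiscountedThompsonSampling` can be sketched with a single Bernoulli bandit arm whose Beta pseudo-counts decay each round, so old evidence is forgotten (a toy standalone version -- the crate's component is more involved, and real Thompson sampling draws from the posterior rather than using its mean):

```rust
/// Toy discounted Bernoulli bandit arm: Beta(alpha, beta) pseudo-counts
/// decayed by `gamma` on every update. Illustrative sketch only.
#[derive(Clone, Copy)]
pub struct DiscountedArm {
    pub alpha: f64,
    pub beta: f64,
}

impl DiscountedArm {
    pub fn new() -> Self {
        Self { alpha: 1.0, beta: 1.0 } // uniform prior
    }

    /// Discount both pseudo-counts, then add the new reward in [0, 1].
    pub fn update(&mut self, reward: f64, gamma: f64) {
        self.alpha = gamma * self.alpha + reward;
        self.beta = gamma * self.beta + (1.0 - reward);
    }

    /// Posterior mean. Thompson sampling proper samples from
    /// Beta(alpha, beta); the mean keeps this sketch deterministic.
    pub fn mean(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }
}
```

With `gamma < 1` the effective sample size is bounded by `1 / (1 - gamma)`, so a configuration that was good under an old data distribution is quickly demoted once its rewards drop -- exactly the non-stationary behavior champion-challenger racing needs.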
Quick Start
Factory Functions
The fastest way to get started -- one-liner construction for every algorithm:
```rust
// Imports elided in this copy of the README; argument lists are illustrative.
let mut trees = sgbt(/* ... */);      // 50 boosting steps, lr=0.01
let mut kernel = krls(/* ... */);     // RBF gamma=1.0, budget=100
let mut bayes = gaussian_nb();        // Gaussian Naive Bayes
let mut forest = mondrian(/* ... */); // 10 Mondrian trees
let mut lin = linear(/* ... */);      // SGD linear model, lr=0.01
let mut rls_m = rls(/* ... */);       // RLS, forgetting factor=0.99

// All share the same interface
trees.train(&x, y);
let pred = trees.predict(&x);
```
Neural Architectures
Reservoir computing, state space models, and spiking networks -- same StreamingLearner interface:
```rust
// Imports elided in this copy of the README; arguments are illustrative.

// NG-RC: time-delay + polynomial features
let mut model = ngrc(/* delays, degree */);
model.train(&x, y);
let pred = model.predict(&x);

// Echo State Network
let mut model = esn(/* reservoir size */);

// SSM as feature extractor -> gradient boosted trees
let mut model = pipe(/* MambaPreprocessor */).learner(/* sgbt */);

// Spiking neural network
let mut model = spikenet(/* hidden neurons */);
model.train(&x, y);
let pred = model.predict(&x);
```
Composable Pipelines
Chain preprocessors and learners with zero boilerplate:
```rust
// Imports elided in this copy of the README; arguments are illustrative.

// Normalize → reduce to 5 components → gradient boosted trees
let mut model = pipe(/* IncrementalNormalizer */)
    .pipe(/* CCIPCA, 5 components */)
    .learner(/* sgbt */);
model.train(&x, y);
let pred = model.predict(&x);
```
Kernel Methods (KRLS)
Learn nonlinear functions with automatic dictionary sparsification:
```rust
// Imports elided in this copy of the README; arguments are illustrative.
let mut model = krls(/* gamma, budget */); // RBF kernel, budget=100
for i in 0..500 {
    // model.train(...) on samples of (x, sin(x))
}
let pred = model.predict(/* x = pi/2 */); // sin(pi/2) ~ 1.0
```
Prediction Intervals (RLS)
Get calibrated confidence intervals that narrow as data arrives:
```rust
// Imports elided in this copy of the README; arguments and the destructuring
// pattern are illustrative.
let mut model = rls(/* forgetting factor */);
for i in 0..1000 {
    // model.train(...)
}
let (pred, lo, hi) = model.predict_interval(/* features */);
// 95% CI: prediction is between lo and hi
```
Full Builder Pattern
For complete control over SGBT hyperparameters:
```rust
// Imports elided in this copy of the README; the values shown are the
// documented defaults from the Configuration table below.
let config = builder()
    .n_steps(100)
    .learning_rate(0.0125)
    .max_depth(6)
    .n_bins(64)
    .lambda(1.0)
    .grace_period(200)
    .feature_names(/* ... */)
    .build()
    .expect("valid SGBT config");
let mut model = SGBT::new(config);
for i in 0..500 {
    // model.train(...)
}
// TreeSHAP explanations
let shap = model.explain(/* features */);
if let Some(named) = model.explain_named(/* features */) {
    // per-feature attributions keyed by feature name
}
```
Concept Drift Detection
Automatic adaptation when the data distribution shifts:
```rust
// Imports elided in this copy of the README; values are illustrative.
let config = builder()
    .n_steps(100)
    .learning_rate(0.0125)
    .drift_detector(/* Adwin detector */)
    .build()
    .expect("valid config");
let mut model = SGBT::new(config);
// When drift is detected, trees are automatically replaced
```
Async Streaming
Tokio-native with bounded channels and concurrent prediction:
The async API pairs an `AsyncSGBT` training runner with concurrent predictor handles; see the `async_ingestion` example for a complete tokio program.
Python
The `irithyll-python` crate exposes the same train/predict `StreamingLearner` interface to Python via PyO3; see that crate's documentation for the exact API.
Packed Inference (irithyll-core)
Train with the full irithyll crate, export to a compact binary, and run inference on embedded targets with zero allocation.
Node Format
Each PackedNode is 12 bytes (5 nodes per 64-byte cache line):
| Field | Size | Description |
|---|---|---|
| `value` | 4B | Split threshold (internal) or prediction with learning rate baked in (leaf) |
| `children` | 4B | Packed left/right child u16 indices |
| `feature_flags` | 2B | Bit 15 = is_leaf, bits 14:0 = feature index |
| `_reserved` | 2B | Padding for future use |
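Based on the field table above, the layout can be pictured as the following `#[repr(C)]` sketch. Only the sizes and bit assignments come from the table; the accessor names and which half of `children` holds which child are assumptions for illustration:

```rust
/// 12-byte packed tree node (illustrative reconstruction of the table above).
#[repr(C)]
pub struct PackedNode {
    /// Split threshold (internal node) or prediction with the learning
    /// rate pre-baked (leaf).
    pub value: f32,
    /// Packed left/right child u16 indices; left in the low 16 bits and
    /// right in the high 16 bits is an assumption here.
    pub children: u32,
    /// Bit 15 = is_leaf flag, bits 14:0 = feature index.
    pub feature_flags: u16,
    /// Padding reserved for future use.
    pub _reserved: u16,
}

impl PackedNode {
    pub fn is_leaf(&self) -> bool {
        self.feature_flags & 0x8000 != 0
    }
    pub fn feature_index(&self) -> u16 {
        self.feature_flags & 0x7FFF
    }
    pub fn left_child(&self) -> u16 {
        (self.children & 0xFFFF) as u16
    }
    pub fn right_child(&self) -> u16 {
        (self.children >> 16) as u16
    }
}
```

At 12 bytes with 4-byte alignment, five nodes fit in a 64-byte cache line, matching the claim above.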
Export and Deploy
```rust
// Import paths elided in this copy of the README; values, types, and file
// names are illustrative.

// 1. Train on a host machine
let config = builder()
    .n_steps(100)
    .learning_rate(0.0125)
    .max_depth(6)
    .build()
    .unwrap();
let mut model = SGBT::new(config);
for sample in training_data {
    // model.train(...)
}

// 2. Export to packed binary (learning rate baked into leaf values)
let packed: Vec<u8> = export_packed(&model);
std::fs::write("model.bin", &packed).unwrap();

// 3. Load on embedded target (no_std, zero-alloc)
use irithyll_core::EnsembleView;

// Zero-copy: borrows the buffer, no heap allocation
let model_bytes: &[u8] = include_bytes!("model.bin");
let view = EnsembleView::from_bytes(model_bytes).unwrap();
let prediction: f32 = view.predict(/* features */);
```
Performance
Benchmarked on x86-64 (single core, 50 trees, max depth 4, 10 features):
| Operation | Latency | Throughput |
|---|---|---|
| Packed single predict (`irithyll-core`) | 66 ns | 15.2M pred/s |
| Packed batch predict (x4 interleave) | -- | 5.3M pred/s |
| Training-time predict (`irithyll` SoA) | 533 ns | 1.9M pred/s |
The 8x speedup comes from the 12-byte AoS node layout (vs training-time SoA vectors), branch-free child selection (compiles to cmov on x86), and f32 arithmetic with pre-baked learning rate.
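Branch-free child selection can be illustrated with a few lines: the comparison result becomes an index rather than an `if`/`else`, which the compiler can lower to a conditional move. This is a minimal sketch, not irithyll-core's actual traversal code, and the comparison direction is an assumption:

```rust
/// Select the next child without a data-dependent branch in the source:
/// the bool comparison becomes a 0/1 index into the two candidates.
#[inline]
pub fn next_child(left: u16, right: u16, feature_value: f32, threshold: f32) -> u16 {
    let go_right = (feature_value >= threshold) as usize; // 0 or 1
    // Equivalent to `if go_right { right } else { left }`, but written as
    // an index select so the optimizer can emit cmov on x86.
    [left, right][go_right]
}
```

Avoiding the branch matters for tree traversal because split outcomes are close to random, so a branch predictor would mispredict roughly half the time.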
The SGBT Algorithm
The core gradient boosting engine is based on Gunasekara et al., 2024. The ensemble maintains n_steps boosting stages, each owning a streaming Hoeffding tree and a drift detector. For each sample (x, y):
- Compute the ensemble prediction F(x) = base + lr * sum(tree_s(x))
- For each boosting step, compute gradient/hessian of the loss at the residual
- Update the tree's histogram accumulators and evaluate splits via Hoeffding bound
- Feed the standardized error to the drift detector
- If drift is detected, replace the tree with a fresh alternate
Beyond the paper, irithyll adds EWMA leaf decay, lazy O(1) histogram decay, proactive tree replacement, and EFDT-style split re-evaluation for long-running non-stationary systems.
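The per-sample loop above can be sketched in a stripped-down form with squared-error loss and stub trees. All names and structure here are illustrative, not the crate's internals; the stub leaf stands in for a streaming Hoeffding tree, and drift handling is omitted:

```rust
/// Minimal stand-in for one boosting stage's streaming tree.
pub trait StageTree {
    fn predict(&self, x: &[f64]) -> f64;
    fn update(&mut self, x: &[f64], gradient: f64, hessian: f64);
}

pub struct SgbtSketch<T: StageTree> {
    pub base: f64,
    pub learning_rate: f64,
    pub stages: Vec<T>,
}

impl<T: StageTree> SgbtSketch<T> {
    /// Ensemble prediction: F(x) = base + lr * sum_s tree_s(x).
    pub fn predict(&self, x: &[f64]) -> f64 {
        self.base + self.learning_rate * self.stages.iter().map(|t| t.predict(x)).sum::<f64>()
    }

    /// One sample of boosting: each stage fits the residual left by the
    /// stages before it (squared error: gradient = F_partial - y, hessian = 1).
    pub fn train(&mut self, x: &[f64], y: f64) {
        let mut partial = self.base;
        for tree in &mut self.stages {
            let gradient = partial - y;
            tree.update(x, gradient, 1.0);
            partial += self.learning_rate * tree.predict(x);
            // (The real algorithm also feeds the standardized error to a
            // drift detector here and may replace the tree on drift.)
        }
    }
}

/// Trivial leaf-only "tree" that averages negative gradients -- enough to
/// exercise the loop, nothing like a real Hoeffding tree.
pub struct MeanLeaf { sum: f64, n: f64 }
impl MeanLeaf {
    pub fn new() -> Self { Self { sum: 0.0, n: 0.0 } }
}
impl StageTree for MeanLeaf {
    fn predict(&self, _x: &[f64]) -> f64 {
        if self.n > 0.0 { self.sum / self.n } else { 0.0 }
    }
    fn update(&mut self, _x: &[f64], gradient: f64, _hessian: f64) {
        self.sum += -gradient; // leaf moves toward the negative gradient
        self.n += 1.0;
    }
}
```

Even in this toy form the shrinkage geometry is visible: with n stages and learning rate lr, a constant target y is approached as y * (1 - (1 - lr)^n).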
Architecture
```
irithyll/                Workspace root
  src/
    ensemble/            SGBT variants, config, multi-class/target, parallel, adaptive, distributional
    learners/            KRLS, RLS, Gaussian NB, Mondrian forests, linear/polynomial models
    pipeline/            Composable preprocessor + learner chains (StreamingPreprocessor trait)
    preprocessing/       IncrementalNormalizer, OnlineFeatureSelector, CCIPCA
    tree/                Hoeffding-bound streaming decision trees
    histogram/           Streaming histogram binning (uniform, quantile, k-means)
    drift/               Concept drift detectors (Page-Hinkley, ADWIN, DDM)
    loss/                Differentiable loss functions (squared, logistic, softmax, Huber)
    explain/             TreeSHAP, StreamingShap, importance drift monitoring
    stream/              Async tokio-based training runner and predictor handles
    metrics/             Online regression/classification metrics, conformal intervals, EWMA
    anomaly/             Half-space trees for streaming anomaly detection
    automl/              Champion-challenger racing, config space, model factories, reward normalization
    reservoir/           NG-RC (time-delay polynomial) and ESN (cycle reservoir) + preprocessors
    ssm/                 StreamingMamba (selective SSM) + MambaPreprocessor
    snn/                 SpikeNet (f64 wrapper), SpikePreprocessor
    serde_support/       Model checkpoint/restore (JSON, bincode)
    export_embedded.rs   SGBT -> packed binary export for irithyll-core
irithyll-core/           #![no_std] training engine + zero-alloc inference
  packed.rs              12-byte PackedNode, EnsembleHeader, TreeEntry
  traverse.rs            Branch-free tree traversal (single + x4 batch)
  view.rs                EnsembleView<'a> -- zero-copy inference from &[u8]
  quantize.rs            f64 -> f32 quantization utilities
  error.rs               FormatError (no_std compatible)
  reservoir/             NG-RC delay buffer, polynomial features, cycle reservoir, PRNG (alloc)
  ssm/                   Selective SSM: diagonal state, ZOH discretization, projections (alloc)
  snn/                   SpikeNetFixed: Q1.14 LIF neurons, e-prop, delta encoding (alloc)
irithyll-python/         PyO3 Python bindings
```
Configuration
| Parameter | Default | Description |
|---|---|---|
| `n_steps` | 100 | Number of boosting steps (trees in ensemble) |
| `learning_rate` | 0.0125 | Shrinkage factor applied to each tree output |
| `feature_subsample_rate` | 0.75 | Fraction of features sampled per tree |
| `max_depth` | 6 | Maximum depth of each streaming tree |
| `n_bins` | 64 | Number of histogram bins per feature |
| `lambda` | 1.0 | L2 regularization on leaf weights |
| `gamma` | 0.0 | Minimum gain required to make a split |
| `grace_period` | 200 | Minimum samples before evaluating splits |
| `delta` | 1e-7 | Hoeffding bound confidence parameter |
| `drift_detector` | PageHinkley(0.005, 50.0) | Drift detection algorithm for tree replacement |
| `variant` | Standard | Computational variant (Standard, Skip, MI) |
| `leaf_half_life` | None (disabled) | EWMA decay half-life for leaf statistics |
| `max_tree_samples` | None (disabled) | Proactive tree replacement threshold |
| `split_reeval_interval` | None (disabled) | Re-evaluation interval for max-depth leaves |
Feature Flags
These flags apply to the irithyll crate. irithyll-core has no required features (it is no_std with zero dependencies by default; an optional std feature is available).
| Feature | Default | Description |
|---|---|---|
| `serde-json` | Yes | JSON model serialization |
| `serde-bincode` | No | Compact binary serialization (bincode) |
| `parallel` | No | Rayon-based parallel tree training (ParallelSGBT) |
| `simd` | No | AVX2 histogram acceleration |
| `kmeans-binning` | No | K-means histogram binning strategy |
| `arrow` | No | Apache Arrow RecordBatch integration |
| `parquet` | No | Parquet file I/O |
| `onnx` | No | ONNX model export |
| `neural-leaves` | No | Experimental MLP leaf models |
| `full` | No | Enable all features |
Neural streaming modules (reservoir, ssm, snn) compile unconditionally -- no feature flag required.
Examples
Run any example with `cargo run --example <name>`:
| Example | Description |
|---|---|
| `basic_regression` | Linear regression with RMSE tracking |
| `classification` | Binary classification with logistic loss |
| `async_ingestion` | Tokio-native async training with concurrent prediction |
| `custom_loss` | Implementing a custom loss function |
| `drift_detection` | Abrupt concept drift with recovery analysis |
| `model_checkpointing` | Save/restore models with prediction verification |
| `streaming_metrics` | Prequential evaluation with windowed metrics |
| `krls_nonlinear` | Kernel regression on sin(x) with ALD sparsification |
| `ccipca_reduction` | Streaming PCA dimensionality reduction |
| `rls_confidence` | RLS prediction intervals narrowing over time |
| `pipeline_composition` | Normalizer + SGBT composable pipeline |
Documentation
- API Reference -- full docs on docs.rs (all features enabled)
- Rustdoc (GitHub Pages) -- latest from master
Minimum Supported Rust Version
The MSRV is 1.75. This is checked in CI and will only be raised in minor version bumps.
References
Gunasekara, N., Pfahringer, B., Gomes, H. M., & Bifet, A. (2024). Gradient boosted trees for evolving data streams. Machine Learning, 113, 3325-3352.
Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., & Lee, S.-I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2, 56-67.
Weng, J., Zhang, Y., & Hwang, W.-S. (2003). Candid covariance-free incremental principal component analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8), 1034-1040.
Gauthier, D. J., Bollt, E., Griffith, A., & Barbosa, W. A. S. (2021). Next generation reservoir computing. Nature Communications, 12, 5564.
Rodan, A., & Tino, P. (2010). Minimum complexity echo state network. IEEE Transactions on Neural Networks, 23(1), 131-144.
Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752.
Bellec, G., Scherr, F., Subramoney, A., Hajek, E., Salaj, D., Legenstein, R., & Maass, W. (2020). A solution to the learning dilemma for recurrent networks of spiking neurons. Nature Communications, 11, 3625.
Neftci, E. O., Mostafa, H., & Zenke, F. (2019). Surrogate gradient learning in spiking neural networks. IEEE Signal Processing Magazine, 36(6), 51-63.
Jaeger, H. (2001). The "echo state" approach to analysing and training recurrent neural networks. GMD Report 148.
Lukoševičius, M., & Jaeger, H. (2009). Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), 127-149.
Martinuzzi, F. (2025). Minimal deterministic echo state networks outperform random reservoirs in learning chaotic dynamics. Chaos, 35.
Sussillo, D., & Abbott, L. F. (2009). Generating coherent patterns of activity from chaotic neural networks. Neuron, 63(4), 544-557.
Yan, M., Huang, C., Bienstman, P., Tino, P., Lin, W., & Sun, J. (2024). Emerging opportunities and challenges for the future of reservoir computing. Nature Communications, 15, 2056.
Gu, A., Goel, K., & Ré, C. (2021). Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396.
Gu, A., Gupta, A., Goel, K., & Ré, C. (2022). On the parameterization and initialization of diagonal state space models. NeurIPS 2022.
Dao, T., & Gu, A. (2024). Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. arXiv preprint arXiv:2405.21060.
Liu, B., Wang, R., Wu, L., Feng, Y., Stone, P., & Liu, Q. (2024). Longhorn: State space models are amortized online learners. NeurIPS 2024.
Kleyko, D., Frady, E. P., Kheffache, M., & Osipov, E. (2020). Integer echo state networks: Efficient reservoir computing for digital hardware. IEEE TNNLS.
Zenke, F., & Ganguli, S. (2018). SuperSpike: Supervised learning in multilayer spiking neural networks. Neural Computation, 30(6), 1514-1541.
Eshraghian, J. K., Ward, M., Neftci, E. O., et al. (2023). Training spiking neural networks using lessons from deep learning. Proceedings of the IEEE, 111(9), 1016-1054.
Frenkel, C., & Indiveri, G. (2022). ReckOn: A 28nm sub-mm² task-agnostic spiking recurrent neural network processor. ISSCC 2022.
Meyer, S. M., et al. (2024). Diagonal state space model on Loihi 2 for efficient streaming. arXiv preprint arXiv:2409.15022.
Jaeger, H., Lukoševičius, M., Popovici, D., & Siewert, U. (2007). Optimization and applications of echo state networks with leaky-integrator neurons. Neural Networks, 20(3), 335-352.
Javed, K., Shah, H., Sutton, R., & White, M. (2023). Scalable real-time recurrent learning. arXiv preprint arXiv:2302.05326.
Yang, S., Wang, B., Shen, Y., Panda, R., & Kim, Y. (2023). Gated linear attention transformers with hardware-efficient training. arXiv preprint arXiv:2312.06635.
Yang, S., et al. (2024). Gated Delta Networks: Improving Mamba2 with Delta Rule. arXiv preprint arXiv:2412.06464.
Peng, B., et al. (2024). Eagle and Finch: RWKV with matrix-valued states and dynamic recurrence. arXiv preprint arXiv:2404.05892.
Beck, M., et al. (2024). xLSTM: Extended long short-term memory. NeurIPS 2024.
Sun, Y., et al. (2023). Retentive network: A successor to transformer for large language models. arXiv preprint arXiv:2307.08621.
De, S., Smith, S. L., et al. (2024). Griffin: Mixing gated linear recurrences with local attention for efficient language models. arXiv preprint arXiv:2402.19427.
Kirkpatrick, J., et al. (2017). Overcoming catastrophic forgetting in neural networks. PNAS, 114(13), 3521-3526.
Dohare, S., et al. (2024). Loss of plasticity in deep continual learning. Nature, 632, 768-774.
Angelopoulos, A. N., Candes, E. J., & Tibshirani, R. J. (2023). Conformal PID control for time series prediction. NeurIPS 2023.
Bhatnagar, A., Wang, H., Xiong, C., & Bai, Y. (2023). Improved online conformal prediction via strongly adaptive online learning. ICML 2023.
Gupta, C., & Ramdas, A. (2023). Online Platt scaling with calibeating. ICML 2023.
Wu, Q., Iyer, C., & Wang, C. (2021). ChaCha for online AutoML. ICML 2021.
Qi, Y., et al. (2023). Discounted Thompson Sampling for non-stationary bandits. arXiv preprint arXiv:2305.10718.
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.