irithyll 10.0.0

Streaming ML in Rust -- gradient boosted trees, neural architectures (TTT/KAN/MoE/Mamba/SNN), AutoML, kernel methods, and composable pipelines
.__       .__  __  .__           .__  .__
│__│______│__│╱  │_│  │__ ___.__.│  │ │  │
│  ╲_  __ ╲  ╲   __╲  │  <   │  ││  │ │  │
│  ││  │ ╲╱  ││  │ │   Y  ╲___  ││  │_│  │__
│__││__│  │__││__│ │___│  ╱ ____││____╱____╱
                        ╲╱╲╱


Streaming machine learning in Rust. Gradient-boosted trees, neural streaming architectures, kernel methods, linear models — all behind a single StreamingLearner trait, all learning one sample at a time, all running in O(1) memory.


What it is

irithyll is a streaming ML library for the case where data arrives in order and never stops. There is no training set. There is no batch loop. Every sample updates the model and is then released — no buffer, no replay. The same idea ties together gradient-boosted trees, recurrent state-space models, kernel regression, attention, spiking networks, and every preprocessor that feeds them. They all wear the same two-method coat: train_one(features, target, weight) and predict(features) -> f64. A Box<dyn StreamingLearner> is a fully typed model.
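The two-method contract can be sketched in a few lines. The trait definition below follows the signatures quoted above; `RunningMean` and `make_model` are toy names invented here for illustration, not library types.

```rust
// Sketch of the StreamingLearner contract described above.
trait StreamingLearner {
    fn train_one(&mut self, features: &[f64], target: f64, weight: f64);
    fn predict(&self, features: &[f64]) -> f64;
}

/// Toy learner: predicts the weighted running mean of all targets seen.
/// O(1) state no matter how many samples have passed through.
struct RunningMean {
    sum: f64,
    weight_sum: f64,
}

impl StreamingLearner for RunningMean {
    fn train_one(&mut self, _features: &[f64], target: f64, weight: f64) {
        self.sum += weight * target;
        self.weight_sum += weight;
    }
    fn predict(&self, _features: &[f64]) -> f64 {
        if self.weight_sum == 0.0 { 0.0 } else { self.sum / self.weight_sum }
    }
}

// A Box<dyn StreamingLearner> is a fully typed model:
fn make_model() -> Box<dyn StreamingLearner> {
    Box::new(RunningMean { sum: 0.0, weight_sum: 0.0 })
}
```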

The library is structured as two crates that share a vocabulary but not their constraints. The full crate (irithyll) does training, async ingestion, drift detection, AutoML — everything that benefits from std. The packed crate (irithyll-core) is #![no_std], runs zero-allocation inference on bare metal, and serializes a trained tree as 12-byte nodes that traverse branch-free. Train on the cloud, export, run on a Cortex-M0+. The boundary is hard and tested against thumbv6m-none-eabi.

It is a deliberate library — every threshold derives from a paper, every neural readout is bounded before it touches the linear head, every config field round-trips through a builder that validates rather than accepts. Where the literature gives an option, the option becomes a feature flag, not a default. The aesthetic is a frozen city: cold, ordered, lit from inside.

The library primarily serves four cases: edge inference at sample rate, online forecasting under concept drift, embedded learning where the dataset would never fit in RAM, and research benches where a new streaming architecture lands beside SGBT and is held to the same throughput and accuracy floor.

Quick Start

cargo add irithyll

Four snippets, in order of how a streaming pipeline grows.

The smallest useful thing — normalize, boost, predict.

use irithyll::{pipe, normalizer, sgbt, StreamingLearner};

let mut model = pipe(normalizer()).learner(sgbt(50, 0.01));
model.train(&[100.0, 0.5, 42.0], 3.14);
let pred = model.predict(&[100.0, 0.5, 42.0]);

Race three model families against each other — let the data choose.

use irithyll::{automl::{AutoTuner, Factory}, StreamingLearner};

let mut tuner = AutoTuner::builder()
    .add_factory(Factory::sgbt(5))
    .add_factory(Factory::mamba(5))
    .add_factory(Factory::esn())
    .use_drift_rerace(true)
    .build();

tuner.train(&[1.0, 2.0, 3.0], 6.0);
let pred = tuner.predict(&[1.0, 2.0, 3.0]);

Mix architectures inside a single mixture-of-experts — heterogeneous experts welcome.

use irithyll::{moe::NeuralMoE, sgbt, esn, StreamingLearner};

let mut moe = NeuralMoE::builder()
    .expert(sgbt(50, 0.01))
    .expert_with_warmup(esn(100, 0.9), 50)
    .top_k(2)
    .build();

Turn any regressor into a classifier — binary_classifier and multiclass_classifier wrap a StreamingLearner with bipolar one-vs-rest heads.

use irithyll::{sgbt, binary_classifier, StreamingLearner};

let mut clf = binary_classifier(sgbt(50, 0.05));
clf.train(&[1.5, -0.3, 2.1], 1.0);            // labels are 0.0 / 1.0
let prob_positive = clf.predict(&[1.5, -0.3, 2.1]);

Composition is the point. Anything that implements StreamingLearner slots into a pipeline, an MoE expert, an AutoML candidate, a projection wrapper, or a classification head. The trait is the contract; the rest is LEGO.

For the longer ergonomics story — pipeline composition, AutoML tournaments, drift wiring, embedded deployment — see docs/USAGE.md.

Design Principles

The library has opinions. They are stable across releases and they shape every model.

One sample at a time, every time. No mini-batches hidden inside train_one. No "warm up the optimizer with a buffer first". Streaming-only models stay streaming. Architectures that originally required offline training (TTT, KAN, Mamba) are reimplemented with online updates that converge sample-by-sample — and tested for it.

O(1) memory per model. State size is a function of the model, not the data seen. A model that has trained on a billion samples occupies the same memory as one that has trained on a thousand. Drift detectors are bounded ring buffers; histograms have fixed bin counts; subspace trackers carry rank-k projections, not covariance matrices.
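The bounded-statistic idea can be made concrete with Welford's online algorithm: exact mean and variance over an unbounded stream in three words of state. This standalone sketch illustrates the principle, not the library's internal implementation.

```rust
// Welford's online algorithm: running mean and variance in O(1) memory,
// regardless of how many samples the stream has delivered.
#[derive(Default)]
struct OnlineStats {
    n: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the current mean
}

impl OnlineStats {
    fn update(&mut self, x: f64) {
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f64;
        self.m2 += delta * (x - self.mean);
    }
    fn variance(&self) -> f64 {
        if self.n < 2 { 0.0 } else { self.m2 / (self.n - 1) as f64 }
    }
}
```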

Bounded readouts before linear heads. Every neural model that feeds a recursive least squares head bounds its features first — tanh, sigmoid, L2-normalize, clamp. Unbounded features explode the RLS weights silently. This is non-negotiable; new neural architectures land with the bounding step or they don't land.
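What "bounding" means in practice: squash each component, then cap the overall magnitude. The function below is an illustrative standalone sketch using tanh plus a conditional L2-normalize; the library's actual bounding step is chosen per model.

```rust
// Bound a feature vector before it reaches an RLS head: squash each
// component with tanh, then L2-normalize if the magnitude exceeds 1.
// Illustrative sketch only.
fn bound_readout(features: &mut [f64]) {
    for x in features.iter_mut() {
        *x = x.tanh(); // each component now in (-1, 1)
    }
    let norm = features.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm > 1.0 {
        for x in features.iter_mut() {
            *x /= norm; // overall magnitude bounded by 1
        }
    }
}
```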

Constants come from theory, not from grid search. Bernstein bounds for promotion tests, the Hoeffding inequality for split decisions, the PAST update for streaming PCA. Where a paper gives a constant, the constant cites the paper. Where it doesn't, the library prefers a self-calibrating online statistic over a magic number.
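As one example of a theory-derived constant, the Hoeffding bound used for online split decisions (Domingos & Hulten 2000) is a one-line formula, sketched here as a standalone function rather than the library's internal helper.

```rust
// Hoeffding bound: with probability 1 - delta, the observed mean of a
// variable with range R over n samples is within epsilon of its true
// mean. An online split is safe once the gain gap between the best and
// second-best attribute exceeds epsilon.
fn hoeffding_epsilon(range: f64, delta: f64, n: u64) -> f64 {
    ((range * range * (1.0 / delta).ln()) / (2.0 * n as f64)).sqrt()
}
```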

Validation is a builder's job. Every public Config carries a Builder that returns Result<_, ConfigError>. Bounds are checked before the model is constructed; impossible configurations don't get the chance to misbehave.
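The validating-builder pattern looks roughly like the sketch below. Field names and error variants here are invented for illustration; the real configs and their bounds live in the library.

```rust
// A builder that validates before constructing, so an impossible
// configuration never becomes a model. Names are illustrative.
#[derive(Debug)]
enum ConfigError {
    NonPositive(&'static str),
    OutOfRange(&'static str),
}

struct SgbtConfig { n_trees: usize, learning_rate: f64 }

struct SgbtConfigBuilder { n_trees: usize, learning_rate: f64 }

impl SgbtConfigBuilder {
    fn new() -> Self { Self { n_trees: 50, learning_rate: 0.01 } }
    fn n_trees(mut self, n: usize) -> Self { self.n_trees = n; self }
    fn learning_rate(mut self, lr: f64) -> Self { self.learning_rate = lr; self }
    fn build(self) -> Result<SgbtConfig, ConfigError> {
        if self.n_trees == 0 {
            return Err(ConfigError::NonPositive("n_trees"));
        }
        if !(self.learning_rate > 0.0 && self.learning_rate <= 1.0) {
            return Err(ConfigError::OutOfRange("learning_rate"));
        }
        Ok(SgbtConfig { n_trees: self.n_trees, learning_rate: self.learning_rate })
    }
}
```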

Forbid unsafe in the main crate. irithyll has #![forbid(unsafe_code)] at its root — the entire training-side surface is safe Rust. irithyll-core has localized unsafe for two earned reasons: zero-copy view parsing of the packed binary format and AVX2 SIMD intrinsics behind the simd-avx2 feature. Each block carries a safety comment that names its precondition; nothing else is unsafe.

Workspace

Crate What it does no_std
irithyll Training, streaming algorithms, pipelines, async I/O, AutoML No
irithyll-core Packed inference engine — 12-byte nodes, branch-free traversal, zero-alloc Yes
irithyll-python PyO3 bindings — AutoTuner, ProjectedLearner, factory variants No

irithyll-core cross-compiles for bare-metal targets — thumbv6m-none-eabi (Cortex-M0+), thumbv7m-none-eabi (M3), and thumbv7em-none-eabi (M4) all green in CI. Its only dependency is libm for soft-float math; everything else (SIMD, parallel, serde) is opt-in. Train with the full crate, export to packed format, run inference on the microcontroller — same predictions, no surprises.

Models

irithyll's model lineup spans four tiers. Production models are the ones you reach for first: streaming gradient-boosted trees with drift-driven tree replacement, recursive least squares with confidence intervals, kernel RLS, Mondrian forests, classical baselines. The neural tier is where the library has spent most of its recent design budget — selective state-space models, test-time-trained recurrent networks, Kolmogorov-Arnold networks, spiking networks, and a streaming linear-attention layer that exposes twelve distinct attention modes (RetNet, Hawk/Griffin, GLA, GLAVector, DeltaNet, GatedDeltaNet, RWKV, RWKV-7, mLSTM, DeltaProduct, HGRN2, log-linear). Specialized tools cover conformal prediction, anomaly detection, online projection learning, packed inference, and TreeSHAP. Ensembles compose all of the above.

Every algorithm implements StreamingLearner. Every neural model is online-trainable end-to-end — no offline pretraining required. None of the readouts are unbounded; every feature feeding a recursive least squares head is squashed, normalized, or clamped, because that is the difference between a streaming model and one that diverges quietly on the first heavy-tailed sample.

Tier What it contains
Production SGBT family (SGBT, DistributionalSGBT, BaggedSGBT, MulticlassSGBT, ParallelSGBT), RecursiveLeastSquares, KRLS, Mondrian forests, Hoeffding trees, Gaussian Naive Bayes, linear / polynomial models
Neural Mamba family (V1 / V3 / Mamba-3), Echo State Networks, Next-Gen Reservoir Computing, StreamingTTT, StreamingKAN / T-KAN, AGMP, mGRADE, HGRN2, sLSTM, SpikeNet (e-prop + surrogate gradients), StreamingAttentionModel (12 modes)
Specialized Packed inference (irithyll-core), conformal prediction with PID control, anomaly detection, ProjectedLearner (online subspace tracking via PAST), TreeSHAP
Ensemble NeuralMoE (heterogeneous experts, top-k routing, drift-aware), streaming AutoML (AutoTuner, tournament racing, drift re-racing, complexity-adjusted elimination)

Classification works on top of regression: binary_classifier(model) and multiclass_classifier(model, k) wrap any StreamingLearner with bipolar one-vs-rest heads.
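The bipolar one-vs-rest scheme is simple enough to sketch. Below, `MeanHead` is a deliberately trivial stand-in regressor (it ignores features entirely) so that only the wrapping mechanics are on display; the real wrappers route features through any StreamingLearner.

```rust
// Bipolar one-vs-rest: k regression heads, where class c is trained with
// target +1 when the label is c and -1 otherwise; prediction is the
// argmax of the k scores. Illustrative sketch, not the library API.
struct MeanHead { sum: f64, n: f64 }

impl MeanHead {
    fn train(&mut self, target: f64) { self.sum += target; self.n += 1.0; }
    fn score(&self) -> f64 { if self.n == 0.0 { 0.0 } else { self.sum / self.n } }
}

struct OneVsRest { heads: Vec<MeanHead> }

impl OneVsRest {
    fn new(k: usize) -> Self {
        Self { heads: (0..k).map(|_| MeanHead { sum: 0.0, n: 0.0 }).collect() }
    }
    fn train(&mut self, label: usize) {
        for (c, head) in self.heads.iter_mut().enumerate() {
            // bipolar target: +1 for the true class, -1 for the rest
            head.train(if c == label { 1.0 } else { -1.0 });
        }
    }
    fn predict(&self) -> usize {
        self.heads.iter().enumerate()
            .max_by(|a, b| a.1.score().partial_cmp(&b.1.score()).unwrap())
            .map(|(c, _)| c)
            .unwrap()
    }
}
```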

For per-model architecture, paper citations, when-to-use guidance, math summaries, and config references, see MODELS.md.

Drift Handling

Real-world distributions shift; streaming models that don't notice are streaming models that lie. irithyll treats drift as a first-class signal, not a recovery story.

Three detectors ship in irithyll::drift: ADWIN (Bifet & Gavaldà 2007) for adaptive windowing, DDM (Gama et al. 2004) for the warning-and-drift two-stage state machine, and Page-Hinkley for cumulative-deviation tests. They expose a single update(error) -> DriftState interface, plug into any model that takes a Box<dyn DriftDetector>, and respond to adjust_config() calls when AutoML wants to widen the learning rate during a re-race.
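The third detector is simple enough to sketch from its classical formulation. This standalone version returns a plain bool rather than the library's DriftState, and its constants are illustrative.

```rust
// Minimal Page-Hinkley cumulative-deviation test: fires when the
// cumulative positive deviation of the error above its running mean
// (minus a tolerance delta) exceeds a threshold lambda. O(1) state.
struct PageHinkley {
    n: u64,
    mean: f64,    // running mean of the monitored error
    cum: f64,     // cumulative deviation
    min_cum: f64, // smallest cumulative deviation seen so far
    delta: f64,   // tolerance
    lambda: f64,  // detection threshold
}

impl PageHinkley {
    fn new(delta: f64, lambda: f64) -> Self {
        Self { n: 0, mean: 0.0, cum: 0.0, min_cum: 0.0, delta, lambda }
    }
    /// Returns true when drift is detected.
    fn update(&mut self, error: f64) -> bool {
        self.n += 1;
        self.mean += (error - self.mean) / self.n as f64;
        self.cum += error - self.mean - self.delta;
        self.min_cum = self.min_cum.min(self.cum);
        self.cum - self.min_cum > self.lambda
    }
}
```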

Inside SGBT, drift drives tree replacement: each boosting stage carries a detector watching its standardized residual; when drift fires, that stage's tree is replaced with a fresh alternate that warms up in parallel before promotion. The ensemble keeps predicting throughout — there is no rebuild pause.

Inside AutoML, drift drives re-racing: the AutoTuner re-evaluates challenger configurations against the champion when the residual distribution shifts, with the comparison gated by an empirical Bernstein promotion test (bernstein_promotion_test in automl::racing) so the champion never flips on noise.
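The shape of that gate can be sketched with the empirical Bernstein confidence radius in the Maurer & Pontil form. `bernstein_radius` below is a standalone illustration, not the `bernstein_promotion_test` in `automl::racing`.

```rust
// Empirical Bernstein confidence radius: with probability 1 - delta, the
// observed mean of n samples (range R, sample variance sample_var) is
// within this radius of the true mean. It shrinks with n and with low
// observed variance.
fn bernstein_radius(sample_var: f64, range: f64, n: u64, delta: f64) -> f64 {
    let n = n as f64;
    let log_term = (3.0 / delta).ln();
    (2.0 * sample_var * log_term / n).sqrt() + 3.0 * range * log_term / n
}
// A challenger is promoted only when its mean loss beats the champion's
// by more than the combined radii, so the champion never flips on noise.
```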

Bare-Metal Deployment

The packed inference path is a deliberate boundary: train with the full crate, export to a 12-byte-per-node binary representation, deserialize on a microcontroller in pure #![no_std] (no allocator required), and predict.

// On the host: train, then export to packed bytes.
use irithyll::{SGBT, SGBTConfig, StreamingLearner};
use irithyll::export_embedded::export_packed;

let mut model = SGBT::new(SGBTConfig::builder().n_steps(50).build().unwrap());
// ... train on a stream ...
let packed_bytes: Vec<u8> = export_packed(&model, /* n_features */ 4);
// Write to flash, ship to device.

// On the device: zero-copy view over the bytes. No std, no allocation in predict.
#![no_std]
use irithyll_core::EnsembleView;

let view = EnsembleView::from_bytes(PACKED_BYTES).unwrap();
let prediction: f32 = view.predict(&[0.5, 1.2, -0.3, 0.1]);

Validation happens once in from_bytes (magic bytes, child-index bounds, feature-index bounds); after that, prediction is pure pointer arithmetic. Five nodes fit per 64-byte cache line, learning rate is baked into leaf values at export time, and an 8-byte int16-quantized variant (export_packed_i16 + QuantizedEnsembleView) eliminates floats from the inference hot loop entirely. The crate's only dependency is libm. CI cross-compiles for all three Cortex-M targets on every commit.
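The branch-free traversal idea can be shown with a standalone sketch. The 12-byte layout below (feature index, f32 threshold, left-child index, with children stored adjacently) is invented for illustration; the crate's actual binary format is defined by irithyll-core.

```rust
// Branch-free traversal over fixed-size packed nodes. The comparison
// result (0 or 1) indexes directly into the adjacent child pair, so the
// hot loop has no data-dependent branch on the split direction.
#[repr(C)]
#[derive(Clone, Copy)]
struct PackedNode {
    feature: u16,   // u16::MAX marks a leaf
    _pad: u16,
    threshold: f32, // leaf value when this node is a leaf
    left: u32,      // right child is left + 1
}

const LEAF: u16 = u16::MAX;

fn predict_tree(nodes: &[PackedNode], features: &[f32]) -> f32 {
    let mut i = 0usize;
    loop {
        let node = nodes[i];
        if node.feature == LEAF {
            return node.threshold;
        }
        let go_right = (features[node.feature as usize] >= node.threshold) as usize;
        i = node.left as usize + go_right;
    }
}
```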

Feature Flags

irithyll-core's default build is pure no_std — no allocator, no std, just libm for soft-float math. Opt-in features (alloc, std, serde, simd, simd-avx2, parallel) extend it as needed; the device-side inference path in the previous section runs in the strictest mode. Neural streaming modules in the main crate compile unconditionally — no flag required.

Feature Default Description
serde-json Yes JSON model serialization
serde-bincode No Compact binary serialization
parallel No Rayon-based parallel tree training (ParallelSGBT)
simd No Generic SIMD acceleration
simd-avx2 No AVX2 histogram + neural ops (x86_64 only)
kmeans-binning No K-means histogram binning strategy
arrow No Apache Arrow RecordBatch integration
parquet No Parquet file I/O
onnx No ONNX model export
neural-leaves No Experimental MLP leaf models
full No Everything above

TUI

irithyll ships a terminal dashboard for live monitoring of streaming model state, prequential metrics, and drift events. Mins, maxes, percentile envelopes, drift markers, AutoML leaderboards — rendered with ratatui, refreshed at the rate the model trains. It is the cheapest way to feel whether your model is learning.

irithyll TUI demo (throttled for the recording).

# Multi-family demo on a built-in regression benchmark.
irithyll                                     # SGBT on Friedman
irithyll --family kan --bench mackey-glass   # KAN on Mackey-Glass chaos
irithyll --family mamba --bench lorenz       # Mamba on the Lorenz attractor

# Train your own CSV with the live dashboard. Any of the eight supported
# families works the same way — swap --model-type to switch.
irithyll train data.csv --tui --model-type sgbt
irithyll train data.csv --tui --model-type kan
irithyll eval data.csv  --tui --model-type mamba

Built-in benchmarks: friedman, lorenz, mackey-glass, periodic, mqar, needle. Supported families for --tui: sgbt, mamba, ttt, kan, esn, ngrc, spike-net. Per-feature importance ships for SGBT, KAN, and Linear; the reservoir/SSM/spiking families show a "not exposed" placeholder in the importances tab.

References

The implementations cite their sources. The list below is the load-bearing core — papers whose math directly shapes a model in irithyll. The complete bibliography (foundations, related work, surveys) lives in REFERENCES.md.

Streaming Boosting and Trees

  • Gunasekara, Pfahringer, Gomes, Bifet (2024). Gradient boosted trees for evolving data streams. Machine Learning, 113, 3325-3352.
  • Domingos, Hulten (2000). Mining high-speed data streams. KDD 2000. — Hoeffding bound for online splits.
  • Bifet, Gavaldà (2007). Learning from time-changing data with adaptive windowing. SIAM SDM 2007. — ADWIN.
  • Lundberg et al. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2, 56-67. — TreeSHAP.

State-Space Models and Recurrent Networks

  • Gu, Dao (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv:2312.00752.
  • Dao, Gu (2024). Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. arXiv:2405.21060.
  • Gu, Gupta, Goel, Ré (2022). On the parameterization and initialization of diagonal state space models. NeurIPS 2022. — S4D-Inv.
  • Beck et al. (2024). xLSTM: Extended long short-term memory. NeurIPS 2024. — mLSTM / sLSTM.

Streaming Linear Attention

  • Yang, Wang, Shen, Panda, Kim (2023). Gated linear attention transformers with hardware-efficient training. arXiv:2312.06635. — GLA.
  • Yang et al. (2024). Gated Delta Networks: Improving Mamba2 with Delta Rule. arXiv:2412.06464. — DeltaNet / GatedDeltaNet.
  • Sun et al. (2023). Retentive network: A successor to transformer for large language models. arXiv:2307.08621. — RetNet.
  • De et al. (2024). Griffin: Mixing gated linear recurrences with local attention. arXiv:2402.19427. — Hawk.
  • Peng et al. (2024). Eagle and Finch: RWKV with matrix-valued states and dynamic recurrence. arXiv:2404.05892. — RWKV.

Test-Time Training, KAN, Reservoir, Spiking

  • Sun et al. (2024). Learning to (Learn at Test Time): RNNs with expressive hidden states. ICML 2025. — StreamingTTT.
  • Behrouz, Zhong, Mirrokni (2025). Titans: Learning to memorize at test time. arXiv:2501.00663. — momentum + weight-decay TTT.
  • Liu et al. (2024). KAN: Kolmogorov-Arnold Networks. ICLR 2025.
  • Hoang et al. (2026). Ultrafast on-chip online learning via Kolmogorov-Arnold Networks. arXiv:2602.02056. — streaming convergence.
  • Gauthier, Bollt, Griffith, Barbosa (2021). Next generation reservoir computing. Nature Communications, 12, 5564.
  • Rodan, Tiňo (2010). Minimum complexity echo state network. IEEE TNN, 23(1).
  • Bellec et al. (2020). A solution to the learning dilemma for recurrent networks of spiking neurons. Nature Communications, 11, 3625. — e-prop.

Mixture-of-Experts and AutoML

  • Shazeer et al. (2017). Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. ICLR 2017.
  • Aspis et al. (2025). DriftMoE: Mixture of experts for streaming classification with concept drift. ECMLPKDD 2025.
  • Wu, Iyer, Wang (2021). ChaCha for online AutoML. ICML 2021.
  • Qi et al. (2023). Discounted Thompson Sampling for non-stationary bandits. arXiv:2305.10718.

Continual Learning, Conformal, Projection

  • Dohare et al. (2024). Loss of plasticity in deep continual learning. Nature, 632, 768-774.
  • Kirkpatrick et al. (2017). Overcoming catastrophic forgetting in neural networks. PNAS, 114(13). — EWC.
  • Angelopoulos, Candes, Tibshirani (2023). Conformal PID control for time series prediction. NeurIPS 2023.
  • Yang (1995). Projection approximation subspace tracking. IEEE TSP, 43(1). — PAST.

Further Reading

Document Contents
MODELS.md Per-model architecture, paper citation, when-to-use, math summary, config reference
docs/USAGE.md Extended ergonomics — pipelines, AutoML, MoE composition, embedded deployment
BENCHMARKS.md Benchmark methodology, datasets, throughput numbers, Pareto plots
REFERENCES.md Complete bibliography, organized by tier
examples/ Runnable examples, organized 01_quickstart → 02_essentials → 03_neural → 04_advanced
CHANGELOG.md Release history
CONTRIBUTING.md Contribution guide and code standards
docs.rs Full API reference

License

Licensed under either of

at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

MSRV: 1.75. Checked in CI; raised only in minor version bumps.