
Module attention

Streaming linear attention models.

This module provides StreamingAttentionModel, a streaming machine learning model that uses multi-head linear attention as its temporal feature extractor, feeding into a Recursive Least Squares (RLS) readout layer. It integrates with irithyll’s StreamingLearner and StreamingPreprocessor traits.

§Architecture

input features ──→ [MultiHeadAttention] ──→ temporal features ──→ [RLS] ──→ prediction
  (d_model)            (recurrent state)       (d_model)          (1)

The attention layer processes each feature vector as a timestep, maintaining per-head recurrent state that captures temporal dependencies via linear attention mechanisms (RetNet, Hawk, GLA, DeltaNet, GatedDeltaNet, RWKV, mLSTM). The RLS readout learns a linear mapping from the attention output to the target.
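As a rough illustration of the per-head recurrent state these variants maintain, here is a minimal single-head linear attention step with a RetNet-style fixed decay. This is a self-contained sketch of the general recurrence (state S ← decay · S + v kᵀ, output = Sᵀ q), not this crate's actual implementation; the struct and dimensions are chosen for brevity.

```rust
// Illustrative single-head linear attention recurrence (RetNet-style fixed
// decay). NOT irithyll's implementation: the state is a d x d outer-product
// accumulator, updated once per streaming timestep (d = 2 here for brevity).

const D: usize = 2;

struct LinearAttention {
    decay: f64,               // fixed exponential decay of past state
    state: [[f64; D]; D],     // accumulator S, S[i][j] ~ sum of decayed k_i * v_j
}

impl LinearAttention {
    fn new(decay: f64) -> Self {
        Self { decay, state: [[0.0; D]; D] }
    }

    /// One streaming step: fold (k, v) into the state, then read out with q.
    fn step(&mut self, q: [f64; D], k: [f64; D], v: [f64; D]) -> [f64; D] {
        for i in 0..D {
            for j in 0..D {
                // S <- decay * S + k v^T
                self.state[i][j] = self.decay * self.state[i][j] + k[i] * v[j];
            }
        }
        let mut out = [0.0; D];
        for j in 0..D {
            for i in 0..D {
                out[j] += self.state[i][j] * q[i]; // out = S^T q
            }
        }
        out
    }
}

fn main() {
    let mut attn = LinearAttention::new(0.5);
    // First step: S = k v^T, and <q, k> = 1, so the output recovers v.
    let out1 = attn.step([1.0, 0.0], [1.0, 0.0], [2.0, 3.0]);
    assert_eq!(out1, [2.0, 3.0]);
    // Second step: the old contribution is halved by the decay, and the new
    // (k, v) pair is orthogonal to q, so only the decayed state is read out.
    let out2 = attn.step([1.0, 0.0], [0.0, 1.0], [5.0, 5.0]);
    assert_eq!(out2, [1.0, 1.5]);
    println!("ok");
}
```

The gated variants (GLA, GatedDeltaNet, and friends) replace the fixed scalar decay with learned, data-dependent gates, but the streaming shape of the update is the same: constant state size per head, one update per timestep.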

§Components

- MultiHeadAttention — the streaming linear attention layer that serves as the temporal feature extractor.
- RLS readout — the recursive least squares layer that maps the attention output to the prediction.

§Example

use irithyll::attention::{StreamingAttentionModel, StreamingAttentionConfig, AttentionMode};
use irithyll::learner::StreamingLearner;

let config = StreamingAttentionConfig::builder()
    .d_model(8)
    .n_heads(2)
    .mode(AttentionMode::GLA)
    .build()
    .unwrap();

let mut model = StreamingAttentionModel::new(config);
model.train(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 5.0);
let pred = model.predict(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]);
assert!(pred.is_finite());
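The RLS readout that train/predict drive above can be sketched with the standard recursive least squares update equations. This is an illustrative, self-contained example under textbook RLS (gain k = Px / (λ + xᵀPx), w ← w + k·error, P ← (P − k xᵀP) / λ), not irithyll's actual readout code:

```rust
// Illustrative Recursive Least Squares (RLS) readout: fits y ≈ w^T x online,
// one sample at a time. A sketch of the standard RLS recursions, NOT
// irithyll's implementation (d = 2 here for brevity).

const D: usize = 2;

struct Rls {
    w: [f64; D],          // learned linear weights
    p: [[f64; D]; D],     // inverse correlation matrix estimate
    lambda: f64,          // forgetting factor (1.0 = no forgetting)
}

impl Rls {
    fn new(lambda: f64, delta: f64) -> Self {
        let mut p = [[0.0; D]; D];
        for i in 0..D {
            p[i][i] = delta; // large diagonal = weak prior on w
        }
        Self { w: [0.0; D], p, lambda }
    }

    fn predict(&self, x: &[f64; D]) -> f64 {
        (0..D).map(|i| self.w[i] * x[i]).sum()
    }

    fn train(&mut self, x: &[f64; D], y: f64) {
        // px = P x (P stays symmetric, so px also equals (x^T P)^T).
        let mut px = [0.0; D];
        for i in 0..D {
            for j in 0..D {
                px[i] += self.p[i][j] * x[j];
            }
        }
        // Gain vector k = P x / (lambda + x^T P x).
        let denom = self.lambda + (0..D).map(|i| x[i] * px[i]).sum::<f64>();
        let mut k = [0.0; D];
        for i in 0..D {
            k[i] = px[i] / denom;
        }
        // Weight update driven by the prediction error.
        let err = y - self.predict(x);
        for i in 0..D {
            self.w[i] += k[i] * err;
        }
        // P update: P = (P - k (x^T P)) / lambda.
        for i in 0..D {
            for j in 0..D {
                self.p[i][j] = (self.p[i][j] - k[i] * px[j]) / self.lambda;
            }
        }
    }
}

fn main() {
    let mut rls = Rls::new(1.0, 1e6);
    // Stream samples from the exact linear rule y = 2*x0 + 1*x1.
    for t in 0..50 {
        let x = [(t % 5) as f64, (t % 3) as f64];
        rls.train(&x, 2.0 * x[0] + 1.0 * x[1]);
    }
    // The readout converges to the underlying weights [2, 1].
    let pred = rls.predict(&[3.0, 2.0]);
    assert!((pred - 8.0).abs() < 1e-3);
    println!("ok");
}
```

In the full model, x would be the temporal feature vector produced by the attention layer rather than the raw input, and y the training target passed to train.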

Re-exports§

pub use attention_config::StreamingAttentionConfig;
pub use attention_config::StreamingAttentionConfigBuilder;
pub use attention_preprocessor::AttentionPreprocessor;
pub use streaming_attention::StreamingAttentionModel;

Modules§

attention_config
Configuration and builder for StreamingAttentionModel.
attention_preprocessor
Attention-based streaming preprocessor for pipeline composition.
streaming_attention
Streaming linear attention model: multi-head attention + RLS readout.

Structs§

AttentionConfig
Full configuration for a multi-head streaming attention layer.
MultiHeadAttention
Multi-head streaming linear attention layer.

Enums§

AttentionMode
Selects the attention architecture variant.

Traits§

AttentionLayer
Trait for streaming attention layers.

Functions§

delta_net
Create a Gated DeltaNet model (NVIDIA, strongest retrieval).
gla
Create a Gated Linear Attention model (SOTA streaming attention).
hawk
Create a Hawk model (lightest, vector state).
ret_net
Create a RetNet model (simplest, fixed decay).
streaming_attention
Create a generic streaming attention model with any mode.