//! Unified streaming linear attention engine.
//!
//! Covers RetNet, Hawk, GLA, DeltaNet, GatedDeltaNet, RWKV, and mLSTM
//! as configuration variants of one core engine. The design follows State
//! Space Duality (Dao & Gu, ICML 2024), which shows that these architectures
//! are all structured linear attention with different parameterizations.
//!
//! # Architecture
//!
//! All seven architectures share a common recurrence:
//!
//! ```text
//! S_t = decay_t * S_{t-1} + update_t    (state update)
//! o_t = query_fn(x_t, S_t)              (output)
//! ```
//!
//! The architectures differ only in how decay, update, and query are
//! computed; this module captures that choice via [`AttentionMode`].
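//!
//! For intuition, here is a minimal scalar sketch of that shared recurrence.
//! It is illustrative only: the real engine operates on vector and matrix
//! states with architecture-specific decay, update, and query functions.
//!
//! ```
//! // Toy scalar recurrence: S_t = decay_t * S_{t-1} + update_t, with o_t = S_t.
//! let decays = [0.9_f32, 0.8, 0.95];
//! let updates = [1.0_f32, 0.5, -0.25];
//! let mut state = 0.0_f32;
//! let mut outputs = Vec::new();
//! for (&d, &u) in decays.iter().zip(updates.iter()) {
//!     state = d * state + u; // state update
//!     outputs.push(state);   // trivial query: read the state directly
//! }
//! assert_eq!(outputs.len(), 3);
//! ```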
//!
//! # Modules
//!
//! - [`config`] -- Configuration types and mode enum
//! - [`state`] -- Vector and matrix state containers
//! - [`gating`] -- Gate computation functions (fixed, sigmoid, exponential, LSTM)
//! - [`update_rules`] -- State update functions for each architecture variant
//! - [`multi_head`] -- Multi-head attention composing heads with output projection
pub use config::AttentionMode;
pub use multi_head::MultiHeadAttention;
pub use state::{AttentionState, LogLinearState};
/// Trait for streaming attention layers.
///
/// Implementors maintain internal state that evolves with each call to
/// [`forward`](AttentionLayer::forward). The state captures temporal patterns
/// from the input sequence without requiring storage of past observations.
///
/// # Thread Safety
///
/// All attention layers are `Send + Sync`, enabling use in async pipelines
/// and parallel prediction contexts.
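///
/// # Example
///
/// A minimal sketch of the intended streaming pattern. The `forward`
/// signature shown here is assumed for illustration and may differ from the
/// real trait:
///
/// ```ignore
/// // `layer` is any `AttentionLayer` implementor, e.g. a `MultiHeadAttention`.
/// for x_t in observations {
///     let o_t = layer.forward(&x_t); // updates internal state, returns the output
/// }
/// ```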