
Function default_lambda_init 

pub fn default_lambda_init(max_levels: usize) -> f64
Available on crate feature alloc only.

Default initial λ for AttentionMode::LogLinear. With Σ λ ≤ 1 after softplus-softmax mixing, an init of 1/max_levels makes the untrained mixture uniform: every level contributes equally. Paper §3.3 (R1 §5.3) notes that in the streaming setting without backprop, the λ projection is fixed at init time, so a uniform mixture is the principled choice when nothing is known about which levels are useful.
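
The uniform-init reasoning above can be illustrated with a minimal sketch. The function body is not shown on this page, so `uniform_lambda_init` and `initial_mixture` below are hypothetical stand-ins, not the crate's code. The sketch assumes the init is exactly 1/max_levels, treats `max_levels == 0` as 1 to avoid dividing by zero (also an assumption), and applies the weights directly without modelling the softplus-softmax step.

```rust
/// Hypothetical stand-in for `default_lambda_init`; the crate's real body is not shown here.
/// Assumes the init is exactly 1 / max_levels, and treats max_levels == 0 as 1 (assumption).
fn uniform_lambda_init(max_levels: usize) -> f64 {
    1.0 / max_levels.max(1) as f64
}

/// Hypothetical helper: the per-level weights an untrained mixture would start with.
fn initial_mixture(max_levels: usize) -> Vec<f64> {
    vec![uniform_lambda_init(max_levels); max_levels]
}

fn main() {
    let weights = initial_mixture(4);
    let total: f64 = weights.iter().sum();
    // Every level carries the same weight, and the weights stay within the Σ λ ≤ 1 budget.
    println!("weights = {:?}, sum = {}", weights, total);
}
```

With four levels each weight is 0.25, so no level is favoured before any information about level usefulness is available.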