```rust
pub fn default_lambda_init(max_levels: usize) -> f64
```

Available on crate feature `alloc` only.
Default initial λ for `AttentionMode::LogLinear`. Since Σ λ ≤ 1
after softplus-softmax mixing, an init of `1/max_levels` makes the
untrained mixture uniform: every level contributes equally.
Paper §3.3 (R1 §5.3) notes that in the streaming setting without
backprop, the λ projection is fixed at init time, so a uniform
mixture is the principled choice when no information about which
levels are useful is available.
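A minimal sketch of why this init yields a uniform mixture. The body of `default_lambda_init` and the `mix_weights` helper are assumptions for illustration (the crate's actual mixing may differ); the point is that any positive monotone transform followed by normalization maps equal λ values to equal weights:

```rust
// Hypothetical standalone version of `default_lambda_init`,
// assumed to return 1/max_levels as described above.
fn default_lambda_init(max_levels: usize) -> f64 {
    1.0 / max_levels as f64
}

// softplus(x) = ln(1 + e^x), kept positive for all x.
fn softplus(x: f64) -> f64 {
    x.exp().ln_1p()
}

// Assumed softplus-softmax-style mixing: apply softplus to each λ,
// then normalize so the weights sum to 1 (hence Σ λ ≤ 1 holds).
fn mix_weights(lambdas: &[f64]) -> Vec<f64> {
    let sp: Vec<f64> = lambdas.iter().map(|&l| softplus(l)).collect();
    let sum: f64 = sp.iter().sum();
    sp.iter().map(|s| s / sum).collect()
}

fn main() {
    let max_levels = 4;
    let init = default_lambda_init(max_levels); // 1/4 = 0.25
    let lambdas = vec![init; max_levels];
    let weights = mix_weights(&lambdas);
    // Equal inputs normalize to equal weights: each level gets ~1/4.
    for w in &weights {
        assert!((w - 0.25).abs() < 1e-12);
    }
    println!("{weights:?}");
}
```

Because the λ projection is frozen at init in the streaming setting, this uniform split is what the model actually uses at inference time, not just a starting point for training.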