pub struct AlibiPositionEncoding {
    pub config: PositionEncodingConfig,
}
ALiBi (Attention with Linear Biases)
Used in models like BLOOM. Instead of adding position embeddings to the inputs, ALiBi adds a bias to the attention scores that linearly penalizes query-key distance. This allows extrapolation to sequences longer than those seen during training.
Reference: “Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation” https://arxiv.org/abs/2108.12409
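As a quick illustration of the idea (a hedged sketch, not this crate's API), the function below applies the ALiBi bias directly to a small pre-softmax score matrix for a single head; the matrix layout and the helper name are assumptions made for the example.

// Minimal sketch of the ALiBi rule for one head: biased[i][j] = scores[i][j] - m * |i - j|.
// Hypothetical helper, not part of this crate.
fn apply_alibi_one_head(scores: &[Vec<f64>], m: f64) -> Vec<Vec<f64>> {
    scores
        .iter()
        .enumerate()
        .map(|(i, row)| {
            row.iter()
                .enumerate()
                .map(|(j, &s)| s - m * (i as f64 - j as f64).abs())
                .collect()
        })
        .collect()
}

With m = 0.25, for example, a key four positions away is penalized by 1.0 while the diagonal (distance 0) is untouched, so nearby tokens keep dominating even at sequence lengths beyond those seen in training.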
Fields
config: PositionEncodingConfig
Configuration for the position encoding
Implementations
impl AlibiPositionEncoding
pub fn new(config: PositionEncodingConfig) -> Result<Self>
Create a new ALiBi position encoding
pub fn build_bias_graph(&self, graph: &mut EinsumGraph) -> Result<Vec<usize>>
Build einsum graph for ALiBi bias
ALiBi adds linear biases to attention scores based on query-key distance:
bias(i, j) = -m * |i - j|
where m is a head-specific slope
Input tensors:
- 0: attention_scores [batch, n_heads, seq_len, seq_len]
- 1: alibi_slopes [n_heads] (precomputed slopes, one per head)
- 2: distance_matrix [seq_len, seq_len] (|i - j| for all positions)
Output tensors:
- output: [batch, n_heads, seq_len, seq_len] (scores with ALiBi bias)
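Since the graph takes the distance matrix as a precomputed input rather than deriving it internally, the caller has to materialize it. A hedged sketch of what input tensor 2 could look like, assuming a flat row-major Vec<f64> layout (the crate's actual tensor type is not shown on this page):

// Hypothetical helper: build the [seq_len, seq_len] distance matrix |i - j|
// expected as input tensor 2, flattened in row-major order.
fn build_distance_matrix(seq_len: usize) -> Vec<f64> {
    let mut dist = Vec::with_capacity(seq_len * seq_len);
    for i in 0..seq_len {
        for j in 0..seq_len {
            dist.push((i as i64 - j as i64).abs() as f64);
        }
    }
    dist
}

The bias the graph adds is then the broadcast product -alibi_slopes[h] * distance_matrix[i][j], applied elementwise to attention_scores[b][h][i][j] to produce the output tensor.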
pub fn compute_slopes(&self) -> Vec<f64>
Compute ALiBi slopes for each attention head
Slopes are computed as m_i = 2^(-8i/n) for i in 1..n_heads. This gives each head a different rate of distance penalty.
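A standalone sketch of that formula (illustrative only; the crate's compute_slopes reads n_heads from its config, and the range is assumed here to be inclusive so that every head gets a slope):

// Sketch of the documented slope formula: m_i = 2^(-8 * i / n) for i = 1..=n_heads.
// Assumes the range covers every head; not this crate's implementation.
fn alibi_slopes(n_heads: usize) -> Vec<f64> {
    let n = n_heads as f64;
    (1..=n_heads)
        .map(|i| 2f64.powf(-8.0 * i as f64 / n))
        .collect()
}

For n_heads = 8 this yields 1/2, 1/4, ..., 1/256, so early heads decay sharply with distance while the last heads remain nearly global.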
Trait Implementations
impl Clone for AlibiPositionEncoding
fn clone(&self) -> AlibiPositionEncoding
Returns a copy of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.