Utility functions for attention mechanisms.
This module provides common utilities like softmax, masking, and numerical stability helpers used across attention implementations.
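The numerical-stability helpers in this module revolve around the standard max-subtraction trick: every score is shifted by the row maximum before exponentiating, so `exp` cannot overflow for large inputs. A minimal illustrative sketch of that technique (not this module's actual implementation, whose signatures are not shown on this page):

```rust
// Illustrative sketch of the max-subtraction trick for numerical
// stability; hypothetical code, not this module's implementation.
fn stable_softmax_sketch(scores: &[f32]) -> Vec<f32> {
    // Subtract the maximum score so exp() never overflows.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|&s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    // Normalize so the outputs form a probability distribution.
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    let scores = [2.0_f32, 1.0, 0.1];
    let probs = stable_softmax_sketch(&scores);
    // Probabilities are positive and sum to ~1.0.
    println!("{probs:?}");
}
```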
Functions
- add_vectors - Adds two vectors element-wise.
- apply_causal_mask - Applies causal masking to attention scores (see the sketch after this list).
- apply_dropout - Applies dropout to a vector during training.
- dot_product - Computes the dot product of two vectors.
- l2_norm - Computes the L2 norm of a vector.
- masked_softmax - Computes softmax with masking support.
- normalize_vector - Normalizes a vector to unit length.
- scale_vector - Scales a vector by a scalar value.
- softmax - Computes softmax over a slice of values.
- stable_softmax - Stable softmax that returns a `Vec` directly (no `Result`); used by the sparse, moe, and graph modules.
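To make the masking behavior concrete, here is a hedged sketch of how a causal mask combines with a masked softmax. The `_sketch` functions below are hypothetical stand-ins for this module's `apply_causal_mask` and `masked_softmax`, whose exact signatures and error handling are not shown on this page:

```rust
// Hypothetical sketch of causal masking followed by a masked softmax;
// the names mirror this module's helpers, but the bodies are
// illustrative only.
fn apply_causal_mask_sketch(scores: &mut [Vec<f32>]) {
    // Under a causal mask, query position i may only attend to key
    // positions j <= i, so future positions are forced to -inf.
    for (i, row) in scores.iter_mut().enumerate() {
        for (j, s) in row.iter_mut().enumerate() {
            if j > i {
                *s = f32::NEG_INFINITY;
            }
        }
    }
}

fn masked_softmax_sketch(row: &[f32]) -> Vec<f32> {
    // Masked (-inf) entries contribute exp(-inf) = 0, so they get zero
    // probability while the unmasked entries renormalize among themselves.
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = row.iter().map(|&s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // A 3x3 score matrix with uniform scores, masked causally.
    let mut scores = vec![vec![0.5_f32; 3]; 3];
    apply_causal_mask_sketch(&mut scores);
    for row in &scores {
        // Row i spreads probability over positions 0..=i only.
        println!("{:?}", masked_softmax_sketch(row));
    }
}
```

Note that masking before the softmax, rather than zeroing probabilities afterwards, keeps each row a valid distribution without a second normalization pass.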