//! Attention mechanisms for transformer models
//!
//! Extracted from layers/mod.rs (PMAT-802) to reduce module size.
//! Contains:
//! - Attention: Basic scaled dot-product attention
//! - SlidingWindowAttention: Efficient attention with a fixed window size (the windowing rule is sketched after the imports below)
//! - FusedQKVAttention: FlashAttention-style tiled attention
//! - MultiHeadAttention: Full multi-head attention with Q/K/V projections
// NOTE: both `use` lists were emptied during extraction (PMAT-802); the
// import below is an assumed reconstruction. Restore the originals from
// layers/mod.rs.
use crate::tensor::Tensor;
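
// Minimal sketch of the windowing rule behind sliding-window attention,
// included for illustration only. The real `SlidingWindowAttention`
// implementation is pulled in via `include!` below and may use a different
// convention; the symmetric window and the function name here are assumptions.
#[allow(dead_code)]
fn in_sliding_window(query_pos: usize, key_pos: usize, window: usize) -> bool {
    // Each query attends only to keys within `window / 2` positions on either
    // side. Positions outside the window are masked to -inf before softmax,
    // which zeroes their attention weights and cuts the cost of attention
    // from O(n^2) to O(n * window).
    let half = window / 2;
    key_pos + half >= query_pos && key_pos <= query_pos + half
}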
/// Scaled dot-product attention
///
/// Computes attention as:
/// ```text
/// Attention(Q, K, V) = softmax(Q @ K.T / sqrt(d_k)) @ V
/// ```
///
/// where `d_k` is the dimensionality of the keys and queries; scaling by
/// `sqrt(d_k)` keeps the logits in a range where the softmax does not
/// saturate.
///
/// This is a building block for multi-head attention.
///
/// # References
///
/// "Attention is All You Need" - Vaswani et al., 2017
// The five `include!(...)` invocations that followed lost their path
// arguments during extraction (PMAT-802). They pulled in the implementations
// of the components listed in the module docs (`Attention`,
// `SlidingWindowAttention`, `FusedQKVAttention`, `MultiHeadAttention`) and
// should be restored from the original sources; the paths are unknown here.
//
// include!("...");
// include!("...");
// include!("...");
// include!("...");
// include!("...");
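
// A minimal reference sketch of the formula documented above, operating on
// plain row-major `Vec<f32>` matrices rather than the crate's tensor type.
// The function name and shapes are illustrative assumptions, not this
// module's API; the real `Attention` type came from the lost `include!`s.
#[cfg(test)]
mod scaled_dot_product_sketch {
    /// Computes `softmax(Q @ K.T / sqrt(d_k)) @ V` for row-major matrices:
    /// `q` and `k` are `seq_len x d_k`, `v` is `seq_len x d_v`.
    fn attention(
        q: &[f32],
        k: &[f32],
        v: &[f32],
        seq_len: usize,
        d_k: usize,
        d_v: usize,
    ) -> Vec<f32> {
        let scale = 1.0 / (d_k as f32).sqrt();
        let mut out = vec![0.0f32; seq_len * d_v];
        for i in 0..seq_len {
            // Scaled dot-product scores of query row `i` against every key row.
            let mut scores: Vec<f32> = (0..seq_len)
                .map(|j| {
                    (0..d_k).map(|t| q[i * d_k + t] * k[j * d_k + t]).sum::<f32>() * scale
                })
                .collect();
            // Numerically stable softmax: subtract the row max before exp.
            let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
            let mut sum = 0.0f32;
            for s in scores.iter_mut() {
                *s = (*s - max).exp();
                sum += *s;
            }
            // Output row `i` is the softmax-weighted sum of the value rows.
            for (j, s) in scores.iter().enumerate() {
                let w = s / sum;
                for t in 0..d_v {
                    out[i * d_v + t] += w * v[j * d_v + t];
                }
            }
        }
        out
    }

    #[test]
    fn output_is_convex_combination_of_value_rows() {
        // With both value rows equal to [1, 2], every output row must be
        // exactly [1, 2], because the softmax weights sum to 1.
        let q = vec![0.1, 0.2, 0.3, 0.4]; // 2 x 2
        let k = vec![0.5, 0.6, 0.7, 0.8]; // 2 x 2
        let v = vec![1.0, 2.0, 1.0, 2.0]; // 2 x 2
        let out = attention(&q, &k, &v, 2, 2, 2);
        for i in 0..2 {
            assert!((out[i * 2] - 1.0).abs() < 1e-5);
            assert!((out[i * 2 + 1] - 2.0).abs() < 1e-5);
        }
    }
}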