Flash Attention v3 implementation for the Burn framework
This crate provides an efficient implementation of Flash Attention v3, optimized for multiple backends (CubeCL, CUDA, WGPU).
§Example
use burn::tensor::{Tensor, backend::Backend};
use burn_attention::FlashAttentionV3;

fn example<B: Backend>(
    query: Tensor<B, 4>,
    key: Tensor<B, 4>,
    value: Tensor<B, 4>,
) -> Tensor<B, 4> {
    FlashAttentionV3::forward(query, key, value, None, false)
}
Structs§
- FlashAttentionV3 - Flash Attention v3 implementation
- FlashAttentionV3Config - Flash Attention v3 configuration