Crate burn_attention


Flash Attention v3 implementation for Burn framework

This crate provides an efficient implementation of Flash Attention v3 for the Burn deep-learning framework, with kernels optimized for multiple backends (CubeCL, CUDA, WGPU).
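For reference, Flash Attention computes the standard scaled dot-product attention, but does so block-by-block with an online softmax so the full attention-score matrix is never materialized in memory. The quantity being computed is:

```latex
% Scaled dot-product attention, as computed by Flash Attention.
% Q, K, V are the query, key, and value matrices; d_k is the head dimension.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

The tiled evaluation produces the same result as the naive formula while reducing memory traffic, which is the source of the speedup on GPU backends.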

§Example

use burn::tensor::{Tensor, backend::Backend};
use burn_attention::FlashAttentionV3;

fn example<B: Backend>(
    query: Tensor<B, 4>,
    key: Tensor<B, 4>,
    value: Tensor<B, 4>,
) -> Tensor<B, 4> {
    FlashAttentionV3::forward(query, key, value, None, false)
}

Structs§

FlashAttentionV3
Flash Attention v3 implementation.
FlashAttentionV3Config
Configuration for Flash Attention v3.