Crate burn_attention


Flash Attention v3 implementation for Burn framework

This crate provides an efficient implementation of Flash Attention v3 for the Burn deep-learning framework, with kernels optimized for multiple backends (CubeCL, CUDA, WGPU).
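For reference, Flash Attention computes the standard scaled dot-product attention, but does so block-by-block with an online softmax so the full attention-score matrix is never materialized in memory. The quantity being computed is:

```latex
% Scaled dot-product attention, as computed by Flash Attention.
% Q, K, V are the query, key, and value matrices; d_k is the head dimension.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

The tiled evaluation produces the same result as the naive formula while reducing memory traffic, which is the source of the speedup on GPU backends.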

§Example

use burn::tensor::{Tensor, backend::Backend};
use burn_attention::FlashAttentionV3;

fn example<B: Backend>(
    query: Tensor<B, 4>,
    key: Tensor<B, 4>,
    value: Tensor<B, 4>,
) -> Tensor<B, 4> {
    FlashAttentionV3::forward(query, key, value, None, false)
}

Structs§

FlashAttentionV3
Flash Attention v3 implementation.
FlashAttentionV3Config
Configuration for Flash Attention v3.