Module flash_attention

flash_attention op-diff harness — see crate::op_diff.

Dense causal attention. CpuBackend implements the reference; Metal/CUDA run their flash kernels against the same Q/K/V.
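To make the comparison concrete, here is a minimal sketch of what a dense causal attention reference and an output diff could look like. It is not the crate's code: `causal_attention_ref` and `max_abs_diff` are hypothetical names, and the single-head row-major `[seq_len, head_dim]` layout and the conventional `1/sqrt(head_dim)` scaling are assumptions made for illustration.

```rust
/// Naive dense causal attention for a single head (illustrative sketch only).
/// `q`, `k`, `v` are row-major `[seq_len, head_dim]` slices (assumed layout);
/// the result has the same shape.
fn causal_attention_ref(
    q: &[f32],
    k: &[f32],
    v: &[f32],
    seq_len: usize,
    head_dim: usize,
) -> Vec<f32> {
    assert_eq!(q.len(), seq_len * head_dim);
    assert_eq!(k.len(), seq_len * head_dim);
    assert_eq!(v.len(), seq_len * head_dim);

    // Conventional softmax scaling; assumed here, not taken from the crate.
    let scale = 1.0 / (head_dim as f32).sqrt();
    let mut out = vec![0.0f32; seq_len * head_dim];

    for i in 0..seq_len {
        let qi = &q[i * head_dim..(i + 1) * head_dim];

        // Causal mask: query i attends only to keys 0..=i.
        let scores: Vec<f32> = (0..=i)
            .map(|j| {
                let kj = &k[j * head_dim..(j + 1) * head_dim];
                qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>() * scale
            })
            .collect();

        // Numerically stable softmax over the unmasked scores.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let denom: f32 = exps.iter().sum();

        // Weighted sum of value rows.
        for (j, e) in exps.iter().enumerate() {
            let w = e / denom;
            for d in 0..head_dim {
                out[i * head_dim + d] += w * v[j * head_dim + d];
            }
        }
    }
    out
}

/// Largest element-wise absolute difference between two outputs,
/// e.g. a reference result and a flash-kernel result.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}
```

A harness along these lines would feed the same Q/K/V to the reference and to a backend kernel, then check that `max_abs_diff` stays below a tolerance chosen for the kernel's precision, since a flash kernel accumulates the softmax in tiles and is not expected to match bit for bit.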

Structs

FlashAttentionOp