Module flash

Flash attention - memory-efficient attention with tiled computation

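Memory: O(block_size) for the attention matrix instead of O(n²), where n is the sequence length. Scores are computed one tile at a time, so the full n × n matrix is never materialized.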
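The tiling idea can be sketched in a few lines of plain Rust. This is a minimal illustration, not this module's implementation: the function names `dot`, `naive_attention`, and `tiled_attention` are hypothetical, the sketch handles a single query vector, and it assumes scaled dot-product scores combined with a running ("online") softmax. The tiled version keeps only one block of scores alive at a time and rescales its accumulator whenever a larger score appears.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Reference version: materializes all n scores for the query at once.
fn naive_attention(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let scale = 1.0 / (q.len() as f32).sqrt();
    let scores: Vec<f32> = keys.iter().map(|k| dot(q, k) * scale).collect();
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut out = vec![0.0f32; values[0].len()];
    for (e, v) in exps.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += e / sum * x;
        }
    }
    out
}

/// Tiled version (hypothetical sketch): processes keys/values in blocks of
/// `block_size`, so only `block_size` scores exist at any moment.
fn tiled_attention(
    q: &[f32],
    keys: &[Vec<f32>],
    values: &[Vec<f32>],
    block_size: usize,
) -> Vec<f32> {
    assert!(block_size > 0, "block_size must be positive");
    let scale = 1.0 / (q.len() as f32).sqrt();
    let d_v = values.first().map_or(0, |v| v.len());

    let mut running_max = f32::NEG_INFINITY; // largest score seen so far
    let mut running_sum = 0.0f32; // softmax denominator, relative to running_max
    let mut acc = vec![0.0f32; d_v]; // unnormalized weighted sum of values

    for (k_blk, v_blk) in keys.chunks(block_size).zip(values.chunks(block_size)) {
        // Scores for this block only.
        let scores: Vec<f32> = k_blk.iter().map(|k| dot(q, k) * scale).collect();
        let blk_max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let new_max = running_max.max(blk_max);

        // Rescale earlier partial results to the new maximum.
        // On the first block this is exp(-inf) = 0, which is harmless
        // because the accumulators are still zero.
        let correction = (running_max - new_max).exp();
        running_sum *= correction;
        for a in acc.iter_mut() {
            *a *= correction;
        }

        // Fold this block into the running sum and accumulator.
        for (s, v) in scores.iter().zip(v_blk) {
            let w = (s - new_max).exp();
            running_sum += w;
            for (a, x) in acc.iter_mut().zip(v) {
                *a += w * x;
            }
        }
        running_max = new_max;
    }

    acc.iter().map(|a| a / running_sum).collect()
}
```

Under these assumptions, `tiled_attention` should agree with `naive_attention` up to floating-point rounding for any positive `block_size`, while its scratch space depends on `block_size` rather than on the number of keys.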

Structs

FlashAttention
Flash attention with block-wise computation