Flash attention - memory-efficient attention with tiled computation
Memory: O(block_size) for the attention scores instead of O(n²) for the full attention matrix
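Below is a minimal sketch of the idea. It assumes the standard online-softmax formulation of flash attention and handles a single query row. The names `flash_attention_row` and `naive_attention_row` are hypothetical and are not part of this crate's API. Keys and values are visited in blocks of `block_size`, so only a `block_size`-length scores buffer is live at any time. A running maximum and running sum let the contributions of earlier blocks be rescaled, so the full row of n scores is never stored.

```rust
/// Hypothetical sketch: attention output for one query row, computed block by block
/// with an online (streaming) softmax. Only `block_size` scores are held at once.
fn flash_attention_row(q: &[f32], k: &[Vec<f32>], v: &[Vec<f32>], block_size: usize) -> Vec<f32> {
    assert!(block_size > 0, "block_size must be positive");
    assert!(!k.is_empty() && k.len() == v.len(), "need matching, non-empty K and V");
    let scale = 1.0 / (q.len() as f32).sqrt();
    let mut running_max = f32::NEG_INFINITY; // max score seen so far
    let mut running_sum = 0.0f32; // sum of exp(score - running_max) so far
    let mut acc = vec![0.0f32; v[0].len()]; // unnormalised weighted sum of V rows
    let mut scores = vec![0.0f32; block_size]; // the only per-block scratch buffer

    for start in (0..k.len()).step_by(block_size) {
        let end = (start + block_size).min(k.len());
        // Scaled dot-product scores for this block, tracking the block maximum.
        let mut block_max = f32::NEG_INFINITY;
        for (i, kj) in k[start..end].iter().enumerate() {
            let s = q.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>() * scale;
            scores[i] = s;
            block_max = block_max.max(s);
        }
        // Rescale earlier contributions to the new maximum (factor is 0 on the first block).
        let new_max = running_max.max(block_max);
        let correction = (running_max - new_max).exp();
        running_sum *= correction;
        for a in acc.iter_mut() {
            *a *= correction;
        }
        // Fold this block's softmax numerators into the accumulator.
        for i in 0..(end - start) {
            let p = (scores[i] - new_max).exp();
            running_sum += p;
            for (a, x) in acc.iter_mut().zip(&v[start + i]) {
                *a += p * x;
            }
        }
        running_max = new_max;
    }
    // Final normalisation by the softmax denominator.
    acc.iter().map(|a| a / running_sum).collect()
}

/// Reference implementation that materialises every score, for comparison only.
fn naive_attention_row(q: &[f32], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<f32> {
    let scale = 1.0 / (q.len() as f32).sqrt();
    let scores: Vec<f32> = k
        .iter()
        .map(|kj| q.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>() * scale)
        .collect();
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let total: f32 = weights.iter().sum();
    let mut out = vec![0.0f32; v[0].len()];
    for (w, vj) in weights.iter().zip(v) {
        for (o, x) in out.iter_mut().zip(vj) {
            *o += w / total * x;
        }
    }
    out
}
```

Each block is rescaled to the current running maximum, so the result should match the naive softmax up to floating-point rounding for any `block_size`. Smaller blocks reduce scratch memory at the cost of more rescaling steps.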
Structs
- FlashAttention - Flash attention with block-wise computation
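Below is one possible shape for a struct like `FlashAttention`. It is a sketch under assumptions: the `block_size` field, the `new` and `forward` signatures, and the row-major `Vec<Vec<f32>>` layout are all hypothetical and may differ from the real type. This variant tiles the queries as well as the keys. For each query block it reuses one `block_size × block_size` scores tile and keeps running max and sum statistics for each row, so the full n × n matrix is never built.

```rust
/// Hypothetical sketch of a block-wise attention struct (not this crate's actual API).
pub struct FlashAttention {
    /// Assumed tile edge length for both query blocks and key/value blocks.
    pub block_size: usize,
}

impl FlashAttention {
    pub fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be positive");
        FlashAttention { block_size }
    }

    /// Computes softmax(Q K^T / sqrt(d)) V one (query block, key block) tile at a time.
    pub fn forward(&self, q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
        assert!(!q.is_empty() && !k.is_empty() && k.len() == v.len());
        let b = self.block_size;
        let scale = 1.0 / (q[0].len() as f32).sqrt();
        let dv = v[0].len();
        let mut out: Vec<Vec<f32>> = Vec::with_capacity(q.len());
        let mut tile = vec![0.0f32; b * b]; // reused scores tile, row-major

        for q_start in (0..q.len()).step_by(b) {
            let rows = (q_start + b).min(q.len()) - q_start;
            // Running statistics for each query row in this block.
            let mut m = vec![f32::NEG_INFINITY; rows]; // running max
            let mut l = vec![0.0f32; rows]; // running softmax denominator
            let mut acc = vec![vec![0.0f32; dv]; rows]; // unnormalised outputs

            for k_start in (0..k.len()).step_by(b) {
                let cols = (k_start + b).min(k.len()) - k_start;
                // 1. Fill the rows x cols scores tile.
                for r in 0..rows {
                    for c in 0..cols {
                        tile[r * b + c] = dot(&q[q_start + r], &k[k_start + c]) * scale;
                    }
                }
                // 2. Online-softmax update of each row's statistics and accumulator.
                for r in 0..rows {
                    let tile_max = (0..cols).map(|c| tile[r * b + c]).fold(f32::NEG_INFINITY, f32::max);
                    let new_m = m[r].max(tile_max);
                    let correction = (m[r] - new_m).exp();
                    l[r] *= correction;
                    for a in acc[r].iter_mut() {
                        *a *= correction;
                    }
                    for c in 0..cols {
                        let p = (tile[r * b + c] - new_m).exp();
                        l[r] += p;
                        for (a, x) in acc[r].iter_mut().zip(&v[k_start + c]) {
                            *a += p * x;
                        }
                    }
                    m[r] = new_m;
                }
            }
            // Normalise and emit the finished rows of this query block.
            for r in 0..rows {
                out.push(acc[r].iter().map(|a| a / l[r]).collect());
            }
        }
        out
    }
}

/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

For example, `FlashAttention::new(64).forward(&q, &k, &v)` would return one output row per query row. In this tiled variant the score scratch space grows with the square of `block_size`, not linearly.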