// SPDX-License-Identifier: MIT
// Copyright 2026 Tyler Zervas
//! `CubeCL` GPU kernel implementations for Flash Attention.
//!
//! This module provides memory-efficient GPU kernels using `CubeCL` v0.8.1.
//! The implementation follows the Flash Attention 2 algorithm with online softmax
//! for O(N) memory complexity instead of O(N²).
//!
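//! ## Online Softmax (illustrative sketch)
//!
//! The kernel streams K/V tiles and keeps, per query row, a running maximum,
//! a running softmax denominator, and an unnormalized output accumulator, so
//! the full N×N score matrix is never materialized. The snippet below is a
//! purely illustrative sketch of that per-tile update rule; the helper names
//! (`tile_row_max`, `tile_exp_sum`, `tile_exp_weighted_v`) are placeholders
//! and are **not** items exported by this module.
//!
//! ```rust,ignore
//! // Running state per query row, updated once per K/V tile:
//! //   m   - maximum logit seen so far
//! //   l   - softmax denominator accumulated so far
//! //   acc - unnormalized weighted sum of V rows
//! let m_new = m.max(tile_row_max(&scores));            // new running max
//! let alpha = (m - m_new).exp();                        // rescale old state
//! l = l * alpha + tile_exp_sum(&scores, m_new);         // update denominator
//! acc = acc * alpha + tile_exp_weighted_v(&scores, &v_tile, m_new);
//! m = m_new;
//! // After the final tile, the normalized output row is acc / l.
//! ```
//!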
//! ## Module Structure
//!
//! - [`config`] - Kernel configuration (tile sizes, launch parameters)
//! - [`interop`] - Candle ↔ `CubeCL` tensor conversion utilities
//! - [`kernel`] - Actual Flash Attention `CubeCL` kernel implementation
//!
//! ## Hardware Targets
//!
//! - **Phase 1**: `GeForce` RTX 5080 (primary development)
//! - **Phase 2**: `GeForce` RTX 3090 Ti (validation and tuning)
//! - **Future**: A100/H100, AMD MI series, WGPU/CPU backends
//!
//! ## Usage
//!
//! ```rust,ignore
//! use unsloth_rs::kernels::cubecl::{flash_attention_kernel, FlashAttentionConfig};
//!
//! let config = FlashAttentionConfig::default();
//! let output = flash_attention_kernel(&q, &k, &v, scale, mask, &config)?;
//! ```
//!
//! ## Implementation Status
//!
//! See `FLASH_ATTENTION_IMPLEMENTATION_STATUS.md` for current progress.
pub mod config;
pub mod interop;
pub mod kernel;

pub use config::FlashAttentionConfig;
pub use kernel::flash_attention_kernel;