candle-cudnn-attn
cuDNN SDPA attention for Candle with THD varlen layout.
Status
- THD layout supported: (total_tokens, num_heads, head_dim)
- Varlen supported via cumulative sequence lengths (cu_seqlens)
- causal = false and causal = true both supported
- Forward pass implemented
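To make the cumulative sequence lengths concrete, here is a minimal sketch in plain Rust (no Candle or cuDNN involved). The helper names `cu_seqlens` and `token_range` are hypothetical and are not part of this crate's API; they only illustrate how per-sequence lengths map to offsets into the packed `total_tokens` dimension.

```rust
/// Builds cumulative sequence lengths from per-sequence token counts.
/// The result has `lens.len() + 1` entries, starts at 0, and ends at the
/// total number of packed tokens.
fn cu_seqlens(lens: &[u32]) -> Vec<u32> {
    let mut out = Vec::with_capacity(lens.len() + 1);
    let mut total = 0u32;
    out.push(total);
    for &len in lens {
        total += len;
        out.push(total);
    }
    out
}

/// Returns the half-open row range that sequence `i` occupies in a packed
/// (total_tokens, num_heads, head_dim) tensor.
fn token_range(cu: &[u32], i: usize) -> std::ops::Range<usize> {
    cu[i] as usize..cu[i + 1] as usize
}
```

With lengths `[3, 5, 2]`, the second sequence occupies rows 3 to 7 of the packed tensor, so no padding rows are needed.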
Requirements
- CUDA 12+
- cuDNN 9.19+ (tested)
- GPU with modern cuDNN SDPA support (Ampere+ recommended)
This crate links cuDNN dynamically (dylib).
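As an illustration of what dynamic linking means for a build script, the sketch below assembles the Cargo directives such a script could print. The `cargo:rustc-link-search` and `cargo:rustc-link-lib` directives are standard Cargo features; the function name `cudnn_link_directives` and the library name `cudnn` are assumptions, and this crate's actual build.rs may do this differently.

```rust
/// Returns the lines a build script could print to link cuDNN as a dylib.
/// `lib_dir` is an optional extra search directory (for example, the value
/// of CUDNN_LIB_DIR); the library name `cudnn` is an assumption.
fn cudnn_link_directives(lib_dir: Option<&str>) -> Vec<String> {
    let mut lines = Vec::new();
    if let Some(dir) = lib_dir {
        // Tell the linker where to look before the default system paths.
        lines.push(format!("cargo:rustc-link-search=native={dir}"));
    }
    // `dylib` requests dynamic linking, so the shared library must also be
    // discoverable at run time.
    lines.push("cargo:rustc-link-lib=dylib=cudnn".to_string());
    lines
}
```

In a real build.rs each returned line would be written with `println!`. Because the link is dynamic, the cuDNN shared library has to be on the loader's search path when the final binary runs.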
API
Output shape is (total_tokens, num_heads, head_dim), the same THD layout as the inputs.
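For readers who want to see what attention over a packed THD buffer computes, here is a naive CPU reference in plain Rust. It is a sketch for checking results on tiny inputs, not the cuDNN kernel and not this crate's API: the function name `sdpa_thd_reference` is hypothetical, it assumes q, k, and v share the same cumulative sequence lengths, and it uses the conventional 1/sqrt(head_dim) scale.

```rust
/// Naive scaled dot-product attention over packed THD buffers.
/// `q`, `k`, `v` are row-major slices of shape
/// (total_tokens, num_heads, head_dim); `cu` holds cumulative sequence
/// lengths. Each token attends only to tokens of its own sequence, and
/// with `causal = true` only to itself and earlier tokens.
fn sdpa_thd_reference(
    q: &[f32],
    k: &[f32],
    v: &[f32],
    cu: &[usize],
    num_heads: usize,
    head_dim: usize,
    causal: bool,
) -> Vec<f32> {
    let stride = num_heads * head_dim; // elements per token row
    let scale = 1.0 / (head_dim as f32).sqrt();
    let mut out = vec![0.0f32; q.len()];
    for w in cu.windows(2) {
        let (start, end) = (w[0], w[1]);
        for h in 0..num_heads {
            for i in start..end {
                let qi = &q[i * stride + h * head_dim..][..head_dim];
                let last = if causal { i + 1 } else { end };
                // Scaled dot products against every visible key.
                let scores: Vec<f32> = (start..last)
                    .map(|j| {
                        let kj = &k[j * stride + h * head_dim..][..head_dim];
                        scale * qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>()
                    })
                    .collect();
                // Numerically stable softmax over the visible keys.
                let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
                let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
                let denom: f32 = exps.iter().sum();
                // Weighted sum of the value rows.
                let oi = &mut out[i * stride + h * head_dim..][..head_dim];
                for (j, e) in (start..last).zip(&exps) {
                    let vj = &v[j * stride + h * head_dim..][..head_dim];
                    for (o, x) in oi.iter_mut().zip(vj) {
                        *o += e / denom * x;
                    }
                }
            }
        }
    }
    out
}
```

The returned buffer has the same length and layout as `q`, which matches the output shape stated above.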
Minimal Usage
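use candle_core::{DType, Device, Tensor};
use candle_cudnn_attn::flash_attn_varlen;

let device = Device::new_cuda(0)?;
// Sizes are illustrative (assumed): 512 packed tokens, 8 heads, head_dim 64.
let (total_tokens, num_heads, head_dim) = (512, 8, 64);
// F16 is an assumption; cuDNN SDPA expects half-precision inputs.
let q = Tensor::randn(0f32, 1f32, (total_tokens, num_heads, head_dim), &device)?.to_dtype(DType::F16)?;
let k = Tensor::randn(0f32, 1f32, (total_tokens, num_heads, head_dim), &device)?.to_dtype(DType::F16)?;
let v = Tensor::randn(0f32, 1f32, (total_tokens, num_heads, head_dim), &device)?.to_dtype(DType::F16)?;
// Example: 4 sequences of length 128 -> [0, 128, 256, 384, 512]
// The u32 element type is an assumption.
let seqlens = Tensor::new(&[0u32, 128, 256, 384, 512], &device)?;
// Pass q, k, v, the cumulative sequence lengths, and the causal flag;
// the exact argument list is defined by the crate's flash_attn_varlen signature.
let out = flash_attn_varlen(/* q, k, v, seqlens, ... */)?;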
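Before launching a varlen kernel it can help to sanity-check the cumulative lengths on the host. The sketch below is plain Rust with a hypothetical helper name, `validate_cu_seqlens`; it is not part of this crate. It also returns the longest sequence length, which varlen attention kernels often need, although whether this crate requires it is not stated here.

```rust
/// Checks that `cu` is a valid cumulative-length vector for a packed tensor
/// with `total_tokens` rows and returns the longest sequence length.
fn validate_cu_seqlens(cu: &[u32], total_tokens: u32) -> Result<u32, String> {
    if cu.first() != Some(&0) {
        return Err("cu_seqlens must start at 0".to_string());
    }
    if cu.last() != Some(&total_tokens) {
        return Err(format!(
            "last entry must equal total_tokens ({total_tokens})"
        ));
    }
    let mut max_len = 0u32;
    for w in cu.windows(2) {
        if w[1] < w[0] {
            return Err(format!(
                "cu_seqlens must be non-decreasing, got {} after {}",
                w[1], w[0]
            ));
        }
        max_len = max_len.max(w[1] - w[0]);
    }
    Ok(max_len)
}
```

For the four sequences of length 128 used above, the check passes and reports a maximum length of 128.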
Build Notes
build.rs auto-detects headers/libs in common system paths. If needed, set:
- CUDA_INCLUDE_DIR
- CUDNN_INCLUDE_DIR
- CUDNN_LIB_DIR
- CUDNN_FRONTEND_INCLUDE_DIR
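The override-then-detect behaviour described above can be sketched as follows. This is an assumption about how such a lookup might be written, not a copy of this crate's build.rs; the helper names `pick_dir` and `resolve_dir` and any candidate paths are hypothetical.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Chooses a directory: a non-empty override wins, otherwise the first
/// candidate that exists as a directory on disk.
fn pick_dir(override_value: Option<OsString>, candidates: &[&str]) -> Option<PathBuf> {
    if let Some(value) = override_value {
        if !value.is_empty() {
            return Some(PathBuf::from(value));
        }
    }
    candidates
        .iter()
        .map(|c| PathBuf::from(*c))
        .find(|p| p.is_dir())
}

/// Reads the override from an environment variable such as CUDNN_INCLUDE_DIR
/// and falls back to probing the given candidate paths.
fn resolve_dir(var: &str, candidates: &[&str]) -> Option<PathBuf> {
    pick_dir(env::var_os(var), candidates)
}
```

A build script could call `resolve_dir("CUDNN_INCLUDE_DIR", &[...])` with its own list of system paths and report an error if the result is `None`.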
Validation
- Example: cargo run --example basic_attention
- Tests: cargo test --package candle-cudnn-attn
License
Apache-2.0 OR MIT