//! GPU-resident attention operations for transformer architectures.
//!
//! This module contains batched and incremental multi-head attention implementations
//! that run entirely on the GPU, with no intermediate host-device transfers.
//!
//! # Implementations
//!
//! - `batched_multihead_attention` - Standard per-head attention processing
//! - `batched_multihead_attention_optimized` - Optimized batched attention (WAPR-PERF-008)
//! - `incremental_attention_gpu` - Autoregressive decoder attention (WAPR-PERF-013)
//! - `kv_cache_scatter_gpu` - KV cache scatter operation
pub use ;
pub use ;
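
// A minimal CPU reference sketch of what the kernels above compute:
// single-head scaled dot-product attention with an append-only KV cache,
// as used in incremental (autoregressive) decoding. All names and shapes
// here are illustrative assumptions, not this module's actual GPU API;
// the real implementations operate on device buffers.

```rust
/// In-place softmax with max-subtraction for numerical stability.
fn softmax(scores: &mut [f32]) {
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0f32;
    for s in scores.iter_mut() {
        *s = (*s - max).exp();
        sum += *s;
    }
    for s in scores.iter_mut() {
        *s /= sum;
    }
}

/// One incremental decode step: attend a single query vector over the cached
/// keys/values, both flattened `[seq_len, head_dim]` row-major.
fn attend_one(query: &[f32], keys: &[f32], values: &[f32], head_dim: usize) -> Vec<f32> {
    let seq_len = keys.len() / head_dim;
    let scale = 1.0 / (head_dim as f32).sqrt();
    // Scaled dot-product scores against every cached key position.
    let mut scores: Vec<f32> = (0..seq_len)
        .map(|t| {
            let k = &keys[t * head_dim..(t + 1) * head_dim];
            scale * query.iter().zip(k).map(|(q, k)| q * k).sum::<f32>()
        })
        .collect();
    softmax(&mut scores);
    // Weighted sum of the cached values.
    let mut out = vec![0.0f32; head_dim];
    for (t, w) in scores.iter().enumerate() {
        let v = &values[t * head_dim..(t + 1) * head_dim];
        for d in 0..head_dim {
            out[d] += *w * v[d];
        }
    }
    out
}

fn main() {
    let head_dim = 2;
    // The KV cache grows one position per decode step. On CPU this is a plain
    // append; the GPU variant instead scatters each new key/value into a
    // preallocated buffer at the position index.
    let (mut keys, mut values) = (Vec::new(), Vec::new());
    for step in 0..3 {
        keys.extend_from_slice(&[step as f32, 1.0]);
        values.extend_from_slice(&[1.0, step as f32]);
        let out = attend_one(&[1.0, 0.0], &keys, &values, head_dim);
        println!("step {step}: {out:?}");
    }
}
```

// Batched multi-head attention repeats this computation per head (and per
// sequence), with each head attending over its own slice of the projected
// keys and values.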