burn_dsa
burn_dsa contains Burn-native dynamic sparse attention blocks and the
CubeCL-backed sparse attention kernel boundary used by JEPA autocode
experiments.
The crate keeps the reference block and backend-specific kernel executor separate:
- DsaBlock owns the projection, selector, sparse attention, normalization, and MLP path.
- DsaExecutor::Reference runs the portable Burn graph path on any backend.
- DsaExecutor::CudaKernel and DsaExecutor::WgpuKernel request the CubeCL sparse attention kernels when compiled with cuda-kernel or wgpu-kernel.
- Unsupported tensor element types or unsupported backend/executor combinations fall back to the reference graph.
The custom kernels are f32-specialized. The crate exposes CUDA aliases for f16, bf16, and flex32 backends so callers can identify those configurations, but custom sparse attention execution is only selected for f32 Cube backends.
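The selection and fallback rules above can be modeled as a small decision function. The following is a minimal, standalone sketch and not the crate's actual API: only the DsaExecutor variant names come from this README, while `ElemKind`, `BackendKind`, `Features`, `Path`, and `resolve_path` are hypothetical names introduced for illustration.

```rust
/// Executor requested by the caller (variant names follow this README).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DsaExecutor {
    Reference,
    CudaKernel,
    WgpuKernel,
}

/// Hypothetical tag for the tensor element type of the backend.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ElemKind {
    F32,
    F16,
    Bf16,
    Flex32,
}

/// Hypothetical tag for the backend family the block runs on.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BackendKind {
    Cuda,
    Wgpu,
    Other,
}

/// Hypothetical stand-in for the `cuda-kernel` / `wgpu-kernel` cargo features.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Features {
    pub cuda_kernel: bool,
    pub wgpu_kernel: bool,
}

/// Path that actually executes after fallback rules are applied.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Path {
    ReferenceGraph,
    CudaSparseKernel,
    WgpuSparseKernel,
}

/// A custom kernel is chosen only when the request, the backend, the f32
/// element type, and the compiled feature all line up. Every other
/// combination falls back to the reference graph.
pub fn resolve_path(
    requested: DsaExecutor,
    backend: BackendKind,
    elem: ElemKind,
    features: Features,
) -> Path {
    match (requested, backend, elem) {
        (DsaExecutor::CudaKernel, BackendKind::Cuda, ElemKind::F32) if features.cuda_kernel => {
            Path::CudaSparseKernel
        }
        (DsaExecutor::WgpuKernel, BackendKind::Wgpu, ElemKind::F32) if features.wgpu_kernel => {
            Path::WgpuSparseKernel
        }
        _ => Path::ReferenceGraph,
    }
}
```

Under this model, requesting DsaExecutor::CudaKernel on an f16 CUDA backend resolves to the reference graph, which matches the f32-only rule described above.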
When top-k selects every key (full top-k), dynamic sparse attention is
equivalent to dense attention. DsaBlock routes that case through the backend
dense attention graph instead of the scalar sparse pair-backward path, so
training at high token counts remains compute dense.
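The equivalence can be checked numerically with a single-query sketch in plain Rust. This is an illustration only and does not use Burn or the crate's kernels: `dense_attention` and `topk_attention` are hypothetical helpers, the selector here simply ranks keys by their scaled dot-product score (the crate's selector may differ), and inputs are assumed non-empty with matching dimensions.

```rust
/// Numerically stable softmax over a slice of scores.
fn softmax(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Scaled dot-product score of one query against every key.
fn scores_for(q: &[f32], keys: &[Vec<f32>]) -> Vec<f32> {
    let scale = (q.len() as f32).sqrt();
    keys.iter()
        .map(|k| q.iter().zip(k).map(|(x, y)| x * y).sum::<f32>() / scale)
        .collect()
}

/// Dense attention for a single query: softmax over all keys.
pub fn dense_attention(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let weights = softmax(&scores_for(q, keys));
    let mut out = vec![0.0; values[0].len()];
    for (w, value) in weights.iter().zip(values) {
        for (o, v) in out.iter_mut().zip(value) {
            *o += w * v;
        }
    }
    out
}

/// Top-k sparse attention for a single query: keep the k highest-scoring
/// keys, then apply softmax over that subset only.
pub fn topk_attention(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>], k: usize) -> Vec<f32> {
    let scores = scores_for(q, keys);
    // Rank key indices by descending score and keep the first k.
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    idx.truncate(k);
    let selected: Vec<f32> = idx.iter().map(|&i| scores[i]).collect();
    let weights = softmax(&selected);
    let mut out = vec![0.0; values[0].len()];
    for (w, &i) in weights.iter().zip(&idx) {
        for (o, v) in out.iter_mut().zip(&values[i]) {
            *o += w * v;
        }
    }
    out
}
```

Calling `topk_attention` with `k` equal to the number of keys should agree with `dense_attention` up to floating-point rounding, since the summation order differs. That is why the full top-k case can take the dense path without changing the result.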