burn_dsa 0.21.0

Burn-native dynamic sparse attention reference blocks and kernel boundary

burn_dsa

burn_dsa contains Burn-native dynamic sparse attention blocks and the CubeCL-backed sparse attention kernel boundary used by JEPA autocode experiments.

The crate keeps the reference block separate from the backend-specific kernel executors:

  • DsaBlock owns the projection, selector, sparse attention, normalization, and MLP path.
  • DsaExecutor::Reference runs the portable Burn graph path on any backend.
  • DsaExecutor::CudaKernel and DsaExecutor::WgpuKernel request the CubeCL sparse attention kernels when the crate is compiled with the matching cuda-kernel or wgpu-kernel feature.
  • Unsupported tensor element types or unsupported backend/executor combinations fall back to the reference graph.
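The fallback rule in the list can be sketched as a single decision function. This is a minimal, std-only sketch under stated assumptions: Executor, BackendKind, Env, and resolve are hypothetical names, not the crate's API, and the boolean fields stand in for the real backend type, element type, and Cargo feature checks.

```rust
/// Hypothetical mirror of the executor choices listed above; an
/// illustration only, not the crate's actual `DsaExecutor` type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Executor {
    Reference,
    CudaKernel,
    WgpuKernel,
}

/// Hypothetical tag for the Burn backend the block is running on.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BackendKind {
    Cuda,
    Wgpu,
    Other,
}

/// Hypothetical build/runtime facts the fallback decision depends on.
pub struct Env {
    pub backend: BackendKind,
    /// Whether the tensor element type is f32.
    pub elem_is_f32: bool,
    /// Stand-in for `cfg!(feature = "cuda-kernel")`.
    pub cuda_kernel: bool,
    /// Stand-in for `cfg!(feature = "wgpu-kernel")`.
    pub wgpu_kernel: bool,
}

/// Returns the executor that actually runs: the requested kernel when the
/// backend, feature flag, and element type all support it, else `Reference`.
pub fn resolve(requested: Executor, env: &Env) -> Executor {
    let supported = match (requested, env.backend) {
        // The portable Burn graph runs on any backend.
        (Executor::Reference, _) => true,
        (Executor::CudaKernel, BackendKind::Cuda) => env.cuda_kernel && env.elem_is_f32,
        (Executor::WgpuKernel, BackendKind::Wgpu) => env.wgpu_kernel && env.elem_is_f32,
        // Any other backend/executor pairing is unsupported.
        _ => false,
    };
    if supported {
        requested
    } else {
        Executor::Reference
    }
}
```

Expressing the rule as one match keeps the fallback explicit: any pairing that is not listed degrades to the reference graph instead of failing.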

The custom kernels are specialized for f32. The crate also exposes CUDA aliases for f16, bf16, and flex32 backends so callers can identify those configurations, but custom sparse attention execution is selected only for f32 Cube backends; the other element types run the reference graph.
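One way to express an "f32 only" gate over a generic element type is a TypeId comparison. The sketch below is an assumption about the general technique, not necessarily how burn_dsa performs the check: F16, Bf16, and Flex32 are hypothetical marker types standing in for the real half-precision and flex32 element types, and uses_custom_kernel and path_for are hypothetical helpers.

```rust
use std::any::TypeId;

// Hypothetical marker types standing in for the f16, bf16, and flex32
// element types provided by the real Burn/CubeCL dependencies.
pub struct F16;
pub struct Bf16;
pub struct Flex32;

/// True only when the element type is exactly `f32`, mirroring the rule
/// that the custom sparse attention kernels are selected only for f32.
pub fn uses_custom_kernel<E: 'static>() -> bool {
    TypeId::of::<E>() == TypeId::of::<f32>()
}

/// Reports which path a block with element type `E` would take.
pub fn path_for<E: 'static>() -> &'static str {
    if uses_custom_kernel::<E>() {
        "custom f32 kernel"
    } else {
        // Every non-f32 element type falls back to the portable graph.
        "reference graph"
    }
}
```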

Full-top-k dynamic sparse attention, where every key is selected, is equivalent to dense attention. DsaBlock therefore routes that case through the backend's dense attention graph rather than the scalar sparse pair-backward path, so training at high token counts stays compute-dense.
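The equivalence can be checked with a small scalar sketch for a single query. It is a std-only illustration under stated assumptions: attend, top_k, and sparse_attention are hypothetical names, the loops are not the crate's kernels, and the dense branch merely stands in for the backend dense attention graph. When k equals the number of keys, the selected set is every key, so only the summation order differs from dense attention.

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scaled dot-product attention for one query over the keys in `selected`
/// (assumes `selected` is non-empty).
pub fn attend(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>], selected: &[usize]) -> Vec<f32> {
    let scale = 1.0 / (q.len() as f32).sqrt();
    let scores: Vec<f32> = selected.iter().map(|&j| dot(q, &keys[j]) * scale).collect();
    // Subtract the max score for a numerically stable softmax.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let denom: f32 = weights.iter().sum();
    let mut out = vec![0.0; values[0].len()];
    for (w, &j) in weights.iter().zip(selected) {
        for (o, v) in out.iter_mut().zip(&values[j]) {
            *o += w / denom * v;
        }
    }
    out
}

/// Indices of the `k` keys with the highest query-key scores.
pub fn top_k(q: &[f32], keys: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..keys.len()).collect();
    idx.sort_by(|&a, &b| dot(q, &keys[b]).partial_cmp(&dot(q, &keys[a])).unwrap());
    idx.truncate(k);
    idx
}

/// Dynamic sparse attention that routes the full-top-k case to the dense path.
pub fn sparse_attention(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>], k: usize) -> Vec<f32> {
    if k >= keys.len() {
        // Every key would be selected: this is dense attention.
        let all: Vec<usize> = (0..keys.len()).collect();
        return attend(q, keys, values, &all);
    }
    attend(q, keys, values, &top_k(q, keys, k))
}
```

Skipping the top-k sort in the full case also avoids per-pair selection work that contributes nothing, which is the motivation for the dense routing described above.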