hanzo-kernel 0.2.9

Hanzo's first-party GPU kernel DSL: one Rust source, lowered to CUDA/ROCm/Vulkan/Metal.

hanzo-kernel: Hanzo's first-party GPU kernel DSL.

Write a kernel ONCE in Rust; it lowers to CUDA / ROCm / Vulkan / Metal. This crate is the first-party facade over the CubeCL lowering engine: kernel source names only hanzo_kernel::*, never cubecl. "Values, not places" -- CubeCL is the implementation value; hanzo_kernel is the stable namespace we build against, so the engine can be upgraded (or forked) without touching a single kernel.
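The facade idea can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `engine` stands in for the lowering engine (CubeCL in the real crate), `kernel_api` stands in for the `hanzo_kernel` namespace, and `axpy` / `Scalar` are placeholder items rather than anything this crate is known to export. The point is only that kernel code names the facade, so the engine behind it can change without editing the kernel.

```rust
// Hypothetical stand-in for the lowering engine. In the real crate this
// role is played by CubeCL; nothing below is CubeCL's actual API.
mod engine {
    pub type Scalar = f32;

    /// y[i] += a * x[i], computed on the host for illustration only.
    pub fn axpy(a: Scalar, x: &[Scalar], y: &mut [Scalar]) {
        for (yi, xi) in y.iter_mut().zip(x) {
            *yi += a * *xi;
        }
    }
}

// Hypothetical stand-in for the facade namespace: re-exports and nothing
// else. Swapping `engine` for another implementation only touches this module.
pub mod kernel_api {
    pub use super::engine::{axpy, Scalar};
}

// "Kernel" code: every name it uses comes from the facade, never from
// the engine module directly.
pub fn scale_add(a: kernel_api::Scalar, x: &[kernel_api::Scalar], y: &mut [kernel_api::Scalar]) {
    kernel_api::axpy(a, x, y);
}
```

If `engine` were replaced by a fork or a newer version with the same exported names, `scale_add` would compile unchanged, which is the property the paragraph above describes.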

The perf primitives our hand-tuned kernels rely on are all here and lower to the native instruction on each backend:

  • dot on a vectorized Line<i8> -> dp4a / OpSDotAccSat (int8 4-way dot)
  • cmma -> tensor cores (WMMA / cooperative-matrix / simdgroup)
  • SharedMemory, plane_* (subgroup reduce/shuffle), Atomic, barriers
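For readers unfamiliar with the int8 4-way dot, here is a plain-Rust CPU reference for the arithmetic that the first bullet maps to. It is a sketch of the math only, not the `hanzo_kernel` API: the function names `dot4_i8` and `dot4_acc_sat` are made up for this example, and the saturating accumulate is modeled on the "AccSat" part of OpSDotAccSat (whether a given backend saturates or wraps on accumulate is backend-specific).

```rust
/// Sum of four signed 8-bit products, widened to i32.
/// The largest magnitude is 4 * (-128 * -128) = 65536, so i32 cannot overflow here.
pub fn dot4_i8(a: [i8; 4], b: [i8; 4]) -> i32 {
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| i32::from(x) * i32::from(y))
        .sum()
}

/// 4-way int8 dot product added to a 32-bit accumulator, clamping to the
/// i32 range instead of wrapping (an assumption modeled on "AccSat").
pub fn dot4_acc_sat(a: [i8; 4], b: [i8; 4], acc: i32) -> i32 {
    acc.saturating_add(dot4_i8(a, b))
}
```

A reference like this is handy as the ground truth when checking that a lowered kernel produces the same integers on every backend.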

MIGRATION POLICY (perf-gated, never a downgrade): the DSL provides COVERAGE -- every quant type on every backend, from one source, killing the "same op, N impls, N numeric behaviors" bug class. A hand-tuned kernel is replaced by its DSL twin ONLY when the DSL version is bit-exact AND within perf noise (bench-gated). Where a hand-tuned kernel still wins, it stays as the specialized peak path and the DSL is the portable fallback for the backends that lack a tuned version.
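The replacement rule in the policy can be sketched as a small gate. This is an illustration under stated assumptions, not the project's actual bench harness: the function names are hypothetical, outputs are assumed to be `f32` buffers, timings are assumed to be nanoseconds from some benchmark run, and the `noise` tolerance is a parameter the caller must choose because the policy text does not give a number.

```rust
/// Bit-exact comparison: same length and identical bit patterns.
/// Note this is stricter than `==`: 0.0 and -0.0 differ, and two NaNs
/// match only if their payload bits match.
pub fn bit_exact(tuned: &[f32], dsl: &[f32]) -> bool {
    tuned.len() == dsl.len()
        && tuned
            .iter()
            .zip(dsl)
            .all(|(a, b)| a.to_bits() == b.to_bits())
}

/// True when the DSL kernel is no slower than the tuned kernel by more
/// than the relative tolerance `noise` (e.g. 0.03 for 3%, an assumed value).
pub fn within_perf_noise(tuned_ns: f64, dsl_ns: f64, noise: f64) -> bool {
    dsl_ns <= tuned_ns * (1.0 + noise)
}

/// The gate: replace the hand-tuned kernel only if BOTH conditions hold.
pub fn may_replace(
    tuned_out: &[f32],
    dsl_out: &[f32],
    tuned_ns: f64,
    dsl_ns: f64,
    noise: f64,
) -> bool {
    bit_exact(tuned_out, dsl_out) && within_perf_noise(tuned_ns, dsl_ns, noise)
}
```

When the gate returns false, the policy keeps the hand-tuned kernel as the peak path and uses the DSL version only on backends that have no tuned kernel.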