
Module kernels

GPU kernels for BitNet quantization operations.

This module provides CubeCL-based GPU kernels for efficient ternary-weight × activation matrix multiplication.
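Because every weight is -1, 0, or +1, each weight × activation product reduces to adding the activation, subtracting it, or skipping it. The following CPU reference is a minimal sketch of that idea, not the crate's GPU kernel or API. It assumes row-major i8 weights of shape (n, k), activations of shape (batch, k), and a single per-tensor scale gamma applied once per output. The name ternary_matmul_ref and this layout are hypothetical.

```rust
/// Hypothetical CPU reference: out = (x · Wᵀ) * gamma, where W is ternary.
/// x: (batch, k) row-major f32; w: (n, k) row-major i8 in {-1, 0, +1}.
fn ternary_matmul_ref(x: &[f32], w: &[i8], batch: usize, k: usize, n: usize, gamma: f32) -> Vec<f32> {
    assert_eq!(x.len(), batch * k);
    assert_eq!(w.len(), n * k);
    let mut out = vec![0.0f32; batch * n];
    for b in 0..batch {
        let xr = &x[b * k..(b + 1) * k];
        for j in 0..n {
            let wr = &w[j * k..(j + 1) * k];
            let mut acc = 0.0f32;
            for (&wv, &xv) in wr.iter().zip(xr) {
                // The ternary weight selects add, subtract, or skip: no multiplication.
                match wv {
                    1 => acc += xv,
                    -1 => acc -= xv,
                    0 => {}
                    other => panic!("non-ternary weight: {other}"),
                }
            }
            // A single multiply per output restores the quantization scale.
            out[b * n + j] = acc * gamma;
        }
    }
    out
}

fn main() {
    // One input row (k = 3) against a 2 x 3 ternary weight matrix.
    let x = [2.0f32, 3.0, 5.0];
    let w: [i8; 6] = [1, 0, -1, -1, 1, 1];
    let y = ternary_matmul_ref(&x, &w, 1, 3, 2, 0.5);
    println!("{:?}", y); // prints [-1.5, 3.0]
}
```

The inner loop contains only additions, subtractions, and branches; the one remaining multiplication per output element is the scale.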

Kernels

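  • absmean_quantize - Quantize weights to ternary values {-1, 0, +1} using absmean scaling
  • ternary_dequantize - Convert ternary weights back to floating point
  • ternary_matmul_gpu - Optimized ternary matmul: each weight selects add, subtract, or skip, so the inner loop needs no multiplications
  • packed_ternary_matmul - Ternary matmul on weights packed at 2 bits each (four per byte) to reduce memory bandwidth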
  • bitlinear_forward - Fused LayerNorm + ternary matmul
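The quantize, dequantize, and packing steps listed above can be sketched on the CPU as follows. This assumes the per-tensor absmean scheme described for BitNet b1.58 (γ = mean(|W|), W_q = RoundClip(W / (γ + ε), -1, 1), W ≈ γ · W_q); the crate may instead use per-row or per-group scales. The 2-bit code assignment (0 → 00, +1 → 01, -1 → 10, first weight in the lowest bits) is a hypothetical layout, and all function names here are illustrative, not the module's API.

```rust
/// Hypothetical absmean quantization with one scale for the whole tensor.
fn absmean_quantize_ref(w: &[f32], eps: f32) -> (Vec<i8>, f32) {
    assert!(!w.is_empty(), "cannot quantize an empty tensor");
    // gamma = mean absolute value of the weights.
    let gamma = w.iter().map(|v| v.abs()).sum::<f32>() / w.len() as f32;
    // Scale, round to nearest, then clip into {-1, 0, +1}.
    let q: Vec<i8> = w
        .iter()
        .map(|&v| (v / (gamma + eps)).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, gamma)
}

/// Hypothetical dequantization: w ≈ gamma * w_q.
fn ternary_dequantize_ref(q: &[i8], gamma: f32) -> Vec<f32> {
    q.iter().map(|&t| t as f32 * gamma).collect()
}

/// Pack four ternary values per byte using an assumed 2-bit code.
fn pack_ternary_2bit(q: &[i8]) -> Vec<u8> {
    q.chunks(4)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u8, |byte, (i, &t)| {
                let code: u8 = match t {
                    0 => 0b00,
                    1 => 0b01,
                    -1 => 0b10,
                    other => panic!("non-ternary value: {other}"),
                };
                byte | (code << (2 * i))
            })
        })
        .collect()
}

/// Inverse of pack_ternary_2bit; `len` is the original number of weights.
fn unpack_ternary_2bit(packed: &[u8], len: usize) -> Vec<i8> {
    (0..len)
        .map(|i| match (packed[i / 4] >> (2 * (i % 4))) & 0b11 {
            0b00 => 0i8,
            0b01 => 1i8,
            0b10 => -1i8,
            code => panic!("invalid 2-bit code: {code}"),
        })
        .collect()
}

fn main() {
    let w = [0.9f32, -0.05, -1.2, 0.3, 0.0, 1.5];
    let (q, gamma) = absmean_quantize_ref(&w, 1e-5);
    println!("q = {:?}, gamma = {}", q, gamma);
    let packed = pack_ternary_2bit(&q);
    println!("packed = {:?}", packed); // 6 weights fit in 2 bytes
    assert_eq!(unpack_ternary_2bit(&packed, q.len()), q);
    println!("dequantized = {:?}", ternary_dequantize_ref(&q, gamma));
}
```

Packing at 2 bits per weight cuts weight storage to a quarter of an i8 layout, which is where the bandwidth saving of a packed kernel comes from.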

Feature Gate

Requires the cuda feature to be enabled in Cargo.toml:

[dependencies]
bitnet-quantize = { version = "0.1", features = ["cuda"] }

Functions

cuda_available
Check if CUDA kernels are available.