GPU kernels for BitNet quantization operations.
This module provides CubeCL-based GPU kernels for efficient ternary weight × activation matrix multiplication.
§Kernels
- absmean_quantize - Quantize weights to ternary {-1, 0, +1}
- ternary_dequantize - Convert ternary back to float
- ternary_matmul_gpu - Optimized ternary matmul (no multiply ops!)
- packed_ternary_matmul - 2-bit packed weights for reduced bandwidth
- bitlinear_forward - Fused LayerNorm + ternary matmul
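To make the ideas behind these kernels concrete, here is a minimal CPU-side reference sketch in plain Rust. It is not this crate's API: the function names (`absmean_quantize_ref`, `ternary_dot_ref`, `pack_ternary_ref`), the epsilon value, and the 2-bit encoding are all assumptions chosen for illustration, and the actual GPU kernels may use a different layout and different signatures. The sketch assumes the usual absmean scheme (scale by the mean absolute weight, round, clamp to {-1, 0, +1}) and shows why a ternary dot product needs only additions and subtractions.

```rust
/// Hypothetical reference: absmean quantization of a weight slice.
/// Returns the ternary codes and the scale needed to dequantize (w ≈ scale * q).
fn absmean_quantize_ref(weights: &[f32]) -> (Vec<i8>, f32) {
    let eps = 1e-5_f32; // assumed guard against division by zero
    let scale = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len().max(1) as f32;
    let q = weights
        .iter()
        .map(|w| (w / (scale + eps)).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}

/// Hypothetical reference: dot product of ternary weights with activations.
/// Each weight selects add, subtract, or skip, so no multiplication is needed.
fn ternary_dot_ref(q: &[i8], x: &[f32]) -> f32 {
    let mut acc = 0.0;
    for (&w, &xi) in q.iter().zip(x) {
        match w {
            1 => acc += xi,
            -1 => acc -= xi,
            _ => {} // zero weight contributes nothing
        }
    }
    acc
}

/// Hypothetical reference: pack four ternary codes into each byte, 2 bits per code.
/// Assumed encoding: 0 -> 0b00, +1 -> 0b01, -1 -> 0b10, lowest bits first.
fn pack_ternary_ref(q: &[i8]) -> Vec<u8> {
    q.chunks(4)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u8, |byte, (i, &w)| {
                let code = match w {
                    1 => 0b01,
                    -1 => 0b10,
                    _ => 0b00,
                };
                byte | (code << (2 * i))
            })
        })
        .collect()
}
```

For the weights `[0.9, -1.1, 0.05, 1.0]`, the mean absolute value is 0.7625, so the codes come out as `[1, -1, 0, 1]`. Packing stores those four codes in a single byte, which is where the bandwidth saving of the packed variant comes from.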
§Feature Gate
Requires the cuda feature to be enabled:
[dependencies]
bitnet-quantize = { version = "0.1", features = ["cuda"] }
Functions§
- cuda_available - Check if CUDA kernels are available.