Module kernels

GPU kernels for BitNet quantization operations.

This module provides CubeCL-based GPU kernels for efficient matrix multiplication between ternary weights and activations.

§Kernels

absmean_quantize - Quantize weights to ternary {-1, 0, +1}
ternary_dequantize - Convert ternary back to float
ternary_matmul_gpu - Optimized ternary matmul (ternary weights reduce each product to an add, a subtract, or a skip, so no multiply ops are needed)
packed_ternary_matmul - 2-bit packed weights for reduced bandwidth
bitlinear_forward - Fused LayerNorm + ternary matmul
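
The kernel list above can be illustrated with a small CPU reference sketch. It is not this crate's implementation: the function names (`absmean_quantize_ref`, `ternary_dequantize_ref`, `ternary_matvec_ref`, `pack_ternary_ref`, `unpack_ternary_ref`) are hypothetical, the single per-tensor scale is an assumption, and the 2-bit encoding (00 = 0, 01 = +1, 10 = -1) is chosen only for illustration. The sketch follows the usual absmean scheme, where the scale is the mean absolute weight and each weight is rounded and clipped to {-1, 0, +1}.

```rust
/// Absmean quantization (reference sketch): scale = mean(|w|),
/// q = clamp(round(w / scale), -1, 1). One per-tensor scale is assumed.
fn absmean_quantize_ref(weights: &[f32]) -> (Vec<i8>, f32) {
    let n = weights.len().max(1) as f32;
    let mean_abs = weights.iter().map(|w| w.abs()).sum::<f32>() / n;
    let scale = mean_abs.max(1e-8); // guard against an all-zero tensor
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}

/// Convert ternary values back to floats by multiplying with the scale.
fn ternary_dequantize_ref(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&t| t as f32 * scale).collect()
}

/// y = scale * (W x) for a row-major ternary matrix W (rows x cols).
/// The inner loop only adds or subtracts activations; the scale is
/// applied once per output element.
fn ternary_matvec_ref(q: &[i8], scale: f32, rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    assert_eq!(q.len(), rows * cols);
    assert_eq!(x.len(), cols);
    let mut y = vec![0.0f32; rows];
    for r in 0..rows {
        let mut acc = 0.0f32;
        for c in 0..cols {
            match q[r * cols + c] {
                1 => acc += x[c],
                -1 => acc -= x[c],
                _ => {} // zero weight: skip
            }
        }
        y[r] = acc * scale;
    }
    y
}

/// Pack four ternary values per byte, 2 bits each, lowest bits first.
/// Assumed encoding: 0b00 = 0, 0b01 = +1, 0b10 = -1.
fn pack_ternary_ref(q: &[i8]) -> Vec<u8> {
    let mut out = vec![0u8; (q.len() + 3) / 4];
    for (i, &t) in q.iter().enumerate() {
        let code: u8 = match t {
            1 => 0b01,
            -1 => 0b10,
            _ => 0b00,
        };
        out[i / 4] |= code << ((i % 4) * 2);
    }
    out
}

/// Inverse of `pack_ternary_ref`; `len` is the number of ternary values.
fn unpack_ternary_ref(packed: &[u8], len: usize) -> Vec<i8> {
    (0..len)
        .map(|i| match (packed[i / 4] >> ((i % 4) * 2)) & 0b11 {
            0b01 => 1,
            0b10 => -1,
            _ => 0,
        })
        .collect()
}
```

A reference like this is mainly useful for checking GPU results on small inputs. Packing stores four weights per byte, which is where the reduced bandwidth of a 2-bit layout comes from.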

§Feature Gate

Requires the cuda feature to be enabled:

[dependencies]
bitnet-quantize = { version = "0.1", features = ["cuda"] }

§Functions

cuda_available: Check if CUDA kernels are available.
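
A typical use of an availability check is to pick a backend at runtime. The sketch below assumes `cuda_available` returns a `bool`, which this page does not show; `Backend` and `select_backend` are hypothetical names introduced for illustration and are not part of the crate.

```rust
/// Hypothetical backend selector, not part of bitnet-quantize.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cuda,
    Cpu,
}

/// Choose the GPU path only when the availability check succeeded.
/// In real code the argument would be the result of the crate's
/// `cuda_available` function (assumed here to return `bool`).
fn select_backend(cuda_ok: bool) -> Backend {
    if cuda_ok {
        Backend::Cuda
    } else {
        Backend::Cpu
    }
}
```

Taking the check result as a parameter keeps the selection logic testable on machines without a GPU.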