Crate bitnet_quantize

Microsoft BitNet b1.58 quantization and inference for Rust.

This crate provides an implementation of BitNet, which uses:

  • Ternary weights: {-1, 0, +1} via AbsMean quantization
  • INT8 activations: Per-token AbsMax scaling
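
Why this pairing pays off: once weights are ternary and activations are INT8, each output element of a linear layer can be accumulated with integer additions and subtractions only, and the two floating-point scales are applied once at the end. The sketch below illustrates that arithmetic on plain slices. `ternary_int8_dot` is a hypothetical helper, not part of this crate's API, and the crate's actual kernels may be organized quite differently.

```rust
/// Hypothetical helper: dot product of one ternary weight row (values in
/// {-1, 0, +1}) with one INT8 activation row. The loop uses only integer
/// additions and subtractions; both scales are applied once at the end.
fn ternary_int8_dot(w_q: &[i8], w_scale: f32, x_q: &[i8], x_scale: f32) -> f32 {
    assert_eq!(w_q.len(), x_q.len(), "row lengths must match");
    let mut acc: i32 = 0; // i32 accumulator avoids INT8 overflow
    for (&w, &x) in w_q.iter().zip(x_q) {
        match w {
            1 => acc += i32::from(x),
            -1 => acc -= i32::from(x),
            0 => {} // zero weights are skipped entirely
            _ => panic!("weight {w} is not ternary"),
        }
    }
    acc as f32 * w_scale * x_scale
}

fn main() {
    let w_q = [1i8, 0, -1, 1];
    let x_q = [10i8, -20, 30, 40];
    // Integer part: 10 - 30 + 40 = 20, then rescaled by 0.5 * 0.1.
    let y = ternary_int8_dot(&w_q, 0.5, &x_q, 0.1);
    println!("y = {y}");
}
```

Zero weights cost nothing at all, which is part of why the {-1, 0, +1} alphabet is attractive compared with binary {-1, +1} weights.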

§Features

  • BitLinear: Drop-in replacement for nn::Linear
  • Efficient ternary weight storage via trit-vsa
  • Straight-Through Estimator (STE) for training
  • Optional peft-rs adapter integration
  • Optional GGUF export via qlora-rs
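
The Straight-Through Estimator can be illustrated with a minimal scalar sketch, assuming the common "fake quantization" formulation: the forward pass uses the quantized weight `clamp(round(w / scale), -1, 1) * scale`, and the backward pass treats that quantizer as the identity so gradients reach the latent full-precision weights. The names `fake_quant` and `ste_step`, the squared-error loss, and the fixed `scale` are illustrative assumptions; this crate's training path operates on candle tensors and may differ in details (some STE variants, for example, zero the gradient outside the quantization range).

```rust
/// Hypothetical fake-quantizer: ternary value rescaled back to weight units.
fn fake_quant(w: f32, scale: f32) -> f32 {
    (w / scale).round().clamp(-1.0, 1.0) * scale
}

/// One gradient step on a toy model y = sum(fake_quant(w_i) * x_i) with loss
/// 0.5 * (y - target)^2. Under the STE, d fake_quant(w) / dw is taken as 1,
/// so dL/dw_i = (y - target) * x_i. Returns the loss before the update.
fn ste_step(w: &mut [f32], x: &[f32], target: f32, scale: f32, lr: f32) -> f32 {
    // Forward pass uses the quantized weights.
    let y: f32 = w.iter().zip(x).map(|(&wi, &xi)| fake_quant(wi, scale) * xi).sum();
    let dy = y - target;
    // Backward pass: the gradient flows "straight through" the rounding.
    for (wi, &xi) in w.iter_mut().zip(x) {
        *wi -= lr * dy * xi;
    }
    0.5 * dy * dy
}

fn main() {
    let mut w = [0.4f32, -1.3, 0.05]; // latent full-precision weights
    let x = [1.0f32, 2.0, 3.0];
    let loss = ste_step(&mut w, &x, 2.0, 0.5, 0.1);
    println!("loss = {loss}, updated w = {w:?}");
}
```

Without the STE, `round` would contribute a zero gradient almost everywhere and the latent weights would never move.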

§Quick Start

```rust
use bitnet_quantize::{BitLinear, BitNetConfig};
use candle_core::{Device, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let device = Device::Cpu;
    let config = BitNetConfig::default();

    // Create a BitLinear layer from existing full-precision weights
    // (out_features x in_features = 512 x 256, no bias)
    let weight = Tensor::randn(0.0f32, 1.0, (512, 256), &device)?;
    let layer = BitLinear::from_weight(&weight, None, &config)?;

    // Forward pass on a batch of 4 inputs with 256 features each
    let input = Tensor::randn(0.0f32, 1.0, (4, 256), &device)?;
    let output = layer.forward(&input)?;
    println!("Output shape: {:?}", output.dims());

    println!("Compression ratio: {:.2}x", layer.compression_ratio());
    Ok(())
}
```
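
For intuition about the number printed by `compression_ratio()`, here is a back-of-the-envelope sketch assuming a hypothetical 2-bit code per trit (four trits per byte) plus one `f32` scale per weight matrix. The encoding and the helpers `pack`, `unpack`, and `estimated_ratio` are illustrative only; trit-vsa may use a denser layout and the crate may count storage differently, so the real figure can differ.

```rust
/// Hypothetical 2-bit code: 0 -> 0b00, +1 -> 0b01, -1 -> 0b10.
fn encode(t: i8) -> u8 {
    match t {
        0 => 0b00,
        1 => 0b01,
        -1 => 0b10,
        _ => panic!("{t} is not a trit"),
    }
}

fn decode(bits: u8) -> i8 {
    match bits & 0b11 {
        0b00 => 0,
        0b01 => 1,
        0b10 => -1,
        _ => panic!("invalid 2-bit code"),
    }
}

/// Packs four trits per byte, lowest bits first.
fn pack(trits: &[i8]) -> Vec<u8> {
    let mut out = vec![0u8; (trits.len() + 3) / 4];
    for (i, &t) in trits.iter().enumerate() {
        out[i / 4] |= encode(t) << ((i % 4) * 2);
    }
    out
}

fn unpack(packed: &[u8], len: usize) -> Vec<i8> {
    (0..len).map(|i| decode(packed[i / 4] >> ((i % 4) * 2))).collect()
}

/// f32 storage vs. packed trits plus one f32 scale, for a rows x cols matrix.
fn estimated_ratio(rows: usize, cols: usize) -> f64 {
    let n = rows * cols;
    let dense_bytes = n * std::mem::size_of::<f32>();
    let packed_bytes = (n + 3) / 4 + std::mem::size_of::<f32>();
    dense_bytes as f64 / packed_bytes as f64
}

fn main() {
    let trits = [1i8, 0, -1, -1, 1];
    let packed = pack(&trits);
    assert_eq!(unpack(&packed, trits.len()), trits);
    // 512 x 256 weights: 524_288 bytes as f32 vs 32_768 + 4 bytes packed.
    println!("{} bytes packed, ratio ~{:.2}x", packed.len(), estimated_ratio(512, 256));
}
```

Under this assumed 2-bit layout the ratio approaches 16x from below; a base-3 layout that packs more trits per byte would push it higher.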

§Quantization

§Weight Quantization (AbsMean)

Weights are quantized using the AbsMean method:

scale = mean(|W|)
W_q = clamp(round(W / scale), -1, +1), so every entry of W_q is in {-1, 0, +1}
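
A plain-Rust sketch of the AbsMean formulas above, operating on a flat slice treated as one tensor with a single scale. The function names are hypothetical, and the `f32::EPSILON` guard against an all-zero tensor is an added assumption (the formulas above do not specify that case). Dequantization simply multiplies back by the scale.

```rust
/// Hypothetical AbsMean quantizer: one scale for the whole slice.
fn absmean_quantize(w: &[f32]) -> (Vec<i8>, f32) {
    assert!(!w.is_empty(), "cannot quantize an empty tensor");
    // scale = mean(|W|)
    let scale = w.iter().map(|v| v.abs()).sum::<f32>() / w.len() as f32;
    // Assumed guard so an all-zero tensor does not divide by zero.
    let denom = scale.max(f32::EPSILON);
    // W_q = clamp(round(W / scale), -1, 1)
    let q = w
        .iter()
        .map(|&v| (v / denom).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}

/// Approximate reconstruction: W ≈ scale * W_q.
fn absmean_dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&t| f32::from(t) * scale).collect()
}

fn main() {
    let w = [0.9f32, -0.1, -0.6, 0.2];
    let (q, scale) = absmean_quantize(&w);
    println!("W_q = {q:?}, scale = {scale}");
    println!("reconstructed = {:?}", absmean_dequantize(&q, scale));
}
```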

§Activation Quantization (AbsMax)

Activations are quantized to INT8 using per-token AbsMax:

scale = max(|X|) / 127, computed separately for each token (each row of X)
X_q = clamp(round(X / scale), -127, 127)
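
The per-token AbsMax formulas can be sketched the same way, assuming activations stored row-major as `tokens x hidden` in a flat slice, with one scale per row. The function names are hypothetical, and handling an all-zero row by dividing by 1.0 is an assumption added to avoid division by zero.

```rust
/// Hypothetical per-token AbsMax quantizer for a row-major `tokens x hidden`
/// slice. Returns the INT8 values and one scale per token (row).
fn absmax_quantize(x: &[f32], hidden: usize) -> (Vec<i8>, Vec<f32>) {
    assert!(hidden > 0 && x.len() % hidden == 0, "shape mismatch");
    let mut q = Vec::with_capacity(x.len());
    let mut scales = Vec::with_capacity(x.len() / hidden);
    for row in x.chunks(hidden) {
        // scale = max(|X|) / 127, computed per row
        let max_abs = row.iter().fold(0.0f32, |m, v| m.max(v.abs()));
        let scale = max_abs / 127.0;
        // Assumed: an all-zero row keeps scale 0 and divides by 1.0 instead.
        let denom = if scale > 0.0 { scale } else { 1.0 };
        // X_q = clamp(round(X / scale), -127, 127)
        q.extend(row.iter().map(|&v| (v / denom).round().clamp(-127.0, 127.0) as i8));
        scales.push(scale);
    }
    (q, scales)
}

/// Approximate reconstruction: each row is multiplied by its own scale.
fn absmax_dequantize(q: &[i8], scales: &[f32], hidden: usize) -> Vec<f32> {
    q.chunks(hidden)
        .zip(scales)
        .flat_map(|(row, &s)| row.iter().map(move |&v| f32::from(v) * s))
        .collect()
}

fn main() {
    // Two tokens with a hidden size of 3.
    let x = [1.27f32, -0.5, 0.0, 0.0, 2.54, -1.0];
    let (q, scales) = absmax_quantize(&x, 3);
    println!("X_q = {q:?}, scales = {scales:?}");
    println!("reconstructed = {:?}", absmax_dequantize(&q, &scales, 3));
}
```

Per-token scales keep one outlier token from crushing the resolution of every other token in the batch; the largest magnitude in each row always maps to ±127.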

§Feature Flags

  • default: CPU-only
  • cuda: Enable CUDA GPU kernels
  • peft: Enable peft-rs adapter integration
  • gguf-export: Enable GGUF export via qlora-rs

§Re-exports

pub use layer::BitLinear;
pub use quantization::dequantize_activations;
pub use quantization::dequantize_weights;
pub use quantization::quantize_activations;
pub use quantization::quantize_weights;
pub use quantization::QuantizedActivations;
pub use quantization::TernaryWeight;

§Modules

  • export: Export functionality for BitNet models.
  • kernels: GPU kernels for BitNet operations.
  • layer: Neural network layers for BitNet.
  • prelude: Prelude module for convenient imports.
  • quantization: Quantization modules for BitNet.

§Structs

  • BitNetAdapter: BitNet adapter for peft-rs integration.
  • BitNetAdapterConfig: BitNet adapter configuration for peft-rs integration.
  • BitNetConfig: Configuration for BitNet b1.58 quantization.

§Enums

  • BitNetError: Errors that can occur during BitNet operations.

§Type Aliases

  • Result: Result type alias for bitnet-rs operations.