Microsoft BitNet b1.58 quantization and inference for Rust.
This crate provides an implementation of BitNet, which uses:
- Ternary weights: {-1, 0, +1} via AbsMean quantization
- INT8 activations: Per-token AbsMax scaling
§Features
- BitLinear: Drop-in replacement for nn::Linear
- Efficient ternary weight storage via trit-vsa
- Straight-Through Estimator (STE) for training
- Optional peft-rs adapter integration
- Optional GGUF export via qlora-rs
§Quick Start
use bitnet_quantize::{BitLinear, BitNetConfig};
use candle_core::{Device, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let device = Device::Cpu;
    let config = BitNetConfig::default();

    // Create BitLinear from existing full-precision weights
    let weight = Tensor::randn(0.0f32, 1.0, (512, 256), &device)?;
    let layer = BitLinear::from_weight(&weight, None, &config)?;

    // Forward pass on a batch of 4 activations
    let input = Tensor::randn(0.0f32, 1.0, (4, 256), &device)?;
    let output = layer.forward(&input)?;
    println!("Output shape: {:?}", output.shape());
    println!("Compression ratio: {:.2}x", layer.compression_ratio());
    Ok(())
}
§Quantization
§Weight Quantization (AbsMean)
Weights are quantized using the AbsMean method:
scale = mean(|W|)
W_q = round(W / scale) clamped to {-1, 0, +1}
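As a rough illustration, the sketch below reproduces this AbsMean step with plain candle tensor ops; the absmean_quantize helper is hypothetical and is not the crate's internal implementation (the re-exported quantize_weights covers the real path).

use candle_core::{Device, Result, Tensor};

// Illustrative AbsMean ternary quantization following the formulas above.
fn absmean_quantize(w: &Tensor) -> Result<(Tensor, f32)> {
    // scale = mean(|W|), with a small floor to avoid dividing by zero
    let scale = w.abs()?.mean_all()?.to_scalar::<f32>()?.max(1e-5);
    // W_q = round(W / scale), clamped to {-1, 0, +1}
    let w_q = (w / scale as f64)?.round()?.clamp(-1f32, 1f32)?;
    Ok((w_q, scale))
}

fn main() -> Result<()> {
    let device = Device::Cpu;
    let w = Tensor::randn(0.0f32, 1.0, (512, 256), &device)?;
    let (w_q, scale) = absmean_quantize(&w)?;
    println!("scale = {scale:.4}, ternary shape = {:?}", w_q.shape());
    Ok(())
}

Note that the result here is still an f32 tensor holding {-1, 0, +1} values; the crate instead packs ternary weights via trit-vsa, which is where the compression ratio comes from.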
§Activation Quantization (AbsMax)
Activations are quantized to INT8 using per-token AbsMax:
scale = max(|X|) / 127
X_q = round(X / scale) clamped to [-127, 127]
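A matching sketch of the per-token AbsMax step, again in plain candle; absmax_quantize is a hypothetical helper, not the crate's quantize_activations API.

use candle_core::{D, Device, Result, Tensor};

// Illustrative per-token AbsMax INT8 quantization following the formulas above.
fn absmax_quantize(x: &Tensor) -> Result<(Tensor, Tensor)> {
    // scale = max(|X|) / 127, computed independently for each token (row);
    // a real implementation would also guard against all-zero rows
    let scale = (x.abs()?.max_keepdim(D::Minus1)? / 127.0)?;
    // X_q = round(X / scale), clamped to [-127, 127]
    let x_q = x.broadcast_div(&scale)?.round()?.clamp(-127f32, 127f32)?;
    Ok((x_q, scale))
}

fn main() -> Result<()> {
    let device = Device::Cpu;
    let x = Tensor::randn(0.0f32, 1.0, (4, 256), &device)?;
    let (x_q, scales) = absmax_quantize(&x)?;
    println!("per-token scales: {:?}, quantized: {:?}", scales.shape(), x_q.shape());
    Ok(())
}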
§Feature Flags
- default: CPU-only
- cuda: Enable CUDA GPU kernels
- peft: Enable peft-rs adapter integration
- gguf-export: Enable GGUF export via qlora-rs
§References
- “The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits” https://arxiv.org/abs/2402.17764
Re-exports§
pub use layer::BitLinear;
pub use quantization::dequantize_activations;
pub use quantization::dequantize_weights;
pub use quantization::quantize_activations;
pub use quantization::quantize_weights;
pub use quantization::QuantizedActivations;
pub use quantization::TernaryWeight;
Modules§
- export - Export functionality for BitNet models.
- kernels - GPU kernels for BitNet operations.
- layer - Neural network layers for BitNet.
- prelude - Prelude module for convenient imports.
- quantization - Quantization modules for BitNet.
Structs§
- BitNetAdapter - BitNet adapter for peft-rs integration.
- BitNetAdapterConfig - BitNet adapter configuration for peft-rs integration.
- BitNetConfig - Configuration for BitNet b1.58 quantization.
Enums§
- BitNetError - Errors that can occur during BitNet operations.
Type Aliases§
- Result - Result type alias for bitnet-rs operations.