Rust acceleration for AI training and inference.
tritter-accel provides high-performance operations for both ternary
(BitNet-style) and conventional neural network workloads. It serves as
an acceleration layer that can be used from either Rust or Python.
§Architecture
The crate is organized into two main APIs:
- Rust API (core module): Pure Rust interfaces for direct integration
- Python API (PyO3 bindings): NumPy-compatible functions for Python users
§Rust Usage
use tritter_accel::core::{
ternary::{PackedTernary, matmul},
quantization::{quantize_absmean, QuantizeConfig},
training::{GradientCompressor, TrainingConfig},
inference::{InferenceEngine, InferenceConfig},
};
use candle_core::{Device, Tensor};
// Quantize weights to ternary
let device = Device::Cpu;
let weights = Tensor::randn(0f32, 1f32, (512, 512), &device)?;
let result = quantize_absmean(&weights, &QuantizeConfig::default())?;
// Create packed representation for efficient matmul
let packed = result.to_packed()?;
// Run ternary matmul
let input = Tensor::randn(0f32, 1f32, (1, 512), &device)?;
let output = matmul(&input, &packed, None)?;
// Compress gradients for distributed training
let compressor = GradientCompressor::new(TrainingConfig::default());
let gradients: Vec<f32> = vec![0.1, -0.2, 0.3, -0.4];
let compressed = compressor.compress(&gradients, Some(0.1))?;
§Python Usage
Build with maturin:
cd rust-ai/tritter-accel
maturin develop --release
Then in Python:
from tritter_accel import (
pack_ternary_weights,
unpack_ternary_weights,
ternary_matmul,
quantize_weights_absmean,
compress_gradients_vsa,
)
# Pack weights for efficient storage
packed = pack_ternary_weights(ternary_weights, scales)
# Efficient matmul with packed weights
output = ternary_matmul(input, packed)
# Compress gradients for distributed training
compressed = compress_gradients_vsa(gradients, compression_ratio=0.1)
§Features
cuda: Enable GPU acceleration via CubeCL (requires CUDA toolkit)
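A downstream crate can opt into this feature through standard Cargo feature syntax (the version number below is a placeholder, not a published release):

```toml
# Enable the optional CUDA backend from a downstream Cargo.toml
# (version number is illustrative)
[dependencies]
tritter-accel = { version = "0.1", features = ["cuda"] }
```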
§Re-exports
pub use core::inference::InferenceConfig;
pub use core::inference::InferenceEngine;
pub use core::inference::TernaryLayer;
pub use core::quantization::quantize_absmean;
pub use core::quantization::quantize_absmax;
pub use core::quantization::QuantizationResult;
pub use core::quantization::QuantizeConfig;
pub use core::ternary::matmul as ternary_matmul_rust;
pub use core::ternary::PackedTernary;
pub use core::ternary::TernaryMatmulConfig;
pub use core::training::GradientCompressor;
pub use core::training::TrainingConfig;
pub use core::vsa::VsaConfig;
pub use core::vsa::VsaOps;
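As a rough illustration of the storage savings behind PackedTernary, the sketch below packs ternary values four to a byte in plain NumPy. The 2-bit encoding chosen here (0b00 = 0, 0b01 = +1, 0b10 = -1) is an assumption for illustration only, not the crate's actual bit layout:

```python
import numpy as np

def pack2bit(trits: np.ndarray) -> np.ndarray:
    """Pack ternary values {-1, 0, +1} at 2 bits each, 4 per byte.
    Encoding (illustrative, not tritter-accel's layout):
    -1 -> 0b10, 0 -> 0b00, +1 -> 0b01. Length must be a multiple of 4."""
    codes = np.where(trits == -1, 2, np.where(trits == 1, 1, 0)).astype(np.uint8)
    codes = codes.reshape(-1, 4)
    return (codes[:, 0]
            | (codes[:, 1] << 2)
            | (codes[:, 2] << 4)
            | (codes[:, 3] << 6)).astype(np.uint8)

def unpack2bit(packed: np.ndarray) -> np.ndarray:
    """Recover the ternary values from the packed bytes."""
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).reshape(-1)
    lut = np.array([0, 1, -1, 0], dtype=np.int8)  # 0b00 -> 0, 0b01 -> +1, 0b10 -> -1
    return lut[codes]

# Round trip: 8 trits stored in 2 bytes instead of 8
trits = np.array([-1, 0, 1, 1, 0, -1, 1, 0], dtype=np.int8)
packed = pack2bit(trits)
assert packed.nbytes == 2
assert np.array_equal(unpack2bit(packed), trits)
```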