# tritter-accel

Rust acceleration for AI training and inference, with both Rust and Python APIs.
## Overview
tritter-accel provides high-performance operations for both ternary (BitNet-style) and conventional neural network workloads. It offers:
- **Dual API**: Both Rust and Python interfaces
- **Ternary Operations**: BitNet b1.58 quantization and inference
- **VSA Gradient Compression**: 10-100x compression for distributed training
- **GPU Acceleration**: Optional CUDA support via CubeCL
## Features
| Feature | Description | Benefit |
|---|---|---|
| Ternary Quantization | AbsMean/AbsMax to {-1, 0, +1} | 16x memory reduction |
| Packed Storage | 2-bit per trit (4 values/byte) | Efficient storage |
| Ternary Matmul | Addition-only arithmetic | 2-4x speedup |
| VSA Operations | Bind/bundle/similarity | Hyperdimensional computing |
| Gradient Compression | Random projection | 10-100x compression |
| Mixed Precision | BF16 utilities | Training efficiency |
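
To make the quantization and packing rows concrete, here is a small numpy sketch of the arithmetic: AbsMean scaling to {-1, 0, +1} as described for BitNet b1.58, then 2 bits per trit. It illustrates the scheme, not the crate's actual implementation, and the trit-to-bit encoding is an arbitrary choice for the demo:

```python
import numpy as np

def quantize_absmean(weights):
    """AbsMean quantization (BitNet b1.58 style): scale by mean |w|, round, clip."""
    scale = np.mean(np.abs(weights))
    trits = np.clip(np.round(weights / scale), -1, 1).astype(np.int8)
    return trits, scale

def pack_trits(trits):
    """Pack at 2 bits per trit, 4 trits per byte (demo encoding: -1->0, 0->1, +1->2)."""
    codes = (trits.ravel() + 1).astype(np.uint8)
    codes = np.pad(codes, (0, -len(codes) % 4), constant_values=1)  # pad with trit 0
    codes = codes.reshape(-1, 4)
    return (codes[:, 0] | codes[:, 1] << 2 | codes[:, 2] << 4 | codes[:, 3] << 6).astype(np.uint8)

w = np.array([0.9, -0.4, 0.05, 1.3, -0.7, 0.2, -1.1, 0.0], dtype=np.float32)
trits, scale = quantize_absmean(w)
packed = pack_trits(trits)
# 8 float32 weights take 32 bytes; packed they take 2 bytes (the 16x figure above).
print(scale, trits, packed)
```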
## Installation

### Rust

Add to your `Cargo.toml`:
```toml
[dependencies]
tritter-accel = "0.2"

# With GPU support
tritter-accel = { version = "0.2", features = ["cuda"] }
```
### Python

Build with maturin:

```bash
maturin develop --release

# With CUDA support
maturin develop --release --features cuda
```
## Usage

### Rust API

The core types and functions live under `tritter_accel::core` (see Module Structure below):

```rust
use tritter_accel::core::quantization::quantize_absmean;
use tritter_accel::core::ternary::PackedTernary;
```
### Python API

A minimal end-to-end sketch (argument values and tuple shapes are illustrative):

```python
import numpy as np
import tritter_accel as ta

# Quantize float weights to ternary {-1, 0, +1}
weights = np.random.randn(512, 512).astype(np.float32)
ternary, scales = ta.quantize_weights_absmean(weights)

# Pack for efficient storage (16x compression)
packed, shape = ta.pack_ternary_weights(ternary, scales)

# Efficient matmul with packed weights
x = np.random.randn(1, 512).astype(np.float32)
output = ta.ternary_matmul(x, packed, scales, shape)

# VSA gradient compression for distributed training
gradients = np.random.randn(1_000_000).astype(np.float32)
compressed = ta.compress_gradients_vsa(gradients, 0.01, 42)
restored = ta.decompress_gradients_vsa(compressed, gradients.size, 42)
```
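
Random-projection compression is lossy, so the round trip returns an approximation of the original gradients. A quick sanity check, assuming `ratio` is the compressed-to-original size fraction and that both calls must share a seed (which the matching `seed` parameters suggest):

```python
import numpy as np
import tritter_accel as ta

grads = np.random.randn(100_000).astype(np.float32)
compressed = ta.compress_gradients_vsa(grads, 0.01, 42)
restored = ta.decompress_gradients_vsa(compressed, grads.size, 42)

# Random projection roughly preserves direction, not exact values,
# so check cosine similarity rather than elementwise equality.
cos = np.dot(grads, restored) / (np.linalg.norm(grads) * np.linalg.norm(restored))
print(f"cosine similarity after ~100x round trip: {cos:.3f}")
```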
## Module Structure

```text
tritter_accel
├── core              # Pure Rust API
│   ├── ternary       # PackedTernary, matmul, dot
│   ├── quantization  # quantize_absmean, quantize_absmax
│   ├── vsa           # VsaOps (bind, bundle, similarity)
│   ├── training      # GradientCompressor, mixed_precision
│   └── inference     # InferenceEngine, TernaryLayer, KVCache
├── bitnet            # Re-exports from bitnet-quantize
├── ternary           # Re-exports from trit-vsa
└── vsa               # Re-exports from vsa-optim-rs
```
## API Reference

### Python Functions

| Function | Description |
|---|---|
| `quantize_weights_absmean(weights)` | Quantize float weights to ternary using AbsMean scaling |
| `pack_ternary_weights(weights, scales)` | Pack ternary weights into 2-bit representation |
| `unpack_ternary_weights(packed, scales, shape)` | Unpack ternary weights to float |
| `ternary_matmul(input, packed, scales, shape)` | Matrix multiply with packed ternary weights |
| `compress_gradients_vsa(gradients, ratio, seed)` | Compress gradients using VSA |
| `decompress_gradients_vsa(compressed, dim, seed)` | Decompress gradients from VSA |
| `version()` | Get library version |
| `cuda_available_py()` | Check if CUDA is available |
### Rust Types

| Type | Description |
|---|---|
| `PackedTernary` | Packed ternary weight storage with scales |
| `QuantizationResult` | Result of quantization with values, scales, and shape |
| `VsaOps` | VSA operations handler with device dispatch |
| `GradientCompressor` | Gradient compression/decompression |
| `InferenceEngine` | Batched inference with device management |
| `TernaryLayer` | Pre-quantized layer for fast inference |
## Performance
| Operation | vs FP32 | Memory |
|---|---|---|
| Ternary matmul (CPU) | 2x speedup | 16x reduction |
| Ternary matmul (GPU) | 4x speedup | 16x reduction |
| Weight packing | N/A | 16x reduction |
| VSA gradient compression | N/A | 10-100x reduction |
Run benchmarks:

```bash
cargo bench
```
## Examples

See the `examples/` directory:

- `basic_quantization.py` - Weight quantization demo
- `ternary_inference.py` - Inference with packed weights
- `gradient_compression.py` - VSA gradient compression
- `vsa_operations.py` - Hyperdimensional computing
- `benchmark_comparison.py` - Performance comparisons
## Dependencies
This crate delegates to specialized sister crates:
| Crate | Description |
|---|---|
| trit-vsa | Balanced ternary arithmetic & VSA |
| bitnet-quantize | BitNet b1.58 quantization |
| vsa-optim-rs | Gradient optimization |
| rust-ai-core | GPU dispatch & device management |
## Feature Flags

| Feature | Description |
|---|---|
| `default` | CPU-only build |
| `cuda` | Enable CUDA GPU acceleration |
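
From Python, the build configuration can be inspected at runtime via the helpers listed in the API reference; presumably `cuda_available_py()` returns `False` for a CPU-only build or when no GPU is present:

```python
import tritter_accel as ta

print("tritter-accel", ta.version())
if ta.cuda_available_py():
    print("CUDA acceleration enabled")
else:
    print("running on CPU")
```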
## License

MIT License - see `LICENSE-MIT`.
## Contributing

Contributions welcome! Please read the contributing guidelines before opening a PR.