tritter-accel

Rust acceleration for AI training and inference, with both Rust and Python APIs.


Overview

tritter-accel provides high-performance operations for both ternary (BitNet-style) and conventional neural network workloads. It offers:

  • Dual API: Both Rust and Python interfaces
  • Ternary Operations: BitNet b1.58 quantization and inference
  • VSA Gradient Compression: 10-100x compression for distributed training
  • GPU Acceleration: Optional CUDA support via CubeCL

Features

Feature                Description                        Benefit
--------------------   --------------------------------   --------------------------
Ternary Quantization   AbsMean/AbsMax to {-1, 0, +1}      16x memory reduction
Packed Storage         2 bits per trit (4 values/byte)    Efficient storage
Ternary Matmul         Addition-only arithmetic           2-4x speedup
VSA Operations         Bind/bundle/similarity             Hyperdimensional computing
Gradient Compression   Random projection                  10-100x compression
Mixed Precision        BF16 utilities                     Training efficiency
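
Packed storage maps each trit to a 2-bit code, so four trits fit in one byte. Below is a minimal sketch of the idea, assuming a little-endian pair layout; the exact encoding PackedTernary uses may differ.

// Illustrative 2-bit trit packing: four {-1, 0, +1} values per byte.
// Sketch only; PackedTernary's actual bit layout may differ.
fn pack4(trits: [i8; 4]) -> u8 {
    let mut byte = 0u8;
    for (i, &t) in trits.iter().enumerate() {
        let code: u8 = match t {
            -1 => 0b10,
            0 => 0b00,
            1 => 0b01,
            _ => panic!("not a trit"),
        };
        byte |= code << (2 * i);
    }
    byte
}

fn unpack4(byte: u8) -> [i8; 4] {
    let mut trits = [0i8; 4];
    for (i, t) in trits.iter_mut().enumerate() {
        *t = match (byte >> (2 * i)) & 0b11 {
            0b10 => -1,
            0b01 => 1,
            _ => 0, // 0b00 is zero; 0b11 is unused here
        };
    }
    trits
}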

Installation

Rust

Add to your Cargo.toml:

[dependencies]
tritter-accel = "0.2"

# Or, to enable GPU support:
tritter-accel = { version = "0.2", features = ["cuda"] }

Python

Build with maturin:

cd tritter-accel
pip install maturin numpy
maturin develop --release

# With CUDA support
maturin develop --release --features cuda
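
After building, a quick smoke test confirms the module imports (version() is part of the Python API listed below):

python -c "import tritter_accel; print(tritter_accel.version())"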

Usage

Rust API

use tritter_accel::core::{
    quantization::{quantize_absmean, QuantizeConfig},
    ternary::{PackedTernary, matmul},
    training::{GradientCompressor, TrainingConfig},
    vsa::{VsaOps, VsaConfig},
};
use candle_core::{Device, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let device = Device::Cpu;

    // Quantize weights to ternary
    let weights = Tensor::randn(0f32, 1f32, (512, 512), &device)?;
    let result = quantize_absmean(&weights, &QuantizeConfig::default())?;
    let packed = result.to_packed()?;

    // Ternary matmul (no multiplications!)
    let input = Tensor::randn(0f32, 1f32, (1, 512), &device)?;
    let output = matmul(&input, &packed, None)?;

    // VSA hyperdimensional computing
    let ops = VsaOps::new(VsaConfig::default());
    let a = ops.random(10000, 42)?;
    let b = ops.random(10000, 43)?;
    let bound = ops.bind(&a, &b)?;

    // Compress gradients for distributed training
    let config = TrainingConfig::default().with_compression_ratio(0.1);
    let compressor = GradientCompressor::new(config);
    let gradients: Vec<f32> = vec![0.1, -0.2, 0.3];
    let compressed = compressor.compress(&gradients, None)?;

    Ok(())
}

Python API

import numpy as np
from tritter_accel import (
    quantize_weights_absmean,
    pack_ternary_weights,
    ternary_matmul,
    compress_gradients_vsa,
    decompress_gradients_vsa,
)

# Quantize float weights to ternary {-1, 0, +1}
weights = np.random.randn(512, 512).astype(np.float32)
ternary_weights, scales = quantize_weights_absmean(weights)

# Pack for efficient storage (16x compression)
packed, scales = pack_ternary_weights(ternary_weights, scales)

# Efficient matmul with packed weights
input_data = np.random.randn(4, 512).astype(np.float32)
output = ternary_matmul(input_data, packed, scales, (512, 512))

# VSA gradient compression for distributed training
gradients = np.random.randn(1000000).astype(np.float32)
compressed, seed = compress_gradients_vsa(gradients, 0.1, 42)
print(f"Compression: {len(gradients) / len(compressed):.1f}x")

Module Structure

tritter_accel
├── core                    # Pure Rust API
│   ├── ternary            # PackedTernary, matmul, dot
│   ├── quantization       # quantize_absmean, quantize_absmax
│   ├── vsa                # VsaOps (bind, bundle, similarity)
│   ├── training           # GradientCompressor, mixed_precision
│   └── inference          # InferenceEngine, TernaryLayer, KVCache
├── bitnet                  # Re-exports from bitnet-quantize
├── ternary                 # Re-exports from trit-vsa
└── vsa                     # Re-exports from vsa-optim-rs

API Reference

Python Functions

Function                                          Description
-----------------------------------------------   --------------------------------------------------------
quantize_weights_absmean(weights)                 Quantize float weights to ternary using AbsMean scaling
pack_ternary_weights(weights, scales)             Pack ternary weights into a 2-bit representation
unpack_ternary_weights(packed, scales, shape)     Unpack ternary weights back to float
ternary_matmul(input, packed, scales, shape)      Matrix multiply with packed ternary weights
compress_gradients_vsa(gradients, ratio, seed)    Compress gradients using VSA
decompress_gradients_vsa(compressed, dim, seed)   Decompress gradients from VSA
version()                                         Get the library version
cuda_available_py()                               Check whether CUDA is available
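
Conceptually, quantize_weights_absmean follows the BitNet b1.58 AbsMean recipe: scale by the mean absolute weight, then round each weight to the nearest trit. A minimal Rust sketch of that standard formulation (the crate's implementation may add refinements):

// AbsMean quantization per the BitNet b1.58 recipe: scale by the mean
// absolute value, then round each weight to {-1, 0, +1}.
// Sketch of the published formulation; quantize_absmean may differ.
fn absmean_quantize(weights: &[f32]) -> (Vec<i8>, f32) {
    let scale = weights.iter().map(|w| w.abs()).sum::<f32>()
        / weights.len() as f32;
    let scale = scale.max(1e-8); // guard against all-zero input
    let trits = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (trits, scale)
}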

Rust Types

Type                 Description
------------------   ----------------------------------------------
PackedTernary        Packed ternary weight storage with scales
QuantizationResult   Quantization output: values, scales, and shape
VsaOps               VSA operations handler with device dispatch
GradientCompressor   Gradient compression and decompression
InferenceEngine      Batched inference with device management
TernaryLayer         Pre-quantized layer for fast inference
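
For readers new to hyperdimensional computing, the three VSA primitives behind VsaOps are simple on bipolar (+1/-1) hypervectors. A conceptual sketch, assuming that bipolar representation (the crate's internal one may differ):

// Conceptual VSA primitives on bipolar (+1/-1) hypervectors.
fn bind(a: &[f32], b: &[f32]) -> Vec<f32> {
    // Elementwise multiply; binding is its own inverse for bipolar vectors.
    a.iter().zip(b).map(|(x, y)| x * y).collect()
}

fn bundle(vs: &[Vec<f32>]) -> Vec<f32> {
    // Elementwise majority vote via sum-then-sign (ties break to +1).
    (0..vs[0].len())
        .map(|i| vs.iter().map(|v| v[i]).sum::<f32>().signum())
        .collect()
}

fn similarity(a: &[f32], b: &[f32]) -> f32 {
    // Cosine similarity; near zero for unrelated random hypervectors.
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}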

Performance

Operation                  vs FP32      Memory
------------------------   ----------   -----------------
Ternary matmul (CPU)       2x speedup   16x reduction
Ternary matmul (GPU)       4x speedup   16x reduction
Weight packing             N/A          16x reduction
VSA gradient compression   N/A          10-100x reduction
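
The matmul speedup comes from replacing multiply-accumulate with plain adds and subtracts, and from skipping zero weights entirely. A sketch of the inner loop, ignoring packing and vectorization:

// Sketch of an addition-only ternary dot product: each weight is
// -1, 0, or +1, so no multiplies are needed until the final scale.
fn ternary_dot(input: &[f32], trits: &[i8], scale: f32) -> f32 {
    let mut acc = 0.0f32;
    for (&x, &t) in input.iter().zip(trits) {
        match t {
            1 => acc += x,
            -1 => acc -= x,
            _ => {} // zero weights contribute nothing and are skipped
        }
    }
    acc * scale
}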

Run benchmarks:

cargo bench -p tritter-accel

Examples

See the examples/ directory:

  • basic_quantization.py - Weight quantization demo
  • ternary_inference.py - Inference with packed weights
  • gradient_compression.py - VSA gradient compression
  • vsa_operations.py - Hyperdimensional computing
  • benchmark_comparison.py - Performance comparisons

Dependencies

This crate delegates to specialized sister crates:

Crate             Description
---------------   -----------------------------------
trit-vsa          Balanced ternary arithmetic and VSA
bitnet-quantize   BitNet b1.58 quantization
vsa-optim-rs      Gradient optimization
rust-ai-core      GPU dispatch and device management

Feature Flags

Feature   Description
-------   ----------------------------
default   CPU-only build
cuda      Enable CUDA GPU acceleration
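
GPU builds use standard Cargo feature syntax:

cargo build -p tritter-accel --release --features cuda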

License

MIT License - see LICENSE-MIT

Contributing

Contributions are welcome!