Crate tritter_accel

Rust acceleration for AI training and inference.

tritter-accel provides high-performance operations for both ternary (BitNet-style) and conventional neural network workloads. It serves as an acceleration layer that can be used from either Rust or Python.

§Architecture

The crate is organized into two main APIs:

  • Rust API (core module): Pure Rust interfaces for direct integration
  • Python API (PyO3 bindings): NumPy-compatible functions for Python users

§Rust Usage

use tritter_accel::core::{
    ternary::{PackedTernary, matmul},
    quantization::{quantize_absmean, QuantizeConfig},
    training::{GradientCompressor, TrainingConfig},
    inference::{InferenceEngine, InferenceConfig},
};
use candle_core::{Device, Tensor};

// Quantize weights to ternary
let device = Device::Cpu;
let weights = Tensor::randn(0f32, 1f32, (512, 512), &device)?;
let result = quantize_absmean(&weights, &QuantizeConfig::default())?;

// Create packed representation for efficient matmul
let packed = result.to_packed()?;

// Run ternary matmul
let input = Tensor::randn(0f32, 1f32, (1, 512), &device)?;
let output = matmul(&input, &packed, None)?;

// Compress gradients for distributed training
let compressor = GradientCompressor::new(TrainingConfig::default());
let gradients: Vec<f32> = vec![0.1, -0.2, 0.3, -0.4];
let compressed = compressor.compress(&gradients, Some(0.1))?;
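The absmean scheme behind `quantize_absmean` follows the BitNet-style recipe: scale by the mean absolute weight, round, and clamp to {-1, 0, +1}. A minimal pure-std sketch of that math (the real function operates on candle tensors, and its exact rounding and scale-handling details may differ):

```rust
/// Illustrative sketch of absmean ternary quantization.
/// Returns the ternary values in {-1, 0, +1} and the per-tensor scale.
fn quantize_absmean_sketch(weights: &[f32]) -> (Vec<i8>, f32) {
    // Per-tensor scale: mean absolute value, guarded against all-zero input.
    let scale = (weights.iter().map(|w| w.abs()).sum::<f32>()
        / weights.len() as f32)
        .max(1e-8);
    // Scale, round to the nearest integer, clamp to the ternary range.
    let ternary = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (ternary, scale)
}

fn main() {
    let weights = [0.9f32, -0.05, 0.4, -1.2];
    let (ternary, scale) = quantize_absmean_sketch(&weights);
    println!("scale = {scale}, ternary = {ternary:?}");
}
```

Dividing by the mean absolute value keeps roughly the largest-magnitude weights at ±1 while small weights collapse to 0, which is what makes the packed matmul above cheap.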

§Python Usage

Build with maturin:

cd rust-ai/tritter-accel
maturin develop --release

Then in Python:

from tritter_accel import (
    pack_ternary_weights,
    unpack_ternary_weights,
    ternary_matmul,
    quantize_weights_absmean,
    compress_gradients_vsa,
)

# Pack weights for efficient storage
packed = pack_ternary_weights(ternary_weights, scales)

# Efficient matmul with packed weights
output = ternary_matmul(input, packed)

# Compress gradients for distributed training
compressed = compress_gradients_vsa(gradients, compression_ratio=0.1)
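A `compression_ratio` of 0.1 means roughly 10% of the gradient entries survive compression. The crate's VSA-based compressor is not shown here and may work differently; a common baseline it can be compared against is magnitude-based top-k sparsification, sketched below with hypothetical names:

```rust
/// Illustrative top-k gradient sparsification: keep the k = ceil(n * ratio)
/// largest-magnitude entries as (index, value) pairs. This is NOT the
/// crate's VSA algorithm, only a reference point for what the ratio means.
fn topk_compress(gradients: &[f32], ratio: f32) -> Vec<(usize, f32)> {
    let k = ((gradients.len() as f32 * ratio).ceil() as usize).max(1);
    let mut indexed: Vec<(usize, f32)> =
        gradients.iter().copied().enumerate().collect();
    // Sort by descending magnitude, then keep the k largest entries.
    indexed.sort_by(|a, b| b.1.abs().partial_cmp(&a.1.abs()).unwrap());
    indexed.truncate(k);
    indexed
}

fn main() {
    let grads = [0.1f32, -0.2, 0.3, -0.4];
    let kept = topk_compress(&grads, 0.5); // ratio 0.5 keeps 2 of 4 entries
    println!("{kept:?}");
}
```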

§Features

  • cuda: Enable GPU acceleration via CubeCL (requires CUDA toolkit)

§Modules

  • core: Pure Rust API for direct integration
  • bitnet: Re-exports from bitnet-quantize
  • ternary: Re-exports from trit-vsa
  • vsa: Re-exports from vsa-optim-rs
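Since a trit has only three states, ternary weights pack into 2 bits each (4 per byte). The actual layout of `PackedTernary` is an internal detail of the crate; the sketch below assumes a hypothetical little-endian 2-bit encoding (-1 → `0b10`, 0 → `0b00`, +1 → `0b01`) purely to illustrate the storage saving:

```rust
/// Pack trits in {-1, 0, +1} at 2 bits each, 4 trits per byte.
/// The encoding here is an assumption, not the crate's real format.
fn pack_trits(trits: &[i8]) -> Vec<u8> {
    trits
        .chunks(4)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u8, |byte, (i, &t)| {
                let bits = match t { -1 => 0b10u8, 1 => 0b01, _ => 0b00 };
                byte | (bits << (2 * i))
            })
        })
        .collect()
}

/// Inverse of `pack_trits`; `len` is the original trit count.
fn unpack_trits(packed: &[u8], len: usize) -> Vec<i8> {
    (0..len)
        .map(|i| {
            let bits = (packed[i / 4] >> (2 * (i % 4))) & 0b11;
            match bits { 0b10 => -1, 0b01 => 1, _ => 0 }
        })
        .collect()
}

fn main() {
    let trits = vec![1i8, -1, 0, 1, -1];
    let packed = pack_trits(&trits);
    assert_eq!(unpack_trits(&packed, trits.len()), trits);
    println!("{} trits packed into {} bytes", trits.len(), packed.len());
}
```

Compared with one `f32` per weight, this is a 16x reduction in weight storage, which is the main motivation for a packed representation feeding the ternary matmul.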

§Re-exports

pub use core::inference::InferenceConfig;
pub use core::inference::InferenceEngine;
pub use core::inference::TernaryLayer;
pub use core::quantization::quantize_absmean;
pub use core::quantization::quantize_absmax;
pub use core::quantization::QuantizationResult;
pub use core::quantization::QuantizeConfig;
pub use core::ternary::matmul as ternary_matmul_rust;
pub use core::ternary::PackedTernary;
pub use core::ternary::TernaryMatmulConfig;
pub use core::training::GradientCompressor;
pub use core::training::TrainingConfig;
pub use core::vsa::VsaConfig;
pub use core::vsa::VsaOps;

§Modules

bitnet
Re-exports from bitnet-quantize for direct access. BitNet integration module.
core
Core Rust API for tritter-accel.
ternary
Re-exports from trit-vsa for direct access. Ternary operations module.
vsa
Re-exports from vsa-optim-rs for direct access. VSA gradient compression module.