tritter-accel 0.1.2

Rust acceleration library for Tritter - providing Python bindings for high-performance BitNet, ternary, and VSA operations.

License: MIT

Overview

tritter-accel provides Python bindings via PyO3 for the following capabilities:

  • BitNet Ternary Operations: Efficient ternary weight packing and matmul
  • VSA Gradient Compression: Vector Symbolic Architecture-based gradient compression
  • Packed Weight Storage: 2-bit per trit storage with SIMD acceleration

This crate is designed to be used as a Python extension module, built with maturin.
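To make the 2-bit-per-trit layout concrete, here is a minimal pure-Python sketch packing four trits into each byte. The encoding chosen here (-1 → 0b10, 0 → 0b00, +1 → 0b01) is hypothetical; the crate's actual bit layout may differ.

```python
# Hypothetical 2-bit encoding; illustrates the idea, not the crate's real layout.
ENCODE = {-1: 0b10, 0: 0b00, 1: 0b01}
DECODE = {v: k for k, v in ENCODE.items()}

def pack_trits(trits):
    """Pack a list of trits (length a multiple of 4) into bytes, 4 trits per byte."""
    out = bytearray()
    for i in range(0, len(trits), 4):
        byte = 0
        for j, t in enumerate(trits[i:i + 4]):
            byte |= ENCODE[t] << (2 * j)
        out.append(byte)
    return bytes(out)

def unpack_trits(packed, n):
    """Inverse of pack_trits, recovering the first n trits."""
    trits = []
    for byte in packed:
        for j in range(4):
            trits.append(DECODE[(byte >> (2 * j)) & 0b11])
    return trits[:n]

weights = [1, 0, -1, 0, -1, 1, 1, 0]
packed = pack_trits(weights)
assert unpack_trits(packed, len(weights)) == weights
assert len(packed) == 2  # 8 trits fit in 2 bytes
```

Round-tripping through pack/unpack is lossless; the savings come purely from using 2 bits instead of 32 per weight.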

Installation

From Source (Development)

cd rust-ai/tritter-accel
pip install maturin
maturin develop --release

Building a Wheel

maturin build --release
pip install target/wheels/tritter_accel-*.whl

Usage

import numpy as np

from tritter_accel import (
    pack_ternary_weights,
    unpack_ternary_weights,
    ternary_matmul,
    quantize_weights_absmean,
    compress_gradients_vsa,
    decompress_gradients_vsa,
)

# Example inputs (shapes are illustrative)
float_weights = np.random.randn(256, 512).astype(np.float32)
original_shape = float_weights.shape
x = np.random.randn(1, 512).astype(np.float32)
gradients = np.random.randn(256, 512).astype(np.float32)

# Quantize float weights to ternary {-1, 0, +1}
ternary_weights, scales = quantize_weights_absmean(float_weights)

# Pack for efficient storage (2 bits per trit)
packed, scales = pack_ternary_weights(ternary_weights, scales)

# Efficient matmul with packed weights
output = ternary_matmul(x, packed, scales, original_shape)

# VSA gradient compression for distributed training
compressed = compress_gradients_vsa(gradients, compression_ratio=0.1)
recovered = decompress_gradients_vsa(compressed, original_shape)

API Reference

Weight Quantization

  • quantize_weights_absmean(weights) - Quantize float weights to ternary using AbsMean scaling
  • pack_ternary_weights(weights, scales) - Pack ternary weights into a 2-bit representation
  • unpack_ternary_weights(packed, scales, shape) - Unpack ternary weights back to float
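AbsMean quantization (as described for BitNet b1.58) scales weights by the mean absolute value, then rounds into {-1, 0, +1}. A numpy sketch of the idea; the crate may apply scales per row or per tensor, so treat this as illustrative:

```python
import numpy as np

def quantize_absmean(weights: np.ndarray):
    """AbsMean ternary quantization sketch:
    scale = mean(|W|); quantized = clip(round(W / scale), -1, +1)."""
    scale = np.abs(weights).mean()
    scale = max(scale, 1e-8)  # guard against an all-zero weight tensor
    ternary = np.clip(np.round(weights / scale), -1, 1).astype(np.int8)
    return ternary, scale

W = np.array([[0.4, -0.9, 0.05], [1.2, 0.0, -0.3]], dtype=np.float32)
ternary, scale = quantize_absmean(W)
# Dequantized approximation of W is simply ternary * scale
W_hat = ternary.astype(np.float32) * scale
```

Large weights saturate to ±1 and small ones snap to 0; the single scale factor keeps the dequantized magnitudes in the right range.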

Ternary Operations

  • ternary_matmul(input, packed, scales, shape) - Matrix multiply with packed ternary weights
  • ternary_matmul_simple(input, weights) - Simple matmul with float ternary weights
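With weights restricted to {-1, 0, +1}, each output element reduces to a sum of the inputs at +1 positions minus a sum at -1 positions, times the scale; no weight multiplications are needed. A numpy reference sketch (not the crate's SIMD kernel):

```python
import numpy as np

def ternary_matmul_ref(x: np.ndarray, ternary: np.ndarray, scale: float):
    """Compute y = x @ (ternary * scale).T using only adds/subtracts:
    sum inputs where the trit is +1, subtract where it is -1, then scale."""
    pos = (ternary == 1).astype(x.dtype)
    neg = (ternary == -1).astype(x.dtype)
    return (x @ pos.T - x @ neg.T) * scale

x = np.array([[1.0, 2.0, 3.0]])
W = np.array([[1, 0, -1], [-1, 1, 0]], dtype=np.int8)  # ternary weights
y = ternary_matmul_ref(x, W, scale=0.5)
# Matches the dense computation x @ (W * scale).T
assert np.allclose(y, x @ (W.astype(np.float64) * 0.5).T)
```

Replacing multiplications with sign-gated additions is what enables the speedups claimed for ternary kernels.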

VSA Gradient Compression

  • compress_gradients_vsa(gradients, ratio, seed) - Compress gradients using VSA
  • decompress_gradients_vsa(compressed, shape, seed) - Decompress gradients from VSA
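The crate's exact VSA encoding is not detailed here. As an illustration of the ratio/seed interface only, the following sketch uses a seeded random projection, where compressor and decompressor re-derive the same basis from a shared seed; the real VSA scheme will differ:

```python
import numpy as np

def compress_ref(grad: np.ndarray, ratio: float, seed: int = 0):
    """Project the flattened gradient onto a smaller seeded random basis.
    Illustrates the ratio/seed interface, not the crate's VSA encoding."""
    flat = grad.ravel()
    k = max(1, int(len(flat) * ratio))
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((k, len(flat))) / np.sqrt(k)
    return proj @ flat

def decompress_ref(compressed: np.ndarray, shape, seed: int = 0):
    """Approximate inverse: rebuild the same projection from the seed
    and apply its transpose."""
    n = int(np.prod(shape))
    k = len(compressed)
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((k, n)) / np.sqrt(k)
    return (proj.T @ compressed).reshape(shape)

g = np.random.default_rng(1).standard_normal((4, 8))
c = compress_ref(g, ratio=0.25, seed=3)
g_hat = decompress_ref(c, g.shape, seed=3)
assert c.shape == (8,) and g_hat.shape == g.shape
```

The key property shared with the real API: only the compressed vector travels over the wire, and both ends reconstruct the projection deterministically from the seed.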

Dependencies

This crate uses the following sister crates:

  • bitnet-quantize - BitNet b1.58 quantization
  • trit-vsa - Balanced ternary VSA operations
  • vsa-optim-rs - Gradient optimization

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Python Interface                      │
│              (tritter_accel module)                      │
├─────────────────────────────────────────────────────────┤
│  Exposed Functions                                      │
│  - pack_ternary_weights                                 │
│  - unpack_ternary_weights                               │
│  - ternary_matmul                                       │
│  - quantize_weights_absmean                             │
│  - compress_gradients_vsa                               │
│  - decompress_gradients_vsa                             │
├─────────────────────────────────────────────────────────┤
│                   Rust Implementation                    │
│  Core Crates: bitnet-quantize, trit-vsa, vsa-optim-rs    │
│  - AbsMean quant  |  PackedTritVec + VSA ops             │
│  - INT8 activs    |  Compression + Prediction            │
└─────────────────────────────────────────────────────────┘

Performance

  • Packing: 16x memory reduction (32-bit float → 2-bit trit)
  • Matmul: 2-4x speedup via addition-only arithmetic
  • VSA Compression: 10-100x gradient compression with <5% accuracy loss
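A back-of-envelope calculation of the packing savings for an illustrative 7-billion-parameter model (not a measured benchmark):

```python
# Illustrative arithmetic: 32-bit floats vs. 2 bits per trit
params = 7_000_000_000
fp32_gb = params * 4 / 2**30        # float32: 4 bytes/param -> ~26.1 GiB
packed_gb = params * 2 / 8 / 2**30  # packed:  2 bits/param  -> ~1.6 GiB
assert fp32_gb / packed_gb == 16    # 16x smaller than float32
```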

License

MIT License - see LICENSE-MIT

Sister Crates

  • trit-vsa - Balanced ternary arithmetic
  • bitnet-quantize - BitNet b1.58 quantization
  • vsa-optim-rs - VSA gradient optimization
  • peft-rs - PEFT adapters
  • qlora-rs - QLoRA implementation
  • axolotl-rs - LLM fine-tuning toolkit