ternlang-ml

Ternary ML inference kernels for the Ternlang ecosystem.

What's in this crate

Component	Description
`TritFloat`	Ternary floating-point with a native confidence field — certainty propagates automatically through arithmetic
`TritFloatTensor`	N-dimensional TritFloat tensor with confidence-propagating matmul, @sparseskip, softmax, and conversions
`linear_confident()`	TritFloat activations × TritMatrix weights — inference hot path with combined weight+activation sparseskip
`quantize()`	Convert f32 weights → balanced ternary using BitNet-style threshold
`sparse_matmul()`	Matmul that skips zero-state (`tend`) weights — speedup scales with sparsity
`dense_matmul()`	Standard ternary matmul for comparison
`TritMatrix`	Core ternary matrix type
`TritScalar`	Scalar ternary decision with confidence and zone classification
`TritEvidenceVec`	Multi-dimensional weighted evidence aggregation
`DeliberationEngine`	EMA-based convergence loop — iterates evidence until a stable trit is reached
`action_gate()`	Multi-dimensional hard-block gate for safety-critical decisions
`MLP`	2-layer ternary multi-layer perceptron
`benchmark()`	Wall-clock timing across matrix sizes with sparsity reporting
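
As a rough illustration of what threshold-based ternary quantization involves, here is a minimal, dependency-free sketch. It is not the crate's implementation: `mean_abs_threshold`, `quantize_sketch`, and `sparsity_sketch` are hypothetical names, the 0.7 scaling factor is an illustrative assumption, and trits are represented as plain `i8` values rather than the crate's own types.

```rust
/// Hypothetical stand-in for a BitNet-style threshold: a fixed fraction of
/// the mean absolute weight. The 0.7 factor is an assumption chosen for
/// illustration, not a value taken from ternlang-ml.
fn mean_abs_threshold(weights: &[f32]) -> f32 {
    if weights.is_empty() {
        return 0.0;
    }
    let mean_abs = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32;
    0.7 * mean_abs
}

/// Map each weight to a balanced trit: +1 above tau, -1 below -tau, else 0.
fn quantize_sketch(weights: &[f32], tau: f32) -> Vec<i8> {
    weights
        .iter()
        .map(|&w| {
            if w > tau {
                1
            } else if w < -tau {
                -1
            } else {
                0
            }
        })
        .collect()
}

/// Fraction of trits that are zero (0.0 for an empty slice).
fn sparsity_sketch(trits: &[i8]) -> f32 {
    if trits.is_empty() {
        return 0.0;
    }
    trits.iter().filter(|&&t| t == 0).count() as f32 / trits.len() as f32
}
```

Under these assumptions, a larger threshold pushes more weights into the zero state, which raises sparsity at the cost of discarding more small-magnitude weights.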

Performance

The speedup of `sparse_matmul` over dense float32 grows with the fraction of zero weights, because every zero weight is work the kernel skips:

Sparsity	Speedup (vs dense f32)
~50–70% (typical BitNet)	2–4×
~90%	~10×
~99% (theoretical upper bound, native ternary ASIC)	up to 122×
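
To get a feel for how skipped work relates to speedup, the following sketch computes an idealized figure under one simplifying assumption: only non-zero weights cost any work, and skipping a zero is free. The function name `ideal_speedup` is hypothetical, and this model is not how the figures in the table were derived or measured.

```rust
/// Idealized speedup when a fraction `sparsity` of the weights is zero and
/// skipping them costs nothing: work shrinks to (1 - sparsity) of the dense
/// case, so the speedup is 1 / (1 - sparsity).
/// Returns None outside [0, 1), where the model is undefined.
fn ideal_speedup(sparsity: f64) -> Option<f64> {
    if (0.0..1.0).contains(&sparsity) {
        Some(1.0 / (1.0 - sparsity))
    } else {
        None
    }
}
```

Real kernels pay for the zero check, memory traffic, and branching, so measured speedups on binary hardware fall below this idealized curve.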

Measured x86/ARM baseline: 2.3× at typical weight distributions on binary ALU hardware (commit 60f7ef6). To reproduce it, run the benchmark: `cargo run --release --bin sparseskip_throughput -p moe-llm-core`

The 122× figure is the mathematical upper bound for native ternary ASIC silicon at 99%+ weight sparsity, where the hardware itself eliminates bit-masking overhead. On x86/ARM, which emulate ternary operations on a binary ALU, the realized speedup is lower because of branch prediction and cache effects; the measured x86 numbers appear in the benchmark output.
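
The zero-skipping idea, and the data-dependent branch it introduces on binary hardware, can be sketched as follows. This is an illustrative stand-alone version, not the crate's `sparse_matmul`: the name `sparse_matmul_sketch`, the row-major `i8` layout, and the choice to treat the left operand as the weights are all assumptions.

```rust
/// Multiply an (m x k) ternary matrix `a` by a (k x n) ternary matrix `b`,
/// both stored row-major as i8 values in {-1, 0, +1}.
/// Returns the (m x n) product and the number of zero entries of `a` that
/// were skipped.
fn sparse_matmul_sketch(
    a: &[i8],
    b: &[i8],
    m: usize,
    k: usize,
    n: usize,
) -> (Vec<i32>, usize) {
    assert_eq!(a.len(), m * k, "a must be m x k");
    assert_eq!(b.len(), k * n, "b must be k x n");

    let mut c = vec![0i32; m * n];
    let mut skipped = 0usize;

    for i in 0..m {
        for p in 0..k {
            let w = a[i * k + p];
            if w == 0 {
                // Zero-state weight: the whole inner loop is skipped.
                // On a binary CPU this is a data-dependent branch.
                skipped += 1;
                continue;
            }
            for j in 0..n {
                let x = b[p * n + j] as i32;
                // A non-zero trit is +1 or -1, so no multiply is needed:
                // the contribution is an add or a subtract.
                if w > 0 {
                    c[i * n + j] += x;
                } else {
                    c[i * n + j] -= x;
                }
            }
        }
    }
    (c, skipped)
}
```

Each skipped entry saves `n` accumulate operations in this sketch, but the branch that decides whether to skip is exactly the kind of cost that keeps emulated speedups below the theoretical bound.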

Usage

use ternlang_ml::{quantize, bitnet_threshold, sparse_matmul, TritMatrix};

let weights = vec![0.8, -0.3, 0.05, -0.9, 0.1, 0.7];
let tau     = bitnet_threshold(&weights);          // BitNet b1.58 threshold
let trits   = quantize(&weights, tau);

let a = TritMatrix::from_f32(2, 3, &weights[..6], tau);
let b = TritMatrix::from_f32(3, 2, &weights[..6], tau);
let c = sparse_matmul(&a, &b);
println!("sparsity: {:.1}%", ternlang_ml::sparsity(&trits) * 100.0);

License

BSL-1.1 (converts to Apache-2.0 on 2030-04-03). See LICENSE.

ternlang-ml 1.5.0
