// SPDX-License-Identifier: MIT
// Copyright 2026 Tyler Zervas
//! Ternary bitsliced operations for memory-efficient inference.
//!
//! This module implements single-trit {-1, 0, +1} bitsliced operations
//! based on TWN (Ternary Weight Networks) and HDC/VSA literature.
//!
//! ## Key Features
//!
//! - **Memory Efficiency**: 16-32x weight reduction via bitsliced representation
//! - **Popcount-Based Matmul**: Exact dot products via hardware popcount intrinsics
//! - **Sparsity Acceleration**: highly sparse weights (95%+ zeros) allow skipping empty bitplane words and dimensions
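//!
//! As a concrete illustration of the reduction (worked arithmetic, not a measured
//! result): a 4096 × 4096 FP32 weight matrix occupies 64 MiB, while its two
//! bitplanes at 2 bits per weight occupy 4 MiB, a 16x reduction before any
//! sparsity-aware compression.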
//!
//! ## Mathematical Foundation
//!
//! Ternary dot product via +plane/-plane bitsliced representation:
//!
//! ```text
//! dot(A, B) = popcount(A+ & B+) + popcount(A- & B-)
//!           - popcount(A+ & B-) - popcount(A- & B+)
//! scaled_dot = dot * scale_a * scale_b
//! ```
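//!
//! A per-lane sketch of this formula (the `lane_dot` helper is hypothetical,
//! introduced here for illustration and not part of this module's API):
//!
//! ```rust,ignore
//! /// Dot-product contribution of one 64-bit word from each bitplane.
//! /// Bit i of `a_pos` is set iff element i of A is +1; bit i of `a_neg`
//! /// is set iff it is -1 (a zero element sets neither plane).
//! fn lane_dot(a_pos: u64, a_neg: u64, b_pos: u64, b_neg: u64) -> i32 {
//!     // Agreeing signs (+·+ and -·-) contribute +1; opposing signs contribute -1.
//!     let agree = (a_pos & b_pos).count_ones() + (a_neg & b_neg).count_ones();
//!     let disagree = (a_pos & b_neg).count_ones() + (a_neg & b_pos).count_ones();
//!     agree as i32 - disagree as i32
//! }
//! ```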
//!
//! ## Module Structure
//!
//! - [`config`] - Configuration for ternary kernels (thresholds, tile sizes)
//! - [`types`] - Core tensor types (`TernaryTensor`, `TernaryPlanes`)
//! - [`quantize`] - FP → ternary quantization with scale calibration
//! - [`matmul`] - Bitsliced matrix multiplication kernel
//! - [`linear`] - Drop-in `TernaryLinear` layer
//! - [`attention`] - Ternary attention scoring with online softmax
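//!
//! The [`quantize`] step can be sketched as follows (hypothetical signature;
//! the threshold Δ ≈ 0.7 · E[|w|] and the mean-magnitude scale follow the TWN
//! paper, not necessarily this crate's exact API):
//!
//! ```rust,ignore
//! fn ternarize(w: &[f32]) -> (Vec<i8>, f32) {
//!     // TWN threshold: Δ ≈ 0.7 · mean(|w|)
//!     let mean_abs: f32 = w.iter().map(|x| x.abs()).sum::<f32>() / w.len() as f32;
//!     let delta = 0.7 * mean_abs;
//!     let trits: Vec<i8> = w.iter()
//!         .map(|&x| if x > delta { 1 } else if x < -delta { -1 } else { 0 })
//!         .collect();
//!     // TWN scale: mean |w| over the weights that survive the threshold
//!     let kept: Vec<f32> = w.iter().map(|x| x.abs()).filter(|&a| a > delta).collect();
//!     let scale = kept.iter().sum::<f32>() / kept.len().max(1) as f32;
//!     (trits, scale)
//! }
//! ```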
//!
//! ## Usage
//!
//! ```rust,ignore
//! use unsloth_rs::kernels::ternary::{TernaryTensor, TernaryLinear, quantize_weights};
//!
//! // Quantize FP32 weights to ternary
//! let (ternary_weights, scale) = quantize_weights(&fp_weights, TernaryConfig::default())?;
//!
//! // Create ternary linear layer
//! let layer = TernaryLinear::new(ternary_weights, scale, bias)?;
//!
//! // Forward pass (FP16 activations, ternary weights)
//! let output = layer.forward(&activations)?;
//! ```
//!
//! ## Performance Targets
//!
//! - **Speedup**: ≥5x vs FP16 matmul on sparse pruned models
//! - **Memory**: ≥10x weight reduction (targeting 20-30x with sparsity)
//! - **Accuracy**: <2% perplexity degradation post-calibration
// TODO: Re-enable once CubeCL API compatibility is fixed
// #[cfg(feature = "cuda")]
// pub mod attention_cubecl;
// TODO: Re-enable once CubeCL API compatibility is fixed
// #[cfg(feature = "cuda")]
// pub mod matmul_cubecl;
// Public submodules documented above.
pub mod attention;
pub mod config;
pub mod linear;
pub mod matmul;
pub mod quantize;
pub mod types;

// Flat re-exports of the items shown in the usage example above.
pub use config::TernaryConfig;
pub use linear::TernaryLinear;
pub use quantize::quantize_weights;
pub use types::{TernaryPlanes, TernaryTensor};