//! Quantization Pipeline for RuvLTRA Models
//!
//! This module provides quantization capabilities for converting full-precision
//! models to optimized quantized formats suitable for edge inference on Apple Silicon.
//!
//! ## Supported Quantization Formats
//!
//! | Format | Bits | Memory (0.5B) | Quality | Use Case |
//! |--------|------|---------------|---------|----------|
//! | Q4_K_M | 4.5 | ~300 MB | Good | Best quality/size tradeoff |
//! | Q5_K_M | 5.5 | ~375 MB | Better | Higher quality, still compact |
//! | Q8_0 | 8.5 | ~500 MB | Best | Near-lossless quantization |
//! | PiQ3 | 3.0 | ~187 MB | Good | Ultra-low-bit with pi-scaling |
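//!
//! Memory figures follow bytes = parameters * bits / 8: for example, 0.5e9
//! parameters at 4.5 bits/weight is roughly 280 MB before format overhead.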
//!
//! ## Pi-Quantization (PiQ3)
//!
//! Pi-constant quantization uses irrational step sizes (pi/k) for better
//! information preservation at ultra-low bit-widths. Benefits include:
//! - Non-uniform grid aligned with Fourier transform properties
//! - Reduced quantization resonance (no rational harmonic buildup)
//! - ~5% lower MSE than uniform 3-bit quantization
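//!
//! As a minimal illustrative sketch (not the crate's actual kernel; the step
//! divisor `k` and the helper names are assumptions), pi-scaled 3-bit
//! quantization of a single value looks like:
//!
//! ```rust,ignore
//! /// Map `x` onto a 3-bit signed grid with irrational step size pi / k.
//! fn piq3_quantize_value(x: f32, scale: f32, k: f32) -> i8 {
//!     let step = std::f32::consts::PI / k; // irrational grid spacing
//!     let q = (x / (scale * step)).round();
//!     q.clamp(-4.0, 3.0) as i8 // 3-bit signed range: -4..=3
//! }
//!
//! /// Inverse map applied at dequantization time.
//! fn piq3_dequantize_value(q: i8, scale: f32, k: f32) -> f32 {
//!     q as f32 * scale * std::f32::consts::PI / k
//! }
//! ```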
//!
//! SIMD kernels provide high-performance dequantization:
//! - ARM NEON: >10 GB/s on Apple Silicon
//! - x86_64 AVX-512: >12 GB/s on Intel Ice Lake+ / AMD Zen4+
//! - x86_64 AVX2: >8 GB/s on modern Intel/AMD (fallback)
//!
//! ## Incoherence Processing (ADR-090 Phase 3)
//!
//! For ultra-low-bit quantization, this module provides incoherence transforms
//! using the Walsh-Hadamard algorithm. The incoherence transform spreads
//! outliers uniformly across all coefficients, reducing quantization error.
//!
//! Key property: H * H^T = n * I (orthogonal, self-inverse up to a factor of n)
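//!
//! For reference, the fast Walsh-Hadamard transform can be written as a plain
//! O(n log n) loop (an illustrative sketch with an assumed name `fwht`; the
//! module's SIMD kernels are what production code should use):
//!
//! ```rust,ignore
//! /// In-place fast Walsh-Hadamard transform; `data.len()` must be a power of two.
//! fn fwht(data: &mut [f32]) {
//!     let n = data.len();
//!     debug_assert!(n.is_power_of_two());
//!     let mut h = 1;
//!     while h < n {
//!         for block in (0..n).step_by(2 * h) {
//!             for j in block..block + h {
//!                 let (a, b) = (data[j], data[j + h]);
//!                 data[j] = a + b;     // butterfly: sum
//!                 data[j + h] = a - b; // butterfly: difference
//!             }
//!         }
//!         h *= 2;
//!     }
//!     // Applying fwht twice multiplies by n (H * H^T = n * I), so the
//!     // inverse is fwht followed by division by n.
//! }
//! ```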
//!
//! ```rust,ignore
//! use ruvllm::quantize::{IncoherenceTransform, IncoherenceConfig};
//!
//! // Apply incoherence before quantization
//! let mut transform = IncoherenceTransform::with_defaults()?;
//! let padded_dim = transform.apply_before_quantization(&mut weights)?;
//!
//! // ... quantize weights ...
//!
//! // Restore after dequantization
//! transform.restore_after_dequantization(&mut weights, Some(original_len))?;
//! ```
//!
//! ## Apple Neural Engine (ANE) Optimization
//!
//! The quantization pipeline produces weights optimized for ANE inference:
//! - 16-byte aligned weight layouts
//! - Blocked quantization compatible with ANE tile operations
//! - Optimized memory access patterns for M4 Pro's unified memory
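//!
//! For example, a 16-byte-aligned buffer can be obtained with the standard
//! allocator (a sketch only; the pipeline's internal allocation strategy may
//! differ):
//!
//! ```rust,ignore
//! use std::alloc::{alloc, dealloc, Layout};
//!
//! // Allocate a 16-byte-aligned scratch buffer for one weight block.
//! let num_bytes = 4096;
//! let layout = Layout::from_size_align(num_bytes, 16).expect("valid layout");
//! let ptr = unsafe { alloc(layout) };
//! assert!(!ptr.is_null() && ptr as usize % 16 == 0);
//! // ... write quantized weights through `ptr` ...
//! unsafe { dealloc(ptr, layout) };
//! ```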
//!
//! ## Example
//!
//! ```rust,ignore
//! use ruvllm::quantize::{RuvltraQuantizer, QuantConfig, TargetFormat};
//! use std::path::Path;
//!
//! // Create quantizer for Q4_K_M format
//! let config = QuantConfig::default()
//!     .with_format(TargetFormat::Q4_K_M)
//!     .with_ane_optimization(true);
//!
//! let quantizer = RuvltraQuantizer::new(config)?;
//!
//! // Quantize a model
//! quantizer.quantize_model(
//!     Path::new("qwen-0.5b.safetensors"),
//!     Path::new("ruvltra-small-q4.gguf"),
//! )?;
//! ```
// NOTE: the module paths below are placeholders inferred from the accompanying
// comments and the doc examples above; adjust to the crate's actual layout.
pub use quantizer::{QuantConfig, RuvltraQuantizer, TargetFormat};
// Pi-Quantization SIMD kernels
pub use pi_quant::*;
// Architecture-specific SIMD kernels (conditionally exported)
#[cfg(target_arch = "aarch64")]
pub use pi_quant_neon::pi_dequantize_neon;
#[cfg(target_arch = "x86_64")]
pub use pi_quant_avx::*;
// High-performance quantization (ADR-090 >1 GB/s target)
pub use fast_quant::*;
// Architecture-specific quantization kernels
#[cfg(target_arch = "aarch64")]
pub use fast_quant_neon::*;
#[cfg(target_arch = "x86_64")]
pub use fast_quant_avx::*;
// Hadamard transform (ADR-090 Phase 3)
pub use hadamard::*;
// Incoherence transform (ADR-090 Phase 3)
pub use incoherence::{IncoherenceConfig, IncoherenceTransform};
// QuIP 2-bit quantization (ADR-090 Phase 3)
pub use quip::*;
// TurboQuant data-oblivious compression (ICLR 2026)
pub use turboquant::*;
// TurboQuant sidecar profile loading (ADR-129)
pub use turboquant_profile::*;