K-Quantization formats for GGUF/APR model weights (Toyota Way: ONE source of truth)
This crate provides quantization functions for converting F32 data to
K-quantization formats (Q4_K, Q5_K, Q6_K). This is the ONLY implementation
in the Sovereign AI Stack - aprender and realizar import from here.
Stack Architecture (Toyota Way)
┌─────────┐
│ apr CLI │
└────┬────┘
│
┌───────┼───────┬───────────┐
▼ ▼ ▼ ▼
┌────────┐ ┌────────┐ ┌─────────┐
│entrenar│ │aprender│ │realizar │
└───┬────┘ └───┬────┘ └────┬────┘
│ │ │
└────┬─────┴───────────┴────┘
▼
┌────────────────┐
│ trueno-quant │ ← YOU ARE HERE
└───────┬────────┘
▼
┌────────────────┐
│ trueno │
└────────────────┘
Format Specifications
Q4_K: 256-element super-blocks, 144 bytes (4.5 bits/weight)
Q5_K: 256-element super-blocks, 176 bytes (5.5 bits/weight)
Q6_K: 256-element super-blocks, 210 bytes (6.5625 bits/weight)
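The bits-per-weight figures above follow directly from the block sizes: each super-block stores 256 weights, so bits/weight = block_bytes × 8 / 256. A minimal sketch of that arithmetic (constant and function names here are illustrative, not part of this crate's API):

```rust
// Elements per K-quant super-block (GGUF convention).
const QK_K: usize = 256;

// Effective bits per weight for a super-block of `block_bytes` bytes.
fn bits_per_weight(block_bytes: usize) -> f64 {
    (block_bytes * 8) as f64 / QK_K as f64
}

fn main() {
    assert_eq!(bits_per_weight(144), 4.5);    // Q4_K
    assert_eq!(bits_per_weight(176), 5.5);    // Q5_K
    assert_eq!(bits_per_weight(210), 6.5625); // Q6_K
    println!("Q6_K: {} bits/weight", bits_per_weight(210));
}
```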
Usage
use trueno_quant::{quantize_q4_k, dequantize_q4_k_to_f32};

// One Q4_K super-block is 256 elements; the fill values are illustrative,
// and the argument forms are reconstructed from context.
let data: Vec<f32> = (0..256).map(|i| i as f32 * 0.1).collect();
let quantized = quantize_q4_k(&data);
let restored = dequantize_q4_k_to_f32(&quantized);