aprender-quant 0.29.0

K-quantization formats (Q4_K, Q5_K, Q6_K) for GGUF/APR model weights

K-Quantization formats for GGUF/APR model weights (Toyota Way: ONE source of truth)

This crate provides quantization functions for converting F32 data to K-quantization formats (Q4_K, Q5_K, Q6_K). It is the ONLY implementation in the Sovereign AI Stack: aprender and realizar both import from here.

Stack Architecture (Toyota Way)

       ┌─────────┐
       │ apr CLI │
       └────┬────┘
            │
    ┌───────┼───────┬───────────┐
    ▼       ▼       ▼           ▼
┌────────┐ ┌────────┐ ┌─────────┐
│entrenar│ │aprender│ │realizar │
└───┬────┘ └───┬────┘ └────┬────┘
    │          │           │
    └────┬─────┴───────────┴────┘
         ▼
      ┌────────────────┐
      │  trueno-quant  │  ← YOU ARE HERE
      └───────┬────────┘
              ▼
      ┌────────────────┐
      │     trueno     │
      └────────────────┘

Format Specifications

  • Q4_K: 256-element super-blocks, 144 bytes (4.5 bits/weight)
  • Q5_K: 256-element super-blocks, 176 bytes (5.5 bits/weight)
  • Q6_K: 256-element super-blocks, 210 bytes (6.5625 bits/weight)
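
The bits-per-weight figures follow from the block sizes: bytes per super-block × 8 ÷ 256 elements. The sketch below reproduces that arithmetic and estimates the packed size of a tensor. The `KQuant` enum and its methods are hypothetical names used for illustration, not part of this crate's API. The per-field byte breakdown in the comments assumes the crate follows the ggml K-quant block layout, and `quantized_len` assumes a trailing partial block is padded to a full super-block.

```rust
/// Number of weights in one K-quant super-block (from the format list).
pub const SUPER_BLOCK_LEN: usize = 256;

/// Hypothetical enum for illustration; not part of the crate's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KQuant {
    Q4K,
    Q5K,
    Q6K,
}

impl KQuant {
    /// Bytes per 256-element super-block.
    /// The field breakdown is an assumption based on the ggml layout.
    pub const fn block_bytes(self) -> usize {
        match self {
            // d (f16) + dmin (f16) + packed scales/mins + 4-bit quants
            KQuant::Q4K => 2 + 2 + 12 + 128,
            // d + dmin + packed scales/mins + high bits + low 4 bits
            KQuant::Q5K => 2 + 2 + 12 + 32 + 128,
            // low 4 bits + high 2 bits + 8-bit scales + d (f16)
            KQuant::Q6K => 128 + 64 + 16 + 2,
        }
    }

    /// Average storage cost per weight, in bits.
    pub fn bits_per_weight(self) -> f64 {
        (self.block_bytes() * 8) as f64 / SUPER_BLOCK_LEN as f64
    }

    /// Packed size in bytes for `num_elements` weights.
    /// Assumes a partial trailing block occupies a full super-block.
    pub fn quantized_len(self, num_elements: usize) -> usize {
        let blocks = (num_elements + SUPER_BLOCK_LEN - 1) / SUPER_BLOCK_LEN;
        blocks * self.block_bytes()
    }
}
```

Under these assumptions, `KQuant::Q4K.quantized_len(4096)` covers 16 super-blocks, so a 4096-element row would occupy 16 × 144 bytes.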

Usage

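use trueno_quant::{quantize_q4_k, dequantize_q4_k_to_f32};

fn main() {
    // One full super-block: 256 F32 values.
    let data: Vec<f32> = (0..256).map(|i| i as f32 / 10.0).collect();
    // Pack the values into the Q4_K format.
    let quantized = quantize_q4_k(&data);
    // Unpack back to F32; 256 is the number of elements to restore.
    let restored = dequantize_q4_k_to_f32(&quantized, 256);
}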
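
K-quantization is lossy, so the restored values will generally differ from the input. A minimal sketch of two error metrics for checking a round trip follows. `max_abs_error` and `rmse` are hypothetical helpers written for this example, not functions exported by the crate, and they only assume that both buffers are `f32` slices of equal length.

```rust
/// Largest absolute element-wise difference between two equal-length slices.
/// Hypothetical helper; not part of the crate.
pub fn max_abs_error(original: &[f32], restored: &[f32]) -> f32 {
    assert_eq!(original.len(), restored.len(), "slices must have the same length");
    original
        .iter()
        .zip(restored)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0_f32, f32::max)
}

/// Root-mean-square error between two equal-length slices.
/// Accumulates in f64 to limit rounding error on long inputs.
pub fn rmse(original: &[f32], restored: &[f32]) -> f32 {
    assert_eq!(original.len(), restored.len(), "slices must have the same length");
    if original.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = original
        .iter()
        .zip(restored)
        .map(|(a, b)| {
            let d = (*a - *b) as f64;
            d * d
        })
        .sum();
    (sum_sq / original.len() as f64).sqrt() as f32
}
```

Assuming the dequantize call in the usage snippet returns a `Vec<f32>` with 256 elements, `max_abs_error(&data, &restored)` would report the worst-case deviation for that super-block.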