oxillama-quant
Quantization kernels for all GGUF quantization types used in LLM inference.
Part of the OxiLLaMa workspace — a Pure Rust LLM inference engine.
Status
Version: 0.1.3 — Tests: 389 passing
What's New in v0.1.3 (2026-05-05)
- AVX-512 IQ kernels — IQ2_XXS, IQ2_XS, IQ3_S, IQ4_XS with AVX-512BW (
_mm512_permutexvar_epi8); 2× throughput vs AVX2. Runtime-guarded viais_x86_feature_detected!("avx512bw"). 8 new tests. - Fused
matvec_q8for Q5_0 / Q5_1 / Q8_1 — single-pass dequant+dot in registers with no scratch allocation; AVX2 + NEON + scalar reference; 6 new tests (tol 1e-5 on 64×1024 GEMV). - Test count: 382 → 389 tests passing.
What's New in v0.1.2
- IQ3_S and IQ3_XXS codebook tables added in v0.1.2
What It Provides
- 25 quantization types: Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1, Q2_K, Q3_K_S/M/L, Q4_K_S/M, Q5_K_S/M, Q6_K, IQ1_S, IQ1_M, IQ2_XXS, IQ2_XS, IQ2_S, IQ2_M, IQ3_XXS, IQ3_S, IQ4_NL, IQ4_XS, Q1_0_G128
- SIMD-accelerated paths: AVX-512, AVX2 (27 kernels), ARM NEON (27 kernels), and portable scalar fallback; 11 AVX-512 kernels
- Fused dequant+GEMM for Q4_0 and Q4_K on AVX2 and NEON — single-pass matmul with no intermediate f32 scratch buffer
matvec_q8_fusedtrait method onQuantKernelwith scalar default impl and SIMD overridesoxiblasfloat GEMM fallback for F16/BF16/F32 tensors viaoxiblas::gemv_f32/f16- Parallel dequantization via
rayon(feature-gated) - Property-based test coverage for all kernels
Key Types
| Type | Description |
|---|---|
QuantKernel |
Trait implemented by every quantization format |
QuantTensor |
Owned buffer of quantized blocks + element type tag |
dispatch_kernel |
Runtime dispatch to the correct kernel for a GgmlType |
DequantizeSlice |
Zero-allocation dequantize into a caller-provided f32 slice |
Usage
use ;
use GgmlType;
Feature Flags
| Feature | Default | Description |
|---|---|---|
parallel |
yes | Enable Rayon-based parallel dequantization |
simd-avx2 |
no | AVX2 256-bit SIMD kernel path |
simd-avx512 |
no | AVX-512 512-bit SIMD kernel path |
simd-neon |
no | ARM NEON 128-bit SIMD kernel path |
License
Apache-2.0 — COOLJAPAN OU (Team Kitasan)