rage-quant 0.1.0

High-performance quantized GEMV kernels for CPU-only LLM inference. Direct dot product on Q8_0/Q6_K/Q4_K GGUF blocks with AVX2+FMA SIMD — 3.0x decode speedup.
RAGE QUANT — Quantized GEMV Kernels for CPU Inference
Copyright (c) 2026 Carlos Enrique Castro Lazaro (OnCeUponTry)

Original author and maintainer:
  Carlos Enrique Castro Lazaro
  GitHub: https://github.com/OnCeUponTry
  Website: https://www.angriestboy.com

This software contains original implementations of:
  - Direct quantized dot product on Q8_0, Q6_K, Q4_K GGUF blocks
  - AVX2+FMA SIMD acceleration for quantized GEMV
  - Rayon-parallelized GEMV and GEMM operations
  - Dequantization routines for GGML quantization formats

These implementations achieve a 3.0x decode speedup on CPU-only LLM inference,
validated on Qwen3-0.6B-Q8_0.gguf with a Ryzen 9 9900X using 12 threads.

This NOTICE file must be included in all copies or substantial
portions of this software, as required by the AGPL-3.0 license.