Crate oxillama_quant

oxillama-quant

Quantization kernel library for OxiLLaMa.

Provides dequantization and fused matmul operations for all GGUF quantization formats. Each format has three implementation tiers.

Supported formats:

Category	Types
Legacy	Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1
K-Quants	Q2_K, Q3_K, Q4_K, Q5_K, Q6_K
I-Quants	IQ1_S, IQ1_M, IQ2_XXS, IQ2_XS, IQ2_S, IQ3_XXS, IQ3_S, IQ4_XS, IQ4_NL
1-Bit	Q1_0_G128 (from OxiBonsai)
Float	F16, BF16, F32
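
To make "dequantization" and "fused matmul" concrete, here is a minimal sketch of a Q8_0-style block, the simplest of the legacy formats: 32 signed 8-bit quants sharing one scale. It is not this crate's API. The names `BlockQ8_0`, `quantize_block`, `dequantize_block`, and `fused_dot` are hypothetical, and the scale is held as `f32` rather than the `f16` that GGUF stores, so the example needs no external crate.

```rust
/// Number of weights per block in the Q8_0 layout.
const QK8_0: usize = 32;

/// Simplified Q8_0 block (hypothetical type). GGUF stores the scale as f16;
/// this sketch uses f32 to stay within the standard library.
#[derive(Clone, Copy, Debug)]
struct BlockQ8_0 {
    scale: f32,
    quants: [i8; QK8_0],
}

/// Quantize 32 floats: scale = max|x| / 127, q = round(x / scale).
fn quantize_block(values: &[f32; QK8_0]) -> BlockQ8_0 {
    let amax = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = amax / 127.0;
    // An all-zero block gets scale 0; avoid dividing by it.
    let inv = if scale != 0.0 { 1.0 / scale } else { 0.0 };
    let mut quants = [0i8; QK8_0];
    for (q, v) in quants.iter_mut().zip(values) {
        *q = (v * inv).round() as i8;
    }
    BlockQ8_0 { scale, quants }
}

/// Dequantize one block: x ≈ scale * q.
fn dequantize_block(block: &BlockQ8_0) -> [f32; QK8_0] {
    let mut out = [0.0f32; QK8_0];
    for (o, &q) in out.iter_mut().zip(&block.quants) {
        *o = block.scale * f32::from(q);
    }
    out
}

/// Fused dot product: accumulate q * x per block and apply the scale once,
/// without materializing the dequantized weights.
fn fused_dot(blocks: &[BlockQ8_0], x: &[f32]) -> f32 {
    assert_eq!(x.len(), blocks.len() * QK8_0, "length mismatch");
    blocks
        .iter()
        .zip(x.chunks_exact(QK8_0))
        .map(|(block, xs)| {
            let acc: f32 = block
                .quants
                .iter()
                .zip(xs)
                .map(|(&q, &xi)| f32::from(q) * xi)
                .sum();
            block.scale * acc
        })
        .sum()
}
```

A fused matmul repeats `fused_dot` for every row of the weight matrix. The K-quant and I-quant formats use more elaborate block layouts, which this sketch does not attempt to cover.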

Modules

dispatch: Runtime kernel selection and dispatch.
error: Error types for quantization operations.
lora: LoRA (Low-Rank Adaptation) correction for quantized linear layers.
parallel: Parallel (multi-threaded) wrappers for quantized matrix operations.
quantize: Quantize-on-the-fly conversion utilities.
reference: Reference (naive) implementations of quantization kernels.
simd: Platform-specific SIMD quantization kernels.
traits: Core traits for quantization kernels.
types: Quantization data types and tensor wrapper.
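
The relationship between a reference kernel, a platform-specific kernel, and runtime dispatch can be sketched as follows. This illustrates the general pattern only and is not the actual contents of the `dispatch`, `reference`, or `simd` modules. Every name here (`DotKernel`, `dot_reference`, `dot_unrolled`, `select_kernel`, `dot`) is hypothetical, and `dot_unrolled` is a portable stand-in for a real SIMD kernel written with `std::arch` intrinsics.

```rust
use std::sync::OnceLock;

/// Signature shared by every kernel in this sketch: a plain f32 dot product.
type DotKernel = fn(&[f32], &[f32]) -> f32;

/// Portable reference kernel; always available.
fn dot_reference(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "length mismatch");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Stand-in for a SIMD kernel: eight independent accumulators, mimicking
/// the lane layout of a 256-bit vector register.
fn dot_unrolled(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "length mismatch");
    let chunks_a = a.chunks_exact(8);
    let chunks_b = b.chunks_exact(8);
    // Handle the elements that do not fill a whole 8-wide chunk.
    let tail: f32 = chunks_a
        .remainder()
        .iter()
        .zip(chunks_b.remainder())
        .map(|(x, y)| x * y)
        .sum();
    let mut acc = [0.0f32; 8];
    for (ca, cb) in chunks_a.zip(chunks_b) {
        for i in 0..8 {
            acc[i] += ca[i] * cb[i];
        }
    }
    acc.iter().sum::<f32>() + tail
}

/// Pick a kernel from the CPU features detected at runtime.
fn select_kernel() -> DotKernel {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            return dot_unrolled;
        }
    }
    dot_reference
}

/// Public entry point: detect once, cache the choice, then call through it.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    static KERNEL: OnceLock<DotKernel> = OnceLock::new();
    let kernel = *KERNEL.get_or_init(select_kernel);
    kernel(a, b)
}
```

Caching the function pointer in a `OnceLock` means feature detection runs only on the first call. The reference kernel doubles as a correctness oracle for the faster variants.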