§oxicuda-quant — GPU-Accelerated Quantization & Model Compression Engine
oxicuda-quant provides a comprehensive suite of post-training quantization
(PTQ), quantization-aware training (QAT), pruning, knowledge distillation,
and mixed-precision analysis tools.
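To ground the PTQ terminology, here is the arithmetic behind symmetric MinMax INT8 quantization as a plain-Rust sketch — illustrative only, not the oxicuda-quant API, and the function names are hypothetical: `scale = max|x| / 127`, `q = round(x / scale)` clamped to `[-127, 127]`.

```rust
// Plain-Rust sketch of symmetric MinMax INT8 quantization (illustrative,
// not the oxicuda-quant API): scale = max|x| / 127, q = round(x / scale).
fn quantize_int8_symmetric(data: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = data.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
    // Guard against an all-zero tensor to avoid dividing by zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let codes = data
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (codes, scale)
}

/// Dequantize by multiplying each code back by the scale.
fn dequantize_int8(codes: &[i8], scale: f32) -> Vec<f32> {
    codes.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let data = [-1.0_f32, 0.0, 0.25, 1.0];
    let (codes, scale) = quantize_int8_symmetric(&data);
    println!("{codes:?}"); // [-127, 0, 32, 127]
    println!("{:?}", dequantize_int8(&codes, scale));
}
```

Note the round trip is lossy: every value is snapped to one of 255 levels, which is exactly the error that calibration and sensitivity analysis try to bound.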
§Feature overview
| Category | Highlights |
|---|---|
| Schemes | MinMax INT4/8, NF4 (QLoRA), FP8 E4M3/E5M2, GPTQ, SmoothQuant |
| QAT | MinMax / MovingAvg / Histogram observers, FakeQuantize (STE) |
| Pruning | Magnitude unstructured, channel / filter / head structured |
| Distillation | KL / MSE / cosine response + feature distillation |
| Analysis | Layer sensitivity, compression metrics, mixed-precision policy |
| GPU kernels | PTX kernels for fake-quant, INT8 quant/dequant, NF4, pruning |
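The "magnitude unstructured" pruning row in the table can be sketched in a few lines of plain Rust (a hypothetical helper, not the crate's API): sort the weight magnitudes, take the value at the target sparsity rank as a threshold, and zero every weight at or below it.

```rust
// Illustrative sketch of unstructured magnitude pruning (not the
// oxicuda-quant API): zero out the `sparsity` fraction of weights
// with the smallest absolute value.
fn magnitude_prune(weights: &mut [f32], sparsity: f32) {
    let n_prune = (weights.len() as f32 * sparsity).round() as usize;
    if n_prune == 0 {
        return;
    }
    let mut mags: Vec<f32> = weights.iter().map(|w| w.abs()).collect();
    mags.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Threshold at the n_prune-th smallest magnitude; ties at the
    // threshold may prune slightly more than the requested fraction.
    let threshold = mags[n_prune - 1];
    for w in weights.iter_mut() {
        if w.abs() <= threshold {
            *w = 0.0;
        }
    }
}

fn main() {
    let mut w = vec![0.1_f32, -0.5, 0.02, 0.9];
    magnitude_prune(&mut w, 0.5);
    println!("{w:?}"); // [0.0, -0.5, 0.0, 0.9]
}
```

Structured (channel/filter/head) pruning applies the same idea to the norm of a whole slice of the tensor rather than to individual weights, so the surviving tensor stays dense and GPU-friendly.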
§Quick start
```rust
use oxicuda_quant::MinMaxQuantizer; // adjust the path to your module layout

let q = MinMaxQuantizer::int8_symmetric();
let data = vec![-1.0_f32, 0.0, 0.5, 1.0];
let params = q.calibrate(&data).unwrap();
let codes = q.quantize(&data, &params).unwrap();
let deq = q.dequantize(&codes, &params);
```

§Re-exports
pub use error::QuantError;
pub use error::QuantResult;
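The KL response-distillation loss in the feature table is classically computed between temperature-softened softmax distributions, scaled by T². A self-contained plain-Rust sketch (hypothetical helper names, not the crate's API):

```rust
// Softmax of logits / t, computed with the usual max-subtraction
// trick for numerical stability.
fn softmax_t(logits: &[f32], t: f32) -> Vec<f32> {
    let scaled: Vec<f32> = logits.iter().map(|&z| z / t).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&z| (z - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

// Illustrative KD loss (not the oxicuda-quant API):
// t^2 * KL(p_teacher || p_student) over temperature-softened outputs.
fn kd_kl_loss(teacher_logits: &[f32], student_logits: &[f32], t: f32) -> f32 {
    let p = softmax_t(teacher_logits, t);
    let q = softmax_t(student_logits, t);
    t * t
        * p.iter()
            .zip(&q)
            .map(|(&pi, &qi)| pi * (pi.ln() - qi.ln()))
            .sum::<f32>()
}

fn main() {
    let loss = kd_kl_loss(&[2.0, 1.0, 0.5], &[1.8, 1.1, 0.4], 4.0);
    println!("distillation loss = {loss}");
}
```

The loss is zero when student and teacher logits match and positive otherwise; the MSE and cosine variants listed in the table swap this term for a distance on logits or features.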