Crate oxicuda_quant

§oxicuda-quant — GPU-Accelerated Quantization & Model Compression Engine

oxicuda-quant provides a comprehensive suite of post-training quantization (PTQ), quantization-aware training (QAT), pruning, knowledge distillation, and mixed-precision analysis tools.

§Feature overview

| Category | Highlights |
|---|---|
| Schemes | MinMax INT4/8, NF4 (QLoRA), FP8 E4M3/E5M2, GPTQ, SmoothQuant |
| QAT | MinMax / MovingAvg / Histogram observers, FakeQuantize (STE) |
| Pruning | Magnitude unstructured, channel / filter / head structured |
| Distillation | KL / MSE / cosine response + feature distillation |
| Analysis | Layer sensitivity, compression metrics, mixed-precision policy |
| GPU kernels | PTX kernels for fake-quant, INT8 quant/dequant, NF4, pruning |
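The FakeQuantize (STE) entry in the QAT row refers to simulating quantization in the forward pass while the straight-through estimator treats the op as the identity in the backward pass, so gradients flow through the rounding step. A minimal sketch of the forward computation (the function name and signature here are illustrative, not this crate's API):

```rust
/// Simulate symmetric INT8 quantization in the forward pass:
/// q = clamp(round(x / scale), -127, 127); return q * scale.
/// With a straight-through estimator, the backward pass treats
/// this as the identity, so gradients pass through unchanged.
fn fake_quantize_int8(x: f32, scale: f32) -> f32 {
    let q = (x / scale).round().clamp(-127.0, 127.0);
    q * scale
}

fn main() {
    let scale = 1.0 / 127.0; // covers the range [-1, 1]
    for &x in &[-1.0_f32, -0.503, 0.0, 0.25, 0.999] {
        let fq = fake_quantize_int8(x, scale);
        // Within range, round-trip error is at most half a quantization step.
        assert!((fq - x).abs() <= scale / 2.0 + f32::EPSILON);
        println!("{x:+.4} -> {fq:+.4}");
    }
}
```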

§Quick start

```rust
let q = MinMaxQuantizer::int8_symmetric();
let data = vec![-1.0_f32, 0.0, 0.5, 1.0];
let params = q.calibrate(&data).unwrap();
let codes  = q.quantize(&data, &params).unwrap();
let deq    = q.dequantize(&codes, &params);
```
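Conceptually, symmetric INT8 MinMax calibration picks a scale of max|x| / 127, quantization rounds x / scale into i8, and dequantization multiplies back. A standalone sketch of that arithmetic (these free functions illustrate the math only; the crate's internals may differ):

```rust
/// Symmetric INT8 scale from calibration data:
/// scale = max|x| / 127, so the largest magnitude maps to ±127.
fn calibrate_symmetric(data: &[f32]) -> f32 {
    let max_abs = data.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
    max_abs / 127.0
}

/// Round x / scale to the nearest integer and clamp into [-127, 127].
fn quantize(data: &[f32], scale: f32) -> Vec<i8> {
    data.iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

/// Recover approximate floats by multiplying codes by the scale.
fn dequantize(codes: &[i8], scale: f32) -> Vec<f32> {
    codes.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let data = vec![-1.0_f32, 0.0, 0.5, 1.0];
    let scale = calibrate_symmetric(&data);
    let codes = quantize(&data, scale);
    let deq = dequantize(&codes, scale);
    assert_eq!(codes, vec![-127, 0, 64, 127]); // 0.5 / scale = 63.5, rounds to 64
    for (x, y) in data.iter().zip(&deq) {
        assert!((x - y).abs() < 0.004); // within half a quantization step
    }
}
```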

Re-exports§

pub use error::QuantError;
pub use error::QuantResult;

Modules§

analysis — Quantization Analysis Tools
distill — Knowledge Distillation
error — Error types for oxicuda-quant
pruning — Pruning
ptx_kernels — PTX kernel source strings for GPU-side quantization operations.
qat — Quantization-Aware Training (QAT)
scheme — Quantization Schemes
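The unstructured magnitude criterion used by the pruning module can be illustrated with a small sketch that zeroes the smallest-magnitude weights until a target sparsity is reached (illustrative only; `magnitude_prune` is a hypothetical name, not this crate's API):

```rust
/// Unstructured magnitude pruning: zero the fraction `sparsity`
/// of weights with the smallest absolute value. Ties at the
/// threshold are all pruned, a simplification over exact top-k.
fn magnitude_prune(weights: &mut [f32], sparsity: f32) {
    let n_prune = (weights.len() as f32 * sparsity).round() as usize;
    if n_prune == 0 {
        return;
    }
    let mut mags: Vec<f32> = weights.iter().map(|w| w.abs()).collect();
    mags.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let threshold = mags[n_prune - 1]; // largest magnitude that still gets pruned
    for w in weights.iter_mut() {
        if w.abs() <= threshold {
            *w = 0.0;
        }
    }
}

fn main() {
    let mut w = vec![0.9_f32, -0.05, 0.4, -0.8, 0.01, 0.3];
    magnitude_prune(&mut w, 0.5); // prune the 3 smallest of 6 weights
    assert_eq!(w, vec![0.9, 0.0, 0.4, -0.8, 0.0, 0.0]);
}
```

Structured variants (channel / filter / head) apply the same idea to the aggregate magnitude of a whole slice of the weight tensor rather than to individual elements.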