oxicuda_quant/lib.rs
//! # oxicuda-quant — GPU-Accelerated Quantization & Model Compression Engine
//!
//! `oxicuda-quant` provides a comprehensive suite of post-training quantization
//! (PTQ), quantization-aware training (QAT), pruning, knowledge distillation,
//! and mixed-precision analysis tools.
//!
//! ## Feature overview
//!
//! | Category | Highlights |
//! |-------------|-------------------------------------------------------------|
//! | Schemes | MinMax INT4/8, NF4 (QLoRA), FP8 E4M3/E5M2, GPTQ, SmoothQuant |
//! | QAT | MinMax / MovingAvg / Histogram observers, FakeQuantize (STE) |
//! | Pruning | Magnitude unstructured, channel / filter / head structured |
//! | Distillation| KL / MSE / cosine response + feature distillation |
//! | Analysis | Layer sensitivity, compression metrics, mixed-precision policy |
//! | GPU kernels | PTX kernels for fake-quant, INT8 quant/dequant, NF4, pruning |
//!
//! ## Quick start
//!
//! ```rust,no_run
//! # use oxicuda_quant::scheme::minmax::{MinMaxQuantizer, QuantScheme, QuantGranularity};
//! let q = MinMaxQuantizer::int8_symmetric();
//! let data = vec![-1.0_f32, 0.0, 0.5, 1.0];
//! let params = q.calibrate(&data).unwrap();
//! let codes = q.quantize(&data, &params).unwrap();
//! let deq = q.dequantize(&codes, &params);
//! ```

// ─── Lints ───────────────────────────────────────────────────────────────────

#![allow(clippy::module_name_repetitions)]

// ─── Modules ─────────────────────────────────────────────────────────────────

pub mod analysis;
pub mod distill;
pub mod error;
pub mod pruning;
pub mod qat;
pub mod scheme;

/// PTX kernel source strings for GPU-side quantization operations.
pub mod ptx_kernels;

// ─── Top-level re-exports ────────────────────────────────────────────────────

pub use error::{QuantError, QuantResult};