// oxicuda_quant/lib.rs

//! # oxicuda-quant — GPU-Accelerated Quantization & Model Compression Engine
//!
//! `oxicuda-quant` provides a comprehensive suite of post-training quantization
//! (PTQ), quantization-aware training (QAT), pruning, knowledge distillation,
//! and mixed-precision analysis tools.
//!
//! ## Feature overview
//!
//! | Category     | Highlights                                                     |
//! |--------------|----------------------------------------------------------------|
//! | Schemes      | MinMax INT4/8, NF4 (QLoRA), FP8 E4M3/E5M2, GPTQ, SmoothQuant   |
//! | QAT          | MinMax / MovingAvg / Histogram observers, FakeQuantize (STE)   |
//! | Pruning      | Magnitude unstructured, channel / filter / head structured     |
//! | Distillation | KL / MSE / cosine response + feature distillation              |
//! | Analysis     | Layer sensitivity, compression metrics, mixed-precision policy |
//! | GPU kernels  | PTX kernels for fake-quant, INT8 quant/dequant, NF4, pruning   |
//!
//! ## Quick start
//!
//! ```rust,no_run
//! # use oxicuda_quant::scheme::minmax::MinMaxQuantizer;
//! let q = MinMaxQuantizer::int8_symmetric();
//! let data = vec![-1.0_f32, 0.0, 0.5, 1.0];
//! let params = q.calibrate(&data).unwrap();
//! let codes = q.quantize(&data, &params).unwrap();
//! let deq = q.dequantize(&codes, &params);
//! ```

// ─── Lints ───────────────────────────────────────────────────────────────────

#![allow(clippy::module_name_repetitions)]

// ─── Modules ─────────────────────────────────────────────────────────────────

pub mod analysis;
pub mod distill;
pub mod error;
pub mod pruning;
pub mod qat;
pub mod scheme;

/// PTX kernel source strings for GPU-side quantization operations.
pub mod ptx_kernels;
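The feature table above lists magnitude unstructured pruning among the `pruning` module's capabilities. As a hedged illustration of the underlying idea (not the module's actual API — the function name and signature here are hypothetical), a standalone sketch might zero out the fraction of weights with the smallest absolute value:

```rust
// Sketch of unstructured magnitude pruning (illustrative only; the
// `pruning` module's real API may differ): zero the `sparsity`
// fraction of weights with the smallest magnitude.
fn magnitude_prune(weights: &mut [f32], sparsity: f32) {
    let n = weights.len();
    let k = ((n as f32) * sparsity) as usize; // number of weights to zero
    if k == 0 {
        return;
    }
    // Find the magnitude threshold: the k-th smallest |w|.
    let mut mags: Vec<f32> = weights.iter().map(|w| w.abs()).collect();
    mags.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let threshold = mags[k - 1];
    // Zero every weight at or below the threshold (ties may prune extra).
    for w in weights.iter_mut() {
        if w.abs() <= threshold {
            *w = 0.0;
        }
    }
}

fn main() {
    let mut w = vec![0.9_f32, -0.05, 0.4, -0.8, 0.01, 0.3];
    magnitude_prune(&mut w, 0.5); // zeros the 3 smallest-magnitude weights
    assert_eq!(w, vec![0.9, 0.0, 0.4, -0.8, 0.0, 0.0]);
}
```

Structured variants (channel / filter / head) apply the same threshold logic to the aggregate norm of each group rather than to individual weights.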

// ─── Top-level re-exports ────────────────────────────────────────────────────

pub use error::{QuantError, QuantResult};
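To make the quick-start flow concrete, here is a minimal standalone sketch of the arithmetic behind symmetric per-tensor INT8 min-max quantization (`scale = max|x| / 127`, zero point 0). The free functions below are illustrative assumptions, not the crate's `MinMaxQuantizer` implementation:

```rust
// Standalone sketch of symmetric per-tensor INT8 min-max quantization.
// Illustrative arithmetic only; `MinMaxQuantizer` may differ internally.

/// Derive the scale from the calibration data: max|x| / 127.
fn calibrate(data: &[f32]) -> f32 {
    let amax = data.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
    if amax == 0.0 { 1.0 } else { amax / 127.0 }
}

/// Round-to-nearest, then clamp to the symmetric INT8 range [-127, 127].
fn quantize(data: &[f32], scale: f32) -> Vec<i8> {
    data.iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

/// Map codes back to floats; round-trip error is at most scale / 2.
fn dequantize(codes: &[i8], scale: f32) -> Vec<f32> {
    codes.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let data = vec![-1.0_f32, 0.0, 0.25, 1.0];
    let scale = calibrate(&data); // 1.0 / 127.0
    let codes = quantize(&data, scale); // [-127, 0, 32, 127]
    let deq = dequantize(&codes, scale);
    for (x, y) in data.iter().zip(&deq) {
        assert!((x - y).abs() <= scale / 2.0 + f32::EPSILON);
    }
    println!("{codes:?}");
}
```

An asymmetric (affine) scheme would add a zero point to cover skewed ranges; the symmetric form shown here keeps zero exactly representable, which is why it is the common default for weights.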