//! # oxicuda-quant — GPU-Accelerated Quantization & Model Compression Engine
//!
//! `oxicuda-quant` provides a comprehensive suite of post-training quantization
//! (PTQ), quantization-aware training (QAT), pruning, knowledge distillation,
//! and mixed-precision analysis tools.
//!
//! ## Feature overview
//!
//! | Category | Highlights |
//! |-------------|-------------------------------------------------------------|
//! | Schemes | MinMax INT4/8, NF4 (QLoRA), FP8 E4M3/E5M2, GPTQ, SmoothQuant |
//! | QAT | MinMax / MovingAvg / Histogram observers, FakeQuantize (STE) |
//! | Pruning | Magnitude unstructured, channel / filter / head structured |
//! | Distillation| KL / MSE / cosine response + feature distillation |
//! | Analysis | Layer sensitivity, compression metrics, mixed-precision policy |
//! | GPU kernels | PTX kernels for fake-quant, INT8 quant/dequant, NF4, pruning |
//!
//! ## Quick start
//!
//! ```rust,no_run
//! # use oxicuda_quant::scheme::minmax::{MinMaxQuantizer, QuantScheme, QuantGranularity};
//! let q = MinMaxQuantizer::int8_symmetric();
//! let data = vec![-1.0_f32, 0.0, 0.5, 1.0];
//! let params = q.calibrate(&data).unwrap();
//! let codes = q.quantize(&data, &params).unwrap();
//! let deq = q.dequantize(&codes, &params);
//! ```
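//!
//! ## What symmetric INT8 MinMax computes
//!
//! As a rough sketch of the underlying arithmetic (plain Rust, independent of
//! this crate's API): symmetric MinMax calibration picks
//! `scale = max(|x|) / 127`, quantization maps each value to `round(x / scale)`
//! clamped to `[-127, 127]`, and dequantization multiplies the code back by
//! `scale`:
//!
//! ```rust
//! let data = [-1.0_f32, 0.0, 0.5, 1.0];
//! // Calibration: symmetric range derived from the largest magnitude.
//! let max_abs = data.iter().fold(0.0_f32, |m, &v| m.max(v.abs()));
//! let scale = max_abs / 127.0;
//! // Quantize: scale, round to nearest, clamp to the symmetric INT8 range.
//! let codes: Vec<i8> = data
//!     .iter()
//!     .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
//!     .collect();
//! // Dequantize: multiply each code back by the scale.
//! let deq: Vec<f32> = codes.iter().map(|&c| f32::from(c) * scale).collect();
//! assert_eq!(codes, vec![-127, 0, 64, 127]);
//! assert!((deq[3] - 1.0).abs() < 1e-6);
//! ```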
// ─── Lints ───────────────────────────────────────────────────────────────────
// ─── Modules ─────────────────────────────────────────────────────────────────
/// PTX kernel source strings for GPU-side quantization operations.
pub mod kernels; // module name assumed from the doc comment; original declaration elided

/// Quantization schemes (MinMax INT4/8, NF4, FP8, GPTQ, SmoothQuant).
pub mod scheme;
// ─── Top-level re-exports ────────────────────────────────────────────────────
// Reconstructed from the quick-start import path; the original re-export list was elided.
pub use scheme::minmax::{MinMaxQuantizer, QuantGranularity, QuantScheme};