Model quantization for AxonML — GGUF block formats plus BitNet I2_S ternary.

- types: QuantType enum and block structs for Q8_0/Q4_0/Q4_1/Q5_0/Q5_1/F16.
- quantize: tensor and model quantization with RMSE error analysis.
- dequantize: block and tensor reconstruction to f32.
- bitnet: I2_S 1.58-bit ternary quantization with 128-weight blocks, fused add-only matmul, an int8 activation quantizer, and scaffolded AVX-VNNI dispatch.
- calibration: MinMax, Percentile, MeanStd, and Entropy methods.
- inference: QuantizedLinear drop-in layer and QuantizedModel wrapper.
- error: QuantError and QuantResult.
§File
crates/axonml-quant/src/lib.rs
§Author
Andrew Jewell Sr. — AutomataNexus LLC (ORCID: 0009-0005-2158-7060)
§Updated
April 14, 2026 11:15 PM EST
§Disclaimer
Use at your own risk. This software is provided “as is”, without warranty of any kind, express or implied. The author and AutomataNexus shall not be held liable for any damages arising from the use of this software.
Re-exports§
- pub use calibration::CalibrationData;
- pub use calibration::calibrate;
- pub use dequantize::dequantize_block;
- pub use dequantize::dequantize_tensor;
- pub use error::QuantError;
- pub use error::QuantResult;
- pub use inference::QuantizedLinear;
- pub use inference::QuantizedModel;
- pub use inference::deserialize_quantized;
- pub use inference::serialize_quantized;
- pub use quantize::quantize_model;
- pub use quantize::quantize_tensor;
- pub use types::QuantType;
- pub use types::QuantizedBlock;
- pub use types::QuantizedTensor;
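To make the quantize/dequantize round trip and the RMSE error analysis concrete, here is a self-contained sketch of GGUF-style Q8_0 block quantization. The block layout (one f32 scale plus 32 int8 codes per block) follows the GGUF convention, but the type and function names are illustrative assumptions, not the crate's actual `quantize_tensor`/`dequantize_tensor` API.

```rust
// Sketch of GGUF-style Q8_0 block quantization (illustrative names).
// Each 32-value block stores one f32 scale and 32 int8 weights;
// RMSE against the original measures reconstruction error.

const Q8_0_BLOCK: usize = 32;

struct BlockQ80 {
    scale: f32,          // d = max(|x|) / 127
    qs: [i8; Q8_0_BLOCK],
}

fn quantize_q8_0(x: &[f32; Q8_0_BLOCK]) -> BlockQ80 {
    let amax = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = amax / 127.0;
    let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
    let mut qs = [0i8; Q8_0_BLOCK];
    for (q, &v) in qs.iter_mut().zip(x) {
        *q = (v * inv).round() as i8; // |v * inv| <= 127 by construction
    }
    BlockQ80 { scale, qs }
}

fn dequantize_q8_0(b: &BlockQ80) -> [f32; Q8_0_BLOCK] {
    let mut out = [0.0f32; Q8_0_BLOCK];
    for (o, &q) in out.iter_mut().zip(&b.qs) {
        *o = q as f32 * b.scale;
    }
    out
}

fn rmse(a: &[f32], b: &[f32]) -> f32 {
    let se: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
    (se / a.len() as f32).sqrt()
}

fn main() {
    let mut x = [0.0f32; Q8_0_BLOCK];
    for (i, v) in x.iter_mut().enumerate() {
        *v = (i as f32 * 0.37).sin();
    }
    let block = quantize_q8_0(&x);
    let y = dequantize_q8_0(&block);
    println!("Q8_0 RMSE = {:.6}", rmse(&x, &y));
}
```

Because rounding error per element is bounded by half the scale, the round-trip RMSE stays below `max(|x|) / 254`, which is the kind of per-tensor figure an RMSE error analysis would report.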
Modules§
- bitnet
- BitNet b1.58 I2_S Ternary Quantization — Dequant + Fused Add-Only Matmul
- calibration
- Calibration for Quantization
- dequantize
- Dequantization Functions
- error
- Quantization Error Types — Block, Shape, and Calibration Failures
- inference
- Quantized Inference — fast inference with quantized weights
- quantize
- Quantization Functions
- types
- Quantization Types
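Of the calibration methods listed above, MinMax is the simplest to illustrate. The sketch below is an assumption about how such a method derives an int8 affine mapping from observed activations; the function name `minmax_calibrate` and the (scale, zero-point) return shape are illustrative, not the crate's `calibrate` signature.

```rust
// Sketch of MinMax calibration (illustrative; the crate's calibrate() API
// may differ): observe the activation range over calibration samples, then
// derive an affine int8 scale and zero-point covering [-128, 127].

fn minmax_calibrate(samples: &[f32]) -> (f32, i32) {
    assert!(!samples.is_empty(), "calibration needs at least one sample");
    let (mut lo, mut hi) = (f32::INFINITY, f32::NEG_INFINITY);
    for &v in samples {
        lo = lo.min(v);
        hi = hi.max(v);
    }
    // Include zero in the range so 0.0 maps exactly to an integer code.
    let (lo, hi) = (lo.min(0.0), hi.max(0.0));
    let scale = (hi - lo) / 255.0;
    if scale == 0.0 {
        return (1.0, 0); // all-zero activations: identity mapping
    }
    // lo must map to -128: zero_point = -128 - lo / scale.
    let zero_point = (-128.0 - lo / scale).round() as i32;
    (scale, zero_point)
}

fn main() {
    let (scale, zp) = minmax_calibrate(&[-1.0, 0.0, 3.0]);
    println!("scale = {scale:.6}, zero_point = {zp}");
}
```

Percentile and Entropy calibration differ only in how they pick the clipping range (discarding outliers or minimizing KL divergence, respectively) before the same scale/zero-point derivation.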
Constants§
- DEFAULT_BLOCK_SIZE - Default block size for quantization.
- MAX_BLOCK_SIZE - Maximum block size supported.