Crate axonml_quant


Model quantization for AxonML — GGUF formats + BitNet I2_S ternary.

The crate is organized into the following modules:

- `types` — `QuantType` enum and block structs for Q8_0/Q4_0/Q4_1/Q5_0/Q5_1/F16
- `quantize` — tensor/model quantization with RMSE error analysis
- `dequantize` — block/tensor reconstruction to f32
- `bitnet` — I2_S 1.58-bit ternary: 128-weight blocks, fused add-only matmul, int8 activation quantizer, AVX-VNNI dispatch scaffolded
- `calibration` — MinMax/Percentile/MeanStd/Entropy methods
- `inference` — `QuantizedLinear` drop-in layer, `QuantizedModel` wrapper
- `error` — `QuantError`/`QuantResult`
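To illustrate the block-quantization scheme the `quantize` and `dequantize` modules operate on, here is a minimal standalone sketch of a Q8_0-style round trip (one f32 scale per 32-element block plus 32 signed bytes, with the RMSE check the description mentions). The function names are illustrative, not this crate's API:

```rust
/// Sketch of Q8_0-style block quantization: each 32-element block
/// stores one f32 scale and 32 signed 8-bit quantized values.
const BLOCK: usize = 32;

fn quantize_q8_0(block: &[f32; BLOCK]) -> (f32, [i8; BLOCK]) {
    // The scale maps the largest-magnitude value onto the i8 range [-127, 127].
    let amax = block.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = amax / 127.0;
    let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
    let mut q = [0i8; BLOCK];
    for (dst, &x) in q.iter_mut().zip(block) {
        *dst = (x * inv).round() as i8;
    }
    (scale, q)
}

fn dequantize_q8_0(scale: f32, q: &[i8; BLOCK]) -> [f32; BLOCK] {
    let mut out = [0.0f32; BLOCK];
    for (dst, &v) in out.iter_mut().zip(q) {
        *dst = v as f32 * scale;
    }
    out
}

fn main() {
    // Round-trip a block and report RMSE, mirroring the error analysis
    // described for the `quantize` module.
    let mut block = [0.0f32; BLOCK];
    for (i, x) in block.iter_mut().enumerate() {
        *x = (i as f32 / BLOCK as f32) - 0.5;
    }
    let (scale, q) = quantize_q8_0(&block);
    let recon = dequantize_q8_0(scale, &q);
    let mse: f32 = block
        .iter()
        .zip(&recon)
        .map(|(a, b)| (a - b) * (a - b))
        .sum::<f32>()
        / BLOCK as f32;
    println!("rmse = {}", mse.sqrt());
}
```

The other GGUF block formats (Q4_0, Q5_1, etc.) follow the same pattern with narrower integer ranges and, for the `_1` variants, an additional per-block minimum.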

§File

crates/axonml-quant/src/lib.rs

§Author

Andrew Jewell Sr. — AutomataNexus LLC ORCID: 0009-0005-2158-7060

§Updated

April 14, 2026 11:15 PM EST

§Disclaimer

Use at your own risk. This software is provided “as is”, without warranty of any kind, express or implied. The author and AutomataNexus shall not be held liable for any damages arising from the use of this software.

Re-exports§

pub use calibration::CalibrationData;
pub use calibration::calibrate;
pub use dequantize::dequantize_block;
pub use dequantize::dequantize_tensor;
pub use error::QuantError;
pub use error::QuantResult;
pub use inference::QuantizedLinear;
pub use inference::QuantizedModel;
pub use inference::deserialize_quantized;
pub use inference::serialize_quantized;
pub use quantize::quantize_model;
pub use quantize::quantize_tensor;
pub use types::QuantType;
pub use types::QuantizedBlock;
pub use types::QuantizedTensor;
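The `calibration::calibrate` re-export above selects a quantization range from sample activations. The crate's actual signatures are not shown here, so the following is a hypothetical standalone sketch of two of the listed strategies, MinMax and Percentile clipping:

```rust
/// MinMax calibration: the range is simply the observed extremes.
fn minmax_range(samples: &[f32]) -> (f32, f32) {
    samples
        .iter()
        .fold((f32::INFINITY, f32::NEG_INFINITY), |(lo, hi), &x| {
            (lo.min(x), hi.max(x))
        })
}

/// Percentile calibration clips outliers: keep the central `p` fraction
/// of the sorted samples (e.g. p = 0.99 drops the extreme 1%).
fn percentile_range(samples: &[f32], p: f32) -> (f32, f32) {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = sorted.len();
    let cut = (((1.0 - p) / 2.0) * n as f32).floor() as usize;
    (sorted[cut], sorted[n - 1 - cut])
}
```

Percentile clipping trades a little clipping error on outliers for a much finer quantization step on the bulk of the distribution, which is why it often beats MinMax when activations have heavy tails.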

Modules§

bitnet
BitNet b1.58 I2_S Ternary Quantization — Dequant + Fused Add-Only Matmul
calibration
Calibration for Quantization
dequantize
Dequantization Functions
error
Quantization Error Types — Block, Shape, and Calibration Failures
inference
Quantized Inference — fast inference with quantized weights
quantize
Quantization Functions
types
Quantization Types
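The `bitnet` module's "add-only matmul" refers to the key property of b1.58 ternary weights: since every weight is -1, 0, or +1, a dot product needs no multiplications, only additions and subtractions. A minimal sketch, assuming absmean scaling as in the BitNet b1.58 paper (function names are illustrative, not this crate's API):

```rust
/// Sketch of b1.58 ternary quantization: weights collapse to {-1, 0, +1}
/// with one scale per block (absmean scaling).
fn ternarize(weights: &[f32]) -> (f32, Vec<i8>) {
    // Scale is the mean absolute value of the block.
    let scale = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32;
    let q = weights
        .iter()
        .map(|&w| {
            // Round w / scale to the nearest value in {-1, 0, +1}.
            if scale == 0.0 {
                0
            } else {
                (w / scale).round().clamp(-1.0, 1.0) as i8
            }
        })
        .collect();
    (scale, q)
}

/// Add-only dot product: a ternary weight never multiplies an activation,
/// it only decides whether the activation is added, subtracted, or skipped.
fn ternary_dot(scale: f32, q: &[i8], x: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for (&w, &xi) in q.iter().zip(x) {
        match w {
            1 => acc += xi,
            -1 => acc -= xi,
            _ => {}
        }
    }
    acc * scale
}
```

In the packed I2_S layout each weight takes two bits, so a 128-weight block fits in 32 bytes plus its scale; the accumulation loop above is the scalar form of what an AVX-VNNI kernel would vectorize.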

Constants§

DEFAULT_BLOCK_SIZE
Default block size for quantization.
MAX_BLOCK_SIZE
Maximum block size supported.