pub struct OnnxModel { /* private fields */ }
An ONNX model loaded from a protobuf file.
Provides methods for inspecting the graph, extracting weights, saving quantized models, and validating graph connectivity.
Implementations§
impl OnnxModel
pub fn load(path: impl AsRef<Path>) -> Result<Self>
Load an ONNX model from a file path.
§Errors
Returns QuantizeError::ModelLoad if the file cannot be opened,
is too large (>10 GB), or contains invalid protobuf data.
pub fn input_shapes(&self) -> Vec<Vec<i64>>
Return the shapes of each graph input from the protobuf type info.
Each inner Vec<i64> contains the dimension values. Dynamic dims
(symbolic or missing) are returned as -1. Returns one entry per
graph.input that has tensor type information.
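As an illustration of the `-1` convention, the helper below pretty-prints a shape as returned by `input_shapes()`. The function name and formatting are hypothetical, not part of the crate's API:

```rust
// Hypothetical helper: render a shape from `input_shapes()`, where -1
// marks a dynamic (symbolic or missing) dimension.
fn format_shape(shape: &[i64]) -> String {
    let dims: Vec<String> = shape
        .iter()
        .map(|&d| if d == -1 { "dynamic".to_string() } else { d.to_string() })
        .collect();
    format!("[{}]", dims.join(", "))
}

fn main() {
    // e.g. a batch-dynamic image input
    let shape = vec![-1, 3, 224, 224];
    assert_eq!(format_shape(&shape), "[dynamic, 3, 224, 224]");
}
```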
pub fn extract_weights(&self) -> Vec<WeightTensor>
Extract all FP32 weight tensors from the model’s initializers.
pub fn total_size_bytes(&self) -> usize
Total size of all weight tensors in bytes (float32).
Prefer computing this from already-extracted weights when available:
weights.iter().map(|w| w.size_bytes()).sum() avoids reparsing.
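A minimal sketch of the suggested pattern, using a stand-in for `WeightTensor` (the real struct's fields may differ; only `size_bytes()` is documented here):

```rust
// Stand-in for the crate's WeightTensor; field names are assumptions.
struct WeightTensor {
    data: Vec<f32>,
}

impl WeightTensor {
    // FP32 tensors: 4 bytes per element.
    fn size_bytes(&self) -> usize {
        self.data.len() * 4
    }
}

fn main() {
    let weights = vec![
        WeightTensor { data: vec![0.0; 9408] }, // e.g. a conv weight
        WeightTensor { data: vec![0.0; 1000] },
    ];
    // Summing over already-extracted weights avoids reparsing the model.
    let total: usize = weights.iter().map(|w| w.size_bytes()).sum();
    assert_eq!(total, (9408 + 1000) * 4);
}
```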
impl OnnxModel
pub fn save_quantized(
    &mut self,
    quantized_data: &[QdqWeightInput],
    path: impl AsRef<Path>,
) -> Result<()>
Save a quantized model using the QDQ (DequantizeLinear) pattern.
Signature is identical to v0.2.0 — existing callers (CLI, calibration pipeline, examples) compile without changes.
§What changed internally
v0.2.0 appended metadata to initializer names (e.g. conv1.weight →
conv1.weight__qINT8_s0.001_z-3_len9408) without updating the nodes that
reference them. ONNX Runtime rejected these models on load.
v0.3.0 inserts a DequantizeLinear node per weight. The node’s output
carries the original name, so every downstream node is unchanged.
Graph connectivity is preserved by construction, and the resulting model
loads and runs in ONNX Runtime.
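The rewiring idea can be sketched with a simplified node type (not the real ONNX protobuf structs). The quantized initializer gets a suffixed name, and the `DequantizeLinear` output reuses the original weight name, so consumers need no edits:

```rust
// Simplified stand-in for an ONNX graph node.
#[derive(Debug)]
struct Node {
    op_type: String,
    inputs: Vec<String>,
    outputs: Vec<String>,
}

// Build a DequantizeLinear node whose output carries the original
// weight name. Input naming follows the {base}_quantized / {base}_scale
// / {base}_zp convention described for load_quantized_info.
fn make_dequantize_node(original_name: &str) -> Node {
    Node {
        op_type: "DequantizeLinear".to_string(),
        inputs: vec![
            format!("{original_name}_quantized"),
            format!("{original_name}_scale"),
            format!("{original_name}_zp"),
        ],
        // Output keeps the original name: downstream nodes are untouched.
        outputs: vec![original_name.to_string()],
    }
}

fn main() {
    let dq = make_dequantize_node("conv1.weight");
    assert_eq!(dq.outputs, vec!["conv1.weight".to_string()]);
    assert_eq!(dq.inputs[0], "conv1.weight_quantized");
}
```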
§INT4 storage note
DequantizeLinear requires INT8 input (opset < 21). INT4-quantized values
([-8, 7]) are stored as INT8 bytes. Quantization accuracy is still
INT4-level; only the on-disk size is 4× instead of the 8× that bit-packing
would give. True INT4 packing is a v0.4.0 target.
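The arithmetic behind the 4× vs. 8× figures, as a self-contained illustration:

```rust
fn main() {
    // INT4 values occupy the range [-8, 7] and fit in one i8 byte each.
    let q: i8 = (-8i32).clamp(-8, 7) as i8;
    assert_eq!(q, -8);

    let n_values = 9408usize;
    let fp32_bytes = n_values * 4; // original FP32 storage
    let int8_stored = n_values;    // one byte per INT4 value (current scheme)
    let packed = n_values / 2;     // two INT4 values per byte (bit-packing)

    assert_eq!(fp32_bytes / int8_stored, 4); // size reduction today
    assert_eq!(fp32_bytes / packed, 8);      // reduction true packing would give
}
```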
impl OnnxModel
pub fn validate_connectivity(&self) -> ConnectivityReport
Check that every node input in the graph resolves to a known tensor.
A “known tensor” is one of:
- a declared graph input
- an initializer
- the output of a node appearing earlier in the node list
This is the exact check ONNX Runtime performs on load. It’s the check
that v0.2.0’s validate command skipped, which is why the rename bug
went undetected. Integrate report.summary() into the CLI validate
output alongside the existing structure / weight checks.
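The check itself is simple enough to sketch standalone. This is an illustrative reimplementation over a simplified node type, not the crate's actual code, and it shows how the v0.2.0 rename bug would have been caught:

```rust
use std::collections::HashSet;

// Simplified stand-in for an ONNX graph node.
struct Node {
    inputs: Vec<String>,
    outputs: Vec<String>,
}

// Return every node input that resolves to nothing: not a graph input,
// not an initializer, and not an output of an earlier node.
fn unresolved_inputs(
    graph_inputs: &[&str],
    initializers: &[&str],
    nodes: &[Node],
) -> Vec<String> {
    let mut known: HashSet<String> =
        graph_inputs.iter().map(|s| s.to_string()).collect();
    known.extend(initializers.iter().map(|s| s.to_string()));

    let mut missing = Vec::new();
    for node in nodes {
        for input in &node.inputs {
            // Empty names denote optional inputs in ONNX; skip them.
            if !input.is_empty() && !known.contains(input) {
                missing.push(input.clone());
            }
        }
        // A node's outputs become known only for later nodes.
        known.extend(node.outputs.iter().cloned());
    }
    missing
}

fn main() {
    // v0.2.0-style bug: the node still references "conv1.weight",
    // but the initializer was renamed with quantization metadata.
    let nodes = vec![Node {
        inputs: vec!["input".into(), "conv1.weight".into()],
        outputs: vec!["conv1_out".into()],
    }];
    let renamed = ["conv1.weight__qINT8_s0.001_z-3_len9408"];
    let missing = unresolved_inputs(&["input"], &renamed, &nodes);
    assert_eq!(missing, vec!["conv1.weight".to_string()]);
}
```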
impl OnnxModel
pub fn load_quantized_info(&self) -> Vec<QuantizedWeightInfo>
Extract metadata about quantized weights from a QDQ-format model.
Looks for initializer triples:
{base}_quantized, {base}_scale, {base}_zp
Scale and zero-point values are read directly from the tensors.
Bit-width comes from metadata_props (written by save_quantized);
defaults to 8 if the metadata entry is missing.
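The triple-matching rule can be sketched as a standalone function over initializer names (an illustration of the naming convention, not the crate's implementation):

```rust
use std::collections::HashSet;

// A base name counts as quantized only when all three companions exist:
// {base}_quantized, {base}_scale, {base}_zp.
fn quantized_bases(initializer_names: &[&str]) -> Vec<String> {
    let names: HashSet<&str> = initializer_names.iter().copied().collect();
    let mut bases = Vec::new();
    for name in initializer_names {
        if let Some(base) = name.strip_suffix("_quantized") {
            if names.contains(format!("{base}_scale").as_str())
                && names.contains(format!("{base}_zp").as_str())
            {
                bases.push(base.to_string());
            }
        }
    }
    bases
}

fn main() {
    let names = [
        "conv1.weight_quantized",
        "conv1.weight_scale",
        "conv1.weight_zp",
        "fc.bias", // unquantized; no triple, so it is skipped
    ];
    assert_eq!(quantized_bases(&names), vec!["conv1.weight".to_string()]);
}
```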
Auto Trait Implementations§
impl Freeze for OnnxModel
impl RefUnwindSafe for OnnxModel
impl Send for OnnxModel
impl Sync for OnnxModel
impl Unpin for OnnxModel
impl UnsafeUnpin for OnnxModel
impl UnwindSafe for OnnxModel