pub struct OnnxModel { /* private fields */ }
An ONNX model loaded from a protobuf file.
Provides methods for inspecting the graph, extracting weights, saving quantized models, and validating graph connectivity.
Implementations§
impl OnnxModel
pub fn load(path: impl AsRef<Path>) -> Result<Self>
Load an ONNX model from a file path.
§Errors
Returns QuantizeError::ModelLoad if the file cannot be opened,
is too large (>10 GB), or contains invalid protobuf data.
pub fn input_shapes(&self) -> Vec<Vec<i64>>
Return the shapes of each graph input from the protobuf type info.
Each inner Vec<i64> contains the dimension values. Dynamic dims
(symbolic or missing) are returned as -1. Returns one entry per
graph.input that has tensor type information.
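As an illustration of the `-1` convention, the helper below pretty-prints a shape as returned by `input_shapes()`. The function name and formatting are hypothetical, not part of the crate's API:

```rust
// Hypothetical helper: render a shape from `input_shapes()`, where -1
// marks a dynamic (symbolic or missing) dimension.
fn format_shape(shape: &[i64]) -> String {
    let dims: Vec<String> = shape
        .iter()
        .map(|&d| if d == -1 { "dynamic".to_string() } else { d.to_string() })
        .collect();
    format!("[{}]", dims.join(", "))
}

fn main() {
    // e.g. a batch-dynamic image input
    let shape = vec![-1, 3, 224, 224];
    assert_eq!(format_shape(&shape), "[dynamic, 3, 224, 224]");
}
```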
pub fn extract_weights(&self) -> Vec<WeightTensor>
Extract all FP32 weight tensors from the model’s initializers.
pub fn total_size_bytes(&self) -> usize
Total size of all weight tensors in bytes (float32).
Prefer computing this from already-extracted weights when available:
weights.iter().map(|w| w.size_bytes()).sum() avoids reparsing.
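A minimal sketch of the suggested pattern, using a stand-in for `WeightTensor` (the real struct's fields may differ; only `size_bytes()` is documented here):

```rust
// Stand-in for the crate's WeightTensor; field names are assumptions.
struct WeightTensor {
    data: Vec<f32>,
}

impl WeightTensor {
    // FP32 tensors: 4 bytes per element.
    fn size_bytes(&self) -> usize {
        self.data.len() * 4
    }
}

fn main() {
    let weights = vec![
        WeightTensor { data: vec![0.0; 9408] }, // e.g. a conv weight
        WeightTensor { data: vec![0.0; 1000] },
    ];
    // Summing over already-extracted weights avoids reparsing the model.
    let total: usize = weights.iter().map(|w| w.size_bytes()).sum();
    assert_eq!(total, (9408 + 1000) * 4);
}
```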
impl OnnxModel
pub fn save_quantized(
    &mut self,
    quantized_data: &[QdqWeightInput],
    path: impl AsRef<Path>,
) -> Result<()>
Save a quantized model using the QDQ (DequantizeLinear) pattern.
Signature is identical to v0.2.0 — existing callers (CLI, calibration pipeline, examples) compile without changes.
§What changed internally
v0.2.0 appended metadata to initializer names (e.g. conv1.weight →
conv1.weight__qINT8_s0.001_z-3_len9408) without updating the nodes that
reference them. ONNX Runtime rejected these models on load.
v0.3.0 inserts a DequantizeLinear node per weight. The node’s output
carries the original name, so every downstream node is unchanged.
Graph connectivity is preserved by construction, and the resulting model
loads and runs in ONNX Runtime.
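The rewiring idea can be sketched with a simplified node type (not the real ONNX protobuf structs). The quantized initializer gets a suffixed name, and the `DequantizeLinear` output reuses the original weight name, so consumers need no edits:

```rust
// Simplified stand-in for an ONNX graph node.
#[derive(Debug)]
struct Node {
    op_type: String,
    inputs: Vec<String>,
    outputs: Vec<String>,
}

// Build a DequantizeLinear node whose output carries the original
// weight name. Input naming follows the {base}_quantized / {base}_scale
// / {base}_zp convention described for load_quantized_info.
fn make_dequantize_node(original_name: &str) -> Node {
    Node {
        op_type: "DequantizeLinear".to_string(),
        inputs: vec![
            format!("{original_name}_quantized"),
            format!("{original_name}_scale"),
            format!("{original_name}_zp"),
        ],
        // Output keeps the original name: downstream nodes are untouched.
        outputs: vec![original_name.to_string()],
    }
}

fn main() {
    let dq = make_dequantize_node("conv1.weight");
    assert_eq!(dq.outputs, vec!["conv1.weight".to_string()]);
    assert_eq!(dq.inputs[0], "conv1.weight_quantized");
}
```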
§INT4 storage note
DequantizeLinear requires INT8 input (opset < 21). INT4-quantized values
([-8, 7]) are stored as INT8 bytes. Quantization accuracy is still
INT4-level; only the on-disk size is 4× instead of the 8× that bit-packing
would give. True INT4 packing is a v0.4.0 target.
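The arithmetic behind the 4× vs. 8× figures, as a self-contained illustration:

```rust
fn main() {
    // INT4 values occupy the range [-8, 7] and fit in one i8 byte each.
    let q: i8 = (-8i32).clamp(-8, 7) as i8;
    assert_eq!(q, -8);

    let n_values = 9408usize;
    let fp32_bytes = n_values * 4; // original FP32 storage
    let int8_stored = n_values;    // one byte per INT4 value (current scheme)
    let packed = n_values / 2;     // two INT4 values per byte (bit-packing)

    assert_eq!(fp32_bytes / int8_stored, 4); // size reduction today
    assert_eq!(fp32_bytes / packed, 8);      // reduction true packing would give
}
```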
impl OnnxModel
pub fn validate_connectivity(&self) -> ConnectivityReport
Check that every node input in the graph resolves to a known tensor.
A “known tensor” is one of:
- a declared graph input
- an initializer
- the output of a node appearing earlier in the node list
This is the exact check ONNX Runtime performs on load. It’s the check
that v0.2.0’s validate command skipped, which is why the rename bug
went undetected. Integrate report.summary() into the CLI validate
output alongside the existing structure / weight checks.
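The check itself is simple enough to sketch standalone. This is an illustrative reimplementation over a simplified node type, not the crate's actual code, and it shows how the v0.2.0 rename bug would have been caught:

```rust
use std::collections::HashSet;

// Simplified stand-in for an ONNX graph node.
struct Node {
    inputs: Vec<String>,
    outputs: Vec<String>,
}

// Return every node input that resolves to nothing: not a graph input,
// not an initializer, and not an output of an earlier node.
fn unresolved_inputs(
    graph_inputs: &[&str],
    initializers: &[&str],
    nodes: &[Node],
) -> Vec<String> {
    let mut known: HashSet<String> =
        graph_inputs.iter().map(|s| s.to_string()).collect();
    known.extend(initializers.iter().map(|s| s.to_string()));

    let mut missing = Vec::new();
    for node in nodes {
        for input in &node.inputs {
            // Empty names denote optional inputs in ONNX; skip them.
            if !input.is_empty() && !known.contains(input) {
                missing.push(input.clone());
            }
        }
        // A node's outputs become known only for later nodes.
        known.extend(node.outputs.iter().cloned());
    }
    missing
}

fn main() {
    // v0.2.0-style bug: the node still references "conv1.weight",
    // but the initializer was renamed with quantization metadata.
    let nodes = vec![Node {
        inputs: vec!["input".into(), "conv1.weight".into()],
        outputs: vec!["conv1_out".into()],
    }];
    let renamed = ["conv1.weight__qINT8_s0.001_z-3_len9408"];
    let missing = unresolved_inputs(&["input"], &renamed, &nodes);
    assert_eq!(missing, vec!["conv1.weight".to_string()]);
}
```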
impl OnnxModel
pub fn load_quantized_info(&self) -> Vec<QuantizedWeightInfo>
Extract metadata about quantized weights from a QDQ-format model.
Looks for initializer triples:
{base}_quantized, {base}_scale, {base}_zp
Scale and zero-point values are read directly from the tensors.
Bit-width comes from metadata_props (written by save_quantized);
defaults to 8 if the metadata entry is missing.
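The triple-matching rule can be sketched as a standalone function over initializer names (an illustration of the naming convention, not the crate's implementation):

```rust
use std::collections::HashSet;

// A base name counts as quantized only when all three companions exist:
// {base}_quantized, {base}_scale, {base}_zp.
fn quantized_bases(initializer_names: &[&str]) -> Vec<String> {
    let names: HashSet<&str> = initializer_names.iter().copied().collect();
    let mut bases = Vec::new();
    for name in initializer_names {
        if let Some(base) = name.strip_suffix("_quantized") {
            if names.contains(format!("{base}_scale").as_str())
                && names.contains(format!("{base}_zp").as_str())
            {
                bases.push(base.to_string());
            }
        }
    }
    bases
}

fn main() {
    let names = [
        "conv1.weight_quantized",
        "conv1.weight_scale",
        "conv1.weight_zp",
        "fc.bias", // unquantized; no triple, so it is skipped
    ];
    assert_eq!(quantized_bases(&names), vec!["conv1.weight".to_string()]);
}
```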
Auto Trait Implementations§
impl Freeze for OnnxModel
impl RefUnwindSafe for OnnxModel
impl Send for OnnxModel
impl Sync for OnnxModel
impl Unpin for OnnxModel
impl UnsafeUnpin for OnnxModel
impl UnwindSafe for OnnxModel