
Module inference


Inference engine: model execution, batched inference, ONNX loading, and weight quantization.

Structs§

InferenceEngine
Inference engine wrapping a model and target device.
InferenceStats
Statistics from a forward pass.
OnnxLoader
ONNX model loader.

Enums§

Device
Compute device target.

Functions§

quantize_model
Simple weight quantization: scale and clamp weights to the int8 range, then dequantize back to floating point. This simulates the precision loss of lower-precision storage.
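The round trip described above can be sketched as follows. This is an illustrative, self-contained version of per-tensor symmetric fake quantization, not the crate's actual `quantize_model` implementation; the function name, signature, and choice of a symmetric scale derived from the largest absolute weight are assumptions.

```rust
/// Illustrative int8 fake-quantization sketch (hypothetical; the crate's
/// `quantize_model` may differ): scale weights into the int8 range, round
/// and clamp, then dequantize, so each weight carries int8 precision loss.
fn quantize_weights(weights: &mut [f32]) {
    // Symmetric per-tensor scale: the largest-magnitude weight maps to ±127.
    let max_abs = weights.iter().fold(0.0_f32, |m, w| m.max(w.abs()));
    if max_abs == 0.0 {
        return; // all-zero tensor: nothing to quantize
    }
    let scale = max_abs / 127.0;
    for w in weights.iter_mut() {
        // Round to the nearest int8 step and clamp to the representable range,
        let q = (*w / scale).round().clamp(-127.0, 127.0) as i8;
        // then dequantize; the round trip introduces the quantization error.
        *w = q as f32 * scale;
    }
}

fn main() {
    let mut w = vec![0.5_f32, -1.27, 0.003, 1.0];
    quantize_weights(&mut w);
    println!("{:?}", w); // small weights snap to int8 steps, e.g. 0.003 -> 0.0
}
```

Note that weights are stored back as `f32`: the goal is to measure the accuracy impact of int8 precision, not to shrink the model in memory.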