Inference engine: model execution, batched inference, ONNX loading, and weight quantization.
Structs§
- InferenceEngine - Inference engine wrapping a model and target device.
- InferenceStats - Statistics from a forward pass.
- OnnxLoader - ONNX model loader.
Enums§
- Device - Compute device target.
Functions§
- quantize_model - Simple weight quantization: clamp weights to int8 range then dequantize. This simulates the effect of lower-precision storage.
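The clamp-then-dequantize simulation described for `quantize_model` can be sketched as a standalone function. This is a minimal illustration of the technique, not this crate's actual implementation; the signature, the per-tensor scale, and the symmetric int8 range `[-127, 127]` are all assumptions.

```rust
// Hypothetical sketch of int8 quantize-then-dequantize; names and the
// symmetric per-tensor scaling scheme are assumptions, not this crate's API.
fn quantize_model(weights: &mut [f32]) {
    // Per-tensor scale: map the largest-magnitude weight to 127.
    let max_abs = weights.iter().fold(0.0_f32, |m, &w| m.max(w.abs()));
    if max_abs == 0.0 {
        return; // all-zero tensor: nothing to quantize
    }
    let scale = max_abs / 127.0;
    for w in weights.iter_mut() {
        // Quantize: round to the nearest int8 step, clamped to [-127, 127].
        let q = (*w / scale).round().clamp(-127.0, 127.0);
        // Dequantize back to f32, keeping only int8-level precision.
        *w = q * scale;
    }
}

fn main() {
    let mut weights = vec![0.5_f32, -1.0, 0.003, 0.999];
    quantize_model(&mut weights);
    println!("{:?}", weights);
}
```

Because the weights stay `f32`, no memory is saved; the round trip only reproduces the precision loss an int8 representation would introduce (each weight moves by at most half the scale step, here `max_abs / 254`).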