Low-level builders for ONNX QDQ (Quantize-Dequantize) graph primitives.
Each quantized weight becomes four graph elements:
Initializers:
"{name}_quantized" — INT8 tensor, same shape as original
"{name}_scale" — FP32 scalar
"{name}_zp" — INT8 scalar
Node:
DequantizeLinear
inputs: ["{name}_quantized", "{name}_scale", "{name}_zp"]
outputs: ["{name}"] ← original name; downstream graph untouched

The DequantizeLinear op runs at inference time:
output = (input - zero_point) × scale
which matches the dequantize formula already used in QuantParams and
QuantParamsInt4.
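
As a minimal sketch of the formula above for per-tensor INT8 data, the arithmetic can be written as follows. The function names `dequantize` and `dequantize_all` are hypothetical and are not part of this module or of QuantParams; the only thing taken from the documentation is the formula itself.

```rust
/// Dequantize one INT8 value: output = (input - zero_point) * scale.
///
/// Hypothetical helper for illustration only. The subtraction is widened
/// to i32 so that extreme pairs such as -128 and 127 cannot overflow i8.
pub fn dequantize(input: i8, scale: f32, zero_point: i8) -> f32 {
    (i32::from(input) - i32::from(zero_point)) as f32 * scale
}

/// Apply one scalar scale and zero point to every element, matching the
/// scalar "{name}_scale" and "{name}_zp" initializers described above.
pub fn dequantize_all(values: &[i8], scale: f32, zero_point: i8) -> Vec<f32> {
    values
        .iter()
        .map(|&q| dequantize(q, scale, zero_point))
        .collect()
}
```

For instance, with a scale of 0.5 and a zero point of 2, the stored value 10 dequantizes to (10 - 2) × 0.5 = 4.0.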
Structs
- DequantLinearNames - Canonical names for the four graph elements that replace one FP32 initializer.
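
The naming convention from the module description can be sketched as below. This is an illustration under assumptions, not the definition of DequantLinearNames: the type `SketchNames`, its field names, and both methods are hypothetical, and which four strings the real struct stores is not shown on this page. The sketch keeps the three initializer names plus the node's output name.

```rust
/// Hypothetical stand-in showing the naming scheme; the real struct's
/// fields and constructor may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchNames {
    pub quantized: String,  // "{name}_quantized", the INT8 tensor
    pub scale: String,      // "{name}_scale", the FP32 scalar
    pub zero_point: String, // "{name}_zp", the INT8 scalar
    pub output: String,     // "{name}", the original initializer name
}

impl SketchNames {
    /// Derive all names from the original FP32 initializer name.
    pub fn from_base(name: &str) -> Self {
        Self {
            quantized: format!("{name}_quantized"),
            scale: format!("{name}_scale"),
            zero_point: format!("{name}_zp"),
            output: name.to_string(),
        }
    }

    /// Node inputs in the order listed above: quantized, scale, zero point.
    pub fn node_inputs(&self) -> [&str; 3] {
        [
            self.quantized.as_str(),
            self.scale.as_str(),
            self.zero_point.as_str(),
        ]
    }
}
```

Because the node's output reuses the original name, any consumer that previously read the FP32 initializer reads the DequantizeLinear output instead, with no rewiring.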
Functions
- build_dequantize_linear_node - Build a DequantizeLinear NodeProto.
- build_quantized_weight_tensor - INT8 tensor holding the quantized weight values.
- build_scale_tensor - FP32 scale tensor.
- build_zero_point_tensor - INT8 zero-point tensor.