dynamic_range_quantize — compose op (Phase 8 Milestone 8.3).
Computes a (scale, zero_point) pair from the runtime dynamic range
of the input — the canonical post-training-quantization-at-inference
recipe — then quantizes. The Plan returns the quantized tensor AND
the computed scale vector so the caller can later dequantize.
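As a rough illustration of why the scale vector is returned, the sketch below dequantizes symmetric S8 output using one scale per row. The function name `dequantize_per_row` and the row-major slice layout are assumptions made for this example; they are not part of the crate's API.

```rust
/// Hypothetical helper (not part of the crate): dequantizes a row-major
/// [n, d] block of i8 values using one scale per row.
/// Assumes symmetric quantization, so zero_point = 0 and x ≈ q * scale.
fn dequantize_per_row(q: &[i8], scales: &[f32], d: usize) -> Vec<f32> {
    assert!(d > 0, "row width must be non-zero");
    assert_eq!(q.len(), scales.len() * d, "expected one scale per row");
    q.chunks_exact(d)
        .zip(scales)
        // Multiply every element of a row by that row's scale.
        .flat_map(|(row, &s)| row.iter().map(move |&v| v as f32 * s))
        .collect()
}
```

Without the scales the caller could not map the S8 values back to the original floating-point range, which is why the plan hands back both outputs.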
§Granularity
DynamicRangeMode::Symmetric vs DynamicRangeMode::Asymmetric:
- Symmetric: max_abs = max |x| over the segment; scale = max_abs / qmax; zero_point = 0. Standard for activations.
- Asymmetric (offset): xmin / xmax reduced separately; scale = (xmax - xmin) / (qmax - qmin); zp = qmin - round(xmin / scale).
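The two formulas above can be sketched in plain Rust as follows. This is a CPU-side illustration only: the function names are hypothetical, and the handling of empty or constant segments (falling back to scale = 1.0) and the clamping of the zero point into [qmin, qmax] are assumptions of this sketch rather than documented behaviour of the op.

```rust
/// Symmetric mode: scale = max|x| / qmax, zero_point = 0.
/// `qmax` is the largest quantized value (127 for S8).
fn symmetric_params(xs: &[f32], qmax: i32) -> (f32, i32) {
    let max_abs = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    // Assumption: an all-zero (or empty) segment uses scale = 1.0
    // so that later divisions by the scale stay finite.
    let scale = if max_abs > 0.0 { max_abs / qmax as f32 } else { 1.0 };
    (scale, 0)
}

/// Asymmetric (offset) mode:
/// scale = (xmax - xmin) / (qmax - qmin), zp = qmin - round(xmin / scale).
fn asymmetric_params(xs: &[f32], qmin: i32, qmax: i32) -> (f32, i32) {
    if xs.is_empty() {
        return (1.0, 0); // assumption: neutral parameters for empty input
    }
    // xmin and xmax are reduced separately, as in the description above.
    let xmin = xs.iter().copied().fold(f32::INFINITY, f32::min);
    let xmax = xs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = xmax - xmin;
    let scale = if range > 0.0 { range / (qmax - qmin) as f32 } else { 1.0 };
    // Assumption: the zero point is clamped into the quantized range.
    let zp = (qmin as f32 - (xmin / scale).round())
        .clamp(qmin as f32, qmax as f32) as i32;
    (scale, zp)
}
```

For S8 output, `symmetric_params(xs, 127)` and `asymmetric_params(xs, -128, 127)` would be the natural calls. The symmetric variant needs only a single max-abs reduction, while the asymmetric variant needs two reductions and carries a non-zero offset.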
§Scope
DynamicRangeScope::Token is the Phase 8.3 trailblazer
— [N, D] input with one (scale, zp) pair per row, matching the
LLM W8A8 activation-quantize recipe.
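A minimal reference sketch of the per-token recipe, assuming a row-major [N, D] f32 slice and symmetric S8 output. The function `quantize_per_token` is hypothetical and only mirrors the arithmetic described here; it does not use the crate's descriptor, args, or plan types.

```rust
/// Hypothetical CPU reference (not the crate's API): quantizes a row-major
/// [n, d] f32 matrix to i8 with one symmetric scale per row (token).
/// Returns the quantized values and the per-row scale vector.
fn quantize_per_token(x: &[f32], n: usize, d: usize) -> (Vec<i8>, Vec<f32>) {
    assert!(d > 0, "row width must be non-zero");
    assert_eq!(x.len(), n * d, "input must hold n * d elements");
    const QMAX: f32 = 127.0;

    let mut q = Vec::with_capacity(n * d);
    let mut scales = Vec::with_capacity(n);
    for row in x.chunks_exact(d) {
        // The dynamic range is reduced over this row only.
        let max_abs = row.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
        // Assumption: an all-zero row uses scale = 1.0 to avoid dividing by zero.
        let scale = if max_abs > 0.0 { max_abs / QMAX } else { 1.0 };
        scales.push(scale);
        // Round to nearest and saturate into the symmetric S8 range.
        q.extend(row.iter().map(|&v| (v / scale).round().clamp(-QMAX, QMAX) as i8));
    }
    (q, scales)
}
```

Because each row gets its own scale, a token with unusually large activations does not reduce the resolution available to the other tokens in the batch.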
DynamicRangeScope::Tensor, DynamicRangeScope::Channel and
DynamicRangeScope::Group are reserved scopes that return
Error::Unsupported from DynamicRangeQuantizePlan::select
today; they wire up in follow-up milestones by orchestrating the
matching primitive ReducePlan + per-channel /
per-group quantize plans (the per-tensor / per-channel / per-group
quantize plans from 8.1 / 8.2 already exist).
§Dtype coverage (trailblazer)
TIn ∈ {f32, f64}, TOut = S8. f16 / bf16 activations, u8
output, and asymmetric mode are deferred.
Structs§
- DynamicRangeQuantizeArgs: Args bundle for a dynamic_range_quantize launch.
- DynamicRangeQuantizeDescriptor: Descriptor for a dynamic_range_quantize op.
- DynamicRangeQuantizePlan: dynamic_range_quantize plan.
Enums§
- DynamicRangeMode: Symmetric vs asymmetric (offset) dynamic-range quantization.
- DynamicRangeScope: Per-tensor / per-channel / per-token / per-group granularity for the scale + zero_point computation.