Module dynamic_range

dynamic_range_quantize — compose op (Phase 8 Milestone 8.3).

Computes a (scale, zero_point) pair from the runtime dynamic range of the input, which is the canonical recipe for post-training quantization at inference time, and then quantizes. The plan returns both the quantized tensor and the computed scale vector so that the caller can dequantize later.

§Granularity

DynamicRangeMode::Symmetric vs DynamicRangeMode::Asymmetric (qmin and qmax are the bounds of the quantized integer range):

  • Symmetric: max_abs = max |x| over the segment; scale = max_abs / qmax; zero_point = 0. Standard for activations.
  • Asymmetric (offset): xmin / xmax reduced separately; scale = (xmax - xmin) / (qmax - qmin); zero_point = qmin - round(xmin / scale).
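
The two formulas above can be sketched in plain Rust as follows. This is a minimal illustration, not the crate's implementation: `QParams`, `symmetric_params`, `asymmetric_params` and `quantize` are hypothetical names, and the fallback to scale = 1.0 for an empty or constant segment is an assumption made here to avoid dividing by zero.

```rust
/// Quantization parameters for one segment (hypothetical type, not the crate's API).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QParams {
    pub scale: f32,
    pub zero_point: i32,
}

/// Symmetric mode: scale = max|x| / qmax, zero_point = 0.
pub fn symmetric_params(xs: &[f32], qmax: i32) -> QParams {
    let max_abs = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    // Assumption: an all-zero (or empty) segment falls back to scale = 1.0.
    let scale = if max_abs > 0.0 { max_abs / qmax as f32 } else { 1.0 };
    QParams { scale, zero_point: 0 }
}

/// Asymmetric mode: scale = (xmax - xmin) / (qmax - qmin),
/// zero_point = qmin - round(xmin / scale).
pub fn asymmetric_params(xs: &[f32], qmin: i32, qmax: i32) -> QParams {
    let xmin = xs.iter().copied().fold(f32::INFINITY, f32::min);
    let xmax = xs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = xmax - xmin;
    // Assumption: an empty or constant segment falls back to the identity mapping.
    if !range.is_finite() || range <= 0.0 {
        return QParams { scale: 1.0, zero_point: 0 };
    }
    let scale = range / (qmax - qmin) as f32;
    let zero_point = qmin - (xmin / scale).round() as i32;
    QParams { scale, zero_point }
}

/// Quantizes one value: q = clamp(round(x / scale) + zero_point, qmin, qmax).
pub fn quantize(x: f32, p: QParams, qmin: i32, qmax: i32) -> i32 {
    ((x / p.scale).round() as i32 + p.zero_point).clamp(qmin, qmax)
}
```

With these definitions, the asymmetric parameters map xmin to qmin and xmax to qmax, while the symmetric parameters map 0.0 to 0 exactly.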

§Scope

DynamicRangeScope::Token is the Phase 8.3 trailblazer: it takes an [N, D] input and computes one (scale, zp) pair per row, matching the LLM W8A8 activation-quantize recipe.
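
As a CPU reference for the per-token scope, the sketch below quantizes a row-major [N, D] buffer symmetrically to i8 and returns one scale per row, then dequantizes with those scales. The function names, the row-major layout, the qmax = 127 bound and the scale = 1.0 fallback for all-zero rows are assumptions for illustration; none of this is the crate's plan API.

```rust
/// Symmetric per-token (per-row) quantization of a row-major [n, d] buffer.
/// Returns the quantized values and one scale per row.
/// Hypothetical reference helper, not part of the crate.
pub fn quantize_per_token(x: &[f32], n: usize, d: usize) -> (Vec<i8>, Vec<f32>) {
    assert!(d > 0, "row width must be non-zero");
    assert_eq!(x.len(), n * d, "input must hold n * d elements");
    const QMAX: f32 = 127.0; // assumed symmetric bound for signed 8-bit output

    let mut q = Vec::with_capacity(n * d);
    let mut scales = Vec::with_capacity(n);
    for row in x.chunks_exact(d) {
        // Reduce the row to its largest magnitude.
        let max_abs = row.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
        // Assumption: an all-zero row uses scale = 1.0 to avoid dividing by zero.
        let scale = if max_abs > 0.0 { max_abs / QMAX } else { 1.0 };
        scales.push(scale);
        // zero_point is 0 in symmetric mode, so only scaling and rounding remain.
        q.extend(row.iter().map(|&v| (v / scale).round().clamp(-QMAX, QMAX) as i8));
    }
    (q, scales)
}

/// Inverse mapping: x is approximately q * scale, using the scale of the matching row.
pub fn dequantize_per_token(q: &[i8], scales: &[f32], d: usize) -> Vec<f32> {
    assert!(d > 0, "row width must be non-zero");
    q.chunks_exact(d)
        .zip(scales)
        .flat_map(|(row, &s)| row.iter().map(move |&v| v as f32 * s))
        .collect()
}
```

Returning the scale vector alongside the quantized data is what makes the round trip possible: without the per-row scales the caller cannot recover the original magnitudes.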

DynamicRangeScope::Tensor, DynamicRangeScope::Channel and DynamicRangeScope::Group are reserved scopes: DynamicRangeQuantizePlan::select currently returns Error::Unsupported for them. Follow-up milestones will wire them up by orchestrating the matching primitive ReducePlan with the per-tensor / per-channel / per-group quantize plans, which already exist from 8.1 / 8.2.
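
The gating of reserved scopes can be pictured with the stand-alone sketch below. `Scope`, `SelectError` and `select_scope` are local stand-ins invented for this example; they only mirror the behaviour described above and are not the signatures of DynamicRangeScope, Error or DynamicRangeQuantizePlan::select.

```rust
/// Local stand-in for the scope enum (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scope {
    Tensor,
    Channel,
    Token,
    Group,
}

/// Local stand-in for the error type (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum SelectError {
    Unsupported(&'static str),
}

/// Accepts the one implemented scope and rejects the reserved ones.
pub fn select_scope(scope: Scope) -> Result<Scope, SelectError> {
    match scope {
        Scope::Token => Ok(scope),
        Scope::Tensor => Err(SelectError::Unsupported("per-tensor scope is reserved")),
        Scope::Channel => Err(SelectError::Unsupported("per-channel scope is reserved")),
        Scope::Group => Err(SelectError::Unsupported("per-group scope is reserved")),
    }
}
```

Matching every variant explicitly, rather than using a wildcard arm, means the compiler flags this function whenever a new scope is added.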

§Dtype coverage (trailblazer)

Supported today: TIn ∈ {f32, f64}, TOut = S8. Deferred: f16 / bf16 activations, u8 output, and DynamicRangeMode::Asymmetric.

Structs§

DynamicRangeQuantizeArgs
Args bundle for a dynamic_range_quantize launch.
DynamicRangeQuantizeDescriptor
Descriptor for a dynamic_range_quantize op.
DynamicRangeQuantizePlan
dynamic_range_quantize plan.

Enums§

DynamicRangeMode
Symmetric vs asymmetric (offset) dynamic-range quantization.
DynamicRangeScope
Per-tensor / per-channel / per-token / per-group granularity for the scale + zero_point computation.