pub struct DynamicRangeQuantizePlan<TIn: Element, TOut: IntElement> { /* private fields */ }
dynamic_range_quantize plan.
Composes per-row max-abs reduction, scale computation, and per-row quantization into a single fused kernel launch. The dynamic-range recipe is this fused composition: running the reduce and quantize steps as one kernel avoids a round-trip through a scale buffer.
The trailblazer kernel is symmetric per-token only. Future fanout adds asymmetric mode (requires xmin + xmax reductions) and the other three scopes (tensor / channel / group) by orchestrating existing primitives.
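As a rough illustration of the three fused steps in the symmetric per-token case, here is a CPU reference sketch. It is not the crate's kernel. The function name `dynamic_range_quantize_ref`, the scale formula `max_abs / q_max`, the round-to-nearest rule, and the scale of 1.0 for an all-zero row are all assumptions made for this example.

```rust
/// CPU reference sketch of symmetric per-row (per-token) dynamic-range
/// quantization. `input` is a row-major [n, d] matrix.
/// Returns (quantized values, per-row scales).
///
/// Assumptions (not taken from the crate): scale = max_abs / q_max,
/// round-to-nearest, and an all-zero row gets scale 1.0.
fn dynamic_range_quantize_ref(
    input: &[f32],
    n: usize,
    d: usize,
    q_min: i32,
    q_max: i32,
) -> (Vec<i8>, Vec<f32>) {
    assert!(d > 0, "rows must be non-empty");
    assert_eq!(input.len(), n * d, "input must hold n * d elements");
    assert!(q_max > 0 && q_max >= q_min, "invalid quantization range");
    assert!(q_min >= i8::MIN as i32 && q_max <= i8::MAX as i32);

    let mut out = Vec::with_capacity(n * d);
    let mut scales = Vec::with_capacity(n);

    for row in input.chunks_exact(d) {
        // Step 1: per-row max-abs reduction.
        let max_abs = row.iter().fold(0.0f32, |m, &x| m.max(x.abs()));

        // Step 2: scale computation (symmetric, so q_max is the divisor).
        let scale = if max_abs > 0.0 { max_abs / q_max as f32 } else { 1.0 };

        // Step 3: per-row quantize, clamped to [q_min, q_max].
        for &x in row {
            let q = (x / scale).round().clamp(q_min as f32, q_max as f32);
            out.push(q as i8);
        }
        scales.push(scale);
    }
    (out, scales)
}
```

In this sketch the scale never leaves the loop body between steps 2 and 3, which is the CPU analogue of avoiding the scale-buffer round-trip.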
When to use: post-training activation quantization at inference, where the scale is computed from the runtime range and the input is quantized in one launch. No backward pass (inference-only).
Dtypes (trailblazer): TIn ∈ {f32, f64}, TOut = S8. f16 / bf16 activations, u8 output, and asymmetric mode are gated as Unsupported until follow-up milestones wire up the xmin/xmax reductions and the offset-compute kernel.
Shape limits: rank-2 [N, D]; N ≤ 65535 (block-per-row grid cap, which lifts when row tiling lands); q_max > 0 (symmetric divisor); q_max ≥ q_min.
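The limits above can be expressed as a small pre-flight check. The sketch below is illustrative only: `LimitError`, `MAX_ROWS`, and `check_limits` are hypothetical names, and the crate's own validation (for example in `can_implement`) may check more or report errors differently.

```rust
/// Hypothetical error type for this sketch; the crate's real error type
/// is not shown in these docs.
#[derive(Debug, PartialEq)]
enum LimitError {
    NotRank2,
    TooManyRows,
    BadQuantRange,
}

/// Block-per-row grid cap stated in the shape limits.
const MAX_ROWS: usize = 65_535;

/// Checks the documented limits: rank-2 [N, D], N <= 65535,
/// q_max > 0, and q_max >= q_min.
fn check_limits(shape: &[usize], q_min: i32, q_max: i32) -> Result<(), LimitError> {
    if shape.len() != 2 {
        return Err(LimitError::NotRank2);
    }
    if shape[0] > MAX_ROWS {
        return Err(LimitError::TooManyRows);
    }
    // q_max is the symmetric divisor, so it must be strictly positive.
    if q_max <= 0 || q_max < q_min {
        return Err(LimitError::BadQuantRange);
    }
    Ok(())
}
```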
Workspace: none — single-launch fused kernel.
Precision guarantee: deterministic, bit-stable. One block per row, no atomics; block-tree reduction is associative-stable on a single GPU.
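To see why a fixed-order tree reduction over max-abs is reproducible, consider the CPU sketch below. It is an assumption-laden illustration rather than the kernel's actual reduction: `tree_max_abs` is a hypothetical name, and the pairing order is chosen for simplicity. For NaN-free input, `max` over absolute values involves no rounding, so the tree result matches a sequential fold bit for bit.

```rust
/// Pairwise (tree) max-abs reduction over one row, combining neighbours
/// in a fixed order at every level. Returns 0.0 for an empty row.
fn tree_max_abs(row: &[f32]) -> f32 {
    let mut level: Vec<f32> = row.iter().map(|x| x.abs()).collect();
    if level.is_empty() {
        return 0.0;
    }
    while level.len() > 1 {
        // Halve the level: each pair collapses to its maximum; an odd
        // trailing element is carried up unchanged.
        level = level
            .chunks(2)
            .map(|pair| if pair.len() == 2 { pair[0].max(pair[1]) } else { pair[0] })
            .collect();
    }
    level[0]
}

/// Sequential max-abs fold, used here only as a comparison baseline.
fn sequential_max_abs(row: &[f32]) -> f32 {
    row.iter().fold(0.0f32, |m, &x| m.max(x.abs()))
}
```

Because there are no atomics, the combination order never depends on thread scheduling, which is what makes repeated runs agree.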
Implementations
impl<TIn: Element, TOut: IntElement> DynamicRangeQuantizePlan<TIn, TOut>
pub fn select( _stream: &Stream, desc: &DynamicRangeQuantizeDescriptor, _pref: PlanPreference, ) -> Result<Self>
Pick a kernel for desc.
pub fn can_implement( &self, args: &DynamicRangeQuantizeArgs<'_, TIn, TOut>, ) -> Result<()>
Validate args at run time.
pub fn workspace_size(&self) -> usize
Workspace bytes — none. The kernel is single-launch.
pub fn precision_guarantee(&self) -> PrecisionGuarantee
Numerical guarantees.