pub struct QuantizePerTensorPlan<TIn: Element, TOut: IntElement> { /* private fields */ }
Forward plan for quantize_per_tensor:
q = clamp(round(x / scale) + zero_point, q_min, q_max)
A single floating-point scale and a single i32 zero_point apply to the entire tensor, following the semantics of PyTorch's torch.quantize_per_tensor.
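As an illustration of the formula, here is a minimal host-side reference sketch for f32 input. It assumes the division, rounding and zero-point addition happen in f32 and i32 arithmetic; the precision the kernel uses internally for f16, bf16 and f64 inputs is not documented here. The names round_half_even and quantize_ref are hypothetical helpers, not part of this crate.

```rust
/// Round to nearest, ties to even (the documented __float2int_rn behaviour).
/// Rust 1.77+ also provides f32::round_ties_even; this avoids that requirement.
fn round_half_even(v: f32) -> f32 {
    let r = v.round(); // f32::round breaks ties away from zero
    if (v - v.trunc()).abs() == 0.5 {
        // Exact tie: pick the even neighbour instead.
        2.0 * (v / 2.0).round()
    } else {
        r
    }
}

/// Hypothetical scalar reference for
/// q = clamp(round(x / scale) + zero_point, q_min, q_max).
fn quantize_ref(x: f32, scale: f32, zero_point: i32, q_min: i32, q_max: i32) -> i32 {
    // `as i32` saturates on overflow (and maps NaN to 0), so huge inputs stay finite.
    let rounded = round_half_even(x / scale) as i32;
    // clamp panics if q_min > q_max, which the documented precondition rules out.
    rounded.saturating_add(zero_point).clamp(q_min, q_max)
}

fn main() {
    let xs = [-1.0f32, 0.25, 0.75, 2.0, 200.0];
    let qs: Vec<i32> = xs
        .iter()
        .map(|&x| quantize_ref(x, 0.5, 10, 0, 255))
        .collect();
    println!("{:?}", qs);
}
```

With scale = 0.5 and zero_point = 10, the input 0.25 maps to the tie 0.5, which rounds to 0 (even), and 200.0 saturates at q_max = 255.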
When to use: post-training quantization with a single scale and zero-point shared across the whole tensor. Pair with QuantizePerTensorBackwardPlan for straight-through-estimator (STE) autograd. Use QuantizePerChannelPlan instead for weight quantization along output channels, or QuantizePerTokenPlan for LLM activations.
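For intuition about STE autograd, the sketch below implements the clipped straight-through estimator used by PyTorch's fake_quantize_per_tensor_affine: the incoming gradient passes through unchanged where round(x / scale) + zero_point falls inside [q_min, q_max] and is zeroed where the forward pass clamped. This is an assumption about the general technique only; whether QuantizePerTensorBackwardPlan uses this exact mask, or also scales by 1 / scale because its forward output is integer rather than dequantized, is not stated here. The function ste_backward is hypothetical.

```rust
/// Round to nearest, ties to even, matching the forward pass.
fn round_half_even(v: f32) -> f32 {
    let r = v.round(); // ties away from zero
    if (v - v.trunc()).abs() == 0.5 {
        2.0 * (v / 2.0).round() // exact tie: even neighbour
    } else {
        r
    }
}

/// Hypothetical clipped-STE backward (PyTorch fake-quantize convention):
/// grad_x[i] = grad_out[i] if the unclamped code is in range, else 0.
fn ste_backward(
    x: &[f32],
    grad_out: &[f32],
    scale: f32,
    zero_point: i32,
    q_min: i32,
    q_max: i32,
) -> Vec<f32> {
    assert_eq!(x.len(), grad_out.len(), "x and grad_out must have equal length");
    x.iter()
        .zip(grad_out)
        .map(|(&xi, &g)| {
            // Recompute the code before clamping, exactly as the forward pass does.
            let q = (round_half_even(xi / scale) as i32).saturating_add(zero_point);
            if q >= q_min && q <= q_max { g } else { 0.0 }
        })
        .collect()
}

fn main() {
    let x = [0.0f32, 1.0, 100.0, -100.0];
    let grads = ste_backward(&x, &[1.0; 4], 0.5, 0, -128, 127);
    println!("{:?}", grads);
}
```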
Dtypes: FP input {f32, f64, f16, bf16} × integer output {s8 in [-128, 127], u8 in [0, 255]}. Sub-byte outputs (s4 / u4) are deferred and not yet supported.
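The output ranges above are exactly the representable ranges of i8 and u8, so once a code has been clamped into that range the narrowing store is lossless. The sketch below shows this with a hypothetical QuantOut trait standing in for the crate's IntElement bound; the real trait's items are not documented here.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for IntElement: each output type exposes its range.
trait QuantOut: Sized + TryFrom<i32> {
    const Q_MIN: i32;
    const Q_MAX: i32;
}

impl QuantOut for i8 {
    const Q_MIN: i32 = i8::MIN as i32; // -128
    const Q_MAX: i32 = i8::MAX as i32; // 127
}

impl QuantOut for u8 {
    const Q_MIN: i32 = u8::MIN as i32; // 0
    const Q_MAX: i32 = u8::MAX as i32; // 255
}

/// Narrow an i32 code into the output type. Clamping to the type's range
/// first guarantees that try_from cannot fail.
fn store<T: QuantOut>(q: i32) -> T {
    let clamped = q.clamp(T::Q_MIN, T::Q_MAX);
    match T::try_from(clamped) {
        Ok(v) => v,
        Err(_) => unreachable!("value was clamped to the type's range"),
    }
}

fn main() {
    let a: i8 = store(300);
    let b: u8 = store(-5);
    println!("{} {}", a, b);
}
```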
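Shape limits: the trailblazer flattens the input to 1-D [numel], so callers must collapse multi-D inputs themselves (per-tensor quantization is axis-agnostic, so flattening does not change any output value). Requires numel ≥ 0 and q_max ≥ q_min.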
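A caller-side sketch of the shape contract: a row-major 2-D input is collapsed to 1-D before quantizing, the documented q_max ≥ q_min precondition is checked up front, and numel = 0 simply yields an empty result. The function quantize_flat and its Vec-of-rows input are hypothetical; the real plan consumes flat device buffers through QuantizePerTensorArgs.

```rust
/// Round to nearest, ties to even.
fn round_half_even(v: f32) -> f32 {
    let r = v.round();
    if (v - v.trunc()).abs() == 0.5 {
        2.0 * (v / 2.0).round()
    } else {
        r
    }
}

/// Hypothetical helper: flatten [rows][cols] to [numel], then quantize per tensor.
fn quantize_flat(
    rows: &[Vec<f32>],
    scale: f32,
    zero_point: i32,
    q_min: i32,
    q_max: i32,
) -> Result<Vec<i32>, String> {
    if q_max < q_min {
        return Err(format!("q_max ({}) < q_min ({})", q_max, q_min));
    }
    // One (scale, zero_point) covers every element, so flattening order
    // cannot change any individual output value.
    let flat: Vec<f32> = rows.iter().flatten().copied().collect();
    Ok(flat
        .iter()
        .map(|&x| {
            (round_half_even(x / scale) as i32)
                .saturating_add(zero_point)
                .clamp(q_min, q_max)
        })
        .collect())
}

fn main() {
    let input = vec![vec![0.0f32, 0.5], vec![1.0, 1.5]];
    println!("{:?}", quantize_flat(&input, 0.5, 0, -128, 127));
}
```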
Workspace: none.
Precision guarantee: deterministic and bit-stable. Rounding is round-half-to-even (__float2int_rn).
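The tie-breaking rule matters for inputs whose x / scale lands exactly on a half. Rust's f32::round breaks ties away from zero (2.5 becomes 3), whereas round-half-to-even, the rule documented above, sends 2.5 to 2 and -2.5 to -2. The sketch below contrasts the two; round_half_even is a hypothetical helper (Rust 1.77+ offers f32::round_ties_even for the same purpose).

```rust
/// Round to nearest, ties to even.
fn round_half_even(v: f32) -> f32 {
    let r = v.round(); // ties away from zero
    if (v - v.trunc()).abs() == 0.5 {
        // v = n + 0.5 exactly: 2 * round(v / 2) lands on the even neighbour.
        2.0 * (v / 2.0).round()
    } else {
        r
    }
}

fn main() {
    for &v in &[0.5f32, 1.5, 2.5, -2.5, 2.4, 2.6] {
        println!(
            "{:>5}: half-away = {:>3}, half-even = {:>3}",
            v,
            v.round(),
            round_half_even(v)
        );
    }
}
```

Because ties alternate direction, round-half-to-even avoids the systematic upward bias in magnitude that half-away-from-zero introduces when many values sit on ties.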
Implementations
impl<TIn: Element, TOut: IntElement> QuantizePerTensorPlan<TIn, TOut>
pub fn select(
    _stream: &Stream,
    desc: &QuantizePerTensorDescriptor,
    _pref: PlanPreference,
) -> Result<Self>
Selects a kernel for desc. The _stream and _pref parameters are underscore-prefixed, i.e. currently unused.
pub fn can_implement(
    &self,
    args: &QuantizePerTensorArgs<'_, TIn, TOut>,
) -> Result<()>
Validates args against the selected plan at run time.
pub fn workspace_size(&self) -> usize
Returns the workspace size in bytes: always 0, since this plan needs no workspace.
pub fn precision_guarantee(&self) -> PrecisionGuarantee
Returns the plan's numerical guarantees (deterministic, bit-stable).
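The four methods suggest a select-once, validate-per-call lifecycle. The mock below sketches that flow with hypothetical stand-in types (Descriptor, Args, Plan, PrecisionGuarantee) that only mirror the documented method names and return shapes; it does not reproduce the real Stream, QuantizePerTensorDescriptor, QuantizePerTensorArgs or Result types, and the checks inside select and can_implement are assumptions based solely on the documented shape limits.

```rust
// Hypothetical mock: simplified types mirroring the documented methods only.
#[derive(Debug, PartialEq)]
enum PrecisionGuarantee {
    DeterministicBitStable,
}

struct Descriptor {
    numel: usize,
    q_min: i32,
    q_max: i32,
}

struct Args<'a> {
    x: &'a [f32],
    q: &'a mut [i8],
}

struct Plan {
    desc: Descriptor,
}

impl Plan {
    /// Mirrors select(): assumed to reject descriptors with q_max < q_min.
    fn select(desc: Descriptor) -> Result<Self, String> {
        if desc.q_max < desc.q_min {
            return Err("q_max < q_min".into());
        }
        Ok(Plan { desc })
    }

    /// Mirrors can_implement(): assumed to check buffer lengths against numel.
    fn can_implement(&self, args: &Args<'_>) -> Result<(), String> {
        if args.x.len() != self.desc.numel || args.q.len() != self.desc.numel {
            return Err("buffer length does not match numel".into());
        }
        Ok(())
    }

    /// Mirrors workspace_size(): this plan needs no workspace.
    fn workspace_size(&self) -> usize {
        0
    }

    /// Mirrors precision_guarantee().
    fn precision_guarantee(&self) -> PrecisionGuarantee {
        PrecisionGuarantee::DeterministicBitStable
    }
}

fn main() {
    let plan = Plan::select(Descriptor { numel: 4, q_min: -128, q_max: 127 })
        .expect("valid descriptor");
    let x = [0.0f32, 0.5, 1.0, 1.5];
    let mut q = [0i8; 4];
    let args = Args { x: &x, q: &mut q };
    plan.can_implement(&args).expect("args match descriptor");
    println!(
        "workspace = {} bytes, guarantee = {:?}",
        plan.workspace_size(),
        plan.precision_guarantee()
    );
}
```

Separating selection from per-call validation lets a caller pick a kernel once and then cheaply re-check each batch of buffers before launch.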