#[non_exhaustive]
pub struct SmoothQuantLinearDescriptor {
pub m: i32,
pub n: i32,
pub k: i32,
pub act_scale: f32,
pub activation_element: ElementKind,
pub weight_element: ElementKind,
pub output_element: ElementKind,
}
Descriptor for a SmoothQuant linear op.
The per-tensor activation scale lives in the descriptor (not the args) because in the SmoothQuant flow it’s part of the model’s frozen quantization metadata — it doesn’t change between launches for the same layer.
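The split between frozen per-layer metadata and per-launch inputs can be sketched as follows. This is a minimal illustration only: `LayerDescriptor`, `LaunchArgs`, and `args_match` are hypothetical stand-ins invented for this sketch, not types or functions from this crate, and the real args type may carry different buffers.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the descriptor: everything here is fixed
/// once per layer, including the calibration-time activation scale.
#[derive(Clone, Debug, PartialEq)]
pub struct LayerDescriptor {
    pub m: i32,
    pub n: i32,
    pub k: i32,
    pub act_scale: f32, // frozen quantization metadata, same for every launch
}

/// Hypothetical stand-in for per-launch args: only the buffers vary.
pub struct LaunchArgs<'a> {
    pub act_q: &'a [i8],    // m x k quantized activations
    pub weight_q: &'a [i8], // n x k quantized weights
}

/// Returns true when the per-launch buffers agree with the frozen shape.
/// Negative dimensions or overflowing element counts are rejected.
pub fn args_match(desc: &LayerDescriptor, args: &LaunchArgs<'_>) -> bool {
    let dims = (
        usize::try_from(desc.m),
        usize::try_from(desc.n),
        usize::try_from(desc.k),
    );
    match dims {
        (Ok(m), Ok(n), Ok(k)) => {
            m.checked_mul(k) == Some(args.act_q.len())
                && n.checked_mul(k) == Some(args.weight_q.len())
        }
        _ => false,
    }
}
```

Under this assumed layout, one descriptor value is built per layer and reused for every launch, while a fresh `LaunchArgs` is supplied each time.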
Fields (Non-exhaustive)§
This struct is marked as non-exhaustive
Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
m: i32
Number of token rows in the activation (and rows of the output).
n: i32
Number of output channels (rows of weight_q, columns of the output).
k: i32
Inner reduction dimension (columns of act_q and weight_q).
act_scale: f32
Per-tensor activation scale produced by the offline SmoothQuant Python flow. Always f32 regardless of TIn, because the underlying quantized_linear_w8a8 kernel does the scale multiply in float space irrespective of the output dtype.
activation_element: ElementKind
Activation integer element kind. Currently wired only for S8.
weight_element: ElementKind
Weight integer element kind. Currently wired only for S8.
output_element: ElementKind
Output floating-point element kind. Must match TIn::KIND.
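To make the shape conventions and the role of act_scale concrete, here is a minimal CPU reference sketch. It assumes row-major buffers, S8 inputs, i32 accumulation, and an f32 output. `DescriptorSketch` and `reference_linear` are hypothetical names that mirror only the documented fields; this is not the quantized_linear_w8a8 kernel, and any weight-side scale the real kernel applies is omitted because it is not described here. The destructuring uses the `..` wildcard that external crates need for a non-exhaustive struct.

```rust
use std::convert::TryFrom;

/// Hypothetical mirror of the documented shape and scale fields.
#[non_exhaustive]
pub struct DescriptorSketch {
    pub m: i32,
    pub n: i32,
    pub k: i32,
    pub act_scale: f32,
}

/// Computes out[row][col] = act_scale * sum over l of
/// act_q[row][l] * weight_q[col][l], with act_q laid out as m x k and
/// weight_q as n x k. Returns None when a dimension is negative or a
/// buffer length does not match the descriptor.
pub fn reference_linear(
    desc: &DescriptorSketch,
    act_q: &[i8],
    weight_q: &[i8],
) -> Option<Vec<f32>> {
    // `..` keeps this pattern valid if fields are added later.
    let DescriptorSketch { m, n, k, act_scale, .. } = *desc;
    let m = usize::try_from(m).ok()?;
    let n = usize::try_from(n).ok()?;
    let k = usize::try_from(k).ok()?;
    if act_q.len() != m.checked_mul(k)? || weight_q.len() != n.checked_mul(k)? {
        return None;
    }

    let mut out = Vec::new();
    for row in 0..m {
        let a = &act_q[row * k..(row + 1) * k];
        for col in 0..n {
            let w = &weight_q[col * k..(col + 1) * k];
            // Accumulate in i32, then apply the scale in float space.
            let acc: i32 = a
                .iter()
                .zip(w)
                .map(|(&x, &y)| i32::from(x) * i32::from(y))
                .sum();
            out.push(acc as f32 * act_scale);
        }
    }
    Some(out)
}
```

The result is a row-major m x n buffer, matching the field descriptions: m rows of output and n output channels.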
Implementations§
Trait Implementations§
impl Clone for SmoothQuantLinearDescriptor
fn clone(&self) -> SmoothQuantLinearDescriptor
1.0.0 (const: unstable) · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.