Struct IDequantizeLayer

pub struct IDequantizeLayer { /* private fields */ }

IDequantizeLayer

A Dequantize layer in a network definition.

This layer accepts a quantized input tensor and uses the configured scale and zeroPt inputs to dequantize it according to:

    output = (input - zeroPt) * scale

The first input (index 0) is the tensor to be dequantized. The second (index 1) and third (index 2) inputs are the scale and zero point respectively. scale and zeroPt should have identical dimensions, and a rank that is less than or equal to 2.

The zeroPt tensor is optional, and if not set, will be assumed to be zero. Its data type must be identical to the input’s data type. zeroPt must only contain zero-valued coefficients, because only symmetric quantization is supported.

The scale value must be a scalar for per-tensor quantization, a 1D tensor for per-channel quantization, or the same rank as the input tensor for block quantization. All scale coefficients must have strictly positive values. The size of the 1D scale tensor must match the size of the quantization axis. For block quantization, the shape of the scale tensor must match the shape of the input, except for one dimension (the last or second to last dimension) in which blocking occurs. The size of zeroPt must match the size of scale.
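The shape rules for scale can be read as a small classification procedure. The following is a minimal sketch of that reading, not part of this crate: `Granularity` and `classify_scale` are hypothetical names, a scalar scale is modelled as an empty shape, and the check that the blocked extent divides evenly is an assumption added for the sketch.

```rust
/// Quantization granularity implied by the shape of the scale tensor.
/// Hypothetical type used only for this illustration.
#[derive(Debug, PartialEq)]
enum Granularity {
    PerTensor,
    PerChannel,
    /// Blocking occurs along `axis`, with `block_size` input elements per scale.
    Block { axis: usize, block_size: usize },
}

/// Returns the granularity if `scale_dims` is consistent with `input_dims`
/// under the rules described in the text, or `None` otherwise.
fn classify_scale(
    input_dims: &[usize],
    scale_dims: &[usize],
    axis: usize,
) -> Option<Granularity> {
    let rank = input_dims.len();
    // The input must not be a scalar, and scale has rank at most 2.
    if rank == 0 || scale_dims.len() > 2 {
        return None;
    }
    // Scalar scale (modelled as an empty shape): per-tensor.
    if scale_dims.is_empty() {
        return Some(Granularity::PerTensor);
    }
    // 1D scale whose length equals the extent of the quantization axis.
    if scale_dims.len() == 1 && axis < rank && scale_dims[0] == input_dims[axis] {
        return Some(Granularity::PerChannel);
    }
    // Block: same rank as the input, differing in exactly one dimension,
    // which must be the last or second to last.
    if scale_dims.len() != rank {
        return None;
    }
    let differing: Vec<usize> = (0..rank)
        .filter(|&d| input_dims[d] != scale_dims[d])
        .collect();
    match differing.as_slice() {
        [d] if *d + 2 >= rank => {
            let (n, m) = (input_dims[*d], scale_dims[*d]);
            // Assumption: the blocked extent divides evenly into blocks.
            if m > 0 && n % m == 0 {
                Some(Granularity::Block { axis: *d, block_size: n / m })
            } else {
                None
            }
        }
        _ => None,
    }
}
```

For instance, under these assumptions a `[64, 16]` input with a `[4, 16]` scale classifies as blocked along dimension 0 with a block size of 16.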

The subgraph which terminates with the zeroPt tensor must be a build-time constant containing only zeros. The output type, if constrained, must be constrained to DataType::kFLOAT, DataType::kHALF, or DataType::kBF16. The input type, if constrained, must be constrained to DataType::kINT8, DataType::kFP8, DataType::kINT4 or DataType::kFP4. The output size is the same as the input size. The quantization axis is in reference to the input tensor’s dimensions.

IDequantizeLayer supports DataType::kINT8 (default), DataType::kFP8, DataType::kINT4 or DataType::kFP4. For strongly typed networks, input data type must be the same as zeroPt data type.

IDequantizeLayer supports DataType::kFLOAT, DataType::kHALF, or DataType::kBF16 output. The output data type must be configured explicitly using setToType.

As an example of the operation of this layer, imagine a 4D NCHW activation input which can be quantized using a single scale coefficient (referred to as per-tensor quantization):

    For each n in N:
        For each c in C:
            For each h in H:
                For each w in W:
                    output[n,c,h,w] = (input[n,c,h,w] - zeroPt) * scale
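The per-tensor case can be sketched in plain Rust on a flat buffer, since the same scale applies to every element regardless of its NCHW position. This is an illustration of the arithmetic only, not how the layer is implemented; `dequantize_per_tensor` is a hypothetical helper, and INT8 input with `f32` output is assumed.

```rust
/// Per-tensor dequantization of INT8 data: output = (input - zero_pt) * scale.
/// `zero_pt` is kept as a parameter to mirror the formula; the layer itself
/// requires it to be zero (symmetric quantization).
fn dequantize_per_tensor(input: &[i8], scale: f32, zero_pt: i8) -> Vec<f32> {
    input
        .iter()
        // Widen to i32 before subtracting so the difference cannot overflow i8.
        .map(|&q| (i32::from(q) - i32::from(zero_pt)) as f32 * scale)
        .collect()
}
```

Calling `dequantize_per_tensor(&[-128, 0, 64, 127], 0.5, 0)` yields `[-64.0, 0.0, 32.0, 63.5]`.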

Per-channel dequantization is supported only for input that is rooted at an IConstantLayer (i.e. weights). Activations cannot be quantized per-channel. As an example of per-channel operation, imagine a 4D KCRS weights input and K (dimension 0) as the quantization axis. The scale is an array of coefficients, which is the same size as the quantization axis.

    For each k in K:
        For each c in C:
            For each r in R:
                For each s in S:
                    output[k,c,r,s] = (input[k,c,r,s] - zeroPt[k]) * scale[k]
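A minimal sketch of the per-channel arithmetic follows, assuming INT8 weights stored contiguously in KCRS order so that each `k` owns one run of `C * R * S` elements. `dequantize_per_channel_k` is a hypothetical helper, and the zero point is omitted because it must be zero.

```rust
/// Per-channel dequantization of INT8 weights stored contiguously in KCRS
/// order, with K (dimension 0) as the quantization axis.
/// Returns `None` if the buffer or scale sizes do not match `dims`.
fn dequantize_per_channel_k(
    weights: &[i8],
    dims: [usize; 4], // [K, C, R, S]
    scale: &[f32],    // one coefficient per k
) -> Option<Vec<f32>> {
    let [k, c, r, s] = dims;
    let per_channel = c * r * s;
    if scale.len() != k || weights.len() != k * per_channel {
        return None;
    }
    if per_channel == 0 {
        // Nothing to dequantize; also avoids chunking by zero below.
        return Some(Vec::new());
    }
    let out: Vec<f32> = weights
        .chunks(per_channel) // one chunk per output channel k
        .zip(scale)
        .flat_map(|(chunk, &sc)| chunk.iter().map(move |&q| f32::from(q) * sc))
        .collect();
    Some(out)
}
```

With `dims = [2, 1, 1, 2]`, weights `[1, 2, 3, 4]` and scale `[0.5, 2.0]`, the result is `[0.5, 1.0, 6.0, 8.0]`.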

Block dequantization is supported for input types DataType::kFP4, DataType::kFP8 and DataType::kINT4. As an example of blocked operation, imagine a 2D RS input with R (dimension 0) as the blocking axis and B as the block size. The scale is a 2D array of coefficients, with dimensions (R//B, S).

    For each r in R:
        For each s in S:
            output[r,s] = (input[r,s] - zeroPt[r//B, s]) * scale[r//B, s]
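The blocked indexing can be sketched as follows. This is an illustration under stated assumptions rather than the layer's implementation: `dequantize_blocked_rows` is a hypothetical helper, both buffers are row-major, INT4 values are held one per `i8` (real INT4 data is packed), R is required to be a multiple of B, and the zero point is omitted because it must be zero.

```rust
/// Block dequantization of a row-major 2D [R, S] input blocked along R.
/// `scale` is row-major with dimensions [rows / block, cols].
/// Returns `None` if the sizes are inconsistent.
fn dequantize_blocked_rows(
    input: &[i8],
    rows: usize,
    cols: usize,
    block: usize,
    scale: &[f32],
) -> Option<Vec<f32>> {
    // Assumption for this sketch: the blocked extent divides evenly.
    if block == 0 || rows % block != 0 {
        return None;
    }
    if input.len() != rows * cols || scale.len() != (rows / block) * cols {
        return None;
    }
    let mut out = Vec::with_capacity(input.len());
    for r in 0..rows {
        for s in 0..cols {
            // All rows in the same block share the scale at [r / block, s].
            let sc = scale[(r / block) * cols + s];
            out.push(f32::from(input[r * cols + s]) * sc);
        }
    }
    Some(out)
}
```

For a 4x2 input `[1, 2, 3, 4, 5, 6, 7, -8]` with block size 2 and scale `[0.5, 1.0, 2.0, 0.25]`, rows 0 and 1 use the first scale row and rows 2 and 3 use the second, giving `[0.5, 2.0, 1.5, 4.0, 10.0, 1.5, 14.0, -2.0]`.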

Only symmetric quantization is supported. Currently the only allowed build-time constant zeroPt subgraphs are:

Constant -> Quantize
Constant -> Cast -> Quantize

The input tensor for this layer must not be a scalar.

Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.

Implementations§

impl IDequantizeLayer

pub fn getAxis(self: &IDequantizeLayer) -> i32

pub fn setAxis(self: Pin<&mut IDequantizeLayer>, axis: i32)

pub fn setBlockShape( self: Pin<&mut IDequantizeLayer>, blockShape: &Dims64, ) -> bool

pub fn getBlockShape(self: &IDequantizeLayer) -> Dims64

pub fn setToType(self: Pin<&mut IDequantizeLayer>, toType: DataType)

pub fn getToType(self: &IDequantizeLayer) -> DataType

Trait Implementations§

impl AsLayer for IDequantizeLayer

fn as_layer(&self) -> &ILayer

fn as_layer_pin_mut(&mut self) -> Pin<&mut ILayer>

impl AsLayerTyped for IDequantizeLayer

const TYPE: LayerType = LayerType::kDEQUANTIZE

impl AsRef<ILayer> for IDequantizeLayer

fn as_ref(self: &IDequantizeLayer) -> &ILayer

impl ExternType for IDequantizeLayer

type Id = (n, v, i, n, f, e, r, _1, (), I, D, e, q, u, a, n, t, i, z, e, L, a, y, e, r)

type Kind = Opaque

impl MakeCppStorage for IDequantizeLayer

unsafe fn allocate_uninitialized_cpp_storage() -> *mut IDequantizeLayer

unsafe fn free_uninitialized_cpp_storage(arg0: *mut IDequantizeLayer)

Auto Trait Implementations§

impl !Freeze for IDequantizeLayer

impl !RefUnwindSafe for IDequantizeLayer

impl !Send for IDequantizeLayer

impl !Sync for IDequantizeLayer

impl Unpin for IDequantizeLayer

impl UnsafeUnpin for IDequantizeLayer

impl UnwindSafe for IDequantizeLayer

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
