Struct IDequantizeLayer 

pub struct IDequantizeLayer { /* private fields */ }

IDequantizeLayer

A Dequantize layer in a network definition.

This layer accepts an input tensor of a quantized type and uses the configured scale and zeroPt inputs to dequantize it according to: output = (input - zeroPt) * scale

The first input (index 0) is the tensor to be dequantized. The second (index 1) and third (index 2) are the scale and zero point respectively. scale and zeroPt should have identical dimensions and a rank that is less than or equal to 2.

The zeroPt tensor is optional; if it is not set, it is assumed to be zero. Its data type must be identical to the input’s data type, and it must contain only zero-valued coefficients, because only symmetric quantization is supported. The size of zeroPt must match the size of scale.

The scale value must be a scalar for per-tensor quantization, a 1D tensor for per-channel quantization, or a tensor of the same rank as the input tensor for block quantization. All scale coefficients must be strictly positive. The size of a 1D scale tensor must match the size of the quantization axis. For block quantization, the shape of the scale tensor must match the shape of the input, except for one dimension (the last or second-to-last dimension) in which blocking occurs.
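The scale shape rules above can be sketched as a small classifier. This is an illustrative reading of the rules, not code from the bindings: the `Granularity` enum and the `classify_scale` function are hypothetical names, dimensions are passed as plain slices rather than `Dims64`, and a scale with exactly one coefficient is assumed to mean per-tensor quantization.

```rust
/// Hypothetical enum naming the three quantization granularities described above.
#[derive(Debug, PartialEq)]
enum Granularity {
    PerTensor,
    PerChannel,
    Block,
}

/// Hypothetical helper: classify a scale shape against an input shape.
/// Returns None when the scale shape fits none of the documented rules.
fn classify_scale(input_dims: &[i64], scale_dims: &[i64], axis: usize) -> Option<Granularity> {
    // Assumption: a single coefficient (including a rank-0 scalar) is per-tensor.
    if scale_dims.iter().product::<i64>() == 1 {
        return Some(Granularity::PerTensor);
    }
    // A 1D scale must match the size of the quantization axis.
    if scale_dims.len() == 1 {
        let matches_axis = input_dims.get(axis) == Some(&scale_dims[0]);
        return matches_axis.then_some(Granularity::PerChannel);
    }
    // Block quantization: same rank as the input, and the shapes differ in
    // exactly one dimension, which must be the last or second-to-last.
    let rank = input_dims.len();
    if scale_dims.len() != rank {
        return None;
    }
    let differing: Vec<usize> = (0..rank)
        .filter(|&d| input_dims[d] != scale_dims[d])
        .collect();
    match differing.as_slice() {
        [d] if *d + 2 >= rank => Some(Granularity::Block),
        _ => None,
    }
}
```

For instance, under these assumptions a `[64, 16]` input with a `[4, 16]` scale is classified as block quantization, while a `[4, 4]` scale is rejected because two dimensions differ.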

The subgraph which terminates with the zeroPt tensor must be a build-time constant containing only zeros. The output type, if constrained, must be constrained to DataType::kFLOAT, DataType::kHALF, or DataType::kBF16. The input type, if constrained, must be constrained to DataType::kINT8, DataType::kFP8, DataType::kINT4 or DataType::kFP4. The output size is the same as the input size. The quantization axis is in reference to the input tensor’s dimensions.

IDequantizeLayer supports DataType::kINT8 (default), DataType::kFP8, DataType::kINT4 or DataType::kFP4 input. For strongly typed networks, the input data type must be the same as the zeroPt data type.

IDequantizeLayer supports DataType::kFLOAT, DataType::kHALF, or DataType::kBF16 output. The output data type must be configured explicitly using setToType.

As an example of the operation of this layer, imagine a 4D NCHW activation input which can be quantized using a single scale coefficient (referred to as per-tensor quantization):

    For each n in N:
        For each c in C:
            For each h in H:
                For each w in W:
                    output[n,c,h,w] = (input[n,c,h,w] - zeroPt) * scale
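The per-tensor formula can be sketched in plain Rust as follows. This is only a reference computation on host memory, not how the layer is implemented: `dequantize_per_tensor` is a hypothetical helper, the input is assumed to be INT8 data flattened into a slice, and the output is assumed to be `f32`. Because a single scale applies to every element, the NCHW loop nest collapses into one pass over the flattened data.

```rust
/// Hypothetical reference helper: output = (input - zero_pt) * scale for every element.
/// The layer only supports symmetric quantization, so zero_pt is expected to be 0;
/// it is kept as a parameter to mirror the formula.
fn dequantize_per_tensor(input: &[i8], scale: f32, zero_pt: i8) -> Vec<f32> {
    input
        .iter()
        // Widen to i32 before subtracting so the difference cannot overflow i8.
        .map(|&q| (i32::from(q) - i32::from(zero_pt)) as f32 * scale)
        .collect()
}
```

With a scale of 0.5 and a zero point of 0, the quantized values -128, 0, 64 and 127 map to -64.0, 0.0, 32.0 and 63.5.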

Per-channel dequantization is supported only for input that is rooted at an IConstantLayer (i.e. weights). Activations cannot be quantized per-channel. As an example of per-channel operation, imagine a 4D KCRS weights input and K (dimension 0) as the quantization axis. The scale is an array of coefficients, which is the same size as the quantization axis.

    For each k in K:
        For each c in C:
            For each r in R:
                For each s in S:
                    output[k,c,r,s] = (input[k,c,r,s] - zeroPt[k]) * scale[k]
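A minimal sketch of the per-channel case, assuming INT8 weights stored contiguously in row-major KCRS order and an `f32` output. `dequantize_per_channel` is a hypothetical helper, and zeroPt is omitted because it must be all zeros.

```rust
/// Hypothetical reference helper: per-channel dequantization of KCRS weights
/// with K (dimension 0) as the quantization axis.
fn dequantize_per_channel(input: &[i8], dims: [usize; 4], scale: &[f32]) -> Vec<f32> {
    let [k, c, r, s] = dims;
    assert_eq!(input.len(), k * c * r * s, "input length must match dims");
    assert_eq!(scale.len(), k, "scale needs one coefficient per K slice");
    // In row-major KCRS layout, each K slice is a contiguous run of C*R*S elements.
    let slice_len = c * r * s;
    input
        .iter()
        .enumerate()
        // The flat index divided by the slice length recovers the k index.
        .map(|(i, &q)| f32::from(q) * scale[i / slice_len])
        .collect()
}
```

For example, a 2x1x1x2 tensor holding 1, 2, 3, 4 with scales 0.5 and 2.0 dequantizes to 0.5, 1.0, 6.0, 8.0: the first two elements share scale[0], and the last two share scale[1].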

Block dequantization is supported for input types DataType::kFP4, DataType::kFP8 and DataType::kINT4. As an example of blocked operation, imagine a 2D RS input with R (dimension 0) as the blocking axis and B as the block size. The scale is a 2D array of coefficients, with dimensions (R//B, S).

    For each r in R:
        For each s in S:
            output[r,s] = (input[r,s] - zeroPt[r//B, s]) * scale[r//B, s]
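The blocked case can be sketched the same way. The assumptions here are that INT4 values are stored one per `i8` (unpacked, in the range -8 to 7), that both tensors are row-major, and that R is a multiple of B. `dequantize_blocked` is a hypothetical helper, and zeroPt is again omitted because it must be zero.

```rust
/// Hypothetical reference helper: block dequantization of a 2D RS tensor with
/// R (dimension 0) as the blocking axis. The scale has shape (rows / block, cols).
fn dequantize_blocked(
    input: &[i8],
    rows: usize,
    cols: usize,
    block: usize,
    scale: &[f32],
) -> Vec<f32> {
    assert!(block > 0 && rows % block == 0, "rows must be a multiple of block");
    assert_eq!(input.len(), rows * cols, "input length must match rows * cols");
    assert_eq!(scale.len(), (rows / block) * cols, "scale must have shape (rows / block, cols)");
    let mut output = Vec::with_capacity(input.len());
    for r in 0..rows {
        for s in 0..cols {
            let q = f32::from(input[r * cols + s]);
            // Rows r in the same block share the scale row r / block.
            output.push(q * scale[(r / block) * cols + s]);
        }
    }
    output
}
```

With a 4x2 input and a block size of 2, rows 0 and 1 use the first row of the scale tensor and rows 2 and 3 use the second.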

Only symmetric quantization is supported. Currently the only allowed build-time constant zeroPt subgraphs are:

  1. Constant -> Quantize
  2. Constant -> Cast -> Quantize

The input tensor for this layer must not be a scalar.

Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.

Implementations

impl IDequantizeLayer

pub fn getAxis(self: &IDequantizeLayer) -> i32

Get the quantization axis.

Returns the axis parameter set by setAxis(). The return value is the index of the quantization axis in the input tensor’s dimensions. A value of -1 indicates per-tensor quantization. The default value is -1.

pub fn setAxis(self: Pin<&mut IDequantizeLayer>, axis: i32)

Set the quantization axis.

Set the index of the quantization axis (with reference to the input tensor’s dimensions). The axis must be a valid axis if the scale tensor has more than one coefficient. The axis value will be ignored if the scale tensor has exactly one coefficient (per-tensor quantization).

pub fn setBlockShape( self: Pin<&mut IDequantizeLayer>, blockShape: &Dims64, ) -> bool

Set the shape of the quantization block.

  • blockShape The shape of the quantization block.

Allowed values are positive values and -1, which denotes a fully blocked dimension. Returns true if the block shape was set successfully, false if the block shape is invalid. The default value is an empty Dims.
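The value rule stated above can be expressed as a simple pre-check. This is a sketch under the assumption that the rule applies to each entry of the block shape independently. `is_valid_block_shape` is a hypothetical helper operating on a plain slice rather than `Dims64`, and the real setBlockShape may apply further checks that are not described here.

```rust
/// Hypothetical pre-check: every entry must be positive or -1.
/// An empty shape (the documented default) passes trivially.
fn is_valid_block_shape(block_shape: &[i64]) -> bool {
    block_shape.iter().all(|&d| d > 0 || d == -1)
}
```

Under this reading, shapes such as `[16, 1]` and `[-1, 1]` pass the check, while `[0, 16]` and `[-2]` do not.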

See getBlockShape()

pub fn getBlockShape(self: &IDequantizeLayer) -> Dims64

Get the shape of the quantization block.

The default value is an empty Dims. See setBlockShape()

pub fn setToType(self: Pin<&mut IDequantizeLayer>, toType: DataType)

Set the Dequantize layer output type.

  • toType The DataType of the output tensor.

Set the output type of the dequantize layer. Valid values are DataType::kFLOAT, DataType::kHALF and DataType::kBF16. If the network is strongly typed, setToType must be used to set the output type, and use of setOutputType is an error. Otherwise, types passed to setOutputType and setToType must be the same.

See NetworkDefinitionCreationFlag::kSTRONGLY_TYPED

pub fn getToType(self: &IDequantizeLayer) -> DataType

Return the Dequantize layer output type.

Returns the toType parameter set during layer creation or by setToType(). The return value is the output type of the dequantize layer. The default value is DataType::kFLOAT.

Trait Implementations

impl AsLayer for IDequantizeLayer

fn as_layer(&self) -> &ILayer

fn as_layer_pin_mut(&mut self) -> Pin<&mut ILayer>

impl AsLayerTyped for IDequantizeLayer

const TYPE: LayerType = LayerType::kDEQUANTIZE

impl AsRef<ILayer> for IDequantizeLayer

fn as_ref(self: &IDequantizeLayer) -> &ILayer

Converts this type into a shared reference of the (usually inferred) input type.

impl ExternType for IDequantizeLayer

type Id = (n, v, i, n, f, e, r, _1, (), I, D, e, q, u, a, n, t, i, z, e, L, a, y, e, r)

A type-level representation of the type’s C++ namespace and type name.

type Kind = Opaque

impl MakeCppStorage for IDequantizeLayer

unsafe fn allocate_uninitialized_cpp_storage() -> *mut IDequantizeLayer

Allocates heap space for this type in C++ and returns a pointer to that space, but does not initialize that space (i.e. does not yet call a constructor).

unsafe fn free_uninitialized_cpp_storage(arg0: *mut IDequantizeLayer)

Frees a C++ allocation which has not yet had a constructor called.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.