pub struct IDynamicQuantizeLayer { /* private fields */ }
IDynamicQuantizeLayer
A network layer to perform dynamic quantization.
This layer accepts a floating-point input tensor and computes the block scale factors needed to quantize the input’s data. It outputs the quantized tensor as its first output and the scale factors as its second output.
Use ILayer::setInput to add an input for the double-quantization scale factor.
Only symmetric quantization is supported. The input tensor for this layer must not be a scalar.
Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.
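The idea of symmetric block quantization described above can be illustrated with a minimal, self-contained sketch. It is not the layer's actual implementation: the helper names `block_scales` and `scale_blocks` are hypothetical, the scale rule (largest magnitude in the block divided by the largest representable magnitude of the target format) is an assumed, commonly used choice, and rounding to the target format and double quantization are omitted.

```rust
/// Largest finite magnitude representable in FP4 (E2M1).
const FP4_E2M1_MAX: f32 = 6.0;

/// Hypothetical helper: compute one symmetric scale factor per block of
/// `block_size` consecutive elements. Returns `None` when `block_size` is
/// zero or does not divide the data length without remainder.
fn block_scales(data: &[f32], block_size: usize, qmax: f32) -> Option<Vec<f32>> {
    if block_size == 0 || data.len() % block_size != 0 {
        return None;
    }
    Some(
        data.chunks(block_size)
            .map(|block| {
                // Symmetric quantization: only the largest magnitude matters.
                let amax = block.iter().fold(0.0_f32, |m, x| m.max(x.abs()));
                amax / qmax
            })
            .collect(),
    )
}

/// Hypothetical helper: divide every element by its block's scale factor.
/// Rounding to the target format is intentionally left out of this sketch.
fn scale_blocks(data: &[f32], block_size: usize, scales: &[f32]) -> Vec<f32> {
    data.chunks(block_size)
        .zip(scales)
        .flat_map(|(block, &s)| {
            block.iter().map(move |&x| if s == 0.0 { 0.0 } else { x / s })
        })
        .collect()
}
```

With a block size of 4 and `FP4_E2M1_MAX` as the target range, each block of the input is mapped into the interval [-6, 6], and the per-block scale factors play the role of the layer's second output.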
Implementations
impl IDynamicQuantizeLayer
pub fn setToType(self: Pin<&mut IDynamicQuantizeLayer>, toType: DataType)
Set DynamicQuantizeLayer’s quantized output type.
toType: The data type of the quantized output tensor.
If the network is strongly typed, setToType must be used to set the output type; using setOutputType is an error. Otherwise, the types passed to setOutputType and setToType must be the same. Valid values for toType are DataType::kFP4 (NVFP4 quantization) and DataType::kFP8 (MXFP8 quantization).
pub fn getToType(self: &IDynamicQuantizeLayer) -> DataType
Return DynamicQuantizeLayer’s quantized output type.
See also: the toType parameter set during layer creation or by setToType().
The return value is the type of the quantized output tensor. The default value is DataType::kFP4.
pub fn setScaleType(self: Pin<&mut IDynamicQuantizeLayer>, scaleType: DataType)
Set the data type of the scale factors used to quantize the data.
scaleType: The data type of the scale factors.
Valid values are DataType::kFP8, DataType::kE8M0, or DataType::kFLOAT.
pub fn getScaleType(self: &IDynamicQuantizeLayer) -> DataType
Return the scale factors data type.
See also: the scaleType parameter set during layer creation or by setScaleType().
The return value is the type of the scale factors used to quantize the dynamic data. The default value is DataType::kFP8.
pub fn setAxis(self: Pin<&mut IDynamicQuantizeLayer>, axis: i32)
Set the axis along which block quantization occurs.
The axis must be the last dimension or second to last dimension. The input’s shape along the axis must be constant.
See getAxis().
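The axis constraint can be checked before calling setAxis. The sketch below is only an illustration: `is_valid_block_axis` is a hypothetical helper, and it assumes axes are given as non-negative indices (whether negative, from-the-end indices are accepted is not stated here).

```rust
/// Hypothetical pre-check: returns true when `axis` names the last or the
/// second-to-last dimension of a tensor with `rank` dimensions.
/// Assumption: only non-negative axis indices are considered valid.
fn is_valid_block_axis(axis: i32, rank: i32) -> bool {
    // The input must not be a scalar, so a rank of at least 1 is required.
    rank >= 1 && axis >= 0 && (axis == rank - 1 || axis == rank - 2)
}
```

For a rank-4 input this accepts axes 2 and 3 and rejects axes 0 and 1; for a rank-1 input only axis 0 passes.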
pub fn getAxis(self: &IDynamicQuantizeLayer) -> i32
Get the axis along which blocking occurs.
See setAxis().
pub fn setBlockSize(self: Pin<&mut IDynamicQuantizeLayer>, size: i32)
Set the size of the quantization block.
Note: The block size must evenly divide the input's extent along the blocked axis. Valid values are 16 (NVFP4 quantization) and 32 (MXFP8 quantization).
See getBlockSize().
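The two block-size rules (a valid value, and even division of the blocked axis) can be sketched as a small pre-check. The `QuantMode` enum and both helper functions are hypothetical names introduced for illustration; they are not part of this API.

```rust
/// Hypothetical label for the two quantization schemes named in the note.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuantMode {
    Nvfp4,
    Mxfp8,
}

/// Map a block size to the scheme it belongs to: 16 for NVFP4, 32 for MXFP8.
fn mode_for_block_size(size: i32) -> Option<QuantMode> {
    match size {
        16 => Some(QuantMode::Nvfp4),
        32 => Some(QuantMode::Mxfp8),
        _ => None,
    }
}

/// Number of blocks along the blocked axis, or `None` when the block size is
/// not a valid value or does not divide `extent` without remainder.
fn blocks_along_axis(extent: i64, size: i32) -> Option<i64> {
    mode_for_block_size(size)?;
    let size = i64::from(size);
    if extent > 0 && extent % size == 0 {
        Some(extent / size)
    } else {
        None
    }
}
```

An axis of extent 64 yields four blocks of size 16 or two blocks of size 32, while an extent of 48 is rejected for block size 32 because it leaves a remainder.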
pub fn getBlockSize(self: &IDynamicQuantizeLayer) -> i32
Get the size of the quantization block.
See setBlockSize().
pub fn setBlockShape(self: Pin<&mut IDynamicQuantizeLayer>, blockShape: &Dims64)
Set the shape of the quantization block.
Note: The rank of the block shape must match the rank of the input. The default value is an empty Dims.
See getBlockShape().
pub fn getBlockShape(self: &IDynamicQuantizeLayer) -> Dims64
Get the shape of the quantization block.
The default value is an empty Dims.
See setBlockShape().