Struct QuantizationHardwareFeatures

pub struct QuantizationHardwareFeatures {
    pub supports_int8_simd: bool,
    pub supports_int4_packed: bool,
    pub supports_vnni: bool,
    pub supports_dp4a: bool,
    pub supports_tensor_cores: bool,
    pub supports_mixed_precision: bool,
    pub max_parallel_ops: usize,
}

Hardware-specific quantization features available on the current device

This struct encapsulates the hardware capabilities available for quantization operations, enabling the system to choose optimal implementations based on what the hardware supports.

Fields§

§supports_int8_simd: bool

Supports INT8 SIMD operations

Indicates whether the hardware can perform vectorized INT8 operations, which can significantly accelerate quantized computations.

§supports_int4_packed: bool

Supports packed INT4 operations

Some hardware can efficiently handle sub-byte quantization formats such as INT4, where two 4-bit values are packed into each byte.
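
To make the packed layout concrete, here is a minimal sketch of packing and unpacking two signed 4-bit values per byte. The helper names `pack_int4` and `unpack_int4` are hypothetical, and the nibble order (first value in the low nibble) is an assumption; the layout this crate actually uses is not documented here.

```rust
/// Pack two signed 4-bit values (each in -8..=7) into one byte.
/// Assumed layout: `lo` in bits 0..4, `hi` in bits 4..8.
pub fn pack_int4(lo: i8, hi: i8) -> u8 {
    ((lo as u8) & 0x0F) | (((hi as u8) & 0x0F) << 4)
}

/// Unpack a byte into its two signed 4-bit values, sign-extending each nibble.
pub fn unpack_int4(byte: u8) -> (i8, i8) {
    // Move the low nibble to the top, then arithmetic-shift back to sign-extend.
    let lo = ((byte << 4) as i8) >> 4;
    // The high nibble is already at the top; the arithmetic shift sign-extends it.
    let hi = (byte as i8) >> 4;
    (lo, hi)
}
```

Values outside -8..=7 are silently truncated to their low 4 bits in this sketch, so callers would need to clamp beforehand.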

§supports_vnni: bool

Supports VNNI (Vector Neural Network Instructions)

Intel’s VNNI extension fuses the multiply-accumulate of 8-bit or 16-bit integers into 32-bit accumulators, which particularly benefits quantized models.

§supports_dp4a: bool

Supports DP4A (4-element dot product and accumulate)

NVIDIA’s DP4A instruction computes the dot product of two packed vectors of four 8-bit integers and adds it to a 32-bit accumulator in a single instruction, which suits quantized matrix operations on CUDA devices.
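
As a reference for what the instruction computes, the following scalar sketch reproduces the signed DP4A arithmetic on the CPU. The function name `dp4a_reference` is hypothetical and the wrapping accumulation is an assumption made here to avoid overflow panics; it is only an illustration, not how this crate invokes the hardware instruction.

```rust
/// Scalar reference for a signed 4-element dot product with accumulate:
/// acc + a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3], computed in 32 bits.
pub fn dp4a_reference(a: [i8; 4], b: [i8; 4], acc: i32) -> i32 {
    a.iter()
        .zip(b.iter())
        // Widen each 8-bit operand to i32 before multiplying so products cannot overflow.
        .fold(acc, |sum, (&x, &y)| sum.wrapping_add(i32::from(x) * i32::from(y)))
}
```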

§supports_tensor_cores: bool

Supports tensor core operations

Modern GPUs include specialized tensor cores for mixed-precision and quantized neural network computations.

§supports_mixed_precision: bool

Supports mixed precision operations

Indicates whether the hardware can efficiently mix different quantization precisions within the same computation.

§max_parallel_ops: usize

Maximum number of parallel operations

The optimal number of parallel operations for this hardware, used for scheduling and batching decisions.
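
The flags above are meant to drive implementation selection. A minimal sketch of such a dispatch follows, using a local mirror of the struct so the snippet compiles on its own. The `Int8Kernel` enum, the `select_int8_kernel` function, and the priority order are all assumptions for illustration and are not part of the documented API.

```rust
// Local mirror of the documented struct, so this sketch is self-contained.
#[derive(Clone, Debug)]
pub struct QuantizationHardwareFeatures {
    pub supports_int8_simd: bool,
    pub supports_int4_packed: bool,
    pub supports_vnni: bool,
    pub supports_dp4a: bool,
    pub supports_tensor_cores: bool,
    pub supports_mixed_precision: bool,
    pub max_parallel_ops: usize,
}

/// Hypothetical kernel identifiers used only in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Int8Kernel {
    TensorCore,
    Dp4a,
    Vnni,
    Simd,
    Scalar,
}

/// Pick the most specialized INT8 kernel the flags allow.
/// The priority order here is an assumption, not the crate's policy.
pub fn select_int8_kernel(hw: &QuantizationHardwareFeatures) -> Int8Kernel {
    if hw.supports_tensor_cores {
        Int8Kernel::TensorCore
    } else if hw.supports_dp4a {
        Int8Kernel::Dp4a
    } else if hw.supports_vnni {
        Int8Kernel::Vnni
    } else if hw.supports_int8_simd {
        Int8Kernel::Simd
    } else {
        // No acceleration flag set: fall back to a portable scalar path.
        Int8Kernel::Scalar
    }
}
```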

Implementations§

impl QuantizationHardwareFeatures

pub fn detect_for_device(device: &Device) -> Self

Detect hardware features for the given device

Performs runtime detection of available hardware acceleration features and returns a capabilities structure.

§Arguments

device - The target device to analyze

§Returns

A QuantizationHardwareFeatures struct with detected capabilities
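
How `detect_for_device` probes the hardware is not documented here. As a rough illustration of runtime detection for a CPU target only, the sketch below uses the standard library's feature-detection macro. The function `detect_cpu_features`, the mapping from AVX2 to `supports_int8_simd`, and the use of the thread count for `max_parallel_ops` are all assumptions; GPU-related flags are simply left `false`.

```rust
// Local mirror of the documented struct, so this sketch is self-contained.
#[derive(Clone, Debug)]
pub struct QuantizationHardwareFeatures {
    pub supports_int8_simd: bool,
    pub supports_int4_packed: bool,
    pub supports_vnni: bool,
    pub supports_dp4a: bool,
    pub supports_tensor_cores: bool,
    pub supports_mixed_precision: bool,
    pub max_parallel_ops: usize,
}

/// Hypothetical CPU-only probe; the real method takes a `Device` and may differ.
pub fn detect_cpu_features() -> QuantizationHardwareFeatures {
    // On x86 targets, query the CPU at runtime.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let (int8_simd, vnni) = (
        std::arch::is_x86_feature_detected!("avx2"),
        std::arch::is_x86_feature_detected!("avx512vnni"),
    );
    // On other architectures this sketch reports no acceleration.
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    let (int8_simd, vnni) = (false, false);

    // Assumption: use the number of available threads as the parallelism hint.
    let threads = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);

    QuantizationHardwareFeatures {
        supports_int8_simd: int8_simd,
        supports_int4_packed: false,
        supports_vnni: vnni,
        supports_dp4a: false,
        supports_tensor_cores: false,
        supports_mixed_precision: false,
        max_parallel_ops: threads,
    }
}
```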

pub fn supports_dtype_efficiently(&self, dtype: &QuantizedDType) -> bool

Check if the hardware supports a specific quantization data type efficiently

§Arguments

dtype - The quantization data type to check

§Returns

true if the hardware can efficiently process this data type
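
The exact rule behind this check is not shown on this page. One plausible shape, sketched under stated assumptions, maps each data type to the flags that would accelerate it. The `QuantizedDType` variants below (`Int8`, `UInt8`, `Int4`) are placeholders, since the real enum's variants are not listed here, and the free function stands in for the method.

```rust
// Local mirror of the documented struct, so this sketch is self-contained.
#[derive(Clone, Debug)]
pub struct QuantizationHardwareFeatures {
    pub supports_int8_simd: bool,
    pub supports_int4_packed: bool,
    pub supports_vnni: bool,
    pub supports_dp4a: bool,
    pub supports_tensor_cores: bool,
    pub supports_mixed_precision: bool,
    pub max_parallel_ops: usize,
}

/// Placeholder enum: the real `QuantizedDType` variants are assumed, not documented here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum QuantizedDType {
    Int8,
    UInt8,
    Int4,
}

/// Assumed rule: 8-bit types are efficient if any 8-bit acceleration path exists,
/// and INT4 is efficient only with packed INT4 support.
pub fn supports_dtype_efficiently(hw: &QuantizationHardwareFeatures, dtype: &QuantizedDType) -> bool {
    match dtype {
        QuantizedDType::Int8 | QuantizedDType::UInt8 => {
            hw.supports_int8_simd
                || hw.supports_vnni
                || hw.supports_dp4a
                || hw.supports_tensor_cores
        }
        QuantizedDType::Int4 => hw.supports_int4_packed,
    }
}
```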

pub fn optimal_block_size(&self) -> usize

Get the optimal block size for parallel operations

Returns the recommended block size for batching operations based on hardware characteristics and parallelism capabilities.
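
The heuristic used by `optimal_block_size` is not described here. The sketch below shows one way such a value could be derived and then used to split work into blocks. The lane counts (64 for VNNI, 32 for INT8 SIMD, 8 otherwise), the power-of-two rounding, and the names `sketch_block_size` and `count_blocks` are assumptions chosen purely for illustration.

```rust
// Local mirror of the documented struct, so this sketch is self-contained.
#[derive(Clone, Debug)]
pub struct QuantizationHardwareFeatures {
    pub supports_int8_simd: bool,
    pub supports_int4_packed: bool,
    pub supports_vnni: bool,
    pub supports_dp4a: bool,
    pub supports_tensor_cores: bool,
    pub supports_mixed_precision: bool,
    pub max_parallel_ops: usize,
}

/// Hypothetical heuristic: elements per vector times the parallelism hint,
/// rounded up to a power of two.
pub fn sketch_block_size(hw: &QuantizationHardwareFeatures) -> usize {
    // Assumed lane counts: 512-bit vectors with VNNI, 256-bit with plain INT8 SIMD.
    let lanes = if hw.supports_vnni {
        64
    } else if hw.supports_int8_simd {
        32
    } else {
        8
    };
    // Treat a zero hint as one so the block size is never zero.
    (lanes * hw.max_parallel_ops.max(1)).next_power_of_two()
}

/// Number of blocks needed to cover `len` elements (ceiling division).
pub fn count_blocks(len: usize, block: usize) -> usize {
    (len + block - 1) / block
}
```

A caller would then iterate over `data.chunks(block)` and hand each chunk to a kernel, with the last chunk possibly shorter than the rest.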

pub fn performance_ranking(&self) -> Vec<QuantizationScheme>

Get the performance preference ranking for quantization schemes

Returns quantization schemes ordered by expected performance on this hardware, with the fastest schemes first.
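
Because the ranking is ordered fastest first, a caller can take the first entry that also satisfies its own constraints (for example, an accuracy budget). The sketch below shows that consumption pattern with a generic helper. The `first_allowed` function and the `SketchScheme` enum are hypothetical; the variants of the real `QuantizationScheme` are not listed on this page.

```rust
/// Stand-in for `QuantizationScheme`; these variants are assumptions for illustration.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum SketchScheme {
    Int4,
    Int8,
    Fp16,
}

/// Return the highest-ranked scheme that also appears in `allowed`.
/// `ranking` is expected to be ordered fastest first, as the method documents.
pub fn first_allowed<S: PartialEq + Clone>(ranking: &[S], allowed: &[S]) -> Option<S> {
    ranking.iter().find(|s| allowed.contains(*s)).cloned()
}
```

With a ranking of `[Int4, Int8, Fp16]` and an allowed set of `[Fp16, Int8]`, this helper returns `Some(Int8)`, because `Int8` is the fastest scheme the caller accepts.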

Trait Implementations§

impl Clone for QuantizationHardwareFeatures

fn clone(&self) -> QuantizationHardwareFeatures

Returns a duplicate of the value.

1.0.0

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for QuantizationHardwareFeatures

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for QuantizationHardwareFeatures

fn default() -> Self

Conservative default hardware features

Returns a conservative set of capabilities that should work on any hardware without advanced acceleration features.
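
The concrete default values are not listed on this page. Assuming that "conservative" means every acceleration flag is `false` and the parallelism hint is `1`, the sketch below shows how such a default combines with struct update syntax to enable a single known capability. Both the `Default` body and the helper `cpu_with_int8` are illustrative assumptions on a local mirror of the struct.

```rust
// Local mirror of the documented struct, so this sketch is self-contained.
#[derive(Clone, Debug)]
pub struct QuantizationHardwareFeatures {
    pub supports_int8_simd: bool,
    pub supports_int4_packed: bool,
    pub supports_vnni: bool,
    pub supports_dp4a: bool,
    pub supports_tensor_cores: bool,
    pub supports_mixed_precision: bool,
    pub max_parallel_ops: usize,
}

impl Default for QuantizationHardwareFeatures {
    /// Assumed conservative values; the crate's actual defaults may differ.
    fn default() -> Self {
        Self {
            supports_int8_simd: false,
            supports_int4_packed: false,
            supports_vnni: false,
            supports_dp4a: false,
            supports_tensor_cores: false,
            supports_mixed_precision: false,
            max_parallel_ops: 1,
        }
    }
}

/// Hypothetical helper: start from the defaults and enable only INT8 SIMD.
pub fn cpu_with_int8() -> QuantizationHardwareFeatures {
    QuantizationHardwareFeatures {
        supports_int8_simd: true,
        ..Default::default()
    }
}
```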

Auto Trait Implementations§

impl UnwindSafe for QuantizationHardwareFeatures

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
