Module quantization

Quantization-aware linear algebra operations

This module provides functions and types for working with quantized matrices and vectors. Quantization reduces the precision of floating-point numbers to save memory and computational resources, which is particularly useful in machine learning applications.

§Overview

  • Quantization of matrices and vectors to lower bit-width representations
  • Linear algebra operations on quantized data
  • Support for different quantization methods (uniform, symmetric, affine)
  • Efficient operations with mixed quantized and floating-point data
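To make the listed quantization methods concrete, here is a self-contained sketch of the arithmetic behind 8-bit affine and symmetric quantization. This is plain std-only Rust written for illustration; the function names and exact rounding choices are this sketch's assumptions, not `scirs2_linalg`'s internals.

```rust
/// Affine (asymmetric) quantization: maps [min, max] onto [0, 255]
/// via q = round((x - min) / scale), with scale = (max - min) / 255.
fn affine_quantize(data: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = data.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = data.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = (max - min) / 255.0;
    let q = data.iter().map(|&x| ((x - min) / scale).round() as u8).collect();
    (q, scale, min)
}

/// Symmetric quantization: maps [-absmax, absmax] onto [-127, 127]
/// via q = round(x / scale), with scale = absmax / 127; zero maps to zero.
fn symmetric_quantize(data: &[f32]) -> (Vec<i8>, f32) {
    let absmax = data.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
    let scale = absmax / 127.0;
    let q = data.iter().map(|&x| (x / scale).round() as i8).collect();
    (q, scale)
}

fn main() {
    let data = [1.0_f32, 2.5, 3.7, -4.2];
    let (qa, scale_a, min) = affine_quantize(&data);
    let (qs, scale_s) = symmetric_quantize(&data);
    // Round-to-nearest bounds the per-element error by half a step (scale / 2).
    for (i, &x) in data.iter().enumerate() {
        let xa = qa[i] as f32 * scale_a + min; // affine dequantization
        let xs = qs[i] as f32 * scale_s;       // symmetric dequantization
        assert!((x - xa).abs() <= scale_a / 2.0 + 1e-6);
        assert!((x - xs).abs() <= scale_s / 2.0 + 1e-6);
    }
    println!("affine: {:?}, symmetric: {:?}", qa, qs);
}
```

The key trade-off: affine quantization uses the full range for skewed data, while symmetric quantization preserves an exact zero and keeps integer arithmetic simpler (no zero-point term).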

§Examples

Basic quantization:

use scirs2_core::ndarray::{Array2, array};
use scirs2_linalg::quantization::{quantize_matrix, dequantize_matrix, QuantizationMethod};

let a = array![[1.0_f32, 2.5, 3.7], [4.2, 5.0, 6.1]];

// Quantize to 8-bit
let (quantized, params) = quantize_matrix(&a.view(), 8, QuantizationMethod::Affine);

// Dequantize back to floating point
let a_dequantized = dequantize_matrix(&quantized, &params);

// Check the error exists but is bounded
let max_error = (&a - &a_dequantized).mapv(|x| x.abs()).fold(0.0_f32, |acc, &b| acc.max(b));
assert!(max_error > 0.0); // There should be some quantization error
assert!(max_error < 10.0); // But it should be bounded
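The re-exported `fake_quantize` follows the same idea as the round trip above: quantize and immediately dequantize, so the output stays `f32` but carries the rounding error a low-bit representation would introduce. The following std-only sketch shows what such a pass computes under an affine scheme; the function name and exact formula are this sketch's assumptions, not the crate's implementation.

```rust
/// Quantize to `bits` levels and dequantize in one step, returning f32
/// values that carry the quantization error (useful for simulating
/// quantized inference, e.g. in quantization-aware training).
fn fake_quantize_affine(data: &[f32], bits: u32) -> Vec<f32> {
    let levels = ((1u32 << bits) - 1) as f32; // e.g. 255 for 8 bits
    let min = data.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = data.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = (max - min) / levels;
    data.iter()
        .map(|&x| ((x - min) / scale).round() * scale + min)
        .collect()
}

fn main() {
    let a = [1.0_f32, 2.5, 3.7, 4.2, 5.0, 6.1];
    let fq = fake_quantize_affine(&a, 8);
    let max_err = a
        .iter()
        .zip(&fq)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0_f32, f32::max);
    // With round-to-nearest, the error is at most half a quantization step.
    assert!(max_err <= (6.1 - 1.0) / 255.0 / 2.0 + 1e-6);
    println!("max round-trip error: {max_err}");
}
```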

Quantized matrix multiplication:

use scirs2_core::ndarray::{Array2, array};
use scirs2_linalg::quantization::{quantize_matrix, QuantizationMethod, quantized_matmul};

let a = array![[1.0_f32, 2.0], [3.0, 4.0]];
let b = array![[5.0_f32, 6.0], [7.0, 8.0]];

// Quantize both matrices to 8-bit
let (a_q, a_params) = quantize_matrix(&a.view(), 8, QuantizationMethod::Symmetric);
let (b_q, b_params) = quantize_matrix(&b.view(), 8, QuantizationMethod::Symmetric);

// Perform quantized matrix multiplication
let c_q = quantized_matmul(&a_q, &a_params, &b_q, &b_params).unwrap();

// Regular matrix multiplication for comparison
let c = a.dot(&b);

// Check the error is acceptable
let rel_error = (&c - &c_q).mapv(|x| x.abs()).sum() / c.sum();
assert!(rel_error < 0.1); // Relative error should be small
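The core idea inside a symmetric quantized matmul can be sketched without the crate: multiply `i8` entries while accumulating in `i32`, then rescale the integer result once by `scale_a * scale_b` to recover an `f32` approximation of `A·B`. The sketch below mirrors the concept only; `scirs2_linalg`'s internals may differ.

```rust
/// Symmetric per-tensor quantization of a row-major matrix to i8.
fn symmetric_quantize_matrix(m: &[Vec<f32>]) -> (Vec<Vec<i8>>, f32) {
    let absmax = m.iter().flatten().fold(0.0_f32, |acc, &x| acc.max(x.abs()));
    let scale = absmax / 127.0;
    let q = m
        .iter()
        .map(|row| row.iter().map(|&x| (x / scale).round() as i8).collect())
        .collect();
    (q, scale)
}

/// Integer matmul with a single rescale at the end. `scale` should be
/// the product of the two matrices' quantization scales.
fn quantized_matmul_i8(a: &[Vec<i8>], b: &[Vec<i8>], scale: f32) -> Vec<Vec<f32>> {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    (0..n)
        .map(|i| {
            (0..m)
                .map(|j| {
                    // Accumulate exactly in i32 (no overflow for small k),
                    // then convert to f32 once.
                    let acc: i32 = (0..k).map(|p| a[i][p] as i32 * b[p][j] as i32).sum();
                    acc as f32 * scale
                })
                .collect()
        })
        .collect()
}

fn main() {
    let a = vec![vec![1.0_f32, 2.0], vec![3.0, 4.0]];
    let b = vec![vec![5.0_f32, 6.0], vec![7.0, 8.0]];
    let (aq, sa) = symmetric_quantize_matrix(&a);
    let (bq, sb) = symmetric_quantize_matrix(&b);
    let c = quantized_matmul_i8(&aq, &bq, sa * sb);
    // Exact result is [[19, 22], [43, 50]]; the quantized result is close.
    assert!((c[0][0] - 19.0).abs() / 19.0 < 0.05);
    assert!((c[1][1] - 50.0).abs() / 50.0 < 0.05);
    println!("{c:?}");
}
```

Deferring the rescale to the very end is what makes quantized matmul fast: the inner loop is pure integer arithmetic, with one floating-point multiply per output element.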

Re-exports§

pub use self::types::QuantizationMethod;
pub use self::types::QuantizationParams;
pub use self::types::QuantizedDataType;
pub use self::matrix::get_quantizedmatrix_2d_i8;
pub use self::matrix::QuantizedData2D;
pub use self::matrix::QuantizedMatrix;
pub use self::vector::get_quantized_vector_1d_i8;
pub use self::vector::QuantizedData1D;
pub use self::vector::QuantizedVector;
pub use self::conversions::dequantize_matrix;
pub use self::conversions::dequantize_vector_public as dequantize_vector;
pub use self::conversions::fake_quantize;
pub use self::conversions::fake_quantize_vector;
pub use self::conversions::quantize_matrix;
pub use self::conversions::quantize_matrix_per_channel;
pub use self::conversions::quantize_vector;
pub use self::operations::quantized_dot;
pub use self::operations::quantized_matmul;
pub use self::operations::quantized_matvec;
pub use calibration::*;
pub use calibration_ema::*;
pub use fusion::*;
pub use out_of_core::*;
pub use quantized_matrixfree::*;
pub use simd::*;
pub use solvers::*;
pub use stability::*;

Modules§

calibration
Calibration utilities for quantization
calibration_ema
Calibration utilities based on exponential moving averages (EMA)
conversions
Quantization conversion functions
fusion
Fusion of consecutive quantized operations
matrix
Matrix quantization types and implementations
operations
Quantized linear algebra operations
out_of_core
Out-of-core operations for quantized tensors
quantized_matrixfree
Matrix-free operations for quantized tensors
simd
SIMD-accelerated operations for quantized matrices
solvers
Specialized iterative solvers for quantized matrix representations
stability
Numerical stability analysis and validation for quantization operations
types
Quantization types and enums
vector
Vector quantization types and implementations