Quantization-aware linear algebra operations
This module provides functions and types for working with quantized matrices and vectors. Quantization reduces the precision of floating-point numbers to save memory and computational resources, which is particularly useful in machine learning applications.
§Overview
- Quantization of matrices and vectors to lower bit-width representations
- Linear algebra operations on quantized data
- Support for different quantization methods (uniform, symmetric, affine)
- Efficient operations with mixed quantized and floating-point data
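The methods listed above differ mainly in how they map floats to integers. As a rough, crate-independent sketch (these helper names are illustrative, not part of the scirs2_linalg API), affine quantization stores a scale and a zero point so it can represent asymmetric ranges exactly, while symmetric quantization stores only a scale and maps zero to zero:

```rust
// Illustrative only: typical 8-bit parameter computation, assuming min < max.

// Affine (asymmetric): q = round(x / scale) + zero_point, covering [min, max]
// with the 256 levels of a signed byte.
fn affine_params(min: f32, max: f32) -> (f32, i32) {
    let scale = (max - min) / 255.0;
    let zero_point = (-min / scale).round() as i32 - 128;
    (scale, zero_point)
}

// Symmetric: q = round(x / scale); zero maps exactly to integer 0,
// at the cost of wasting range when the data is not centered.
fn symmetric_scale(max_abs: f32) -> f32 {
    max_abs / 127.0
}

fn main() {
    let (scale, zp) = affine_params(0.0, 2.55);
    println!("affine: scale={scale}, zero_point={zp}");
    println!("symmetric: scale={}", symmetric_scale(1.27));
}
```

Symmetric quantization is usually preferred for weights (no zero-point cross terms in matmul), affine for activations with skewed ranges.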
§Examples
Basic quantization:
use scirs2_core::ndarray::{Array2, array};
use scirs2_linalg::quantization::{quantize_matrix, dequantize_matrix, QuantizationMethod};
let a = array![[1.0_f32, 2.5, 3.7], [4.2, 5.0, 6.1]];
// Quantize to 8-bit
let (quantized, params) = quantize_matrix(&a.view(), 8, QuantizationMethod::Affine);
// Dequantize back to floating point
let a_dequantized = dequantize_matrix(&quantized, &params);
// Check the error exists but is bounded
let max_error = (&a - &a_dequantized).mapv(|x| x.abs()).fold(0.0_f32, |acc, &b| acc.max(b));
assert!(max_error > 0.0); // There should be some quantization error
assert!(max_error < 10.0); // But it should be bounded
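The round-trip error asserted above comes from rounding to the integer grid. A stand-alone sketch of 8-bit affine quantization (hypothetical helpers working on slices, not this crate's quantize_matrix/dequantize_matrix) shows why that error is bounded by roughly half the quantization step:

```rust
// Illustrative only: affine-quantize a slice to i8, then dequantize.
// Assumes the data is non-degenerate (min < max).
fn quantize_affine(xs: &[f32], bits: u32) -> (Vec<i8>, f32, i32) {
    let (min, max) = xs
        .iter()
        .fold((f32::MAX, f32::MIN), |(lo, hi), &x| (lo.min(x), hi.max(x)));
    let levels = (1u32 << bits) as f32 - 1.0; // 255 levels for 8 bits
    let scale = (max - min) / levels;
    let zero_point = (-min / scale).round() as i32 - 128;
    let q = xs
        .iter()
        .map(|&x| ((x / scale).round() as i32 + zero_point).clamp(-128, 127) as i8)
        .collect();
    (q, scale, zero_point)
}

fn dequantize_affine(q: &[i8], scale: f32, zero_point: i32) -> Vec<f32> {
    q.iter()
        .map(|&v| (v as i32 - zero_point) as f32 * scale)
        .collect()
}

fn main() {
    let data = [1.0f32, 2.5, 3.7, 4.2, 5.0, 6.0];
    let (q, scale, zp) = quantize_affine(&data, 8);
    let back = dequantize_affine(&q, scale, zp);
    let max_err = data
        .iter()
        .zip(&back)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    // Since dequant(x) = round(x / scale) * scale, the error per element
    // is at most scale / 2 (absent clamping).
    println!("scale={scale}, max_err={max_err}");
}
```

This is also why the doc example's bound of 10.0 is very loose: for well-ranged f32 data the error shrinks with the data's dynamic range divided by 2^bits.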
Quantized matrix multiplication:
use scirs2_core::ndarray::{Array2, array};
use scirs2_linalg::quantization::{quantize_matrix, QuantizationMethod, quantized_matmul};
let a = array![[1.0_f32, 2.0], [3.0, 4.0]];
let b = array![[5.0_f32, 6.0], [7.0, 8.0]];
// Quantize both matrices to 8-bit
let (a_q, a_params) = quantize_matrix(&a.view(), 8, QuantizationMethod::Symmetric);
let (b_q, b_params) = quantize_matrix(&b.view(), 8, QuantizationMethod::Symmetric);
// Perform quantized matrix multiplication
let c_q = quantized_matmul(&a_q, &a_params, &b_q, &b_params).unwrap();
// Regular matrix multiplication for comparison
let c = a.dot(&b);
// Check the error is acceptable
let rel_error = (&c - &c_q).mapv(|x| x.abs()).sum() / c.sum();
assert!(rel_error < 0.1); // Relative error should be small
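Under symmetric quantization, products of i8 values can be accumulated exactly in i32 and rescaled just once at the end, which is the usual idea behind functions like quantized_matmul. A minimal sketch of one such dot product (hypothetical helpers, assuming per-tensor symmetric scales rather than this crate's actual internals):

```rust
// Illustrative only: symmetric per-tensor quantization of a slice.
fn sym_quantize(xs: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = max_abs / 127.0;
    let q = xs
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

// Integer dot product: i8 * i8 fits in i16, and summing in i32 is exact
// for any realistic vector length; one float rescale recovers the result.
fn quantized_dot_i8(a: &[i8], b: &[i8], scale_a: f32, scale_b: f32) -> f32 {
    let acc: i32 = a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum();
    acc as f32 * scale_a * scale_b
}

fn main() {
    let (qa, sa) = sym_quantize(&[1.0, 2.0]);
    let (qb, sb) = sym_quantize(&[5.0, 6.0]);
    // True dot product is 1*5 + 2*6 = 17.
    println!("approx dot = {}", quantized_dot_i8(&qa, &qb, sa, sb));
}
```

The single exact integer accumulation is what makes quantized matmul cheap: all rounding happens at quantization time, not inside the inner loop.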
Re-exports§
pub use self::types::QuantizationMethod;
pub use self::types::QuantizationParams;
pub use self::types::QuantizedDataType;
pub use self::matrix::get_quantizedmatrix_2d_i8;
pub use self::matrix::QuantizedData2D;
pub use self::matrix::QuantizedMatrix;
pub use self::vector::get_quantized_vector_1d_i8;
pub use self::vector::QuantizedData1D;
pub use self::vector::QuantizedVector;
pub use self::conversions::dequantize_matrix;
pub use self::conversions::dequantize_vector_public as dequantize_vector;
pub use self::conversions::fake_quantize;
pub use self::conversions::fake_quantize_vector;
pub use self::conversions::quantize_matrix;
pub use self::conversions::quantize_matrix_per_channel;
pub use self::conversions::quantize_vector;
pub use self::operations::quantized_dot;
pub use self::operations::quantized_matmul;
pub use self::operations::quantized_matvec;
pub use calibration::*;
pub use calibration_ema::*;
pub use fusion::*;
pub use out_of_core::*;
pub use quantized_matrixfree::*;
pub use simd::*;
pub use solvers::*;
pub use stability::*;
Modules§
- calibration
- Calibration utilities for quantization
- calibration_ema
- conversions
- Quantization conversion functions
- fusion
- Fusion of consecutive quantized operations
- matrix
- Matrix quantization types and implementations
- operations
- Quantized linear algebra operations
- out_of_core
- Out-of-core operations for quantized tensors
- quantized_matrixfree
- Matrix-free operations for quantized tensors
- simd
- SIMD-accelerated operations for quantized matrices
- solvers
- Specialized iterative solvers for quantized matrix representations
- stability
- Numerical stability analysis and validation for quantization operations
- types
- Quantization types and enums
- vector
- Vector quantization types and implementations