Function quantized_matmul 

pub fn quantized_matmul(
    lhs: &QuantizedTensor,
    rhs: &QuantizedTensor,
    mode: QuantMode,
) -> Result<QuantizedTensor, ModelError>

Integer-accumulating quantized matmul: computes C = A @ B directly in the INT8 domain with INT32 accumulation, avoiding a per-element dequantize to f32:

C_f32[i,j] = scale_a * scale_b * sum_k((A_i8[i,k] - zp_a) * (B_i8[k,j] - zp_b))

The f32 result is then re-quantized to produce the output tensor. A naive alternative would dequantize both operands, run an f32 matmul, and re-quantize; a production path would instead use an optimized integer GEMM kernel.
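The accumulation scheme above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the real `QuantizedTensor` and `QuantMode` types are not shown here, so the `QTensor` struct, its field names, and the explicit output scale/zero-point parameters are all assumptions made for the example.

```rust
/// Hypothetical stand-in for the crate's QuantizedTensor (field names assumed).
#[derive(Debug)]
struct QTensor {
    data: Vec<i8>,   // row-major INT8 values
    rows: usize,
    cols: usize,
    scale: f32,      // per-tensor scale
    zero_point: i32, // per-tensor zero point
}

/// C = A @ B with INT32 accumulation; result re-quantized with (out_scale, out_zp).
fn quantized_matmul_sketch(a: &QTensor, b: &QTensor, out_scale: f32, out_zp: i32) -> QTensor {
    assert_eq!(a.cols, b.rows, "inner dimensions must match");
    let mut out = vec![0i8; a.rows * b.cols];
    for i in 0..a.rows {
        for j in 0..b.cols {
            // Accumulate in the integer domain: no per-element dequantize to f32.
            let mut acc: i32 = 0;
            for k in 0..a.cols {
                let av = a.data[i * a.cols + k] as i32 - a.zero_point;
                let bv = b.data[k * b.cols + j] as i32 - b.zero_point;
                acc += av * bv;
            }
            // C_f32[i,j] = scale_a * scale_b * acc, then re-quantize to INT8.
            let c_f32 = a.scale * b.scale * acc as f32;
            let q = (c_f32 / out_scale).round() as i32 + out_zp;
            out[i * b.cols + j] = q.clamp(-128, 127) as i8;
        }
    }
    QTensor { data: out, rows: a.rows, cols: b.cols, scale: out_scale, zero_point: out_zp }
}

fn main() {
    // A = [[1, 2], [3, 4]] times the identity, with unit scales and zero zero-points,
    // should round-trip to the same INT8 values.
    let a = QTensor { data: vec![1, 2, 3, 4], rows: 2, cols: 2, scale: 1.0, zero_point: 0 };
    let b = QTensor { data: vec![1, 0, 0, 1], rows: 2, cols: 2, scale: 1.0, zero_point: 0 };
    let c = quantized_matmul_sketch(&a, &b, 1.0, 0);
    println!("{:?}", c.data); // → [1, 2, 3, 4]
}
```

Note that because each product term fits in 16 bits, an i32 accumulator is safe for any inner dimension up to 2^16, which is why INT32 accumulation is the standard choice for INT8 GEMM.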