pub fn quantized_matmul(
lhs: &QuantizedTensor,
rhs: &QuantizedTensor,
mode: QuantMode,
) -> Result<QuantizedTensor, ModelError>
Integer-accumulating quantized matmul: C = A @ B in INT8 with INT32 accumulation.

A naive implementation would dequantize both operands to f32, run an f32 matmul, and re-quantize the result; a production path uses integer GEMM. This routine avoids the f32 round trip and computes directly in the integer domain:

C_f32[i,j] = scale_a * scale_b * sum_k((A_i8[i,k] - zp_a) * (B_i8[k,j] - zp_b))

The f32 result is then re-quantized according to `mode`.