Module linear

QuantizedLinear: an INT8 weight matrix with per-channel or per-tensor scale/zero_point.

§Design

Weights are stored as Array2<i8> with shape (out_features, in_features). The forward pass dequantizes the full weight matrix on every call and then performs a standard f64 matmul. This is a deliberately simple, CPU-first implementation: it saves memory for weight storage but does not speed up the matmul itself. An integer matmul (packed int8 GEMM) is left as a future follow-up.

For PerChannel granularity, each row c (one output channel) has its own scale[c] and zero_point[c]. For PerTensor, a single scale/zero_point pair applies to all elements.
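
The dequantize-then-matmul design described above can be sketched roughly as follows. This is an illustration only, not the crate's actual code: it uses flat row-major slices instead of Array2 so that it has no dependencies, the Granularity enum, dequantize, and forward are hypothetical names, and it assumes the common affine scheme w = (q - zero_point) * scale with an i8 zero point, which this page does not spell out.

```rust
/// Hypothetical stand-in for the granularity setting (not the crate's type).
#[derive(Debug, Clone)]
pub enum Granularity {
    /// One (scale, zero_point) pair for the whole matrix.
    PerTensor { scale: f64, zero_point: i8 },
    /// One (scale, zero_point) pair per output channel (row).
    PerChannel { scales: Vec<f64>, zero_points: Vec<i8> },
}

/// Dequantize a row-major (out_features x in_features) i8 matrix to f64.
/// Assumption: affine quantization, w = (q - zero_point) * scale.
pub fn dequantize(
    q: &[i8],
    out_features: usize,
    in_features: usize,
    g: &Granularity,
) -> Vec<f64> {
    assert_eq!(q.len(), out_features * in_features);
    let mut w = Vec::with_capacity(q.len());
    for c in 0..out_features {
        // Pick the parameters that apply to output channel c.
        let (scale, zp) = match g {
            Granularity::PerTensor { scale, zero_point } => (*scale, *zero_point),
            Granularity::PerChannel { scales, zero_points } => (scales[c], zero_points[c]),
        };
        for &v in &q[c * in_features..(c + 1) * in_features] {
            // Widen to i32 first so the subtraction cannot overflow i8.
            w.push((v as i32 - zp as i32) as f64 * scale);
        }
    }
    w
}

/// Forward pass for one input vector: dequantize, then a plain f64 mat-vec product.
pub fn forward(
    q: &[i8],
    out_features: usize,
    in_features: usize,
    g: &Granularity,
    x: &[f64],
) -> Vec<f64> {
    assert_eq!(x.len(), in_features);
    let w = dequantize(q, out_features, in_features, g);
    (0..out_features)
        .map(|c| {
            w[c * in_features..(c + 1) * in_features]
                .iter()
                .zip(x)
                .map(|(a, b)| a * b)
                .sum::<f64>()
        })
        .collect()
}
```

In this sketch, the only difference between the two granularities is which (scale, zero_point) pair is looked up for row c. The f64 matmul that follows is identical in both cases.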

Structs§

QuantizedLinear: A weight matrix quantized to i8, with per-channel or per-tensor scale/zero_point.

Enums§

QuantizationError: Error type for quantization operations on linear layers.
