pub fn quantize_activations(
activations: &Tensor,
config: &BitNetConfig,
) -> Result<QuantizedActivations>
Quantize activations using per-token AbsMax scaling to INT8.
§Algorithm
For each token (row):
- Compute `scale = max(|X|) / 127`
- Compute `X_q = round(X / scale)`, clamped to `[-127, 127]`
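Below is a minimal standalone sketch of this per-token AbsMax scheme on a flat row-major `f32` buffer. It is an illustration only: `absmax_quantize_rows` is a hypothetical helper, not part of this crate's API, and the real function operates on `Tensor` values rather than slices.

```rust
/// Hypothetical illustration of per-token AbsMax INT8 quantization.
/// Each row of `hidden_dim` values is quantized with its own scale.
fn absmax_quantize_rows(x: &[f32], hidden_dim: usize) -> (Vec<i8>, Vec<f32>) {
    let mut q = Vec::with_capacity(x.len());
    let mut scales = Vec::with_capacity(x.len() / hidden_dim);
    for row in x.chunks(hidden_dim) {
        // scale = max(|X|) / 127, guarded so an all-zero row does not divide by zero
        let absmax = row.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
        let scale = (absmax / 127.0).max(f32::MIN_POSITIVE);
        scales.push(scale);
        // X_q = round(X / scale), clamped to [-127, 127]
        q.extend(row.iter().map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8));
    }
    (q, scales)
}

fn main() {
    // Two "tokens" with hidden_dim = 3; each row gets its own scale.
    let x = [0.5f32, -1.0, 0.25, 2.0, -4.0, 1.0];
    let (q, scales) = absmax_quantize_rows(&x, 3);
    println!("q = {q:?}, scales = {scales:?}");
    // Dequantize with x ≈ q[i] as f32 * scales[row] to recover approximate values.
}
```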
§Arguments
- `activations` - Input tensor of shape `[batch, seq_len, hidden_dim]` or `[batch, hidden_dim]`
- `config` - BitNet configuration
§Errors
Returns an error if quantization fails.