Module dot_i8

Int8 Dot Product Kernel

This module provides SIMD-accelerated int8 dot product computation for reranking candidates after the initial BPS scan.

Algorithm

dot(Q, V) = Σ_{d=0}^{D-1} Q[d] × V[d]

For D=768 dimensions with i8 values in [-127, 127]:

max_product = 127 × 127 = 16,129
max_sum = 768 × 16,129 = 12,387,072 < 2^31 - 1 (i32 max)

Thus, i32 accumulation is sufficient.
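The bound above can be checked with a scalar reference. This is a minimal sketch, not the module's actual code: the names `dot_i8_scalar` and `max_abs_dot` are hypothetical and exist only for illustration.

```rust
/// Scalar reference (hypothetical helper, not the module's API):
/// widen each i8 to i32 before multiplying, then accumulate in i32.
pub fn dot_i8_scalar(q: &[i8], v: &[i8]) -> i32 {
    assert_eq!(q.len(), v.len(), "vectors must have equal length");
    q.iter()
        .zip(v.iter())
        .map(|(&a, &b)| i32::from(a) * i32::from(b))
        .sum()
}

/// Worst-case magnitude of the sum for `dim` values in [-127, 127],
/// computed in i64 so the check itself cannot overflow.
pub fn max_abs_dot(dim: usize) -> i64 {
    dim as i64 * 127 * 127
}
```

For D=768 the worst case is reached when every element of both vectors is 127 (or every element of one is 127 and of the other -127), which stays well inside the i32 range.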

On x86_64 (AVX2): sign-extends the i8 values to i16, then uses _mm256_madd_epi16 to multiply the i16 pairs and add adjacent products into i32 lanes.
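A sketch of how the sign-extend plus `_mm256_madd_epi16` pattern might look with `std::arch` intrinsics. The function names `dot_i8_avx2` and `dot_i8_simd`, the 16-element chunking, and the scalar fallback are assumptions made for illustration; the module's real kernel may be structured differently.

```rust
/// AVX2 sketch (hypothetical name). Processes 16 elements per iteration.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dot_i8_avx2(q: &[i8], v: &[i8]) -> i32 {
    use std::arch::x86_64::*;
    let n = q.len();
    let mut acc = _mm256_setzero_si256(); // 8 x i32 partial sums
    let mut i = 0;
    while i + 16 <= n {
        // Load 16 i8 values from each input and sign-extend to 16 x i16.
        let a = _mm256_cvtepi8_epi16(_mm_loadu_si128(q.as_ptr().add(i) as *const __m128i));
        let b = _mm256_cvtepi8_epi16(_mm_loadu_si128(v.as_ptr().add(i) as *const __m128i));
        // Multiply i16 lanes and add adjacent products into 8 x i32.
        acc = _mm256_add_epi32(acc, _mm256_madd_epi16(a, b));
        i += 16;
    }
    // Horizontal reduction of the 8 i32 lanes.
    let mut lanes = [0i32; 8];
    _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, acc);
    let mut sum: i32 = lanes.iter().sum();
    // Scalar tail for lengths that are not a multiple of 16.
    while i < n {
        sum += i32::from(q[i]) * i32::from(v[i]);
        i += 1;
    }
    sum
}

/// Dispatcher (hypothetical name): uses AVX2 when available, else scalar.
pub fn dot_i8_simd(q: &[i8], v: &[i8]) -> i32 {
    assert_eq!(q.len(), v.len(), "vectors must have equal length");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe: AVX2 support was just verified at runtime.
            return unsafe { dot_i8_avx2(q, v) };
        }
    }
    q.iter().zip(v).map(|(&a, &b)| i32::from(a) * i32::from(b)).sum()
}
```

Because the inputs are sign-extended i8 values, each pair sum produced by `_mm256_madd_epi16` is at most 2 × 128 × 128 in magnitude, so the intrinsic's one overflow case (both pairs equal to -32768) cannot occur.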

On ARM (NEON): uses vmull_s8 to multiply 8 i8 pairs into i16 products, then vpadalq_s16 to pairwise-add, widen, and accumulate them into i32 lanes.
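A sketch of the `vmull_s8` plus `vpadalq_s16` pattern using `std::arch::aarch64`. The names `dot_i8_neon` and `dot_i8_neon_or_scalar`, the 8-element chunking, and the scalar fallback on other targets are assumptions for illustration only.

```rust
/// NEON sketch (hypothetical name). Processes 8 elements per iteration.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn dot_i8_neon(q: &[i8], v: &[i8]) -> i32 {
    use std::arch::aarch64::*;
    let n = q.len();
    let mut acc = vdupq_n_s32(0); // 4 x i32 partial sums
    let mut i = 0;
    while i + 8 <= n {
        let a = vld1_s8(q.as_ptr().add(i)); // 8 x i8
        let b = vld1_s8(v.as_ptr().add(i)); // 8 x i8
        let prod = vmull_s8(a, b); // widening multiply: 8 x i16
        acc = vpadalq_s16(acc, prod); // pairwise add, widen, accumulate: 4 x i32
        i += 8;
    }
    let mut sum = vaddvq_s32(acc); // horizontal add of the 4 lanes
    // Scalar tail for lengths that are not a multiple of 8.
    while i < n {
        sum += i32::from(q[i]) * i32::from(v[i]);
        i += 1;
    }
    sum
}

/// Dispatcher (hypothetical name): NEON on aarch64, scalar elsewhere.
pub fn dot_i8_neon_or_scalar(q: &[i8], v: &[i8]) -> i32 {
    assert_eq!(q.len(), v.len(), "vectors must have equal length");
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            // Safe: NEON support was just verified at runtime.
            return unsafe { dot_i8_neon(q, v) };
        }
    }
    q.iter().zip(v).map(|(&a, &b)| i32::from(a) * i32::from(b)).sum()
}
```

Each i8 × i8 product has magnitude at most 128 × 128 = 16,384, so it fits in the i16 lanes that `vmull_s8` produces before `vpadalq_s16` widens to i32.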

dot_i8: Compute the dot product of two i8 vectors.
dot_i8_batch: Compute dot products for a batch of vectors with dequantization.
dot_i8_indexed: Compute dot products for indexed candidates.
l2_distance_i8: Compute squared L2 distance between two i8 vectors.
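The signatures of these functions are not shown here, so the following scalar sketches only illustrate what each operation could compute. Everything in the block is an assumption: the `_sketch` names, the flat row-major storage, the per-vector scale factors used for dequantization, and the return types are all hypothetical.

```rust
// Plain scalar dot product used by the sketches below.
fn dot_scalar(q: &[i8], v: &[i8]) -> i32 {
    q.iter().zip(v).map(|(&a, &b)| i32::from(a) * i32::from(b)).sum()
}

/// Batch scoring with dequantization. Assumes `vectors` holds
/// `scales.len()` rows of `dim` values, and that a score is recovered as
/// raw_dot * query_scale * vector_scale.
pub fn dot_i8_batch_sketch(
    query: &[i8],
    query_scale: f32,
    vectors: &[i8],
    scales: &[f32],
    dim: usize,
) -> Vec<f32> {
    assert_eq!(query.len(), dim);
    assert_eq!(vectors.len(), scales.len() * dim);
    vectors
        .chunks_exact(dim)
        .zip(scales)
        .map(|(row, &s)| dot_scalar(query, row) as f32 * (query_scale * s))
        .collect()
}

/// Raw i32 dot products for the rows selected by `indices`.
pub fn dot_i8_indexed_sketch(
    query: &[i8],
    vectors: &[i8],
    dim: usize,
    indices: &[usize],
) -> Vec<i32> {
    indices
        .iter()
        .map(|&idx| dot_scalar(query, &vectors[idx * dim..(idx + 1) * dim]))
        .collect()
}

/// Squared L2 distance. For D=768 and values in [-127, 127] the worst case
/// is 768 * 254^2 = 49,548,288, which also fits in i32.
pub fn l2_distance_i8_sketch(a: &[i8], b: &[i8]) -> i32 {
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = i32::from(x) - i32::from(y);
            d * d
        })
        .sum()
}
```

The indexed variant fits reranking, where only the candidates that survive the initial scan need an exact score.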