Expand description
Shared kernel primitives: dot product, softmax row, score matrix.
These building blocks are used across attention, GQA, and flash attention kernels. Centralizing them eliminates duplicated DataTransformation patterns.
Functions§
- dot
- Dot product of two slices.
- matmul_
sv - Matrix multiply:
output = scores * V, where scores isrows x colsand V iscols x d_v. - score_
matrix - Compute scaled dot-product score matrix:
scores[i,j] = Q[i] . K[j] / sqrt(d). - softmax_
row - In-place softmax over a contiguous row.
- softmax_
rows - Apply softmax to each row of a
rows x colsmatrix (in-place). - weighted_
accumulate - Weighted sum:
output[i] += weight * v_row[i]for accumulation in attention.