Skip to main content

Module ops

Module ops 

Source
Expand description

Shared kernel primitives: dot product, softmax row, score matrix.

These building blocks are used across attention, GQA, and flash attention kernels. Centralizing them eliminates duplicated DataTransformation patterns.

Functions§

dot
Dot product of two slices.
matmul_sv
Matrix multiply: output = scores * V, where scores is rows x cols and V is cols x d_v.
score_matrix
Compute scaled dot-product score matrix: scores[i,j] = Q[i] . K[j] / sqrt(d).
softmax_row
In-place softmax over a contiguous row.
softmax_rows
Apply softmax to each row of a rows x cols matrix (in-place).
weighted_accumulate
Weighted sum: output[i] += weight * v_row[i] for accumulation in attention.