Module ops


Shared kernel primitives: dot product, softmax row, score matrix.

These building blocks are shared by the attention, GQA, and flash attention kernels. Centralizing them here removes the duplicated DataTransformation patterns that would otherwise be repeated in each kernel.

Functions§

dot: Dot product of two slices.
matmul_sv: Matrix multiply: output = scores * V, where scores is rows x cols and V is cols x d_v.
score_matrix: Compute scaled dot-product score matrix: scores[i,j] = Q[i] . K[j] / sqrt(d).
softmax_row: In-place softmax over a contiguous row.
softmax_rows: Apply softmax to each row of a rows x cols matrix (in-place).
weighted_accumulate: Weighted sum: output[i] += weight * v_row[i] for accumulation in attention.
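
The one-line summaries above can be read together as a single attention pass: score, normalize, then mix the value rows. The following is a minimal sketch of that composition, not the crate's actual code. The signatures, the row-major `&[f32]` layout, the `n_q`/`n_k` parameter names, and the max-subtraction in `softmax_row` are all assumptions made for illustration; the real functions in `ops` may differ.

```rust
// Hypothetical signatures: only the function names and their one-line
// descriptions come from the module docs. Matrices are assumed row-major.

/// Dot product of two equal-length slices.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// scores[i * n_k + j] = Q[i] . K[j] / sqrt(d).
fn score_matrix(q: &[f32], k: &[f32], n_q: usize, n_k: usize, d: usize) -> Vec<f32> {
    let scale = 1.0 / (d as f32).sqrt();
    let mut scores = vec![0.0f32; n_q * n_k];
    for i in 0..n_q {
        for j in 0..n_k {
            let q_row = &q[i * d..(i + 1) * d];
            let k_row = &k[j * d..(j + 1) * d];
            scores[i * n_k + j] = dot(q_row, k_row) * scale;
        }
    }
    scores
}

/// In-place softmax over one contiguous row.
/// Subtracting the row maximum first is an assumed stability measure.
fn softmax_row(row: &mut [f32]) {
    if row.is_empty() {
        return;
    }
    let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0f32;
    for x in row.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in row.iter_mut() {
        *x /= sum;
    }
}

/// Apply softmax to each row of a rows x cols matrix (in-place).
fn softmax_rows(m: &mut [f32], rows: usize, cols: usize) {
    assert_eq!(m.len(), rows * cols);
    if cols == 0 {
        return;
    }
    for row in m.chunks_mut(cols) {
        softmax_row(row);
    }
}

/// output[i] += weight * v_row[i].
fn weighted_accumulate(output: &mut [f32], weight: f32, v_row: &[f32]) {
    for (o, v) in output.iter_mut().zip(v_row) {
        *o += weight * v;
    }
}

/// output = scores * V, where scores is rows x cols and V is cols x d_v.
fn matmul_sv(scores: &[f32], v: &[f32], rows: usize, cols: usize, d_v: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; rows * d_v];
    for i in 0..rows {
        for j in 0..cols {
            let weight = scores[i * cols + j];
            weighted_accumulate(&mut out[i * d_v..(i + 1) * d_v], weight, &v[j * d_v..(j + 1) * d_v]);
        }
    }
    out
}
```

Under these assumptions, a single-head attention pass is `score_matrix`, then `softmax_rows` on the result, then `matmul_sv` against V. Expressing `matmul_sv` through `weighted_accumulate` is one possible design: it keeps the inner loop identical to the streaming accumulation a flash attention kernel would perform row by row.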
