Module row_sum

Per-row sum reduction along the last dimension of a 2-D tensor, plus its backward pass, which broadcasts the upstream gradient along the cols dimension.

Used by reverse-mode autograd in downstream crates (hf2q ADR-020 Track 1: KL-divergence loss composition needs Σ_j p · (log_p − log_q) per row).
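As a rough illustration of the composition mentioned above, the per-row KL term can be written as an elementwise product followed by a row sum. The sketch below is a plain CPU reference, not this crate's GPU path; the function name `kl_per_row_ref` and the row-major `[rows, cols]` layout are assumptions made for the example.

```rust
/// Hypothetical CPU reference (not part of this crate):
/// kl[b] = Σ_j p[b, j] · (log_p[b, j] − log_q[b, j]).
/// All three inputs are assumed to be row-major [rows, cols] buffers.
fn kl_per_row_ref(
    p: &[f32],
    log_p: &[f32],
    log_q: &[f32],
    rows: usize,
    cols: usize,
) -> Vec<f32> {
    assert_eq!(p.len(), rows * cols, "p must hold rows * cols elements");
    assert_eq!(log_p.len(), rows * cols, "log_p must hold rows * cols elements");
    assert_eq!(log_q.len(), rows * cols, "log_q must hold rows * cols elements");

    // Step 1: elementwise term p · (log_p − log_q).
    let terms: Vec<f32> = p
        .iter()
        .zip(log_p)
        .zip(log_q)
        .map(|((&p, &lp), &lq)| p * (lp - lq))
        .collect();

    // Step 2: per-row sum over the last dimension.
    (0..rows)
        .map(|b| terms[b * cols..(b + 1) * cols].iter().sum::<f32>())
        .collect()
}
```

Step 2 is the part that corresponds to the row-sum reduction described in this module; step 1 would be a separate elementwise op.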

Statics§

ROW_SUM_SHADER_SOURCE

Functions§

dispatch_row_sum_backward_f32
Encode dx[b, i] = d_out[b] (broadcast along the cols dim). This is the backward of dispatch_row_sum_f32.
dispatch_row_sum_f32
Encode output[b] = Σ_j input[b, j] for a 2-D [rows, cols] f32 input.
register
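The forward and backward semantics described above can be sketched on the CPU as follows. This is only a reference for checking expected values, under the assumption of a row-major `[rows, cols]` buffer; `row_sum_ref` and `row_sum_backward_ref` are hypothetical helper names, and the actual `dispatch_*` functions encode GPU work whose signatures are not shown here.

```rust
/// Hypothetical CPU reference for the forward pass:
/// output[b] = Σ_j input[b, j], with `input` laid out row-major as [rows, cols].
fn row_sum_ref(input: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(input.len(), rows * cols, "input must hold rows * cols elements");
    (0..rows)
        .map(|b| input[b * cols..(b + 1) * cols].iter().sum::<f32>())
        .collect()
}

/// Hypothetical CPU reference for the backward pass:
/// dx[b, i] = d_out[b], i.e. each upstream gradient is repeated `cols` times.
fn row_sum_backward_ref(d_out: &[f32], cols: usize) -> Vec<f32> {
    d_out
        .iter()
        .flat_map(|&g| std::iter::repeat(g).take(cols))
        .collect()
}
```

For example, `row_sum_ref(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3)` yields `[6.0, 15.0]`, and `row_sum_backward_ref(&[0.5, -1.0], 3)` yields `[0.5, 0.5, 0.5, -1.0, -1.0, -1.0]`. The backward is a pure broadcast because every `input[b, j]` contributes to `output[b]` with coefficient 1.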