Per-row sum reduction along the last dimension of a 2-D tensor, together with its backward pass, which broadcasts the upstream gradient along the cols dimension.
Used by reverse-mode autograd in downstream crates (hf2q ADR-020 Track 1: KL-divergence loss composition needs Σ_j p · (log_p − log_q) per row).
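As a rough illustration of the per-row quantity mentioned above, the following CPU sketch computes Σ_j p · (log_p − log_q) for each row. It is not this crate's API: the function name `kl_per_row`, the row-major slice layout, and the choice of recovering p as exp(log_p) are all assumptions made for the example.

```rust
/// Hypothetical CPU reference (not part of this crate):
/// out[b] = Σ_j p[b, j] * (log_p[b, j] - log_q[b, j]),
/// with p[b, j] assumed to be exp(log_p[b, j]).
/// Both inputs are row-major [rows, cols] slices.
fn kl_per_row(log_p: &[f32], log_q: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(log_p.len(), rows * cols);
    assert_eq!(log_q.len(), rows * cols);
    (0..rows)
        .map(|b| {
            let lp = &log_p[b * cols..(b + 1) * cols];
            let lq = &log_q[b * cols..(b + 1) * cols];
            // Elementwise term first, then the per-row sum reduction.
            lp.iter()
                .zip(lq.iter())
                .map(|(&a, &c)| a.exp() * (a - c))
                .sum::<f32>()
        })
        .collect()
}
```

The elementwise product is the part a caller would compose separately; the final `sum` over each row is the step that a row-sum reduction provides.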
Statics
Functions
- dispatch_row_sum_backward_f32: Encode dx[b, i] = d_out[b] (broadcast along the cols dim). This is the backward of dispatch_row_sum_f32.
- dispatch_row_sum_f32: Encode output[b] = Σ_j input[b, j] for a 2-D [rows, cols] f32 input.
- register
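The two formulas above can be sketched as a plain CPU reference. This is only an illustration of the math, not the crate's dispatch functions, whose signatures are not shown on this page. The names `row_sum_ref` and `row_sum_backward_ref` and the row-major `&[f32]` layout are assumptions.

```rust
/// Hypothetical CPU reference for the forward pass:
/// output[b] = Σ_j input[b, j] for a row-major [rows, cols] input.
fn row_sum_ref(input: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(input.len(), rows * cols);
    (0..rows)
        .map(|b| input[b * cols..(b + 1) * cols].iter().sum::<f32>())
        .collect()
}

/// Hypothetical CPU reference for the backward pass:
/// dx[b, i] = d_out[b], i.e. each row's upstream gradient is
/// repeated `cols` times, because ∂output[b]/∂input[b, i] = 1.
fn row_sum_backward_ref(d_out: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(d_out.len(), rows);
    let mut dx = Vec::with_capacity(rows * cols);
    for &g in d_out {
        dx.extend(std::iter::repeat(g).take(cols));
    }
    dx
}
```

A reference like this can be useful for checking a GPU result on small inputs, since the backward output depends only on d_out and the shape, not on the forward input values.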