ADR-020 iter-11h-b — depthwise causal 1D convolution forward + backward kernels for the GpuTape autograd pipeline.
Distinct from ssm_conv, which fuses SiLU and handles autoregressive
decode state. This module is for the TRAINING-MODE backward pass.
Forward shape contract:
x : [n_tokens, channels] row-major (f32)
kernel_w : [channels, K] row-major (f32)
y : [n_tokens, channels] row-major (f32)
Math (per output element (t, c)):
y[t, c] = Σ_{k=0..K-1, t+k-(K-1)>=0} kernel_w[c, k] · x[t+k-(K-1), c]
Zero-pad on the past: outputs at t < K-1 see fewer than K input
taps (the missing taps default to 0 — equivalent to “no prior
decode state”, which is the training-time invariant).
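The contract above can be checked against a plain CPU reference. The sketch below is illustrative only and is not this module's GPU kernel: the function names `causal_conv1d_fwd_ref` and `causal_conv1d_bwd_ref` and the parameter `k_width` (standing in for K) are hypothetical. The backward half assumes the usual chain rule applied to the forward sum, so each tap that contributes `kernel_w[c, k] · x[s, c]` to `y[t, c]` sends `dy[t, c] · kernel_w[c, k]` back to `dx[s, c]` and `dy[t, c] · x[s, c]` back to `dw[c, k]`.

```rust
/// CPU reference for the forward contract (hypothetical helper, not the GPU kernel).
/// `x` and the result are [n_tokens, channels] row-major; `kernel_w` is [channels, k_width].
pub fn causal_conv1d_fwd_ref(
    x: &[f32],
    kernel_w: &[f32],
    n_tokens: usize,
    channels: usize,
    k_width: usize,
) -> Vec<f32> {
    assert_eq!(x.len(), n_tokens * channels);
    assert_eq!(kernel_w.len(), channels * k_width);
    let mut y = vec![0.0f32; n_tokens * channels];
    for t in 0..n_tokens {
        for c in 0..channels {
            let mut acc = 0.0f32;
            for k in 0..k_width {
                // Source token s = t + k - (K-1); taps with s < 0 are the zero pad.
                if t + k + 1 >= k_width {
                    let s = t + k + 1 - k_width;
                    acc += kernel_w[c * k_width + k] * x[s * channels + c];
                }
            }
            y[t * channels + c] = acc;
        }
    }
    y
}

/// CPU reference for the gradients (assumed chain rule of the forward sum).
/// Returns (dx, dw) with the same layouts as `x` and `kernel_w`.
pub fn causal_conv1d_bwd_ref(
    x: &[f32],
    kernel_w: &[f32],
    dy: &[f32],
    n_tokens: usize,
    channels: usize,
    k_width: usize,
) -> (Vec<f32>, Vec<f32>) {
    assert_eq!(x.len(), n_tokens * channels);
    assert_eq!(dy.len(), n_tokens * channels);
    assert_eq!(kernel_w.len(), channels * k_width);
    let mut dx = vec![0.0f32; n_tokens * channels];
    let mut dw = vec![0.0f32; channels * k_width];
    for t in 0..n_tokens {
        for c in 0..channels {
            let g = dy[t * channels + c];
            for k in 0..k_width {
                // Same tap validity test as the forward pass.
                if t + k + 1 >= k_width {
                    let s = t + k + 1 - k_width;
                    dx[s * channels + c] += kernel_w[c * k_width + k] * g;
                    dw[c * k_width + k] += g * x[s * channels + c];
                }
            }
        }
    }
    (dx, dw)
}
```

With one channel, K = 2, x = [1, 2, 3] and kernel_w = [10, 1], the forward reference gives y = [1, 12, 23]: the first output sees only its own token because the tap at t - 1 falls into the zero pad.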