Activation functions.
A dense layer computes a pre-activation value z = W x + b and then applies an
activation function element-wise: y = activation(z).
In this crate we cache the post-activation outputs y in Scratch. During
backprop we compute dL/dz from dL/dy using y where the derivative is
expressible in terms of the output alone (e.g. for sigmoid, dy/dz = y(1 - y)).
This keeps the per-sample hot path allocation-free without needing a
separate z buffer.
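A minimal sketch of this output-only backprop trick; the enum name matches this module, but the variants and method names here are illustrative assumptions, not this crate's actual API:

```rust
/// Element-wise activation function (illustrative variants).
#[derive(Clone, Copy)]
enum Activation {
    Identity,
    Relu,
    Sigmoid,
    Tanh,
}

impl Activation {
    /// Forward: y = activation(z), applied element-wise.
    fn apply(self, z: f32) -> f32 {
        match self {
            Activation::Identity => z,
            Activation::Relu => z.max(0.0),
            Activation::Sigmoid => 1.0 / (1.0 + (-z).exp()),
            Activation::Tanh => z.tanh(),
        }
    }

    /// dy/dz expressed in terms of the cached output y alone, so no
    /// z buffer is needed: ReLU: 1 if y > 0 else 0; sigmoid: y(1 - y);
    /// tanh: 1 - y^2.
    fn dydz_from_output(self, y: f32) -> f32 {
        match self {
            Activation::Identity => 1.0,
            Activation::Relu => {
                if y > 0.0 {
                    1.0
                } else {
                    0.0
                }
            }
            Activation::Sigmoid => y * (1.0 - y),
            Activation::Tanh => 1.0 - y * y,
        }
    }
}

fn main() {
    let a = Activation::Sigmoid;
    let y = a.apply(0.0); // sigmoid(0) = 0.5
    // Backprop: dL/dz = dL/dy * dy/dz, computed from the cached y only.
    let dl_dy = 1.0;
    let dl_dz = dl_dy * a.dydz_from_output(y); // 0.5 * (1 - 0.5) = 0.25
    println!("{dl_dz}");
}
```

Note that ReLU is only recoverable from y up to the boundary convention at z = 0, where y = 0 and the sketch picks derivative 0.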
Enums
- Activation
- Element-wise activation function.