GRU (Gated Recurrent Unit) byte-level predictor with truncated BPTT.
A byte-level neural predictor that supplies a DIFFERENT signal from the bit-level CM engine: the GRU captures cross-byte sequential patterns via a recurrent hidden state trained with backpropagation through time (BPTT).
Architecture:
- Input: one-hot byte embedding (256 → 32 via embedding matrix)
- GRU: 128 hidden cells, 1 layer
- Output: 128 → 256 linear → softmax → byte probabilities
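A minimal sketch of the forward-pass shapes described above, assuming the standard GRU gate equations. The struct and helper names (`GruSketch`, `mat_vec`, the placeholder constant initialization) are illustrative, not the crate's API:

```rust
const VOCAB: usize = 256;
const EMBED: usize = 32;
const HIDDEN: usize = 128;

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// y = W * x for a row-major `rows x cols` matrix.
fn mat_vec(w: &[f32], x: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    (0..rows)
        .map(|r| (0..cols).map(|c| w[r * cols + c] * x[c]).sum())
        .collect()
}

fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

struct GruSketch {
    embed: Vec<f32>,            // 256 x 32 embedding matrix
    wz: Vec<f32>, uz: Vec<f32>, // update gate: input / recurrent weights
    wr: Vec<f32>, ur: Vec<f32>, // reset gate
    wh: Vec<f32>, uh: Vec<f32>, // candidate state
    w_out: Vec<f32>,            // 256 x 128 output projection
    h: Vec<f32>,                // recurrent hidden state (128 cells)
}

impl GruSketch {
    fn new() -> Self {
        // Placeholder constant init, not real trained values.
        let w = |n: usize| vec![0.01f32; n];
        GruSketch {
            embed: w(VOCAB * EMBED),
            wz: w(HIDDEN * EMBED), uz: w(HIDDEN * HIDDEN),
            wr: w(HIDDEN * EMBED), ur: w(HIDDEN * HIDDEN),
            wh: w(HIDDEN * EMBED), uh: w(HIDDEN * HIDDEN),
            w_out: w(VOCAB * HIDDEN),
            h: vec![0.0; HIDDEN],
        }
    }

    /// Advance one byte, return a 256-way distribution over the next byte.
    fn forward(&mut self, byte: u8) -> Vec<f32> {
        let b = byte as usize;
        let x = self.embed[b * EMBED..(b + 1) * EMBED].to_vec(); // lookup

        let gate = |wx: &[f32], uh: &[f32]| -> Vec<f32> {
            wx.iter().zip(uh).map(|(a, b)| sigmoid(a + b)).collect()
        };
        let z = gate(&mat_vec(&self.wz, &x, HIDDEN, EMBED),
                     &mat_vec(&self.uz, &self.h, HIDDEN, HIDDEN));
        let r = gate(&mat_vec(&self.wr, &x, HIDDEN, EMBED),
                     &mat_vec(&self.ur, &self.h, HIDDEN, HIDDEN));

        // Candidate state uses the reset-gated hidden state.
        let rh: Vec<f32> = r.iter().zip(&self.h).map(|(r, h)| r * h).collect();
        let wx = mat_vec(&self.wh, &x, HIDDEN, EMBED);
        let uh = mat_vec(&self.uh, &rh, HIDDEN, HIDDEN);
        let cand: Vec<f32> =
            wx.iter().zip(&uh).map(|(a, b)| (a + b).tanh()).collect();

        // h <- (1 - z) * h + z * cand
        for i in 0..HIDDEN {
            self.h[i] = (1.0 - z[i]) * self.h[i] + z[i] * cand[i];
        }
        softmax(&mat_vec(&self.w_out, &self.h, VOCAB, HIDDEN))
    }
}
```

The dense `mat_vec` loops stand in for whatever vectorized kernels the real model uses; only the tensor shapes (256 → 32 → 128 → 256) follow the description above.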
Training: truncated BPTT-10. Each time a byte completes, gradients propagate back through the last 10 steps of GRU history. This is the same strategy cmix uses (with BPTT-100); truncating at 10 steps captures the majority of the gain at roughly 10% of the BPTT-100 cost.
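Truncated BPTT can be illustrated on a much smaller model than the GRU itself. The sketch below applies the technique to a hypothetical scalar tanh recurrence (not the crate's `GruModel`): a capped history buffer keeps the last 10 steps, and the backward loop walks only that window:

```rust
use std::collections::VecDeque;

const BPTT_STEPS: usize = 10;

/// Toy scalar RNN: h_t = tanh(w * h_{t-1} + x_t).
struct TinyRnn {
    w: f32,
    h: f32,
    // (h_prev, h_new) pairs for the last BPTT_STEPS steps only.
    history: VecDeque<(f32, f32)>,
}

impl TinyRnn {
    fn new() -> Self {
        TinyRnn { w: 0.5, h: 0.0, history: VecDeque::new() }
    }

    fn step(&mut self, x: f32) {
        let h_prev = self.h;
        self.h = (self.w * h_prev + x).tanh();
        self.history.push_back((h_prev, self.h));
        if self.history.len() > BPTT_STEPS {
            // Truncation: steps older than 10 are dropped, so the
            // backward pass below can never reach past them.
            self.history.pop_front();
        }
    }

    /// One truncated-BPTT update toward target `y` for the loss
    /// L = 0.5 * (h_T - y)^2, with learning rate `lr`.
    fn train(&mut self, y: f32, lr: f32) {
        let mut grad_w = 0.0;
        let mut dh = self.h - y; // dL/dh_T
        // Walk history newest-to-oldest: at most BPTT_STEPS steps.
        for &(h_prev, h_new) in self.history.iter().rev() {
            let da = dh * (1.0 - h_new * h_new); // tanh'(a) = 1 - tanh(a)^2
            grad_w += da * h_prev;               // dL/dw contribution
            dh = da * self.w;                    // propagate to h_{t-1}
        }
        self.w -= lr * grad_w;
    }
}
```

The real model does the same truncation per weight matrix rather than per scalar, but the shape of the backward loop (newest step first, stop after 10) is the point being shown.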
~43K parameters (~170KB at f32). History + gradient buffers: ~260KB.
CRITICAL: Encoder and decoder must maintain IDENTICAL GRU state. Both must call train(byte) then forward(byte) in the same order on the same bytes, so that history buffers and weight updates stay bit-identical on both sides.
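The lockstep discipline can be sketched with a stand-in model (a single deterministically updated f32 in place of the full GRU; `StateSketch` and `run_lockstep` are hypothetical names, not the crate's API). Because both sides perform the same floating-point operations in the same order, their states stay bit-identical:

```rust
/// Stand-in for GruModel: one f32 state, updated deterministically.
struct StateSketch {
    state: f32,
}

impl StateSketch {
    fn new() -> Self {
        StateSketch { state: 0.0 }
    }

    fn train(&mut self, byte: u8) {
        // Any deterministic update: identical inputs and call order
        // produce identical f32 results on both sides.
        self.state = (self.state * 0.9 + byte as f32 / 255.0).tanh();
    }

    fn forward(&self, byte: u8) -> f32 {
        (self.state + byte as f32 / 255.0).tanh()
    }
}

/// Feed the same bytes, in the same order, through an "encoder" and a
/// "decoder" instance; return the final state bits of each.
fn run_lockstep(data: &[u8]) -> (u32, u32) {
    let mut enc = StateSketch::new();
    let mut dec = StateSketch::new();
    for &b in data {
        // Both sides: train(byte) then forward(byte), same order.
        enc.train(b);
        let p_enc = enc.forward(b); // encoder codes with this prediction
        dec.train(b);
        let p_dec = dec.forward(b); // decoder must reproduce it exactly
        debug_assert_eq!(p_enc.to_bits(), p_dec.to_bits());
    }
    (enc.state.to_bits(), dec.state.to_bits())
}
```

Comparing `to_bits()` rather than the f32 values makes the check exact: any divergence in call order would show up as a bit difference, which in a real arithmetic coder would corrupt the decode.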
Structs§
- GruModel
- GRU byte-level predictor with BPTT-10 online training.