
Module gru_model


GRU (Gated Recurrent Unit) byte-level predictor with truncated BPTT.

A byte-level neural predictor that provides a signal complementary to the bit-level context-mixing (CM) engine. The GRU captures cross-byte sequential patterns via a recurrent hidden state trained with backpropagation through time (BPTT).

Architecture:

- Input: one-hot byte embedding (256 → 32 via embedding matrix)
- GRU: 128 hidden cells, 1 layer
- Output: 128 → 256 linear → softmax → byte probabilities
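The shapes above can be sketched as a single forward step. This is an illustrative stand-in, not the crate's actual `GruModel` API: the `GruSketch` name, field layout, and zero-initialized weights are all assumptions made for the example.

```rust
// Illustrative sketch of one forward step with the dimensions from the
// description: byte -> 32-dim embedding -> 128-cell GRU -> 256-way softmax.
// Weights are zero-initialized here purely so the example is deterministic.
const VOCAB: usize = 256;
const EMBED: usize = 32;
const HIDDEN: usize = 128;

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

struct GruSketch {
    embed: Vec<f32>,  // VOCAB x EMBED embedding matrix, row-major
    w: [Vec<f32>; 3], // input weights for update / reset / candidate gates
    u: [Vec<f32>; 3], // recurrent weights for the same three gates
    b: [Vec<f32>; 3], // gate biases
    w_out: Vec<f32>,  // VOCAB x HIDDEN output projection
    h: Vec<f32>,      // recurrent hidden state
}

impl GruSketch {
    fn new() -> Self {
        let wx = || vec![0.0f32; HIDDEN * EMBED];
        let wh = || vec![0.0f32; HIDDEN * HIDDEN];
        let bg = || vec![0.0f32; HIDDEN];
        GruSketch {
            embed: vec![0.0; VOCAB * EMBED],
            w: [wx(), wx(), wx()],
            u: [wh(), wh(), wh()],
            b: [bg(), bg(), bg()],
            w_out: vec![0.0; VOCAB * HIDDEN],
            h: vec![0.0; HIDDEN],
        }
    }

    // Pre-activation for gate g: W_g x + U_g s + b_g.
    fn gate(&self, g: usize, x: &[f32], s: &[f32]) -> Vec<f32> {
        (0..HIDDEN)
            .map(|i| {
                let mut a = self.b[g][i];
                for j in 0..EMBED {
                    a += self.w[g][i * EMBED + j] * x[j];
                }
                for j in 0..HIDDEN {
                    a += self.u[g][i * HIDDEN + j] * s[j];
                }
                a
            })
            .collect()
    }

    /// Advance the hidden state by one byte; return P(next byte), 256 probs.
    fn forward(&mut self, byte: u8) -> Vec<f32> {
        let o = byte as usize * EMBED;
        let x: Vec<f32> = self.embed[o..o + EMBED].to_vec();
        let h_prev = self.h.clone();
        // Standard GRU cell: update gate z, reset gate r, candidate state.
        let z: Vec<f32> = self.gate(0, &x, &h_prev).iter().map(|&a| sigmoid(a)).collect();
        let r: Vec<f32> = self.gate(1, &x, &h_prev).iter().map(|&a| sigmoid(a)).collect();
        let rh: Vec<f32> = (0..HIDDEN).map(|i| r[i] * h_prev[i]).collect();
        let cand: Vec<f32> = self.gate(2, &x, &rh).iter().map(|&a| a.tanh()).collect();
        for i in 0..HIDDEN {
            self.h[i] = (1.0 - z[i]) * h_prev[i] + z[i] * cand[i];
        }
        // 128 -> 256 linear, then softmax (max-subtracted for stability).
        let mut logits: Vec<f32> = (0..VOCAB)
            .map(|k| (0..HIDDEN).map(|i| self.w_out[k * HIDDEN + i] * self.h[i]).sum::<f32>())
            .collect();
        let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0.0;
        for l in logits.iter_mut() {
            *l = (*l - max).exp();
            sum += *l;
        }
        for l in logits.iter_mut() {
            *l /= sum;
        }
        logits
    }
}

fn main() {
    let mut m = GruSketch::new();
    let p = m.forward(b'a');
    // With all-zero weights the distribution is uniform: each prob = 1/256.
    println!("{} {:.6}", p.len(), p.iter().sum::<f32>());
}
```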

Training: truncated BPTT-10. At each byte completion, gradients propagate back through the last 10 steps of GRU history. cmix uses the same strategy with a deeper window (BPTT-100); BPTT-10 captures the majority of that gain at roughly 10% of the BPTT-100 cost.
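The bookkeeping for a 10-step truncation window amounts to a bounded deque of per-step records that the backward pass walks newest to oldest. A minimal sketch, assuming a hypothetical `StepRecord` layout (the real history buffers likely also store gate activations needed for gradients):

```rust
// Sketch of truncated-BPTT history bookkeeping: keep the last BPTT_STEPS
// step records; once full, each new step evicts the oldest one, so the
// backward pass never unrolls past the truncation window.
use std::collections::VecDeque;

const BPTT_STEPS: usize = 10;

struct StepRecord {
    input_byte: u8,
    hidden: Vec<f32>, // hidden state *after* this step (gradients need it)
}

struct BpttHistory {
    steps: VecDeque<StepRecord>,
}

impl BpttHistory {
    fn new() -> Self {
        BpttHistory { steps: VecDeque::with_capacity(BPTT_STEPS) }
    }

    fn push(&mut self, input_byte: u8, hidden: Vec<f32>) {
        if self.steps.len() == BPTT_STEPS {
            self.steps.pop_front(); // evict the step beyond the window
        }
        self.steps.push_back(StepRecord { input_byte, hidden });
    }

    /// Bytes in the order the backward pass visits them: newest first.
    fn unroll_order(&self) -> Vec<u8> {
        self.steps.iter().rev().map(|s| s.input_byte).collect()
    }
}

fn main() {
    let mut h = BpttHistory::new();
    for b in 0u8..15 {
        h.push(b, vec![0.0; 4]);
    }
    // Only the last 10 steps (bytes 5..=14) remain in the window.
    println!("{} {:?}", h.steps.len(), h.unroll_order());
}
```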

The model has ~43K parameters (~170KB stored as f32); the BPTT history and gradient buffers add ~260KB.

CRITICAL: Encoder and decoder must maintain IDENTICAL GRU state. Both must call train(byte) then forward(byte) in the same order on the same bytes so that history buffers and weight updates are identical.
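The lockstep requirement can be illustrated with a toy stand-in: two copies of the same deterministic model, fed the same bytes in the same call order, keep bit-identical state. `Predictor` below is a deliberately trivial mock, not `GruModel`; the point is only the call discipline.

```rust
// Toy illustration of the encoder/decoder lockstep invariant: identical
// deterministic updates on identical byte sequences keep the two states
// bit-for-bit equal, which is what makes decoding reproducible.
struct Predictor {
    h: f32, // stand-in for the GRU hidden state + weights
}

impl Predictor {
    fn train(&mut self, byte: u8) {
        self.h += byte as f32 * 0.01; // toy deterministic weight update
    }
    fn forward(&self, byte: u8) -> f32 {
        self.h * byte as f32 // toy deterministic prediction
    }
}

fn main() {
    let mut enc = Predictor { h: 0.0 };
    let mut dec = Predictor { h: 0.0 };
    for &b in b"lockstep" {
        // Both sides call train(byte) then forward(byte) on the same bytes.
        enc.train(b);
        dec.train(b);
        let (pe, pd) = (enc.forward(b), dec.forward(b));
        // Compare bit patterns, not approximate values: the invariant is
        // exact state identity, not numerical closeness.
        assert_eq!(pe.to_bits(), pd.to_bits());
    }
    println!("encoder and decoder stayed in lockstep");
}
```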

Structs§

GruModel
GRU byte-level predictor with BPTT-10 online training.