Module embedding_autograd

FP32 embedding-table lookup with reverse-mode autograd backward.

Used by hf2q’s ADR-020 Track 1 multi-layer model on GpuTape (iter-11d).

Forward: output[b, h] = embedding[ids[b], h]

Backward: d_embedding[id, h] = Σ_{b: ids[b] == id} dy[b, h]
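To make the two formulas concrete, here is a minimal CPU reference sketch in plain Rust. It is not the module's GPU code: the function names `embedding_lookup_ref` and `embedding_scatter_add_ref`, the row-major `[vocab, hidden]` and `[batch, hidden]` layouts, and the `u32` id type are all assumptions made for illustration.

```rust
/// Forward reference: output[b, h] = embedding[ids[b], h].
/// Assumes `embedding` is row-major [vocab, hidden]; returns row-major [batch, hidden].
fn embedding_lookup_ref(embedding: &[f32], ids: &[u32], hidden: usize) -> Vec<f32> {
    let mut out = Vec::with_capacity(ids.len() * hidden);
    for &id in ids {
        let row = id as usize * hidden;
        // Copy the whole embedding row selected by this id.
        out.extend_from_slice(&embedding[row..row + hidden]);
    }
    out
}

/// Backward reference: d_embedding[id, h] = sum of dy[b, h] over all b with ids[b] == id.
/// Rows whose id never appears in `ids` keep a zero gradient.
fn embedding_scatter_add_ref(dy: &[f32], ids: &[u32], vocab: usize, hidden: usize) -> Vec<f32> {
    assert_eq!(dy.len(), ids.len() * hidden, "dy must be [batch, hidden]");
    let mut d_embedding = vec![0.0f32; vocab * hidden];
    for (b, &id) in ids.iter().enumerate() {
        let row = id as usize * hidden;
        for h in 0..hidden {
            // Repeated ids accumulate into the same row.
            d_embedding[row + h] += dy[b * hidden + h];
        }
    }
    d_embedding
}
```

A reference like this can be handy for checking GPU results in tests: when an id occurs more than once in `ids`, its gradient row is the sum of the matching `dy` rows.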

The existing shaders/embedding.metal covers QUANTIZED 4-bit/6-bit lookup for inference; this module is the FP32-everywhere variant needed by the autograd tape.

The backward kernel is O(vocab × hidden × batch), which is fine for the test fixtures (vocab of at most a few hundred). Production-scale performance (vocab=150k+) is left as a follow-up optimization, using atomic float adds or a sort-segment-sum.
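The stated cost is what you would get from a gather-style formulation, where each output element `d_embedding[v, h]` scans the whole batch for matching ids. The sketch below shows that loop structure on the CPU. It is an assumption about the shape of the computation, not a transcription of the actual Metal kernel, and the name `embedding_backward_gather_ref` is hypothetical.

```rust
/// Gather-style backward reference with O(vocab * hidden * batch) work.
/// Each (v, h) output element is computed independently by scanning every batch entry.
/// Assumes `dy` is row-major [batch, hidden]; returns row-major [vocab, hidden].
fn embedding_backward_gather_ref(
    dy: &[f32],
    ids: &[u32],
    vocab: usize,
    hidden: usize,
) -> Vec<f32> {
    assert_eq!(dy.len(), ids.len() * hidden, "dy must be [batch, hidden]");
    let mut d_embedding = vec![0.0f32; vocab * hidden];
    for v in 0..vocab {
        for h in 0..hidden {
            // On a GPU, this inner scan would be the body of one thread.
            let mut acc = 0.0f32;
            for (b, &id) in ids.iter().enumerate() {
                if id as usize == v {
                    acc += dy[b * hidden + h];
                }
            }
            // Exactly one writer per output element, so no atomics are required.
            d_embedding[v * hidden + h] = acc;
        }
    }
    d_embedding
}
```

The design trade-off is that a single writer per element avoids floating-point atomics and gives a fixed summation order, at the price of scanning the batch once for every vocabulary row. The two follow-up options named above would reduce the work to roughly O(batch × hidden) plus any sorting cost.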

Statics

EMBEDDING_AUTOGRAD_SHADER_SOURCE

Functions

dispatch_embedding_lookup_f32
Encode output[b, h] = embedding[ids[b], h].
dispatch_embedding_scatter_add_f32
Encode the embedding backward (scatter-add).
register