Module embedding

GPU-accelerated quantized embedding table lookup.

Supports 4-bit and 6-bit quantized embedding tables and dequantizes on the fly during the gather. Each value is dequantized as float_val = uint_val * scale + bias, where the scales and biases are stored as bf16.
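
The formula above can be illustrated with a small CPU-side reference. This is a sketch, not this module's implementation: the helper names (bf16_to_f32, dequantize, gather_4bit_reference) are hypothetical, and the packing layout (two 4-bit values per byte, low nibble first, one scale and one bias per table row) is an assumption made for the example. The actual GPU kernel may pack values and group scales differently, and the 6-bit case is not covered here.

```rust
/// Convert raw bf16 bits to f32. A bf16 value is the upper 16 bits of an
/// IEEE-754 f32, so widening is a 16-bit left shift.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Apply float_val = uint_val * scale + bias, with scale and bias given as
/// raw bf16 bit patterns.
fn dequantize(uint_val: u8, scale: u16, bias: u16) -> f32 {
    uint_val as f32 * bf16_to_f32(scale) + bf16_to_f32(bias)
}

/// Hypothetical CPU reference for a 4-bit gather.
/// ASSUMPTIONS (not taken from the module docs): two values per byte with
/// the low nibble first, and one scale/bias pair per table row.
fn gather_4bit_reference(
    packed: &[u8],
    scales: &[u16],
    biases: &[u16],
    dim: usize,
    token_ids: &[usize],
) -> Vec<f32> {
    assert!(dim % 2 == 0, "dim must be even for 4-bit packing");
    let bytes_per_row = dim / 2;
    let mut out = Vec::with_capacity(token_ids.len() * dim);
    for &id in token_ids {
        let row = &packed[id * bytes_per_row..(id + 1) * bytes_per_row];
        for &byte in row {
            // Low nibble first, then high nibble.
            out.push(dequantize(byte & 0x0F, scales[id], biases[id]));
            out.push(dequantize(byte >> 4, scales[id], biases[id]));
        }
    }
    out
}
```

A reference like this is mainly useful for checking GPU output against known values in tests. For instance, with a scale of 0.5 (bf16 bits 0x3F00) and a bias of -1.0 (bf16 bits 0xBF80), the quantized value 3 dequantizes to 3 * 0.5 - 1.0 = 0.5.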

Structs

EmbeddingGatherParams
Parameters for quantized embedding gather.

Functions

embedding_gather
Encode a quantized embedding gather operation into the command buffer.