GPU-accelerated quantized embedding table lookup.
Supports 4-bit and 6-bit quantized embedding tables, dequantizing
on the fly during the gather. The dequantization formula is
float_val = uint_val * scale + bias, where scales and biases are stored as bf16.
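As a rough CPU-side illustration of the formula above, the following Rust sketch dequantizes 4-bit rows during a gather. It is not this module's implementation. The packing layout (two values per byte, low nibble first), the single scale and bias per row, and the names bf16_to_f32, dequantize_row_4bit, and gather_4bit are all assumptions made for the example; the real kernel may use a different layout or per-group scales, and the 6-bit case is not covered here.

```rust
/// Convert a bf16 value, given as raw bits, to f32.
/// bf16 is the upper 16 bits of an IEEE-754 f32, so a left shift is enough.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Dequantize one packed 4-bit row: float_val = uint_val * scale + bias.
/// ASSUMPTION: two values per byte, low nibble first, one scale/bias per row.
fn dequantize_row_4bit(packed: &[u8], scale_bf16: u16, bias_bf16: u16) -> Vec<f32> {
    let scale = bf16_to_f32(scale_bf16);
    let bias = bf16_to_f32(bias_bf16);
    let mut out = Vec::with_capacity(packed.len() * 2);
    for &byte in packed {
        out.push((byte & 0x0F) as f32 * scale + bias); // low nibble
        out.push((byte >> 4) as f32 * scale + bias); // high nibble
    }
    out
}

/// Gather the rows named by `ids`, dequantizing only those rows.
/// `row_bytes` is the packed size of one row (hypothetical parameter).
fn gather_4bit(
    table: &[u8],
    row_bytes: usize,
    scales: &[u16],
    biases: &[u16],
    ids: &[usize],
) -> Vec<Vec<f32>> {
    ids.iter()
        .map(|&id| {
            let row = &table[id * row_bytes..(id + 1) * row_bytes];
            dequantize_row_4bit(row, scales[id], biases[id])
        })
        .collect()
}
```

Because only the requested rows are unpacked, the table can stay in its compact quantized form in memory, which is the point of dequantizing during the gather rather than ahead of time.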
Structs§
- EmbeddingGatherParams - Parameters for quantized embedding gather.
Functions§
- embedding_gather - Encode a quantized embedding gather operation into the command buffer.