GPU-accelerated elementwise operations: add, multiply, and dtype cast.
These kernels are used for residual connections (add), scaling (multiply), and dtype conversion (cast) in the inference pipeline.
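To make the semantics concrete, here is a minimal CPU reference sketch of what the add, multiply, and scalar-multiply kernels compute. It assumes contiguous f32 buffers of equal length. The names `add_ref`, `mul_ref`, and `scalar_mul_ref` are hypothetical illustrations, not this module's API, and the GPU kernels may differ in buffer layout and precision handling.

```rust
/// Hypothetical CPU reference for `elementwise_add`: output[i] = a[i] + b[i].
fn add_ref(a: &[f32], b: &[f32], output: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == output.len(), "length mismatch");
    for ((o, x), y) in output.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

/// Hypothetical CPU reference for `elementwise_mul`: output[i] = a[i] * b[i].
fn mul_ref(a: &[f32], b: &[f32], output: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == output.len(), "length mismatch");
    for ((o, x), y) in output.iter_mut().zip(a).zip(b) {
        *o = x * y;
    }
}

/// Hypothetical CPU reference for `scalar_mul_f32`: output[i] = input[i] * scalar.
fn scalar_mul_ref(input: &[f32], scalar: f32, output: &mut [f32]) {
    assert_eq!(input.len(), output.len(), "length mismatch");
    for (o, x) in output.iter_mut().zip(input) {
        *o = x * scalar;
    }
}
```

For a residual connection, `add_ref(&hidden, &residual, &mut out)` plays the role of `output = a + b`; a reference like this is mainly useful for checking GPU output in tests.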
Enums
- CastDirection - Cast direction for dtype conversion.
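The variants of `CastDirection` are not listed on this page. The sketch below assumes two hypothetical variants, `Bf16ToF32` and `F32ToBf16`, stores bf16 values as raw `u16` bits (the top 16 bits of an f32), and assumes round-to-nearest-even when narrowing. That rounding mode is a common choice but is not confirmed for these kernels; `Buffer` and `cast_ref` are likewise illustrative names.

```rust
/// Hypothetical variants; the real `CastDirection` may name them differently.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CastDirection {
    Bf16ToF32,
    F32ToBf16,
}

/// A host-side buffer tagged with its dtype (bf16 stored as raw u16 bits).
enum Buffer {
    F32(Vec<f32>),
    Bf16(Vec<u16>),
}

/// Widen bf16 to f32: exact, since bf16 is the top 16 bits of an f32.
fn bf16_to_f32(h: u16) -> f32 {
    f32::from_bits((h as u32) << 16)
}

/// Narrow f32 to bf16 with round-to-nearest-even (assumed rounding mode).
fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Truncate, then set the quiet bit so the result stays a NaN.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
    let bias = 0x7FFF + ((bits >> 16) & 1);
    ((bits + bias) >> 16) as u16
}

/// Hypothetical CPU reference for `cast`; returns None if `direction`
/// does not match the dtype of `input`.
fn cast_ref(direction: CastDirection, input: &Buffer) -> Option<Buffer> {
    match (direction, input) {
        (CastDirection::Bf16ToF32, Buffer::Bf16(v)) => {
            Some(Buffer::F32(v.iter().map(|&h| bf16_to_f32(h)).collect()))
        }
        (CastDirection::F32ToBf16, Buffer::F32(v)) => {
            Some(Buffer::Bf16(v.iter().map(|&x| f32_to_bf16(x)).collect()))
        }
        _ => None,
    }
}
```

Widening bf16 to f32 is exact (a 16-bit shift), while narrowing drops 16 mantissa bits, which is why only the f32 to bf16 direction needs a rounding rule.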
Functions
- cast - Encode a dtype cast operation.
- dispatch_cast_bf16_to_f32_with_encoder - Cast bf16 to f32 using an externally provided encoder (no commit).
- dispatch_cast_f32_to_bf16_with_encoder - Cast f32 to bf16 using an externally provided encoder (no commit).
- dispatch_scalar_mul_bf16_with_encoder - Scale bf16 values by a scalar using an externally provided encoder (no commit).
- elementwise_add - Encode elementwise addition: output = a + b.
- elementwise_mul - Encode elementwise multiplication: output = a * b.
- embedding_gather_scale_batch_f32 - Batched embedding gather + scale for prefill (f32).
- embedding_gather_scale_f32 - Encode an embedding gather + scale: output[i] = embed[token_id * hs + i] * scale.
- scalar_mul_bf16 - Encode scalar multiplication: output[i] = input[i] * scalar(bf16).
- scalar_mul_f32 - Encode scalar multiplication: output[i] = input[i] * scalar(f32).
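The embedding gather + scale formula, output[i] = embed[token_id * hs + i] * scale, can be checked against a small CPU reference. This sketch assumes a row-major `embed` table of shape [vocab, hs] and, for the batched prefill variant, a row-major [num_tokens, hs] output. The function names and the `u32` token-id type are hypothetical, not this module's signatures.

```rust
/// Hypothetical CPU reference for `embedding_gather_scale_f32`:
/// output[i] = embed[token_id * hs + i] * scale, for i in 0..hs.
fn embedding_gather_scale_ref(embed: &[f32], token_id: u32, hs: usize, scale: f32, output: &mut [f32]) {
    assert_eq!(output.len(), hs, "output must hold exactly one row");
    let start = token_id as usize * hs;
    // Slicing panics if token_id is outside the embedding table.
    let row = &embed[start..start + hs];
    for (o, &e) in output.iter_mut().zip(row) {
        *o = e * scale;
    }
}

/// Hypothetical batched (prefill) variant: row t of `output` is the scaled
/// embedding of token_ids[t], assuming a row-major [token_ids.len(), hs] layout.
fn embedding_gather_scale_batch_ref(embed: &[f32], token_ids: &[u32], hs: usize, scale: f32, output: &mut [f32]) {
    assert!(hs > 0, "hidden size must be non-zero");
    assert_eq!(output.len(), token_ids.len() * hs, "output shape mismatch");
    for (out_row, &tok) in output.chunks_exact_mut(hs).zip(token_ids) {
        embedding_gather_scale_ref(embed, tok, hs, scale, out_row);
    }
}
```

The batched form is just the single-token gather applied once per row, which is why a prefill pass can gather all prompt tokens in one dispatch instead of one per token.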