GPU kernel host-side dispatch functions.
Each submodule implements dispatch for a specific kernel family.
Modules
- argmax - Greedy argmax GPU dispatch: finds the index of the maximum value in a float array entirely on the GPU.
- argsort - GPU-accelerated argsort (descending) for MoE top-K routing.
- copy - GPU-accelerated strided copy for making tensors contiguous.
- dense_gemm - Dense F16 matrix multiply for the lm_head vocabulary projection.
- elementwise - GPU-accelerated elementwise operations: add, multiply, and dtype cast.
- embedding - GPU-accelerated quantized embedding table lookup.
- encode_helpers - Helper utilities for encoding compute dispatches with inline constant parameters (bytes) alongside buffer bindings.
- flash_attn_vec - Flash attention vector kernel dispatch: SIMD-vectorized decode-path SDPA.
- flash_attn_vec_tq - Flash attention vector kernel dispatch for TurboQuant-compressed KV cache.
- fused_head_norm_rope - Fused per-head RMS normalization + NeoX RoPE GPU dispatch (bf16).
- fused_norm_add - Fused RMS normalization + residual addition GPU dispatch (bf16).
- fused_residual_norm - Fused residual addition + RMS normalization GPU dispatch (bf16).
- fwht_standalone - Standalone Fast Walsh-Hadamard Transform dispatch (SIMD shuffle, zero barriers).
- gather - GPU-accelerated gather / index_select along dim=0.
- gather_bench - Gather throughput microbenchmark dispatch.
- gelu - GELU activation (pytorch_tanh variant) GPU dispatch.
- hadamard - Fast Walsh-Hadamard Transform (FWHT) GPU kernel dispatch.
- hadamard_quantize_kv - Hadamard-quantize KV cache kernel dispatch (ADR-007 Phase 1.1).
- kv_cache_copy - KV cache GPU copy dispatch.
- moe_dispatch - GPU-accelerated MoE expert dispatch (Stage 1: loop over selected experts).
- moe_gate - GPU-accelerated MoE gating: parallel top-K expert selection with softmax routing.
- quantized_matmul - Quantized matrix multiplication host-side dispatch.
- quantized_matmul_ggml - GGML block-format quantized matrix-vector multiply dispatch.
- quantized_matmul_id - Expert-routed (MoE) quantized matrix-vector multiply dispatch.
- quantized_matmul_id_ggml - GGML block-format expert-routed (MoE) quantized matrix-vector multiply dispatch.
- rms_norm - RMS normalization GPU dispatch.
- rope - Rotary Position Embedding (RoPE) GPU dispatch.
- sdpa - Scaled dot-product attention (SDPA) host dispatch.
- sdpa_sliding - Sliding-window scaled dot-product attention host dispatch.
- softcap - Softcap (tanh-based logit capping) GPU dispatch.
- softmax - Numerically stable softmax GPU dispatch.
- softmax_sample - Temperature-scaled softmax + categorical sampling, entirely on the GPU.
- top_k - GPU top-K dispatch: returns the K largest elements of a float array.
- transpose - GPU-accelerated 2D matrix transpose.