Module ops

GPU kernel host-side dispatch functions.

Each submodule implements dispatch for a specific kernel family.

Modules

argmax
Greedy argmax GPU dispatch: finds the index of the maximum value in a float array entirely on the GPU.
argsort
GPU-accelerated argsort (descending) for MoE top-K routing.
copy
GPU-accelerated strided copy for making tensors contiguous.
dense_gemm
Dense F16 matrix multiply for the lm_head vocabulary projection.
elementwise
GPU-accelerated elementwise operations: add, multiply, and dtype cast.
embedding
GPU-accelerated quantized embedding table lookup.
encode_helpers
Helper utilities for encoding compute dispatches with inline constant parameters (bytes) alongside buffer bindings.
flash_attn_vec
Flash attention vector kernel dispatch: SIMD-vectorized decode-path SDPA.
flash_attn_vec_tq
Flash attention vector kernel dispatch for TurboQuant-compressed KV cache.
fused_head_norm_rope
Fused per-head RMS normalization + NeoX RoPE GPU dispatch (bf16).
fused_norm_add
Fused RMS normalization + residual addition GPU dispatch (bf16).
fused_residual_norm
Fused residual addition + RMS normalization GPU dispatch (bf16).
fwht_standalone
Standalone Fast Walsh-Hadamard Transform dispatch (SIMD shuffle, zero barriers).
gather
GPU-accelerated gather / index_select along dim=0.
gather_bench
Gather throughput microbenchmark dispatch.
gelu
GELU activation (pytorch_tanh variant) GPU dispatch.
hadamard
Fast Walsh-Hadamard Transform (FWHT) GPU kernel dispatch.
hadamard_quantize_kv
Hadamard-quantize KV cache kernel dispatch (ADR-007 Phase 1.1).
kv_cache_copy
KV cache GPU copy dispatch.
moe_dispatch
GPU-accelerated MoE expert dispatch (Stage 1: loop over selected experts).
moe_gate
GPU-accelerated MoE gating: parallel top-K expert selection with softmax routing.
quantized_matmul
Quantized matrix multiplication host-side dispatch.
quantized_matmul_ggml
GGML block-format quantized matrix-vector multiply dispatch.
quantized_matmul_id
Expert-routed (MoE) quantized matrix-vector multiply dispatch.
quantized_matmul_id_ggml
GGML block-format expert-routed (MoE) quantized matrix-vector multiply dispatch.
rms_norm
RMS Normalization GPU dispatch.
rope
Rotary Position Embedding (RoPE) GPU dispatch.
sdpa
Scaled dot-product attention (SDPA) host dispatch.
sdpa_sliding
Sliding-window scaled dot-product attention host dispatch.
softcap
Softcap (tanh-based logit capping) GPU dispatch.
softmax
Numerically stable softmax GPU dispatch.
softmax_sample
Temperature-scaled softmax + categorical sample, entirely on GPU.
top_k
GPU top-K dispatch: returns the K largest elements of a float array.
transpose
GPU-accelerated 2D matrix transpose.
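
Several of the kernel families listed above compute well-known functions, so a CPU reference can help when checking a dispatch's output. The sketch below is not part of this crate's API: the function names (`softmax_ref`, `argmax_ref`, `rms_norm_ref`, `softcap_ref`, `fwht_ref`), the `f32` slice signatures, and the `eps` and `cap` parameters are all assumptions chosen for illustration. It uses the textbook definitions of each operation, and the actual kernels may differ in dtype (for example bf16), normalization, and tie-breaking.

```rust
/// Numerically stable softmax: subtract the maximum before exponentiating
/// so that large inputs do not overflow. Hypothetical CPU reference.
fn softmax_ref(x: &[f32]) -> Vec<f32> {
    let max = x.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = x.iter().map(|v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Greedy argmax: index of the largest value, first index on ties.
/// Returns None for an empty slice. Assumes the input contains no NaN.
fn argmax_ref(x: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &v) in x.iter().enumerate() {
        match best {
            Some((_, b)) if v <= b => {}
            _ => best = Some((i, v)),
        }
    }
    best.map(|(i, _)| i)
}

/// RMS normalization: y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i].
/// `eps` is an assumed stabilizer; the kernel's value may differ.
fn rms_norm_ref(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len());
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// Tanh-based soft capping: smoothly bounds a logit to (-cap, cap).
fn softcap_ref(x: f32, cap: f32) -> f32 {
    cap * (x / cap).tanh()
}

/// In-place, unnormalized Fast Walsh-Hadamard Transform.
/// The length must be a power of two.
fn fwht_ref(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two(), "FWHT length must be a power of two");
    let mut h = 1;
    while h < n {
        for start in (0..n).step_by(2 * h) {
            for i in start..start + h {
                let (a, b) = (x[i], x[i + h]);
                x[i] = a + b; // butterfly: sum
                x[i + h] = a - b; // butterfly: difference
            }
        }
        h *= 2;
    }
}
```

A typical use would be to read a GPU result back to the host and compare it element-wise against the matching reference within a tolerance suited to the kernel's dtype.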