Module ops

GPU kernel host-side dispatch functions.

Each submodule implements dispatch for a specific kernel family.

Modules

argmax: Greedy argmax GPU dispatch — finds the index of the maximum value in a float array entirely on the GPU.
argsort: GPU-accelerated argsort (descending) for MoE top-K routing.
copy: GPU-accelerated strided copy for making tensors contiguous.
dense_gemm: Dense F16 matrix multiply for the lm_head vocabulary projection.
elementwise: GPU-accelerated elementwise operations: add, multiply, and dtype cast.
embedding: GPU-accelerated quantized embedding table lookup.
encode_helpers: Helper utilities for encoding compute dispatches with inline constant parameters (bytes) alongside buffer bindings.
flash_attn_vec: Flash attention vector kernel dispatch — SIMD-vectorized decode-path SDPA.
flash_attn_vec_tq: Flash attention vector kernel dispatch for TurboQuant-compressed KV cache.
fused_head_norm_rope: Fused per-head RMS normalization + NeoX RoPE GPU dispatch (bf16).
fused_norm_add: Fused RMS normalization + residual addition GPU dispatch (bf16).
fused_residual_norm: Fused residual addition + RMS normalization GPU dispatch (bf16).
fwht_standalone: Standalone Fast Walsh-Hadamard Transform dispatch (SIMD shuffle, zero barriers).
gather: GPU-accelerated gather / index_select along dim=0.
gather_bench: Gather throughput microbenchmark dispatch.
gelu: GELU activation (pytorch_tanh variant) GPU dispatch.
hadamard: Fast Walsh-Hadamard Transform (FWHT) GPU kernel dispatch.
hadamard_quantize_kv: Hadamard-quantize KV cache kernel dispatch (ADR-007 Phase 1.1).
kv_cache_copy: KV cache GPU copy dispatch.
moe_dispatch: GPU-accelerated MoE expert dispatch (Stage 1: loop over selected experts).
moe_gate: GPU-accelerated MoE gating: parallel top-K expert selection with softmax routing.
quantized_matmul: Quantized matrix multiplication host-side dispatch.
quantized_matmul_ggml: GGML block-format quantized matrix-vector multiply dispatch.
quantized_matmul_id: Expert-routed (MoE) quantized matrix-vector multiply dispatch.
quantized_matmul_id_ggml: GGML block-format expert-routed (MoE) quantized matrix-vector multiply dispatch.
rms_norm: RMS Normalization GPU dispatch.
rope: Rotary Position Embedding (RoPE) GPU dispatch.
sdpa: Scaled dot-product attention (SDPA) host dispatch.
sdpa_sliding: Sliding-window scaled dot-product attention host dispatch.
softcap: Softcap (tanh-based logit capping) GPU dispatch.
softmax: Numerically stable softmax GPU dispatch.
softmax_sample: Temperature-scaled softmax + categorical sample, entirely on GPU.
top_k: GPU top-K dispatch — returns the K largest elements of a float array.
transpose: GPU-accelerated 2D matrix transpose.
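
Several of the kernels listed above compute small, well-known formulas, so a CPU reference is a convenient way to sanity-check GPU output. The sketch below is illustrative only and is not part of this crate: the function names (`argmax_ref`, `softmax_ref`, `softcap_ref`, `rms_norm_ref`) are hypothetical, the softcap formula `cap * tanh(x / cap)` and the RMS norm formula with an `eps` term are the commonly used definitions and are assumed here rather than taken from the kernels, and everything runs in `f32` even though some kernels operate on bf16 or F16.

```rust
/// Greedy argmax: index of the first maximum value, or `None` for an empty slice.
/// Assumes the input contains no NaN values.
fn argmax_ref(values: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &v) in values.iter().enumerate() {
        match best {
            // Keep the current best when the new value is not strictly larger.
            Some((_, b)) if v <= b => {}
            _ => best = Some((i, v)),
        }
    }
    best.map(|(i, _)| i)
}

/// Numerically stable softmax: subtract the maximum before exponentiating
/// so that large logits do not overflow to infinity.
fn softmax_ref(logits: &[f32]) -> Vec<f32> {
    if logits.is_empty() {
        return Vec::new();
    }
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Tanh-based softcap (assumed form): smoothly limits `x` to the range (-cap, cap).
fn softcap_ref(x: f32, cap: f32) -> f32 {
    cap * (x / cap).tanh()
}

/// RMS normalization (assumed form): x_i * weight_i / sqrt(mean(x^2) + eps).
/// `x` and `weight` must have the same length.
fn rms_norm_ref(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "x and weight must have equal length");
    if x.is_empty() {
        return Vec::new();
    }
    let mean_sq = x.iter().map(|&v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(&v, &w)| v * inv_rms * w).collect()
}
```

A typical use would be to read a GPU result buffer back to the host and compare it element by element against the matching reference with a tolerance chosen for the kernel's dtype; bf16 and F16 outputs need a looser tolerance than `f32`.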