Module kernels

Kernel implementations: scalar reference, AVX2 SIMD, and CUDA PTX.

Each submodule provides three variants of its kernel:

  • fn {name}_scalar(...) — Pure Rust scalar reference (ground truth)
  • unsafe fn {name}_avx2(...) — AVX2 SIMD implementation
  • fn {name}_ptx() -> &'static str — PTX assembly source string
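The three-variant convention above can be sketched for a ReLU kernel. This is a minimal illustration, not the crate's actual code: the names `relu_scalar`, `relu_avx2`, and `relu_ptx`, their slice-based signatures, and the PTX text itself are assumptions made for this example, and the PTX string has not been validated against a CUDA driver.

```rust
/// Scalar reference (hypothetical signature): out[i] = max(x[i], 0).
pub fn relu_scalar(input: &[f32], output: &mut [f32]) {
    assert_eq!(input.len(), output.len());
    for (o, &x) in output.iter_mut().zip(input) {
        *o = x.max(0.0);
    }
}

/// AVX2 variant (hypothetical): 8 lanes per iteration plus a scalar tail.
///
/// # Safety
/// The caller must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
pub unsafe fn relu_avx2(input: &[f32], output: &mut [f32]) {
    use std::arch::x86_64::*;
    assert_eq!(input.len(), output.len());
    let n = input.len();
    let mut i = 0;
    unsafe {
        let zero = _mm256_setzero_ps();
        while i + 8 <= n {
            // Unaligned load/store: slices carry no 32-byte alignment guarantee.
            let v = _mm256_loadu_ps(input.as_ptr().add(i));
            _mm256_storeu_ps(output.as_mut_ptr().add(i), _mm256_max_ps(v, zero));
            i += 8;
        }
    }
    // Remaining elements (fewer than 8) use the scalar formula.
    for j in i..n {
        output[j] = input[j].max(0.0);
    }
}

/// PTX variant (hypothetical): returns the kernel source as a string.
/// One thread per element; illustrative only.
pub fn relu_ptx() -> &'static str {
    r#"
.version 7.0
.target sm_70
.address_size 64

.visible .entry relu(
    .param .u64 in_ptr,
    .param .u64 out_ptr,
    .param .u32 n
)
{
    .reg .pred %p<2>;
    .reg .f32 %f<3>;
    .reg .b32 %r<6>;
    .reg .b64 %rd<8>;

    ld.param.u64 %rd1, [in_ptr];
    ld.param.u64 %rd2, [out_ptr];
    ld.param.u32 %r1, [n];
    mov.u32 %r2, %ctaid.x;
    mov.u32 %r3, %ntid.x;
    mov.u32 %r4, %tid.x;
    mad.lo.s32 %r5, %r2, %r3, %r4;
    setp.ge.u32 %p1, %r5, %r1;
    @%p1 bra DONE;
    cvta.to.global.u64 %rd3, %rd1;
    cvta.to.global.u64 %rd4, %rd2;
    mul.wide.u32 %rd5, %r5, 4;
    add.s64 %rd6, %rd3, %rd5;
    add.s64 %rd7, %rd4, %rd5;
    ld.global.f32 %f1, [%rd6];
    max.f32 %f2, %f1, 0f00000000;
    st.global.f32 [%rd7], %f2;
DONE:
    ret;
}
"#
}
```

Because the AVX2 variant is `unsafe`, a caller would typically guard it with `is_x86_feature_detected!("avx2")` and compare its output against the scalar reference in tests.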

Modules§

absolute_position
Absolute position embeddings kernel.
activation
Activation kernels: ReLU, GELU, SiLU.
adamw
AdamW optimizer kernel.
alibi
ALiBi (Attention with Linear Biases) kernel.
attention
Scaled dot-product attention kernel.
batchnorm
Batch normalization kernel.
bias_add
Bias addition kernel.
cma_es
CMA-ES sampling kernel.
conv1d
1D Convolution kernel.
cross_entropy
Cross-entropy loss kernel with log-softmax.
dropout
Dropout kernel.
embedding
Embedding lookup kernel.
f16_convert
F16 (half-precision) conversion kernel.
flash_attention
Flash Attention: IO-aware tiled attention.
gated_delta_net
Gated Delta Net recurrence kernel.
gelu
GELU kernel (standalone module).
gqa
Grouped Query Attention kernel.
kmeans
K-means clustering kernel.
layernorm
Layer normalization kernel.
lbfgs
L-BFGS two-loop recursion kernel.
linear
Linear projection kernel.
matmul
Matrix multiplication kernel.
ops
Shared kernel primitives: dot product, softmax row, score matrix.
pagerank
PageRank iteration kernel.
rmsnorm
RMSNorm kernel: root mean square layer normalization.
rope
Rotary Position Embedding (RoPE) kernel.
sampling
Sampling algorithms kernel.
silu_standalone
Standalone SiLU kernel with explicit sigmoid.
softmax
Softmax kernel: numerically stable exponential normalization.
ssm
State-Space Model (SSM) scan kernel.
swiglu
SwiGLU gated MLP kernel.
tied_embeddings
Tied embeddings kernel (language model head).
transpose
Matrix transpose kernel: out-of-place B = A^T with AVX2 8×8 micro-kernel.
ulp
ULP (Unit in the Last Place) distance utilities for floating-point comparison.
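
ULP distance is the usual way to compare a SIMD or GPU result against a scalar reference without picking an arbitrary absolute tolerance. The sketch below shows one common formulation for `f32`; the function names `ulp_distance` and `within_ulps` are hypothetical and are not taken from the `ulp` module, whose actual API is not shown on this page.

```rust
/// Map an f32 onto a signed integer line where adjacent representable
/// floats differ by exactly 1 and +0.0 and -0.0 both map to 0.
fn ordered_bits(x: f32) -> i64 {
    let bits = x.to_bits() as i32;
    // Negative floats are stored sign-magnitude; convert magnitude m to -m.
    let key = if bits < 0 { i32::MIN.wrapping_sub(bits) } else { bits };
    i64::from(key)
}

/// Number of representable f32 values between `a` and `b`.
/// Returns `None` if either input is NaN (hypothetical convention).
pub fn ulp_distance(a: f32, b: f32) -> Option<u64> {
    if a.is_nan() || b.is_nan() {
        return None;
    }
    Some((ordered_bits(a) - ordered_bits(b)).unsigned_abs())
}

/// True if two slices have equal length and every pair of elements
/// is within `max_ulps` of each other.
pub fn within_ulps(a: &[f32], b: &[f32], max_ulps: u64) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(&x, &y)| matches!(ulp_distance(x, y), Some(d) if d <= max_ulps))
}
```

A test harness could then assert something like `within_ulps(&simd_out, &scalar_out, 4)`, with the tolerance chosen per kernel.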

Enums§

Backend
Backend selector for kernel dispatch.
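
How a backend selector might route a call to one of the kernel variants can be sketched as follows. The variants `Scalar` and `Avx2`, the `detect` method, and the `add` family of functions are all assumptions for illustration; the real `Backend` enum's variants and methods are not listed on this page.

```rust
/// Hypothetical backend selector with two CPU variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Scalar,
    Avx2,
}

impl Backend {
    /// Pick the fastest backend the current CPU supports.
    pub fn detect() -> Backend {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                return Backend::Avx2;
            }
        }
        Backend::Scalar
    }
}

/// Scalar reference: out[i] = a[i] + b[i].
pub fn add_scalar(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == out.len());
    for i in 0..a.len() {
        out[i] = a[i] + b[i];
    }
}

/// AVX2 variant.
///
/// # Safety
/// The caller must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
pub unsafe fn add_avx2(a: &[f32], b: &[f32], out: &mut [f32]) {
    use std::arch::x86_64::*;
    assert!(a.len() == b.len() && a.len() == out.len());
    let n = a.len();
    let mut i = 0;
    unsafe {
        while i + 8 <= n {
            let va = _mm256_loadu_ps(a.as_ptr().add(i));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i));
            _mm256_storeu_ps(out.as_mut_ptr().add(i), _mm256_add_ps(va, vb));
            i += 8;
        }
    }
    // Scalar tail for the last n % 8 elements.
    for j in i..n {
        out[j] = a[j] + b[j];
    }
}

/// Safe entry point: dispatch on `backend`, falling back to the scalar
/// reference when AVX2 is requested but unavailable.
pub fn add(backend: Backend, a: &[f32], b: &[f32], out: &mut [f32]) {
    match backend {
        #[cfg(target_arch = "x86_64")]
        Backend::Avx2 if is_x86_feature_detected!("avx2") => unsafe { add_avx2(a, b, out) },
        _ => add_scalar(a, b, out),
    }
}
```

Keeping the feature check inside the safe wrapper means callers never need `unsafe` themselves, at the cost of one cached CPU-feature lookup per call.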