Kernel implementations: scalar reference, AVX2 SIMD, and CUDA PTX.
Each submodule provides three variants of its kernel:
- fn {name}_scalar(...): pure Rust scalar reference (ground truth)
- unsafe fn {name}_avx2(...): AVX2 SIMD implementation
- fn {name}_ptx() -> &'static str: PTX assembly source string
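The three-variant convention above can be sketched for a hypothetical kernel named `relu`. The argument lists, the PTX body, and the `.entry` name are assumptions made for illustration, not the crate's actual signatures; the PTX text has not been assembled here and only shows the shape of a returned source string.

```rust
/// Scalar reference (hypothetical signature): out[i] = max(x[i], 0).
pub fn relu_scalar(input: &[f32], output: &mut [f32]) {
    assert_eq!(input.len(), output.len());
    for (o, &x) in output.iter_mut().zip(input) {
        *o = x.max(0.0);
    }
}

/// AVX2 variant (hypothetical signature): 8 lanes per step, scalar tail.
/// Safety: the caller must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
pub unsafe fn relu_avx2(input: &[f32], output: &mut [f32]) {
    use std::arch::x86_64::*;
    assert_eq!(input.len(), output.len());
    let chunks = input.len() / 8;
    unsafe {
        let zero = _mm256_setzero_ps();
        for i in 0..chunks {
            // Unaligned load/store, so the slices need no special alignment.
            let v = _mm256_loadu_ps(input.as_ptr().add(i * 8));
            _mm256_storeu_ps(output.as_mut_ptr().add(i * 8), _mm256_max_ps(v, zero));
        }
    }
    // Elements that do not fill a full 8-lane vector.
    for i in chunks * 8..input.len() {
        output[i] = input[i].max(0.0);
    }
}

/// PTX variant: returns source text only; loading and launching it is left
/// to a CUDA driver binding. Illustrative body, one thread per element.
pub fn relu_ptx() -> &'static str {
    r#".version 7.0
.target sm_70
.address_size 64

.visible .entry relu(
    .param .u64 in_ptr,
    .param .u64 out_ptr,
    .param .u32 n
)
{
    .reg .pred %p<2>;
    .reg .f32 %f<3>;
    .reg .b32 %r<6>;
    .reg .b64 %rd<8>;

    ld.param.u64 %rd1, [in_ptr];
    ld.param.u64 %rd2, [out_ptr];
    ld.param.u32 %r1, [n];
    mov.u32 %r2, %ctaid.x;
    mov.u32 %r3, %ntid.x;
    mov.u32 %r4, %tid.x;
    mad.lo.s32 %r5, %r2, %r3, %r4;
    setp.ge.u32 %p1, %r5, %r1;
    @%p1 bra DONE;
    cvta.to.global.u64 %rd3, %rd1;
    cvta.to.global.u64 %rd4, %rd2;
    mul.wide.u32 %rd5, %r5, 4;
    add.s64 %rd6, %rd3, %rd5;
    add.s64 %rd7, %rd4, %rd5;
    ld.global.f32 %f1, [%rd6];
    max.f32 %f2, %f1, 0f00000000;
    st.global.f32 [%rd7], %f2;
DONE:
    ret;
}
"#
}
```

A caller would typically guard the AVX2 variant with `is_x86_feature_detected!("avx2")` and check its output against the scalar reference, which serves as ground truth.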
Modules
- absolute_position
- Absolute position embeddings kernel.
- activation
- Activation kernels: ReLU, GELU, SiLU.
- adamw
- AdamW optimizer kernel.
- alibi
- ALiBi (Attention with Linear Biases) kernel.
- attention
- Scaled dot-product attention kernel.
- batchnorm
- Batch normalization kernel.
- bias_add
- Bias addition kernel.
- cma_es
- CMA-ES sampling kernel.
- conv1d
- 1D Convolution kernel.
- cross_entropy
- Cross-entropy loss kernel with log-softmax.
- dropout
- Dropout kernel.
- embedding
- Embedding lookup kernel.
- f16_convert
- F16 (half-precision) conversion kernel.
- flash_attention
- Flash Attention: IO-aware tiled attention.
- gated_delta_net
- Gated Delta Net recurrence kernel.
- gelu
- GELU kernel (standalone module).
- gqa
- Grouped Query Attention kernel.
- kmeans
- K-means clustering kernel.
- layernorm
- Layer normalization kernel.
- lbfgs
- L-BFGS two-loop recursion kernel.
- linear
- Linear projection kernel.
- matmul
- Matrix multiplication kernel.
- ops
- Shared kernel primitives: dot product, softmax row, score matrix.
- pagerank
- PageRank iteration kernel.
- rmsnorm
- RMSNorm kernel: root mean square layer normalization.
- rope
- Rotary Position Embedding (RoPE) kernel.
- sampling
- Sampling algorithms kernel.
- silu_standalone
- Standalone SiLU kernel with explicit sigmoid.
- softmax
- Softmax kernel: numerically stable exponential normalization.
- ssm
- State-Space Model (SSM) scan kernel.
- swiglu
- SwiGLU gated MLP kernel.
- tied_embeddings
- Tied embeddings kernel (language model head).
- transpose
- Matrix transpose kernel: out-of-place B = A^T with AVX2 8×8 micro-kernel.
- ulp
- ULP (Unit in the Last Place) distance utilities for floating-point comparison.
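The `ulp` entry above mentions ULP distance without showing the idea. Here is a minimal sketch of one common way to compute it for `f32`, which could be used to compare an AVX2 result against the scalar reference. The names `ordered_bits` and `ulp_distance` are hypothetical and are not taken from the crate.

```rust
/// Maps an f32 onto an integer line where adjacent representable floats
/// differ by exactly 1 and both zeros map to 0. (Hypothetical helper.)
fn ordered_bits(x: f32) -> i64 {
    let b = x.to_bits() as i32;
    if b < 0 {
        // Sign bit set: the bits are sign-magnitude, so negate the magnitude.
        (i32::MIN as i64) - (b as i64)
    } else {
        b as i64
    }
}

/// Number of representable f32 values between `a` and `b`.
/// Returns None if either input is NaN, since NaN has no meaningful position.
pub fn ulp_distance(a: f32, b: f32) -> Option<u64> {
    if a.is_nan() || b.is_nan() {
        return None;
    }
    Some((ordered_bits(a) - ordered_bits(b)).unsigned_abs())
}
```

A tolerance of a few ULPs is a typical acceptance criterion when a SIMD kernel reorders floating-point operations relative to the scalar version; the exact threshold is a per-kernel choice.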
Enums
- Backend
- Backend selector for kernel dispatch.
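The page does not list the variants of `Backend`, so the following is only a sketch of what a backend selector for kernel dispatch might look like. The variant names, `detect_cpu_backend`, and the `dot` dispatcher are all hypothetical, and the AVX2 arm falls back to the scalar path to keep the example short.

```rust
/// Hypothetical backend selector; variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Scalar,
    Avx2,
    Ptx,
}

/// Picks the best CPU backend available at runtime.
pub fn detect_cpu_backend() -> Backend {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
    }
    Backend::Scalar
}

fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Dispatches a dot product on the selected backend.
pub fn dot(backend: Backend, a: &[f32], b: &[f32]) -> Result<f32, String> {
    assert_eq!(a.len(), b.len());
    match backend {
        Backend::Scalar => Ok(dot_scalar(a, b)),
        // A real implementation would call an AVX2 kernel here;
        // this sketch reuses the scalar path.
        Backend::Avx2 => Ok(dot_scalar(a, b)),
        // PTX is source text, so it cannot run without a CUDA driver.
        Backend::Ptx => Err("PTX must be launched through a CUDA driver".to_string()),
    }
}
```

Keeping the selector as a plain `Copy` enum lets callers detect the backend once and pass it by value to every kernel call.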