
Module kernel_ops


Kernel backend abstraction layer for LLM-specific fused operations.

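This module defines a mid-level abstraction that sits between the raw KernelExecutor, which is too low-level because it works in terms of grid and block sizes, and TensorOps, which is too high-level because it has no LLM-specific fused operations. CUDA, Metal, and CPU backends plug in through six focused sub-traits (ActivationOps, AttentionOps, LinearOps, NormOps, PositionOps, and SamplingOps) that are composed into a single umbrella trait, KernelOps.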
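As a rough illustration of how focused sub-traits can be composed into one umbrella trait, here is a minimal sketch that uses only two of the six. The method names (`rms_norm`, `silu`), their signatures, the `CpuBackend` type, and the blanket implementation are assumptions made for this example; the page does not show the module's actual trait definitions.

```rust
// Hypothetical sketch: method names, signatures, and the blanket impl are
// illustrative assumptions, not the module's real API.

/// Stand-in for the normalization sub-trait.
trait NormOps {
    /// RMS normalization: x / sqrt(mean(x^2) + eps) * weight.
    fn rms_norm(&self, x: &[f32], weight: &[f32], eps: f32) -> Vec<f32>;
}

/// Stand-in for the activation sub-trait.
trait ActivationOps {
    /// SiLU activation: x * sigmoid(x).
    fn silu(&self, x: &[f32]) -> Vec<f32>;
}

/// Umbrella trait: requires every sub-trait and adds nothing of its own.
trait KernelOps: NormOps + ActivationOps {}

/// Any type implementing all sub-traits is automatically a `KernelOps`.
impl<T: NormOps + ActivationOps> KernelOps for T {}

/// Hypothetical CPU backend used only for this example.
struct CpuBackend;

impl NormOps for CpuBackend {
    fn rms_norm(&self, x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
        let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
        let inv_rms = 1.0 / (mean_sq + eps).sqrt();
        x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
    }
}

impl ActivationOps for CpuBackend {
    fn silu(&self, x: &[f32]) -> Vec<f32> {
        x.iter().map(|v| v / (1.0 + (-v).exp())).collect()
    }
}

/// Callers depend only on the umbrella trait, so backends are swappable.
fn norm_then_activate(ops: &dyn KernelOps, x: &[f32], w: &[f32]) -> Vec<f32> {
    ops.silu(&ops.rms_norm(x, w, 1e-6))
}
```

With this shape, a backend can implement each sub-trait independently, while model code takes a single `&dyn KernelOps` and stays unaware of which backend is behind it.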

Structs

AttentionParams
Parameters describing a single attention call.
KernelOpsDispatch
Dispatch wrapper that tries KernelOps first, then falls back to TensorOps for operations that have a TensorOps equivalent.
RoPEConfig
Rotary position embedding configuration.
SamplingParams
Sampling parameters for GPU-side token sampling.
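The try-first, fall-back-second behaviour described for KernelOpsDispatch can be sketched roughly as follows. Everything in this block is a stand-in: the one-method `KernelOps` and `TensorOps` traits, the `OpError::Unsupported` variant, the `Dispatch` struct, and the two backends are hypothetical and only illustrate the pattern, not the real types.

```rust
// Hypothetical sketch of a dispatch wrapper with fallback. The traits here are
// simplified stand-ins that share names with the documented items.

#[derive(Debug, PartialEq)]
enum OpError {
    /// The backend has no kernel for the requested operation.
    Unsupported,
}

/// Stand-in for the fused-kernel interface; an operation may be unsupported.
trait KernelOps {
    fn matmul(&self, a: &[f32], b: &[f32], n: usize) -> Result<Vec<f32>, OpError>;
}

/// Stand-in for the generic tensor interface; always available.
trait TensorOps {
    fn matmul(&self, a: &[f32], b: &[f32], n: usize) -> Vec<f32>;
}

/// Tries the kernel backend first, then falls back to the tensor backend.
struct Dispatch<K, T> {
    kernel: K,
    tensor: T,
}

impl<K: KernelOps, T: TensorOps> Dispatch<K, T> {
    fn matmul(&self, a: &[f32], b: &[f32], n: usize) -> Vec<f32> {
        match self.kernel.matmul(a, b, n) {
            Ok(out) => out,
            Err(OpError::Unsupported) => self.tensor.matmul(a, b, n),
        }
    }
}

/// A kernel backend that supports nothing, to exercise the fallback path.
struct NoKernels;

impl KernelOps for NoKernels {
    fn matmul(&self, _a: &[f32], _b: &[f32], _n: usize) -> Result<Vec<f32>, OpError> {
        Err(OpError::Unsupported)
    }
}

/// Naive row-major n x n matrix multiply used as the reference fallback.
struct Reference;

impl TensorOps for Reference {
    fn matmul(&self, a: &[f32], b: &[f32], n: usize) -> Vec<f32> {
        let mut out = vec![0.0; n * n];
        for i in 0..n {
            for j in 0..n {
                for k in 0..n {
                    out[i * n + j] += a[i * n + k] * b[k * n + j];
                }
            }
        }
        out
    }
}
```

Because the fallback only applies to operations that have a TensorOps equivalent, a fused operation with no generic counterpart would presumably still have to surface an error to the caller; this sketch does not model that case.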

Enums

QuantScheme
Quantization scheme descriptor for quantized linear ops.

Traits

ActivationOps
Activation function operations (including fused variants).
AttentionOps
Attention operations.
KernelOps
Unified kernel operations interface.
LinearOps
Linear / matrix-multiply operations.
NormOps
Normalization operations.
PositionOps
Positional encoding operations.
SamplingOps
Token sampling operations (GPU-side when possible).