Kernel backend abstraction layer for LLM-specific fused operations.
This module defines a mid-level abstraction between raw KernelExecutor
(too low-level: grid/block sizes) and TensorOps (too high-level: no
LLM-specific fused ops). It enables pluggable CUDA/Metal/CPU backends
through six focused sub-traits composed into one umbrella KernelOps.
Structs§
- AttentionParams - Parameters describing a single attention call.
- KernelOpsDispatch - Dispatch wrapper that tries KernelOps first, then falls back to TensorOps for operations that have a TensorOps equivalent.
- RoPEConfig - Rotary position embedding configuration.
- SamplingParams - Sampling parameters for GPU-side token sampling.
Enums§
- QuantScheme - Quantization scheme descriptor for quantized linear ops.
Traits§
- ActivationOps - Activation function operations (including fused variants).
- AttentionOps - Attention operations.
- KernelOps - Unified kernel operations interface.
- LinearOps - Linear / matrix-multiply operations.
- NormOps - Normalization operations.
- PositionOps - Positional encoding operations.
- SamplingOps - Token sampling operations (GPU-side when possible).