Module backend

Unified Backend trait for CUDA, Metal, and CPU compute.

Each backend implements the same set of transformer-layer primitives (GEMM, norms, RoPE, attention, activations). layer_forward() and ModelRunner are generic over Backend, so one forward path serves all hardware targets.
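
A minimal sketch of how a forward path can be written once against a trait and reused by every backend. Only the names Backend and layer_forward come from this page; the trait methods, their signatures, the Cpu type, and the single GEMM-plus-activation "layer" are assumptions made for illustration and will differ from the crate's real API.

```rust
/// Hypothetical, trimmed-down stand-in for the crate's `Backend` trait.
/// The real trait exposes more primitives (norms, RoPE, attention, ...).
pub trait Backend {
    /// Per-call execution context (e.g. a stream or command buffer on a GPU).
    type Context;

    /// Row-major matrix multiply: out[m x n] = a[m x k] * b[k x n].
    fn gemm(
        ctx: &mut Self::Context,
        a: &[f32],
        b: &[f32],
        m: usize,
        k: usize,
        n: usize,
    ) -> Vec<f32>;

    /// In-place ReLU, standing in for the activation primitives.
    fn relu(ctx: &mut Self::Context, x: &mut [f32]);
}

/// Hypothetical CPU backend: no context is needed, so `Context = ()`.
pub struct Cpu;

impl Backend for Cpu {
    type Context = ();

    fn gemm(_ctx: &mut (), a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
        assert_eq!(a.len(), m * k);
        assert_eq!(b.len(), k * n);
        let mut out = vec![0.0f32; m * n];
        for i in 0..m {
            for p in 0..k {
                let a_ip = a[i * k + p];
                for j in 0..n {
                    out[i * n + j] += a_ip * b[p * n + j];
                }
            }
        }
        out
    }

    fn relu(_ctx: &mut (), x: &mut [f32]) {
        for v in x.iter_mut() {
            *v = v.max(0.0);
        }
    }
}

/// Toy "layer": one GEMM followed by an activation. Because it is generic
/// over `B`, the same body compiles for any type implementing `Backend`.
pub fn layer_forward<B: Backend>(
    ctx: &mut B::Context,
    x: &[f32],
    w: &[f32],
    m: usize,
    k: usize,
    n: usize,
) -> Vec<f32> {
    let mut y = B::gemm(ctx, x, w, m, k, n);
    B::relu(ctx, &mut y);
    y
}
```

Under these assumptions a caller picks the hardware target purely through the type parameter, for example layer_forward::<Cpu>(&mut (), &x, &w, 1, 2, 2); a GPU backend would supply its own Context type without any change to layer_forward.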

Modules

cpu
CPU backend using Accelerate on macOS and a portable fallback on Linux. Context = (): all ops execute immediately, so no batching is needed.

Structs

AttnConfig
Configuration for attention dispatch.
KvCache
Per-layer KV cache. Each model owns its own Vec<KvCache<B>> per sequence.
QuantWeights
Packed quantized weight buffers passed to Backend::gemm_quant.
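
The ownership pattern described for KvCache (one cache per layer, one Vec<KvCache<B>> per sequence) might look roughly like the sketch below. The field names, the Buffer associated type, the append/len helpers, and new_sequence_caches are all hypothetical; the page only states that KvCache is per-layer and generic over the backend.

```rust
/// Hypothetical minimal `Backend`: just enough to own and grow a buffer.
pub trait Backend {
    /// Backend-specific storage (a device allocation on a GPU, a Vec on CPU).
    type Buffer: Default;
    fn append(buf: &mut Self::Buffer, data: &[f32]);
    fn len(buf: &Self::Buffer) -> usize;
}

/// Hypothetical CPU backend storing everything in host memory.
pub struct Cpu;

impl Backend for Cpu {
    type Buffer = Vec<f32>;

    fn append(buf: &mut Vec<f32>, data: &[f32]) {
        buf.extend_from_slice(data);
    }

    fn len(buf: &Vec<f32>) -> usize {
        buf.len()
    }
}

/// Sketch of a per-layer KV cache: keys and values for every token seen so far.
pub struct KvCache<B: Backend> {
    k: B::Buffer,
    v: B::Buffer,
    /// Number of f32 values stored per token for K (and for V).
    kv_dim: usize,
}

impl<B: Backend> KvCache<B> {
    pub fn new(kv_dim: usize) -> Self {
        assert!(kv_dim > 0, "kv_dim must be non-zero");
        Self {
            k: Default::default(),
            v: Default::default(),
            kv_dim,
        }
    }

    /// Append the key and value vectors for one new token.
    pub fn push(&mut self, k: &[f32], v: &[f32]) {
        assert_eq!(k.len(), self.kv_dim);
        assert_eq!(v.len(), self.kv_dim);
        B::append(&mut self.k, k);
        B::append(&mut self.v, v);
    }

    /// Number of tokens currently cached.
    pub fn seq_len(&self) -> usize {
        B::len(&self.k) / self.kv_dim
    }
}

/// One cache per transformer layer; a model would hold one such Vec per sequence.
pub fn new_sequence_caches<B: Backend>(n_layers: usize, kv_dim: usize) -> Vec<KvCache<B>> {
    (0..n_layers).map(|_| KvCache::new(kv_dim)).collect()
}
```

Keeping the caches in a plain Vec indexed by layer means each sequence can be dropped or reset independently, which is one plausible reason for the per-sequence ownership noted above.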

Enums

GgufQuantType
GGUF quantization sub-type (expand as kernels are added).
QuantKind
Quantization flavour discriminator for Backend::gemm_quant.
ReduceOp
Collective-op reduction kind for TP all_reduce.
SrcDtype
Source dtype for a weight tensor read straight from safetensors mmap.
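
To illustrate what a reduction kind such as ReduceOp selects in a tensor-parallel all_reduce, here is an in-process simulation in which each "rank" is just a Vec<f32>. The Sum and Max variants and the free function all_reduce are assumptions for this sketch; the page does not list the enum's variants or the real collective's signature.

```rust
/// Hypothetical variants; the crate's `ReduceOp` may define others.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ReduceOp {
    Sum,
    Max,
}

/// Simulated all-reduce: combine the buffers of all ranks element-wise with
/// `op`, then write the combined result back to every rank.
pub fn all_reduce(ranks: &mut [Vec<f32>], op: ReduceOp) {
    let Some(first) = ranks.first() else { return };
    let mut acc = first.clone();

    // Fold every other rank's buffer into the accumulator.
    for r in ranks.iter().skip(1) {
        assert_eq!(r.len(), acc.len(), "all ranks must hold equal-sized buffers");
        for (a, x) in acc.iter_mut().zip(r) {
            *a = match op {
                ReduceOp::Sum => *a + *x,
                ReduceOp::Max => (*a).max(*x),
            };
        }
    }

    // Broadcast: after the call every rank sees the same reduced values.
    for r in ranks.iter_mut() {
        r.copy_from_slice(&acc);
    }
}
```

In a real tensor-parallel setup the ranks would be separate devices and the reduction would go through a communication library, but the contract is the same: every participant ends up holding the identical reduced buffer.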

Traits

Backend
The core abstraction over CUDA / Metal / CPU.