Module backend

Unified Backend trait for CUDA, Metal, and CPU compute.

Each backend implements the same set of transformer-layer primitives (GEMM, norms, RoPE, attention, activations). layer_forward() and ModelRunner are generic over Backend, so one forward path serves all hardware targets.
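As a rough illustration of how one forward path can serve several targets, here is a minimal sketch. Every name in it (`ToyBackend`, `ToyCpu`, `toy_layer_forward`) and every signature is hypothetical; the real `Backend` trait's methods are not shown on this page and will differ. It only demonstrates the pattern: a trait with an associated `Context` type, plus a function generic over that trait.

```rust
/// Hypothetical, heavily simplified stand-in for the crate's `Backend` trait.
/// The method names and signatures here are assumptions made for illustration.
trait ToyBackend {
    /// Execution context handed to every op (a GPU backend might use a
    /// stream or command buffer here; that is an assumption, not documented).
    type Context;

    /// Row-major GEMM: returns `a[m x k] * b[k x n]` as an `m x n` buffer.
    fn gemm(
        ctx: &mut Self::Context,
        a: &[f32],
        b: &[f32],
        m: usize,
        k: usize,
        n: usize,
    ) -> Vec<f32>;

    /// RMS normalization of a single vector (no learned scale).
    fn rms_norm(ctx: &mut Self::Context, x: &[f32], eps: f32) -> Vec<f32>;
}

/// Toy CPU implementation: the context is `()`, so each op runs immediately.
struct ToyCpu;

impl ToyBackend for ToyCpu {
    type Context = ();

    fn gemm(_ctx: &mut (), a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
        let mut out = vec![0.0f32; m * n];
        for i in 0..m {
            for p in 0..k {
                let a_ip = a[i * k + p];
                for j in 0..n {
                    out[i * n + j] += a_ip * b[p * n + j];
                }
            }
        }
        out
    }

    fn rms_norm(_ctx: &mut (), x: &[f32], eps: f32) -> Vec<f32> {
        let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
        let scale = 1.0 / (mean_sq + eps).sqrt();
        x.iter().map(|v| v * scale).collect()
    }
}

/// A forward step written once and reused by any `ToyBackend` implementation:
/// normalize the input, then project it with a `d x d` weight matrix.
fn toy_layer_forward<B: ToyBackend>(
    ctx: &mut B::Context,
    x: &[f32],
    w: &[f32],
    d: usize,
) -> Vec<f32> {
    let normed = B::rms_norm(ctx, x, 1e-6);
    B::gemm(ctx, &normed, w, 1, d, d)
}
```

A caller would pick the target at the type level, for example `toy_layer_forward::<ToyCpu>(&mut (), &x, &w, d)`. Adding another backend then means adding another `impl`, with no change to the forward function.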

Modules§

cpu: CPU backend using Accelerate on macOS, with a portable fallback on Linux. Context = (), since all ops execute immediately and need no batching.

Structs§

AttnConfig: Configuration for attention dispatch.
KvCache: Per-layer KV cache. Each model owns its own Vec<KvCache<B>> per sequence.
QuantWeights: Packed quantized weight buffers passed to Backend::gemm_quant.
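The ownership layout described for KvCache (one cache per layer, collected in a Vec per sequence) could look roughly like the sketch below. `ToyKvCache` and its fields are hypothetical; the real `KvCache<B>` is generic over the backend and its fields are not shown on this page.

```rust
/// Hypothetical stand-in for `KvCache<B>`, storing plain `f32` buffers.
struct ToyKvCache {
    head_dim: usize,
    /// Flattened `[tokens x head_dim]` key rows.
    keys: Vec<f32>,
    /// Flattened `[tokens x head_dim]` value rows.
    values: Vec<f32>,
}

impl ToyKvCache {
    fn new(head_dim: usize) -> Self {
        Self { head_dim, keys: Vec::new(), values: Vec::new() }
    }

    /// Append the key and value rows for one new token.
    fn append(&mut self, k: &[f32], v: &[f32]) {
        assert_eq!(k.len(), self.head_dim);
        assert_eq!(v.len(), self.head_dim);
        self.keys.extend_from_slice(k);
        self.values.extend_from_slice(v);
    }

    /// Number of tokens cached so far.
    fn len(&self) -> usize {
        self.keys.len() / self.head_dim
    }
}

/// Build the per-sequence state: one independent cache per layer.
fn new_sequence_caches(num_layers: usize, head_dim: usize) -> Vec<ToyKvCache> {
    (0..num_layers).map(|_| ToyKvCache::new(head_dim)).collect()
}
```

Keeping one Vec of caches per sequence means a finished sequence can drop all of its layers' caches together without touching any other sequence.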

Enums§

GgufQuantType: GGUF quantization sub-type (expand as kernels are added).
QuantKind: Quantization flavour discriminator for Backend::gemm_quant.
ReduceOp: Collective-op reduction kind for TP all_reduce.
SrcDtype: Source dtype for a weight tensor read straight from safetensors mmap.
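To make the role of a reduction kind in a tensor-parallel all_reduce concrete, here is a small in-process sketch. `ToyReduceOp`, its `Sum` and `Max` variants, and `toy_all_reduce` are assumptions for illustration; the actual variants of `ReduceOp` are not listed on this page, and a real all_reduce would communicate across devices rather than loop over local vectors.

```rust
/// Hypothetical reduction kinds; the real `ReduceOp` variants may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyReduceOp {
    Sum,
    Max,
}

/// Simulated all_reduce: combine every rank's buffer element-wise, then
/// write the combined result back so all ranks hold the same values.
fn toy_all_reduce(ranks: &mut [Vec<f32>], op: ToyReduceOp) {
    if ranks.is_empty() {
        return;
    }
    let len = ranks[0].len();
    assert!(ranks.iter().all(|r| r.len() == len), "rank buffers must match in length");

    // Fold all ranks into a single reduced buffer.
    let mut reduced = ranks[0].clone();
    for r in ranks.iter().skip(1) {
        for (acc, x) in reduced.iter_mut().zip(r.iter()) {
            *acc = match op {
                ToyReduceOp::Sum => *acc + *x,
                ToyReduceOp::Max => (*acc).max(*x),
            };
        }
    }

    // Broadcast the reduced buffer back to every rank.
    for r in ranks.iter_mut() {
        r.copy_from_slice(&reduced);
    }
}
```

In tensor parallelism, a sum reduction of this kind is what typically recombines partial matrix-multiply outputs computed on different ranks.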

Traits§

Backend: The core abstraction over CUDA / Metal / CPU.