
Module model

Llama-family transformer model for inference.

Composes trueno primitives (rms_norm, Q4K matmul, fused attention) into a complete transformer that loads GGUF weights and generates text.
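The first of these primitives can be sketched as a plain scalar RMSNorm. This is not trueno's `rms_norm`, whose signature is not documented here: the function name, the argument order, and the way `eps` is applied are assumptions for illustration. Llama-family layers use pre-normalization, applying RMSNorm before both the attention and the feed-forward sublayers.

```rust
/// Hypothetical scalar RMSNorm (not trueno's API):
/// out_i = x_i / sqrt(mean(x^2) + eps) * weight_i
/// Writes into a caller-provided buffer so no allocation happens per call.
pub fn rms_norm(x: &[f32], weight: &[f32], eps: f32, out: &mut [f32]) {
    assert_eq!(x.len(), weight.len(), "weight must match hidden size");
    assert_eq!(x.len(), out.len(), "output must match hidden size");

    // Mean of squares over the hidden dimension.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // eps keeps the division finite when x is all zeros.
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();

    // Normalize, then apply the learned per-channel gain.
    for ((o, &xi), &wi) in out.iter_mut().zip(x).zip(weight) {
        *o = xi * inv_rms * wi;
    }
}
```

With unit weights the output has a mean square of 1 (up to `eps`), and the ratios between elements are preserved.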

Structs

ForwardArena
Pre-allocated scratch buffers for the forward pass. Eliminates all per-token heap allocations (FALSIFY-ARENA-001). Contract: contracts/cgp/cgp-inference-arena-v1.yaml
KvCache
KV cache for incremental decoding.
LayerWeights
Weights for a single transformer layer.
LlamaModel
Complete transformer model ready for inference.
ModelConfig
Model hyperparameters extracted from GGUF metadata.
ModelWeights
Full model weights.
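The `KvCache` and `ForwardArena` entries describe two related ideas: caching each past position's keys and values so decoding only computes the new token, and sizing buffers up front so the per-token path never allocates. A minimal per-layer sketch combining both follows. `LayerKvCache`, its fields, and its methods are hypothetical and do not describe this module's actual `KvCache` layout; `kv_dim` is assumed to be `n_kv_heads * head_dim`.

```rust
/// Hypothetical per-layer KV cache with capacity fixed at construction,
/// so `append` copies into existing storage and never reallocates.
pub struct LayerKvCache {
    kv_dim: usize,      // assumed: n_kv_heads * head_dim
    max_seq_len: usize, // maximum number of cached positions
    len: usize,         // positions currently stored
    keys: Vec<f32>,     // max_seq_len * kv_dim, one row per position
    values: Vec<f32>,
}

impl LayerKvCache {
    pub fn new(max_seq_len: usize, kv_dim: usize) -> Self {
        Self {
            kv_dim,
            max_seq_len,
            len: 0,
            keys: vec![0.0; max_seq_len * kv_dim],
            values: vec![0.0; max_seq_len * kv_dim],
        }
    }

    /// Stores K and V for the next position and returns that position's index.
    pub fn append(&mut self, k: &[f32], v: &[f32]) -> Result<usize, &'static str> {
        if k.len() != self.kv_dim || v.len() != self.kv_dim {
            return Err("k/v length must equal kv_dim");
        }
        if self.len == self.max_seq_len {
            return Err("KV cache is full");
        }
        let start = self.len * self.kv_dim;
        self.keys[start..start + self.kv_dim].copy_from_slice(k);
        self.values[start..start + self.kv_dim].copy_from_slice(v);
        self.len += 1;
        Ok(self.len - 1)
    }

    /// Keys of all cached positions, which attention reads at each decode step.
    pub fn keys(&self) -> &[f32] {
        &self.keys[..self.len * self.kv_dim]
    }

    /// Values of all cached positions.
    pub fn values(&self) -> &[f32] {
        &self.values[..self.len * self.kv_dim]
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn is_empty(&self) -> bool {
        self.len == 0
    }

    /// Resets for a new sequence; the backing storage is kept for reuse.
    pub fn clear(&mut self) {
        self.len = 0;
    }
}
```

Paying the full `max_seq_len` allocation once trades memory for a decode loop whose cost is only the copy of one row per layer per token.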

Enums

WeightMatrix
A weight matrix stored either as raw Q4K bytes or, for any other quantization type, dequantized to F32.
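A two-variant enum like `WeightMatrix` lets the forward pass dispatch on storage type. The sketch below is a hypothetical stand-in (`WeightMatrixSketch`, its fields, and its methods are not this module's API). The Q4K matvec kernel is deliberately omitted; only the F32 path is implemented. The size constants assume GGML's Q4_K layout, which packs 256 weights into a 144-byte super-block (about 4.5 bits per weight).

```rust
/// Assumed GGML Q4_K super-block geometry: 256 weights in 144 bytes.
const Q4K_BLOCK_WEIGHTS: usize = 256;
const Q4K_BLOCK_BYTES: usize = 144;

/// Hypothetical mirror of the Q4K-or-F32 idea (not this module's type).
pub enum WeightMatrixSketch {
    /// Raw quantized bytes, decoded on the fly by a dedicated kernel.
    Q4K { rows: usize, cols: usize, bytes: Vec<u8> },
    /// Any other quant type, dequantized to F32 at load time (row-major).
    F32 { rows: usize, cols: usize, data: Vec<f32> },
}

impl WeightMatrixSketch {
    pub fn shape(&self) -> (usize, usize) {
        match self {
            Self::Q4K { rows, cols, .. } | Self::F32 { rows, cols, .. } => (*rows, *cols),
        }
    }

    /// Bytes occupied by the stored weights.
    pub fn storage_bytes(&self) -> usize {
        match self {
            Self::Q4K { bytes, .. } => bytes.len(),
            Self::F32 { data, .. } => data.len() * std::mem::size_of::<f32>(),
        }
    }

    /// out = W x for the F32 variant, written into a caller-provided buffer.
    pub fn matvec_f32(&self, x: &[f32], out: &mut [f32]) -> Result<(), &'static str> {
        match self {
            Self::F32 { rows, cols, data } => {
                if *cols == 0 || x.len() != *cols || out.len() != *rows {
                    return Err("shape mismatch");
                }
                for (o, row) in out.iter_mut().zip(data.chunks_exact(*cols)) {
                    *o = row.iter().zip(x).map(|(w, xi)| w * xi).sum();
                }
                Ok(())
            }
            // A dequantizing Q4K kernel is outside the scope of this sketch.
            Self::Q4K { .. } => Err("Q4K requires a dequantizing kernel (not shown)"),
        }
    }
}

/// Expected byte size of a Q4K tensor with `n` weights, if `n` is a whole
/// number of super-blocks.
pub fn q4k_expected_bytes(n: usize) -> Option<usize> {
    if n % Q4K_BLOCK_WEIGHTS == 0 {
        Some(n / Q4K_BLOCK_WEIGHTS * Q4K_BLOCK_BYTES)
    } else {
        None
    }
}
```

Under these assumptions a 4096 × 4096 projection takes 9,437,184 bytes as Q4K versus 67,108,864 bytes as F32, which is why keeping Q4K tensors in their packed form is worth a separate code path.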