Main inference engine — orchestrates model loading and text generation.
Structs§
- EngineConfig - Configuration for the inference engine.
- InferenceEngine - The main inference engine.
Constants§
- FLASH_ATTN_THRESHOLD - Sequence-length threshold above which the engine routes attention through the memory-efficient tiled flash-attention kernel rather than the naïve full-score-matrix path.
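
The routing that `FLASH_ATTN_THRESHOLD` describes can be sketched as a simple length check. This is a minimal illustration, not the crate's actual code: the constant's value (1024), the `AttentionPath` enum, and the `select_attention_path` and `naive_score_bytes` helpers are all assumptions made for the example. "Above" is read here as strictly greater than the threshold.

```rust
/// Hypothetical value for illustration only; the real constant's value
/// is not stated in this documentation.
const FLASH_ATTN_THRESHOLD: usize = 1024;

/// Hypothetical enum naming the two attention paths described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AttentionPath {
    /// Materializes the full `seq_len x seq_len` score matrix.
    Naive,
    /// Tiled, memory-efficient flash-attention kernel.
    Flash,
}

/// Picks a path from the sequence length (assumed dispatch rule:
/// strictly above the threshold selects the flash kernel).
fn select_attention_path(seq_len: usize) -> AttentionPath {
    if seq_len > FLASH_ATTN_THRESHOLD {
        AttentionPath::Flash
    } else {
        AttentionPath::Naive
    }
}

/// Bytes needed for one head's full score matrix, assuming `f32` scores.
/// Shows why the naive path becomes expensive as sequences grow:
/// memory scales with the square of the sequence length.
fn naive_score_bytes(seq_len: usize) -> usize {
    seq_len * seq_len * std::mem::size_of::<f32>()
}
```

Under these assumptions, a sequence exactly at the threshold still takes the naive path, and the first longer sequence switches to the tiled kernel. The quadratic growth of `naive_score_bytes` is the usual motivation for such a cutoff.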