Module engine

Main inference engine — orchestrates model loading and text generation.

Structs

EngineConfig
Configuration for the inference engine.
InferenceEngine
The main inference engine.

Constants

FLASH_ATTN_THRESHOLD
Sequence-length threshold above which the engine routes attention through the memory-efficient tiled flash-attention kernel rather than the naïve full-score-matrix path.
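
The routing described for FLASH_ATTN_THRESHOLD can be pictured with a minimal sketch. This is not the engine's actual code: the constant's value (512), its `usize` type, the `AttentionPath` enum, and the helpers `select_attention_path` and `naive_score_elements` are all assumptions made for illustration. Only the rule itself comes from the description above: sequences longer than the threshold take the tiled flash-attention path, and everything else takes the naive path.

```rust
/// Placeholder value for illustration only; the real constant's value
/// and type are not shown on this page.
pub const FLASH_ATTN_THRESHOLD: usize = 512;

/// Hypothetical enum naming the two attention paths.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AttentionPath {
    /// Materializes the full seq_len x seq_len score matrix.
    Naive,
    /// Memory-efficient tiled flash-attention kernel.
    Flash,
}

/// Hypothetical helper: picks a path from the sequence length.
/// "Above" is read as strictly greater than the threshold.
pub fn select_attention_path(seq_len: usize) -> AttentionPath {
    if seq_len > FLASH_ATTN_THRESHOLD {
        AttentionPath::Flash
    } else {
        AttentionPath::Naive
    }
}

/// Number of score entries the naive path would materialize:
/// one seq_len x seq_len matrix per attention head.
pub fn naive_score_elements(seq_len: usize, num_heads: usize) -> usize {
    num_heads * seq_len * seq_len
}
```

The quadratic growth computed by `naive_score_elements` is the usual motivation for such a threshold: below it, the full score matrix is small enough that the simpler path is acceptable, while above it the tiled kernel avoids materializing the matrix at all.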