Module engine


Inference engine orchestrating model loading and generation.

The InferenceEngine is the main entry point for running inference. It owns the model, kernel dispatcher, and sampler, and provides both blocking (InferenceEngine::generate) and streaming (InferenceEngine::generate_streaming) generation APIs.

Structs

EngineStats
Statistics about engine usage, accumulated over the engine’s lifetime.
InferenceEngine
Top-level inference engine.

Constants

EOS_TOKEN_ID
EOS token for Qwen3 models.