Inference engine orchestrating model loading and generation.

The `InferenceEngine` is the main entry point for running inference. It owns the model, kernel dispatcher, and sampler, and provides both blocking (`InferenceEngine::generate`) and streaming (`InferenceEngine::generate_streaming`) generation APIs.
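
A minimal usage sketch of the two generation entry points. The constructor name, argument lists, return types, and crate path below are assumptions for illustration; this page only documents that `generate` blocks and `generate_streaming` streams:

```rust
use inference_engine::InferenceEngine; // crate path is an assumption

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the model and build the engine (loading API is an assumption).
    let mut engine = InferenceEngine::load("model.safetensors")?;

    // Blocking generation: returns the full completion at once.
    let output = engine.generate("Explain borrowing in one sentence.", 128)?;
    println!("{output}");

    // Streaming generation: invokes the callback for each decoded token.
    engine.generate_streaming("Hello", 32, |token_text: &str| {
        print!("{token_text}");
    })?;
    Ok(())
}
```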
Structs

- `EngineStats` - Statistics about engine usage, accumulated over the engine’s lifetime.
- `InferenceEngine` - Top-level inference engine.
Constants

- `EOS_TOKEN_ID` - EOS token for Qwen3 models.
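
For illustration, a sketch of how a decode loop might use `EOS_TOKEN_ID` as a stop condition. The `sample_next` closure and the `u32` token type are assumptions, not part of the documented API:

```rust
use inference_engine::EOS_TOKEN_ID; // crate path is an assumption

/// Hypothetical decode loop: draws tokens from an assumed sampler closure
/// until the EOS token appears or the budget is exhausted.
fn decode(mut sample_next: impl FnMut() -> u32, max_tokens: usize) -> Vec<u32> {
    let mut tokens = Vec::with_capacity(max_tokens);
    for _ in 0..max_tokens {
        let token = sample_next();
        // Stop once the model emits the Qwen3 end-of-sequence token.
        if token == EOS_TOKEN_ID {
            break;
        }
        tokens.push(token);
    }
    tokens
}
```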