Inference engine interface with streaming and batch support
This module provides the top-level inference engine interface that orchestrates all other components: tokenizer, model executor, scheduler, and sampler.
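To make the orchestration concrete, here is a minimal sketch of how an engine could wire a tokenizer, model executor, scheduler, and sampler together. All trait, struct, and method names here (`Tokenizer`, `ModelExecutor`, `Sampler`, `GreedySampler`, `Engine`, `submit`, `run_to_completion`) are hypothetical illustrations, not this module's API. The "scheduler" is assumed to be a simple FIFO round-robin queue that advances each request by one token per turn.

```rust
use std::collections::VecDeque;

// Hypothetical component traits; the module's real names and signatures may differ.
pub trait Tokenizer {
    fn encode(&self, text: &str) -> Vec<u32>;
    fn decode(&self, ids: &[u32]) -> String;
}

pub trait ModelExecutor {
    /// Returns next-token logits for the given token context.
    fn forward(&mut self, context: &[u32]) -> Vec<f32>;
}

pub trait Sampler {
    fn sample(&mut self, logits: &[f32]) -> u32;
}

/// Greedy sampling: pick the index of the largest logit.
pub struct GreedySampler;

impl Sampler for GreedySampler {
    fn sample(&mut self, logits: &[f32]) -> u32 {
        logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i as u32)
            .expect("logits must be non-empty")
    }
}

struct Request {
    id: usize,
    tokens: Vec<u32>,
    remaining: usize,
}

/// Toy engine: a FIFO round-robin scheduler picks which request advances next;
/// the executor and sampler then produce one token for it.
pub struct Engine<T, M, S> {
    tokenizer: T,
    executor: M,
    sampler: S,
    queue: VecDeque<Request>,
    eos: u32,
}

impl<T: Tokenizer, M: ModelExecutor, S: Sampler> Engine<T, M, S> {
    pub fn new(tokenizer: T, executor: M, sampler: S, eos: u32) -> Self {
        Self { tokenizer, executor, sampler, queue: VecDeque::new(), eos }
    }

    /// Tokenize the prompt and enqueue it with a generation budget.
    pub fn submit(&mut self, id: usize, prompt: &str, max_new_tokens: usize) {
        let tokens = self.tokenizer.encode(prompt);
        self.queue.push_back(Request { id, tokens, remaining: max_new_tokens });
    }

    /// Run until every request finishes; returns (id, text) in completion order.
    pub fn run_to_completion(&mut self) -> Vec<(usize, String)> {
        let mut done = Vec::new();
        while let Some(mut req) = self.queue.pop_front() {
            if req.remaining == 0 {
                done.push((req.id, self.tokenizer.decode(&req.tokens)));
                continue;
            }
            let logits = self.executor.forward(&req.tokens);
            let next = self.sampler.sample(&logits);
            req.tokens.push(next);
            req.remaining -= 1;
            if next == self.eos || req.remaining == 0 {
                done.push((req.id, self.tokenizer.decode(&req.tokens)));
            } else {
                // Round-robin: the request goes to the back of the line.
                self.queue.push_back(req);
            }
        }
        done
    }
}
```

Interleaving one token per request per turn is the simplest fair policy; a production scheduler would typically also weigh memory (KV-cache) pressure and batch several requests into a single executor call.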
Traits
- AdvancedInferenceEngine - Advanced engine capabilities
- InferenceEngine - Core inference engine trait
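The two traits could plausibly be shaped as follows: a core trait offering streaming and batch generation, and an "advanced" extension trait layered on top of it. Only the trait names come from this module. The methods `generate_stream`, `generate_batch`, and `supports_speculation`, along with the toy `EchoEngine`, are hypothetical illustrations of a sketch, not the crate's actual signatures.

```rust
/// Hypothetical method set for the core trait.
pub trait InferenceEngine {
    /// Streaming: emit tokens one at a time; the callback returns `false` to stop early.
    fn generate_stream(
        &mut self,
        prompt: &str,
        max_new_tokens: usize,
        on_token: &mut dyn FnMut(&str) -> bool,
    );

    /// Batch: one full completion per prompt. This default just drains the
    /// stream for each prompt in turn; a real engine would batch on the device.
    fn generate_batch(&mut self, prompts: &[String], max_new_tokens: usize) -> Vec<String> {
        prompts
            .iter()
            .map(|p| {
                let mut out = String::new();
                self.generate_stream(p, max_new_tokens, &mut |t: &str| {
                    out.push_str(t);
                    true
                });
                out
            })
            .collect()
    }
}

/// Hypothetical extension trait for optional, "advanced" capabilities.
pub trait AdvancedInferenceEngine: InferenceEngine {
    /// Hypothetical capability query; defaults to `false`.
    fn supports_speculation(&self) -> bool {
        false
    }
}

/// Toy engine for illustration: "generates" by echoing the prompt's words.
pub struct EchoEngine;

impl InferenceEngine for EchoEngine {
    fn generate_stream(
        &mut self,
        prompt: &str,
        max_new_tokens: usize,
        on_token: &mut dyn FnMut(&str) -> bool,
    ) {
        for word in prompt.split_whitespace().take(max_new_tokens) {
            if !on_token(word) {
                break;
            }
        }
    }
}

impl AdvancedInferenceEngine for EchoEngine {}
```

Making the advanced trait a supertrait extension keeps simple engines implementable with only the core methods, while callers that need extra capabilities can bound on `AdvancedInferenceEngine` instead.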
Type Aliases
- HardwareConstraints - Hardware constraints alias
- LatencyRequirements - Latency requirements alias
- RequestCharacteristics - Request characteristics alias
- SpeculationConfig - Speculation configuration for speculative decoding
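For readers unfamiliar with speculative decoding, the sketch below shows one step of the greedy variant: a cheap draft model proposes `k` tokens, the target model verifies them, the longest agreeing prefix is kept, and the target supplies one extra token (a correction at the first mismatch, or a bonus token if every draft was accepted). The struct here stands in for whatever `SpeculationConfig` actually aliases. Its field `num_draft_tokens` and the function `speculative_step` are hypothetical, and this greedy acceptance rule is a simplification of the probabilistic rejection sampling used for non-greedy decoding.

```rust
/// Hypothetical stand-in for the type `SpeculationConfig` aliases.
#[derive(Clone, Copy, Debug)]
pub struct SpeculationConfig {
    /// Number of tokens the draft model proposes per step (k).
    pub num_draft_tokens: usize,
}

/// One step of greedy speculative decoding (hypothetical helper).
/// `draft_next` / `target_next` return each model's greedy next token.
/// Returns the tokens appended to `context` during this step.
pub fn speculative_step(
    context: &mut Vec<u32>,
    cfg: SpeculationConfig,
    draft_next: &dyn Fn(&[u32]) -> u32,
    target_next: &dyn Fn(&[u32]) -> u32,
) -> Vec<u32> {
    // 1. The draft model proposes k tokens autoregressively.
    let mut proposal = context.clone();
    for _ in 0..cfg.num_draft_tokens {
        let t = draft_next(&proposal);
        proposal.push(t);
    }
    let drafted = &proposal[context.len()..];

    // 2. The target verifies each drafted position in order. (A real engine
    //    scores all k + 1 positions in a single batched forward pass.)
    let mut accepted = Vec::new();
    let mut ctx = context.clone();
    for &d in drafted {
        let t = target_next(&ctx);
        if t != d {
            // 3. First mismatch: keep the target's token and stop.
            accepted.push(t);
            context.extend_from_slice(&accepted);
            return accepted;
        }
        accepted.push(d);
        ctx.push(d);
    }

    // 4. All k drafts accepted: the target contributes one bonus token.
    accepted.push(target_next(&ctx));
    context.extend_from_slice(&accepted);
    accepted
}
```

Under greedy acceptance the output is identical to decoding with the target model alone; the speedup comes from emitting up to `k + 1` tokens per target pass when the draft model agrees often.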