Llama model executor using our custom Llama implementation.
Uses GenericKvCacheHandle (like Qwen3) with a per-request cache_id. Supports a CUDA decode runner for GPU acceleration.
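As a rough illustration of what a per-request cache_id implies, the sketch below keeps one cache entry per request and looks it up by ID. Everything in it is hypothetical: `CacheHandle`, `CacheRegistry`, and their methods are stand-ins invented for this example and are not the crate's GenericKvCacheHandle API. The real handle presumably tracks key/value tensors rather than a token count.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a per-request KV cache handle.
/// It carries only the ID that selects the request's cache.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CacheHandle {
    pub cache_id: u64,
}

/// Hypothetical per-request cache state. A real KV cache would hold
/// key/value tensors; this sketch only counts cached token positions.
#[derive(Debug, Default)]
struct CacheEntry {
    cached_tokens: usize,
}

/// Hypothetical registry that owns one cache entry per active request.
#[derive(Debug, Default)]
pub struct CacheRegistry {
    next_id: u64,
    entries: HashMap<u64, CacheEntry>,
}

impl CacheRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Create an empty cache for a new request and return its handle.
    pub fn allocate(&mut self) -> CacheHandle {
        let cache_id = self.next_id;
        self.next_id += 1;
        self.entries.insert(cache_id, CacheEntry::default());
        CacheHandle { cache_id }
    }

    /// Record `n` newly cached tokens for the request behind `handle`.
    /// Returns the new total, or `None` if the cache was already released.
    pub fn append(&mut self, handle: &CacheHandle, n: usize) -> Option<usize> {
        let entry = self.entries.get_mut(&handle.cache_id)?;
        entry.cached_tokens += n;
        Some(entry.cached_tokens)
    }

    /// Drop the request's cache. Returns `true` if an entry was removed.
    pub fn release(&mut self, handle: &CacheHandle) -> bool {
        self.entries.remove(&handle.cache_id).is_some()
    }
}
```

Keying the state by ID means concurrent requests never share cache contents, and a finished request can be released without touching the others.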
Structs
- CandleModelExecutor - Llama model executor