Module candle_executor

Llama model executor using our custom Llama implementation.

Uses GenericKvCacheHandle (like Qwen3) with a per-request cache_id, so each request keeps its own KV cache. Supports a CUDA decode runner for GPU acceleration.
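
The per-request cache_id idea can be illustrated with a minimal sketch. This is not the crate's actual GenericKvCacheHandle API: `CacheId`, `LayerKv`, `KvCacheStore`, and all of their methods are hypothetical names chosen for illustration, and plain `Vec<f32>` buffers stand in for the device tensors a real cache would hold. It only shows how keying caches by an identifier keeps concurrent requests from sharing state.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a per-request cache identifier.
type CacheId = u64;

/// Hypothetical per-layer key/value storage (a real cache would hold tensors).
#[derive(Default)]
struct LayerKv {
    keys: Vec<f32>,
    values: Vec<f32>,
}

/// Illustrative store that owns one independent KV cache per request.
struct KvCacheStore {
    num_layers: usize,
    caches: HashMap<CacheId, Vec<LayerKv>>,
}

impl KvCacheStore {
    fn new(num_layers: usize) -> Self {
        Self { num_layers, caches: HashMap::new() }
    }

    /// Append key/value data for `layer` to the cache owned by `cache_id`,
    /// creating that cache on first use. Panics if `layer >= num_layers`.
    fn append(&mut self, cache_id: CacheId, layer: usize, k: &[f32], v: &[f32]) {
        let num_layers = self.num_layers;
        let layers = self
            .caches
            .entry(cache_id)
            .or_insert_with(|| (0..num_layers).map(|_| LayerKv::default()).collect());
        layers[layer].keys.extend_from_slice(k);
        layers[layer].values.extend_from_slice(v);
    }

    /// Number of cached key elements for a request's layer (0 if the request is unknown).
    fn cached_len(&self, cache_id: CacheId, layer: usize) -> usize {
        self.caches.get(&cache_id).map_or(0, |layers| {
            let kv = &layers[layer];
            debug_assert_eq!(kv.keys.len(), kv.values.len());
            kv.keys.len()
        })
    }

    /// Drop a finished request's cache; returns whether it existed.
    fn release(&mut self, cache_id: CacheId) -> bool {
        self.caches.remove(&cache_id).is_some()
    }
}
```

In this sketch, two requests with different cache_id values can append to the same layer index without affecting each other, and releasing one request leaves the other's cache untouched.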

Structs

CandleModelExecutor
Llama model executor