Module llm_executor


LlmExecutor<M> adapts a DecoderOnlyLLM to the ModelExecutor trait that the engine scheduler calls.

This is the Model-as-Code equivalent of GenericModelExecutor: where GenericModelExecutor wraps a Box<dyn RunnerInterface> (legacy ModelRunner<B>), LlmExecutor wraps a Box<dyn DecoderOnlyLLM> (new-style per-model code such as Qwen3Model<B>).
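The wrapping described above is an adapter over a boxed trait object. The following is a minimal sketch of that shape, not this crate's actual API: the method names `forward` and `execute`, their signatures, the `ToyModel` type, and the `argmax` helper are all hypothetical stand-ins, and plain slices and `Vec<f32>` stand in for the real tensor types.

```rust
// Hypothetical stand-ins: the real traits in this crate have their own
// signatures and tensor types. Only the adapter shape is illustrated here.

/// Assumed shape of a per-model trait: token ids in, logits out.
pub trait DecoderOnlyLLM {
    fn forward(&mut self, token_ids: &[u32]) -> Vec<f32>;
}

/// Assumed shape of the trait the scheduler calls.
pub trait ModelExecutor {
    fn execute(&mut self, token_ids: &[u32]) -> Vec<f32>;
}

/// Adapter: owns a boxed model and exposes it through `ModelExecutor`.
pub struct LlmExecutor {
    model: Box<dyn DecoderOnlyLLM>,
}

impl LlmExecutor {
    pub fn new(model: Box<dyn DecoderOnlyLLM>) -> Self {
        Self { model }
    }
}

impl ModelExecutor for LlmExecutor {
    fn execute(&mut self, token_ids: &[u32]) -> Vec<f32> {
        // A real bridge would convert to and from the engine's tensor
        // type at this point; the sketch forwards the call unchanged.
        self.model.forward(token_ids)
    }
}

/// Toy model (hypothetical): puts all logit mass on (last_token + 1) % vocab.
pub struct ToyModel {
    pub vocab_size: usize,
}

impl DecoderOnlyLLM for ToyModel {
    fn forward(&mut self, token_ids: &[u32]) -> Vec<f32> {
        let mut logits = vec![0.0; self.vocab_size];
        if self.vocab_size > 0 {
            if let Some(&last) = token_ids.last() {
                logits[(last as usize + 1) % self.vocab_size] = 1.0;
            }
        }
        logits
    }
}

/// Greedy pick: index of the largest logit (0 for an empty slice).
pub fn argmax(logits: &[f32]) -> usize {
    let mut best = 0;
    for (i, &v) in logits.iter().enumerate() {
        if v > logits[best] {
            best = i;
        }
    }
    best
}
```

Under these assumptions, a caller would build the executor with `LlmExecutor::new(Box::new(ToyModel { vocab_size: 8 }))` and drive it only through `ModelExecutor::execute`, so the scheduler never needs to name the concrete model type.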

Tokens and logits are currently bridged through candle Tensor for TensorRef. Phase C will likely replace that with SmallTensor to drop candle from the hot path.

Structs

LlmExecutor