LlmExecutor<M>: adapts a DecoderOnlyLLM to the ModelExecutor trait
the engine scheduler calls.
This is the Model-as-Code equivalent of GenericModelExecutor: where
GenericModelExecutor wraps a Box<dyn RunnerInterface> (legacy
ModelRunner<B>), LlmExecutor wraps a Box<dyn DecoderOnlyLLM>
(new-style per-model code such as Qwen3Model<B>).
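The adapter shape described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the method names forward and execute, the greedy argmax step, the ToyModel type, and the non-generic LlmExecutorSketch struct are all assumptions made for the example, since the real trait signatures are not shown on this page.

```rust
// Hypothetical stand-in for the real DecoderOnlyLLM trait (signature assumed).
trait DecoderOnlyLLM {
    /// Returns one logit per vocabulary entry for the next token.
    fn forward(&mut self, tokens: &[u32]) -> Vec<f32>;
}

// Hypothetical stand-in for the real ModelExecutor trait (signature assumed).
trait ModelExecutor {
    /// Runs one step and returns the chosen next-token id.
    fn execute(&mut self, tokens: &[u32]) -> u32;
}

// Adapter: owns a boxed model and exposes it through the executor trait.
// The generic parameter of the real LlmExecutor<M> is omitted here.
struct LlmExecutorSketch {
    model: Box<dyn DecoderOnlyLLM>,
}

// Index of the largest logit; returns 0 for an empty slice.
fn argmax(logits: &[f32]) -> u32 {
    let mut best = 0usize;
    for (i, &v) in logits.iter().enumerate() {
        if v > logits[best] {
            best = i;
        }
    }
    best as u32
}

impl ModelExecutor for LlmExecutorSketch {
    fn execute(&mut self, tokens: &[u32]) -> u32 {
        // Delegate to the wrapped model, then pick greedily (an assumption).
        let logits = self.model.forward(tokens);
        argmax(&logits)
    }
}

// Toy model for illustration only: favours (last token + 1) mod vocab.
// `vocab` must be greater than zero.
struct ToyModel {
    vocab: usize,
}

impl DecoderOnlyLLM for ToyModel {
    fn forward(&mut self, tokens: &[u32]) -> Vec<f32> {
        let last = tokens.last().copied().unwrap_or(0) as usize;
        let mut logits = vec![0.0_f32; self.vocab];
        logits[(last + 1) % self.vocab] = 1.0;
        logits
    }
}
```

A caller would build the adapter with LlmExecutorSketch { model: Box::new(ToyModel { vocab: 8 }) } and then drive it only through the executor trait, which is the point of the wrapper: the scheduler side never needs to know which concrete model sits behind the box.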
Tokens/logits are currently bridged through candle Tensor for
TensorRef; Phase C will likely replace that with SmallTensor to
drop candle from the hot path.