pub trait LmRunner: Send {
    // Required methods
    fn family(&self) -> &'static str;
    fn vocab_size(&self) -> usize;
    fn predict_logits(&mut self, prompt_ids: &[u32]) -> Result<Vec<f32>>;

    // Provided methods
    fn generate(
        &mut self,
        prompt_ids: &[u32],
        n_new: usize,
        on_token: &mut dyn FnMut(u32) -> bool,
    ) -> Result<Vec<u32>> { ... }
    fn supports_multimodal(&self) -> bool { ... }
    fn generate_multimodal(
        &mut self,
        _prompt: &str,
        _rgb: &[u8],
        _img_w: usize,
        _img_h: usize,
        _tokenizer: Option<&Path>,
        _n_new: usize,
        _on_token: &mut dyn FnMut(u32) -> bool,
    ) -> Result<Vec<u32>> { ... }
}
Minimal per-family runner interface.
Implementations must be Send so the boxed trait can move across
threads (e.g. when skill runs inference on a worker pool).
Sync is intentionally not required — most runners hold mutable
per-call compile / cache state.
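The `Send` bound can be illustrated with a small sketch. The `Runner` trait, `ToyRunner`, and `run_on_worker` below are hypothetical stand-ins (the real trait has more methods); the sketch only shows that a boxed `Send` trait object can be moved into a worker thread and mutated there.

```rust
use std::thread;

// Hypothetical, cut-down stand-in for the real trait; only the `Send`
// supertrait matters for this illustration.
trait Runner: Send {
    fn family(&self) -> &'static str;
    fn step(&mut self) -> usize;
}

// Toy runner holding mutable per-call state, which is why `&mut self`
// (and no `Sync` bound) is the natural shape.
struct ToyRunner {
    calls: usize,
}

impl Runner for ToyRunner {
    fn family(&self) -> &'static str {
        "toy"
    }
    fn step(&mut self) -> usize {
        self.calls += 1;
        self.calls
    }
}

// Moves the boxed runner into a worker thread, uses it there, and returns
// the family name plus the number of steps taken.
fn run_on_worker(mut runner: Box<dyn Runner>) -> (&'static str, usize) {
    thread::spawn(move || {
        runner.step();
        let n = runner.step();
        (runner.family(), n)
    })
    .join()
    .expect("worker thread panicked")
}

fn main() {
    let (family, steps) = run_on_worker(Box::new(ToyRunner { calls: 0 }));
    println!("{family}: {steps} steps");
}
```

Because `Sync` is not required, a runner that must be shared between threads (rather than moved) would need a wrapper such as `std::sync::Mutex`, which is `Sync` whenever its contents are `Send`.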
Required Methods
fn family(&self) -> &'static str
Short family identifier matching rlx-cli::arch_runner_name
(e.g. "qwen3", "qwen35", "gemma", "llama32"). Useful
for logging / metrics / per-family branches in the caller.
fn vocab_size(&self) -> usize
LM head vocab size — useful for callers that need to size a
logit buffer or validate token ids before calling
Self::predict_logits. PLAN.md M9.
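As a sketch of the validation use case, a caller might check prompt ids against the vocab size before running a forward pass. `first_out_of_range` is a hypothetical helper, not part of the crate; in practice `vocab_size` would come from `runner.vocab_size()`.

```rust
// Hypothetical helper: returns the position and value of the first token id
// that falls outside `0..vocab_size`, or `None` if every id is valid.
fn first_out_of_range(ids: &[u32], vocab_size: usize) -> Option<(usize, u32)> {
    ids.iter()
        .copied()
        .enumerate()
        .find(|&(_, id)| id as usize >= vocab_size)
}

fn main() {
    // With a real runner this would be `runner.vocab_size()`.
    let vocab_size = 8;
    match first_out_of_range(&[1, 5, 9], vocab_size) {
        Some((pos, id)) => println!("token {id} at position {pos} is out of range"),
        None => println!("all ids valid"),
    }
}
```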
Provided Methods
fn generate(
    &mut self,
    prompt_ids: &[u32],
    n_new: usize,
    on_token: &mut dyn FnMut(u32) -> bool,
) -> Result<Vec<u32>>
Generate up to n_new tokens after prompt_ids using greedy
(argmax) sampling. on_token fires once per generated token
and returns true to continue, false to stop. Returns
the generated id sequence (excluding the prompt).
Stop-signal honoring varies by family (PLAN.md M9):
- default impl + Qwen35Runner: honor the return value.
- Qwen3Runner / GemmaRunner / Llama32Runner: call the callback but ignore its return (their inherent generate doesn't take a bool callback). Pass an EOS-aware sampler in the caller, or check produced.last() after the call.
Default impl is naive: re-prefill on the full context each step. Per-family runners override with their cached decode fast path.
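The naive strategy described above can be sketched as follows. This is an illustration under stated assumptions, not the crate's actual source: `naive_generate`, `argmax`, `toy_logits`, and the `String`-based `Result` alias are hypothetical, the forward pass is passed in as a closure instead of calling `Self::predict_logits`, and whether the real default keeps the token on which `on_token` returned false is an assumption.

```rust
// Stand-in error type; the crate's real `Result` alias is not shown in these docs.
type Result<T> = std::result::Result<T, String>;

// Index of the largest logit; ties resolve to the first maximum.
fn argmax(logits: &[f32]) -> Option<u32> {
    let mut best: Option<(u32, f32)> = None;
    for (i, &v) in logits.iter().enumerate() {
        if best.map_or(true, |(_, b)| v > b) {
            best = Some((i as u32, v));
        }
    }
    best.map(|(i, _)| i)
}

// Greedy decoding that re-runs the forward pass on the full context at
// every step (no KV cache), stopping early when `on_token` returns false.
fn naive_generate(
    predict_logits: &mut dyn FnMut(&[u32]) -> Result<Vec<f32>>,
    prompt_ids: &[u32],
    n_new: usize,
    on_token: &mut dyn FnMut(u32) -> bool,
) -> Result<Vec<u32>> {
    let mut context = prompt_ids.to_vec();
    let mut produced = Vec::with_capacity(n_new);
    for _ in 0..n_new {
        let logits = predict_logits(&context)?;
        let next = argmax(&logits).ok_or_else(|| "empty logits".to_string())?;
        context.push(next);
        produced.push(next);
        // Assumption: the token that triggers the stop is still returned.
        if !on_token(next) {
            break;
        }
    }
    Ok(produced)
}

// Toy "model" over a 4-token vocab: always predicts (last id + 1) mod 4.
fn toy_logits(ctx: &[u32]) -> Result<Vec<f32>> {
    let last = ctx.last().copied().unwrap_or(0);
    let mut logits = vec![0.0_f32; 4];
    logits[((last + 1) % 4) as usize] = 1.0;
    Ok(logits)
}

fn main() {
    let mut model = |ctx: &[u32]| toy_logits(ctx);
    // Treat id 2 as a hypothetical EOS and stop there.
    let mut stop_on_eos = |tok: u32| tok != 2;
    let out = naive_generate(&mut model, &[0], 8, &mut stop_on_eos).unwrap();
    println!("{out:?}");
}
```

Each step costs a full prefill over the growing context, so the total work grows quadratically with the number of generated tokens; this is the cost the per-family cached decode paths avoid.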
fn supports_multimodal(&self) -> bool
Whether this runner supports multimodal (image+text) generation
via Self::generate_multimodal. Default false. Per-family
runners that wire a vision encoder (e.g. Qwen35Runner with an
mmproj path) override to true.
fn generate_multimodal(
    &mut self,
    _prompt: &str,
    _rgb: &[u8],
    _img_w: usize,
    _img_h: usize,
    _tokenizer: Option<&Path>,
    _n_new: usize,
    _on_token: &mut dyn FnMut(u32) -> bool,
) -> Result<Vec<u32>>
Multimodal text generation: prefill the trunk with prompt text
where image markers are spliced with vision embeddings derived
from rgb (raw RGB bytes, row-major [h, w, 3]). Streams one
token per on_token call; returns the full produced sequence.
The default impl returns an error; only family runners that wire a vision encoder override this. Overrides aim for parity with llama-cpp's MtmdContext-based multimodal eval path.
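The row-major `[h, w, 3]` layout implies that the buffer holds exactly `img_w * img_h * 3` bytes and that pixel `(x, y)` starts at byte offset `(y * img_w + x) * 3`. The sketch below shows a check a caller might run before calling `generate_multimodal`; `check_rgb` and `pixel` are hypothetical helpers, not part of the crate.

```rust
// Hypothetical helper: verifies that `rgb` has exactly w * h * 3 bytes,
// guarding against overflow in the size computation.
fn check_rgb(rgb: &[u8], img_w: usize, img_h: usize) -> Result<(), String> {
    let expected = img_w
        .checked_mul(img_h)
        .and_then(|pixels| pixels.checked_mul(3))
        .ok_or_else(|| "image dimensions overflow".to_string())?;
    if rgb.len() != expected {
        return Err(format!("expected {expected} bytes, got {}", rgb.len()));
    }
    Ok(())
}

// Hypothetical helper: reads the (r, g, b) triple at column `x`, row `y`
// from a row-major [h, w, 3] buffer. Panics if the pixel is out of bounds.
fn pixel(rgb: &[u8], img_w: usize, x: usize, y: usize) -> [u8; 3] {
    let offset = (y * img_w + x) * 3;
    [rgb[offset], rgb[offset + 1], rgb[offset + 2]]
}

fn main() {
    // A 2x1 image: one row, two pixels.
    let rgb = [1u8, 2, 3, 4, 5, 6];
    check_rgb(&rgb, 2, 1).expect("buffer size matches dimensions");
    println!("{:?}", pixel(&rgb, 2, 1, 0));
}
```

A caller would typically also gate on `supports_multimodal()` first, since the default `generate_multimodal` returns an error.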
Dyn Compatibility
This trait is dyn compatible.
In older versions of Rust, dyn compatibility was called "object safety".