Trait LmRunner 

pub trait LmRunner: Send {
    // Required methods
    fn family(&self) -> &'static str;
    fn vocab_size(&self) -> usize;
    fn predict_logits(&mut self, prompt_ids: &[u32]) -> Result<Vec<f32>>;

    // Provided methods
    fn generate(
        &mut self,
        prompt_ids: &[u32],
        n_new: usize,
        on_token: &mut dyn FnMut(u32) -> bool,
    ) -> Result<Vec<u32>> { ... }
    fn supports_multimodal(&self) -> bool { ... }
    fn generate_multimodal(
        &mut self,
        _prompt: &str,
        _rgb: &[u8],
        _img_w: usize,
        _img_h: usize,
        _tokenizer: Option<&Path>,
        _n_new: usize,
        _on_token: &mut dyn FnMut(u32) -> bool,
    ) -> Result<Vec<u32>> { ... }
}
Minimal per-family runner interface.

Implementations must be Send so that a boxed trait object (Box<dyn LmRunner>) can move across threads, e.g. when skill runs inference on a worker pool. Sync is intentionally not required: most runners hold mutable per-call compile / cache state, so a runner shared between threads needs external synchronization.
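
The Send-but-not-Sync contract can be illustrated with a minimal sketch. It uses a pared-down copy of the trait, a stand-in `Result` alias (the crate's real error type is not shown on this page), and a hypothetical `ToyRunner`; the helper names `run_on_worker` and `run_shared` are illustrative only.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for the crate's `Result` alias; the real error type is an assumption.
type Result<T> = std::result::Result<T, String>;

// Pared-down copy of the trait: only what the threading demo needs.
pub trait LmRunner: Send {
    fn family(&self) -> &'static str;
    fn predict_logits(&mut self, prompt_ids: &[u32]) -> Result<Vec<f32>>;
}

// Hypothetical runner holding mutable per-call state, like a cache would.
struct ToyRunner {
    calls: usize,
}

impl LmRunner for ToyRunner {
    fn family(&self) -> &'static str {
        "toy"
    }

    fn predict_logits(&mut self, prompt_ids: &[u32]) -> Result<Vec<f32>> {
        self.calls += 1;
        Ok(vec![prompt_ids.len() as f32, self.calls as f32])
    }
}

// Moving ownership of the boxed runner into a worker thread only needs `Send`.
fn run_on_worker(mut runner: Box<dyn LmRunner>, prompt: Vec<u32>) -> Result<Vec<f32>> {
    thread::spawn(move || runner.predict_logits(&prompt))
        .join()
        .map_err(|_| "worker panicked".to_string())?
}

// Sharing one runner between threads needs a lock, because the trait does not
// promise `Sync`. `Mutex<T>` is `Sync` whenever `T: Send`.
fn run_shared(runner: &Arc<Mutex<Box<dyn LmRunner>>>, prompt: Vec<u32>) -> Result<Vec<f32>> {
    let worker = Arc::clone(runner);
    thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        guard.predict_logits(&prompt)
    })
    .join()
    .map_err(|_| "worker panicked".to_string())?
}
```

Wrapping the box in a `Mutex` is one common way to share a `Send`-only value; a channel feeding a single owning worker thread would serve equally well.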

Required Methods

fn family(&self) -> &'static str

Short family identifier matching rlx-cli::arch_runner_name (e.g. "qwen3", "qwen35", "gemma", "llama32"). Useful for logging / metrics / per-family branches in the caller.

fn vocab_size(&self) -> usize

LM head vocab size — useful for callers that need to size a logit buffer or validate token ids before calling Self::predict_logits. PLAN.md M9.

fn predict_logits(&mut self, prompt_ids: &[u32]) -> Result<Vec<f32>>

Run prefill on prompt_ids and return the last-token logits over the full vocab. Mirrors the existing predict_logits method on every per-family runner.
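
A caller might combine `vocab_size` and `predict_logits` roughly as follows. This is a sketch under stated assumptions: the `Result` alias is a stand-in, and `validate_ids`, `argmax`, `next_token`, and `FixedRunner` are hypothetical helpers that are not part of the crate.

```rust
// Stand-in for the crate's `Result` alias; the real error type is an assumption.
type Result<T> = std::result::Result<T, String>;

// Pared-down copy of the trait with the two methods used here.
pub trait LmRunner: Send {
    fn vocab_size(&self) -> usize;
    fn predict_logits(&mut self, prompt_ids: &[u32]) -> Result<Vec<f32>>;
}

// Hypothetical helper: reject ids that fall outside the LM head's vocabulary.
fn validate_ids(runner: &dyn LmRunner, ids: &[u32]) -> Result<()> {
    let vocab = runner.vocab_size();
    if let Some(bad) = ids.iter().find(|&&id| id as usize >= vocab) {
        return Err(format!("token id {bad} is out of range for vocab size {vocab}"));
    }
    Ok(())
}

// Index of the largest logit; ties resolve to the lowest index.
// Assumes the logits contain no NaN values.
fn argmax(logits: &[f32]) -> Option<u32> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &v) in logits.iter().enumerate() {
        if best.map_or(true, |(_, b)| v > b) {
            best = Some((i, v));
        }
    }
    best.map(|(i, _)| i as u32)
}

// Validate the prompt, run prefill, and pick the greedy next token.
fn next_token(runner: &mut dyn LmRunner, prompt_ids: &[u32]) -> Result<u32> {
    validate_ids(&*runner, prompt_ids)?;
    let logits = runner.predict_logits(prompt_ids)?;
    if logits.len() != runner.vocab_size() {
        return Err("logit count does not match vocab_size".to_string());
    }
    argmax(&logits).ok_or_else(|| "runner returned no logits".to_string())
}

// Hypothetical runner that always returns the same logits, for demonstration.
struct FixedRunner {
    logits: Vec<f32>,
}

impl LmRunner for FixedRunner {
    fn vocab_size(&self) -> usize {
        self.logits.len()
    }

    fn predict_logits(&mut self, _prompt_ids: &[u32]) -> Result<Vec<f32>> {
        Ok(self.logits.clone())
    }
}
```

Checking the returned length against `vocab_size` is a defensive step added for the sketch; the documentation only states that the logits cover the full vocab.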

Provided Methods

fn generate(&mut self, prompt_ids: &[u32], n_new: usize, on_token: &mut dyn FnMut(u32) -> bool) -> Result<Vec<u32>>

Generate up to n_new tokens after prompt_ids using greedy (argmax) sampling. on_token fires once per generated token and returns true to continue, false to stop. Returns the generated id sequence (excluding the prompt).

Stop-signal honoring varies by family (PLAN.md M9):

  • default impl + Qwen35Runner: honor the return value.
  • Qwen3Runner / GemmaRunner / Llama32Runner: call the callback but ignore its return value (their inherent generate doesn’t take a bool callback). Pass an EOS-aware sampler in the caller, or check produced.last() after the call.
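
Because some families ignore the stop signal, a caller that needs consistent behavior can post-process the result. The sketch below is one possible approach, not the crate's method: it trims the output at the first EOS id rather than only inspecting produced.last(). The names `generate_until_eos`, `eos_id`, and `IgnoresStop` are hypothetical, and the `Result` alias is a stand-in.

```rust
// Stand-in for the crate's `Result` alias; the real error type is an assumption.
type Result<T> = std::result::Result<T, String>;

// Pared-down copy of the trait with only `generate`.
pub trait LmRunner: Send {
    fn generate(
        &mut self,
        prompt_ids: &[u32],
        n_new: usize,
        on_token: &mut dyn FnMut(u32) -> bool,
    ) -> Result<Vec<u32>>;
}

// Hypothetical runner that calls the callback but ignores its return value,
// replaying a fixed token script instead of running a model.
struct IgnoresStop {
    script: Vec<u32>,
}

impl LmRunner for IgnoresStop {
    fn generate(
        &mut self,
        _prompt_ids: &[u32],
        n_new: usize,
        on_token: &mut dyn FnMut(u32) -> bool,
    ) -> Result<Vec<u32>> {
        let out: Vec<u32> = self.script.iter().copied().take(n_new).collect();
        for &tok in &out {
            let _ = on_token(tok); // stop signal deliberately dropped
        }
        Ok(out)
    }
}

// Ask the runner to stop at EOS, then trim defensively in case it did not.
fn generate_until_eos(
    runner: &mut dyn LmRunner,
    prompt_ids: &[u32],
    n_new: usize,
    eos_id: u32,
) -> Result<Vec<u32>> {
    // Return false on EOS; runners that honor the signal stop here.
    let mut on_token = |tok: u32| tok != eos_id;
    let mut produced = runner.generate(prompt_ids, n_new, &mut on_token)?;
    // Runners that ignore the signal may keep going, so cut after the first EOS.
    if let Some(pos) = produced.iter().position(|&t| t == eos_id) {
        produced.truncate(pos + 1);
    }
    Ok(produced)
}
```

Trimming fixes the returned sequence but not the cost: a runner that ignores the signal still decodes all n_new tokens, so a smaller n_new may be worth choosing for those families.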

The default impl is naive: it re-runs prefill on the full context at every step. Per-family runners override it with their cached decode fast path.
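
The naive strategy can be sketched as a free function over a pared-down trait. This is an illustration of the described behavior, not the crate's actual source: `naive_generate`, `argmax`, and `CountingRunner` are hypothetical, the `Result` alias is a stand-in, and whether the real default includes the token on which the callback returns false is an assumption.

```rust
// Stand-in for the crate's `Result` alias; the real error type is an assumption.
type Result<T> = std::result::Result<T, String>;

// Pared-down copy of the trait with the methods the loop relies on.
pub trait LmRunner: Send {
    fn vocab_size(&self) -> usize;
    fn predict_logits(&mut self, prompt_ids: &[u32]) -> Result<Vec<f32>>;
}

// Index of the largest logit; ties resolve to the lowest index.
fn argmax(logits: &[f32]) -> Option<u32> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &v) in logits.iter().enumerate() {
        if best.map_or(true, |(_, b)| v > b) {
            best = Some((i, v));
        }
    }
    best.map(|(i, _)| i as u32)
}

// Greedy decoding that re-runs prefill on the whole context for every token.
fn naive_generate(
    runner: &mut dyn LmRunner,
    prompt_ids: &[u32],
    n_new: usize,
    on_token: &mut dyn FnMut(u32) -> bool,
) -> Result<Vec<u32>> {
    let mut context = prompt_ids.to_vec();
    let mut produced = Vec::with_capacity(n_new);
    for _ in 0..n_new {
        let logits = runner.predict_logits(&context)?; // full re-prefill
        let next = argmax(&logits).ok_or_else(|| "runner returned no logits".to_string())?;
        context.push(next);
        produced.push(next);
        // Assumption: the token that triggers the stop is still returned.
        if !on_token(next) {
            break;
        }
    }
    Ok(produced)
}

// Hypothetical runner: always predicts (last id + 1) mod vocab and counts
// how many tokens it has been asked to prefill in total.
struct CountingRunner {
    vocab: usize,
    prefilled_tokens: usize,
}

impl LmRunner for CountingRunner {
    fn vocab_size(&self) -> usize {
        self.vocab
    }

    fn predict_logits(&mut self, prompt_ids: &[u32]) -> Result<Vec<f32>> {
        let last = *prompt_ids.last().ok_or_else(|| "empty prompt".to_string())? as usize;
        self.prefilled_tokens += prompt_ids.len();
        let mut logits = vec![0.0; self.vocab];
        logits[(last + 1) % self.vocab] = 1.0;
        Ok(logits)
    }
}
```

With a one-token prompt and four generated tokens, the counter reaches 1 + 2 + 3 + 4 = 10 prefilled tokens: the work grows quadratically with output length, which is the cost a cached decode path avoids.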

fn supports_multimodal(&self) -> bool

Whether this runner supports multimodal (image+text) generation via Self::generate_multimodal. Default false. Per-family runners that wire a vision encoder (e.g. Qwen35Runner with an mmproj path) override to true.

fn generate_multimodal(&mut self, _prompt: &str, _rgb: &[u8], _img_w: usize, _img_h: usize, _tokenizer: Option<&Path>, _n_new: usize, _on_token: &mut dyn FnMut(u32) -> bool) -> Result<Vec<u32>>

Multimodal text generation: prefill the trunk with prompt text where image markers are spliced with vision embeddings derived from rgb (raw RGB bytes, row-major [h, w, 3]). Streams one token per on_token call; returns the full produced sequence.

The default impl returns an error; only family runners that wire a vision encoder override this. The override is meant to match the behavior of llama-cpp’s MtmdContext-based multimodal eval path.
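
A caller would typically check `supports_multimodal` and the buffer size before calling `generate_multimodal`. The sketch below assumes a stand-in `Result` alias and mirrors the documented defaults (false, and an error); `generate_with_image`, `TextOnly`, and `ToyVision` are hypothetical, and passing `None` for the tokenizer path is an assumption about what a real runner accepts.

```rust
use std::path::Path;

// Stand-in for the crate's `Result` alias; the real error type is an assumption.
type Result<T> = std::result::Result<T, String>;

// Pared-down copy of the trait with defaults mirroring the documented ones.
pub trait LmRunner: Send {
    fn supports_multimodal(&self) -> bool {
        false
    }

    fn generate_multimodal(
        &mut self,
        _prompt: &str,
        _rgb: &[u8],
        _img_w: usize,
        _img_h: usize,
        _tokenizer: Option<&Path>,
        _n_new: usize,
        _on_token: &mut dyn FnMut(u32) -> bool,
    ) -> Result<Vec<u32>> {
        Err("multimodal generation is not supported by this runner".to_string())
    }
}

// Hypothetical text-only runner: keeps both defaults.
struct TextOnly;
impl LmRunner for TextOnly {}

// Hypothetical vision runner: echoes the first bytes of the image as token ids.
struct ToyVision;
impl LmRunner for ToyVision {
    fn supports_multimodal(&self) -> bool {
        true
    }

    fn generate_multimodal(
        &mut self,
        _prompt: &str,
        rgb: &[u8],
        _img_w: usize,
        _img_h: usize,
        _tokenizer: Option<&Path>,
        n_new: usize,
        on_token: &mut dyn FnMut(u32) -> bool,
    ) -> Result<Vec<u32>> {
        let mut out = Vec::new();
        for &b in rgb.iter().take(n_new) {
            out.push(b as u32);
            if !on_token(b as u32) {
                break;
            }
        }
        Ok(out)
    }
}

// Guard the call: check capability and that `rgb` holds h * w * 3 bytes.
fn generate_with_image(
    runner: &mut dyn LmRunner,
    prompt: &str,
    rgb: &[u8],
    img_w: usize,
    img_h: usize,
    n_new: usize,
) -> Result<Vec<u32>> {
    if !runner.supports_multimodal() {
        return Err("runner has no vision encoder wired".to_string());
    }
    let expected = img_w
        .checked_mul(img_h)
        .and_then(|px| px.checked_mul(3))
        .ok_or_else(|| "image dimensions overflow".to_string())?;
    if rgb.len() != expected {
        return Err(format!("expected {expected} RGB bytes, got {}", rgb.len()));
    }
    let mut on_token = |_tok: u32| true; // never request an early stop
    runner.generate_multimodal(prompt, rgb, img_w, img_h, None, n_new, &mut on_token)
}
```

Validating the byte count up front turns a mismatched width or height into a clear error instead of a failure inside the vision encoder.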

Dyn Compatibility

This trait is dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety".

Implementors