Struct Qwen3Runner 

Source
pub struct Qwen3Runner { /* private fields */ }

Resolved Qwen3 runner. Call Qwen3Runner::generate for streaming decode (F32 path), or Qwen3Runner::predict_logits for a single forward pass (works in both F32 and packed modes).

Implementations§

Source§

impl Qwen3Runner

Source

pub fn builder() -> Qwen3RunnerBuilder

Source

pub fn config(&self) -> &Qwen3Config

Source

pub fn device(&self) -> Device

Source

pub fn disable_decode_compile_cache(&mut self)

Bypass the cached decode path; every generated token re-runs the full prefill graph from scratch. This is slow (O(N²) overall) but useful as a reference for numerical parity checks against the cached path.
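A parity check between the cached and uncached paths could compare their outputs with helpers like the ones below. This is a minimal sketch: `max_abs_diff` and `first_divergence` are hypothetical names, not part of this crate, and the slices stand in for logits or id sequences collected from two separate runs.

```rust
/// Largest absolute element-wise difference between two logit vectors.
/// Returns `None` when the lengths differ, since the comparison is then meaningless.
/// (Hypothetical helper, not part of the crate.)
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(
        a.iter()
            .zip(b)
            .map(|(x, y)| (x - y).abs())
            .fold(0.0_f32, f32::max),
    )
}

/// Index of the first position where two generated id sequences diverge.
/// A length mismatch counts as a divergence at the end of the shorter sequence.
/// (Hypothetical helper, not part of the crate.)
fn first_divergence(cached: &[u32], reference: &[u32]) -> Option<usize> {
    let common = cached.len().min(reference.len());
    (0..common)
        .find(|&i| cached[i] != reference[i])
        .or(if cached.len() != reference.len() {
            Some(common)
        } else {
            None
        })
}
```

In practice one run would use the default cached path and the other a runner on which disable_decode_compile_cache was called, with both fed the same prompt ids.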

Source

pub fn predict_logits(&mut self, prompt_ids: &[u32]) -> Result<Vec<f32>, Error>

Run a single prefill pass and return the last-position logits. Works in both F32 mode and packed-weights mode; in packed mode this is the only forward path supported today (streaming decode still goes through the F32 generator).

For streaming decode, Qwen3Runner::generate produces n_new tokens after the given prompt. Its on_token callback is called once per generated id when stream(true) is set; otherwise the callback fires once at the end with the full vector. It returns the full generated id sequence.

The prompt is expected as raw token ids. Tokenizer integration lives outside this module today (use the example binary for an end-to-end pipeline that wires tokenizers).

The prompt length must match the bucket the runner was built for (max_seq); shorter prompts are padded with the first token, longer prompts are truncated.
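The padding and truncation rule can be sketched as a standalone function. `fit_to_bucket` is a hypothetical name, and two details are assumptions the text above does not settle: the sketch pads on the left and keeps the most recent ids when truncating. The runner may do either differently.

```rust
/// Fit `prompt_ids` to exactly `max_seq` ids.
///
/// ASSUMPTIONS (not confirmed by the documentation): padding is added on the
/// left, and truncation keeps the last `max_seq` ids. The pad value is the
/// first prompt token, as the documentation states.
///
/// Returns `None` for an empty prompt, which has no first token to pad with.
fn fit_to_bucket(prompt_ids: &[u32], max_seq: usize) -> Option<Vec<u32>> {
    let first = *prompt_ids.first()?;
    if prompt_ids.len() >= max_seq {
        // Too long (or exact fit): keep the trailing `max_seq` ids.
        return Some(prompt_ids[prompt_ids.len() - max_seq..].to_vec());
    }
    // Too short: repeat the first token until the bucket is full.
    let mut out = vec![first; max_seq - prompt_ids.len()];
    out.extend_from_slice(prompt_ids);
    Some(out)
}
```

Under these assumptions, a three-token prompt in a bucket of five gains two copies of its first token in front, and a four-token prompt in a bucket of two keeps only its last two ids.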

Source

pub fn generate_packed( &mut self, prompt_ids: &[u32], n_new: usize, on_token: impl FnMut(u32), ) -> Result<Vec<u32>, Error>

Generate n_new tokens via repeated packed-mode prefills. Each step runs the full prefill graph against the growing token history (padded/truncated to max_seq), samples the next id, and appends it. Calls on_token per id.

Trade-off vs generate() on the F32 path: every token pays a full prefill instead of one decode step, so wall-clock throughput is roughly max_seq× slower. Memory stays packed, though: this is the only path that actually loads 14B+ Q4_K_M GGUFs on a 32 GB Mac today. Tighter throughput needs the real bucketed decode-graph machinery (separate TODO; see CHANGELOG known-limitations).
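The repeated-prefill loop described above has roughly the following shape. This is a sketch, not the crate's implementation: `generate_by_reprefill` and `argmax` are hypothetical names, `forward` stands in for one full prefill that returns last-position logits, greedy sampling is assumed, and padding or truncation to max_seq is left out.

```rust
/// Index of the largest logit (greedy sampling). Returns 0 for an empty slice.
fn argmax(logits: &[f32]) -> u32 {
    let mut best = 0;
    for (i, &v) in logits.iter().enumerate() {
        if v > logits[best] {
            best = i;
        }
    }
    best as u32
}

/// Generate `n_new` ids by re-running `forward` on the whole history each step.
///
/// `forward` is a stand-in (assumption) for a full prefill pass returning the
/// last-position logits. Returns only the newly generated ids.
fn generate_by_reprefill(
    mut forward: impl FnMut(&[u32]) -> Vec<f32>,
    prompt_ids: &[u32],
    n_new: usize,
    mut on_token: impl FnMut(u32),
) -> Vec<u32> {
    let mut history = prompt_ids.to_vec();
    let mut generated = Vec::with_capacity(n_new);
    for _ in 0..n_new {
        // Every step pays for the full history, not just the newest token.
        let logits = forward(&history);
        let next = argmax(&logits);
        on_token(next);
        generated.push(next);
        history.push(next);
    }
    generated
}
```

The loop makes the cost visible: `forward` is called n_new times, and each call processes the entire history rather than a single new position.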

Source

pub fn generate( &mut self, prompt_ids: &[u32], n_new: usize, on_token: impl FnMut(u32), ) -> Result<Vec<u32>, Error>

Source

pub fn generate_stoppable( &mut self, prompt_ids: &[u32], n_new: usize, on_token: impl FnMut(u32) -> bool, ) -> Result<Vec<u32>, Error>

Like Qwen3Runner::generate, but the callback can return false to stop sampling early (e.g. on EOS).
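A stop-on-EOS callback might look like the sketch below. `run_stoppable` is a hypothetical stand-in that replays a fixed script of ids in place of a real model, and `EOS_ID` is a made-up value (the real id comes from the tokenizer). Whether the id that triggers the stop is included in the returned vector is an assumption here; the sketch includes it.

```rust
/// Hypothetical end-of-sequence id; the real value comes from the tokenizer.
const EOS_ID: u32 = 2;

/// Stand-in generator: emits ids from `script`, at most `n_new` of them,
/// and stops as soon as `on_token` returns false.
/// ASSUMPTION: the id that triggered the stop is still part of the result.
fn run_stoppable(
    script: &[u32],
    n_new: usize,
    mut on_token: impl FnMut(u32) -> bool,
) -> Vec<u32> {
    let mut out = Vec::new();
    for &id in script.iter().take(n_new) {
        out.push(id);
        if !on_token(id) {
            break;
        }
    }
    out
}

/// Collects ids until EOS; returns (ids returned by the generator, ids kept).
fn demo() -> (Vec<u32>, Vec<u32>) {
    let script = [10, 11, EOS_ID, 12];
    let mut kept = Vec::new();
    let returned = run_stoppable(&script, 8, |id| {
        if id == EOS_ID {
            return false; // ask the generator to stop
        }
        kept.push(id);
        true // keep sampling
    });
    (returned, kept)
}
```

The callback owns the stopping policy, so the same pattern extends to stop strings or token budgets without changing the generator.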

Trait Implementations§

Source§

impl LmRunner for Qwen3Runner

Source§

fn family(&self) -> &'static str

Short family identifier ("qwen3", "llama32", "gemma").
Source§

fn vocab_size(&self) -> usize

LM head vocabulary size.
Source§

fn predict_logits(&mut self, prompt_ids: &[u32]) -> Result<Vec<f32>, Error>

Run prefill on prompt_ids and return last-token logits.
Source§

fn generate( &mut self, prompt_ids: &[u32], n_new: usize, on_token: &mut dyn FnMut(u32) -> bool, ) -> Result<Vec<u32>, Error>

Generate up to n_new tokens after prompt_ids using greedy (argmax) sampling. The default impl re-prefills on the full context each step; per-family runners should override with their cached decode fast path.
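The trait method takes its callback as `&mut dyn FnMut(u32) -> bool`, whereas the inherent Qwen3Runner::generate takes `impl FnMut(u32)`. A caller holding a plain sink closure can bridge the two with a small adapter. The sketch below uses a hypothetical `TokenSource` trait that only mirrors the callback shape of the signature above, with a toy `Echo` implementation in place of a real runner.

```rust
/// Hypothetical stand-in mirroring the callback shape of the trait method.
trait TokenSource {
    fn generate(
        &mut self,
        prompt_ids: &[u32],
        n_new: usize,
        on_token: &mut dyn FnMut(u32) -> bool,
    ) -> Vec<u32>;
}

/// Toy implementation: replays the prompt ids in a cycle instead of running a model.
struct Echo;

impl TokenSource for Echo {
    fn generate(
        &mut self,
        prompt_ids: &[u32],
        n_new: usize,
        on_token: &mut dyn FnMut(u32) -> bool,
    ) -> Vec<u32> {
        let mut out = Vec::new();
        for &id in prompt_ids.iter().cycle().take(n_new) {
            out.push(id);
            if !on_token(id) {
                break; // callback asked to stop
            }
        }
        out
    }
}

/// Adapter: wraps a sink that cannot stop generation into the
/// `FnMut(u32) -> bool` shape by always returning `true`.
fn generate_streaming(
    runner: &mut dyn TokenSource,
    prompt_ids: &[u32],
    n_new: usize,
    mut sink: impl FnMut(u32),
) -> Vec<u32> {
    let mut always_continue = |id: u32| {
        sink(id);
        true
    };
    runner.generate(prompt_ids, n_new, &mut always_continue)
}
```

Taking the callback as a trait object keeps the method callable through `dyn`, which a generic `impl FnMut` parameter would prevent.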
Source§

fn supports_multimodal(&self) -> bool

Whether this runner supports multimodal (image+text) generation.
Source§

fn generate_multimodal( &mut self, _prompt: &str, _rgb: &[u8], _img_w: usize, _img_h: usize, _tokenizer: Option<&Path>, _n_new: usize, _on_token: &mut dyn FnMut(u32) -> bool, ) -> Result<Vec<u32>, Error>

Multimodal generation: prefill with text where image markers are spliced with vision embeddings derived from rgb.
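Callers may want to sanity-check the image buffer before passing it in. The sketch below assumes `rgb` is tightly packed 8-bit RGB with three bytes per pixel and no row padding; the documentation above does not state the layout, so treat that as an assumption. `expected_rgb_len` and `check_rgb` are hypothetical helpers, not part of this crate.

```rust
/// Byte length of a tightly packed 8-bit RGB image, or `None` on overflow.
/// ASSUMPTION: 3 bytes per pixel, no row padding.
fn expected_rgb_len(img_w: usize, img_h: usize) -> Option<usize> {
    img_w.checked_mul(img_h)?.checked_mul(3)
}

/// Verify that `rgb` has exactly the length implied by the dimensions.
fn check_rgb(rgb: &[u8], img_w: usize, img_h: usize) -> Result<(), String> {
    match expected_rgb_len(img_w, img_h) {
        Some(n) if n == rgb.len() => Ok(()),
        Some(n) => Err(format!(
            "expected {} bytes for a {}x{} RGB image, got {}",
            n,
            img_w,
            img_h,
            rgb.len()
        )),
        None => Err("image dimensions overflow usize".to_string()),
    }
}
```

Pairing this with supports_multimodal lets a caller reject unsupported runners or malformed buffers before any prefill work starts.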

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
Source§

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V