pub struct Qwen3Runner { /* private fields */ }
Resolved Qwen3 runner — call Qwen3Runner::generate for
streaming decode (F32 path), or Qwen3Runner::predict_logits
for a single forward pass (works in both F32 and packed modes).
Implementations§
impl Qwen3Runner
pub fn builder() -> Qwen3RunnerBuilder
pub fn config(&self) -> &Qwen3Config
pub fn device(&self) -> Device
pub fn disable_decode_compile_cache(&mut self)
Bypasses the cached decode path: every generated token re-runs the full prefill graph from scratch. This is slow (O(N²) in the number of tokens) but serves as a reference for numerical parity checks against the cached path.
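A parity check of this kind might compare the logits from the cached and uncached paths element-wise. The sketch below is self-contained and does not call the runner. `max_abs_diff` and `logits_match` are hypothetical helper names, not part of this crate, and the tolerance is an arbitrary assumption to be tuned per model and dtype.

```rust
/// Largest absolute element-wise difference between two logit vectors.
/// Returns None when the lengths differ, since the comparison is then
/// meaningless. (Hypothetical helper, not part of the crate.)
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(
        a.iter()
            .zip(b)
            .map(|(x, y)| (x - y).abs())
            // f32::max skips NaN operands, so NaNs are not detected here.
            .fold(0.0_f32, f32::max),
    )
}

/// True when both vectors have the same length and agree within `tol`.
/// `tol` is an assumed tolerance chosen by the caller.
fn logits_match(cached: &[f32], reference: &[f32], tol: f32) -> bool {
    matches!(max_abs_diff(cached, reference), Some(d) if d <= tol)
}
```

In practice one would collect logits from one runner with the cache enabled and one with it disabled, then pass both vectors to `logits_match`.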
pub fn generate(&mut self, prompt_ids: &[u32], n_new: usize, on_token: impl FnMut(u32)) -> Result<Vec<u32>, Error>
Generate n_new tokens after the given prompt. on_token is
called once per generated id when stream(true) is set;
otherwise the callback fires once at the end with the full
vector. Returns the full generated id sequence.
The prompt is expected as raw token ids. Tokenizer integration
lives outside this module today (use the example binary for an
end-to-end pipeline that wires tokenizers).
pub fn predict_logits(&mut self, prompt_ids: &[u32]) -> Result<Vec<f32>, Error>
Run a single prefill pass and return the last-position
logits. Works in both F32 mode and packed-weights mode.
In packed mode this is the only forward path supported
today (streaming decode still goes through the F32
generator).
The prompt length must match the bucket the runner was
built for (max_seq); shorter prompts are padded with the
first token, longer prompts are truncated.
pub fn generate_packed(&mut self, prompt_ids: &[u32], n_new: usize, on_token: impl FnMut(u32)) -> Result<Vec<u32>, Error>
Generate n_new tokens via repeated packed-mode prefills.
Each step runs the full prefill graph against the growing
token history (padded/truncated to max_seq), samples the
next id, and appends it. Calls on_token once per id.
Trade-off vs generate() on the F32 path: every token pays
a full prefill instead of one decode step, so wall-clock
throughput is roughly max_seq times slower. Memory stays
packed, though, which makes this the only path that
actually loads 14B+ Q4_K_M GGUF models on a 32 GB Mac
today. Tighter throughput needs the real bucketed
decode-graph machinery (separate TODO; see CHANGELOG
known-limitations).
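The padding and truncation to max_seq described for the prefill paths can be sketched as a small helper. This is an illustration under stated assumptions, not the crate's implementation: `fit_to_bucket` is a hypothetical name, and the docs do not say which side is padded or which end is truncated. The sketch assumes padding is prepended and truncation keeps the most recent tokens, so the last position always holds the newest id.

```rust
/// Fit `ids` to exactly `max_seq` entries.
///
/// ASSUMPTIONS (not confirmed by the crate docs):
/// - short inputs are left-padded with their first token;
/// - long inputs keep only the last `max_seq` tokens;
/// - an empty input is padded with id 0.
fn fit_to_bucket(ids: &[u32], max_seq: usize) -> Vec<u32> {
    if ids.len() >= max_seq {
        // Keep the newest `max_seq` tokens.
        return ids[ids.len() - max_seq..].to_vec();
    }
    let pad = ids.first().copied().unwrap_or(0);
    let mut out = vec![pad; max_seq - ids.len()];
    out.extend_from_slice(ids);
    out
}
```

With a fixed bucket like this, generating n_new tokens by repeated prefill processes n_new × max_seq positions in total, versus about one new position per token on a cached decode path. That is where the roughly max_seq-fold slowdown comes from.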
Trait Implementations§
impl LmRunner for Qwen3Runner
fn vocab_size(&self) -> usize
fn predict_logits(&mut self, prompt_ids: &[u32]) -> Result<Vec<f32>, Error>
Run a forward pass over prompt_ids and return last-token logits.
fn generate(&mut self, prompt_ids: &[u32], n_new: usize, on_token: &mut dyn FnMut(u32) -> bool) -> Result<Vec<u32>, Error>
Generate n_new tokens after prompt_ids using greedy
(argmax) sampling. The default impl re-prefills on the full
context each step; per-family runners should override with
their cached decode fast path.
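A greedy re-prefill loop of the kind the default impl describes might look like the sketch below. It is not the trait's actual code: `argmax` and `greedy_generate` are hypothetical names, the model is replaced by a caller-supplied `forward` closure, and the callback returns nothing because the meaning of the trait callback's bool return value is not documented here.

```rust
/// Index of the largest logit; ties resolve to the lowest index.
/// Returns None for an empty slice. NaN values are never selected
/// over a non-NaN first element because comparisons with NaN are false.
fn argmax(logits: &[f32]) -> Option<u32> {
    if logits.is_empty() {
        return None;
    }
    let mut best = 0usize;
    for (i, &v) in logits.iter().enumerate().skip(1) {
        if v > logits[best] {
            best = i;
        }
    }
    Some(best as u32)
}

/// Greedy decoding by re-running `forward` on the full context each step.
/// `forward` stands in for a prefill that returns last-position logits.
fn greedy_generate<F, C>(
    mut forward: F,
    prompt_ids: &[u32],
    n_new: usize,
    mut on_token: C,
) -> Vec<u32>
where
    F: FnMut(&[u32]) -> Vec<f32>,
    C: FnMut(u32),
{
    let mut context = prompt_ids.to_vec();
    let mut generated = Vec::with_capacity(n_new);
    for _ in 0..n_new {
        // Full forward pass over everything seen so far (no KV cache).
        let logits = forward(&context);
        let id = match argmax(&logits) {
            Some(id) => id,
            None => break, // empty logits: nothing to sample
        };
        on_token(id);
        context.push(id);
        generated.push(id);
    }
    generated
}
```

Each step here costs a forward pass over the whole context, which is the overhead a cached decode fast path avoids.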
fn supports_multimodal(&self) -> bool
fn generate_multimodal(&mut self, _prompt: &str, _rgb: &[u8], _img_w: usize, _img_h: usize, _tokenizer: Option<&Path>, _n_new: usize, _on_token: &mut dyn FnMut(u32) -> bool) -> Result<Vec<u32>, Error>
… rgb.
Auto Trait Implementations§
impl !RefUnwindSafe for Qwen3Runner
impl !Sync for Qwen3Runner
impl !UnwindSafe for Qwen3Runner
impl Freeze for Qwen3Runner
impl Send for Qwen3Runner
impl Unpin for Qwen3Runner
impl UnsafeUnpin for Qwen3Runner
Blanket Implementations§
impl<T> BorrowMut<T> for T
where T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.