Struct SpeculativeDecoder

pub struct SpeculativeDecoder<'a> {
    pub draft_engine: InferenceEngine<'a>,
    pub config: SpeculativeConfig,
    pub total_steps: u64,
    pub total_draft_tokens: u64,
    pub total_accepted_tokens: u64,
    /* private fields */
}

Speculative decoder: wraps a draft InferenceEngine and provides draft-then-verify generation with running acceptance statistics.

Fields§

§draft_engine: InferenceEngine<'a>

Draft model engine (smaller/faster model).

§config: SpeculativeConfig

Speculative decoding configuration.

§total_steps: u64

Total number of speculative steps taken.

§total_draft_tokens: u64

Total number of tokens proposed by the draft model.

§total_accepted_tokens: u64

Total number of tokens accepted after target verification.

Implementations§

impl<'a> SpeculativeDecoder<'a>

pub fn new(draft_engine: InferenceEngine<'a>, config: SpeculativeConfig) -> Self

Create a new speculative decoder with the given draft engine and config.

pub fn with_adaptive(
    draft_engine: InferenceEngine<'a>,
    config: SpeculativeConfig,
    adaptive_config: AdaptiveLookaheadConfig,
) -> Result<Self, AdaptiveLookaheadError>

Create a speculative decoder with an AdaptiveLookahead controller active. The initial lookahead is taken from adaptive_config.initial and overrides config.lookahead for the first step.

pub fn adaptive(&self) -> Option<&AdaptiveLookahead>

Read the current adaptive controller, if any.

pub fn adaptive_mut(&mut self) -> Option<&mut AdaptiveLookahead>

Mutable access to the adaptive controller, if any.

pub fn draft(&mut self, context: &[u32], _params: &SamplingParams) -> Vec<u32>

Generate up to config.lookahead draft tokens from the draft model.

In this implementation, the draft engine uses its sampler to produce tokens autoregressively from context. The returned tokens are the draft candidates for target-model verification.

pub fn verify(
    &self,
    draft_tokens: &[u32],
    target_logits: &[Vec<f32>],
    _params: &SamplingParams,
) -> Vec<u32>

Verify draft tokens against target-model logits.

For each draft position i, the target’s probability p_t(t_i) is compared against a mock draft probability p_d(t_i) derived from the target logits (as a self-consistency check when target logits are provided). In production, p_d comes from the draft model’s softmax.

Acceptance criterion (speculative sampling):

Accept if p_t(t_i) >= p_d(t_i)
Otherwise accept with probability p_t(t_i) / p_d(t_i)

Equivalently, token t_i is accepted with probability min(1, p_t(t_i) / p_d(t_i)).

Returns only the prefix of tokens accepted before the first rejection.
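The acceptance rule and prefix behaviour described above can be sketched as a free function. This is a minimal illustration, not the crate's implementation: the names softmax and accept_prefix, the draft_probs slice, and the uniform closure (a stand-in for a random draw in [0, 1)) are all assumptions introduced here so the example stays deterministic and dependency-free.

```rust
/// Numerically stable softmax over one position's logits.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Hypothetical helper: returns the prefix of `draft_tokens` accepted
/// before the first rejection.
///
/// `draft_probs[i]` is assumed to hold p_d(t_i); `uniform` is assumed to
/// yield a value in [0, 1) each time it is called.
fn accept_prefix(
    draft_tokens: &[u32],
    target_logits: &[Vec<f32>],
    draft_probs: &[f32],
    mut uniform: impl FnMut() -> f32,
) -> Vec<u32> {
    let mut accepted = Vec::new();
    for ((&tok, logits), &p_d) in draft_tokens.iter().zip(target_logits).zip(draft_probs) {
        // p_t(t_i): target probability of the drafted token (0.0 if out of range).
        let p_t = softmax(logits).get(tok as usize).copied().unwrap_or(0.0);
        // Accept outright if p_t >= p_d, otherwise with probability p_t / p_d.
        let accept = p_t >= p_d || uniform() < p_t / p_d;
        if !accept {
            break; // everything after the first rejection is discarded
        }
        accepted.push(tok);
    }
    accepted
}
```

Passing the random draw in as a closure keeps the sketch testable; a real decoder would presumably draw from its own sampler's random number generator.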

pub fn step(
    &mut self,
    context: &[u32],
    target_logits: &[Vec<f32>],
    params: &SamplingParams,
) -> SpeculativeStep

Perform one complete speculative decoding step: draft K tokens (K being the current lookahead), then verify them against target_logits.

Returns a SpeculativeStep with the draft proposals, accepted subset, and per-step acceptance rate.

pub fn generate_speculative(
    &mut self,
    prompt_tokens: &[u32],
    max_tokens: usize,
    params: &SamplingParams,
) -> Vec<u32>

Generate up to max_tokens tokens using speculative decoding.

Each step drafts lookahead candidates, verifies them, and appends accepted tokens. The loop continues until max_tokens are collected or generation stalls (no tokens accepted/generated).

In this mock implementation, target logits are synthesised from the draft engine’s perspective. In production, the target model would score all positions in one batched forward pass.
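The control flow of the generation loop (append accepted tokens, stop at max_tokens or on a stall) can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: generate_loop is a hypothetical name, and the step closure stands in for one draft-then-verify round that returns the accepted tokens for the current context.

```rust
/// Hypothetical sketch of the outer loop: `step` is assumed to perform one
/// draft-then-verify round and return the accepted tokens.
fn generate_loop(
    prompt: &[u32],
    max_tokens: usize,
    mut step: impl FnMut(&[u32]) -> Vec<u32>,
) -> Vec<u32> {
    let mut context = prompt.to_vec();
    let mut out = Vec::new();
    while out.len() < max_tokens {
        let accepted = step(&context);
        if accepted.is_empty() {
            break; // generation stalled: nothing accepted this round
        }
        for tok in accepted {
            if out.len() == max_tokens {
                break; // never exceed the requested budget
            }
            out.push(tok);
            context.push(tok); // accepted tokens extend the context for the next round
        }
    }
    out
}
```

Truncating inside the inner loop guarantees the result never exceeds max_tokens even when the final round accepts more tokens than remain in the budget.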

pub fn acceptance_rate(&self) -> f32

Overall acceptance rate: accepted tokens / draft tokens, across all steps.

Returns 0.0 if no drafts have been generated yet.
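The ratio and its zero-draft guard can be sketched with a stand-in struct. Counters is a hypothetical type used only for illustration; its two fields mirror the public counters documented above.

```rust
/// Hypothetical stand-in holding the two counters the ratio depends on.
struct Counters {
    total_draft_tokens: u64,
    total_accepted_tokens: u64,
}

impl Counters {
    /// Accepted tokens divided by drafted tokens; 0.0 before any draft exists.
    fn acceptance_rate(&self) -> f32 {
        if self.total_draft_tokens == 0 {
            return 0.0; // avoid dividing by zero
        }
        self.total_accepted_tokens as f32 / self.total_draft_tokens as f32
    }
}
```

For example, 6 accepted tokens out of 8 drafted gives a rate of 0.75.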

pub fn speedup_estimate(&self) -> f32

Theoretical speedup estimate from speculative decoding.

Speedup ≈ accepted tokens per step (capped at lookahead). Returns the mean accepted tokens per step, which indicates how many target forward passes were “skipped” relative to autoregressive decoding.

A return of 1.0 means no speedup (equivalent to autoregressive); higher values indicate benefit from speculative parallelism.
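The estimate described above, mean accepted tokens per step capped at the lookahead, might be computed along these lines. The function name and its parameters are hypothetical, and the value returned before any step has run is an assumption, since the documentation does not state it.

```rust
/// Hypothetical sketch: mean accepted tokens per step, capped at `lookahead`.
fn mean_accepted_per_step(total_accepted: u64, total_steps: u64, lookahead: usize) -> f32 {
    if total_steps == 0 {
        // Assumption: the documented behaviour before the first step is unknown;
        // 0.0 is chosen here only to avoid dividing by zero.
        return 0.0;
    }
    let mean = total_accepted as f32 / total_steps as f32;
    // A step cannot accept more tokens than were drafted.
    mean.min(lookahead as f32)
}
```

With 12 accepted tokens over 4 steps and a lookahead of 4, the estimate is 3.0.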

pub fn reset_stats(&mut self)

Reset all accumulated statistics (steps, tokens, acceptance counts). If an adaptive controller is attached, its EWMA is also reset.

Auto Trait Implementations§

impl<'a> UnwindSafe for SpeculativeDecoder<'a>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.