pub struct SpeculativeDecoder<'a> {
pub draft_engine: InferenceEngine<'a>,
pub config: SpeculativeConfig,
pub total_steps: u64,
pub total_draft_tokens: u64,
pub total_accepted_tokens: u64,
/* private fields */
}
Speculative decoder: wraps a draft InferenceEngine and provides
draft-then-verify generation with running acceptance statistics.
Fields
draft_engine: InferenceEngine<'a>
Draft model engine (smaller/faster model).
config: SpeculativeConfig
Speculative decoding configuration.
total_steps: u64
Total number of speculative steps taken.
total_draft_tokens: u64
Total number of tokens proposed by the draft model.
total_accepted_tokens: u64
Total number of tokens accepted after target verification.
Implementations
impl<'a> SpeculativeDecoder<'a>
pub fn new(draft_engine: InferenceEngine<'a>, config: SpeculativeConfig) -> Self
Create a new speculative decoder with the given draft engine and config.
pub fn with_adaptive(
    draft_engine: InferenceEngine<'a>,
    config: SpeculativeConfig,
    adaptive_config: AdaptiveLookaheadConfig,
) -> Result<Self, AdaptiveLookaheadError>
Create a speculative decoder with an AdaptiveLookahead controller
active. The initial lookahead is taken from adaptive_config.initial
and overrides config.lookahead for the first step.
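A minimal sketch of how the first-step lookahead might be resolved. `AdaptiveConfig` and `AdaptiveError` are hypothetical stand-ins for AdaptiveLookaheadConfig and AdaptiveLookaheadError, and the bounds check is an assumed example of what could make with_adaptive return an error; the real validation rules are not documented here.

```rust
// Hypothetical stand-in for AdaptiveLookaheadConfig; these field names and the
// validation rule below are assumptions, not the crate's documented API.
#[derive(Debug, Clone, Copy)]
pub struct AdaptiveConfig {
    pub min: usize,
    pub max: usize,
    pub initial: usize,
}

// Hypothetical stand-in for AdaptiveLookaheadError.
#[derive(Debug, PartialEq)]
pub enum AdaptiveError {
    InvalidBounds,
}

/// Returns the lookahead for the first step. With an adaptive config,
/// `initial` overrides `config_lookahead`; otherwise the static value is used.
pub fn first_step_lookahead(
    config_lookahead: usize,
    adaptive: Option<AdaptiveConfig>,
) -> Result<usize, AdaptiveError> {
    match adaptive {
        None => Ok(config_lookahead),
        Some(a) => {
            // Assumed rule: 1 <= min <= initial <= max.
            if a.min == 0 || a.min > a.initial || a.initial > a.max {
                return Err(AdaptiveError::InvalidBounds);
            }
            Ok(a.initial)
        }
    }
}

fn main() {
    let adaptive = AdaptiveConfig { min: 1, max: 8, initial: 3 };
    println!("{:?}", first_step_lookahead(5, Some(adaptive))); // Ok(3)
    println!("{:?}", first_step_lookahead(5, None)); // Ok(5)
}
```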
pub fn adaptive(&self) -> Option<&AdaptiveLookahead>
Read the current adaptive controller, if any.
pub fn adaptive_mut(&mut self) -> Option<&mut AdaptiveLookahead>
Mutable access to the adaptive controller, if any.
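The controller's update rule is not documented on this page. The sketch below assumes an EWMA of per-step acceptance rate (reset_stats mentions an EWMA) that grows or shrinks the lookahead at hypothetical thresholds of 0.8 and 0.5. `Adaptive`, `observe`, and the toy `Decoder` are illustrative names; `adaptive_mut` here mirrors only the accessor's `Option<&mut _>` shape.

```rust
/// Hypothetical adaptive controller: an EWMA of per-step acceptance rate
/// nudges the lookahead up or down within [min, max].
#[derive(Debug, Clone)]
pub struct Adaptive {
    pub lookahead: usize,
    pub min: usize,
    pub max: usize,
    pub alpha: f32,        // EWMA smoothing factor in (0, 1]
    pub ewma: Option<f32>, // None until the first observation
}

impl Adaptive {
    /// Fold one step's acceptance rate into the EWMA, then adjust the lookahead.
    pub fn observe(&mut self, step_rate: f32) {
        let e = match self.ewma {
            None => step_rate,
            Some(prev) => self.alpha * step_rate + (1.0 - self.alpha) * prev,
        };
        self.ewma = Some(e);
        // Assumed thresholds: grow when drafts are mostly accepted, shrink when not.
        if e > 0.8 && self.lookahead < self.max {
            self.lookahead += 1;
        } else if e < 0.5 && self.lookahead > self.min {
            self.lookahead -= 1;
        }
    }
}

/// Toy owner mirroring the `adaptive_mut` accessor shape.
pub struct Decoder {
    adaptive: Option<Adaptive>,
}

impl Decoder {
    pub fn adaptive_mut(&mut self) -> Option<&mut Adaptive> {
        self.adaptive.as_mut()
    }
}

fn main() {
    let ctrl = Adaptive { lookahead: 4, min: 1, max: 8, alpha: 0.5, ewma: None };
    let mut decoder = Decoder { adaptive: Some(ctrl) };
    // Feed per-step rates through the mutable accessor, as a caller would.
    for rate in [1.0, 1.0, 0.0, 0.0] {
        if let Some(a) = decoder.adaptive_mut() {
            a.observe(rate);
        }
    }
    println!("{:?}", decoder.adaptive_mut().map(|a| a.lookahead)); // Some(5)
}
```

The `if let Some(..)` pattern is how a caller would handle a decoder built with `new`, where no controller is attached.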
pub fn draft(&mut self, context: &[u32], _params: &SamplingParams) -> Vec<u32>
Generate up to config.lookahead draft tokens from the draft model.
In this implementation, the draft engine produces tokens autoregressively from
context using its own sampler; the params argument is currently unused (note the
leading underscore). The returned tokens are the draft candidates for
target-model verification.
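The drafting loop can be sketched as follows, assuming a hypothetical `next_token` callback in place of the draft engine's forward pass and sampler, and an optional end-of-sequence id `eos` as one reason fewer than lookahead tokens might come back.

```rust
/// Hypothetical drafting loop. `next_token` stands in for one draft-model
/// forward pass plus the draft engine's sampler; `eos` is an assumed
/// end-of-sequence id that ends drafting early.
pub fn draft_tokens<F>(context: &[u32], lookahead: usize, eos: Option<u32>, mut next_token: F) -> Vec<u32>
where
    F: FnMut(&[u32]) -> u32,
{
    let mut seq = context.to_vec();
    let mut drafted = Vec::with_capacity(lookahead);
    for _ in 0..lookahead {
        // Each proposal conditions on the context plus all earlier drafts.
        let t = next_token(&seq);
        seq.push(t);
        drafted.push(t);
        if Some(t) == eos {
            break;
        }
    }
    drafted // only the new candidates, not the context
}

fn main() {
    // Toy draft "model": predicts the previous token plus one.
    let next = |s: &[u32]| s.last().copied().unwrap_or(0) + 1;
    println!("{:?}", draft_tokens(&[10, 11], 3, None, next)); // [12, 13, 14]
}
```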
pub fn verify(
    &self,
    draft_tokens: &[u32],
    target_logits: &[Vec<f32>],
    _params: &SamplingParams,
) -> Vec<u32>
Verify draft tokens against target-model logits.
For each draft position i, the target’s probability p_t(t_i) is
compared against a mock draft probability p_d(t_i) derived from
the target logits (as a self-consistency check when target logits are
provided). In production, p_d comes from the draft model’s softmax.
Acceptance criterion (speculative sampling):
- Accept t_i if p_t(t_i) >= p_d(t_i).
- Otherwise, accept t_i with probability p_t(t_i) / p_d(t_i).
Equivalently, t_i is accepted with probability min(1, p_t(t_i) / p_d(t_i)).
Returns only the prefix of tokens accepted before the first rejection.
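A self-contained sketch of the acceptance rule above. `accept_prefix` and `softmax` are illustrative helpers rather than crate API, and the caller supplies the U(0, 1) draws as `uniforms` so the example stays deterministic without an RNG crate.

```rust
/// Numerically stable softmax, turning one row of logits into probabilities.
pub fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Speculative-sampling acceptance for one draft block.
/// `p_target[i]` and `p_draft[i]` are full distributions at draft position i;
/// `uniforms[i]` is a caller-supplied U(0, 1) draw (an assumption of this sketch).
/// Returns the accepted prefix of `draft`.
pub fn accept_prefix(
    draft: &[u32],
    p_target: &[Vec<f32>],
    p_draft: &[Vec<f32>],
    uniforms: &[f32],
) -> Vec<u32> {
    let mut accepted = Vec::new();
    // zip stops at the shortest input, so unverifiable positions are never accepted.
    for (((&tok, pt_row), pd_row), &u) in draft.iter().zip(p_target).zip(p_draft).zip(uniforms) {
        let pt = pt_row[tok as usize];
        let pd = pd_row[tok as usize];
        // Accept outright if p_t >= p_d; otherwise accept with probability p_t / p_d.
        // If pd == 0 the first test already passes, so there is no division by zero.
        let accept = pt >= pd || u < pt / pd;
        if !accept {
            break; // the first rejection ends the accepted prefix
        }
        accepted.push(tok);
    }
    accepted
}

fn main() {
    let p_target = softmax(&[0.0, 2.0, 1.0]);
    let p_draft = softmax(&[0.0, 1.0, 2.0]);
    let rows_t = vec![p_target.clone(), p_target];
    let rows_d = vec![p_draft.clone(), p_draft];
    // Token 1 is favoured by the target (accepted); token 2 by the draft,
    // so it survives only if u < e^-1 (about 0.37).
    println!("{:?}", accept_prefix(&[1, 2], &rows_t, &rows_d, &[0.99, 0.99])); // [1]
}
```

In the standard speculative sampling algorithm, a rejection at position i is followed by resampling a replacement token from the normalised residual max(0, p_t - p_d), and a bonus token is drawn when every draft is accepted. The method described above returns only the accepted prefix, so this sketch omits both steps.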
pub fn step(
    &mut self,
    context: &[u32],
    target_logits: &[Vec<f32>],
    params: &SamplingParams,
) -> SpeculativeStep
Perform one complete speculative decoding step: draft up to K (the current lookahead) tokens, then verify them against target_logits.
Returns a SpeculativeStep with the draft proposals, accepted subset,
and per-step acceptance rate.
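One way to picture what a step records, assuming hypothetical `Step` and `Totals` types (the actual SpeculativeStep field names are not shown here): the per-step rate is accepted / drafted, and the running totals grow by the same counts.

```rust
/// Hypothetical shape of SpeculativeStep; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub draft: Vec<u32>,
    pub accepted: Vec<u32>,
    pub acceptance_rate: f32,
}

/// Running counters mirroring total_steps, total_draft_tokens and total_accepted_tokens.
#[derive(Debug, Default, PartialEq)]
pub struct Totals {
    pub steps: u64,
    pub drafted: u64,
    pub accepted: u64,
}

/// Package one draft/verify round and fold it into the running totals.
pub fn record_step(totals: &mut Totals, draft: Vec<u32>, accepted: Vec<u32>) -> Step {
    // Verification only ever keeps a prefix of the draft.
    debug_assert!(draft.starts_with(&accepted));
    let acceptance_rate = if draft.is_empty() {
        0.0
    } else {
        accepted.len() as f32 / draft.len() as f32
    };
    totals.steps += 1;
    totals.drafted += draft.len() as u64;
    totals.accepted += accepted.len() as u64;
    Step { draft, accepted, acceptance_rate }
}

fn main() {
    let mut totals = Totals::default();
    let step = record_step(&mut totals, vec![5, 6, 7, 8], vec![5, 6]);
    println!("rate = {}, totals = {:?}", step.acceptance_rate, totals);
}
```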
pub fn generate_speculative(
    &mut self,
    prompt_tokens: &[u32],
    max_tokens: usize,
    params: &SamplingParams,
) -> Vec<u32>
Generate up to max_tokens tokens using speculative decoding.
Each step drafts lookahead candidates, verifies them, and appends
accepted tokens. The loop continues until max_tokens tokens have been collected
or generation stalls (a step accepts or generates no tokens).
In this mock implementation, target logits are synthesised from the draft engine's perspective; in production, the target model would score all draft positions in a single batched forward pass.
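The outer loop can be sketched with a hypothetical `step` callback that performs one draft-and-verify round and returns the accepted tokens. Returning only the new tokens (not the prompt) and truncating the final step to fit max_tokens are assumptions of this sketch.

```rust
/// Hypothetical outer loop. `step` stands in for one draft+verify round and
/// returns the tokens accepted for the given context.
pub fn generate<F>(prompt: &[u32], max_tokens: usize, mut step: F) -> Vec<u32>
where
    F: FnMut(&[u32]) -> Vec<u32>,
{
    let mut context = prompt.to_vec();
    let mut out = Vec::with_capacity(max_tokens);
    while out.len() < max_tokens {
        let accepted = step(&context);
        if accepted.is_empty() {
            break; // stall: nothing accepted, so stop rather than loop forever
        }
        // A step may accept several tokens; never overshoot max_tokens.
        let take = accepted.len().min(max_tokens - out.len());
        context.extend_from_slice(&accepted[..take]);
        out.extend_from_slice(&accepted[..take]);
    }
    out // only newly generated tokens
}

fn main() {
    // Toy step: always "accepts" two tokens derived from the context length.
    let toy = |ctx: &[u32]| {
        let n = ctx.len() as u32;
        vec![n, n + 1]
    };
    println!("{:?}", generate(&[0], 5, toy)); // [1, 2, 3, 4, 5]
}
```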
pub fn acceptance_rate(&self) -> f32
Overall acceptance rate: accepted tokens / draft tokens, across all steps.
Returns 0.0 if no drafts have been generated yet.
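Because the rate is pooled over tokens rather than averaged over steps, steps with longer drafts carry more weight. The sketch below uses hypothetical free functions over the counter values to contrast the two: with one fully accepted 4-token draft and one fully rejected 2-token draft, the pooled rate is 4/6 ≈ 0.667 while the mean of per-step rates is 0.5.

```rust
/// Pooled acceptance rate: total accepted / total drafted, 0.0 before any drafts.
pub fn acceptance_rate(total_accepted: u64, total_draft: u64) -> f32 {
    if total_draft == 0 {
        0.0
    } else {
        total_accepted as f32 / total_draft as f32
    }
}

/// For contrast: the unweighted mean of per-step rates (not what the method computes).
/// Each entry is (accepted, drafted); steps with no drafts are skipped.
pub fn mean_step_rate(steps: &[(u64, u64)]) -> f32 {
    let rates: Vec<f32> = steps
        .iter()
        .filter(|&&(_, drafted)| drafted > 0)
        .map(|&(accepted, drafted)| accepted as f32 / drafted as f32)
        .collect();
    if rates.is_empty() { 0.0 } else { rates.iter().sum::<f32>() / rates.len() as f32 }
}

fn main() {
    let steps = [(4u64, 4u64), (0, 2)];
    let (acc, dr) = steps.iter().fold((0, 0), |(a, d), &(x, y)| (a + x, d + y));
    println!("pooled = {:.3}", acceptance_rate(acc, dr)); // 4 / 6
    println!("mean of steps = {:.3}", mean_step_rate(&steps)); // (1.0 + 0.0) / 2
}
```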
pub fn speedup_estimate(&self) -> f32
Theoretical speedup estimate from speculative decoding.
Computed as the mean number of accepted tokens per step, capped at lookahead. Because each step costs one target forward pass, this approximates how many tokens each target pass yields, compared with exactly one per pass for autoregressive decoding. The estimate does not account for the cost of running the draft model.
A return value of 1.0 means no speedup (equivalent to autoregressive decoding); higher values indicate a benefit from speculative parallelism.
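A sketch of the estimate as described, written as a hypothetical free function over the counters; returning 0.0 when no steps have run is an assumption. For example, 25 accepted tokens over 10 steps gives an estimate of 2.5.

```rust
/// Sketch of the described estimate: mean accepted tokens per step, capped at
/// `lookahead`. The 0.0 result for zero steps is an assumption.
pub fn speedup_estimate(total_accepted: u64, total_steps: u64, lookahead: usize) -> f32 {
    if total_steps == 0 {
        return 0.0;
    }
    let mean = total_accepted as f32 / total_steps as f32;
    mean.min(lookahead as f32)
}

fn main() {
    // 10 steps with lookahead 4 that accepted 25 tokens in total.
    println!("{}", speedup_estimate(25, 10, 4)); // 2.5
}
```

With this definition, a value below 1.0 is possible when most drafts are rejected.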
pub fn reset_stats(&mut self)
Reset all accumulated statistics (steps, tokens, acceptance counts). If an adaptive controller is attached, its EWMA is also reset.
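A minimal sketch of the reset, assuming hypothetical `Stats` and `Ewma` types that mirror the public counters and an optional adaptive controller.

```rust
/// Hypothetical EWMA state held by the adaptive controller.
#[derive(Debug, Default)]
pub struct Ewma {
    pub value: Option<f32>,
}

/// Hypothetical mirror of the decoder's statistics plus an optional controller.
#[derive(Debug, Default)]
pub struct Stats {
    pub total_steps: u64,
    pub total_draft_tokens: u64,
    pub total_accepted_tokens: u64,
    pub adaptive: Option<Ewma>,
}

impl Stats {
    pub fn reset_stats(&mut self) {
        self.total_steps = 0;
        self.total_draft_tokens = 0;
        self.total_accepted_tokens = 0;
        // Clear the EWMA so the next observation re-seeds it. This sketch does not
        // touch the current lookahead; whether the real controller restores its
        // initial lookahead is not documented here.
        if let Some(ewma) = self.adaptive.as_mut() {
            ewma.value = None;
        }
    }
}

fn main() {
    let mut s = Stats {
        total_steps: 3,
        total_draft_tokens: 12,
        total_accepted_tokens: 9,
        adaptive: Some(Ewma { value: Some(0.75) }),
    };
    s.reset_stats();
    println!("{:?}", s);
}
```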
Auto Trait Implementations
impl<'a> Freeze for SpeculativeDecoder<'a>
impl<'a> RefUnwindSafe for SpeculativeDecoder<'a>
impl<'a> Send for SpeculativeDecoder<'a>
impl<'a> Sync for SpeculativeDecoder<'a>
impl<'a> Unpin for SpeculativeDecoder<'a>
impl<'a> UnsafeUnpin for SpeculativeDecoder<'a>
impl<'a> UnwindSafe for SpeculativeDecoder<'a>
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.