Speculative decoding engine.
SpeculativeDecoder composes a DraftModel and a TargetModel and
runs the Leviathan / Chen speculative generation loop:
    loop until max_tokens reached:
      1. draft.propose(prefix, k)     → k candidate tokens + distributions
      2. target.verify(prefix, draft) → k+1 target distributions
      3. for i in 0..k:
           if accept(p_draft_i, p_target_i, rng):
             append draft[i]
           else:
             append resample_from_adjusted_target(p_target_i, p_draft_i, rng)
             break
      4. if all k accepted:
           append sample_from_logprobs(p_target_k, rng)   (bonus; the (k+1)-th target distribution)
      5. update metrics; continue.

The number of tokens appended per round is therefore in 1..=k+1, and, crucially, the marginal distribution of each appended token is identical to p_target(prefix), the target distribution conditioned on the tokens that precede it. That correctness is what the empirical chi-square test in tests.rs validates against 10 000 samples.
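The acceptance and resampling rules in step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not this crate's implementation. The function names and signatures (`accept`, `adjusted_target`, `sample`, `speculative_round`) are hypothetical. Distributions are plain probability vectors rather than log-probabilities. Uniform draws are passed in explicitly instead of coming from an `rng`, so the example needs no external crate. The rule itself is the standard one from the speculative sampling papers: accept with probability min(1, p_target / p_draft), and on rejection resample from max(0, p_target - p_draft), renormalised.

```rust
/// Accept a drafted token with probability min(1, p_target / p_draft).
/// `u` is a uniform draw from [0, 1); `p_draft` is positive for any token
/// the draft model actually sampled.
fn accept(p_draft: f64, p_target: f64, u: f64) -> bool {
    u * p_draft < p_target
}

/// Residual distribution: max(0, p_target - p_draft), renormalised.
fn adjusted_target(p_target: &[f64], p_draft: &[f64]) -> Vec<f64> {
    let residual: Vec<f64> = p_target
        .iter()
        .zip(p_draft)
        .map(|(t, d)| (t - d).max(0.0))
        .collect();
    let total: f64 = residual.iter().sum();
    if total > 0.0 {
        residual.iter().map(|r| r / total).collect()
    } else {
        // Identical distributions: rejection has probability zero, so this
        // branch is only a guard against dividing by zero.
        p_target.to_vec()
    }
}

/// Inverse-CDF sampling of a token index from a probability vector.
fn sample(probs: &[f64], u: f64) -> usize {
    let mut cumulative = 0.0;
    for (i, p) in probs.iter().enumerate() {
        cumulative += p;
        if u < cumulative {
            return i;
        }
    }
    // Rounding can leave the total just below 1.0; fall back to the last
    // token with non-zero probability.
    probs.iter().rposition(|&p| p > 0.0).unwrap_or(0)
}

/// One round of steps 3 and 4: returns between 1 and k+1 tokens.
/// `p_draft` holds k distributions, `p_target` holds k+1.
fn speculative_round(
    draft_tokens: &[usize],
    p_draft: &[Vec<f64>],
    p_target: &[Vec<f64>],
    uniforms: &mut impl Iterator<Item = f64>,
) -> Vec<usize> {
    let k = draft_tokens.len();
    assert_eq!(p_draft.len(), k);
    assert_eq!(p_target.len(), k + 1);
    let mut next_u = || uniforms.next().expect("ran out of uniform draws");
    let mut out = Vec::with_capacity(k + 1);
    for (i, &tok) in draft_tokens.iter().enumerate() {
        if accept(p_draft[i][tok], p_target[i][tok], next_u()) {
            out.push(tok);
        } else {
            // First rejection: resample from the residual and end the round.
            let adjusted = adjusted_target(&p_target[i], &p_draft[i]);
            out.push(sample(&adjusted, next_u()));
            return out;
        }
    }
    // All k drafts accepted: take the bonus token from the last distribution.
    out.push(sample(&p_target[k], next_u()));
    out
}
```

As a sanity check on a two-token vocabulary, take p_draft = [0.5, 0.5] and p_target = [0.8, 0.2] with k = 1. Token 0 is drafted with probability 0.5 and always accepted. Token 1 is drafted with probability 0.5 and accepted with probability 0.2 / 0.5 = 0.4. The overall rejection probability is 0.3, and the residual distribution puts all of its mass on token 0. The first appended token is therefore 0 with probability 0.5 + 0.3 = 0.8 and 1 with probability 0.2, which matches p_target.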
Structs
- SpeculativeDecoder: Speculative decoder composing a draft model and a target model.
- SpeculativeDecoderConfig: Configuration for the speculative decoder.