VIL Speculative Decoding Proxy.
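Implements speculative decoding: a small, fast draft model proposes a run of candidate tokens, and a large target model then verifies all of them in a single forward pass. The longest prefix on which draft and target agree is accepted at once, and the target's own token replaces the first divergent draft token, so several tokens can be emitted per target call. This can yield 2-3x faster generation (the gain depends on how often the draft agrees with the target) without quality loss, because the target model checks every emitted token.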
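Why accepting a prefix pays off can be made concrete with a standard simplified model from the speculative decoding literature (not taken from this crate): if the draft proposes `k` tokens per round and each is accepted independently with probability `alpha`, the expected number of tokens emitted per target verification call is the geometric sum 1 + alpha + ... + alpha^k. The function name `expected_tokens_per_call` is hypothetical, and real acceptance is rarely independent; wall-clock speedup is also lower than this count because the draft model's own cost is ignored.

```rust
/// Expected tokens emitted per target-model verification call.
/// Assumption (hypothetical model, not this crate's metric): each of the `k`
/// drafted tokens is accepted independently with probability `alpha`, and
/// every round also emits one token chosen by the target itself.
fn expected_tokens_per_call(alpha: f64, k: u32) -> f64 {
    assert!((0.0..=1.0).contains(&alpha), "alpha must be in [0, 1]");
    if alpha == 1.0 {
        // All k draft tokens accepted, plus the target's bonus token.
        return (k + 1) as f64;
    }
    // 1 + alpha + ... + alpha^k = (1 - alpha^(k+1)) / (1 - alpha)
    (1.0 - alpha.powi(k as i32 + 1)) / (1.0 - alpha)
}

fn main() {
    for &alpha in &[0.5, 0.7, 0.9] {
        println!(
            "alpha = {alpha}: {:.2} tokens per target call (k = 4)",
            expected_tokens_per_call(alpha, 4)
        );
    }
}
```

Under this model, with k = 4, acceptance rates between 0.5 and 0.9 give roughly 1.9 to 4.1 tokens per target call, compared with exactly one token per call for plain decoding.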
§Architecture
    User Prompt
         |
         v
┌─────────────────┐     ┌──────────────┐
│  DraftProvider  │────>│   Verifier   │
│  (small model)  │     │ (target LLM) │
│  N tokens fast  │     │ 1-call verify│
└─────────────────┘     └──────┬───────┘
                               │
              accept prefix + correct divergence
                               │
                               v
                       SpeculativeResult

§Re-exports
pub use config::SpeculativeConfig;
pub use decoder::SpeculativeDecoder;
pub use decoder::SpeculativeError;
pub use decoder::SpeculativeResult;
pub use draft::DraftError;
pub use draft::DraftProvider;
pub use verifier::verify_draft;
pub use verifier::VerificationResult;
pub use plugin::SpeculativePlugin;
pub use semantic::SpeculativeEvent;
pub use semantic::SpeculativeFault;
pub use semantic::SpeculativeState;
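The "accept prefix + correct divergence" step from the architecture diagram can be sketched for greedy decoding as follows. This is a minimal illustration under stated assumptions: tokens are integer IDs, and one target forward pass yields the target's argmax prediction after each draft position plus one extra position. `accept_prefix` and `RoundOutcome` are hypothetical names and do not reproduce the signatures of the re-exported `verify_draft` or `VerificationResult`. With sampling instead of greedy decoding, the equality test is usually replaced by a probabilistic acceptance rule; this sketch covers only the greedy case.

```rust
/// Hypothetical outcome of one verification round (not this crate's type).
#[derive(Debug, PartialEq)]
struct RoundOutcome {
    /// Number of draft tokens the target agreed with.
    accepted: usize,
    /// Tokens to append: the accepted prefix plus one target-chosen token.
    emitted: Vec<u32>,
}

/// Greedy verification. `target_argmax[i]` is the target's most likely token
/// given the prompt plus `draft[..i]`; all entries come from one forward
/// pass, so there are `draft.len() + 1` of them.
fn accept_prefix(draft: &[u32], target_argmax: &[u32]) -> RoundOutcome {
    assert_eq!(
        target_argmax.len(),
        draft.len() + 1,
        "need one target prediction per draft position plus one"
    );
    // Length of the longest prefix on which draft and target agree.
    let accepted = draft
        .iter()
        .zip(target_argmax)
        .take_while(|(d, t)| d == t)
        .count();
    let mut emitted = draft[..accepted].to_vec();
    // At the first divergence (or after a fully accepted draft) use the
    // target's own prediction; predictions beyond this point are discarded
    // because they were conditioned on a rejected draft token.
    emitted.push(target_argmax[accepted]);
    RoundOutcome { accepted, emitted }
}

fn main() {
    let draft: [u32; 4] = [10, 11, 12, 13];
    let target: [u32; 5] = [10, 11, 99, 13, 14];
    let out = accept_prefix(&draft, &target);
    println!("{:?}", out);
}
```

Every round emits at least one token, so in the worst case generation proceeds at the target model's normal pace plus the cost of drafting, and in the best case a single target call yields `draft.len() + 1` tokens.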