Crate vil_speculative

VIL Speculative Decoding Proxy.

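Implements speculative decoding: a small, fast draft model proposes candidate tokens, and a large target model then verifies them in a single forward pass. Wherever the draft and target agree, several tokens are accepted at once; at the first disagreement, the target's own token replaces the draft's. Because the target model has the final say on every token, output quality is preserved, and generation can run 2-3x faster when the draft is accepted often.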
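
The acceptance rule described above can be sketched in a few lines. This is an illustration only, assuming greedy decoding and integer token ids; `accept_prefix` is a hypothetical helper and is not the crate's `verify_draft`, whose signature is not shown on this page.

```rust
/// Hypothetical helper (not part of `vil_speculative`).
///
/// `draft`  : tokens proposed by the draft model.
/// `target` : the target model's own choice at each of the same positions,
///            plus one extra token for the position after the draft
///            (so ideally `target.len() == draft.len() + 1`).
///
/// Returns how many draft tokens were accepted, and the tokens to emit.
fn accept_prefix(draft: &[u32], target: &[u32]) -> (usize, Vec<u32>) {
    // Count the leading positions where draft and target agree.
    let accepted = draft
        .iter()
        .zip(target)
        .take_while(|(d, t)| d == t)
        .count();

    // Keep the agreed prefix as-is.
    let mut emitted = draft[..accepted].to_vec();

    // `target[accepted]` is either the correction at the first divergence
    // or, if the whole draft matched, a "bonus" token from the target.
    if let Some(&next) = target.get(accepted) {
        emitted.push(next);
    }
    (accepted, emitted)
}
```

For a draft `[1, 2, 3, 4]` checked against target tokens `[1, 2, 9, 7, 5]`, the first two tokens are accepted and the target's `9` replaces the mismatching `3`, so three tokens are emitted from a single verification call.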

Architecture

User Prompt
    |
    v
┌─────────────────┐     ┌──────────────┐
│  DraftProvider  │────>│   Verifier   │
│  (small model)  │     │ (target LLM) │
│  N tokens fast  │     │ 1-call verify│
└─────────────────┘     └──────┬───────┘
                               │
             accept prefix + correct divergence
                               │
                               v
                      SpeculativeResult
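
The flow in the diagram can be sketched as a loop of draft, verify, and accept steps. Everything below is a toy under stated assumptions: the `Draft` and `Target` traits, the `Outcome` struct, and both toy models are hypothetical stand-ins, not the crate's `DraftProvider`, `SpeculativeDecoder`, or `SpeculativeResult`, whose real signatures are not shown on this page.

```rust
/// Hypothetical draft side: proposes `n` tokens following `ctx`.
trait Draft {
    fn propose(&self, ctx: &[u32], n: usize) -> Vec<u32>;
}

/// Hypothetical target side: in ONE call, returns the target's own token
/// for each draft position plus the position after the draft.
trait Target {
    fn verify(&self, ctx: &[u32], draft: &[u32]) -> Vec<u32>;
}

/// Assumed result shape for this sketch only.
#[derive(Debug, PartialEq)]
struct Outcome {
    tokens: Vec<u32>,
    target_calls: usize,
}

fn speculative_generate(
    draft: &impl Draft,
    target: &impl Target,
    prompt: &[u32],
    max_new: usize,
    window: usize,
) -> Outcome {
    let mut ctx = prompt.to_vec();
    let mut target_calls = 0;
    while ctx.len() - prompt.len() < max_new {
        let proposal = draft.propose(&ctx, window);
        let truth = target.verify(&ctx, &proposal);
        target_calls += 1;
        // Accept the longest prefix on which draft and target agree.
        let accepted = proposal.iter().zip(&truth).take_while(|(d, t)| d == t).count();
        ctx.extend_from_slice(&proposal[..accepted]);
        // Then take the target's token: a correction or a bonus token.
        match truth.get(accepted) {
            Some(&next) => ctx.push(next),
            None => break, // target returned nothing further; stop
        }
    }
    ctx.truncate(prompt.len() + max_new);
    Outcome { tokens: ctx[prompt.len()..].to_vec(), target_calls }
}

/// Toy target: the "right" next token is always previous + 1.
struct CountingTarget;
impl Target for CountingTarget {
    fn verify(&self, ctx: &[u32], draft: &[u32]) -> Vec<u32> {
        let mut prev = *ctx.last().unwrap_or(&0);
        let mut out = Vec::with_capacity(draft.len() + 1);
        for i in 0..=draft.len() {
            out.push(prev + 1);
            if let Some(&d) = draft.get(i) {
                prev = d; // condition the next position on the draft token
            }
        }
        out
    }
}

/// Toy draft: counts like the target but wrongly emits 0 for multiples of 5.
struct SloppyDraft;
impl Draft for SloppyDraft {
    fn propose(&self, ctx: &[u32], n: usize) -> Vec<u32> {
        let mut prev = *ctx.last().unwrap_or(&0);
        (0..n)
            .map(|_| {
                let next = prev + 1;
                let tok = if next % 5 == 0 { 0 } else { next };
                prev = tok;
                tok
            })
            .collect()
    }
}
```

With prompt `[0]`, `max_new = 8`, and `window = 3`, this toy setup emits tokens 1 through 8 using three target calls instead of eight. The second round shows the divergence path: the draft proposes `0` where the target wants `5`, so nothing from that draft is accepted and only the target's correction is kept.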

Re-exports

pub use config::SpeculativeConfig;
pub use decoder::SpeculativeDecoder;
pub use decoder::SpeculativeError;
pub use decoder::SpeculativeResult;
pub use draft::DraftError;
pub use draft::DraftProvider;
pub use verifier::verify_draft;
pub use verifier::VerificationResult;
pub use plugin::SpeculativePlugin;
pub use semantic::SpeculativeEvent;
pub use semantic::SpeculativeFault;
pub use semantic::SpeculativeState;

Modules

config
decoder
draft
handlers
pipeline_sse
plugin
semantic
Semantic types for speculative decoding operations.
verifier