§abyo-speculate
Pure-Rust speculative decoding (SD) library for local LLMs, optimized for batch size 1.
See the crate README and the project plan for design context.
§Quick example
use abyo_speculate::{SpeculateEngine, Method};
let mut engine = SpeculateEngine::builder()
    .target_model("llama-3.1-8b-instruct")
    .method(Method::Vanilla)
    .draft_model("tinyllama-1.1b")
    .build()?;
// engine.with_target(...).with_draft(...) attach loaded models;
// see model::qwen2::Qwen2Decoder for a concrete loader.
let _tokens = engine.generate_tokens(&[1u32, 2, 3], 64)?;
Re-exports§
pub use engine::GenerationOptions;
pub use engine::SpeculateEngine;
pub use engine::SpeculateEngineBuilder;
pub use error::Error;
pub use error::Result;
pub use methods::Method;
Modules§
- cache
- KV-cache primitives with rollback support.
- device
- Device selection helpers.
- engine
- SpeculateEngine: the public façade that ties model loading, the chosen SD method, and sampling together.
- error
- Crate error types.
- methods
- SD method implementations.
- model
- Model abstraction over candle decoders.
- presets
- Curated configurations for the four supported model families.
- sampling
- Sampling utilities: softmax, top-p, temperature, rejection sampling.
- tree
- Draft-tree primitives shared by Medusa, EAGLE, and any other tree-style speculative decoder.
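
The sampling module above lists rejection sampling, the step that lets a speculative decoder keep a drafted token only when the target model agrees often enough. The sketch below illustrates the standard accept/reject rule for a single drafted token; it is not this crate's API. `Verdict` and `verify_token` are hypothetical names introduced for illustration, and the uniform draw `u` is passed in by the caller so the example needs no external crates.

```rust
/// Outcome of checking one drafted token against the target model.
/// (Hypothetical type for illustration; not part of abyo-speculate.)
#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Keep the drafted token.
    Accept,
    /// Discard it and resample from this normalised residual distribution.
    Reject(Vec<f32>),
}

/// Speculative rejection test for a single drafted token.
///
/// * `p`     - target-model probabilities over the vocabulary
/// * `q`     - draft-model probabilities over the same vocabulary
/// * `token` - index of the token the draft model proposed
/// * `u`     - a uniform random draw in [0, 1), supplied by the caller
pub fn verify_token(p: &[f32], q: &[f32], token: usize, u: f32) -> Verdict {
    assert_eq!(p.len(), q.len(), "p and q must cover the same vocabulary");

    // Accept with probability min(1, p[token] / q[token]).
    // Written as `u * q < p` so no division is needed.
    if u * q[token] < p[token] {
        return Verdict::Accept;
    }

    // On rejection, the replacement token is drawn from max(0, p - q),
    // renormalised to sum to 1.
    let mut residual: Vec<f32> = p
        .iter()
        .zip(q)
        .map(|(&pi, &qi)| (pi - qi).max(0.0))
        .collect();
    let z: f32 = residual.iter().sum();
    if z > 0.0 {
        for r in &mut residual {
            *r /= z;
        }
    } else {
        // Degenerate case (p and q identical): fall back to the target distribution.
        residual = p.to_vec();
    }
    Verdict::Reject(residual)
}
```

With `p = [0.5, 0.5]` and `q = [0.9, 0.1]`, a drafted token 1 is always kept because the target model rates it higher than the draft model does, while a drafted token 0 is kept only when `u` falls below roughly 0.56. Resampling from the residual on rejection is what keeps the output distribution equal to the target model's own.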