Crate abyo_speculate

§abyo-speculate

A pure-Rust speculative decoding library for local LLMs, optimized for batch size 1.

See the crate README and the project plan for design context.

§Quick example

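use abyo_speculate::{Error, Method, SpeculateEngine};

// `?` needs an enclosing function that returns a Result; the crate's
// re-exported `Error` is assumed to be the error type of both calls below.
fn main() -> Result<(), Error> {
    let mut engine = SpeculateEngine::builder()
        .target_model("llama-3.1-8b-instruct")
        .method(Method::Vanilla)
        .draft_model("tinyllama-1.1b")
        .build()?;

    // engine.with_target(...).with_draft(...) attach loaded models;
    // see model::qwen2::Qwen2Decoder for a concrete loader.
    // Arguments: prompt token ids, then the number of tokens to generate.
    let _tokens = engine.generate_tokens(&[1u32, 2, 3], 64)?;
    Ok(())
}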
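
For orientation, the draft-then-verify step that vanilla speculative decoding is usually described with can be sketched using the standard library alone. This is a minimal illustration under greedy decoding, not the crate's implementation: `accept_greedy`, its parameters, and the token ids are hypothetical and do not come from the abyo_speculate API.

```rust
/// Decide which tokens to commit after one draft-then-verify step (greedy case).
///
/// `draft` holds the k tokens proposed by the draft model.
/// `target_argmax` holds k + 1 tokens: entry i is the target model's greedy
/// choice at the position where `draft[i]` was proposed, and the last entry is
/// the target's choice after all k draft tokens.
///
/// Hypothetical helper for illustration; not part of abyo_speculate.
fn accept_greedy(draft: &[u32], target_argmax: &[u32]) -> Vec<u32> {
    assert_eq!(target_argmax.len(), draft.len() + 1);

    // Length of the longest prefix on which draft and target agree.
    let matched = draft
        .iter()
        .zip(target_argmax)
        .take_while(|(d, t)| d == t)
        .count();

    // Commit the agreed prefix, then one token from the target itself:
    // the correction at the first mismatch, or a bonus token if all matched.
    let mut committed = draft[..matched].to_vec();
    committed.push(target_argmax[matched]);
    committed
}
```

With draft tokens `[5, 6, 7]` and target choices `[5, 6, 9, 4]`, the call returns `[5, 6, 9]`: two draft tokens are accepted and the third is replaced by the target's own choice. Every step therefore commits at least one token, so output under this scheme matches what the target model would have produced on its own.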

Re-exports§

pub use engine::GenerationOptions;
pub use engine::SpeculateEngine;
pub use engine::SpeculateEngineBuilder;
pub use error::Error;
pub use error::Result;
pub use methods::Method;

Modules§

cache: KV-cache primitives with rollback support.
device: Device selection helpers.
engine: SpeculateEngine, the public façade that ties together model loading, the chosen speculative decoding (SD) method, and sampling.
error: Crate error types.
methods: SD method implementations.
model: Model abstraction over candle decoders.
presets: Curated configurations for the four supported model families.
sampling: Sampling utilities: softmax, top-p, temperature, rejection sampling.
tree: Draft-tree primitives shared by Medusa, EAGLE, and any other tree-style speculative decoder.
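
The sampling entry above names softmax, temperature, and rejection sampling. As a rough sketch of how those pieces fit together in speculative decoding, the following standard-library code shows a temperature-scaled softmax, the usual acceptance probability min(1, p/q) for a drafted token, and the residual distribution sampled from after a rejection. All function names and signatures here are assumptions made for illustration; the actual items in the sampling module may look different.

```rust
/// Temperature-scaled softmax over raw logits (assumes `temperature > 0`).
/// Hypothetical helper; not the crate's API.
fn softmax(logits: &[f32], temperature: f32) -> Vec<f32> {
    // Subtract the maximum before exponentiating for numerical stability.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Probability of accepting a drafted token, given the target probability `p`
/// and the draft probability `q` of that token (assumes `q > 0`, which holds
/// for any token actually sampled from the draft distribution).
fn accept_probability(p: f32, q: f32) -> f32 {
    (p / q).min(1.0)
}

/// Distribution to sample from after a rejection: max(0, p - q), renormalized.
/// Falls back to `p` when the two distributions coincide and nothing is left.
fn residual(p: &[f32], q: &[f32]) -> Vec<f32> {
    let diff: Vec<f32> = p
        .iter()
        .zip(q)
        .map(|(&pi, &qi)| (pi - qi).max(0.0))
        .collect();
    let sum: f32 = diff.iter().sum();
    if sum > 0.0 {
        diff.iter().map(|d| d / sum).collect()
    } else {
        p.to_vec()
    }
}
```

A caller would draw a uniform random number in [0, 1), accept the drafted token when it falls below `accept_probability(p, q)`, and otherwise sample a replacement from `residual(p, q)`. Random number generation is left out so the sketch stays dependency-free.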
