Crate abyo_speculate

abyo-speculate

A pure-Rust speculative decoding library for local LLMs, optimized for batch size 1.

See the crate README and the project plan for design context.
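
Speculative decoding pairs a small draft model with the large target model. The draft proposes several tokens cheaply, and the target checks them, keeping the longest prefix it agrees with plus one token of its own. The sketch below shows the general technique in the greedy (temperature 0) case under toy assumptions. It is not this crate's implementation: `draft` and `target` are hypothetical stand-in functions rather than real models, and `speculative_greedy` and `greedy` are illustrative names, not part of the abyo_speculate API. A real engine would score all drafted positions in one batched target forward pass and roll back the KV cache for rejected tokens. The toy calls the target once per position instead. Because every kept token is the target's own greedy choice, the output matches plain greedy decoding with the target.

```rust
/// Hypothetical toy "model": maps a context to its greedy next token.
type Model = fn(&[u32]) -> u32;

/// Hypothetical target: next token depends only on the last token.
fn target(ctx: &[u32]) -> u32 {
    let last = *ctx.last().unwrap_or(&0);
    (last * 3 + 1) % 17
}

/// Hypothetical draft: agrees with the target except after multiples of 5.
fn draft(ctx: &[u32]) -> u32 {
    let last = *ctx.last().unwrap_or(&0);
    if last % 5 == 0 { (last + 2) % 17 } else { (last * 3 + 1) % 17 }
}

/// Greedy speculative decoding: draft `k` tokens, keep the verified prefix,
/// then append one target token (a correction, or a bonus if all matched).
fn speculative_greedy(prompt: &[u32], max_new: usize, k: usize, draft: Model, target: Model) -> Vec<u32> {
    let mut seq = prompt.to_vec();
    let end = prompt.len() + max_new;
    while seq.len() < end {
        // 1. Draft k tokens autoregressively with the cheap model.
        let mut proposal = seq.clone();
        for _ in 0..k {
            let t = draft(&proposal);
            proposal.push(t);
        }
        // 2. Verify each drafted position against the target's greedy choice.
        let base = seq.len();
        let mut accepted = 0;
        while accepted < k && proposal[base + accepted] == target(&proposal[..base + accepted]) {
            accepted += 1;
        }
        // 3. Keep the accepted prefix, then add the target's own next token.
        seq.extend_from_slice(&proposal[base..base + accepted]);
        let next = target(&seq);
        seq.push(next);
    }
    // The last round may overshoot by up to k tokens.
    seq.truncate(end);
    seq
}

/// Reference: plain greedy decoding with a single model.
fn greedy(prompt: &[u32], max_new: usize, model: Model) -> Vec<u32> {
    let mut seq = prompt.to_vec();
    for _ in 0..max_new {
        let t = model(&seq);
        seq.push(t);
    }
    seq
}

fn main() {
    let prompt = [1u32, 2, 3];
    let fast = speculative_greedy(&prompt, 20, 4, draft, target);
    assert_eq!(fast, greedy(&prompt, 20, target));
    println!("{:?}", &fast[prompt.len()..]);
}
```

Each loop iteration emits between 1 and k + 1 tokens, so the speedup depends on how often the draft agrees with the target.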

Quick example

use abyo_speculate::{Method, Result, SpeculateEngine};

fn main() -> Result<()> {
    // Configure the target model, the SD method, and the draft model.
    let mut engine = SpeculateEngine::builder()
        .target_model("llama-3.1-8b-instruct")
        .method(Method::Vanilla)
        .draft_model("tinyllama-1.1b")
        .build()?;

    // Attach loaded models with engine.with_target(...).with_draft(...);
    // see model::qwen2::Qwen2Decoder for a concrete loader.
    let _tokens = engine.generate_tokens(&[1u32, 2, 3], 64)?;
    Ok(())
}

Re-exports

pub use engine::GenerationOptions;
pub use engine::SpeculateEngine;
pub use engine::SpeculateEngineBuilder;
pub use error::Error;
pub use error::Result;
pub use methods::Method;

Modules

cache
KV-cache primitives with rollback support.
device
Device selection helpers.
engine
SpeculateEngine — the public façade that ties model loading, the chosen SD method, and sampling together.
error
Crate error types.
methods
Speculative decoding (SD) method implementations.
model
Model abstraction over candle decoders.
presets
Curated configurations for the four supported model families.
sampling
Sampling utilities: softmax, top-p, temperature, rejection sampling.
tree
Draft-tree primitives shared by Medusa, EAGLE, and any other tree-style speculative decoder.
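
The sampling module lists rejection sampling. In sampled (non-greedy) speculative decoding, the standard acceptance rule keeps a drafted token x with probability min(1, p(x)/q(x)), where p is the target distribution and q the draft distribution. On rejection, it resamples from the normalized residual max(0, p - q). This makes the emitted token distributed exactly as p. The sketch below illustrates that rule only; whether `abyo_speculate::sampling` implements it in this form is an assumption. `XorShift`, `sample`, `verify_token`, and `empirical_output` are hypothetical names, and the tiny xorshift generator stands in for a real RNG so the example needs no external crates.

```rust
/// Minimal xorshift64 PRNG (illustration only, not cryptographic).
struct XorShift(u64);

impl XorShift {
    /// Uniform f64 in [0, 1).
    fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Sample an index from non-negative (possibly unnormalized) weights.
fn sample(weights: &[f64], rng: &mut XorShift) -> usize {
    let total: f64 = weights.iter().sum();
    let mut u = rng.next_f64() * total;
    for (i, &w) in weights.iter().enumerate() {
        if u < w {
            return i;
        }
        u -= w;
    }
    // Floating-point round-off guard: fall back to the last positive weight.
    weights.iter().rposition(|&w| w > 0.0).unwrap_or(0)
}

/// Accept drafted token `x` (sampled from `q`) with probability min(1, p(x)/q(x));
/// otherwise resample from max(0, p - q). Returns (token, accepted).
fn verify_token(p: &[f64], q: &[f64], x: usize, rng: &mut XorShift) -> (usize, bool) {
    if rng.next_f64() < (p[x] / q[x]).min(1.0) {
        return (x, true);
    }
    let residual: Vec<f64> = p.iter().zip(q).map(|(pi, qi)| (pi - qi).max(0.0)).collect();
    (sample(&residual, rng), false)
}

/// Empirical distribution of emitted tokens over many draft-then-verify trials.
fn empirical_output(p: &[f64], q: &[f64], trials: usize, seed: u64) -> Vec<f64> {
    let mut rng = XorShift(seed);
    let mut counts = vec![0usize; p.len()];
    for _ in 0..trials {
        let x = sample(q, &mut rng); // the draft proposes x ~ q
        let (tok, _) = verify_token(p, q, x, &mut rng);
        counts[tok] += 1;
    }
    counts.iter().map(|&c| c as f64 / trials as f64).collect()
}

fn main() {
    let p = [0.5, 0.3, 0.2]; // target distribution
    let q = [0.2, 0.3, 0.5]; // draft distribution
    let freq = empirical_output(&p, &q, 200_000, 0x9E37_79B9_7F4A_7C15);
    println!("{freq:?}"); // should be close to p, not q
}
```

The residual is guaranteed non-empty on rejection: a rejection implies p(x) < q(x), so p and q differ, and some other token must have p > q.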