Safe wrapper around the C++ MTP draft session.
MtpSession pairs a target LlamaContext with an MTP draft
LlamaContext (built with
crate::context::params::LlamaContextType::Mtp) and drives the
multi-token-prediction speculative-decoding loop introduced in upstream
llama.cpp PR #22673.
The actual draft algorithm lives in upstream’s
common/speculative.cpp (common_speculative_state_draft_mtp); this
module is a thin, safe Rust wrapper around a small C++ shim in
llama-cpp-sys-4/mtp_shim/ that re-exposes that C++ class with C linkage.
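The draft algorithm itself stays upstream, but the caller still has to decide how many drafted tokens to keep before reporting that count back to the session. The following is a minimal sketch of that verification step under greedy sampling, not part of this crate: `count_accepted` is a hypothetical helper, tokens are assumed to be plain `i32` ids, and `target_picks[i]` is assumed to be the token the target model chose at the position where `drafts[i]` was proposed.

```rust
/// Hypothetical helper (not part of this crate): count how many leading
/// draft tokens match what the target model picked at the same positions.
///
/// Assumes greedy verification: a draft token is accepted only if it is
/// identical to the target's choice, and the first mismatch rejects that
/// token and everything after it.
fn count_accepted(drafts: &[i32], target_picks: &[i32]) -> usize {
    drafts
        .iter()
        .zip(target_picks)
        // Stop at the first position where draft and target disagree.
        .take_while(|(draft, target)| draft == target)
        .count()
}
```

With drafts `[11, 42, 7]` and target picks `[11, 42, 9]`, the helper returns 2: the first two drafts are kept and the third is discarded. The resulting count is the kind of value that would be passed on as the number of accepted tokens.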
§Usage outline
// Build the target context (default) and the MTP draft context.
let target = model.new_context(&backend, LlamaContextParams::default())?;
let draft = model.new_context(
    &backend,
    LlamaContextParams::default()
        .with_ctx_type(LlamaContextType::Mtp)
        .with_n_rs_seq(4),
)?;
let mut sess = MtpSession::new(&target, &draft, 1, 3)?;
// After every llama_decode on the target context, hand the batch to MTP:
sess.process(&target_batch)?;
// Then ask for a draft starting from the last sampled token:
let drafts = sess.draft(0, n_past, last_token)?;
// Verify against target, decide how many to accept, then:
sess.accept(0, n_accepted as u16)?;
Structs§
- MtpSession
- Owned MTP draft session.
Enums§
- MtpSessionError
- Errors raised by the MTP draft session.