Module mtp

Safe wrapper around the C++ MTP draft session.

MtpSession pairs a target LlamaContext with an MTP draft LlamaContext (built with crate::context::params::LlamaContextType::Mtp) and drives the multi-token-prediction speculative-decoding loop introduced in upstream llama.cpp PR #22673.

The draft algorithm itself lives in upstream’s common/speculative.cpp (common_speculative_state_draft_mtp). This module is a thin, safe Rust wrapper around a small C++ shim in llama-cpp-sys-4/mtp_shim/ that re-exposes that C++ class with C linkage.

§Usage outline

// Build the target context (default) and the MTP draft context.
let target = model.new_context(&backend, LlamaContextParams::default())?;
let draft  = model.new_context(
    &backend,
    LlamaContextParams::default()
        .with_ctx_type(LlamaContextType::Mtp)
        .with_n_rs_seq(4),
)?;

let mut sess = MtpSession::new(&target, &draft, 1, 3)?;

// After every llama_decode on the target context, hand the batch to MTP:
sess.process(&target_batch)?;

// Then ask for a draft starting from the last sampled token:
let drafts = sess.draft(0, n_past, last_token)?;

// Verify against target, decide how many to accept, then:
sess.accept(0, n_accepted as u16)?;
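
The outline above leaves the "decide how many to accept" step to the caller. One common choice is greedy verification: accept draft tokens until the first one that differs from what the target model sampled at the same position. The sketch below illustrates that rule only. `count_accepted` is a hypothetical helper, not part of this crate, and tokens are modeled as plain `i32` values (an assumption) so the example stays self-contained.

```rust
/// Hypothetical helper (not part of this crate): count how many leading
/// draft tokens agree with the tokens the target model sampled at the
/// same positions. Comparison stops at the first mismatch or when either
/// slice runs out.
fn count_accepted(drafts: &[i32], target_tokens: &[i32]) -> usize {
    drafts
        .iter()
        .zip(target_tokens.iter())
        .take_while(|(draft, target)| draft == target)
        .count()
}
```

The result can then be passed to `sess.accept` with the same cast used in the outline (`n_accepted as u16`). Other acceptance rules, such as probabilistic acceptance under sampling, would replace the equality check.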

Structs§

MtpSession
Owned MTP draft session.

Enums§

MtpSessionError
Errors raised by the MTP draft session.