pub struct MtpSession { /* private fields */ }
Owned MTP draft session.
Frees the underlying mtp_session * (and the C++ common_speculative *
it holds) when dropped.
§Lifetime contract (manual)
The session holds raw pointers to both the target and draft
LlamaContexts. The caller must keep both contexts alive (i.e. not
drop them) for as long as the session exists; dropping either context
first leaves the session holding dangling pointers.
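The compiler cannot enforce this contract because the contexts are held only as raw pointers. A minimal sketch of one way to make it compile-time checked in your own code is to wrap the raw pointers together with a PhantomData borrow of the contexts. Ctx, RawSession, and ScopedSession are hypothetical stand-ins for illustration, not types from this crate.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `LlamaContext`.
pub struct Ctx {
    pub n_ctx: u32,
}

/// Hypothetical stand-in for the FFI session: like `MtpSession`, it stores
/// raw pointers and carries no lifetime of its own.
pub struct RawSession {
    target: *const Ctx,
    draft: *const Ctx,
}

/// Hypothetical wrapper: the `'a` borrow keeps both contexts alive (and
/// immovable) for as long as the wrapper exists.
pub struct ScopedSession<'a> {
    raw: RawSession,
    _ctx: PhantomData<&'a Ctx>,
}

impl<'a> ScopedSession<'a> {
    pub fn new(target: &'a Ctx, draft: &'a Ctx) -> Self {
        ScopedSession {
            raw: RawSession {
                target: target as *const Ctx,
                draft: draft as *const Ctx,
            },
            _ctx: PhantomData,
        }
    }

    /// Dereferencing is sound here: `'a` guarantees the pointee outlives `self`.
    pub fn target_n_ctx(&self) -> u32 {
        unsafe { (*self.raw.target).n_ctx }
    }

    pub fn draft_n_ctx(&self) -> u32 {
        unsafe { (*self.raw.draft).n_ctx }
    }
}
```

With this wrapper, calling drop(target) while the ScopedSession is still in use is rejected by the borrow checker as a move out of a borrowed value, instead of compiling into a dangling pointer.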
Implementations
impl MtpSession
pub fn new(
target: &LlamaContext<'_>,
draft: &LlamaContext<'_>,
n_seq: u32,
n_draft_max: i32,
) -> Result<Self, MtpSessionError>
Construct an MTP draft session with upstream defaults for n_min and
p_min.
Equivalent to Self::new_with_config(target, draft, MtpSessionConfig::new(n_seq, n_draft_max)).
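The pattern of a two-argument config constructor plus builder-style overrides can be sketched as follows. SketchConfig is a hypothetical mirror of MtpSessionConfig: the field types of n_min and p_min are assumed, and None stands for "keep the upstream default" so that no default values are invented here.

```rust
/// Hypothetical mirror of `MtpSessionConfig`. `None` means "use the
/// upstream default"; the field types of `n_min` and `p_min` are assumed.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SketchConfig {
    pub n_seq: u32,
    pub n_draft_max: i32,
    pub n_min: Option<i32>,
    pub p_min: Option<f32>,
}

impl SketchConfig {
    /// Only the required parameters; everything else stays at its default.
    pub fn new(n_seq: u32, n_draft_max: i32) -> Self {
        SketchConfig { n_seq, n_draft_max, n_min: None, p_min: None }
    }

    /// Builder-style override: consumes `self` and returns the updated copy.
    pub fn with_p_min(mut self, p_min: f32) -> Self {
        self.p_min = Some(p_min);
        self
    }
}
```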
§Examples
let mut session = MtpSession::new(&target, &draft, 1, 3)?;
§Errors
Returns MtpSessionError::Init or MtpSessionError::InvalidConfig.
pub fn new_with_config(
target: &LlamaContext<'_>,
draft: &LlamaContext<'_>,
config: MtpSessionConfig,
) -> Result<Self, MtpSessionError>
Construct an MTP draft session with full speculative draft parameters.
target must be a LlamaContextType::Default context.
draft must be a LlamaContextType::Mtp context from the same model,
created with LlamaContextParams::with_n_rs_seq set to at least
config.n_draft_max.
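Because a draft context with too few sequences violates the requirement above, it can be worth checking it before calling new_with_config. The helper below is hypothetical, not part of the crate; the u32 type for n_rs_seq and the rejection of n_draft_max <= 0 are assumptions, and the crate's own MtpSessionError::InvalidConfig may cover different cases.

```rust
/// Hypothetical error type for the precondition check (not the crate's).
#[derive(Debug, PartialEq)]
pub enum PreconditionError {
    /// Assumed invalid: a session that drafts zero or fewer tokens.
    NonPositiveDraftMax(i32),
    /// The draft context cannot hold `n_draft_max` sequences.
    RsSeqTooSmall { n_rs_seq: u32, n_draft_max: i32 },
}

/// Check the documented requirement `n_rs_seq >= n_draft_max`.
pub fn check_draft_capacity(n_rs_seq: u32, n_draft_max: i32) -> Result<(), PreconditionError> {
    if n_draft_max <= 0 {
        return Err(PreconditionError::NonPositiveDraftMax(n_draft_max));
    }
    // n_draft_max > 0 here, so the cast to u32 cannot wrap.
    if n_rs_seq < n_draft_max as u32 {
        return Err(PreconditionError::RsSeqTooSmall { n_rs_seq, n_draft_max });
    }
    Ok(())
}
```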
§Examples
let config = MtpSessionConfig::new(1, 1)
    .with_p_min(0.0); // match upstream default after #23269
let session = MtpSession::new_with_config(&target, &draft, config)?;
§Errors
Returns MtpSessionError::Init or MtpSessionError::InvalidConfig.
pub fn config(&self) -> MtpSessionConfig
Session configuration passed at construction.
pub fn need_embd(&self) -> bool
True when the speculative backend needs post-norm embeddings on the
target context (llama_set_embeddings).
MTP returns false; use Self::need_embd_pre_norm for MTP.
pub fn need_embd_pre_norm(&self) -> bool
True when the speculative backend needs pre-norm hidden states on the
target context (llama_set_embeddings_pre_norm).
MTP returns true. Upstream configures this on both contexts during session init; callers normally do not need to set it manually.
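Code that is generic over speculative backends can translate the two flags into the target-context setting it must enable. This is a minimal sketch with a hypothetical EmbdRequirement enum; per the two descriptions above, MTP lands in the pre-norm case.

```rust
/// Hypothetical summary of which hidden states the target must expose.
#[derive(Debug, PartialEq)]
pub enum EmbdRequirement {
    Neither,
    PostNorm, // would correspond to llama_set_embeddings
    PreNorm,  // would correspond to llama_set_embeddings_pre_norm
    Both,
}

/// Map the session's `need_embd()` / `need_embd_pre_norm()` results.
pub fn embd_requirement(need_embd: bool, need_embd_pre_norm: bool) -> EmbdRequirement {
    match (need_embd, need_embd_pre_norm) {
        (false, false) => EmbdRequirement::Neither,
        (true, false) => EmbdRequirement::PostNorm,
        (false, true) => EmbdRequirement::PreNorm,
        (true, true) => EmbdRequirement::Both,
    }
}
```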
pub fn n_draft_max(&self) -> i32
Configured maximum number of tokens drafted per Self::draft call.
pub fn print_stats(&self)
Log speculative-decoding statistics (draft/accept counts and timings) via
llama.cpp LOG_INF. Install a log callback with crate::log_set to
capture output.
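To keep the statistics rather than only printing them, a log callback can append each message to a buffer. The sketch shows only a thread-safe buffer and the callback body; how crate::log_set delivers messages to your callback is not shown in this documentation, so capture_line and drain_captured are hypothetical helpers you would call from inside that callback.

```rust
use std::sync::{Mutex, OnceLock};

/// Process-wide buffer for captured log lines (hypothetical helper).
static CAPTURED: OnceLock<Mutex<Vec<String>>> = OnceLock::new();

fn buffer() -> &'static Mutex<Vec<String>> {
    CAPTURED.get_or_init(|| Mutex::new(Vec::new()))
}

/// Callback body: store one message, dropping any trailing newline.
pub fn capture_line(line: &str) {
    buffer().lock().unwrap().push(line.trim_end().to_owned());
}

/// Take everything captured so far, e.g. right after `print_stats()`,
/// leaving the buffer empty.
pub fn drain_captured() -> Vec<String> {
    std::mem::take(&mut *buffer().lock().unwrap())
}
```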
§Examples
// After your generation loop:
session.print_stats();
pub fn begin(
&mut self,
seq_id: i32,
prompt: &[LlamaToken],
) -> Result<(), MtpSessionError>
Optional: call once at the start of a fresh generation with the prompt tokens that were just decoded into the target context.
Upstream uses this for prompt tracking; MTP speculative loops often
work without it if you call Self::process after every target decode.
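One way to avoid forgetting the Self::process call is to pair every target decode with it. This is a sketch under stated assumptions: Target and Session are hypothetical traits standing in for the target context's decode and MtpSession::process, batches are plain token slices rather than LlamaBatch, and errors are ignored.

```rust
/// Token id; `LlamaToken` wraps an integer (assumed `i32` here).
pub type Token = i32;

/// Hypothetical stand-in for decoding a batch on the target context.
pub trait Target {
    fn decode(&mut self, batch: &[Token]);
}

/// Hypothetical stand-in for `MtpSession::process`.
pub trait Session {
    fn process(&mut self, batch: &[Token]);
}

/// Decode on the target, then feed the same batch to the session so the
/// draft side never falls out of sync with the target.
pub fn decode_and_sync<T: Target, S: Session>(target: &mut T, session: &mut S, batch: &[Token]) {
    target.decode(batch);
    session.process(batch);
}
```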
§Examples
session.begin(0, &prompt_tokens)?;
pub fn process(&mut self, batch: &LlamaBatch) -> Result<(), MtpSessionError>
pub fn draft(
&mut self,
seq_id: i32,
n_past: i32,
id_last: LlamaToken,
) -> Result<Vec<LlamaToken>, MtpSessionError>
Generate up to n_draft_max speculative tokens.
n_past is the number of tokens already in the target KV cache for
seq_id. id_last is the last token accepted on the target (usually
the token you just sampled).
§Examples
let drafts = session.draft(0, n_past, last_token)?;
for token in &drafts {
    // verify each drafted token against the target logits ...
}
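The verification step left as a comment in the example can be done greedily: accept the longest prefix of drafts that matches the target's argmax choices, then take the target's own token at the first mismatch. This sketch covers greedy verification only (sampling-based acceptance uses a different rule) and assumes the target decoded the batch [id_last, drafts...] starting at position n_past; Token, Verified, verify_greedy, and advance_n_past are hypothetical names.

```rust
/// Token id; `LlamaToken` wraps an integer (assumed `i32` here).
pub type Token = i32;

/// Outcome of checking one round of drafts against the target.
#[derive(Debug, PartialEq)]
pub struct Verified {
    /// Length of the drafted prefix the target agreed with.
    pub n_accepted: usize,
    /// Target token at the first disagreement, or the extra "bonus"
    /// token when every draft was accepted.
    pub next: Token,
}

/// `target_argmax[i]` is the target's greedy choice after batch position
/// `i` of `[id_last, drafts[0], ..]`, hence one more entry than `drafts`.
pub fn verify_greedy(drafts: &[Token], target_argmax: &[Token]) -> Verified {
    assert_eq!(target_argmax.len(), drafts.len() + 1, "one target prediction per batch position");
    let n_accepted = drafts
        .iter()
        .zip(target_argmax)
        .take_while(|(d, t)| d == t)
        .count();
    Verified { n_accepted, next: target_argmax[n_accepted] }
}

/// `n_past` for the next `draft` call: `id_last` and the accepted drafts are
/// now in the KV cache, while `next` is not (it becomes the new `id_last`).
/// Rejected draft positions must still be removed from the target KV cache.
pub fn advance_n_past(n_past: i32, v: &Verified) -> i32 {
    n_past + 1 + v.n_accepted as i32
}
```

After each round, pass v.next as id_last and advance_n_past(n_past, &v) as n_past to the next draft call.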