
Struct MtpSession 

pub struct MtpSession { /* private fields */ }

Owned MTP draft session.

Dropping the session frees the underlying mtp_session * and the C++ common_speculative * it holds.

§Lifetime contract (manual)

The session holds raw pointers to both the target and draft LlamaContexts. The caller must keep both contexts alive (i.e. not drop them) for as long as the session exists.
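Because the contract is manual, one way to satisfy it within a single scope is to lean on Rust's drop order: local variables are dropped in reverse declaration order, so a session declared after both contexts is destroyed before them. The sketch below demonstrates that ordering with a stand-in `Tracked` type and a `drop_order` helper. Both are hypothetical and exist only for illustration; the sketch does not touch the real LlamaContext or MtpSession.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Stand-in value (hypothetical) that records its name when dropped.
struct Tracked {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Declares the two "contexts" first and the "session" last, then returns
/// the order in which the three values were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _target = Tracked { name: "target", log: Rc::clone(&log) };
        let _draft = Tracked { name: "draft", log: Rc::clone(&log) };
        let _session = Tracked { name: "session", log: Rc::clone(&log) };
    } // locals are dropped here, in reverse declaration order
    let order = log.borrow().clone();
    order
}
```

Here the session stand-in is dropped first and the contexts afterwards. If the three values are stored in one struct instead, fields are dropped in declaration order, so the session field would need to be declared before the context fields.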

Implementations§

impl MtpSession

pub fn new( target: &LlamaContext<'_>, draft: &LlamaContext<'_>, n_seq: u32, n_draft_max: i32, ) -> Result<Self, MtpSessionError>

Construct an MTP draft session with upstream defaults for n_min and p_min.

Equivalent to Self::new_with_config(target, draft, MtpSessionConfig::new(n_seq, n_draft_max)).

§Examples
let mut session = MtpSession::new(&target, &draft, 1, 3)?;
§Errors

Returns MtpSessionError::Init or MtpSessionError::InvalidConfig.

pub fn new_with_config( target: &LlamaContext<'_>, draft: &LlamaContext<'_>, config: MtpSessionConfig, ) -> Result<Self, MtpSessionError>

Construct an MTP draft session with full speculative draft parameters.

target must be a LlamaContextType::Default context. draft must be a LlamaContextType::Mtp context created from the same model, with LlamaContextParams::with_n_rs_seq set to at least config.n_draft_max.
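The capacity requirement on the draft context can be checked before construction so that a mismatch produces a descriptive message. The following is a minimal sketch under stated assumptions: `check_draft_capacity` is a hypothetical helper, not part of this crate, and the `u32` type for the n_rs_seq value and the rejection of negative n_draft_max are assumptions made for illustration.

```rust
use std::convert::TryFrom;

/// Hypothetical pre-flight check (not part of this crate): verifies that the
/// value passed to `with_n_rs_seq` on the draft context is at least
/// `n_draft_max`. The `u32` type for `n_rs_seq` is an assumption.
fn check_draft_capacity(n_rs_seq: u32, n_draft_max: i32) -> Result<(), String> {
    // Assumption: a negative n_draft_max is treated as invalid here.
    let needed = u32::try_from(n_draft_max)
        .map_err(|_| format!("n_draft_max must be non-negative, got {n_draft_max}"))?;
    if n_rs_seq < needed {
        return Err(format!(
            "draft context n_rs_seq ({n_rs_seq}) is below n_draft_max ({needed})"
        ));
    }
    Ok(())
}
```

A caller could run this with the same numbers it passes to the context parameters and to MtpSessionConfig, and only then call new_with_config.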

§Examples
let config = MtpSessionConfig::new(1, 1)
    .with_p_min(0.0); // match upstream default after #23269
let session = MtpSession::new_with_config(&target, &draft, config)?;
§Errors

Returns MtpSessionError::Init or MtpSessionError::InvalidConfig.

pub fn config(&self) -> MtpSessionConfig

Returns the session configuration that was passed at construction.

pub fn need_embd(&self) -> bool

True when the speculative backend needs post-norm embeddings on the target context (llama_set_embeddings).

MTP returns false; use Self::need_embd_pre_norm for MTP.

pub fn need_embd_pre_norm(&self) -> bool

True when the speculative backend needs pre-norm hidden states on the target context (llama_set_embeddings_pre_norm).

MTP returns true. Upstream configures this on both contexts during session init; callers normally do not need to set it manually.

pub fn n_draft_max(&self) -> i32

Configured maximum number of tokens drafted per draft call.

pub fn n_min(&self) -> i32

Configured minimum draft tokens (n_min).

pub fn p_min(&self) -> f32

Configured draft probability floor (p_min).

pub fn n_seq(&self) -> u32

Configured number of sequences.

pub fn print_stats(&self)

Log speculative-decoding statistics (draft/accept counts and timings) via llama.cpp LOG_INF. Install a log callback with crate::log_set to capture output.

§Examples
// After your generation loop:
session.print_stats();

pub fn begin( &mut self, seq_id: i32, prompt: &[LlamaToken], ) -> Result<(), MtpSessionError>

Optional: call once at the start of a fresh generation with the prompt tokens that were just decoded into the target context.

Upstream uses this for prompt tracking; MTP speculative loops often work without it if you call Self::process after every target decode.

§Examples
session.begin(0, &prompt_tokens)?;

pub fn process(&mut self, batch: &LlamaBatch) -> Result<(), MtpSessionError>

Hand the session a batch that was just decoded on the target context.

Call this after every successful target.decode(batch) so upstream can sync draft recurrent state with the target KV cache.

§Examples
target.decode(&mut batch)?;
session.process(&batch)?;

pub fn draft( &mut self, seq_id: i32, n_past: i32, id_last: LlamaToken, ) -> Result<Vec<LlamaToken>, MtpSessionError>

Generate up to n_draft_max speculative tokens.

n_past is the number of tokens already in the target KV cache for seq_id. id_last is the last token accepted on the target (usually the token you just sampled).

§Examples
let drafts = session.draft(0, n_past, last_token)?;
for draft in &drafts {
    // verify each draft against target logits ...
}

pub fn accept( &mut self, seq_id: i32, n_accepted: u16, ) -> Result<(), MtpSessionError>

Inform the session how many draft tokens the target verifier accepted.

Pass 0 when every draft was rejected. Upstream rolls back draft recurrent state accordingly.

§Examples
session.accept(0, n_accepted)?;
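
One common way to obtain n_accepted is greedy verification: compare each draft token with the token the target itself picked at that position and count the matching prefix. The sketch below illustrates that counting step only. It uses a plain `i32` as a stand-in for LlamaToken, and `verify_greedy` and `Verified` are hypothetical names, not part of this crate; how the target picks are sampled from logits is outside its scope.

```rust
use std::convert::TryFrom;

/// Stand-in for a token id; the real crate uses `LlamaToken`.
type Token = i32;

/// Outcome of one verification step (hypothetical type).
#[derive(Debug, PartialEq)]
struct Verified {
    /// Number of leading draft tokens the target agreed with.
    n_accepted: u16,
    /// The target's own pick right after the accepted prefix.
    next_token: Token,
}

/// Greedy verification sketch. `target_picks[i]` is the token the target
/// chose at the position where `drafts[i]` was proposed; the final entry is
/// the target's pick after the last draft, so the slice must hold exactly
/// `drafts.len() + 1` tokens. Returns `None` on a length mismatch or if the
/// count does not fit in `u16`.
fn verify_greedy(drafts: &[Token], target_picks: &[Token]) -> Option<Verified> {
    if target_picks.len() != drafts.len() + 1 {
        return None;
    }
    // Length of the longest prefix on which draft and target agree.
    let n = drafts
        .iter()
        .zip(target_picks)
        .take_while(|(d, t)| d == t)
        .count();
    let n_accepted = u16::try_from(n).ok()?;
    Some(Verified { n_accepted, next_token: target_picks[n] })
}
```

The resulting n_accepted is the value to pass to accept; it is 0 when the first draft token already disagrees with the target.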

Trait Implementations§

impl Debug for MtpSession

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Drop for MtpSession

fn drop(&mut self)

Executes the destructor for this type.

fn pin_drop(self: Pin<&mut Self>)

🔬This is a nightly-only experimental API. (pin_ergonomics)
Executes the destructor for this type; unlike Drop::drop, it requires self to be pinned.

impl Send for MtpSession

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.