Struct MtpSession 

pub struct MtpSession { /* private fields */ }

Owned MTP draft session.

Frees the underlying mtp_session* (and the C++ common_speculative* it holds) when dropped.

§Lifetime contract (manual)

The session holds raw pointers to both the target and draft LlamaContexts. The caller must keep both contexts alive (that is, not drop them) for as long as the session exists. The borrow checker does not enforce this contract: the session holds no Rust borrows of the contexts, because both contexts must remain individually mutable. You will be calling target.decode(...) while the session exists, and the session also mutates the draft context internally.

Dropping a context that the session still references is undefined behaviour at the C++ level (use-after-free inside common_speculative_*).
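One way to make the contract harder to break is to rely on Rust's drop order. The sketch below uses hypothetical stand-in types (Tracked, SpeculativeBundle), not the real LlamaContext or MtpSession, to show that struct fields are dropped in declaration order, so a session field declared first is freed before the contexts it points to.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared log used only to make the drop order observable in this sketch.
type DropLog = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical stand-in for a context or session; records its name when dropped.
struct Tracked {
    name: &'static str,
    log: DropLog,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Bundles the three values so they cannot be dropped in the wrong order.
/// Rust drops struct fields in declaration order, so `session` goes first.
#[allow(dead_code)]
struct SpeculativeBundle {
    session: Tracked, // stand-in for MtpSession
    draft: Tracked,   // stand-in for the draft LlamaContext
    target: Tracked,  // stand-in for the target LlamaContext
}

/// Builds a bundle, drops it, and returns the order in which fields were dropped.
fn observed_drop_order() -> Vec<&'static str> {
    let log: DropLog = Rc::new(RefCell::new(Vec::new()));
    let bundle = SpeculativeBundle {
        // Initialisation order does not matter; declaration order does.
        target: Tracked { name: "target", log: Rc::clone(&log) },
        draft: Tracked { name: "draft", log: Rc::clone(&log) },
        session: Tracked { name: "session", log: Rc::clone(&log) },
    };
    drop(bundle);
    let order = log.borrow().clone();
    order
}
```

With plain local variables the rule is reversed: locals are dropped in reverse declaration order, so declaring the session after both contexts has the same effect. A real bundle would also need to carry the lifetime parameter of LlamaContext<'_>.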

Implementations§

impl MtpSession

pub fn new( target: &LlamaContext<'_>, draft: &LlamaContext<'_>, n_seq: u32, n_draft_max: i32, ) -> Result<Self, MtpSessionError>

Construct an MTP draft session.

target must be a LlamaContextType::Default context. draft must be a LlamaContextType::Mtp context built from the same model and configured via with_n_rs_seq with a value of at least n_draft_max.

n_seq is the number of concurrent sequences (1 for a single conversation). n_draft_max caps the number of tokens drafted per round (commonly 3 for Qwen3.6 MTP).

§Errors

Returns MtpSessionError::Init if upstream rejects the configuration (e.g. the model has no MTP heads).
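The constraints above can be checked before any context is built. The following is a minimal sketch of such a pre-flight check; check_mtp_config and ConfigProblem are hypothetical helpers, not part of this crate, and the u32 type for n_rs_seq is an assumption. Only the n_rs_seq >= n_draft_max rule comes from the documentation; the other two checks are assumptions about what a sensible configuration looks like.

```rust
/// Reasons this hypothetical pre-flight check can fail.
#[derive(Debug, PartialEq, Eq)]
enum ConfigProblem {
    /// Assumption: a session with zero sequences is never useful.
    NoSequences,
    /// Assumption: a draft cap below 1 is never useful.
    DraftMaxTooSmall,
    /// Documented: the draft context needs n_rs_seq >= n_draft_max.
    RsSeqTooSmall,
}

/// Validates the numeric arguments intended for `MtpSession::new`.
/// `n_rs_seq` is the value the caller plans to pass to `with_n_rs_seq`.
fn check_mtp_config(n_seq: u32, n_draft_max: i32, n_rs_seq: u32) -> Result<(), ConfigProblem> {
    if n_seq == 0 {
        return Err(ConfigProblem::NoSequences);
    }
    if n_draft_max < 1 {
        return Err(ConfigProblem::DraftMaxTooSmall);
    }
    // Compare in i64 so the mixed signed/unsigned comparison cannot overflow.
    if i64::from(n_rs_seq) < i64::from(n_draft_max) {
        return Err(ConfigProblem::RsSeqTooSmall);
    }
    Ok(())
}
```

A check like this cannot detect a model without MTP heads; that case is still reported by MtpSessionError::Init.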

pub fn need_embd(&self) -> bool

Returns true if the session requires embeddings to be extractable from the target context. For MTP this is always true; the method is exposed for symmetry with upstream’s common_speculative_need_embd.

pub fn n_draft_max(&self) -> i32

Configured maximum number of tokens drafted per draft call.

pub fn n_seq(&self) -> u32

Configured number of sequences.

pub fn begin( &mut self, seq_id: i32, prompt: &[LlamaToken], ) -> Result<(), MtpSessionError>

Optional: call once at the start of a fresh generation with the prompt tokens that were just decoded into the target context.

pub fn process(&mut self, batch: &LlamaBatch) -> Result<(), MtpSessionError>

Hand the session a batch that was just decoded on the target context. MTP needs to see every target batch (prompt prefill + each verification step) to keep its per-sequence pre-norm-embedding carryover in sync.

§Errors

Returns MtpSessionError::Process if upstream rejects the batch (most often: the batch carries embd directly rather than tokens).
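Because the session must see every target batch, it can help to route all target decodes through one helper. This is a sketch under stated assumptions: Batch, TargetDecode, SessionProcess and the counting mocks are hypothetical stand-ins that only mirror the shape of decode and process, and String replaces the real error types.

```rust
/// Hypothetical stand-in for `LlamaBatch`.
struct Batch {
    tokens: Vec<i32>,
}

/// Hypothetical stand-in for the target context's decode call.
trait TargetDecode {
    fn decode(&mut self, batch: &Batch) -> Result<(), String>;
}

/// Hypothetical stand-in mirroring the shape of `MtpSession::process`.
trait SessionProcess {
    fn process(&mut self, batch: &Batch) -> Result<(), String>;
}

/// Decodes on the target, then hands the same batch to the session.
/// Using this for prefill and for every verification step keeps the two in sync.
fn decode_then_process<T, S>(target: &mut T, session: &mut S, batch: &Batch) -> Result<(), String>
where
    T: TargetDecode,
    S: SessionProcess,
{
    target.decode(batch)?;
    session.process(batch)
}

/// Mock target that counts how many tokens it has decoded.
#[derive(Default)]
struct CountingTarget {
    decoded_tokens: usize,
}

impl TargetDecode for CountingTarget {
    fn decode(&mut self, batch: &Batch) -> Result<(), String> {
        self.decoded_tokens += batch.tokens.len();
        Ok(())
    }
}

/// Mock session that counts how many tokens it has been shown.
#[derive(Default)]
struct CountingSession {
    seen_tokens: usize,
}

impl SessionProcess for CountingSession {
    fn process(&mut self, batch: &Batch) -> Result<(), String> {
        self.seen_tokens += batch.tokens.len();
        Ok(())
    }
}
```

If the target decode fails, the helper returns early and the session is not shown the batch, so the two counters never diverge.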

pub fn draft( &mut self, seq_id: i32, n_past: i32, id_last: LlamaToken, ) -> Result<Vec<LlamaToken>, MtpSessionError>

Generate up to n_draft_max speculative tokens for sequence seq_id, starting from id_last at position n_past.

Returns an owned Vec<LlamaToken> of length <= n_draft_max.

§Errors

Returns MtpSessionError::BadSeqId if seq_id is outside the configured n_seq range.
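Since seq_id is an i32 while n_seq is a u32, a caller-side range check has to bridge signed and unsigned values. The helper below is a hypothetical sketch, not part of this crate, and it assumes the valid range is 0..n_seq (the documentation only says "outside the configured n_seq range").

```rust
/// Returns true if `seq_id` falls in the assumed valid range `0..n_seq`.
fn seq_id_in_range(seq_id: i32, n_seq: u32) -> bool {
    // Widen both sides to i64 so neither negative ids nor large n_seq values wrap.
    seq_id >= 0 && i64::from(seq_id) < i64::from(n_seq)
}
```

Checking up front gives the caller a chance to report a bad id in its own terms before reaching the session.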

pub fn accept( &mut self, seq_id: i32, n_accepted: u16, ) -> Result<(), MtpSessionError>

Inform the session that n_accepted tokens from the last draft were accepted by the target verifier. This is required after every draft call to keep the draft context’s recurrent state consistent.
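A single draft, verify, accept round can be sketched as follows. Everything here is illustrative: DraftSession is a hypothetical trait that mirrors only the shapes of draft and accept, Token stands in for LlamaToken, String replaces the real error type, and the verify callback stands in for decoding [id_last, draft...] on the target and greedily sampling one token per position. The sketch assumes greedy verification, and it leaves out the process call on the verification batch, which would belong inside the callback.

```rust
/// Stand-in for `LlamaToken`.
type Token = i32;

/// Hypothetical trait mirroring the shapes of `MtpSession::draft` and `MtpSession::accept`.
trait DraftSession {
    fn draft(&mut self, seq_id: i32, n_past: i32, id_last: Token) -> Result<Vec<Token>, String>;
    fn accept(&mut self, seq_id: i32, n_accepted: u16) -> Result<(), String>;
}

/// Number of leading draft tokens that match what the target sampled.
fn count_accepted(draft: &[Token], sampled: &[Token]) -> u16 {
    let n = draft.iter().zip(sampled).take_while(|(d, t)| d == t).count();
    n.min(u16::MAX as usize) as u16
}

/// Runs one round and returns the tokens to emit: the accepted draft prefix
/// plus the target's own token at the first mismatch (or after a full match).
fn speculative_round<S, V>(
    session: &mut S,
    seq_id: i32,
    n_past: i32,
    id_last: Token,
    mut verify: V,
) -> Result<Vec<Token>, String>
where
    S: DraftSession,
    V: FnMut(Token, &[Token]) -> Vec<Token>,
{
    let draft = session.draft(seq_id, n_past, id_last)?;
    // The target yields one sampled token per input position: draft.len() + 1 in total.
    let sampled = verify(id_last, &draft);
    if sampled.len() != draft.len() + 1 {
        return Err(format!("expected {} sampled tokens, got {}", draft.len() + 1, sampled.len()));
    }
    let n_accepted = count_accepted(&draft, &sampled);
    // Report the result after every draft call, even when nothing was accepted.
    session.accept(seq_id, n_accepted)?;
    Ok(sampled[..=usize::from(n_accepted)].to_vec())
}

/// Mock session that always proposes the same draft and records accept calls.
struct FixedDraft {
    tokens: Vec<Token>,
    accepted: Vec<u16>,
}

impl DraftSession for FixedDraft {
    fn draft(&mut self, _seq_id: i32, _n_past: i32, _id_last: Token) -> Result<Vec<Token>, String> {
        Ok(self.tokens.clone())
    }

    fn accept(&mut self, _seq_id: i32, n_accepted: u16) -> Result<(), String> {
        self.accepted.push(n_accepted);
        Ok(())
    }
}
```

Each round emits at least one token, because the target's own sample at the first mismatch is always kept. The caller would then advance n_past by the number of emitted tokens and use the last emitted token as the next id_last.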

Trait Implementations§

impl Debug for MtpSession

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Drop for MtpSession

fn drop(&mut self)

Executes the destructor for this type.

fn pin_drop(self: Pin<&mut Self>)

🔬This is a nightly-only experimental API. (pin_ergonomics)
Executes the destructor for this type; unlike Drop::drop, it requires self to be pinned.

impl Send for MtpSession
