pub struct Eagle3Session { /* private fields */ }
Owned EAGLE-3 draft session.
The underlying speculative context is freed when the session is dropped.
§Lifetime contract (manual)
The session holds raw pointers to both the target and draft
LlamaContexts. The caller must keep both contexts alive (i.e. not
drop them) for as long as the session exists.
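One way to honour this contract is to lean on Rust's drop order: local variables are dropped in reverse declaration order, so a session declared after both contexts is freed before them. The sketch below uses stand-in types to make that order observable. `MockContext`, `MockSession`, `DropLog`, and `drop_order` are hypothetical names for illustration and are not part of this crate.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared log used only to make drop order observable in this sketch.
type DropLog = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical stand-in for a LlamaContext.
struct MockContext {
    name: &'static str,
    log: DropLog,
}

impl Drop for MockContext {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Hypothetical stand-in for Eagle3Session. Like the documented session, it
// carries no borrow of the contexts, so the compiler cannot enforce ordering.
struct MockSession {
    log: DropLog,
}

impl Drop for MockSession {
    fn drop(&mut self) {
        self.log.borrow_mut().push("session");
    }
}

// Declares the contexts first and the session last, then returns the
// order in which the three values were dropped.
fn drop_order() -> Vec<&'static str> {
    let log: DropLog = Rc::new(RefCell::new(Vec::new()));
    {
        let _target = MockContext { name: "target", log: Rc::clone(&log) };
        let _draft = MockContext { name: "draft", log: Rc::clone(&log) };
        let _session = MockSession { log: Rc::clone(&log) };
        // Scope ends here: _session is dropped, then _draft, then _target.
    }
    let order = log.borrow().clone();
    order
}
```

Storing the session in a struct next to the contexts needs the opposite care: struct fields are dropped in declaration order, so the session field would have to come first.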
Implementations§
impl Eagle3Session
pub fn new( target: &LlamaContext<'_>, draft: &LlamaContext<'_>, n_seq: u32, n_draft_max: i32, ) -> Result<Self, Eagle3SessionError>
Construct an EAGLE-3 draft session with upstream defaults for n_min
and p_min.
Equivalent to new_with_config(target, draft, Eagle3SessionConfig::new(n_seq, n_draft_max)).
§Errors
Returns Eagle3SessionError::Init or Eagle3SessionError::InvalidConfig.
pub fn new_with_config( target: &LlamaContext<'_>, draft: &LlamaContext<'_>, config: Eagle3SessionConfig, ) -> Result<Self, Eagle3SessionError>
Construct an EAGLE-3 draft session with full speculative draft parameters.
target must be a LlamaContextType::Default context over the main model.
draft must be a Default context over a separate EAGLE-3 draft model
trained against that target.
§Errors
Returns Eagle3SessionError::Init (e.g. the draft model is not a
valid EAGLE-3 model) or Eagle3SessionError::InvalidConfig.
pub fn config(&self) -> Eagle3SessionConfig
Session configuration passed at construction.
pub fn need_embd(&self) -> bool
True when the speculative backend needs post-norm embeddings on the
target context (llama_set_embeddings).
pub fn need_embd_pre_norm(&self) -> bool
True when the speculative backend needs pre-norm hidden states on the
target context (llama_set_embeddings_pre_norm).
Configured automatically during session init; callers normally do not need to set it manually.
pub fn n_draft_max(&self) -> i32
Configured maximum number of tokens drafted per draft call.
pub fn print_stats(&self)
Log speculative-decoding statistics (draft/accept counts and timings)
via llama.cpp LOG_INF. Install a log callback with crate::log_set
to capture output.
pub fn begin( &mut self, seq_id: i32, prompt: &[LlamaToken], ) -> Result<(), Eagle3SessionError>
Optional: call once at the start of a fresh generation with the prompt tokens that were just decoded into the target context.
§Errors
Returns Eagle3SessionError::BadSeqId if seq_id is out of range.
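Callers that manage several sequences may prefer to validate ids before reaching the session. The following is a minimal sketch that assumes valid ids are 0..n_seq, where n_seq is the value passed at construction; the page does not spell out the exact range, so treat that bound as an assumption. `SeqIdOutOfRange` and `check_seq_id` are hypothetical names, not crate items.

```rust
// Hypothetical error type for this sketch; the crate itself reports
// Eagle3SessionError::BadSeqId.
#[derive(Debug, PartialEq)]
struct SeqIdOutOfRange {
    seq_id: i32,
    n_seq: u32,
}

// Assumption: valid sequence ids are 0..n_seq. Returns the id as an index
// when it is in range, and an error describing the mismatch otherwise.
fn check_seq_id(seq_id: i32, n_seq: u32) -> Result<usize, SeqIdOutOfRange> {
    if seq_id >= 0 && (seq_id as u32) < n_seq {
        Ok(seq_id as usize)
    } else {
        Err(SeqIdOutOfRange { seq_id, n_seq })
    }
}
```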
pub fn process(&mut self, batch: &LlamaBatch) -> Result<(), Eagle3SessionError>
Hand the session a batch that was just decoded on the target context.
Call this after every successful target.decode(batch) so upstream can
harvest the target hidden states EAGLE-3 drafts from.
§Errors
Returns Eagle3SessionError::Process if the underlying call fails.
pub fn draft( &mut self, seq_id: i32, n_past: i32, id_last: LlamaToken, ) -> Result<Vec<LlamaToken>, Eagle3SessionError>
Generate up to n_draft_max speculative tokens.
n_past is the number of tokens already in the target KV cache for
seq_id. id_last is the last token accepted on the target (usually
the token you just sampled).
§Errors
Returns Eagle3SessionError::BadSeqId if seq_id is out of range.
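How n_past and id_last evolve between draft calls can be sketched as pure bookkeeping. This assumes the common verification layout in which the target decodes id_last followed by the drafted tokens in a single batch, so id_last is not yet in the KV cache when draft is called. `DecodeState` and `advance` are hypothetical names, and tokens are plain i32 stand-ins for LlamaToken.

```rust
// Hypothetical per-sequence bookkeeping; not part of this crate.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DecodeState {
    // Tokens already in the target KV cache for the sequence.
    n_past: i32,
    // Last accepted token; assumed not yet in the KV cache.
    id_last: i32,
}

// Advance the state after one verification round.
//
// `n_accepted` is how many drafted tokens the target agreed with and
// `sampled` is the token the target produced after the accepted prefix.
// The KV cache gains the old `id_last` plus the accepted drafts.
fn advance(state: DecodeState, n_accepted: u16, sampled: i32) -> DecodeState {
    DecodeState {
        n_past: state.n_past + 1 + i32::from(n_accepted),
        id_last: sampled,
    }
}
```

With n_past = 10 and three accepted drafts, the next draft call would use n_past = 14. Under the same assumption, any rejected draft positions would also need to be evicted from the target KV cache before the next round; that step is outside this sketch.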
pub fn accept( &mut self, seq_id: i32, n_accepted: u16, ) -> Result<(), Eagle3SessionError>
Inform the session how many draft tokens the target verifier accepted.
Pass 0 when every draft was rejected.
§Errors
Returns Eagle3SessionError::BadSeqId if seq_id is out of range.
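The draft / verify / accept cycle described above can be sketched end to end with toy stand-ins. Everything here is hypothetical: the `Drafter` trait only mirrors the documented call order (seq_id, error handling, and process are left out), `target_next` plays the role of a greedy target model, and `MockDrafter` is wrong on purpose at some positions. Greedy prefix matching is one common verification rule, assumed here for simplicity.

```rust
// Stand-in for the documented draft/accept calls on a single sequence.
trait Drafter {
    fn draft(&mut self, n_past: i32, id_last: i32) -> Vec<i32>;
    fn accept(&mut self, n_accepted: u16);
}

// Toy "target model": the greedy next token depends only on the previous one.
fn target_next(prev: i32) -> i32 {
    (prev * 3 + 1) % 17
}

// Toy drafter that agrees with the target except at every third position.
struct MockDrafter {
    n_draft_max: usize,
    accepted_total: u32,
}

impl Drafter for MockDrafter {
    fn draft(&mut self, n_past: i32, id_last: i32) -> Vec<i32> {
        let mut out = Vec::with_capacity(self.n_draft_max);
        let mut prev = id_last;
        for i in 0..self.n_draft_max {
            let pos = n_past + 1 + i as i32;
            // Deliberately wrong at positions divisible by 3.
            let tok = if pos % 3 == 0 { -1 } else { target_next(prev) };
            out.push(tok);
            prev = tok;
        }
        out
    }

    fn accept(&mut self, n_accepted: u16) {
        self.accepted_total += u32::from(n_accepted);
    }
}

// Greedy verification: count the drafted prefix the target agrees with and
// return it together with the token the target emits after that prefix.
fn verify(id_last: i32, drafts: &[i32]) -> (u16, i32) {
    let mut prev = id_last;
    let mut n_accepted = 0u16;
    for &d in drafts {
        if target_next(prev) != d {
            break;
        }
        prev = d;
        n_accepted += 1;
    }
    (n_accepted, target_next(prev))
}

// Produce `n_new` tokens starting with `first`, drafting and verifying in rounds.
fn generate<D: Drafter>(drafter: &mut D, n_prompt: i32, first: i32, n_new: usize) -> Vec<i32> {
    let mut n_past = n_prompt;
    let mut id_last = first;
    let mut out = vec![first];
    while out.len() < n_new {
        let drafts = drafter.draft(n_past, id_last);
        // Real code would decode id_last plus the drafts on the target here,
        // then hand that batch to the session via process.
        let (n_accepted, sampled) = verify(id_last, &drafts);
        drafter.accept(n_accepted); // 0 when every draft was rejected
        out.extend_from_slice(&drafts[..usize::from(n_accepted)]);
        out.push(sampled);
        n_past += 1 + i32::from(n_accepted);
        id_last = sampled;
    }
    out.truncate(n_new);
    out
}
```

Because every emitted token is either confirmed or produced by the target, the output equals what plain greedy decoding on the target would give; the drafter only changes how many rounds are needed.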