Safe wrapper around the C++ EAGLE-3 draft session.
Eagle3Session drives EAGLE-3 speculative decoding
(COMMON_SPECULATIVE_TYPE_DRAFT_EAGLE3 in upstream llama.cpp). EAGLE-3
pairs a target model with a small, separately trained EAGLE-3 draft
model that predicts the next tokens from hidden states extracted from
the target model.
The draft algorithm lives in upstream’s common/speculative.cpp
(common_speculative_impl_draft_eagle3). This module wraps it through the
same stable C shim used for MTP (llama-cpp-sys-4/mtp_shim/); the two
techniques share an identical session lifecycle and differ only in how the
draft context is built.
§EAGLE-3 vs MTP
| | EAGLE-3 (Eagle3Session) | MTP (crate::mtp::MtpSession) |
|---|---|---|
| Draft weights | a separate EAGLE-3 draft model | the same model as the target |
| Draft context type | LlamaContextType::Default | LlamaContextType::Mtp |
| Requirement | draft model must expose 3 target-extract layers | target model must have MTP heads |
§Setup
use llama_cpp_4::context::params::LlamaContextParams;
use llama_cpp_4::eagle::{Eagle3Session, Eagle3SessionConfig};
let n_draft_max = 3;
// Target: the main model, a normal (default) context.
let target = main_model.new_context(&backend, LlamaContextParams::default())?;
// Draft: a SEPARATE EAGLE-3 draft model, also a default context.
let draft = eagle3_model.new_context(&backend, LlamaContextParams::default())?;
let config = Eagle3SessionConfig::new(1, n_draft_max);
let mut session = Eagle3Session::new_with_config(&target, &draft, config)?;
§Speculative loop
Identical in shape to MTP: after each decode on the target context, call
process, then draft to get candidate tokens, verify them on the target,
and report how many were accepted with accept.
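The verification step is left to the caller. As a minimal sketch, assuming greedy verification (a draft token is kept only while it matches the token the target itself would pick at that position), the accepted count can be computed as below. The helper name `count_accepted` and the use of plain `i32` token ids are illustrative assumptions and are not part of this crate's API.

```rust
/// Hypothetical helper (not part of llama_cpp_4): count how many leading
/// draft tokens agree with the target model's own choices.
///
/// `target_choices[i]` is assumed to be the token the target selects at the
/// position where `drafts[i]` was proposed. Counting stops at the first
/// mismatch, because every later draft was conditioned on a rejected token.
fn count_accepted(drafts: &[i32], target_choices: &[i32]) -> usize {
    drafts
        .iter()
        .zip(target_choices)
        .take_while(|(draft, target)| draft == target)
        .count()
}
```

For drafts `[11, 42, 7]` and target choices `[11, 42, 9]` this returns 2, which is the kind of value the loop reports back as `n_accepted`.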
target.decode(&mut batch)?;
session.process(&batch)?;
let drafts = session.draft(0, n_past, last_token)?;
// verify `drafts` against the target, count acceptances ...
session.accept(0, n_accepted)?;
§Hidden-state extraction
EAGLE-3 needs the target model to expose internal hidden states. The
session configures the required extraction on both contexts at construction
time; need_embd and need_embd_pre_norm report which kind the active
backend requested (rarely needed by callers).
Structs§
- Eagle3Session: Owned EAGLE-3 draft session.
- Eagle3SessionConfig: Parameters for Eagle3Session::new_with_config.
Enums§
- Eagle3SessionError: Errors raised by the EAGLE-3 draft session.