pub struct SpeculativeDecoder<D: DraftModel, T: TargetModel> { /* private fields */ }
Speculative decoder composing a draft model and a target model.
The Debug bound is intentionally deferred to the Debug impl rather than
baked into the struct definition, so callers can hold a
SpeculativeDecoder whose inner models do not implement Debug. The
DraftModel and TargetModel bounds remain on the struct itself.
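Keeping a Debug bound on the impl rather than on the struct can be sketched in isolation. Pair and Opaque below are hypothetical names used only for illustration; they are not part of this crate, and the real Debug output of SpeculativeDecoder is not shown on this page.

```rust
use std::fmt;

// Hypothetical stand-in for a generic wrapper; no Debug bound on the struct.
struct Pair<A, B> {
    a: A,
    b: B,
}

// The Debug bound lives only on this impl, so Pair<A, B> is still usable
// when A or B does not implement Debug; it just cannot be formatted.
impl<A: fmt::Debug, B: fmt::Debug> fmt::Debug for Pair<A, B> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Pair")
            .field("a", &self.a)
            .field("b", &self.b)
            .finish()
    }
}

// A type that deliberately does not implement Debug.
struct Opaque;

fn main() {
    // Compiles: constructing the value requires no Debug impl.
    let _held = Pair { a: Opaque, b: 1u8 };

    // Formatting works once both type parameters implement Debug.
    let p = Pair { a: 1u8, b: "x" };
    println!("{:?}", p);
}
```

Had the bound been written on the struct definition, the first binding in main would fail to compile.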
Implementations
impl<D: DraftModel, T: TargetModel> SpeculativeDecoder<D, T>
pub fn new(
    draft: D,
    target: T,
    config: SpeculativeDecoderConfig,
) -> SpeculativeDecodingResult<Self>
Build a decoder. Returns an error if the draft and target disagree on vocabulary size, or if the config is invalid.
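The two failure cases named above can be sketched as a standalone check. Everything here is an assumption made for illustration: BuildError, ToyConfig, its lookahead field, and the rule that a zero lookahead is invalid are hypothetical, since the real error type and the fields of SpeculativeDecoderConfig are not shown on this page.

```rust
// Hypothetical error type; not the crate's real error enum.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum BuildError {
    VocabMismatch { draft: usize, target: usize },
    InvalidConfig(&'static str),
}

// Hypothetical config with a single assumed field.
struct ToyConfig {
    lookahead: usize,
}

// Reject mismatched vocabularies first, then an (assumed) invalid config.
fn validate(draft_vocab: usize, target_vocab: usize, cfg: &ToyConfig) -> Result<(), BuildError> {
    if draft_vocab != target_vocab {
        return Err(BuildError::VocabMismatch {
            draft: draft_vocab,
            target: target_vocab,
        });
    }
    if cfg.lookahead == 0 {
        return Err(BuildError::InvalidConfig("lookahead must be at least 1"));
    }
    Ok(())
}

fn main() {
    let cfg = ToyConfig { lookahead: 4 };
    println!("{:?}", validate(32_000, 32_000, &cfg));
    println!("{:?}", validate(32_000, 50_257, &cfg));
}
```

Checking at construction time means later decoding steps can compare draft and target probabilities index by index without re-validating the vocabulary.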
pub fn metrics(&self) -> &SpeculativeMetrics
Read-only access to the current metrics snapshot.
pub fn reset_metrics(&mut self)
Reset the metrics counters.
pub fn config(&self) -> &SpeculativeDecoderConfig
Read-only access to the configuration.
pub fn generate(
    &mut self,
    prefix: &[TokenId],
    max_tokens: usize,
) -> SpeculativeDecodingResult<Vec<TokenId>>
Run speculative decoding starting from prefix and producing at most
max_tokens new tokens. The returned vector contains only the
generated continuation, not the original prefix.
Uses an internally-seeded deterministic StdRng (seed 42).
See Self::generate_with_rng for caller-controlled seeding.
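This page does not spell out how drafted tokens are verified. One common formulation of speculative sampling accepts a drafted token with probability min(1, p/q), where q is the draft probability and p the target probability, and otherwise resamples from the residual max(0, p - q). The sketch below shows that rule with a fixed seed to illustrate why seeded runs are reproducible. It is not taken from this crate: SplitMix64, sample, verify, and run are hypothetical helpers, and the small generator stands in for StdRng only so that the example needs no external dependencies.

```rust
/// Minimal SplitMix64 generator (stand-in RNG, not the crate's StdRng).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Sample an index from non-negative weights (normalization not required).
fn sample(weights: &[f64], rng: &mut SplitMix64) -> usize {
    let total: f64 = weights.iter().sum();
    let mut u = rng.next_f64() * total;
    for (i, w) in weights.iter().enumerate() {
        if u < *w {
            return i;
        }
        u -= *w;
    }
    weights.len() - 1
}

/// Verify one drafted token. Returns the emitted token and whether the
/// draft was accepted. `q` is the draft distribution, `p` the target.
fn verify(drafted: usize, q: &[f64], p: &[f64], rng: &mut SplitMix64) -> (usize, bool) {
    let ratio = (p[drafted] / q[drafted]).min(1.0);
    if rng.next_f64() < ratio {
        return (drafted, true);
    }
    // Rejected: resample from the residual distribution max(0, p - q).
    let residual: Vec<f64> = p.iter().zip(q).map(|(pt, qd)| (pt - qd).max(0.0)).collect();
    (sample(&residual, rng), false)
}

/// Draft and verify `steps` tokens from fixed toy distributions.
fn run(seed: u64, steps: usize) -> Vec<(usize, bool)> {
    let q = [0.6, 0.3, 0.1];
    let p = [0.2, 0.5, 0.3];
    let mut rng = SplitMix64(seed);
    (0..steps)
        .map(|_| {
            let drafted = sample(&q, &mut rng);
            verify(drafted, &q, &p, &mut rng)
        })
        .collect()
}

fn main() {
    // Same seed, same sequence of accept/reject decisions.
    let first = run(42, 8);
    assert_eq!(first, run(42, 8));
    println!("{first:?}");
}
```

Because every random draw comes from one seeded generator, two runs with the same seed and inputs make identical accept and reject decisions.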
pub fn generate_with_rng(
    &mut self,
    prefix: &[TokenId],
    max_tokens: usize,
    rng: &mut dyn SpecRng,
) -> SpeculativeDecodingResult<Vec<TokenId>>
Run speculative decoding with a caller-supplied RNG.
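Taking the RNG as a trait object lets a caller substitute any source of randomness, including a scripted one for tests. The sketch below illustrates that pattern only: UniformSource, Scripted, and count_accepts are hypothetical, because the methods of SpecRng are not shown on this page.

```rust
/// Hypothetical object-safe RNG trait (stand-in; not SpecRng's real API).
trait UniformSource {
    /// Next value, assumed to lie in [0, 1).
    fn next_unit(&mut self) -> f64;
}

/// Replays a fixed script of values, cycling when it runs out.
struct Scripted {
    values: Vec<f64>,
    pos: usize,
}

impl UniformSource for Scripted {
    fn next_unit(&mut self) -> f64 {
        let v = self.values[self.pos % self.values.len()];
        self.pos += 1;
        v
    }
}

/// Counts how many of `n` draws fall below `threshold`, using whatever
/// RNG the caller supplies through the trait object.
fn count_accepts(n: usize, threshold: f64, rng: &mut dyn UniformSource) -> usize {
    (0..n).filter(|_| rng.next_unit() < threshold).count()
}

fn main() {
    let mut rng = Scripted {
        values: vec![0.1, 0.9, 0.4],
        pos: 0,
    };
    let accepted = count_accepts(6, 0.5, &mut rng);
    println!("accepted {accepted} of 6");
}
```

A scripted source like this makes it possible to force a particular accept or reject path in a unit test without depending on a specific seed.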
Trait Implementations
impl<D: DraftModel + Debug, T: TargetModel + Debug> Debug for SpeculativeDecoder<D, T>
Auto Trait Implementations
impl<D, T> Freeze for SpeculativeDecoder<D, T>
impl<D, T> RefUnwindSafe for SpeculativeDecoder<D, T>
where
    D: RefUnwindSafe,
    T: RefUnwindSafe,
impl<D, T> Send for SpeculativeDecoder<D, T>
impl<D, T> Sync for SpeculativeDecoder<D, T>
impl<D, T> Unpin for SpeculativeDecoder<D, T>
impl<D, T> UnsafeUnpin for SpeculativeDecoder<D, T>
where
    D: UnsafeUnpin,
    T: UnsafeUnpin,
impl<D, T> UnwindSafe for SpeculativeDecoder<D, T>
where
    D: UnwindSafe,
    T: UnwindSafe,
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.