pub enum DecodeOutput {
Fused(Tensor),
Dequantized(DequantResult),
}
Result of a decode step.
The cache implementation decides internally whether to compute attention with a fused kernel or to return decompressed data for the caller’s scaled dot-product attention (SDPA).
Variants§
Fused(Tensor)
The implementation computed attention internally (e.g. fused CUDA kernel).
Shape: [batch, num_attention_heads, 1, head_dim]
Dequantized(DequantResult)
The implementation decompressed the cache; the caller runs SDPA. Used on CPU, Metal, or whenever fused attention is not available.
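A caller typically matches on the two variants: the Fused result is used directly, while the Dequantized result is passed to the caller's own SDPA. The sketch below is a minimal illustration under stated assumptions, not the crate's actual API. Tensor and DequantResult are replaced by hypothetical stand-ins (the k and v fields are assumed), and sdpa_stub and attention_output are illustrative names that only track shapes.

```rust
// Hypothetical stand-in for the real tensor type; only the shape is tracked.
#[derive(Debug, Clone, PartialEq)]
pub struct Tensor {
    pub shape: Vec<usize>,
}

// Hypothetical stand-in: the fields of the real `DequantResult` are not shown
// in this page, so `k` and `v` are assumptions made for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct DequantResult {
    pub k: Tensor,
    pub v: Tensor,
}

// Local mirror of the documented enum so the sketch compiles on its own.
pub enum DecodeOutput {
    Fused(Tensor),
    Dequantized(DequantResult),
}

/// Placeholder for the caller's SDPA. It performs no arithmetic and only
/// returns a tensor with the query's shape, [batch, heads, 1, head_dim].
pub fn sdpa_stub(q: &Tensor, kv: &DequantResult) -> Tensor {
    let _ = kv; // a real implementation would attend over kv.k and kv.v
    Tensor { shape: q.shape.clone() }
}

/// Resolves a decode step to the attention output for one new token.
pub fn attention_output(q: &Tensor, out: DecodeOutput) -> Tensor {
    match out {
        // Attention was already computed inside the cache implementation.
        DecodeOutput::Fused(t) => t,
        // The cache was only decompressed; the caller runs attention itself.
        DecodeOutput::Dequantized(kv) => sdpa_stub(q, &kv),
    }
}
```

Because the match is exhaustive, the compiler flags every call site if a further variant is ever added, so a new decode path cannot be ignored silently.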
Auto Trait Implementations§
impl Freeze for DecodeOutput
impl !RefUnwindSafe for DecodeOutput
impl Send for DecodeOutput
impl Sync for DecodeOutput
impl Unpin for DecodeOutput
impl UnsafeUnpin for DecodeOutput
impl !UnwindSafe for DecodeOutput
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.