pub struct KvCache<B: Backend> {
pub step: usize,
/* private fields */
}
Precomputed and growing KV cache for O(n) per-step decoder inference.
cross_kv holds the encoder cross-attention K,V projections for every decoder
layer. These are constant for the entire decoding of one audio chunk and are
computed once by KvCache::new before the decode loop.
self_kv holds the growing decoder self-attention K,V cache; one new row is
appended per forward_decoder_cached call.
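The split between the two caches can be illustrated with a minimal, dependency-free sketch. It is not this crate's implementation: `ToyKvCache`, `LayerKv`, `append`, `self_rows`, and `cross_rows` are hypothetical names, and plain `Vec<f32>` rows stand in for backend tensors. It only shows the bookkeeping described above: cross-attention K,V are filled once, self-attention K,V grow by one row per decode step, and `step` advances with them.

```rust
/// Hypothetical stand-in for one layer's (K, V) pair; rows are stored
/// row-major in a flat Vec<f32>.
#[derive(Clone, Debug)]
struct LayerKv {
    k: Vec<f32>,
    v: Vec<f32>,
}

/// Toy analogue of the cache described above (NOT the crate's KvCache).
struct ToyKvCache {
    /// Number of tokens decoded so far.
    step: usize,
    /// Width of one K or V row (assumed to play the role of n_text_state).
    n_state: usize,
    /// One entry per decoder layer; filled once and never modified afterwards.
    cross_kv: Vec<LayerKv>,
    /// One entry per decoder layer; grows by one row per decode step.
    self_kv: Vec<LayerKv>,
}

impl ToyKvCache {
    /// Takes ownership of the precomputed cross-attention K,V and starts
    /// with an empty self-attention cache for every layer.
    fn new(cross_kv: Vec<LayerKv>, n_state: usize) -> Self {
        let n_layers = cross_kv.len();
        let empty = LayerKv { k: Vec::new(), v: Vec::new() };
        Self { step: 0, n_state, cross_kv, self_kv: vec![empty; n_layers] }
    }

    /// Appends the new token's K and V rows for every layer, then advances `step`.
    fn append(&mut self, new_rows: &[(Vec<f32>, Vec<f32>)]) {
        assert_eq!(new_rows.len(), self.self_kv.len(), "one (K, V) row per layer");
        for (layer, (k_row, v_row)) in self.self_kv.iter_mut().zip(new_rows) {
            assert_eq!(k_row.len(), self.n_state);
            assert_eq!(v_row.len(), self.n_state);
            layer.k.extend_from_slice(k_row);
            layer.v.extend_from_slice(v_row);
        }
        self.step += 1;
    }

    /// Number of cached self-attention rows in the given layer.
    fn self_rows(&self, layer: usize) -> usize {
        self.self_kv[layer].k.len() / self.n_state
    }

    /// Number of precomputed cross-attention rows in the given layer.
    fn cross_rows(&self, layer: usize) -> usize {
        self.cross_kv[layer].k.len() / self.n_state
    }
}

fn main() {
    // Two layers, two encoder frames, row width 3 (illustrative sizes only).
    let cross = vec![LayerKv { k: vec![0.0; 6], v: vec![0.0; 6] }; 2];
    let mut cache = ToyKvCache::new(cross, 3);
    for _ in 0..4 {
        cache.append(&[(vec![1.0; 3], vec![2.0; 3]), (vec![1.0; 3], vec![2.0; 3])]);
    }
    println!(
        "step = {}, self rows = {}, cross rows = {}",
        cache.step,
        cache.self_rows(0),
        cache.cross_rows(0)
    );
}
```

In this sketch the cross-attention entries are never touched after construction, which is the property that lets them be computed once per audio chunk.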
Fields§
step: usize
Number of tokens decoded so far (indexes into positional embedding).
Implementations§
impl<B: Backend> KvCache<B>
pub fn new(model: &Whisper<B>, encoder_output: Tensor<B, 3>) -> Self
Build a cache pre-populated with cross-attention K,V from encoder_output.
The encoder K,V projections (n_layers × 2 linear ops) are computed once
here and reused at every subsequent decode step instead of being recomputed
each time forward_decoder is called from scratch.
§Performance
Call this once per audio chunk. The cost is n_decoder_layers × 2
matrix multiplications of shape [n_encoder_frames × n_text_state].
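As a rough worked example of the saving, the sketch below counts cross-attention K,V projections with and without the cache. The function names are hypothetical, and the per-projection cost assumes a square [n_text_state × n_text_state] weight applied to an [n_encoder_frames × n_text_state] input, which the text above does not state. The sizes in `main` are illustrative only.

```rust
/// Number of cross-attention K/V projection matmuls when they are computed
/// once up front (the cached scheme described above).
fn cached_projections(n_decoder_layers: u64) -> u64 {
    n_decoder_layers * 2
}

/// Number of the same matmuls if every decode step recomputed them.
fn uncached_projections(n_decoder_layers: u64, n_steps: u64) -> u64 {
    n_decoder_layers * 2 * n_steps
}

/// Multiply-accumulates for one projection, ASSUMING a square
/// [n_text_state x n_text_state] weight applied to an
/// [n_encoder_frames x n_text_state] input.
fn macs_per_projection(n_encoder_frames: u64, n_text_state: u64) -> u64 {
    n_encoder_frames * n_text_state * n_text_state
}

fn main() {
    // Illustrative sizes: 4 decoder layers, 1500 encoder frames,
    // state width 384, 100 decode steps.
    let (layers, frames, state, steps) = (4, 1500, 384, 100);
    let per = macs_per_projection(frames, state);
    println!("cached:   {} MACs", cached_projections(layers) * per);
    println!("uncached: {} MACs", uncached_projections(layers, steps) * per);
}
```

Under these assumptions the cached cost does not depend on the number of decode steps, while the uncached cost grows linearly with it.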
Auto Trait Implementations§
impl<B> Freeze for KvCache<B>
impl<B> RefUnwindSafe for KvCache<B> where
<B as BackendTypes>::FloatTensorPrimitive: RefUnwindSafe,
<B as BackendTypes>::QuantizedTensorPrimitive: RefUnwindSafe,
impl<B> Send for KvCache<B>
impl<B> Sync for KvCache<B>
impl<B> Unpin for KvCache<B> where
<B as BackendTypes>::FloatTensorPrimitive: Unpin,
<B as BackendTypes>::QuantizedTensorPrimitive: Unpin,
impl<B> UnsafeUnpin for KvCache<B>
impl<B> UnwindSafe for KvCache<B> where
<B as BackendTypes>::FloatTensorPrimitive: UnwindSafe,
<B as BackendTypes>::QuantizedTensorPrimitive: UnwindSafe,
Blanket Implementations§
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<S, T> Duplex<S> for T where
T: FromSample<S> + ToSample<S>,
impl<S> FromSample<S> for S
fn from_sample_(s: S) -> S
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.