Trait DecodeBackend 

pub trait DecodeBackend: Send + Sync {
    // Required methods
    fn decode_step(
        &mut self,
        token_id: u32,
        position: usize,
        cache_key: &str,
    ) -> Result<TensorRef>;
    fn init_kv_cache(
        &mut self,
        cache_key: &str,
        kv_data: Vec<(TensorRef, TensorRef)>,
        prefill_len: usize,
    ) -> Result<()>;
    fn has_kv_cache(&self, cache_key: &str) -> bool;
    fn release_kv_cache(&mut self, cache_key: &str);
    fn name(&self) -> &str;
}

Decode-phase execution backend.

Implements the actual computation for single-token decode steps. Different backends optimize for different hardware:

  • CudaDecodeBackend: cuBLAS + custom CUDA kernels, pre-allocated buffers
  • MetalDecodeBackend: Metal compute shaders
  • CandleDecodeBackend: candle tensor ops (CPU/fallback)

The backend is initialized with model weights and manages its own internal state (KV cache, buffers, cuBLAS handles, etc.).
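
To make the contract concrete, here is a minimal sketch of an implementor. It is not one of the real backends: `ToyBackend` is hypothetical, and because the crate's `TensorRef` and `Result` types are not shown on this page, the sketch substitutes a flat `Vec<f32>` and a `String` error for them. The toy keeps one token count per `cache_key` and returns one-hot logits instead of running a model.

```rust
use std::collections::HashMap;

// Stand-ins (assumptions): the crate's real `TensorRef` and `Result` types
// are not shown on this page, so a flat Vec<f32> and a String error are used.
pub type TensorRef = Vec<f32>;
pub type Result<T> = std::result::Result<T, String>;

pub trait DecodeBackend: Send + Sync {
    fn decode_step(&mut self, token_id: u32, position: usize, cache_key: &str)
        -> Result<TensorRef>;
    fn init_kv_cache(
        &mut self,
        cache_key: &str,
        kv_data: Vec<(TensorRef, TensorRef)>,
        prefill_len: usize,
    ) -> Result<()>;
    fn has_kv_cache(&self, cache_key: &str) -> bool;
    fn release_kv_cache(&mut self, cache_key: &str);
    fn name(&self) -> &str;
}

/// Hypothetical toy backend (not one of the real implementors).
pub struct ToyBackend {
    vocab_size: usize,
    // Number of tokens currently cached for each sequence.
    cached_tokens: HashMap<String, usize>,
}

impl ToyBackend {
    pub fn new(vocab_size: usize) -> Self {
        assert!(vocab_size > 0, "vocab_size must be non-zero");
        ToyBackend { vocab_size, cached_tokens: HashMap::new() }
    }
}

impl DecodeBackend for ToyBackend {
    fn decode_step(&mut self, token_id: u32, position: usize, cache_key: &str)
        -> Result<TensorRef>
    {
        // Decoding without a prior init_kv_cache call is an error in this toy.
        let cached = self
            .cached_tokens
            .get_mut(cache_key)
            .ok_or_else(|| format!("no KV cache for '{}'", cache_key))?;
        *cached = position + 1;
        // One-hot "logits" that favour the next token id, wrapping at vocab_size.
        let mut logits = vec![0.0; self.vocab_size];
        logits[(token_id as usize + 1) % self.vocab_size] = 1.0;
        Ok(logits)
    }

    fn init_kv_cache(
        &mut self,
        cache_key: &str,
        _kv_data: Vec<(TensorRef, TensorRef)>, // a real backend would store these
        prefill_len: usize,
    ) -> Result<()> {
        self.cached_tokens.insert(cache_key.to_string(), prefill_len);
        Ok(())
    }

    fn has_kv_cache(&self, cache_key: &str) -> bool {
        self.cached_tokens.contains_key(cache_key)
    }

    fn release_kv_cache(&mut self, cache_key: &str) {
        self.cached_tokens.remove(cache_key);
    }

    fn name(&self) -> &str {
        "toy"
    }
}
```

Keeping all per-sequence state behind `cache_key` mirrors the description above: callers never touch the cache directly, they only name the sequence.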

Required Methods

fn decode_step( &mut self, token_id: u32, position: usize, cache_key: &str, ) -> Result<TensorRef>

Execute a single decode step: one token in, logits out.

  • token_id: the input token
  • position: sequence position (for RoPE)
  • cache_key: identifies the sequence’s KV cache

Returns the logits as a TensorRef of shape [1, 1, vocab_size].
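
A caller typically wraps `decode_step` in a sampling loop. The sketch below shows greedy decoding under stated assumptions: `TensorRef` is replaced by a flat `Vec<f32>` of length `vocab_size`, `Result` by a `String`-error alias, and `argmax`, `greedy_decode`, and the `eos` parameter are hypothetical helpers that are not part of this trait. It also assumes positions advance by one per generated token.

```rust
// Stand-ins (assumptions): the crate's real `TensorRef` and `Result` types
// are not shown on this page.
pub type TensorRef = Vec<f32>;
pub type Result<T> = std::result::Result<T, String>;

pub trait DecodeBackend: Send + Sync {
    fn decode_step(&mut self, token_id: u32, position: usize, cache_key: &str)
        -> Result<TensorRef>;
    fn init_kv_cache(
        &mut self,
        cache_key: &str,
        kv_data: Vec<(TensorRef, TensorRef)>,
        prefill_len: usize,
    ) -> Result<()>;
    fn has_kv_cache(&self, cache_key: &str) -> bool;
    fn release_kv_cache(&mut self, cache_key: &str);
    fn name(&self) -> &str;
}

/// Index of the largest logit, or None if the slice is empty (hypothetical helper).
pub fn argmax(logits: &[f32]) -> Option<u32> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i as u32)
}

/// Greedy decoding: feed each chosen token back in at the next position.
/// Stops at `eos` (not included in the output) or after `max_new_tokens` steps.
pub fn greedy_decode(
    backend: &mut dyn DecodeBackend,
    cache_key: &str,
    first_token: u32,
    start_pos: usize,
    max_new_tokens: usize,
    eos: u32,
) -> Result<Vec<u32>> {
    let mut out = Vec::new();
    let mut token = first_token;
    for step in 0..max_new_tokens {
        // One token in, logits out.
        let logits = backend.decode_step(token, start_pos + step, cache_key)?;
        token = argmax(&logits).ok_or_else(|| "empty logits".to_string())?;
        if token == eos {
            break;
        }
        out.push(token);
    }
    Ok(out)
}
```

Taking `&mut dyn DecodeBackend` keeps the loop independent of which backend is in use; a real sampler would likely replace `argmax` with temperature or top-k sampling.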

fn init_kv_cache( &mut self, cache_key: &str, kv_data: Vec<(TensorRef, TensorRef)>, prefill_len: usize, ) -> Result<()>

Initialize KV cache for a new sequence from prefill data.

Called after prefill (which runs through the model’s forward pass) to hand off the KV cache to the decode backend.

  • kv_data: per-layer (K, V) tensor pairs from the prefill pass
  • prefill_len: number of tokens in the prefill
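
The hand-off from prefill to decode can be sketched as a small helper. Everything beyond the trait itself is an assumption here: `hand_off` and its `num_layers` check are hypothetical, `TensorRef` and `Result` are stand-in aliases, and the returned "next position" assumes 0-based positions, so the first decode step after a prefill of `prefill_len` tokens runs at position `prefill_len`.

```rust
// Stand-ins (assumptions): the crate's real `TensorRef` and `Result` types
// are not shown on this page.
pub type TensorRef = Vec<f32>;
pub type Result<T> = std::result::Result<T, String>;

pub trait DecodeBackend: Send + Sync {
    fn decode_step(&mut self, token_id: u32, position: usize, cache_key: &str)
        -> Result<TensorRef>;
    fn init_kv_cache(
        &mut self,
        cache_key: &str,
        kv_data: Vec<(TensorRef, TensorRef)>,
        prefill_len: usize,
    ) -> Result<()>;
    fn has_kv_cache(&self, cache_key: &str) -> bool;
    fn release_kv_cache(&mut self, cache_key: &str);
    fn name(&self) -> &str;
}

/// Hypothetical helper: pass prefill K/V tensors to the decode backend and
/// return the position at which the first decode step should run.
pub fn hand_off(
    backend: &mut dyn DecodeBackend,
    cache_key: &str,
    kv_data: Vec<(TensorRef, TensorRef)>,
    prefill_len: usize,
    num_layers: usize,
) -> Result<usize> {
    // Refuse to overwrite a live sequence (a policy choice of this sketch).
    if backend.has_kv_cache(cache_key) {
        return Err(format!("KV cache '{}' already exists", cache_key));
    }
    // kv_data holds one (K, V) pair per layer.
    if kv_data.len() != num_layers {
        return Err(format!(
            "expected {} (K, V) pairs, got {}",
            num_layers,
            kv_data.len()
        ));
    }
    backend.init_kv_cache(cache_key, kv_data, prefill_len)?;
    // Assumption: positions are 0-based, so decoding resumes at prefill_len.
    Ok(prefill_len)
}
```

Validating before calling `init_kv_cache` means a rejected hand-off leaves the backend untouched.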

fn has_kv_cache(&self, cache_key: &str) -> bool

Check if KV cache exists for a sequence.

fn release_kv_cache(&mut self, cache_key: &str)

Release KV cache for a completed sequence.
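
Because `release_kv_cache` must be called explicitly, a caller may want to tie it to scope so that early returns and errors do not leak a cache. One possible approach is an RAII guard. `KvCacheGuard` below is hypothetical and not part of this crate, and `TensorRef` and `Result` are again stand-in aliases.

```rust
// Stand-ins (assumptions): the crate's real `TensorRef` and `Result` types
// are not shown on this page.
pub type TensorRef = Vec<f32>;
pub type Result<T> = std::result::Result<T, String>;

pub trait DecodeBackend: Send + Sync {
    fn decode_step(&mut self, token_id: u32, position: usize, cache_key: &str)
        -> Result<TensorRef>;
    fn init_kv_cache(
        &mut self,
        cache_key: &str,
        kv_data: Vec<(TensorRef, TensorRef)>,
        prefill_len: usize,
    ) -> Result<()>;
    fn has_kv_cache(&self, cache_key: &str) -> bool;
    fn release_kv_cache(&mut self, cache_key: &str);
    fn name(&self) -> &str;
}

/// Hypothetical guard: borrows the backend for one sequence and releases
/// that sequence's KV cache when the guard goes out of scope.
pub struct KvCacheGuard<'a> {
    backend: &'a mut dyn DecodeBackend,
    cache_key: String,
}

impl<'a> KvCacheGuard<'a> {
    pub fn new(backend: &'a mut dyn DecodeBackend, cache_key: &str) -> Self {
        KvCacheGuard { backend, cache_key: cache_key.to_string() }
    }

    /// Forward a decode step for the guarded sequence.
    pub fn decode_step(&mut self, token_id: u32, position: usize) -> Result<TensorRef> {
        self.backend.decode_step(token_id, position, &self.cache_key)
    }
}

impl<'a> Drop for KvCacheGuard<'a> {
    fn drop(&mut self) {
        // Runs on normal scope exit, on early return via `?`, and on unwinding.
        self.backend.release_kv_cache(&self.cache_key);
    }
}
```

The guard holds an exclusive borrow of the backend, so this shape suits one sequence at a time; a server that interleaves many sequences would need a different ownership scheme.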

fn name(&self) -> &str

Human-readable backend name (for logging).

Implementors