Trait EmbedBackend 

pub trait EmbedBackend: Send + Sync {
    // Required methods
    fn embed_batch(&self, encodings: &[Encoding]) -> Result<Vec<Vec<f32>>>;
    fn supports_clone(&self) -> bool;
    fn clone_backend(&self) -> Box<dyn EmbedBackend>;
    fn is_gpu(&self) -> bool;

    // Provided method
    fn max_tokens(&self) -> usize { ... }
}
Trait for embedding backends.

Implementations must be Send + Sync so they can be moved across threads (e.g. into a ring-buffer pipeline) and shared between them by reference. The trait is object-safe: callers use &dyn EmbedBackend or Box<dyn EmbedBackend>.
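
As a rough illustration of the thread-safety and object-safety requirements, the sketch below mirrors the trait locally and uses a toy backend both through &dyn EmbedBackend and through a Box<dyn EmbedBackend> moved onto a worker thread. Encoding, Result, ToyBackend, embed_via_ref and embed_on_worker are stand-ins invented for this example, not the crate's real items, and the provided max_tokens method is left out of the mirror.

```rust
use std::thread;

// Stand-ins (assumptions): the crate supplies the real `Encoding` and `Result`.
pub struct Encoding {
    pub ids: Vec<u32>,
}
pub type Result<T> = std::result::Result<T, String>;

// Trimmed local mirror of the trait shown above (`max_tokens` omitted).
pub trait EmbedBackend: Send + Sync {
    fn embed_batch(&self, encodings: &[Encoding]) -> Result<Vec<Vec<f32>>>;
    fn supports_clone(&self) -> bool;
    fn clone_backend(&self) -> Box<dyn EmbedBackend>;
    fn is_gpu(&self) -> bool;
}

/// Hypothetical backend that maps every input to the first unit basis vector.
#[derive(Clone)]
pub struct ToyBackend {
    pub dim: usize,
}

impl EmbedBackend for ToyBackend {
    fn embed_batch(&self, encodings: &[Encoding]) -> Result<Vec<Vec<f32>>> {
        if self.dim == 0 {
            // Propagate the problem instead of returning empty vectors.
            return Err("embedding dimension must be non-zero".to_string());
        }
        Ok(encodings
            .iter()
            .map(|_| {
                let mut v = vec![0.0; self.dim];
                v[0] = 1.0; // already has L2 norm 1
                v
            })
            .collect())
    }
    fn supports_clone(&self) -> bool {
        true
    }
    fn clone_backend(&self) -> Box<dyn EmbedBackend> {
        Box::new(self.clone())
    }
    fn is_gpu(&self) -> bool {
        false
    }
}

/// Borrowed trait object: works because the trait is object-safe.
pub fn embed_via_ref(backend: &dyn EmbedBackend, batch: &[Encoding]) -> Result<usize> {
    Ok(backend.embed_batch(batch)?.len())
}

/// Owned trait object moved to another thread: works because of the `Send` bound.
pub fn embed_on_worker(backend: Box<dyn EmbedBackend>, batch: Vec<Encoding>) -> Result<usize> {
    thread::spawn(move || backend.embed_batch(&batch).map(|v| v.len()))
        .join()
        .map_err(|_| "worker thread panicked".to_string())?
}
```

Both helpers return the number of embeddings produced, so a caller can confirm that one vector came back per input.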

§GPU vs CPU scheduling

  • CPU backends (is_gpu() == false): cloned per rayon thread via clone_backend.
  • GPU backends (is_gpu() == true): use a ring-buffer pipeline with RING_SIZE = 4 for bounded memory.
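
The two scheduling strategies above might look roughly like the following. This is only a sketch under stated assumptions: it uses std::thread::scope in place of rayon and a bounded std::sync::mpsc::sync_channel in place of the crate's ring buffer, spawns one thread per batch for simplicity, and mirrors the trait locally. Encoding, Result, embed_all, embed_per_thread, embed_pipelined and ToyBackend are hypothetical names.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Ring capacity quoted in the trait docs.
pub const RING_SIZE: usize = 4;

// Stand-ins (assumptions): the crate supplies the real `Encoding` and `Result`.
pub struct Encoding {
    pub ids: Vec<u32>,
}
pub type Result<T> = std::result::Result<T, String>;

// Trimmed local mirror of the trait (`max_tokens` omitted).
pub trait EmbedBackend: Send + Sync {
    fn embed_batch(&self, encodings: &[Encoding]) -> Result<Vec<Vec<f32>>>;
    fn supports_clone(&self) -> bool;
    fn clone_backend(&self) -> Box<dyn EmbedBackend>;
    fn is_gpu(&self) -> bool;
}

/// Hypothetical dispatcher: pick a strategy from `is_gpu()`.
pub fn embed_all(backend: &dyn EmbedBackend, batches: Vec<Vec<Encoding>>) -> Result<Vec<Vec<f32>>> {
    if backend.is_gpu() {
        embed_pipelined(backend, batches)
    } else {
        embed_per_thread(backend, &batches)
    }
}

/// CPU path: each worker thread owns its own clone of the backend.
fn embed_per_thread(backend: &dyn EmbedBackend, batches: &[Vec<Encoding>]) -> Result<Vec<Vec<f32>>> {
    if !backend.supports_clone() {
        return Err("CPU path needs a clonable backend".to_string());
    }
    let per_batch: Vec<Result<Vec<Vec<f32>>>> = thread::scope(|s| {
        // Spawn every worker first so the batches run concurrently.
        let handles: Vec<_> = batches
            .iter()
            .map(|batch| {
                let local = backend.clone_backend();
                s.spawn(move || local.embed_batch(batch))
            })
            .collect();
        // Join in spawn order, which preserves input order.
        handles
            .into_iter()
            .map(|h| h.join().unwrap_or_else(|_| Err("worker panicked".to_string())))
            .collect()
    });
    let mut out = Vec::new();
    for vectors in per_batch {
        out.extend(vectors?);
    }
    Ok(out)
}

/// GPU path: a producer feeds a bounded queue; one consumer drives the device.
fn embed_pipelined(backend: &dyn EmbedBackend, batches: Vec<Vec<Encoding>>) -> Result<Vec<Vec<f32>>> {
    // The channel buffers at most RING_SIZE batches, which bounds memory.
    let (tx, rx) = sync_channel::<Vec<Encoding>>(RING_SIZE);
    thread::scope(|s| -> Result<Vec<Vec<f32>>> {
        s.spawn(move || {
            for batch in batches {
                // Blocks while the queue is full; stops if the consumer bailed out.
                if tx.send(batch).is_err() {
                    break;
                }
            }
        });
        let mut out = Vec::new();
        for batch in rx {
            out.extend(backend.embed_batch(&batch)?);
        }
        Ok(out)
    })
}

/// Toy backend (hypothetical): one 1-dimensional unit vector per input.
#[derive(Clone)]
pub struct ToyBackend { pub gpu: bool }

impl EmbedBackend for ToyBackend {
    fn embed_batch(&self, encodings: &[Encoding]) -> Result<Vec<Vec<f32>>> {
        Ok(encodings.iter().map(|_| vec![1.0]).collect())
    }
    fn supports_clone(&self) -> bool { !self.gpu }
    fn clone_backend(&self) -> Box<dyn EmbedBackend> { Box::new(self.clone()) }
    fn is_gpu(&self) -> bool { self.gpu }
}
```

The bounded channel is what keeps memory in check on the pipelined path: the producer stalls once RING_SIZE batches are queued, so pending work never grows with the input size.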

Required Methods§

fn embed_batch(&self, encodings: &[Encoding]) -> Result<Vec<Vec<f32>>>

Embed a batch of pre-tokenized inputs, returning L2-normalized vectors.

Each inner Vec<f32> is the embedding for the Encoding at the same index. Implementations must propagate errors and never silently return default vectors.

§Errors

Returns an error if tensor construction or the forward pass fails.
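
For readers unfamiliar with the normalization contract, here is a minimal sketch of what "L2-normalized" and "errors must propagate" could mean for a single output vector. The helpers l2_normalize and is_unit_length are hypothetical and not part of the crate; a real backend would normally do this on tensors.

```rust
/// Scale `v` in place so its Euclidean (L2) norm becomes 1.
/// Returns an error, leaving `v` untouched, when the norm is zero or not finite.
pub fn l2_normalize(v: &mut [f32]) -> Result<(), String> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if !norm.is_finite() || norm == 0.0 {
        // Report the problem rather than handing back a default vector.
        return Err(format!("cannot normalize vector with norm {}", norm));
    }
    for x in v.iter_mut() {
        *x /= norm;
    }
    Ok(())
}

/// Check that `v` has unit L2 norm within the tolerance `tol`.
pub fn is_unit_length(v: &[f32], tol: f32) -> bool {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    (norm - 1.0).abs() <= tol
}
```

A check like is_unit_length is handy in tests for a new backend, since downstream cosine-similarity code usually assumes unit-length inputs.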

fn supports_clone(&self) -> bool

Whether this backend supports cheap cloning for per-thread instances.

CPU backends return true; GPU backends typically return false.

fn clone_backend(&self) -> Box<dyn EmbedBackend>

Create a cheap clone of this backend for per-thread use in rayon.

§Panics

May panic if supports_clone returns false. Callers must check supports_clone() first.
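
One way a caller might honor this contract is to wrap the check in a helper that returns None instead of risking the panic. The sketch below mirrors the trait locally; try_clone_backend, DeviceBound, Clonable, Encoding and Result are hypothetical names introduced only for this example.

```rust
// Stand-ins (assumptions): the crate supplies the real `Encoding` and `Result`.
pub struct Encoding {
    pub ids: Vec<u32>,
}
pub type Result<T> = std::result::Result<T, String>;

// Trimmed local mirror of the trait (`max_tokens` omitted).
pub trait EmbedBackend: Send + Sync {
    fn embed_batch(&self, encodings: &[Encoding]) -> Result<Vec<Vec<f32>>>;
    fn supports_clone(&self) -> bool;
    fn clone_backend(&self) -> Box<dyn EmbedBackend>;
    fn is_gpu(&self) -> bool;
}

/// Hypothetical helper: only calls `clone_backend` after `supports_clone` says yes.
pub fn try_clone_backend(backend: &dyn EmbedBackend) -> Option<Box<dyn EmbedBackend>> {
    if backend.supports_clone() {
        Some(backend.clone_backend())
    } else {
        None
    }
}

/// Toy backend that cannot be cloned and panics if asked to.
pub struct DeviceBound;

impl EmbedBackend for DeviceBound {
    fn embed_batch(&self, encodings: &[Encoding]) -> Result<Vec<Vec<f32>>> {
        Ok(encodings.iter().map(|_| vec![1.0]).collect())
    }
    fn supports_clone(&self) -> bool {
        false
    }
    fn clone_backend(&self) -> Box<dyn EmbedBackend> {
        panic!("DeviceBound does not support cloning")
    }
    fn is_gpu(&self) -> bool {
        true
    }
}

/// Toy backend that is cheap to clone.
#[derive(Clone)]
pub struct Clonable;

impl EmbedBackend for Clonable {
    fn embed_batch(&self, encodings: &[Encoding]) -> Result<Vec<Vec<f32>>> {
        Ok(encodings.iter().map(|_| vec![1.0]).collect())
    }
    fn supports_clone(&self) -> bool {
        true
    }
    fn clone_backend(&self) -> Box<dyn EmbedBackend> {
        Box::new(self.clone())
    }
    fn is_gpu(&self) -> bool {
        false
    }
}
```

Returning an Option pushes the decision to the caller, who can fall back to sharing the original backend by reference when no clone is available.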

fn is_gpu(&self) -> bool

Whether this backend runs on a GPU.

GPU backends use a ring-buffer pipelined scheduler (RING_SIZE = 4) for bounded memory usage.

Provided Methods§

fn max_tokens(&self) -> usize

Maximum token count this model supports (position embedding limit).

ClassicBert supports 512 tokens; for ModernBERT the limit is taken from the model config. Tokens beyond the limit are truncated during tokenization.
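
To make the truncation rule concrete, a tokenization step could apply the limit along these lines. This is a simplified sketch: truncate_to_limit is a hypothetical helper, it operates on raw token ids rather than the crate's Encoding type, and it cuts the tail without any special-token handling that a real tokenizer might perform.

```rust
/// Cut a token-id sequence down to at most `max_tokens` entries.
/// Returns how many trailing tokens were dropped.
pub fn truncate_to_limit(ids: &mut Vec<u32>, max_tokens: usize) -> usize {
    // `saturating_sub` yields 0 when the sequence already fits.
    let dropped = ids.len().saturating_sub(max_tokens);
    // `Vec::truncate` is a no-op when `max_tokens >= ids.len()`.
    ids.truncate(max_tokens);
    dropped
}
```

Reporting the dropped count lets callers log or surface silent truncation of long inputs.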

Implementors§

impl<D, A> EmbedBackend for GenericBackend<D, A>
where
    D: Driver + Send + Sync + 'static,
    A: ModelArch<D> + Send + Sync + 'static,