Trait MultiVectorEmbedder 

pub trait MultiVectorEmbedder: Send + Sync {
    // Required methods
    fn embed_tokens(&self, text: &str) -> Result<MultiVectorEmbedding>;
    fn token_dimension(&self) -> usize;
    fn max_tokens(&self) -> usize;
    fn model_id(&self) -> &str;

    // Provided method
    fn embed_tokens_batch(
        &self,
        texts: &[&str],
    ) -> Result<Vec<MultiVectorEmbedding>> { ... }
}

Trait for models that produce token-level embeddings.

Unlike single-vector embedders (which produce one embedding per text), multi-vector embedders produce one embedding per token, enabling fine-grained late interaction scoring.
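Late interaction scoring is commonly done with a MaxSim operator, as in ColBERT-style retrieval. The source does not say which scoring function this crate uses, so the following is only a sketch under that assumption. It works on plain `Vec<Vec<f32>>` token matrices rather than the crate's `MultiVectorEmbedding` type, and `dot` and `max_sim` are hypothetical helper names.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// MaxSim (assumed scoring function): for each query token, take the
/// similarity of its best-matching document token, then sum over the
/// query tokens.
fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    // An empty document has nothing to match against.
    if doc.is_empty() {
        return 0.0;
    }
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

Because every query token is compared with every document token, the score can reward a document that matches each query term somewhere, which a single pooled vector cannot express.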

§Example

use aprender_rag::multivector::{MultiVectorEmbedder, MockMultiVectorEmbedder};

let embedder = MockMultiVectorEmbedder::new(128, 512);
let embedding = embedder.embed_tokens("hello world").unwrap();

assert_eq!(embedding.num_tokens(), 2);
assert_eq!(embedding.dim(), 128);

Required Methods§

fn embed_tokens(&self, text: &str) -> Result<MultiVectorEmbedding>

Embed text into token-level vectors.

§Arguments
  • text - Input text to embed
§Returns

A MultiVectorEmbedding containing one vector per token.

fn token_dimension(&self) -> usize

Returns the dimension of each token embedding vector.

fn max_tokens(&self) -> usize

Returns the maximum number of tokens per document.

fn model_id(&self) -> &str

Returns the model identifier.

Provided Methods§

fn embed_tokens_batch(&self, texts: &[&str]) -> Result<Vec<MultiVectorEmbedding>>

Batch embed multiple texts.

The default implementation calls embed_tokens sequentially, once per text. Implementations may override it to batch more efficiently.
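A sequential default of this kind could look roughly like the sketch below. The `Result` alias, the `MultiVectorEmbedding` struct, and the trimmed-down trait are stand-ins written for illustration, and `WhitespaceEmbedder` is a hypothetical toy implementor; the crate's real definitions and default body may differ.

```rust
// Stand-ins for the crate's types; the real definitions may differ.
type Result<T> = std::result::Result<T, String>;

#[derive(Debug, Clone, PartialEq)]
struct MultiVectorEmbedding {
    tokens: Vec<Vec<f32>>, // one vector per token
}

// Trimmed-down stand-in for the trait, with one required method.
trait MultiVectorEmbedder: Send + Sync {
    fn embed_tokens(&self, text: &str) -> Result<MultiVectorEmbedding>;

    // Sequential default: embed each text in order and stop at the
    // first error, which is what collecting into a Result does.
    fn embed_tokens_batch(&self, texts: &[&str]) -> Result<Vec<MultiVectorEmbedding>> {
        texts.iter().map(|t| self.embed_tokens(t)).collect()
    }
}

// Hypothetical toy embedder: one zero vector per whitespace-separated word.
struct WhitespaceEmbedder {
    dim: usize,
}

impl MultiVectorEmbedder for WhitespaceEmbedder {
    fn embed_tokens(&self, text: &str) -> Result<MultiVectorEmbedding> {
        if text.trim().is_empty() {
            return Err("empty input".to_string());
        }
        let tokens = text
            .split_whitespace()
            .map(|_| vec![0.0; self.dim])
            .collect();
        Ok(MultiVectorEmbedding { tokens })
    }
}
```

An implementor that only defines `embed_tokens` gets batching for free. A model backend that can pad and run many texts in one forward pass would override `embed_tokens_batch` instead.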

Dyn Compatibility§

This trait is dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety".
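Dyn compatibility means the trait can be used behind `&dyn` or `Box<dyn ...>`, so the concrete embedder can be chosen at runtime. The sketch below uses a stand-in trait with only two of the required methods; `FixedEmbedder`, `describe`, and `registry` are hypothetical names used for illustration.

```rust
// Stand-in trait with two of the required methods; the real trait has more.
trait MultiVectorEmbedder: Send + Sync {
    fn token_dimension(&self) -> usize;
    fn model_id(&self) -> &str;
}

// Hypothetical implementor used only for illustration.
struct FixedEmbedder {
    id: String,
    dim: usize,
}

impl MultiVectorEmbedder for FixedEmbedder {
    fn token_dimension(&self) -> usize {
        self.dim
    }
    fn model_id(&self) -> &str {
        &self.id
    }
}

// Accepts any embedder behind a trait object; the concrete type is erased.
fn describe(embedder: &dyn MultiVectorEmbedder) -> String {
    format!(
        "{} ({} dims per token)",
        embedder.model_id(),
        embedder.token_dimension()
    )
}

// Builds a collection of embedders whose concrete types need not match.
fn registry() -> Vec<Box<dyn MultiVectorEmbedder>> {
    vec![
        Box::new(FixedEmbedder { id: "small".to_string(), dim: 64 }),
        Box::new(FixedEmbedder { id: "large".to_string(), dim: 128 }),
    ]
}
```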

Implementations on Foreign Types§

impl<E: MultiVectorEmbedder + ?Sized> MultiVectorEmbedder for Box<E>

Trait implementation for boxed embedders.
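A blanket impl like this is usually written by forwarding each method to the boxed value. The sketch below shows that pattern on a stand-in trait with two methods; the crate's actual impl body is not shown in the source, and `FixedEmbedder` and `dims_of` are hypothetical.

```rust
// Stand-in trait with two of the required methods; the real trait has more.
trait MultiVectorEmbedder: Send + Sync {
    fn token_dimension(&self) -> usize;
    fn model_id(&self) -> &str;
}

// Forward every method to the inner value. The `?Sized` bound lets this
// impl also cover `Box<dyn MultiVectorEmbedder>`.
impl<E: MultiVectorEmbedder + ?Sized> MultiVectorEmbedder for Box<E> {
    fn token_dimension(&self) -> usize {
        (**self).token_dimension()
    }
    fn model_id(&self) -> &str {
        (**self).model_id()
    }
}

// Hypothetical implementor used only for illustration.
struct FixedEmbedder {
    dim: usize,
}

impl MultiVectorEmbedder for FixedEmbedder {
    fn token_dimension(&self) -> usize {
        self.dim
    }
    fn model_id(&self) -> &str {
        "fixed"
    }
}

// Generic code bounded on the trait accepts boxed embedders as well.
fn dims_of<E: MultiVectorEmbedder>(embedder: E) -> usize {
    embedder.token_dimension()
}
```

With the forwarding impl in place, an API that takes `E: MultiVectorEmbedder` works unchanged whether the caller passes a concrete embedder or a boxed trait object.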

Implementors§