Trait finalfusion::similarity::EmbeddingSimilarity
pub trait EmbeddingSimilarity {
    // Required method
    fn embedding_similarity_masked(
        &self,
        query: ArrayView1<'_, f32>,
        limit: usize,
        skips: &HashSet<&str>,
        batch_size: Option<usize>,
    ) -> Option<Vec<WordSimilarityResult<'_>>>;

    // Provided method
    fn embedding_similarity(
        &self,
        query: ArrayView1<'_, f32>,
        limit: usize,
        batch_size: Option<usize>,
    ) -> Option<Vec<WordSimilarityResult<'_>>> { ... }
}
Trait for embedding similarity queries.
Required Methods

fn embedding_similarity_masked(
    &self,
    query: ArrayView1<'_, f32>,
    limit: usize,
    skips: &HashSet<&str>,
    batch_size: Option<usize>,
) -> Option<Vec<WordSimilarityResult<'_>>>
Find words that are similar to the query embedding while skipping certain words.
The similarity between the query embedding and other embeddings is defined as the dot product of the embeddings. Since the embeddings in the storage are l2-normalized and this method l2-normalizes the input query, the dot product is equivalent to the cosine similarity.
If batch_size is None, the query is performed on all word embeddings at once. This is typically the most efficient, but can require a large amount of memory. When batch_size is Some(n), the query is performed on batches of size n. Setting this to a smaller value than the number of word embeddings reduces memory use at the cost of computational efficiency.
Provided Methods

fn embedding_similarity(
    &self,
    query: ArrayView1<'_, f32>,
    limit: usize,
    batch_size: Option<usize>,
) -> Option<Vec<WordSimilarityResult<'_>>>
Find words that are similar to the query embedding.
The similarity between the query embedding and other embeddings is defined as the dot product of the embeddings. Since the embeddings in the storage are l2-normalized and this method l2-normalizes the input query, the dot product is equivalent to the cosine similarity.
If batch_size is None, the query is performed on all word embeddings at once. This is typically the most efficient, but can require a large amount of memory. When batch_size is Some(n), the query is performed on batches of size n. Setting this to a smaller value than the number of word embeddings reduces memory use at the cost of computational efficiency.