Module embedding

Text embeddings.

§Embedding Module

This module provides types and traits for working with text embeddings.

§What are Embeddings?

Embeddings are dense vector representations of text that capture semantic meaning. They transform human-readable text into numerical vectors that machine learning models can process. Texts with similar meaning produce vectors that lie close together (for example, under cosine similarity), which makes embeddings useful for:

  • Semantic search: Finding relevant documents based on meaning rather than exact keywords
  • Text similarity: Measuring how similar two pieces of text are
  • Classification: Categorizing text based on content
  • Clustering: Grouping similar texts together
  • Recommendation systems: Finding related content
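Most of the uses listed above reduce to comparing two embedding vectors. A minimal sketch of one common measure, cosine similarity, is shown here. The function name `cosine_similarity` is hypothetical and is not part of this module; it works on plain `f32` slices and assumes both vectors come from the same model.

```rust
/// Cosine similarity between two embedding vectors, in the range [-1.0, 1.0].
///
/// Returns `None` if the vectors are empty, differ in length, or either
/// has zero magnitude (the similarity is undefined in those cases).
/// Hypothetical helper for illustration; not provided by this module.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }

    // Dot product of the two vectors.
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();

    // Euclidean norms of each vector.
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();

    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }

    Some(dot / (norm_a * norm_b))
}
```

For semantic search, one option is to embed the query and each document, then sort the documents by this score in descending order.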

§Embedding Models

An embedding model is a neural network that has been trained to convert text into meaningful vector representations. Different models have different characteristics:

  • Dimension: The length of the embedding vector (e.g., 768, 1536)
  • Domain: Some models are optimized for specific types of content
  • Performance: Trade-offs between speed, accuracy, and resource usage
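The dimension feeds directly into resource usage. As a rough sketch, assuming embeddings are stored as contiguous 32-bit floats and ignoring per-vector allocation overhead, the storage cost can be estimated as follows. The helper name `embedding_storage_bytes` is hypothetical.

```rust
use std::mem::size_of;

/// Approximate number of bytes needed to store `count` embeddings of
/// length `dim` as `f32` values. Ignores allocator and `Vec` overhead.
/// Hypothetical helper for illustration; not provided by this module.
fn embedding_storage_bytes(count: usize, dim: usize) -> usize {
    count * dim * size_of::<f32>()
}
```

Under these assumptions, one million 1536-dimensional embeddings take about 6.1 GB, while the same number of 384-dimensional embeddings take about 1.5 GB.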

Popular embedding models include:

  • OpenAI’s text-embedding-ada-002 (1536 dimensions)
  • Sentence Transformers like all-MiniLM-L6-v2 (384 dimensions)
  • Cohere’s embedding models

§Usage

This module provides the EmbeddingModel trait that abstracts over different embedding implementations, allowing you to switch between providers while maintaining the same interface.

use ai_types::EmbeddingModel;

async fn example<T: EmbeddingModel>(model: &T) -> ai_types::Result<()> {
    // Get the embedding dimension
    let dim = model.dim();
    println!("Model produces {}-dimensional embeddings", dim);

    // Convert text to embedding
    let embedding = model.embed("Hello, world!").await?;
    assert_eq!(embedding.len(), dim);

    Ok(())
}

Traits§

EmbeddingModel
Converts text to vector representations.

Type Aliases§

Embedding
A type alias for an embedding vector of 32-bit floats.