Crate embed_anything


embed_anything is a minimalist, high-performance, lightweight, multisource, multimodal, local embedding pipeline.

Whether you’re working with text, images, audio, PDFs, websites, or other media, embed_anything streamlines generating embeddings from these sources and streams them to a vector database as they are produced (memory-efficient indexing), so the full set of embeddings does not have to be held in memory at once.
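The streaming idea can be pictured as a bounded buffer: embeddings are pushed into a buffer and flushed to the database whenever it fills, so memory use is bounded by the buffer size rather than by the corpus size. The std-only sketch below illustrates that pattern; `VectorSink`, `RecordingSink`, `StreamingIndexer`, and `buffer_size` are hypothetical names, not embed_anything’s API.

```rust
/// Hypothetical destination for embeddings, standing in for a vector-database client.
pub trait VectorSink {
    fn upsert(&mut self, batch: Vec<(String, Vec<f32>)>);
}

/// Hypothetical in-memory sink that records the size of each batch it receives.
#[derive(Default)]
pub struct RecordingSink {
    pub batches: Vec<usize>,
    pub total: usize,
}

impl VectorSink for RecordingSink {
    fn upsert(&mut self, batch: Vec<(String, Vec<f32>)>) {
        self.batches.push(batch.len());
        self.total += batch.len();
    }
}

/// Buffers (id, embedding) pairs and flushes them to the sink in fixed-size
/// batches, so at most `buffer_size` embeddings are held in memory at once.
pub struct StreamingIndexer<S: VectorSink> {
    sink: S,
    buffer: Vec<(String, Vec<f32>)>,
    buffer_size: usize,
}

impl<S: VectorSink> StreamingIndexer<S> {
    pub fn new(sink: S, buffer_size: usize) -> Self {
        assert!(buffer_size > 0, "buffer_size must be positive");
        Self { sink, buffer: Vec::with_capacity(buffer_size), buffer_size }
    }

    /// Adds one embedding and flushes automatically when the buffer is full.
    pub fn push(&mut self, id: String, embedding: Vec<f32>) {
        self.buffer.push((id, embedding));
        if self.buffer.len() >= self.buffer_size {
            self.flush();
        }
    }

    /// Sends any buffered embeddings to the sink and empties the buffer.
    pub fn flush(&mut self) {
        if !self.buffer.is_empty() {
            let batch = std::mem::take(&mut self.buffer);
            self.sink.upsert(batch);
        }
    }

    /// Flushes the remainder and hands back the sink.
    pub fn finish(mut self) -> S {
        self.flush();
        self.sink
    }
}
```

With a buffer of 3, pushing 7 embeddings produces batches of 3, 3, and a final 1 on `finish`.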

It supports dense, sparse, and late-interaction embeddings and can also run ONNX models, offering flexibility for a wide range of use cases.
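The three embedding kinds differ mainly in how two embeddings are compared. The std-only sketch below shows one common scoring rule for each: cosine similarity for dense vectors, a dot product over shared token ids for sparse vectors (stored here as token-id → weight maps), and ColBERT-style MaxSim for late-interaction (per-token) vectors. The function names and the sparse representation are assumptions for illustration, not embed_anything’s internals.

```rust
use std::collections::HashMap;

/// Cosine similarity between two dense vectors of equal length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dense vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

/// Dot product of two sparse vectors stored as token-id -> weight maps.
/// Only ids present in both vectors contribute to the score.
pub fn sparse_dot(a: &HashMap<u32, f32>, b: &HashMap<u32, f32>) -> f32 {
    // Iterate over the smaller map and look ids up in the larger one.
    let (small, large) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    small.iter().filter_map(|(id, w)| large.get(id).map(|v| w * v)).sum()
}

/// Late-interaction (MaxSim) score: for each query token vector, take its best
/// dot product against any document token vector, then sum over query tokens.
pub fn max_sim(query_tokens: &[Vec<f32>], doc_tokens: &[Vec<f32>]) -> f32 {
    if doc_tokens.is_empty() {
        return 0.0; // no document tokens to match against
    }
    query_tokens
        .iter()
        .map(|q| {
            doc_tokens
                .iter()
                .map(|d| q.iter().zip(d).map(|(x, y)| x * y).sum::<f32>())
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

Late interaction keeps one vector per token, which costs more storage than a single dense vector but lets each query token find its own best match in the document.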

Usage

Creating an Embedder

To get started, create an Embedder for the type of content you want to embed. Utility constructors such as Embedder::from_pretrained_hf, Embedder::from_pretrained_onnx, and Embedder::from_pretrained_cloud build one from a Hugging Face model, an ONNX model, or a cloud provider, respectively:

```rust
use embed_anything::embeddings::embed::Embedder;

// Create a local CLIP embedder from a Hugging Face model
let clip_embedder = Embedder::from_pretrained_hf("CLIP", "jina-clip-v2", None);

// Create a cloud OpenAI embedder using an OpenAI embedding model
let openai_embedder = Embedder::from_pretrained_cloud("OpenAI", "text-embedding-3-small", Some("my-api-key".to_string()));
```

If needed, you can also construct an Embedder manually, for example to wrap a specific model implementation yourself:

```rust
use embed_anything::embeddings::embed::{Embedder, TextEmbedder};
use embed_anything::embeddings::local::jina::JinaEmbedder;

let jina_embedder = Embedder::Text(TextEmbedder::Jina(Box::new(JinaEmbedder::default())));
```
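The manual construction above nests enums: a top-level variant per modality wraps an inner variant per backend, which wraps a boxed model. A std-only sketch of that dispatch pattern follows; `ToyEmbedder`, `ToyTextEmbedder`, `Input`, and the two toy models are hypothetical stand-ins that only illustrate how a `match` can route an input to the right backend, not embed_anything’s actual types.

```rust
/// Hypothetical text model: embeds a string as [character count, vowel count].
#[derive(Default)]
pub struct ToyTextModel;

impl ToyTextModel {
    fn embed(&self, text: &str) -> Vec<f32> {
        let vowels = text.chars().filter(|c| "aeiouAEIOU".contains(*c)).count();
        vec![text.chars().count() as f32, vowels as f32]
    }
}

/// Hypothetical image model: embeds raw pixels as their mean intensity.
#[derive(Default)]
pub struct ToyImageModel;

impl ToyImageModel {
    fn embed(&self, pixels: &[u8]) -> Vec<f32> {
        if pixels.is_empty() {
            return vec![0.0];
        }
        let sum: u32 = pixels.iter().map(|&p| p as u32).sum();
        vec![sum as f32 / pixels.len() as f32]
    }
}

/// Inner enum: one variant per text backend, each holding a boxed model.
pub enum ToyTextEmbedder {
    Toy(Box<ToyTextModel>),
}

/// Input accepted by the top-level embedder.
pub enum Input<'a> {
    Text(&'a str),
    Image(&'a [u8]),
}

/// Top-level enum: one variant per modality.
pub enum ToyEmbedder {
    Text(ToyTextEmbedder),
    Vision(Box<ToyImageModel>),
}

impl ToyEmbedder {
    /// Routes the input to the matching backend; returns None on a modality mismatch.
    pub fn embed(&self, input: Input<'_>) -> Option<Vec<f32>> {
        match (self, input) {
            (ToyEmbedder::Text(ToyTextEmbedder::Toy(m)), Input::Text(t)) => Some(m.embed(t)),
            (ToyEmbedder::Vision(m), Input::Image(px)) => Some(m.embed(px)),
            _ => None,
        }
    }
}
```

Adding a backend in this pattern means adding a variant and a `match` arm; boxing the model keeps large model structs from inflating the size of every enum value.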

Generate embeddings

Example: Embed a file

Let’s see how embed_anything generates embeddings from a document. embed_file handles any supported file type; here the input is a PDF:

```rust
use embed_anything::{embed_file, embeddings::embed::EmbedderBuilder};
use embed_anything::config::TextEmbedConfig;

// `.await` and `?` must run inside an async function that returns a Result;
// anyhow::Result is assumed here for error handling.
async fn embed_document() -> anyhow::Result<()> {
    // Create an embedder using a pre-trained model from Hugging Face
    let embedder = EmbedderBuilder::new()
        .model_architecture("jina")
        .model_id(Some("jinaai/jina-embeddings-v2-small-en"))
        .from_pretrained_hf()?;
    let config = TextEmbedConfig::default();

    // Generate embeddings for any supported file type
    let embeddings = embed_file("document.pdf", &embedder, Some(&config), None).await?;
    Ok(())
}
```
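Conceptually, a call like embed_file runs a load, chunk, embed, and package pipeline. The std-only sketch below mirrors those stages on an in-memory string. `EmbedRecord`, `embed_text`, the blank-line paragraph splitter, and the letter-frequency `toy_embed` function are hypothetical stand-ins (no real model, no file I/O), kept only to show how each embedding stays paired with its chunk text and metadata.

```rust
use std::collections::HashMap;

/// Hypothetical output record: chunk text, its embedding, and metadata.
#[derive(Debug)]
pub struct EmbedRecord {
    pub text: String,
    pub embedding: Vec<f32>,
    pub metadata: HashMap<String, String>,
}

/// Stand-in "model": a 26-dimensional letter-frequency vector, L2-normalized.
pub fn toy_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 26];
    for c in text.chars().filter(|c| c.is_ascii_alphabetic()) {
        v[(c.to_ascii_lowercase() as u8 - b'a') as usize] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
    v
}

/// Splits on blank lines, embeds each non-empty paragraph, and attaches the
/// source name and chunk index as metadata.
pub fn embed_text(source: &str, content: &str) -> Vec<EmbedRecord> {
    content
        .split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .enumerate()
        .map(|(i, chunk)| {
            let mut metadata = HashMap::new();
            metadata.insert("source".to_string(), source.to_string());
            metadata.insert("chunk_index".to_string(), i.to_string());
            EmbedRecord { text: chunk.to_string(), embedding: toy_embed(chunk), metadata }
        })
        .collect()
}
```

Keeping the source and chunk index beside each vector is what lets a search hit be traced back to the original document.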

Modules

chunkers: Text chunking algorithm implementations.
config: Configuration structs and enums for embedding operations.
embeddings: Embedding model implementations and utilities.
file_loader: File discovery and parsing utilities.
file_processor: Audio file processing utilities.
models: Neural network model implementations for embedding generation.
s3_loader: S3 file loading utilities.
text_loader: Text loading and chunking utilities.
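The chunkers module implements text chunking. One common strategy, sketched below in std-only Rust, is a sliding window of words in which consecutive chunks overlap, so content cut at a boundary still appears whole in a neighboring chunk. `chunk_words`, `chunk_size`, and `overlap` are hypothetical names, and this is not necessarily the algorithm embed_anything’s chunkers use.

```rust
/// Splits `text` into chunks of at most `chunk_size` words, where consecutive
/// chunks share `overlap` words. Requires 0 <= overlap < chunk_size.
pub fn chunk_words(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0 && overlap < chunk_size, "need 0 <= overlap < chunk_size");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = chunk_size - overlap; // how far each window advances
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + chunk_size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last window already reaches the end of the text
        }
        start += step;
    }
    chunks
}
```

For example, seven words with `chunk_size = 3` and `overlap = 1` yield three chunks, each sharing one word with the next. Larger overlap preserves more context across boundaries at the cost of embedding more (partly duplicated) text.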

Enums

Dtype: Numerical precision types for model weights and computations.
FileLoadingError: Errors that can occur during file loading and processing.
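Choosing a lower-precision Dtype mainly trades numerical precision for memory. As a rough worked example covering weights only (ignoring activations, quantization scales, and runtime overhead), weight memory is approximately parameter count × bytes per element. The `Precision` enum below is hypothetical and does not reproduce the actual Dtype variants.

```rust
/// Hypothetical precision options, each with its storage cost per weight.
#[derive(Clone, Copy, Debug)]
pub enum Precision {
    F32,
    F16,
    BF16,
    Int8,
}

impl Precision {
    /// Bytes used to store one weight at this precision.
    pub fn bytes_per_weight(self) -> u64 {
        match self {
            Precision::F32 => 4,
            Precision::F16 | Precision::BF16 => 2,
            Precision::Int8 => 1,
        }
    }
}

/// Approximate weight memory in bytes (weights only).
pub fn weight_bytes(num_params: u64, precision: Precision) -> u64 {
    num_params * precision.bytes_per_weight()
}
```

By this estimate, a hypothetical 33-million-parameter model needs about 132 MB of weights in F32, 66 MB in F16, and 33 MB in 8-bit form.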

Functions

emb_audio: Embeds an audio file using the specified embedding model.
embed_directory_stream: Embeds text from files in a directory using the specified embedding model.
embed_file: Embeds the text from a file using the specified embedding model.
embed_files_batch: Embeds a list of files.
embed_image_directory: Embeds images in a directory using the specified embedding model.
embed_query: Embeds a list of queries using the specified embedding model.
embed_webpage: Embeds a webpage using the specified embedding model.
process_chunks: Processes text chunks into embeddings with metadata preservation.
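A typical use of embed_query is retrieval: embed the query, then rank stored document embeddings by similarity to it. The std-only sketch below assumes the embeddings are already available as plain `Vec<f32>` values (it does not call embed_anything) and ranks them by cosine similarity; `top_k` and `l2_norm` are hypothetical helpers.

```rust
/// Euclidean (L2) norm of a vector.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Returns up to `k` (document index, cosine score) pairs, highest score first.
pub fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let q_norm = l2_norm(query);
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| {
            let dot: f32 = query.iter().zip(d).map(|(a, b)| a * b).sum();
            let denom = q_norm * l2_norm(d);
            (i, if denom == 0.0 { 0.0 } else { dot / denom })
        })
        .collect();
    // Sort descending by score; total_cmp gives a total order even with NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

This brute-force scan is linear in the number of documents; a vector database replaces it with an index once the collection grows large.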