embed_anything is a minimalist, lightweight, high-performance embedding pipeline that runs locally and handles multiple sources and modalities.
Whether you’re working with text, images, audio, PDFs, websites, or other media, embed_anything streamlines generating embeddings from these sources and streaming them to a vector database, so that indexing stays memory-efficient.
It supports dense, sparse, ONNX and late-interaction embeddings, offering flexibility for a wide range of use cases.
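To make the difference between these embedding kinds concrete, here is a minimal sketch of how each is typically scored against a query. It uses only the standard library and plain `Vec<f32>` / `HashMap` values; the function names `cosine`, `sparse_dot`, and `max_sim` are illustrative and are not part of the embed_anything API.

```rust
use std::collections::HashMap;

/// Dense: one fixed-length vector per text, compared with cosine similarity.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Sparse: token id -> weight. Only ids present in both maps contribute.
fn sparse_dot(a: &HashMap<u32, f32>, b: &HashMap<u32, f32>) -> f32 {
    a.iter()
        .filter_map(|(id, w)| b.get(id).map(|v| w * v))
        .sum()
}

/// Late interaction (ColBERT-style MaxSim): one vector per token. Each query
/// token is matched to its most similar document token, and the maxima are summed.
fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    if doc.is_empty() {
        return 0.0;
    }
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| cosine(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

Dense scoring compares whole texts at once, sparse scoring rewards shared vocabulary, and late interaction keeps token-level detail at the cost of storing one vector per token.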
§Usage
§Creating an Embedder
To get started, you’ll need to create an Embedder for the type of content you want to embed. We offer some utility functions to streamline creating embedders from various sources, such as Embedder::from_pretrained_hf, Embedder::from_pretrained_onnx, and Embedder::from_pretrained_cloud. You can use any of these to quickly create an Embedder like so:
use embed_anything::embeddings::embed::Embedder;
// Create a local CLIP embedder from a Hugging Face model
let clip_embedder = Embedder::from_pretrained_hf("CLIP", "jina-clip-v2", None);
// Create a cloud OpenAI embedder
let openai_embedder = Embedder::from_pretrained_cloud("OpenAI", "text-embedding-3-small", Some("my-api-key".to_string()));
If needed, you can also construct an Embedder manually, which lets you plug in your own embedder. Here’s an example:
use embed_anything::embeddings::embed::{Embedder, TextEmbedder};
use embed_anything::embeddings::local::jina::JinaEmbedder;
let jina_embedder = Embedder::Text(TextEmbedder::Jina(Box::new(JinaEmbedder::default())));
§Generate embeddings
§Example: Embed a text file
Let’s see how embed_anything can help us generate embeddings from a file, such as a plain text file or a PDF:
use embed_anything::{embed_file, embeddings::embed::EmbedderBuilder};
use embed_anything::config::TextEmbedConfig;
// Create an embedder using a pre-trained model from Hugging Face
let embedder = EmbedderBuilder::new()
    .model_architecture("jina")
    .model_id(Some("jinaai/jina-embeddings-v2-small-en"))
    .from_pretrained_hf()?;
let config = TextEmbedConfig::default();
// Generate embeddings for any supported file type
let embeddings = embed_file("document.pdf", &embedder, Some(&config), None).await?;
Modules§
- chunkers
- Text chunking algorithm implementations.
- config
- Configuration structs and enums for embedding operations.
- embeddings
- Embedding model implementations and utilities.
- file_loader
- File discovery and parsing utilities.
- file_processor
- Audio file processing utilities.
- models
- Neural network model implementations for embedding generation.
- text_loader
- Text loading and chunking utilities.
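As a rough illustration of what a text chunker does before embedding, here is a minimal sketch of fixed-size chunking with overlap. It splits on whitespace for simplicity. The function `chunk_words` and its parameters are hypothetical and are not taken from the chunkers or text_loader modules, which may use different strategies.

```rust
/// Splits `text` into chunks of at most `size` whitespace-separated words,
/// with `overlap` words shared between consecutive chunks.
fn chunk_words(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "need size > 0 and overlap < size");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}
```

The overlap keeps some context shared between neighbouring chunks, so a sentence cut at a chunk boundary is still partly visible on both sides.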
Enums§
- Dtype
- Numerical precision types for model weights and computations.
- FileLoadingError
- Errors that can occur during file loading and processing.
Functions§
- emb_audio
- embed_directory_stream
- Embeds text from files in a directory using the specified embedding model.
- embed_file
- Embeds the text from a file using the specified embedding model.
- embed_files_batch
- Embeds a list of files.
- embed_image_directory
- Embeds images in a directory using the specified embedding model.
- embed_query
- Embeds a list of queries using the specified embedding model.
- embed_webpage
- Embeds the content of a webpage using the specified embedding model.
- process_chunks
- Processes text chunks into embeddings with metadata preservation.
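The memory-efficient streaming mentioned in the description can be pictured as a general pattern: embeddings are handed to a sink (for example, a vector database writer) in bounded batches instead of being collected all at once. The sketch below shows that pattern with the standard library only. `stream_in_batches` and its `sink` callback are hypothetical names and do not reproduce the crate's own implementation or signatures.

```rust
/// Feeds `items` to `sink` in batches of at most `batch_size`, so that only
/// one batch is buffered at a time. Returns the number of items delivered.
fn stream_in_batches<T, I, F>(items: I, batch_size: usize, mut sink: F) -> usize
where
    I: IntoIterator<Item = T>,
    F: FnMut(Vec<T>),
{
    assert!(batch_size > 0, "batch_size must be positive");
    let mut batch = Vec::with_capacity(batch_size);
    let mut total = 0;
    for item in items {
        batch.push(item);
        if batch.len() == batch_size {
            total += batch.len();
            // Hand the full batch to the sink and start a fresh buffer.
            sink(std::mem::replace(&mut batch, Vec::with_capacity(batch_size)));
        }
    }
    if !batch.is_empty() {
        // Flush the final, partially filled batch.
        total += batch.len();
        sink(batch);
    }
    total
}
```

In practice the items would be embedding records and the sink would upsert each batch into the index, which bounds peak memory by the batch size rather than by the size of the corpus.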