embed_anything is a minimalist, lightweight, high-performance embedding pipeline that runs locally and handles multiple sources and modalities.
Whether you’re working with text, images, audio, PDFs, websites, or other media, embed_anything streamlines generating embeddings from these sources and streaming them to a vector database, which keeps indexing memory-efficient.
It supports dense, sparse, and late-interaction embeddings, as well as models run through ONNX, offering flexibility for a wide range of use cases.
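To make the difference between dense, sparse, and late-interaction embeddings concrete, here is a minimal sketch of how each kind is typically scored at query time. It uses only the standard library; the function names (`cosine_similarity`, `sparse_dot`, `max_sim`) are illustrative assumptions and are not part of embed_anything's API.

```rust
use std::collections::HashMap;

/// Dense: one fixed-length vector per input, compared with cosine similarity.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // avoid dividing by zero for all-zero vectors
    }
    dot / (norm_a * norm_b)
}

/// Sparse: only non-zero weights are stored, keyed by token id.
/// The score is the dot product over ids present in both maps.
fn sparse_dot(a: &HashMap<u32, f32>, b: &HashMap<u32, f32>) -> f32 {
    a.iter()
        .filter_map(|(id, wa)| b.get(id).map(|wb| wa * wb))
        .sum()
}

/// Late interaction (ColBERT-style MaxSim): one vector per token.
/// Each query token takes its best match among the document tokens,
/// and the per-token maxima are summed.
fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    if doc.is_empty() {
        return 0.0;
    }
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| cosine_similarity(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

Dense vectors suit semantic similarity, sparse vectors keep exact-term signals, and late interaction trades extra storage (one vector per token) for finer-grained matching.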
§Usage
§Creating an Embedder
To get started, you’ll need to create an Embedder for the type of content you want to embed. We offer some utility functions to streamline creating embedders from various sources, such as Embedder::from_pretrained_hf, Embedder::from_pretrained_onnx, and Embedder::from_pretrained_cloud. You can use any of these to quickly create an Embedder like so:
use embed_anything::embeddings::embed::Embedder;
// Create a local CLIP embedder from a Hugging Face model
let clip_embedder = Embedder::from_pretrained_hf("CLIP", "jina-clip-v2", None);
// Create a cloud OpenAI embedder (the model must be an embedding model, not a chat model)
let openai_embedder = Embedder::from_pretrained_cloud("OpenAI", "text-embedding-3-small", Some("my-api-key".to_string()));
If needed, you can also construct an Embedder manually by wrapping a concrete model in the matching enum variants, which lets you plug in your own embedder. For example:
use embed_anything::embeddings::embed::{Embedder, TextEmbedder};
use embed_anything::embeddings::local::jina::JinaEmbedder;
let jina_embedder = Embedder::Text(TextEmbedder::Jina(Box::new(JinaEmbedder::default())));
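The nesting in the manual construction above (an outer variant for the modality, an inner variant for the model family, and a boxed model) is a common enum-dispatch pattern. The following toy sketch shows how such a wrapper can route one `embed` call to whichever backend it holds. Every type here (`ToyJina`, `ToyCloud`, `ToyTextEmbedder`, `ToyEmbedder`) is a hypothetical stand-in, not the crate's actual definition, and the vectors are fake.

```rust
/// Hypothetical local model; returns a fake 3-dimensional vector.
#[derive(Default)]
struct ToyJina {
    scale: f32,
}

impl ToyJina {
    fn embed(&self, text: &str) -> Vec<f32> {
        let len = text.chars().count() as f32;
        vec![len * self.scale, len, 1.0]
    }
}

/// Hypothetical cloud-backed model; returns a zero vector of `dims` entries.
struct ToyCloud {
    dims: usize,
}

impl ToyCloud {
    fn embed(&self, _text: &str) -> Vec<f32> {
        vec![0.0; self.dims]
    }
}

/// Inner enum: one variant per text model family.
/// Boxing the larger model keeps the enum itself small.
enum ToyTextEmbedder {
    Jina(Box<ToyJina>),
    Cloud(ToyCloud),
}

/// Outer enum: one variant per modality (only text in this sketch).
enum ToyEmbedder {
    Text(ToyTextEmbedder),
}

impl ToyEmbedder {
    /// Dispatches to the wrapped model with a single `match`.
    fn embed(&self, text: &str) -> Vec<f32> {
        match self {
            ToyEmbedder::Text(ToyTextEmbedder::Jina(model)) => model.embed(text),
            ToyEmbedder::Text(ToyTextEmbedder::Cloud(model)) => model.embed(text),
        }
    }
}
```

With this shape, `ToyEmbedder::Text(ToyTextEmbedder::Jina(Box::new(ToyJina::default())))` mirrors the construction shown above, and callers only ever see the outer type.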
§Generate embeddings
§Example: Embed a text file
Let’s see how embed_anything can help us generate embeddings from a plain text file:
use embed_anything::embed_file;
use embed_anything::embeddings::embed::{Embedder, TextEmbedder};
use embed_anything::embeddings::local::jina::JinaEmbedder;
// Create an Embedder for text. We support a variety of models out-of-the-box, including cloud-based models!
let embedder = Embedder::Text(TextEmbedder::Jina(Box::new(JinaEmbedder::default())));
// Generate embeddings for 'path/to/file.txt' using the embedder we just created.
let embedding = embed_file("path/to/file.txt", &embedder, None, None);
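The overview describes streaming embeddings to a vector database so that indexing stays memory-efficient. A minimal sketch of that idea, under stated assumptions, is to embed chunks one at a time and hand them to a sink callback in fixed-size batches, so that at most one batch is held in memory. The names `Record` and `stream_embeddings` and the callback shapes are hypothetical and do not describe embed_anything's own adapter interface.

```rust
/// Hypothetical record pairing a chunk of text with its embedding.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    text: String,
    vector: Vec<f32>,
}

/// Embeds `chunks` lazily and passes them to `sink` in batches of at most
/// `batch_size`. Returns the total number of chunks embedded.
fn stream_embeddings<I, E, S>(chunks: I, mut embed: E, batch_size: usize, mut sink: S) -> usize
where
    I: IntoIterator<Item = String>,
    E: FnMut(&str) -> Vec<f32>,
    S: FnMut(Vec<Record>),
{
    assert!(batch_size > 0, "batch_size must be positive");
    let mut buffer = Vec::with_capacity(batch_size);
    let mut total = 0;
    for text in chunks {
        let vector = embed(text.as_str());
        buffer.push(Record { text, vector });
        total += 1;
        if buffer.len() == batch_size {
            // Hand the full batch to the sink and start a fresh buffer.
            sink(std::mem::replace(&mut buffer, Vec::with_capacity(batch_size)));
        }
    }
    // Flush whatever is left over after the last full batch.
    if !buffer.is_empty() {
        sink(buffer);
    }
    total
}
```

In practice the sink would be a closure that upserts each batch into a vector database; here it can be any `FnMut(Vec<Record>)`, which makes the batching logic easy to test in isolation.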
Modules§
- chunkers
- config
- embeddings - This module contains the different embedding models that can be used to generate embeddings for the text data.
- file_loader
- file_processor
- models
- tesseract
- text_loader
Enums§
Functions§
- emb_audio
- embed_directory_stream - Embeds text from files in a directory using the specified embedding model.
- embed_file - Embeds the text from a file using the specified embedding model.
- embed_files_batch - Embeds a list of files.
- embed_html - Embeds an HTML document using the specified embedding model.
- embed_image_directory - Embeds images in a directory using the specified embedding model.
- embed_query - Embeds a list of queries using the specified embedding model.
- embed_webpage - Embeds a webpage using the specified embedding model.
- process_chunks