Crate embed_anything

embed_anything is a minimalist, fast, and lightweight embedding pipeline that runs locally and handles multiple sources and modalities.

Whether you’re working with text, images, audio, PDFs, websites, or other media, embed_anything generates embeddings from these sources and streams them to a vector database, so indexing stays memory-efficient.
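
The streaming idea can be illustrated without the crate itself. The sketch below embeds chunks in fixed-size batches and hands each batch to a sink (for example, a vector database client) before the next batch is produced, so only one batch of embeddings is held in memory at a time. `toy_embed`, `embed_in_batches`, and the sink closure are hypothetical names used for illustration only; they are not part of embed_anything's API.

```rust
/// Hypothetical stand-in for a real model: maps text to a tiny 2-d vector
/// (byte length, vowel count). Not part of embed_anything.
fn toy_embed(text: &str) -> Vec<f32> {
    let len = text.len() as f32;
    let vowels = text.chars().filter(|c| "aeiouAEIOU".contains(*c)).count() as f32;
    vec![len, vowels]
}

/// Embeds `chunks` in batches of `batch_size`, passing each batch to `sink`.
/// Ownership of the batch moves into `sink`, so nothing accumulates here.
/// Returns the number of batches sent.
fn embed_in_batches<F>(chunks: &[&str], batch_size: usize, mut sink: F) -> usize
where
    F: FnMut(Vec<(String, Vec<f32>)>),
{
    assert!(batch_size > 0, "batch_size must be positive");
    let mut batches = 0;
    for batch in chunks.chunks(batch_size) {
        // Embed only the current batch.
        let embedded: Vec<(String, Vec<f32>)> = batch
            .iter()
            .map(|text| (text.to_string(), toy_embed(text)))
            .collect();
        // Hand the batch off (e.g. upsert into a vector database).
        sink(embedded);
        batches += 1;
    }
    batches
}
```

Calling `embed_in_batches(&chunks, 32, |batch| { /* upsert batch */ })` would keep peak memory proportional to the batch size rather than to the whole corpus.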

It supports dense, sparse, and late-interaction embeddings, as well as models in ONNX format, offering flexibility for a wide range of use cases.
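
As a rough guide to how these embedding families differ, here is a minimal sketch of how each is typically scored at query time. It uses only the standard library; the function names are illustrative and are not functions exported by embed_anything. The late-interaction scorer follows the ColBERT-style MaxSim formulation, which is assumed here as a representative example.

```rust
use std::collections::HashMap;

/// Dense: one fixed-length vector per input, commonly compared by cosine similarity.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Sparse: only non-zero dimensions are stored (index -> weight);
/// the score is a dot product over the indices both sides share.
fn sparse_dot(a: &HashMap<u32, f32>, b: &HashMap<u32, f32>) -> f32 {
    a.iter()
        .filter_map(|(idx, w)| b.get(idx).map(|v| w * v))
        .sum()
}

/// Late interaction (MaxSim): one vector per token. Each query token takes
/// its best-matching document token, and those maxima are summed.
fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    if doc.is_empty() {
        return 0.0;
    }
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| cosine(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

Dense vectors suit semantic similarity, sparse vectors keep exact-term signals, and late interaction trades extra storage (one vector per token) for finer-grained matching.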

§Usage

§Creating an Embedder

To get started, you’ll need to create an Embedder for the type of content you want to embed. We offer some utility functions to streamline creating embedders from various sources, such as Embedder::from_pretrained_hf, Embedder::from_pretrained_onnx, and Embedder::from_pretrained_cloud. You can use any of these to quickly create an Embedder like so:

use embed_anything::embeddings::embed::Embedder;

// Create a local CLIP embedder from a Hugging Face model
let clip_embedder = Embedder::from_pretrained_hf("CLIP", "jina-clip-v2", None);

// Create a cloud OpenAI embedder
let openai_embedder = Embedder::from_pretrained_cloud("OpenAI", "text-embedding-3-small", Some("my-api-key".to_string()));

If needed, you can also construct an Embedder manually by wrapping a specific model in the matching enum variants. Here’s an example:

use embed_anything::embeddings::embed::{Embedder, TextEmbedder};
use embed_anything::embeddings::local::jina::JinaEmbedder;

let jina_embedder = Embedder::Text(TextEmbedder::Jina(Box::new(JinaEmbedder::default())));
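
The nested-variant construction above follows a common Rust pattern: an outer enum selects the modality, an inner enum selects the concrete model, and larger models are boxed so the enum itself stays small. The sketch below reproduces that pattern with toy types. Every name in it (`ToyEmbedder`, `ToyTextEmbedder`, `ConstantModel`, `LengthModel`) is hypothetical and does not reflect the crate's actual definitions.

```rust
/// Hypothetical model: returns the same vector of `dim` ones for any input.
struct ConstantModel {
    dim: usize,
}

impl Default for ConstantModel {
    fn default() -> Self {
        ConstantModel { dim: 4 }
    }
}

/// Hypothetical model: embeds text as its byte length.
struct LengthModel;

/// Inner enum: one variant per concrete text model.
enum ToyTextEmbedder {
    // Boxed so a large model does not inflate the size of every variant.
    Constant(Box<ConstantModel>),
    Length(LengthModel),
}

/// Outer enum: one variant per modality (only text in this sketch).
enum ToyEmbedder {
    Text(ToyTextEmbedder),
}

impl ToyEmbedder {
    /// Dispatches to the wrapped model by matching on both enum layers.
    fn embed(&self, text: &str) -> Vec<f32> {
        match self {
            ToyEmbedder::Text(ToyTextEmbedder::Constant(m)) => vec![1.0; m.dim],
            ToyEmbedder::Text(ToyTextEmbedder::Length(_)) => vec![text.len() as f32],
        }
    }
}
```

With these toy types, `ToyEmbedder::Text(ToyTextEmbedder::Constant(Box::new(ConstantModel::default())))` mirrors the shape of the manual construction shown above.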

§Generate embeddings

§Example: Embed a text file

Let’s see how embed_anything can help us generate embeddings from a plain text file:

use embed_anything::embed_file;
use embed_anything::embeddings::embed::{Embedder, TextEmbedder};
use embed_anything::embeddings::local::jina::JinaEmbedder;

// Create an Embedder for text. We support a variety of models out-of-the-box, including cloud-based models!
let embedder = Embedder::Text(TextEmbedder::Jina(Box::new(JinaEmbedder::default())));
// Generate embeddings for 'path/to/file.txt' using the embedder we just created.
let embedding = embed_file("path/to/file.txt", &embedder, None, None);

Modules§

chunkers
config
embeddings
This module contains the different embedding models that can be used to generate embeddings for the text data.
file_loader
file_processor
models
tesseract
text_loader

Enums§

Dtype

Functions§

emb_audio
embed_directory_stream
Embeds text from files in a directory using the specified embedding model.
embed_file
Embeds the text from a file using the specified embedding model.
embed_files_batch
Embeds a list of files.
embed_html
Embeds an HTML document using the specified embedding model.
embed_image_directory
Embeds images in a directory using the specified embedding model.
embed_query
Embeds a list of queries using the specified embedding model.
embed_webpage
Embeds a webpage using the specified embedding model.
process_chunks