Crate embed_anything


embed_anything is a minimalist, lightweight, high-performance embedding pipeline that runs locally and handles multiple sources and modalities.

Whether you’re working with text, images, audio, PDFs, websites, or other media, embed_anything generates embeddings from each source and streams them to a vector database, so indexing stays memory-efficient.
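To illustrate what streaming embeddings to a vector database means in practice, here is a minimal, standard-library-only sketch. It is not the crate's implementation: `EmbeddedChunk`, `stream_embeddings`, the `embed` closure, and the `sink` closure are all hypothetical names chosen for this example. The idea it shows is that each batch is embedded and handed off before the next one is started, so only one batch of vectors is held at a time.

```rust
/// Hypothetical stand-in for an embedded chunk: the source text plus its vector.
#[derive(Debug, Clone, PartialEq)]
struct EmbeddedChunk {
    text: String,
    vector: Vec<f32>,
}

/// Embeds `chunks` in batches of at most `batch_size` and passes each finished
/// batch to `sink` (for example, a vector database upsert) before embedding
/// the next batch. Returns the number of batches that were flushed.
fn stream_embeddings<E, S>(
    chunks: &[&str],
    batch_size: usize,
    mut embed: E,
    mut sink: S,
) -> usize
where
    E: FnMut(&str) -> Vec<f32>,
    S: FnMut(Vec<EmbeddedChunk>),
{
    assert!(batch_size > 0, "batch_size must be positive");
    let mut flushed_batches = 0;
    for batch in chunks.chunks(batch_size) {
        // Embed only the current batch.
        let embedded: Vec<EmbeddedChunk> = batch
            .iter()
            .map(|&text| EmbeddedChunk {
                text: text.to_string(),
                vector: embed(text),
            })
            .collect();
        // Ownership moves to the sink, so nothing accumulates in this function.
        sink(embedded);
        flushed_batches += 1;
    }
    flushed_batches
}
```

In a real pipeline the `embed` closure would call a model and the `sink` closure would write to the database; here both are left to the caller so the batching logic can be read on its own.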

It supports dense, sparse, and late-interaction embeddings, as well as models run through ONNX, offering flexibility for a wide range of use cases.
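The three embedding families differ mainly in how a query is scored against a document. The sketch below shows one common scoring rule for each, using only the standard library. It is an illustration under assumptions rather than code from this crate: the function names are made up for the example, the sparse vector is modelled as a map from term index to weight, and the late-interaction score is the MaxSim sum popularised by ColBERT-style models.

```rust
use std::collections::HashMap;

/// Dense scoring: cosine similarity between two vectors of equal length.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Sparse scoring: dot product over the term indices present in both vectors.
fn sparse_dot(a: &HashMap<u32, f32>, b: &HashMap<u32, f32>) -> f32 {
    // Iterate over the smaller map and look each index up in the larger one.
    let (small, large) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    small
        .iter()
        .filter_map(|(idx, w)| large.get(idx).map(|v| w * v))
        .sum()
}

/// Late-interaction scoring (MaxSim): for each query token vector, take its
/// best cosine similarity against any document token vector, then sum those
/// maxima over the query tokens. An empty document scores 0.0.
fn max_sim_score(query_tokens: &[Vec<f32>], doc_tokens: &[Vec<f32>]) -> f32 {
    query_tokens
        .iter()
        .map(|q| {
            doc_tokens
                .iter()
                .map(|d| cosine_similarity(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .filter(|best| best.is_finite())
        .sum()
}
```

A dense model yields one vector per text, a sparse model yields a few weighted term indices, and a late-interaction model yields one vector per token, which is why the last function takes slices of vectors instead of a single vector.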

§Usage

§Creating an Embedder

To get started, you’ll need to create an Embedder for the type of content you want to embed. We offer some utility functions to streamline creating embedders from various sources, such as Embedder::from_pretrained_hf, Embedder::from_pretrained_onnx, and Embedder::from_pretrained_cloud. You can use any of these to quickly create an Embedder like so:

use embed_anything::embeddings::embed::Embedder;

// Create a local CLIP embedder from a Hugging Face model
let clip_embedder = Embedder::from_pretrained_hf("CLIP", "jina-clip-v2", None);

// Create a cloud OpenAI embedder
let openai_embedder = Embedder::from_pretrained_cloud("OpenAI", "text-embedding-3-small", Some("my-api-key".to_string()));

If needed, you can also construct an Embedder manually, which lets you plug in your own embedder. Here’s an example:

use embed_anything::embeddings::embed::{Embedder, TextEmbedder};
use embed_anything::embeddings::local::jina::JinaEmbedder;

let jina_embedder = Embedder::Text(TextEmbedder::Jina(Box::new(JinaEmbedder::default())));

§Generate embeddings

§Example: Embed a text file

Let’s see how embed_anything can help us generate embeddings from the text of a file (here, a PDF):

use embed_anything::{embed_file, embeddings::embed::EmbedderBuilder};
use embed_anything::config::TextEmbedConfig;

// Create an embedder using a pre-trained model from Hugging Face
let embedder = EmbedderBuilder::new()
    .model_architecture("jina")
    .model_id(Some("jinaai/jina-embeddings-v2-small-en"))
    .from_pretrained_hf()?;
let config = TextEmbedConfig::default();

// Generate embeddings for any supported file type
let embeddings = embed_file("document.pdf", &embedder, Some(&config), None).await?;

Modules§

chunkers
Text chunking algorithm implementations.
config
Configuration structs and enums for embedding operations.
embeddings
Embedding model implementations and utilities.
file_loader
File discovery and parsing utilities.
file_processor
Audio file processing utilities.
models
Neural network model implementations for embedding generation.
text_loader
Text loading and chunking utilities.

Enums§

Dtype
Numerical precision types for model weights and computations.
FileLoadingError
Errors that can occur during file loading and processing.

Functions§

emb_audio
embed_directory_stream
Embeds text from files in a directory using the specified embedding model.
embed_file
Embeds the text from a file using the specified embedding model.
embed_files_batch
Embeds a list of files.
embed_image_directory
Embeds images in a directory using the specified embedding model.
embed_query
Embeds a list of queries using the specified embedding model.
embed_webpage
Embeds a webpage using the specified embedding model.
process_chunks
Processes text chunks into embeddings with metadata preservation.