xz-embed 0.1.0

Text embedding and vector storage abstraction layer

xz-embed

Text embedding and vector storage abstraction — generate & search embeddings with pluggable backends

Features

  • Pluggable embedding backends — OpenAI (openai feature) and mock (for testing)
  • EmbeddingModel trait for custom model integration
  • Vector storage — in-memory (InMemoryVectorStore) and SQLite (SqliteVecStore, sqlite-vec feature)
  • Metadata filtering with compound expressions (Eq, Ne, In, And, Or, Not, etc.)
  • Concurrent batch embedding with retry and backoff (ConcurrentBatchManager)
  • Reciprocal Rank Fusion (RRF) for hybrid vector + keyword search
  • Vector quantization — scalar and product quantizers
  • Batch embedding request/response types
  • Index builder with configurable rebuild triggers
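
As a rough illustration of the scalar quantization listed above, the sketch below maps each f32 component of a vector to a u8 code using that vector's own min/max range. The names ScalarQuantized, quantize, and dequantize are hypothetical and are not the xz-embed API; the crate's quantizers may use a different scheme (for example, ranges learned over a whole dataset).

```rust
/// Hypothetical container for a scalar-quantized vector (not an xz-embed type).
#[derive(Debug, Clone, PartialEq)]
struct ScalarQuantized {
    min: f32,
    scale: f32, // width of one quantization step
    codes: Vec<u8>,
}

/// Map each component linearly from [min, max] onto the codes 0..=255.
fn quantize(v: &[f32]) -> ScalarQuantized {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Guard against a zero range (constant or empty vector).
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v
        .iter()
        .map(|x| ((x - min) / scale).round().clamp(0.0, 255.0) as u8)
        .collect();
    ScalarQuantized { min, scale, codes }
}

/// Approximate reconstruction; the error per component is at most one step.
fn dequantize(q: &ScalarQuantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}

fn main() {
    let original = vec![0.1_f32, 0.2, 0.3, 0.4];
    let q = quantize(&original);
    println!("codes: {:?}", q.codes);
    println!("restored: {:?}", dequantize(&q));
}
```

The trade-off is storage versus precision: each component shrinks from 4 bytes to 1, at the cost of a bounded rounding error.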

Quick Start

use xz_embed::{MockEmbedder, EmbeddingModel};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
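    // Dimension 4 so the declared size matches the canned output below
    // (assumes the first argument is the embedding dimension).
    let mut mock = MockEmbedder::new(4, 2048);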
    mock.set_output(vec![vec![0.1, 0.2, 0.3, 0.4]]);

    let vectors = mock.embed(&["Hello world"]).await?;
    println!("Dimensions: {}", vectors[0].len());

    // Or use the default embed_single helper
    let single = mock.embed_single("Hello world").await?;
    println!("Single: {}", single.len());

    Ok(())
}

With OpenAI (requires openai feature)

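// `EmbeddingModel` must be in scope to call `embed`
use xz_embed::{EmbeddingModel, OpenAiEmbedder};

// Inside an async fn that returns a Result; reads the API key from OPENAI_API_KEY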
let embedder = OpenAiEmbedder::from_env()?;
let vectors = embedder.embed(&["Hello world"]).await?;

Vector storage & search

use std::collections::HashMap;
use xz_embed::{
    EmbeddingModel, InMemoryVectorStore, MockEmbedder, StoreLifecycle, VectorEntry, VectorStore,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let store = InMemoryVectorStore::new(4);
    store.initialize().await?;

    let mut mock = MockEmbedder::new(4, 32);
    mock.set_output(vec![vec![0.1, 0.2, 0.3, 0.4]]);

    let vectors = mock.embed(&["Rust programming"]).await?;

    let entry = VectorEntry {
        id: "doc-1".into(),
        vector: vectors[0].clone(),
        metadata: HashMap::from([("lang".into(), "rust".into())]),
        content: Some("Rust programming".into()),
        created_at: 0,
        expires_at: None,
        channel: Some("docs".into()),
    };
    store.insert(entry).await?;

    let query = mock.embed(&["systems programming"]).await?;
    let results = store.search(&query[0], 10).await?;

    for r in &results {
        println!("[{:.4}] {} {:?}", r.score, r.id, r.content);
    }

    Ok(())
}
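
The README does not say which similarity metric the stores use. As a minimal sketch, assuming cosine similarity and a brute-force scan, a top-k search over in-memory vectors could look like the following. The functions cosine_similarity and top_k are illustrative names, not part of xz-embed.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every stored vector against the query and keep the `k` best.
fn top_k<'a>(
    query: &[f32],
    entries: &'a [(String, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = entries
        .iter()
        .map(|(id, v)| (id.as_str(), cosine_similarity(query, v)))
        .collect();
    // Highest score first; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    let entries: Vec<(String, Vec<f32>)> = vec![
        ("doc-1".to_string(), vec![1.0, 0.0]),
        ("doc-2".to_string(), vec![0.0, 1.0]),
        ("doc-3".to_string(), vec![1.0, 1.0]),
    ];
    for (id, score) in top_k(&[1.0, 0.1], &entries, 2) {
        println!("[{:.4}] {}", score, id);
    }
}
```

A linear scan like this is fine for small collections; larger ones typically call for an index.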

Metadata-filtered search

use xz_embed::MetadataFilter;

let filter = MetadataFilter::and([
    MetadataFilter::in_values("lang", &["rust", "go"]),
    MetadataFilter::ne("status", "archived"),
]);

// `store` and `query` come from the previous example; `query[0]` is the query vector
let results = store.search_with_filter(&query[0], &filter, 10).await?;
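
To show how a compound filter like the one above can be evaluated, here is a small self-contained sketch. The Filter enum and evaluate function are hypothetical stand-ins, not xz-embed's MetadataFilter, and the handling of missing keys (a missing key fails Eq and In but passes Ne) is an assumption rather than documented crate behavior.

```rust
use std::collections::HashMap;

/// Hypothetical filter expression tree (a stand-in, not xz-embed's `MetadataFilter`).
#[allow(dead_code)]
enum Filter {
    Eq(String, String),
    Ne(String, String),
    In(String, Vec<String>),
    And(Vec<Filter>),
    Or(Vec<Filter>),
    Not(Box<Filter>),
}

/// Recursively evaluate a filter against one entry's metadata.
/// Assumption: a missing key never satisfies `Eq` or `In` and always satisfies `Ne`.
fn evaluate(filter: &Filter, meta: &HashMap<String, String>) -> bool {
    match filter {
        Filter::Eq(k, v) => meta.get(k) == Some(v),
        Filter::Ne(k, v) => meta.get(k) != Some(v),
        Filter::In(k, vs) => meta.get(k).map_or(false, |m| vs.contains(m)),
        Filter::And(fs) => fs.iter().all(|f| evaluate(f, meta)),
        Filter::Or(fs) => fs.iter().any(|f| evaluate(f, meta)),
        Filter::Not(f) => !evaluate(f, meta),
    }
}

fn main() {
    // Same shape as the filter above: lang in {rust, go} AND status != archived.
    let filter = Filter::And(vec![
        Filter::In("lang".into(), vec!["rust".into(), "go".into()]),
        Filter::Ne("status".into(), "archived".into()),
    ]);
    let meta = HashMap::from([("lang".to_string(), "rust".to_string())]);
    println!("matches: {}", evaluate(&filter, &meta));
}
```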

Hybrid search with RRF fusion

use xz_embed::rrf_fusion;

// Fuse vector results with keyword (BM25) results
let fused = rrf_fusion(&vector_results, &keyword_results, 60.0);
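
For readers unfamiliar with Reciprocal Rank Fusion: each document's fused score is the sum, over the input rankings, of 1 / (k + rank), where rank starts at 1. The sketch below implements that formula over plain lists of ids. It is not the crate's rrf_fusion, whose argument and result types are not shown here, and reading the 60.0 above as the constant k is an assumption.

```rust
use std::collections::HashMap;

/// Fuse two ranked lists of document ids with Reciprocal Rank Fusion.
/// Each list contributes 1 / (k + rank) per document, with rank starting at 1.
fn rrf(vector_ids: &[&str], keyword_ids: &[&str], k: f32) -> Vec<(String, f32)> {
    let mut scores: HashMap<String, f32> = HashMap::new();
    for list in [vector_ids, keyword_ids] {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f32);
        }
    }
    let mut fused: Vec<(String, f32)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    let vector_results = ["doc-1", "doc-2", "doc-3"];
    let keyword_results = ["doc-3", "doc-1", "doc-4"];
    for (id, score) in rrf(&vector_results, &keyword_results, 60.0) {
        println!("{}: {:.5}", id, score);
    }
}
```

Because only ranks are used, the two input lists do not need comparable score scales, which is why RRF suits mixing vector similarity with BM25.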

Feature Flags

Feature      Description                         Default
openai       OpenAI embedding API via reqwest    off
sqlite-vec   SQLite-backed vector store          on

License

MIT OR Apache-2.0