embedrs 0.2.0

Unified embedding for Rust -- cloud APIs (OpenAI, Cohere, Gemini, Voyage, Jina) + local inference, one interface, opinionated defaults.

Design philosophy

"好用就好用" -- just works. Pick one best default, backed by data.

  • embedrs::local()? -- all-MiniLM-L6-v2 (23MB, free, no API key)
  • embedrs::cloud(key) -- OpenAI text-embedding-3-small (best discrimination, cheapest cloud)
  • Both produce the same EmbedResult -- write code once, switch backends in one line

Defaults chosen by 8-dimension benchmark across 8 models. See benchrs for full methodology.

Quick Start

// cloud -- one key, done
let client = embedrs::cloud("sk-...");
let result = client.embed(vec!["hello world".into()]).await?;
println!("dimensions: {}", result.embeddings[0].len());
// local -- zero config, free, 23MB model downloaded on first use
let client = embedrs::local()?;
let result = client.embed(vec!["hello world".into()]).await?;

Installation

[dependencies]
embedrs = "0.2"

# enable local inference (~23MB model downloaded on first use)
embedrs = { version = "0.2", features = ["local"] }

Benchmark Results

8 dimensions, 184 unique texts. Full methodology and reproduction instructions in benchrs.

| Metric         | MiniLM-L6 | MiniLM-L12 | BGE-small | GTE-small | OpenAI   | Gemini    | Cohere   | Voyage   |
|----------------|-----------|------------|-----------|-----------|----------|-----------|----------|----------|
| Size           | 23MB      | 133MB      | 133MB     | 67MB      | cloud    | cloud     | cloud    | cloud    |
| Spearman ρ     | 0.81      | 0.84       | 0.71      | 0.75      | 0.91     | 0.94      | 0.91     | 0.89     |
| Discrimination | 0.52      | 0.52       | 0.29      | 0.14      | 0.58     | 0.30      | 0.46     | 0.45     |
| Retrieval      | 100%      | 100%       | 89%       | 100%      | 100%     | 89%       | 100%     | 89%      |
| EN ρ           | 0.92      | 0.94       | 0.92      | 0.90      | 0.91     | 0.91      | 0.89     | 0.88     |
| ZH ρ           | 0.65      | 0.74       | 0.45      | 0.40      | 0.88     | 0.99      | 0.93     | 0.89     |
| JA ρ           | 0.60      | 0.90       | 0.20      | 0.50      | 0.90     | 1.00      | 1.00     | 0.90     |
| Cross-lingual  | 0.25      | 0.26       | 0.66      | 0.81      | 0.71     | 0.84      | 0.68     | 0.85     |
| Robustness     | 0.89      | 0.90       | 0.94      | 0.97      | 0.88     | 0.94      | 0.89     | 0.95    |
| Cluster sep.   | 8.73x     | 4.38x      | 1.29x     | 1.09x     | 2.55x    | 1.11x     | 1.41x    | 1.30x    |
| Cost           | $0        | $0         | $0        | $0        | $0.02/1M | free tier | $0.10/1M | $0.06/1M |

Why MiniLM-L6 for local

  • 23MB -- the only model small enough to bundle into an application (the others are 67-133MB)
  • Best clustering separation at 8.73x (2nd place is 4.38x)
  • 100% retrieval accuracy, EN ρ=0.92
  • 12-layer models are 3-6x larger with no meaningful quality improvement
  • Known weakness: poor on Chinese/Japanese (ρ=0.60-0.65) and cross-lingual (0.25)

Why OpenAI for cloud

  • Best discrimination gap at 0.58 (dissimilar texts avg cosine = 0.09, closest to zero)
  • 100% retrieval accuracy, MRR=1.0
  • Balanced multilingual: EN=0.91, ZH=0.88, JA=0.90 -- no weak language
  • Cheapest cloud option at $0.02/1M tokens
  • Gemini has a higher ρ (0.94) but poor discrimination (0.30) and a retrieval miss (89% vs 100%)
  • Cohere matches quality but costs 5x more ($0.10/1M tokens)

Providers

| Provider      | Constructor             | Default Model          | Max Batch Size |
|---------------|-------------------------|------------------------|----------------|
| OpenAI        | Client::openai(key)     | text-embedding-3-small | 2048           |
| Cohere        | Client::cohere(key)     | embed-v4.0             | 96             |
| Google Gemini | Client::gemini(key)     | gemini-embedding-001   | 100            |
| Voyage AI     | Client::voyage(key)     | voyage-3-large         | 128            |
| Jina AI       | Client::jina(key)       | jina-embeddings-v3     | 2048           |
| Local         | Client::local(name)?    | all-MiniLM-L6-v2       | 256            |

Each cloud provider also has a *_compatible constructor for proxies or API-compatible services:

// OpenAI-compatible (Azure, proxies, etc.)
let client = Client::openai_compatible("sk-...", "https://your-proxy.com/v1");

// Cohere-compatible
let client = Client::cohere_compatible("key", "https://proxy.example.com/v2");

// Gemini-compatible
let client = Client::gemini_compatible("key", "https://proxy.example.com/v1beta");

// Voyage-compatible
let client = Client::voyage_compatible("key", "https://proxy.example.com/v1");

// Jina-compatible
let client = Client::jina_compatible("key", "https://proxy.example.com/v1");

Batch Embedding

Embed thousands of texts concurrently. Texts are automatically chunked based on the provider's maximum batch size:

let client = embedrs::cloud("sk-...");

let texts: Vec<String> = (0..5000).map(|i| format!("document {i}")).collect();

let result = client.embed_batch(texts)
    .concurrency(5)        // max concurrent API requests (default: 5)
    .chunk_size(512)       // texts per request (default: provider max)
    .model("text-embedding-3-large")
    .await?;

println!("total embeddings: {}", result.embeddings.len());
println!("total tokens: {}", result.usage.total_tokens);

Similarity Functions

use embedrs::{cosine_similarity, dot_product, euclidean_distance};

let a = vec![1.0, 0.0, 0.0];
let b = vec![0.0, 1.0, 0.0];

let cos = cosine_similarity(&a, &b);    // 0.0 (orthogonal)
let dot = dot_product(&a, &b);          // 0.0
let dist = euclidean_distance(&a, &b);  // 1.414...

Input Type

Some providers use input type hints to optimize embeddings for specific use cases:

use embedrs::InputType;

// for indexing documents
let result = client.embed(docs)
    .input_type(InputType::SearchDocument)
    .await?;

// for search queries
let result = client.embed(queries)
    .input_type(InputType::SearchQuery)
    .await?;

Available variants: SearchDocument, SearchQuery, Classification, Clustering.

Dimensions

Request reduced-dimension embeddings where the provider supports it:

let result = client.embed(vec!["hello".into()])
    .model("text-embedding-3-large")
    .dimensions(256)
    .await?;

assert_eq!(result.embeddings[0].len(), 256);

Backoff and Timeout

use std::time::Duration;
use embedrs::BackoffConfig;

let client = Client::openai("sk-...")
    .with_retry_backoff(BackoffConfig::default())  // 500ms base, 30s cap, 3 retries
    .with_timeout(Duration::from_secs(120));        // overall timeout (default: 60s)

// per-request override
let result = client.embed(vec!["hello".into()])
    .retry_backoff(BackoffConfig {
        base_delay: Duration::from_millis(200),
        max_delay: Duration::from_secs(10),
        jitter: true,
        max_http_retries: 5,
    })
    .timeout(Duration::from_secs(30))
    .await?;

Without backoff configured, HTTP 429/503 errors fail immediately.

Client Defaults

Set defaults once, override per-request:

let client = Client::openai("sk-...")
    .with_model("text-embedding-3-large")
    .with_dimensions(256)
    .with_input_type(InputType::SearchDocument)
    .with_retry_backoff(BackoffConfig::default())
    .with_timeout(Duration::from_secs(120));

// all requests use the defaults above
let a = client.embed(vec!["doc 1".into()]).await?;
let b = client.embed(vec!["doc 2".into()]).await?;

// override for a specific request
let c = client.embed(vec!["query".into()])
    .model("text-embedding-3-small")
    .input_type(InputType::SearchQuery)
    .await?;

Feature Flags

| Feature | Default | Description                                          |
|---------|---------|------------------------------------------------------|
| (none)  | yes     | Core embedding client, all 5 cloud providers         |
| local   | no      | Local inference via candle (all-MiniLM-L6-v2, 23MB)  |
| tracing | no      | Structured logging via the tracing crate             |

[dependencies]
# cloud only
embedrs = "0.2"

# cloud + local inference
embedrs = { version = "0.2", features = ["local"] }

# with tracing
embedrs = { version = "0.2", features = ["local", "tracing"] }

License

MIT