# embedrs

Unified embedding for Rust -- cloud APIs + local inference, one interface, opinionated defaults.
## Design philosophy

"好用就好用" ("if it just works, it's good") -- just works. Pick one best default, backed by data.

- `embedrs::local()?` -- all-MiniLM-L6-v2 (23MB, free, no API key)
- `embedrs::cloud(key)` -- OpenAI text-embedding-3-small (best discrimination, cheapest cloud)
- Both produce the same `EmbedResult` -- write code once, switch backends in one line

Defaults were chosen by an 8-dimension benchmark across 8 models. See benchrs for the full methodology.
## Quick Start

```rust
// cloud -- one key, done
let client = embedrs::cloud("sk-...");
let result = client.embed("hello world").await?;
println!("{} dimensions", result.embedding.len());

// local -- zero config, free, 23MB model downloaded on first use
let client = embedrs::local()?;
let result = client.embed("hello world").await?;
```
## Installation

```toml
[dependencies]
embedrs = "0.3"

# enable local inference (adds ~23MB model download on first use)
embedrs = { version = "0.3", features = ["local"] }
```
## Benchmark Results

8 dimensions, 184 unique texts. Full methodology and reproduction instructions in benchrs.
| Metric | MiniLM-L6 | MiniLM-L12 | BGE-small | GTE-small | OpenAI | Gemini | Cohere | Voyage |
|---|---|---|---|---|---|---|---|---|
| Size | 23MB | 133MB | 133MB | 67MB | cloud | cloud | cloud | cloud |
| Spearman ρ | 0.81 | 0.84 | 0.71 | 0.75 | 0.91 | 0.94 | 0.91 | 0.89 |
| Discrimination | 0.52 | 0.52 | 0.29 | 0.14 | 0.58 | 0.30 | 0.46 | 0.45 |
| Retrieval | 100% | 100% | 89% | 100% | 100% | 89% | 100% | 89% |
| EN ρ | 0.92 | 0.94 | 0.92 | 0.90 | 0.91 | 0.91 | 0.89 | 0.88 |
| ZH ρ | 0.65 | 0.74 | 0.45 | 0.40 | 0.88 | 0.99 | 0.93 | 0.89 |
| JA ρ | 0.60 | 0.90 | 0.20 | 0.50 | 0.90 | 1.00 | 1.00 | 0.90 |
| Cross-lingual | 0.25 | 0.26 | 0.66 | 0.81 | 0.71 | 0.84 | 0.68 | 0.85 |
| Robustness | 0.89 | 0.90 | 0.94 | 0.97 | 0.88 | 0.94 | 0.89 | 0.95 |
| Cluster sep. | 8.73x | 4.38x | 1.29x | 1.09x | 2.55x | 1.11x | 1.41x | 1.30x |
| Cost | $0 | $0 | $0 | $0 | $0.02/1M | free tier | $0.10/1M | $0.06/1M |
## Why MiniLM-L6 for local
- 23MB -- the only model small enough for app embedding (others are 67-133MB)
- Best clustering separation at 8.73x (2nd place is 4.38x)
- 100% retrieval accuracy, EN ρ=0.92
- 12-layer models are 3-6x larger with no meaningful quality improvement
- Known weakness: poor on Chinese/Japanese (ρ=0.60-0.65) and cross-lingual (0.25)
## Why OpenAI for cloud
- Best discrimination gap at 0.58 (dissimilar texts avg cosine = 0.09, closest to zero)
- 100% retrieval accuracy, MRR=1.0
- Balanced multilingual: EN=0.91, ZH=0.88, JA=0.90 -- no weak language
- Cheapest cloud option at $0.02/1M tokens
- Gemini has higher ρ (0.94) but poor discrimination (0.30) and retrieval miss (89%)
- Cohere matches quality but costs 5x more ($0.10/1M tokens)
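The discrimination gap cited above is the difference between the average cosine similarity of related pairs and that of unrelated pairs -- the larger the gap, the better the model separates topics. A minimal sketch of that computation (pure Rust, independent of embedrs; the sample vectors are illustrative, not benchmark data):

```rust
// Cosine similarity between two vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Discrimination gap: mean cosine over similar pairs minus mean cosine
/// over dissimilar pairs. Higher is better.
fn discrimination_gap(
    similar: &[(Vec<f32>, Vec<f32>)],
    dissimilar: &[(Vec<f32>, Vec<f32>)],
) -> f32 {
    let mean = |pairs: &[(Vec<f32>, Vec<f32>)]| -> f32 {
        pairs.iter().map(|(a, b)| cosine(a, b)).sum::<f32>() / pairs.len() as f32
    };
    mean(similar) - mean(dissimilar)
}

fn main() {
    let similar = vec![(vec![1.0, 0.0], vec![0.9, 0.1])];
    let dissimilar = vec![(vec![1.0, 0.0], vec![0.0, 1.0])];
    // near-parallel pair vs. orthogonal pair -> gap close to 1.0
    println!("{:.2}", discrimination_gap(&similar, &dissimilar));
}
```

OpenAI's 0.58 gap comes with dissimilar-pair cosines averaging 0.09, which is why it wins this metric despite Gemini's higher Spearman ρ.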
## Providers

| Provider | Constructor | Default Model | Max Batch Size |
|---|---|---|---|
| OpenAI | `Client::openai(key)` | text-embedding-3-small | 2048 |
| Cohere | `Client::cohere(key)` | embed-v4.0 | 96 |
| Google Gemini | `Client::gemini(key)` | gemini-embedding-001 | 100 |
| Voyage AI | `Client::voyage(key)` | voyage-3-large | 128 |
| Jina AI | `Client::jina(key)` | jina-embeddings-v3 | 2048 |
| Local | `Client::local(name)?` | all-MiniLM-L6-v2 | 256 |
Each cloud provider also has a `*_compatible` constructor for proxies or API-compatible services:

```rust
// OpenAI-compatible (Azure, proxies, etc.)
let client = Client::openai_compatible(key, base_url);

// Cohere-compatible
let client = Client::cohere_compatible(key, base_url);

// Gemini-compatible
let client = Client::gemini_compatible(key, base_url);

// Voyage-compatible
let client = Client::voyage_compatible(key, base_url);

// Jina-compatible
let client = Client::jina_compatible(key, base_url);
```
## Batch Embedding

Embed thousands of texts concurrently. Texts are automatically chunked based on the provider's maximum batch size:

```rust
let client = embedrs::cloud("sk-...");
let texts: Vec<String> = (0..10_000).map(|i| format!("text {i}")).collect();

let result = client.embed_batch(&texts)
    .concurrency(8)    // max concurrent API requests (default: 5)
    .chunk_size(1000)  // texts per request (default: provider max)
    .model("text-embedding-3-small")
    .await?;

println!("{} embeddings", result.embeddings.len());
println!("{} total tokens", result.usage.total_tokens);
```
## Similarity Functions

```rust
use embedrs::{cosine_similarity, dot_product, euclidean_distance};

let a = vec![1.0, 0.0];
let b = vec![0.0, 1.0];

let cos = cosine_similarity(&a, &b);   // 0.0 (orthogonal)
let dot = dot_product(&a, &b);         // 0.0
let dist = euclidean_distance(&a, &b); // 1.414...
```
## Input Type

Some providers use input type hints to optimize embeddings for specific use cases:

```rust
use embedrs::InputType;

// for indexing documents
let result = client.embed("a document to index")
    .input_type(InputType::SearchDocument)
    .await?;

// for search queries
let result = client.embed("a search query")
    .input_type(InputType::SearchQuery)
    .await?;
```

Available variants: `SearchDocument`, `SearchQuery`, `Classification`, `Clustering`.
## Dimensions

Request reduced-dimension embeddings where the provider supports it:

```rust
let result = client.embed("hello")
    .model("text-embedding-3-small")
    .dimensions(256)
    .await?;

assert_eq!(result.embedding.len(), 256);
```
## Backoff and Timeout

```rust
use std::time::Duration;
use embedrs::BackoffConfig;

let client = Client::openai("sk-...")
    .with_retry_backoff(BackoffConfig::default()) // 500ms base, 30s cap, 3 retries
    .with_timeout(Duration::from_secs(60));       // overall timeout (default: 60s)

// per-request override
let result = client.embed("hello")
    .retry_backoff(BackoffConfig::default())
    .timeout(Duration::from_secs(10))
    .await?;
```

Without backoff configured, HTTP 429/503 errors fail immediately.
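The "500ms base, 30s cap" pattern is standard capped exponential backoff: each retry waits min(cap, base × 2^attempt). A minimal sketch of that schedule (pure Rust, illustrative; a production client such as embedrs would typically also add random jitter):

```rust
use std::time::Duration;

// Capped exponential backoff: base * 2^attempt, clamped to cap.
fn backoff_delay(base: Duration, cap: Duration, attempt: u32) -> Duration {
    base.saturating_mul(2u32.saturating_pow(attempt)).min(cap)
}

fn main() {
    let base = Duration::from_millis(500);
    let cap = Duration::from_secs(30);
    // With 3 retries the waits are 500ms, 1s, 2s; the cap only
    // matters if retries are raised (attempt 6 would hit 30s).
    for attempt in 0..3 {
        println!("retry {attempt}: {:?}", backoff_delay(base, cap, attempt));
    }
}
```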
## Client Defaults

Set defaults once, override per-request:

```rust
let client = Client::openai("sk-...")
    .with_model("text-embedding-3-small")
    .with_dimensions(256)
    .with_input_type(InputType::SearchDocument)
    .with_retry_backoff(BackoffConfig::default())
    .with_timeout(Duration::from_secs(60));

// all requests use the defaults above
let a = client.embed("first text").await?;
let b = client.embed("second text").await?;

// override for a specific request
let c = client.embed("a query")
    .model("text-embedding-3-large")
    .input_type(InputType::SearchQuery)
    .await?;
```
## Feature Flags

| Feature | Default | Description |
|---|---|---|
| (none) | yes | Core embedding client, all 5 cloud providers |
| `local` | no | Local inference via candle (all-MiniLM-L6-v2, 23MB) |
| `cost-tracking` | no | Estimated cost per request via tiktoken pricing data |
| `tracing` | no | Structured logging via the tracing crate |

```toml
[dependencies]
# cloud only
embedrs = "0.3"

# cloud + local inference
embedrs = { version = "0.3", features = ["local"] }

# with cost tracking
embedrs = { version = "0.3", features = ["cost-tracking"] }

# with tracing
embedrs = { version = "0.3", features = ["local", "tracing"] }
```
## Provider Fallback

Chain fallback providers for automatic failover when the primary provider is unavailable:

```rust
let client = Client::openai("sk-...")
    .with_fallback(Client::cohere("co-..."));

// if OpenAI fails, automatically tries Cohere
let result = client.embed("hello").await?;
```

Multiple fallbacks are tried in order:

```rust
let client = Client::openai("sk-...")
    .with_fallback(Client::cohere("co-..."))
    .with_fallback(Client::voyage("pa-..."));
```
## Cost Tracking

Enable the `cost-tracking` feature to get an estimated cost per request:

```toml
embedrs = { version = "0.3", features = ["cost-tracking"] }
```

```rust
let result = client.embed("hello").await?;

if let Some(cost) = result.usage.cost {
    println!("estimated cost: ${cost}");
}
```

Cost estimation uses tiktoken pricing data. Returns `None` for models without pricing information.
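The estimate itself is simple arithmetic: token count times the per-million-token price, with `None` for unpriced models. A minimal sketch (pure Rust; the price table is an illustrative subset taken from the benchmark's cost column, not embedrs's internal pricing data):

```rust
// Price per 1M tokens in USD (illustrative subset from the benchmark table).
fn price_per_million(model: &str) -> Option<f64> {
    match model {
        "text-embedding-3-small" => Some(0.02),
        "embed-v4.0" => Some(0.10),
        "voyage-3-large" => Some(0.06),
        _ => None, // unknown model: no estimate
    }
}

// Estimated cost in USD for `tokens` tokens, or None if unpriced.
fn estimate_cost(model: &str, tokens: u64) -> Option<f64> {
    price_per_million(model).map(|p| p * tokens as f64 / 1_000_000.0)
}

fn main() {
    // 50k tokens through text-embedding-3-small at $0.02/1M ~= $0.001
    println!("{:?}", estimate_cost("text-embedding-3-small", 50_000));
    println!("{:?}", estimate_cost("all-MiniLM-L6-v2", 50_000)); // local: None
}
```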
## Error Handling

All fallible operations return `embedrs::Result<T>`. Match on `Error` variants for fine-grained control:

```rust
use embedrs::Error;
```