OmenDB
Embedded vector database for Python and Node.js. No server, no setup, just install.
- 7,600 QPS single / 64,000 QPS batch search, 99.8% recall (SIFT-100K)
- 60K vec/s insert throughput
- SQ8 quantization (4x compression, 99.8% recall, 2x faster search)
- ACORN-1 predicate-aware filtered search
- Hybrid search -- BM25 text + vector with RRF fusion
- Multi-vector -- ColBERT/MaxSim with MUVERA and token pooling
- Auto-embedding -- pass a function, store documents, search with strings
Quick Start
Python
With auto-embedding -- pass an embedding function, work with documents and strings:
Define your embedding function (OpenAI, sentence-transformers, etc.), open the database with it, add documents (auto-embedded on insert), then search with plain text (auto-embedded at query time). See `python/examples/embedding_fn.py` for the full code.
With vectors, bring your own embeddings: open the database, insert precomputed vectors with IDs and metadata, and query with a vector. See `python/examples/quickstart.py` for the full code.
Node.js
With auto-embedding, construct the database with `embeddingFn` and work with documents and strings; with manual vectors, insert and query with vectors directly. The API mirrors Python; see `node/examples/quickstart.js` and `node/examples/embedding_fn.js` for the full code.
Features
- HNSW graph indexing -- SIMD-accelerated distance computation
- ACORN-1 filtered search -- predicate-aware graph traversal, 37.79x speedup over post-filtering
- SQ8 quantization -- 4x compression, 99.8% recall, 2x faster search
- BM25 text search -- full-text search via Tantivy
- Hybrid search -- RRF fusion of vector + text results
- Multi-vector / ColBERT -- MUVERA + MaxSim scoring for token-level retrieval
- Token pooling -- k-means clustering, 50% storage reduction for multi-vector
- Auto-embedding -- `embedding_fn` (Python) / `embeddingFn` (Node.js) for document-in, text-query workflows
- Collections -- namespaced sub-databases within a single file
- Persistence -- WAL + atomic checkpoints
- O(1) lazy delete + compaction -- deleted records cleaned up in background
- Segment-based architecture -- background merging for sustained write throughput
- Context manager (Python) / `close()` (Node.js) for resource cleanup
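SQ8 is standard scalar quantization: each float32 dimension is linearly mapped to an unsigned 8-bit code, which is where the 4x compression comes from. A minimal pure-Python sketch of the idea (illustrative only, not OmenDB's internal implementation):

```python
def sq8_train(vectors):
    """Compute per-dimension min/max ranges from training vectors."""
    dims = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dims)]
    hi = [max(v[d] for v in vectors) for d in range(dims)]
    return lo, hi

def sq8_encode(v, lo, hi):
    """Map each float32 component to an integer code in [0, 255]."""
    codes = []
    for x, a, b in zip(v, lo, hi):
        scale = (b - a) or 1.0  # guard against constant dimensions
        codes.append(round(255 * (min(max(x, a), b) - a) / scale))
    return codes

def sq8_decode(codes, lo, hi):
    """Reconstruct approximate float values from the 8-bit codes."""
    return [a + (b - a) * c / 255 for c, a, b in zip(codes, lo, hi)]
```

The small rounding error introduced here is what accounts for the ~1% recall loss quoted above; running distance computations on the compact int8 codes is what enables the faster search.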
Platforms
| Platform | Status |
|---|---|
| Linux (x86_64, ARM64) | Supported |
| macOS (Intel, Apple Silicon) | Supported |
API Reference
Python
The Python API surface (full signatures are shown in `python/examples/`):

- Database: open with auto-embedding, with manual vectors, or in-memory
- CRUD: insert/update (vectors or documents), single insert, get by ID, batch get, delete by IDs, delete by metadata filter, update fields
- Search: vector or string query, filtered search (ACORN-1), distance threshold, batch search (parallel)
- Hybrid search: string query (auto-embeds both), text-only BM25
- Iteration: count, filtered count, lazy ID iterator, all items (loads to memory), lazy iteration, existence check
- Collections: create/get collection, list collections, delete collection
- Persistence: flush to disk, close, remove deleted records, reorder for cache locality, merge databases
- Config: get/set search quality (`ef_search`), vector dimensionality, database statistics
Node.js
The Node.js API mirrors the Python surface: database construction (with or without `embeddingFn`), CRUD, search (including filtered and batch), hybrid search, collections, and persistence. Write and search calls are generally async (`await`).
Configuration
Quantization options:
- `True` or `"sq8"`: SQ8, ~4x smaller, ~99% recall (recommended)
- `None`/`False`: full precision (default)

Distance metric options:
- `"l2"` or `"euclidean"`: Euclidean distance (default)
- `"cosine"`: cosine distance (1 - cosine similarity)
- `"dot"` or `"ip"`: inner product (for MIPS)

The Python context manager auto-flushes on exit.
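The three metrics follow the usual definitions; a plain-Python sketch (independent of OmenDB's SIMD kernels):

```python
import math

def l2(a, b):
    """Euclidean distance -- the "l2"/"euclidean" option."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity -- the "cosine" option."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def inner_product(a, b):
    """Inner product -- the "dot"/"ip" option; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))
```

Note the orientation difference: for l2 and cosine, smaller is better; for inner product, larger is better, which is why it is listed for MIPS (maximum inner-product search).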
Distance Filtering
Use max_distance to filter out low-relevance results (prevents "context rot" in RAG):
Pass `max_distance` (e.g. 0.5) to return only results at that distance or closer; it composes with metadata filters.
This ensures your RAG pipeline only receives highly relevant context, avoiding distractors that can hurt LLM performance.
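The cutoff itself is simple set filtering; the sketch below (plain Python, with hypothetical result dicts) shows what the threshold amounts to:

```python
def apply_max_distance(results, max_distance):
    """Keep only hits whose distance is at or below the threshold."""
    return [r for r in results if r["distance"] <= max_distance]

# Hypothetical search output: lower distance = more relevant
hits = [
    {"id": "a", "distance": 0.2},
    {"id": "b", "distance": 0.45},
    {"id": "c", "distance": 0.9},  # weak match -- dropped by the cutoff
]
relevant = apply_max_distance(hits, 0.5)
```

The practical benefit is that a fixed-k search always returns k results, even when only two are actually on-topic; the distance cutoff trims the rest before they reach the LLM.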
Filters
- Equality: shorthand and explicit forms
- Comparison: not equal, greater than, greater or equal, less than, less or equal
- Membership: in list, string contains
- Logical: AND, OR

See `python/examples/filters.py` for the exact syntax of every operator.
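Filter evaluation against per-record metadata can be sketched in plain Python. The Mongo-style operator names below (`$gt`, `$in`, ...) are illustrative assumptions, not necessarily OmenDB's spelling; the real syntax is in `python/examples/filters.py`:

```python
def matches(metadata, filt):
    """Evaluate a Mongo-style filter dict against one record's metadata.
    Operator names here are illustrative, not OmenDB's actual syntax."""
    ops = {
        "$ne": lambda v, t: v != t,
        "$gt": lambda v, t: v > t,
        "$gte": lambda v, t: v >= t,
        "$lt": lambda v, t: v < t,
        "$lte": lambda v, t: v <= t,
        "$in": lambda v, t: v in t,
        "$contains": lambda v, t: t in v,
    }
    for key, cond in filt.items():
        if key == "$and":
            if not all(matches(metadata, c) for c in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, c) for c in cond):
                return False
        elif isinstance(cond, dict):  # explicit operator form
            if not all(ops[op](metadata.get(key), t) for op, t in cond.items()):
                return False
        elif metadata.get(key) != cond:  # shorthand equality
            return False
    return True
```

ACORN-1 applies a predicate like this during graph traversal rather than after it, which is where the speedup over post-filtering comes from.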
Hybrid Search
Combine vector similarity with BM25 full-text search using RRF fusion:
With `embedding_fn`, pass a single string and it is used as both the vector query (auto-embedded) and the text query; with manual vectors, supply the query vector yourself. Tune `alpha` to weight the fusion: 0 = text only, 1 = vector only, default 0.5. For debugging and tuning, results can carry separate keyword and semantic scores, e.g. `{"id": "...", "score": 0.85, "keyword_score": 0.92, "semantic_score": 0.78}`. Text-only BM25 search is also available.
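RRF fuses the two result lists by rank rather than by raw score, which sidesteps the problem that BM25 scores and vector distances live on incompatible scales. A minimal weighted-RRF sketch (the constant `k=60` is the conventional choice from the RRF literature, not necessarily what OmenDB uses):

```python
def rrf_fuse(vector_ids, text_ids, alpha=0.5, k=60):
    """Weighted reciprocal-rank fusion of two ranked ID lists.
    alpha=1.0 -> vector only, alpha=0.0 -> text only."""
    scores = {}
    for rank, doc_id in enumerate(vector_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank + 1)
    for rank, doc_id in enumerate(text_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears in both lists accumulates contributions from each, so agreement between the keyword and semantic sides pushes it to the top.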
Multi-vector (ColBERT)
MUVERA with MaxSim scoring for ColBERT-style token-level retrieval. Token pooling via k-means reduces storage by 50%.
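MaxSim scores a document by summing, over query tokens, the best-matching document-token similarity; MUVERA's role is to reduce these multi-vector comparisons to cheap single-vector candidate retrieval before exact MaxSim re-scoring. A plain-Python sketch of the scoring step:

```python
def maxsim(query_tokens, doc_tokens):
    """ColBERT-style MaxSim: for each query token embedding, take the
    maximum dot product over all document token embeddings, then sum."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)
```

Because each query token independently picks its best document token, MaxSim rewards documents that cover all parts of the query, not just its dominant topic.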
Performance
Authoritative baseline: SIFT-100K · 128D · M=16 · ef_construction=100 · ef_search=100 · k=10 · Fedora i9-13900KF (5-run median)
| Mode | Build | Single | Batch | Recall@10 |
|---|---|---|---|---|
| fp32 | 24,881 v/s | 2,324 QPS | 39,905 QPS | 99.8% |
| SQ8 | pending refreshed Linux run | pending | pending | pending |
Batch search uses Rayon for parallel execution across all cores. Scales to 1M+ vectors. Apple Silicon runs are still useful for local reference, but Fedora/Linux medians are the authoritative comparison baseline.
Filtered search (ACORN-1, 10% selectivity): predicate-aware graph traversal, no post-filter overhead.
- Dataset: SIFT-100K (real 128D embeddings, not random vectors)
- Parameters: M=16, ef_construction=100, ef_search=100, k=10
- Batch: parallel via Rayon
- Recall: validated against brute-force ground truth
- Authoritative runs: Fedora/Linux medians from `cd python && uv run python benchmark.py --publish`
- Local reproduction: `cd python && uv run python benchmark.py`
- Synthetic sweeps: `uv run python benchmark.py --full` is exploratory and not comparable to SIFT history
- Current Apple M3 Max reference: fp32 59,789 v/s, 7,644 QPS, 64,570 QPS, 99.8%; SQ8 59,905 v/s, 15,403 QPS, 95,442 QPS, 99.8%
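Recall@10 in the table is measured against brute-force ground truth; the metric itself is just top-k set overlap (a sketch of the measurement, not the benchmark script):

```python
def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of the true top-k neighbors found in the retrieved top-k."""
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k
```

Averaging this over every query in SIFT-100K yields the recall figures reported above.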
Tuning
The ef_search parameter controls the recall/speed tradeoff at query time. Higher values explore more candidates, improving recall but slowing search.
Rules of thumb:
- `ef_search` must be >= k (number of results requested)
- For 128D embeddings: ef=100 usually achieves 90%+ recall
- For 768D+ embeddings: increase to ef=200-400 for better recall
- If recall drops at scale (50K+), increase both ef_search and ef_construction
Runtime tuning:
`ef_search` can be read at runtime (default 100), raised (e.g. to 200) for better recall at the cost of speed, lowered (e.g. to 50) for speed at the possible cost of recall, or overridden per query.
Recommended settings by use case:
| Use Case | ef_search | Expected Recall |
|---|---|---|
| Fast search (128D) | 64 | ~85% |
| Balanced (default) | 100 | ~90% |
| High recall (768D+) | 200-300 | ~95%+ |
| Maximum recall | 500+ | ~98%+ |
Examples
See complete working examples:
- `python/examples/quickstart.py` -- Minimal Python example
- `python/examples/basic.py` -- CRUD operations and persistence
- `python/examples/filters.py` -- All filter operators
- `python/examples/rag.py` -- RAG workflow with mock embeddings
- `python/examples/embedding_fn.py` -- Auto-embedding with embedding_fn
- `python/examples/quantization.py` -- SQ8 quantization
- `node/examples/quickstart.js` -- Minimal Node.js example
- `node/examples/embedding_fn.js` -- Auto-embedding with embeddingFn
- `node/examples/multivector.ts` -- Multi-vector / ColBERT
Integrations
LangChain
LangChain integration requires Python 3.10+.
LlamaIndex
License
Elastic License 2.0 -- Free to use, modify, and embed. The only restriction: you can't offer OmenDB as a managed service to third parties.