OmenDB
Embedded vector database for Python and Node.js. No server, no setup, just install.
- 20K QPS single-threaded search with 100% recall (SIFT-10K)
- 105K vec/s insert throughput
- SQ8 quantization (4x compression, ~99% recall)
- ACORN-1 predicate-aware filtered search
- Hybrid search -- BM25 text + vector with RRF fusion
- Multi-vector -- ColBERT/MaxSim with MUVERA and token pooling
- Auto-embedding -- pass a function, store documents, search with strings
Quick Start
Python
With auto-embedding -- pass an embedding function, work with documents and strings:
Define an embedding function (OpenAI, sentence-transformers, etc.), open the database with it, then add plain documents and search with text strings; embedding happens automatically on both insert and query.
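A minimal sketch of that workflow. The open/add/search names below are illustrative assumptions (only the embedding_fn parameter is documented above); see the examples listed later in this README for the exact calls.

```python
import omendb

def embed(texts):
    # Stand-in for a real model (OpenAI, sentence-transformers, etc.)
    return [[float(len(t))] * 384 for t in texts]

# open(), add(), and search() are assumed names; embedding_fn is the
# documented parameter for auto-embedding.
db = omendb.open("docs.omen", embedding_fn=embed)

# Add documents -- auto-embedded
db.add([{"id": "doc1", "document": "OmenDB is an embedded vector database"}])

# Search with text -- auto-embedded
results = db.search("embedded vector database", k=3)
```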
With vectors -- bring your own embeddings:
Open a database with a fixed vector dimension, insert precomputed embeddings (optionally with metadata), and query with a vector.
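A corresponding sketch with manual vectors, again using assumed constructor and method names:

```python
import omendb

# dim, add(), and search() are assumed names; only the workflow is documented.
db = omendb.open("vectors.omen", dim=4)

db.add([
    {"id": "a", "vector": [0.1, 0.2, 0.3, 0.4], "metadata": {"tag": "demo"}},
    {"id": "b", "vector": [0.4, 0.3, 0.2, 0.1], "metadata": {"tag": "demo"}},
])

results = db.search([0.1, 0.2, 0.3, 0.4], k=2)
```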
Node.js
With auto-embedding:
Load the package with require('omendb'), open a database with an embeddingFn, add documents, and search with text strings, mirroring the Python workflow above.
With vectors:
Open a database with a fixed dimension, add precomputed vectors, and search with a query vector.
Features
- HNSW graph indexing -- SIMD-accelerated distance computation
- ACORN-1 filtered search -- predicate-aware graph traversal, 37.79x speedup over post-filtering
- SQ8 quantization -- 4x compression, ~99% recall
- BM25 text search -- full-text search via Tantivy
- Hybrid search -- RRF fusion of vector + text results
- Multi-vector / ColBERT -- MUVERA + MaxSim scoring for token-level retrieval
- Token pooling -- k-means clustering, 50% storage reduction for multi-vector
- Auto-embedding -- embedding_fn (Python) / embeddingFn (Node.js) for document-in, text-query workflows
- Collections -- namespaced sub-databases within a single file
- Persistence -- WAL + atomic checkpoints
- O(1) lazy delete + compaction -- deleted records cleaned up in background
- Segment-based architecture -- background merging for sustained write throughput
- Context manager (Python) / close() (Node.js) for resource cleanup
Platforms
| Platform | Status |
|---|---|
| Linux (x86_64, ARM64) | Supported |
| macOS (Intel, Apple Silicon) | Supported |
API Reference
Python
Database
- Open with an embedding function (auto-embedding), with manual vectors, or in-memory

CRUD
- Insert/update (vectors or documents), single insert
- Get by ID, batch get
- Delete by IDs, delete by metadata filter
- Update fields

Search
- Vector or string query
- Filtered search (ACORN-1)
- Distance threshold
- Batch search (parallel)

Hybrid search
- String query (auto-embeds both the vector and text sides)
- Text-only BM25

Iteration
- Count and filtered count
- Lazy ID iterator, lazy item iteration, or all items (loads to memory)
- Existence check (in)

Collections
- Create/get, list, and delete collections

Persistence
- Flush to disk, close
- Remove deleted records (compaction), reorder for cache locality
- Merge databases

Config
- Get/set search quality (ef_search)
- Vector dimensionality
- Database statistics
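As an illustration of the collections and persistence surface, here is a sketch; collection() and flush() are assumed names for the operations listed above, while context-manager support is documented:

```python
import omendb

# collection() and flush() are assumptions; the context manager auto-flushes on exit.
with omendb.open("app.omen", dim=384) as db:
    users = db.collection("users")   # create/get a namespaced collection
    users.add([{"id": "u1", "vector": [0.0] * 384}])
    db.flush()                       # flush to disk explicitly
```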
Node.js
The Node.js API mirrors the Python surface: database construction (with or without embeddingFn), CRUD, search, hybrid search, collections, and persistence operations are all methods on the db object. See the Python reference above for the full list of operations.
Configuration
Quantization, distance metric, and resource handling are chosen when the database is created:

- Quantization: True or "sq8" for SQ8 (~4x smaller, ~99% recall, recommended); None/False for full precision (default)
- Distance metric: "l2" or "euclidean" for Euclidean distance (default); "cosine" for cosine distance (1 - cosine similarity); "dot" or "ip" for inner product (MIPS)
- Context manager: auto-flush on exit
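A configuration sketch; the keyword names quantization and metric are assumptions, while the accepted values are the documented options listed above:

```python
import omendb

# Keyword names are assumed; the values match the documented options.
with omendb.open(
    "data.omen",
    dim=768,
    quantization="sq8",   # or True; None/False for full precision
    metric="cosine",      # "l2"/"euclidean" (default), "cosine", "dot"/"ip"
) as db:
    ...                   # auto-flush on exit
```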
Distance Filtering
Use max_distance to filter out low-relevance results (prevents "context rot" in RAG):
Pass max_distance to return only results within the threshold (for example, distance <= 0.5), and combine it with a metadata filter when you also need predicate constraints.
This ensures your RAG pipeline only receives highly relevant context, avoiding distractors that can hurt LLM performance.
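A sketch of both variants; max_distance is the documented parameter, while the constructor, search, and filter keyword names are assumptions:

```python
import omendb

db = omendb.open("docs.omen", dim=4)   # assumed constructor, as in earlier sketches
db.add([{"id": "a", "vector": [0.1, 0.2, 0.3, 0.4], "metadata": {"source": "docs"}}])

# Only return results with distance <= 0.5
results = db.search([0.1, 0.2, 0.3, 0.4], k=10, max_distance=0.5)

# Combine with a metadata filter (filter syntax assumed)
results = db.search([0.1, 0.2, 0.3, 0.4], k=10, max_distance=0.5,
                    filter={"source": "docs"})
```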
Filters
- Equality: shorthand form and explicit equality operator
- Comparison: not equal, greater than, greater or equal, less than, less or equal
- Membership: in list, string contains
- Logical: AND, OR
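A sketch of what such filters can look like, assuming a MongoDB-style operator syntax ($eq, $gt, $in, $and, ...); OmenDB's actual filter grammar may differ, so treat these literals as illustrative only:

```python
# All operator names below are assumptions (MongoDB-style), shown only to
# illustrate the categories listed above.

# Equality
{"genre": "fiction"}                       # shorthand
{"genre": {"$eq": "fiction"}}              # explicit

# Comparison
{"year": {"$ne": 2020}}                    # not equal
{"year": {"$gt": 2020}}                    # greater than
{"year": {"$gte": 2020}}                   # greater or equal
{"year": {"$lt": 2020}}                    # less than
{"year": {"$lte": 2020}}                   # less or equal

# Membership
{"genre": {"$in": ["fiction", "sci-fi"]}}  # in list
{"title": {"$contains": "omen"}}           # string contains

# Logical
{"$and": [{"genre": "fiction"}, {"year": {"$gte": 2020}}]}
{"$or": [{"genre": "fiction"}, {"genre": "sci-fi"}]}
```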
Hybrid Search
Combine vector similarity with BM25 full-text search using RRF fusion:
- With embedding_fn, pass a single string and it is used as both the vector query and the text query.
- With manual vectors, supply the query vector and the query text separately.
- Tune alpha: 0 = text only, 1 = vector only, default 0.5.
- Results can expose separate keyword and semantic scores for debugging/tuning, e.g. {"id": "...", "score": 0.85, "keyword_score": 0.92, "semantic_score": 0.78}.
- Text-only BM25 search is also available.
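For reference, reciprocal rank fusion combines the two ranked lists by giving each candidate a weighted sum of 1 / (k + rank) across the vector and BM25 rankings. A standalone illustration, not OmenDB's internal code; the smoothing constant k = 60 is an assumption:

```python
def rrf_fuse(vector_ids, text_ids, alpha=0.5, k=60):
    """Fuse two ranked ID lists with reciprocal rank fusion.

    alpha follows the documented convention: 0 = text only, 1 = vector only.
    """
    scores = {}
    for rank, doc_id in enumerate(vector_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank + 1)
    for rank, doc_id in enumerate(text_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

print(rrf_fuse(["a", "b", "c"], ["c", "a", "d"]))  # ['a', 'c', 'b', 'd']
```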
Multi-vector (ColBERT)
MUVERA with MaxSim scoring for ColBERT-style token-level retrieval. Token pooling via k-means reduces storage by 50%.
Create a multi-vector database, insert one embedding per token for each document, and search with the query's token embeddings; results are ranked with MaxSim scoring.
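For intuition, MaxSim scores a query against a document by taking, for each query token embedding, its best similarity over the document's token embeddings and summing those maxima. A standalone NumPy illustration (not OmenDB code):

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT-style MaxSim: sum over query tokens of the max dot product
    with any document token. Shapes: (nq, d) and (nd, d)."""
    sims = query_tokens @ doc_tokens.T     # (nq, nd) similarity matrix
    return float(sims.max(axis=1).sum())   # best doc token per query token

q = np.random.rand(4, 128)    # 4 query token embeddings
d = np.random.rand(30, 128)   # 30 document token embeddings
print(maxsim(q, d))
```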
Performance
SIFT-10K (128D, M=16, ef=100, k=10, Apple M3 Max):
| Metric | Result |
|---|---|
| Build | 105K vec/s |
| Search | 19.7K QPS |
| Batch | 156K QPS |
| Recall@10 | 100.0% |
SIFT-1M (1M vectors, 128D, M=16, ef=100, k=10):
| Machine | QPS | Recall |
|---|---|---|
| i9-13900KF | 4,591 | 98.6% |
| Apple M3 Max | 3,216 | 98.4% |
Quantization:
| Mode | Compression | Recall | Use Case |
|---|---|---|---|
| f32 | 1x | 100% | Default |
| SQ8 | 4x | ~99% | Recommended for most |
SQ8 is enabled when the database is created (see Configuration above).
Filtered search (ACORN-1, SIFT-10K, 10% selectivity):
| Method | QPS | Recall | Speedup |
|---|---|---|---|
| ACORN-1 | -- | -- | 37.79x vs post-filter |
- Parameters: m=16, ef_construction=100, ef_search=100
- Batch: Uses Rayon for parallel search across all cores
- Recall: Validated against brute-force ground truth on SIFT/GloVe
- Reproduce (quick, 10K): uv run python benchmarks/run.py
Tuning
The ef_search parameter controls the recall/speed tradeoff at query time. Higher values explore more candidates, improving recall but slowing search.
Rules of thumb:
- ef_search must be >= k (the number of results requested)
- For 128D embeddings: ef=100 usually achieves 90%+ recall
- For 768D+ embeddings: increase to ef=200-400 for better recall
- If recall drops at scale (50K+), increase both ef_search and ef_construction
Runtime tuning:
Read the current ef_search value from the database, raise it (for example, to 200) for better recall at the cost of speed, lower it (for example, to 50) for faster queries that may lose some recall, or override it per query.
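A sketch of runtime tuning; treating ef_search as both a database attribute and a per-query keyword is an assumption consistent with the reference above:

```python
import omendb

db = omendb.open("tune.omen", dim=128)   # assumed constructor, as in earlier sketches
query_vec = [0.0] * 128

# Check current value
print(db.ef_search)          # e.g. 100

# Increase for better recall (slower)
db.ef_search = 200

# Decrease for speed (may reduce recall)
db.ef_search = 50

# Per-query override (keyword name assumed)
results = db.search(query_vec, k=10, ef_search=200)
```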
Recommended settings by use case:
| Use Case | ef_search | Expected Recall |
|---|---|---|
| Fast search (128D) | 64 | ~85% |
| Balanced (default) | 100 | ~90% |
| High recall (768D+) | 200-300 | ~95%+ |
| Maximum recall | 500+ | ~98%+ |
Examples
See complete working examples:
- python/examples/quickstart.py -- Minimal Python example
- python/examples/basic.py -- CRUD operations and persistence
- python/examples/filters.py -- All filter operators
- python/examples/rag.py -- RAG workflow with mock embeddings
- python/examples/embedding_fn.py -- Auto-embedding with embedding_fn
- python/examples/quantization.py -- SQ8 quantization
- node/examples/quickstart.js -- Minimal Node.js example
- node/examples/embedding_fn.js -- Auto-embedding with embeddingFn
- node/examples/multivector.ts -- Multi-vector / ColBERT
Integrations
OmenDB can be used as the vector store behind both LangChain and LlamaIndex.
License
Elastic License 2.0 -- Free to use, modify, and embed. The only restriction: you can't offer OmenDB as a managed service to third parties.