# OmenDB

[![PyPI](https://img.shields.io/pypi/v/omendb)](https://pypi.org/project/omendb/)
[![npm](https://img.shields.io/npm/v/omendb)](https://www.npmjs.com/package/omendb)
[![License](https://img.shields.io/badge/License-Elastic_2.0-blue.svg)](https://github.com/omendb/omendb/blob/main/LICENSE)

Embedded vector database for Python and Node.js. No server, no setup, just install.

- **20K QPS** single-threaded search with 100% recall (SIFT-10K)
- **105K vec/s** insert throughput
- **SQ8 quantization** (4x compression, ~99% recall)
- **ACORN-1** predicate-aware filtered search
- **Hybrid search** -- BM25 text + vector with RRF fusion
- **Multi-vector** -- ColBERT/MaxSim with MUVERA and token pooling
- **Auto-embedding** -- pass a function, store documents, search with strings

```bash
pip install omendb       # Python
npm install omendb       # Node.js
```

## Quick Start

### Python

**With auto-embedding** -- pass an embedding function, work with documents and strings:

```python
import omendb

def embed(texts):
    # Your embedding model here (OpenAI, sentence-transformers, etc.)
    return [[0.1] * 384 for _ in texts]

db = omendb.open("./mydb", dimensions=384, embedding_fn=embed)

# Add documents -- auto-embedded
db.set([
    {"id": "doc1", "document": "Paris is the capital of France", "metadata": {"topic": "geography"}},
    {"id": "doc2", "document": "The mitochondria is the powerhouse of the cell", "metadata": {"topic": "biology"}},
])

# Search with text -- auto-embedded
results = db.search("capital of France", k=5)
```
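
The stub above returns constant vectors; any function mapping a list of texts to a list of vectors works. As one possible wiring, here is a sketch using sentence-transformers (an assumption, not an omendb dependency; `all-MiniLM-L6-v2` outputs 384-dimensional vectors):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

def embed(texts):
    # encode() returns a numpy array; convert to plain nested lists
    return model.encode(list(texts)).tolist()

db = omendb.open("./mydb", dimensions=384, embedding_fn=embed)
```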

**With vectors** -- bring your own embeddings:

```python
db = omendb.open("./mydb", dimensions=128)

db.set([
    {"id": "doc1", "vector": [0.1] * 128, "metadata": {"category": "science"}},
    {"id": "doc2", "vector": [0.2] * 128, "metadata": {"category": "history"}},
])

results = db.search([0.1] * 128, k=5)
results = db.search([0.1] * 128, k=5, filter={"category": "science"})
```

### Node.js

**With auto-embedding:**

```javascript
const omendb = require("omendb");

const db = omendb.open("./mydb", 384, { embeddingFn: embed });
db.set([{ id: "doc1", document: "Paris is the capital of France" }]);
const results = db.search("capital of France", 5);
```

**With vectors:**

```javascript
const db = omendb.open("./mydb", 128);
db.set([{ id: "doc1", vector: new Float32Array(128).fill(0.1) }]);
const results = db.search(new Float32Array(128).fill(0.1), 5);
```

## Features

- **HNSW graph indexing** -- SIMD-accelerated distance computation
- **ACORN-1 filtered search** -- predicate-aware graph traversal, 37.79x speedup over post-filtering
- **SQ8 quantization** -- 4x compression, ~99% recall
- **BM25 text search** -- full-text search via Tantivy
- **Hybrid search** -- RRF fusion of vector + text results
- **Multi-vector / ColBERT** -- MUVERA + MaxSim scoring for token-level retrieval
- **Token pooling** -- k-means clustering, 50% storage reduction for multi-vector
- **Auto-embedding** -- `embedding_fn` (Python) / `embeddingFn` (Node.js) for document-in, text-query workflows
- **Collections** -- namespaced sub-databases within a single file
- **Persistence** -- WAL + atomic checkpoints
- **O(1) lazy delete + compaction** -- deleted records are tombstoned and cleaned up in the background (see the sketch after this list)
- **Segment-based architecture** -- background merging for sustained write throughput
- **Context manager** (Python) / `close()` (Node.js) for resource cleanup
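
The lazy-delete workflow in practice -- a minimal sketch using the CRUD and persistence calls from the API reference below:

```python
db.delete(["doc1", "doc2"])                   # O(1) tombstone per ID
db.delete_by_filter({"category": "history"})  # tombstone by metadata filter
db.compact()                                  # reclaim space from deleted records
```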

## Platforms

| Platform                     | Status    |
| ---------------------------- | --------- |
| Linux (x86_64, ARM64)        | Supported |
| macOS (Intel, Apple Silicon) | Supported |

## API Reference

### Python

```python
# Database
db = omendb.open(path, dimensions, embedding_fn=fn)  # With auto-embedding
db = omendb.open(path, dimensions)                    # Manual vectors
db = omendb.open(":memory:", dimensions)              # In-memory

# CRUD
db.set(items)                           # Insert/update (vectors or documents)
db.set("id", vector, metadata)          # Single insert
db.get(id)                              # Get by ID
db.get_batch(ids)                       # Batch get
db.delete(ids)                          # Delete by IDs
db.delete_by_filter(filter)             # Delete by metadata filter
db.update(id, vector, metadata, text)   # Update fields

# Search
db.search(query, k)                     # Vector or string query
db.search(query, k, filter={...})       # Filtered search (ACORN-1)
db.search(query, k, max_distance=0.5)   # Distance threshold
db.search_batch(queries, k)             # Batch search (parallel)

# Hybrid search
db.search_hybrid(query_vector, query_text, k)
db.search_hybrid("query text", k=10)    # String query (auto-embeds both)
db.search_text(query_text, k)           # Text-only BM25

# Iteration
len(db)                                 # Count
db.count(filter={...})                  # Filtered count
db.ids()                                # Lazy ID iterator
db.items()                              # All items (loads to memory)
for item in db: ...                     # Lazy iteration
"id" in db                              # Existence check

# Collections
col = db.collection("users")            # Create/get collection
db.collections()                        # List collections
db.delete_collection("users")           # Delete collection

# Persistence
db.flush()                              # Flush to disk
db.close()                              # Close
db.compact()                            # Remove deleted records
db.optimize()                           # Reorder for cache locality
db.merge_from(other_db)                 # Merge databases

# Config
db.ef_search                            # Get search quality
db.ef_search = 200                      # Set search quality
db.dimensions                           # Vector dimensionality
db.stats()                              # Database statistics
```
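
Collections appear in the listing above but nowhere else in this README; a minimal sketch, assuming each collection exposes the same set/search surface as the parent database:

```python
db = omendb.open("./mydb", dimensions=128)

users = db.collection("users")  # namespaced sub-database in the same file
users.set([{"id": "u1", "vector": [0.1] * 128, "metadata": {"plan": "pro"}}])

results = users.search([0.1] * 128, k=5)
print(db.collections())         # ["users"]
db.delete_collection("users")
```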

### Node.js

```javascript
// Database
const db = omendb.open(path, dimensions, { embeddingFn: fn }); // with auto-embedding
// or, with manual vectors: const db = omendb.open(path, dimensions);

// CRUD
db.set(items);
db.get(id);
db.getBatch(ids);
db.delete(ids);
db.deleteByFilter(filter);
db.update(id, { vector, metadata, text });

// Search
db.search(query, k);
db.search(query, k, { filter, maxDistance, ef });
db.searchBatch(queries, k);

// Hybrid
db.searchHybrid(queryVector, queryText, k);
db.searchText(queryText, k);

// Collections
db.collection("users");
db.collections();
db.deleteCollection("users");

// Persistence
db.flush();
db.close();
db.compact();
db.optimize();
```

## Configuration

```python
db = omendb.open(
    "./mydb",                # Creates ./mydb.omen + ./mydb.wal
    dimensions=384,
    m=16,                    # HNSW connections per node (default: 16)
    ef_construction=200,     # Index build quality (default: 100)
    ef_search=100,           # Search quality (default: 100)
    quantization=True,       # SQ8 quantization (default: None)
    metric="cosine",         # Distance metric (default: "l2")
    embedding_fn=embed,      # Auto-embed documents and string queries
)

# Quantization options:
# - True or "sq8": SQ8 ~4x smaller, ~99% recall (recommended)
# - None/False: Full precision (default)

# Distance metric options:
# - "l2" or "euclidean": Euclidean distance (default)
# - "cosine": Cosine distance (1 - cosine similarity)
# - "dot" or "ip": Inner product (for MIPS)

# Context manager (auto-flush on exit)
with omendb.open("./db", dimensions=768) as db:
    db.set([...])
```

## Distance Filtering

Use `max_distance` to filter out low-relevance results (prevents "context rot" in RAG):

```python
# Only return results with distance <= 0.5
results = db.search(query, k=10, max_distance=0.5)

# Combine with metadata filter
results = db.search(query, k=10, filter={"type": "doc"}, max_distance=0.5)
```

This ensures your RAG pipeline only receives highly relevant context, avoiding distractors that can hurt LLM performance.

## Filters

```python
# Equality
{"field": "value"}                      # Shorthand
{"field": {"$eq": "value"}}             # Explicit

# Comparison
{"field": {"$ne": "value"}}             # Not equal
{"field": {"$gt": 10}}                  # Greater than
{"field": {"$gte": 10}}                 # Greater or equal
{"field": {"$lt": 10}}                  # Less than
{"field": {"$lte": 10}}                 # Less or equal

# Membership
{"field": {"$in": ["a", "b"]}}          # In list
{"field": {"$contains": "sub"}}         # String contains

# Logical
{"$and": [{...}, {...}]}                # AND
{"$or": [{...}, {...}]}                 # OR
```
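
Operators compose inside a single `filter` argument. A sketch with hypothetical `category` and `year` metadata fields:

```python
# query_vector: your query embedding
results = db.search(
    query_vector,
    k=10,
    filter={
        "$and": [
            {"category": {"$in": ["science", "history"]}},
            {"year": {"$gte": 2020}},
        ]
    },
)
```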

## Hybrid Search

Combine vector similarity with BM25 full-text search using RRF fusion:

```python
# With embedding_fn -- pass a string for both vector and text query
db = omendb.open("./mydb", dimensions=384, embedding_fn=embed)
db.set([
    {"id": "doc1", "document": "Paris is the capital of France", "metadata": {"topic": "geography"}},
])

results = db.search_hybrid("capital of France", k=10)

# With manual vectors
db.search_hybrid(query_vector, "query text", k=10)

# Tune alpha: 0 = text only, 1 = vector only, default = 0.5
db.search_hybrid(query_vector, "query text", k=10, alpha=0.7)

# Get separate keyword and semantic scores for debugging/tuning
results = db.search_hybrid(query_vector, "query text", k=10, subscores=True)
# Returns: {"id": "...", "score": 0.85, "keyword_score": 0.92, "semantic_score": 0.78}

# Text-only BM25
db.search_text("capital of France", k=10)
```
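
For intuition, RRF scores each document by the reciprocal of its rank in every result list, then sums across lists. A toy sketch of the idea -- not OmenDB's internal code (which additionally weights the lists via `alpha`), and k=60 is the conventional RRF constant, not a documented OmenDB parameter:

```python
def rrf_fuse(rankings, k=60):
    """rankings: list of ID lists, best-first (e.g. one from BM25, one from HNSW)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```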

## Multi-vector (ColBERT)

MUVERA with MaxSim scoring for ColBERT-style token-level retrieval. Token pooling via k-means reduces storage by 50%.

```python
mvdb = omendb.open(":memory:", dimensions=128, multi_vector=True)
mvdb.set([{
    "id": "doc1",
    "vectors": [[0.1]*128, [0.2]*128, [0.3]*128],  # Token embeddings
}])
results = mvdb.search([[0.1]*128, [0.15]*128], k=5)  # MaxSim scoring
```
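
For intuition, MaxSim scores a query against a document by taking, for each query token, its best-matching document token, then summing. A toy sketch of the scoring rule (not OmenDB's internal implementation):

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    # query_tokens: (q, d), doc_tokens: (t, d) -- rows are token embeddings
    sims = query_tokens @ doc_tokens.T    # (q, t) pairwise similarities
    return float(sims.max(axis=1).sum())  # best doc token per query token, summed
```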

## Performance

**SIFT-10K** (128D, M=16, ef=100, k=10, Apple M3 Max):

| Metric    | Result     |
| --------- | ---------- |
| Build     | 105K vec/s |
| Search    | 19.7K QPS  |
| Batch     | 156K QPS   |
| Recall@10 | 100.0%     |

**SIFT-1M** (1M vectors, 128D, M=16, ef=100, k=10):

| Machine      | QPS   | Recall |
| ------------ | ----- | ------ |
| i9-13900KF   | 4,591 | 98.6%  |
| Apple M3 Max | 3,216 | 98.4%  |

**Quantization:**

| Mode | Compression | Recall | Use Case             |
| ---- | ----------- | ------ | -------------------- |
| f32  | 1x          | 100%   | Default              |
| SQ8  | 4x          | ~99%   | Recommended for most |

```python
db = omendb.open("./db", dimensions=768, quantization=True)          # SQ8
```
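
For intuition, SQ8 (scalar quantization) maps each float32 component to an 8-bit code over a min/max range, which is where the 4x compression comes from. A toy sketch of the idea -- OmenDB's actual layout and range estimation may differ:

```python
import numpy as np

def sq8_encode(v: np.ndarray):
    lo, hi = float(v.min()), float(v.max())
    scale = (hi - lo) or 1.0  # guard against constant vectors
    codes = np.round((v - lo) / scale * 255).astype(np.uint8)
    return codes, lo, hi      # 1 byte/dim instead of 4

def sq8_decode(codes: np.ndarray, lo: float, hi: float) -> np.ndarray:
    return codes.astype(np.float32) / 255 * (hi - lo) + lo
```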

**Filtered search** (ACORN-1, SIFT-10K, 10% selectivity):

| Method  | QPS | Recall | Speedup               |
| ------- | --- | ------ | --------------------- |
| ACORN-1 | --  | --     | 37.79x vs post-filter |

<details>
<summary>Benchmark methodology</summary>

- **Parameters**: m=16, ef_construction=100, ef_search=100
- **Batch**: Uses Rayon for parallel search across all cores
- **Recall**: Validated against brute-force ground truth on SIFT/GloVe
- **Reproduce**:
  - Quick (10K): `uv run python benchmarks/run.py`

</details>

## Tuning

The `ef_search` parameter controls the recall/speed tradeoff at query time. Higher values explore more candidates, improving recall but slowing search.

**Rules of thumb:**

- `ef_search` must be >= k (number of results requested)
- For 128D embeddings: ef=100 usually achieves 90%+ recall
- For 768D+ embeddings: increase to ef=200-400 for better recall
- If recall drops at scale (50K+), increase both ef_search and ef_construction

**Runtime tuning:**

```python
# Check current value
print(db.ef_search)  # 100

# Increase for better recall (slower)
db.ef_search = 200

# Decrease for speed (may reduce recall)
db.ef_search = 50

# Per-query override
results = db.search(query, k=10, ef=300)
```

**Recommended settings by use case:**

| Use Case            | ef_search | Expected Recall |
| ------------------- | --------- | --------------- |
| Fast search (128D)  | 64        | ~85%            |
| Balanced (default)  | 100       | ~90%            |
| High recall (768D+) | 200-300   | ~95%+           |
| Maximum recall      | 500+      | ~98%+           |
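
These numbers vary by dataset, so it is worth sweeping `ef_search` on your own workload. A sketch measuring throughput only (recall measurement needs brute-force ground truth; `queries` is a hypothetical list of held-out query vectors):

```python
import time

for ef in (64, 100, 200, 500):
    db.ef_search = ef
    start = time.perf_counter()
    for q in queries:
        db.search(q, k=10)
    elapsed = time.perf_counter() - start
    print(f"ef_search={ef}: {len(queries) / elapsed:,.0f} QPS")
```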

## Examples

See complete working examples:

- [`python/examples/quickstart.py`](python/examples/quickstart.py) -- Minimal Python example
- [`python/examples/basic.py`](python/examples/basic.py) -- CRUD operations and persistence
- [`python/examples/filters.py`](python/examples/filters.py) -- All filter operators
- [`python/examples/rag.py`](python/examples/rag.py) -- RAG workflow with mock embeddings
- [`python/examples/embedding_fn.py`](python/examples/embedding_fn.py) -- Auto-embedding with embedding_fn
- [`python/examples/quantization.py`](python/examples/quantization.py) -- SQ8 quantization
- [`node/examples/quickstart.js`](node/examples/quickstart.js) -- Minimal Node.js example
- [`node/examples/embedding_fn.js`](node/examples/embedding_fn.js) -- Auto-embedding with embeddingFn
- [`node/examples/multivector.ts`](node/examples/multivector.ts) -- Multi-vector / ColBERT

## Integrations

### LangChain

```bash
pip install omendb[langchain]
```

```python
from langchain_openai import OpenAIEmbeddings
from omendb.langchain import OmenDBVectorStore

store = OmenDBVectorStore.from_texts(
    texts=["Paris is the capital of France"],
    embedding=OpenAIEmbeddings(),
    path="./langchain_vectors",
)
docs = store.similarity_search("capital of France", k=1)
```

### LlamaIndex

```bash
pip install omendb[llamaindex]
```

```python
from llama_index.core import VectorStoreIndex, Document, StorageContext
from omendb.llamaindex import OmenDBVectorStore

vector_store = OmenDBVectorStore(path="./llama_vectors")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [Document(text="OmenDB is fast")],
    storage_context=storage_context,
)
response = index.as_query_engine().query("What is OmenDB?")
```

## License

[Elastic License 2.0](LICENSE) -- Free to use, modify, and embed. The only restriction: you can't offer OmenDB as a managed service to third parties.