embedvec — High-Performance Embedded Vector Database
The fastest pure-Rust vector database — HNSW indexing, SIMD acceleration, E8 and H4 lattice quantization, and flexible persistence (Sled, RocksDB, or PostgreSQL/pgvector).
Why embedvec Over the Competition?
| Feature | embedvec | Qdrant | Milvus | Pinecone | pgvector |
|---|---|---|---|---|---|
| Deployment | Embedded (in-process) | Server | Server | Cloud-only | PostgreSQL extension |
| Language | Pure Rust | Rust | Go/C++ | Proprietary | C |
| Latency | <1ms p99 | 2-10ms | 5-20ms | 10-50ms | 2-5ms |
| Memory (1M 768d) | ~196 MB (H4) / ~124 MB (E8) | ~3GB | ~3GB | N/A | ~3GB |
| Zero-copy | ✓ | ✗ | ✗ | ✗ | ✗ |
| SIMD | AVX2/FMA | AVX2 | AVX2 | Unknown | ✗ |
| Quantization | E8 + H4 lattice (SOTA) | Scalar/PQ | PQ/SQ | Unknown | ✗ |
| Python bindings | ✓ (PyO3) | ✓ | ✓ | ✓ | ✓ (psycopg) |
| WASM support | ✓ | ✗ | ✗ | ✗ | ✗ |
Key Advantages
- 10-100× Lower Latency — No network round-trips: embedvec runs in your process, so sub-millisecond queries are the norm, not the exception.
- Up to 24.8× Less Memory — E8 and H4 lattice quantization (from QuIP#/QTIP research) achieve 1.25–1.73 bits/dimension with <5% recall loss. Store 1M 768-dim vectors in ~196 MB (H4) or ~124 MB (E8) instead of ~3 GB.
- No Infrastructure — No Docker, no Kubernetes, no managed-service bills. Just `cargo add embedvec`. Perfect for edge devices, mobile, WASM, and serverless.
- Scale When Ready — Start embedded, then seamlessly migrate to PostgreSQL/pgvector for distributed deployments without changing your code.
- True Rust Safety — No unsafe FFI, no C++ dependencies (unless you opt into RocksDB). Memory-safe, thread-safe, and panic-free.
When to Use embedvec
| Use Case | embedvec | Server DB |
|---|---|---|
| RAG/LLM apps with <10M vectors | ✓ Best | Overkill |
| Edge/mobile/WASM deployment | ✓ Only option | ✗ |
| Prototype → production path | ✓ Same code | Rewrite needed |
| Multi-tenant SaaS | Consider | ✓ Better |
| >100M vectors | Consider pgvector | ✓ Better |
Why embedvec?
- Pure Rust — No C++ dependencies (unless using RocksDB/pgvector), safe and portable
- Blazing Fast — AVX2/FMA SIMD acceleration, optimized HNSW with O(1) internal node lookups
- Memory Efficient — H4 (~15.7×) and E8 (~24.8×) quantization with <5% recall loss
- Two Lattice Modes — E8 (8D, 240 roots) for maximum compression; H4 (4D, 600-cell) for fast decoding
- Flexible Persistence — Sled (pure Rust), RocksDB (high perf), or PostgreSQL/pgvector (distributed)
- Production Ready — Async API, metadata filtering, batch operations
Benchmarks
All measurements on 768-dimensional vectors. Run `cargo bench -- lattice` to reproduce.
Lattice Quantization Comparison (768-dim, 100 vectors per batch)
| Metric | None (raw f32) | H4 (600-cell) | E8 (10-bit) |
|---|---|---|---|
| Encode / 100 vectors | 15.3 µs | 7.26 ms | 3.29 ms |
| Decode / 100 vectors | 17.5 µs | 249 µs | 1.10 ms |
| Insert / 100 vectors | 32.7 ms | 36.2 ms (+11%) | 905 ms (27× slower) |
| Search / 10 queries (ef=64, 10k DB) | 10.3 ms | 0.69 ms | 133 ms |
| Bytes / vector (768-dim) | 3,072 B | 196 B | 124 B |
| Compression ratio | 1× | 15.7× | 24.8× |
| Bits / dimension | 32 | ~1.73 | ~1.25 |
H4 search is fast because HNSW indexes the raw float vector at insert time; the quantized H4 representation is used for storage only. E8 search decodes each candidate during HNSW graph traversal, adding decode overhead per distance call.
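The table's storage figures can be sanity-checked from the block layout alone (assuming one 4-byte f32 scale per vector, which matches the table's totals):

```python
import math

DIM = 768
raw_bytes = DIM * 4  # f32 storage

# H4: one u8 codebook index per 4D block + one f32 scale per vector
h4_bytes = DIM // 4 + 4
# E8 10-bit: one 10-bit code per 8D block, bit-packed, + one f32 scale
e8_bytes = (DIM // 8 * 10) // 8 + 4

print(h4_bytes, round(raw_bytes / h4_bytes, 1))  # 196 15.7
print(e8_bytes, round(raw_bytes / e8_bytes, 1))  # 124 24.8

# Effective bits/dimension = log2(codebook size) / block dimension
print(round(math.log2(120) / 4, 2))  # 1.73  (H4: 120 vertices over 4 dims)
print(10 / 8)                        # 1.25  (E8: 10-bit code over 8 dims)
```
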
Core Operations (768-dim, 10k dataset, AVX2; search times are per 10-query batch)
| Operation | Time | Throughput |
|---|---|---|
| Search (ef=32) | 3.0 ms | 3,300 queries/sec |
| Search (ef=64) | 4.9 ms | 2,000 queries/sec |
| Search (ef=128) | 16.1 ms | 620 queries/sec |
| Search (ef=256) | 23.2 ms | 430 queries/sec |
| Insert (768-dim, raw) | 32.7 ms/100 | 3,060 vectors/sec |
| Distance (cosine) | 122 ns/pair | 8.2M ops/sec |
| Distance (euclidean) | 108 ns/pair | 9.3M ops/sec |
| Distance (dot product) | 91 ns/pair | 11M ops/sec |
Memory Usage at Scale (768-dim vectors)
| Mode | Bytes/Vector | 100k Vectors | 1M Vectors | Compression |
|---|---|---|---|---|
| Raw f32 | 3,072 B | ~307 MB | ~3.07 GB | 1× |
| H4 | 196 B | ~19.6 MB | ~196 MB | 15.7× |
| E8 10-bit | 124 B | ~12.4 MB | ~124 MB | 24.8× |
Core Features
| Feature | Description |
|---|---|
| HNSW Indexing | Hierarchical Navigable Small World graph for O(log n) ANN search |
| SIMD Distance | AVX2/FMA accelerated cosine, euclidean, dot product |
| E8 Quantization | 8D D8∪D8+½ lattice, 240 roots, ~1.25 bits/dim, 24.8× compression |
| H4 Quantization | 4D 600-cell polytope, 120 vertices, ~1.73 bits/dim, 15.7× compression |
| Metadata Filtering | Composable filters: eq, gt, lt, contains, AND/OR/NOT |
| Triple Persistence | Sled (pure Rust), RocksDB (high perf), or pgvector (PostgreSQL) |
| pgvector Integration | Native PostgreSQL vector search with HNSW/IVFFlat indexes |
| Async API | Tokio-compatible async operations |
| PyO3 Bindings | First-class Python support with numpy interop |
| WASM Support | Feature-gated for browser/edge deployment |
Quick Start — Rust
```toml
[dependencies]
embedvec = "0.6"
tokio = { version = "1.0", features = ["rt-multi-thread", "macros"] }
serde_json = "1.0"
```

```rust
use embedvec::{EmbedVec, Quantization};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // In-memory database with H4 quantization (see the API Reference below).
    let db = EmbedVec::builder()
        .dimension(768)
        .quantization(Quantization::h4_default())
        .build()
        .await?;

    // Add a vector with JSON metadata, then query its nearest neighbors.
    db.add(vec![0.1_f32; 768], serde_json::json!({"doc": "hello"})).await?;
    let results = db.search(&[0.1_f32; 768], 10, 64, None).await?;
    println!("top hit: {:?}", results.first());
    Ok(())
}
```
Quick Start — Python
```python
import numpy as np
import embedvec

# Create database with H4 quantization (15.7× memory savings, fast decode)
# Constructor keywords shown are illustrative; see the PyO3 bindings for exact names.
db = embedvec.EmbedVec(dimension=768, quantization="h4")

# Batch-add vectors with metadata payloads (numpy interop)
vectors = np.random.rand(1000, 768).astype(np.float32)
payloads = [{"doc_id": i} for i in range(1000)]
ids = db.add_many(vectors, payloads)

# k-nearest-neighbor search
results = db.search(vectors[0], k=10)
```
API Reference
EmbedVec Builder
```rust
let db = EmbedVec::builder()
    .dimension(768)                            // Vector dimension (required)
    .metric(Metric::Cosine)                    // Distance metric
    .m(16)                                     // HNSW M parameter
    .ef_construction(200)                      // HNSW build parameter
    .quantization(Quantization::h4_default())  // None | h4_default() | e8_default()
    .persistence(persistence)                  // Optional disk persistence
    .build()
    .await?;
```

Argument values above are illustrative defaults, not requirements.
Core Operations
| Method | Description |
|---|---|
| `add(vector, payload)` | Add single vector with metadata |
| `add_many(vectors, payloads)` | Batch add vectors |
| `search(query, k, ef_search, filter)` | Find k nearest neighbors |
| `len()` | Number of vectors |
| `clear()` | Remove all vectors |
| `flush()` | Persist to disk (if enabled) |
FilterExpr — Composable Filters
```rust
use embedvec::FilterExpr;

// Field predicates (field names and values are illustrative)
FilterExpr::eq("category", "science");
FilterExpr::gt("year", 2020);
FilterExpr::gte("score", 0.5);
FilterExpr::lt("price", 100);
FilterExpr::contains("tags", "rust");
FilterExpr::starts_with("title", "How ");
FilterExpr::in_values("status", ["active", "pending"]);
FilterExpr::exists("summary");

// Boolean composition
FilterExpr::eq("category", "science")
    .and(FilterExpr::gt("year", 2020))
    .or(FilterExpr::exists("legacy"));
```
Quantization Reference
Choosing a Mode
| Mode | Bits/Dim | Bytes/Vector (768d) | Encode Speed | Decode Speed | Best For |
|---|---|---|---|---|---|
| `None` | 32 | 3,072 B | Instant | Instant | Highest accuracy, max RAM |
| `H4` | ~1.73 | 196 B | 72 µs/vec | 2.5 µs/vec | Best balance — fast decode, 15.7× compression |
| `E8 10-bit` | ~1.25 | 124 B | 33 µs/vec | 11 µs/vec | Maximum compression, slower search |
H4 — 4D 600-Cell Lattice
```rust
// Default: Hadamard preprocessing, reproducible seed
let q = Quantization::h4_default();

// Custom (field name illustrative): pick your own sign-PRNG seed
let q = Quantization::H4 { seed: 42 };
```
The H4 quantizer maps each 4D block to the nearest vertex of the regular 600-cell polytope (120 vertices with icosahedral symmetry). Each block is stored as a single u8 index.
- ~1.73 bits/dimension effective
- 15.7× compression vs raw f32 at 768 dimensions
- Fast decode: table lookup + 4D Hadamard inverse (~2.5 µs per vector)
E8 — 8D D8 Lattice
```rust
// Default: 10-bit, Hadamard preprocessing
let q = Quantization::e8_default();

// Custom bit-depth (field name illustrative)
let q = Quantization::E8 { bits: 12 };
```
The E8 quantizer uses the D8 ∪ (D8 + ½) double-cover decomposition to find the nearest E8 lattice point per 8D block. Achieves maximum compression density.
- ~1.25 bits/dimension effective
- 24.8× compression vs raw f32 at 768 dimensions
- Slower decode than H4 due to 8D parity reconstruction
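The double-cover idea is easy to state concretely. The sketch below is not embedvec's encoder; it is the textbook nearest-point algorithm for D8 and for E8 = D8 ∪ (D8 + ½), in plain Python (function names are ours, not the crate's API):

```python
def nearest_d8(x):
    """Nearest point of the D8 lattice (integer vectors with even coordinate sum)."""
    r = [round(v) for v in x]
    if sum(r) % 2 != 0:
        # Fix parity by re-rounding the coordinate with the largest rounding
        # error in the opposite direction.
        i = max(range(8), key=lambda j: abs(x[j] - r[j]))
        r[i] += 1 if x[i] > r[i] else -1
    return r

def nearest_e8(x):
    """Nearest E8 point: the closer of the D8 candidate and the (D8 + 1/2) candidate."""
    a = nearest_d8(x)
    b = [v + 0.5 for v in nearest_d8([v - 0.5 for v in x])]
    d2 = lambda p: sum((xi - pi) ** 2 for xi, pi in zip(x, p))
    return a if d2(a) <= d2(b) else b
```

Each 8D block is snapped to whichever coset candidate is closer; decoding reverses the lookup, which is where the "8D parity reconstruction" overhead comes from.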
E8 and H4 Lattice Quantization
Both quantizers implement the same pipeline:
- Random Sign Preprocessing — Multiply each coordinate by ±1 from a seeded PRNG
- Hadamard Transform — Fast Walsh-Hadamard transform decorrelates coordinates
- Scale Normalization — Global scale factor computed per vector
- Nearest Lattice Point — Exhaustive search over roots (E8: 240, H4: 120)
- Compact Storage — E8: u16 code + f32 scale; H4: u8 index per 4D block + f32 scale
- Asymmetric Search — Query stays FP32; database decoded on-the-fly
Based on QuIP#/NestQuant/QTIP research (2024–2025).
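Steps 1–3 of the pipeline can be sketched in a few lines of Python (the max-abs scale rule and the seed handling here are illustrative assumptions, not the crate's exact normalization):

```python
import math
import random

def fwht(x):
    """In-place fast Walsh-Hadamard transform; len(x) must be a power of two."""
    h, n = 1, len(x)
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def preprocess(vec, seed=42):
    # 1. Random sign flip from a seeded PRNG (reproducible at decode time)
    rng = random.Random(seed)
    x = [v * rng.choice((-1.0, 1.0)) for v in vec]
    # 2. Hadamard transform, scaled by 1/sqrt(n) so it is an isometry
    n = len(x)
    x = [v / math.sqrt(n) for v in fwht(x)]
    # 3. Global per-vector scale (illustrative: max-abs normalization)
    scale = max(abs(v) for v in x) or 1.0
    return [v / scale for v in x], scale
```

Step 4 then snaps each 4D or 8D block of the normalized output to its nearest lattice point; decode runs the same pipeline in reverse using the stored scale and the same seed.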
Performance
Projected Performance at Scale
| Operation | ~1M vectors | ~10M vectors | Notes |
|---|---|---|---|
| Query (k=10, ef=128) | 0.4–1.2 ms | 1–4 ms | Cosine, no filter |
| Query + filter | 0.6–2.5 ms | 2–8 ms | Depends on selectivity |
| Memory (None/f32) | ~3.1 GB | ~31 GB | Full precision |
| Memory (H4) | ~196 MB | ~1.96 GB | 15.7× reduction |
| Memory (E8 10-bit) | ~124 MB | ~1.24 GB | 24.8× reduction |
Feature Flags
```toml
[dependencies]
embedvec = { version = "0.6", features = ["persistence-sled", "async"] }
```
| Feature | Description | Default |
|---|---|---|
| `persistence-sled` | On-disk storage via Sled (pure Rust) | ✓ |
| `persistence-rocksdb` | On-disk storage via RocksDB (higher perf) | ✗ |
| `persistence-pgvector` | PostgreSQL with native vector search | ✗ |
| `async` | Tokio async API | ✓ |
| `python` | PyO3 bindings | ✗ |
| `simd` | SIMD distance optimizations | ✗ |
| `wasm` | WebAssembly support | ✗ |
Persistence Backends
Sled (Default)
Pure Rust embedded database.
```rust
let db = EmbedVec::builder()
    .dimension(768)
    .with_persistence("./embedvec_data")  // path illustrative
    .build()
    .await?;
```
RocksDB (Optional)
```toml
[dependencies]
embedvec = { version = "0.6", features = ["persistence-rocksdb", "async"] }
```

```rust
// Type names and argument values are illustrative.
let config = PersistenceConfig::new("./embedvec_data")
    .backend(Backend::RocksDb)
    .cache_size(256 * 1024 * 1024); // 256 MB block cache

let db = EmbedVec::builder()
    .dimension(768)
    .with_backend(config)
    .build()
    .await?;
```
pgvector (PostgreSQL)
```toml
[dependencies]
embedvec = { version = "0.6", features = ["persistence-pgvector", "async"] }
```

```rust
// Connection string, table name, and type names are illustrative.
let config = PersistenceConfig::pgvector("postgres://localhost/vectors")
    .table_name("embeddings")
    .index_type(IndexType::Hnsw);

let backend = config.connect().await?;
```
Testing
```sh
# Lattice comparison benchmarks only
cargo bench -- lattice

# Full benchmark suite
cargo bench
```
Roadmap
- v0.6 (current): H4 lattice quantization, E8 fixes, lattice benchmark suite
- v0.7: Delete support, batch queries, LangChain/LlamaIndex integration
- Future: Hybrid sparse-dense, full-text + vector, SIMD-accelerated lattice decode
License
MIT OR Apache-2.0
Contributing
Contributions welcome! Please read CONTRIBUTING.md before submitting PRs.
Acknowledgments
- HNSW algorithm: Malkov & Yashunin (2016)
- E8 quantization: Inspired by QuIP#, NestQuant, QTIP (2024–2025)
- H4 quantization: Regular 600-cell polytope (icosahedral symmetry in ℝ⁴)
- Rust ecosystem: serde, tokio, pyo3, sled
embedvec — The "SQLite of vector search" for Rust-first teams in 2026.