embedvec — High-Performance Embedded Vector Database
The fastest pure-Rust vector database — HNSW indexing, SIMD acceleration, E8 quantization, and flexible persistence (Sled, RocksDB, or PostgreSQL/pgvector).
Why embedvec Over the Competition?
| Feature | embedvec | Qdrant | Milvus | Pinecone | pgvector |
|---|---|---|---|---|---|
| Deployment | Embedded (in-process) | Server | Server | Cloud-only | PostgreSQL extension |
| Language | Pure Rust | Rust | Go/C++ | Proprietary | C |
| Latency | <1ms p99 | 2-10ms | 5-20ms | 10-50ms | 2-5ms |
| Memory (1M 768d) | ~500MB (E8) | ~3GB | ~3GB | N/A | ~3GB |
| Zero-copy | ✓ | ✗ | ✗ | ✗ | ✗ |
| SIMD | AVX2/FMA | AVX2 | AVX2 | Unknown | ✗ |
| Quantization | E8 lattice (SOTA) | Scalar/PQ | PQ/SQ | Unknown | ✗ |
| Python bindings | ✓ (PyO3) | ✓ | ✓ | ✓ | ✓ (psycopg) |
| WASM support | ✓ | ✗ | ✗ | ✗ | ✗ |
Key Advantages
- 10-100× Lower Latency — No network round-trips. embedvec runs in your process, not a separate server. Sub-millisecond queries are the norm, not the exception.
- 6× Less Memory — E8 lattice quantization (from QuIP#/QTIP research) achieves ~1.25 bits/dimension with <5% recall loss. Store 1M vectors in 500MB instead of 3GB.
- No Infrastructure — No Docker, no Kubernetes, no managed service bills. Just `cargo add embedvec` and you're done. Perfect for edge devices, mobile, WASM, and serverless.
- Scale When Ready — Start embedded, then seamlessly migrate to PostgreSQL/pgvector for distributed deployments without changing your code.
- True Rust Safety — No unsafe FFI, no C++ dependencies (unless you opt into RocksDB). Memory-safe, thread-safe, and panic-free.
When to Use embedvec
| Use Case | embedvec | Server DB |
|---|---|---|
| RAG/LLM apps with <10M vectors | ✓ Best | Overkill |
| Edge/mobile/WASM deployment | ✓ Only option | ✗ |
| Prototype → production path | ✓ Same code | Rewrite needed |
| Multi-tenant SaaS | Consider | ✓ Better |
| >100M vectors | Consider pgvector | ✓ Better |
Why embedvec?
- Pure Rust — No C++ dependencies (unless using RocksDB/pgvector), safe and portable
- Blazing Fast — AVX2/FMA SIMD acceleration, optimized HNSW with O(1) lookups
- Memory Efficient — E8 quantization provides 4-6× compression with <5% recall loss
- Flexible Persistence — Sled (pure Rust), RocksDB (high perf), or PostgreSQL/pgvector (distributed)
- Production Ready — Async API, metadata filtering, batch operations
Benchmarks
768-dimensional vectors, 10k dataset, AVX2 enabled:
| Operation | Time | Throughput |
|---|---|---|
| Search (ef=32) | 3.0 ms | 3,300 queries/sec |
| Search (ef=64) | 4.9 ms | 2,000 queries/sec |
| Search (ef=128) | 16.1 ms | 620 queries/sec |
| Search (ef=256) | 23.2 ms | 430 queries/sec |
| Insert (768-dim) | 25.5 ms/100 | 3,900 vectors/sec |
| Distance (cosine) | 122 ns/pair | 8.2M ops/sec |
| Distance (euclidean) | 108 ns/pair | 9.3M ops/sec |
| Distance (dot product) | 91 ns/pair | 11M ops/sec |
Run `cargo bench` to reproduce on your hardware.
Core Features
| Feature | Description |
|---|---|
| HNSW Indexing | Hierarchical Navigable Small World graph for O(log n) ANN search |
| SIMD Distance | AVX2/FMA accelerated cosine, euclidean, dot product |
| E8 Quantization | Lattice-based compression (4-6× memory reduction) |
| Metadata Filtering | Composable filters: eq, gt, lt, contains, AND/OR/NOT |
| Triple Persistence | Sled (pure Rust), RocksDB (high perf), or pgvector (PostgreSQL) |
| pgvector Integration | Native PostgreSQL vector search with HNSW/IVFFlat indexes |
| Async API | Tokio-compatible async operations |
| PyO3 Bindings | First-class Python support with numpy interop |
| WASM Support | Feature-gated for browser/edge deployment |
Quick Start — Rust
```toml
[dependencies]
embedvec = "0.5"
tokio = { version = "1.0", features = ["rt-multi-thread", "macros"] }
serde_json = "1.0" # for JSON metadata payloads
```
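A minimal end-to-end sketch, assuming the builder, `add`, and `search(query, k, ef_search, filter)` signatures listed in the API Reference below; the argument values and the exact parameter types (e.g. whether `ef_search` and `filter` are `Option`s) are illustrative:

```rust
use embedvec::EmbedVec;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build an in-memory index; see the API Reference for all builder options.
    let db = EmbedVec::builder()
        .dimension(768)
        .build()
        .await?;

    // Add one vector with JSON metadata, then query its neighbors.
    db.add(vec![0.1_f32; 768], serde_json::json!({"doc": "example"})).await?;
    let hits = db.search(&vec![0.1_f32; 768], 5, None, None).await?;
    println!("found {} neighbors", hits.len());
    Ok(())
}
```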
Quick Start — Python
```python
# NOTE: names and signatures below mirror the Rust API and are illustrative;
# check the published Python package for the exact bindings.
import numpy as np
from embedvec import EmbedVec

# Create database with E8 quantization
db = EmbedVec(dimension=768, quantization="e8")

# Add vectors (numpy array or list-of-lists)
vectors = np.random.rand(100, 768).astype(np.float32)
ids = db.add_many(vectors, payloads=[{"lang": "en"}] * 100)

# Search with filter
query = np.random.rand(768).astype(np.float32)
results = db.search(query, k=10, filter={"lang": "en"})
```
API Reference
EmbedVec Builder
```rust
// Example argument values; the Metric and Quantization type names are illustrative.
let db = EmbedVec::builder()
    .dimension(768)                    // Vector dimension (required)
    .metric(Metric::Cosine)            // Distance metric
    .m(16)                             // HNSW M parameter
    .ef_construction(200)              // HNSW build parameter
    .quantization(Quantization::None)  // Or E8 for compression
    .persistence("./my_db")            // Optional disk persistence
    .build()
    .await?;
```
Core Operations
| Method | Description |
|---|---|
| `add(vector, payload)` | Add a single vector with metadata |
| `add_many(vectors, payloads)` | Batch add vectors |
| `search(query, k, ef_search, filter)` | Find the k nearest neighbors |
| `len()` | Number of vectors |
| `clear()` | Remove all vectors |
| `flush()` | Persist to disk (if enabled) |
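A short usage sketch tying these operations together, assuming a `db` built as in the Quick Start; the payload fields, the `ef_search` value, and the `Option` wrapping of optional parameters are illustrative:

```rust
// Batch-insert two vectors with JSON payloads.
let vectors = vec![vec![0.1_f32; 768], vec![0.2_f32; 768]];
let payloads = vec![
    serde_json::json!({"lang": "en", "year": 2024}),
    serde_json::json!({"lang": "de", "year": 2023}),
];
db.add_many(vectors, payloads).await?;
assert_eq!(db.len(), 2);

// k = 5 nearest neighbors, ef_search = 64, no metadata filter.
let hits = db.search(&vec![0.1_f32; 768], 5, Some(64), None).await?;

// Persist to disk (only if a persistence backend is configured).
db.flush().await?;
```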
FilterExpr — Composable Filters
```rust
use embedvec::FilterExpr;

// Constructors are shown as associated functions on FilterExpr;
// field names and values are illustrative.

// Equality
FilterExpr::eq("lang", "en");

// Comparisons
FilterExpr::gt("year", 2020);
FilterExpr::gte("score", 0.8);
FilterExpr::lt("price", 100);
FilterExpr::lte("rank", 10);

// String operations
FilterExpr::contains("title", "rust");
FilterExpr::starts_with("path", "/docs/");

// Membership
FilterExpr::in_values("category", ["blog", "docs"]);

// Existence
FilterExpr::exists("reviewed_at");

// Boolean composition
FilterExpr::eq("lang", "en")
    .and(FilterExpr::gt("year", 2020))
    .or(FilterExpr::exists("pinned"));
```
Quantization Modes
| Mode | Bits/Dim | Memory/Vector (768d) | Recall@10 |
|---|---|---|---|
| `None` | 32 | ~3.1 KB | 100% |
| E8 8-bit | ~1.0 | ~170 B | 92–97% |
| E8 10-bit | ~1.25 | ~220 B | 96–99% |
| E8 12-bit | ~1.5 | ~280 B | 98–99% |
```rust
// No quantization (full f32)
Quantization::None

// E8 with Hadamard preprocessing (recommended); field names are illustrative
Quantization::E8 { bits: 10, hadamard: true }

// Convenience constructor
Quantization::e8_default() // 10-bit with Hadamard
```
E8 Lattice Quantization
embedvec implements state-of-the-art E8 lattice quantization based on QuIP#/NestQuant/QTIP research (2024-2025):
- Hadamard Preprocessing: Fast Walsh-Hadamard transform + random signs makes coordinates more Gaussian/i.i.d.
- Block-wise Quantization: Split vectors into 8D blocks, quantize each to the nearest E8 lattice point (see the decoder sketch after this list)
- Asymmetric Search: Query remains FP32, database vectors decoded on-the-fly during HNSW traversal
- Compact Storage: ~2-2.5 bits per dimension effective
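For intuition, here is a self-contained sketch of the classic Conway–Sloane nearest-point decoder for a single 8-dimensional block. It is not embedvec's internal code: per-block scaling, bit packing, and the Hadamard preprocessing are omitted. It relies only on the standard decomposition E8 = D8 ∪ (D8 + ½·1), where D8 is the set of integer vectors with even coordinate sum.

```rust
/// Round every coordinate to the nearest integer.
fn round_all(y: &[f32; 8]) -> [f32; 8] {
    std::array::from_fn(|i| y[i].round())
}

/// Nearest point of D8 = { x in Z^8 : sum(x) is even } to y.
fn nearest_d8(y: &[f32; 8]) -> [f32; 8] {
    let mut p = round_all(y);
    let sum: f32 = p.iter().sum();
    if (sum as i64) % 2 != 0 {
        // Parity is odd: re-round the coordinate with the largest rounding
        // error toward its second-nearest integer, which flips the parity
        // at the smallest possible extra cost.
        let mut worst = 0;
        let mut worst_err = -1.0_f32;
        for i in 0..8 {
            let err = (y[i] - p[i]).abs();
            if err > worst_err {
                worst_err = err;
                worst = i;
            }
        }
        p[worst] += if y[worst] > p[worst] { 1.0 } else { -1.0 };
    }
    p
}

/// Squared Euclidean distance between two 8-dimensional points.
fn dist2(a: &[f32; 8], b: &[f32; 8]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Nearest point of E8 = D8 ∪ (D8 + 1/2·1) to y: decode against both cosets
/// and keep the closer candidate.
fn nearest_e8(y: &[f32; 8]) -> [f32; 8] {
    let a = nearest_d8(y);

    let shifted: [f32; 8] = std::array::from_fn(|i| y[i] - 0.5);
    let mut b = nearest_d8(&shifted);
    for v in &mut b {
        *v += 0.5;
    }

    if dist2(y, &a) <= dist2(y, &b) { a } else { b }
}
```

A production encoder would additionally scale each block and index-code the chosen lattice point to fit the configured bit budget.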
Why E8?
The E8 lattice has exceptional packing density in 8 dimensions, providing better rate-distortion than scalar quantization or product quantization for normalized embeddings typical in LLM/RAG applications.
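To put a number on "exceptional": E8 achieves the densest possible sphere packing in 8 dimensions (a standard lattice-theory result, proved optimal by Viazovska), with

$$
\Delta_{E_8} = \frac{\pi^{4}}{384} \approx 0.2537, \qquad \text{kissing number} = 240 .
$$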
Performance
Measured benchmarks (768-dim, 10k vectors, AVX2) are listed in the Benchmarks section above.
Projected Performance at Scale
| Operation | ~1M vectors | ~10M vectors | Notes |
|---|---|---|---|
| Query (k=10, ef=128) | 0.4–1.2 ms | 1–4 ms | Cosine, no filter |
| Query + filter | 0.6–2.5 ms | 2–8 ms | Depends on selectivity |
| Memory (FP32) | ~3.1 GB | ~31 GB | Full precision |
| Memory (E8-10bit) | ~0.5 GB | ~5 GB | 4-6× reduction |
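The FP32 rows correspond to raw vector storage, n × d × 4 bytes (index overhead excluded); for example:

$$
10^{6} \times 768 \times 4\,\text{B} \approx 3.1\,\text{GB}, \qquad 10^{7} \times 768 \times 4\,\text{B} \approx 31\,\text{GB}.
$$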
Feature Flags
```toml
[dependencies]
embedvec = { version = "0.5", features = ["persistence-sled", "async"] }
```
| Feature | Description | Default |
|---|---|---|
| `persistence-sled` | On-disk storage via Sled (pure Rust) | ✓ |
| `persistence-rocksdb` | On-disk storage via RocksDB (higher perf) | ✗ |
| `persistence-pgvector` | PostgreSQL with native vector search | ✗ |
| `async` | Tokio async API | ✓ |
| `python` | PyO3 bindings | ✗ |
| `simd` | SIMD distance optimizations | ✗ |
| `wasm` | WebAssembly support | ✗ |
Persistence Backends
embedvec supports three persistence backends:
Sled (Default)
Pure Rust embedded database. Good default for most use cases.
```rust
use embedvec::EmbedVec;

// Simple path-based persistence (uses Sled); argument values are illustrative
let db = EmbedVec::with_persistence("./my_db").await?;

// Or via builder
let db = EmbedVec::builder()
    .dimension(768)
    .persistence("./my_db")
    .build()
    .await?;
```
RocksDB (Optional)
Higher performance LSM-tree database. Better for write-heavy workloads and large datasets.
```toml
[dependencies]
embedvec = { version = "0.5", features = ["persistence-rocksdb", "async"] }
```
```rust
// The PersistenceConfig/Backend type names and argument values are illustrative.
use embedvec::{Backend, EmbedVec, PersistenceConfig};

// Configure RocksDB backend
let config = PersistenceConfig::new("./my_db")
    .backend(Backend::RocksDb)
    .cache_size(256 * 1024 * 1024); // 256MB cache

let db = EmbedVec::with_backend(config).await?;
```
pgvector (PostgreSQL) — Scale to Billions
Native PostgreSQL vector search using the pgvector extension. Best for:
- Distributed deployments across multiple nodes
- Existing PostgreSQL infrastructure (no new services)
- SQL access to vectors alongside relational data
- Teams already familiar with PostgreSQL operations
- Scaling beyond 10M vectors with horizontal sharding
```toml
[dependencies]
embedvec = { version = "0.5", features = ["persistence-pgvector", "async"] }
```
Prerequisites: PostgreSQL 15+ with pgvector extension installed:
```sql
CREATE EXTENSION vector;
```
```rust
// Module paths, connection string, and argument values are illustrative.
use embedvec::PersistenceConfig;
use embedvec::persistence::PgVectorBackend;

// Configure pgvector backend
let config = PersistenceConfig::pgvector("postgres://user:pass@localhost/mydb")
    .table_name("embedvec_vectors") // optional, default: "embedvec_vectors"
    .index_type("hnsw");            // "hnsw" (default) or "ivfflat"

// Connect (auto-creates table and index)
let backend = PgVectorBackend::connect(config).await?;

// Insert vectors with JSONB metadata
backend.insert_vector("doc-1", &embedding, serde_json::json!({"lang": "en"})).await?;

// Native vector search (executed in PostgreSQL)
let results = backend.search_vectors(&query, 10).await?;
for hit in results {
    println!("{hit:?}"); // id, similarity score, and JSONB payload
}

// Other operations
let count = backend.count().await?;
backend.delete_vector("doc-1").await?;
backend.clear().await?;
```
Why pgvector with embedvec?
| Aspect | embedvec + pgvector | Raw pgvector |
|---|---|---|
| Setup | Auto-creates tables/indexes | Manual SQL |
| API | Rust-native async | SQL strings |
| Metadata | Typed JSONB | Manual casting |
| Connection | Pooled (sqlx) | Manual management |
| Migration | Same API as Sled/RocksDB | N/A |
pgvector features:
- HNSW indexes — Faster queries, tunable `ef_search` (default: 128)
- IVFFlat indexes — Better for bulk loading, lower memory
- Cosine similarity — `<=>` operator for normalized embeddings
- JSONB metadata — Query vectors with SQL WHERE clauses
- Auto-provisioning — Tables and indexes created on connect
- Connection pooling — Up to 10 concurrent connections via sqlx
Index comparison:
| Index | Build Time | Query Time | Memory | Best For |
|---|---|---|---|---|
| HNSW | Slower | Faster | Higher | Real-time queries |
| IVFFlat | Faster | Slower | Lower | Batch workloads |
Testing
```bash
cargo test                                    # Run all tests
cargo test --features "persistence-rocksdb"   # Run with specific features
cargo bench                                   # Run benchmarks
```
Benchmarking
```bash
cargo add --dev criterion    # Install criterion as a dev-dependency
cargo bench                  # Run benchmarks
# Memory profiling (requires jemalloc)
```
Roadmap
- v0.5 (current): E8 quantization stable + persistence
- v0.6: Binary/PQ fallback, delete support, batch queries
- v0.7: LangChain/LlamaIndex official integration
- Future: Hybrid sparse-dense, full-text + vector
License
MIT OR Apache-2.0
Contributing
Contributions welcome! Please read CONTRIBUTING.md before submitting PRs.
Acknowledgments
- HNSW algorithm: Malkov & Yashunin (2016)
- E8 quantization: Inspired by QuIP#, NestQuant, QTIP (2024-2025)
- Rust ecosystem: serde, tokio, pyo3, sled
embedvec — The "SQLite of vector search" for Rust-first teams in 2026.