velesdb-core
High-performance vector database engine written in Rust.
Features
- Blazing Fast: Native HNSW with AVX-512/AVX2/NEON SIMD (42.8µs search at 768D, 23.6ns dot product 768D)
- Adaptive Search: Two-phase ef_search that auto-escalates only for hard queries (2-4x faster median)
- Hybrid Search: Combine vector similarity + BM25 full-text search with RRF fusion
- Sparse Vectors: Named sparse vector indexes with DAAT MaxScore search and RRF/RSF fusion
- Streaming Inserts: Bounded-channel ingestion with backpressure and insert-and-search via delta buffer
- Agent Memory SDK: Semantic, Episodic, and Procedural memory with TTL, snapshots, and reinforcement
- Query Plan Cache: Two-tier LRU cache with write-generation invalidation for repeated queries
- Persistent Storage: Memory-mapped files for efficient disk access
- Multiple Distance Metrics: Cosine, Euclidean, Dot Product, Hamming, Jaccard
- ColumnStore Filtering: Up to 130x faster than JSON filtering at scale
- VelesQL: SQL-like query language with MATCH support for graph pattern queries
- Bulk Operations: Optimized batch insert with turbo/fast modes and parallel HNSW indexing
- Quantization: SQ8 (4x), Binary (32x), Product Quantization (8-32x), RaBitQ compression
Installation
Quick Start
use ;
use serde_json::json;
Distance Metrics
All five metrics are available through the `DistanceMetric` enum:
use velesdb_core::DistanceMetric;
// Text embeddings (normalized vectors)
let cosine = DistanceMetric::Cosine;
// Image features, spatial data
let euclidean = DistanceMetric::Euclidean;
// Pre-normalized vectors, MIPS
let dot = DistanceMetric::DotProduct;
// Binary vectors, fingerprints, LSH
let hamming = DistanceMetric::Hamming;
// Set similarity, sparse vectors, tags
let jaccard = DistanceMetric::Jaccard;
| Metric | Use Case | Score Interpretation |
|---|---|---|
| `Cosine` | Text embeddings | Higher = more similar |
| `Euclidean` | Spatial data | Lower = more similar |
| `DotProduct` | MIPS, pre-normalized | Higher = more similar |
| `Hamming` | Binary vectors | Lower = more similar |
| `Jaccard` | Set similarity | Higher = more similar |
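
To make the score directions in the table concrete, here is a minimal, dependency-free sketch of the five metrics. These are plain scalar reference functions written for illustration, not VelesDB's SIMD kernels. The function names are hypothetical, and representing binary vectors and sets as packed `u64` words is an assumption of this sketch.

```rust
/// Dot product: higher = more similar.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity: dot product of the two vectors after length
/// normalization. Returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let norm = |v: &[f32]| dot(v, v).sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 { 0.0 } else { dot(a, b) / denom }
}

/// Euclidean distance: lower = more similar.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Hamming distance over packed bits: number of differing bits, lower = more similar.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Jaccard similarity over packed bitsets: |A ∩ B| / |A ∪ B|, higher = more similar.
/// Two empty sets are treated as identical (1.0).
fn jaccard(a: &[u64], b: &[u64]) -> f32 {
    let inter: u32 = a.iter().zip(b).map(|(x, y)| (x & y).count_ones()).sum();
    let union: u32 = a.iter().zip(b).map(|(x, y)| (x | y).count_ones()).sum();
    if union == 0 { 1.0 } else { inter as f32 / union as f32 }
}
```

For unit-length inputs, `cosine` and `dot` return the same value, which is why `DotProduct` suits pre-normalized vectors.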
Common Embedding Dimensions
| Model | Dimension | Metric |
|---|---|---|
| OpenAI text-embedding-3-small | 1536 | Cosine |
| OpenAI text-embedding-3-large | 3072 | Cosine |
| Sentence-Transformers all-MiniLM-L6-v2 | 384 | Cosine |
| Cohere embed-english-v3.0 | 1024 | Cosine |
| BAAI bge-large-en-v1.5 | 1024 | Cosine |
| CLIP (image+text) | 512 or 768 | Cosine |
The dimension parameter must match your embedding model's output size exactly.
Bulk Operations
For high-throughput import (3,300+ vectors/sec):
use ;
let db = open?;
db.create_collection?;
let collection = db.get_collection
.ok_or?;
// Generate 10,000 vectors
let points: =
.map
.collect;
// Bulk insert with parallel HNSW indexing
let inserted = collection.upsert_bulk?;
println!;
// Explicit flush for durability (optional)
collection.flush?;
Durability semantics
- `store`/`upsert` update in-memory/WAL state for performance.
- `flush()` is the explicit durability barrier for crash-consistent persistence.
- Destructor-based cleanup is best-effort and should not be used as a commit boundary.
Memory-Efficient Storage (Quantization)
use ;
let db = open?;
// SQ8: 4x memory reduction, ~1% recall loss
db.create_collection_with_options?;
// Binary: 32x memory reduction, ~5-10% recall loss (IoT/Edge)
db.create_collection_with_options?;
// Product Quantization: variable compression
db.create_collection_with_options?;
// RaBitQ: randomized binary quantization
db.create_collection_with_options?;
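To show where the 4x figure for SQ8 comes from, the following is a minimal sketch of 8-bit scalar quantization: each `f32` component (4 bytes) is mapped to one `u8` code (1 byte) using a per-vector minimum and step size. This illustrates the general technique only. The `Sq8` struct and both functions are hypothetical names, and VelesDB's internal encoding may differ.

```rust
/// One quantized vector: a `u8` code per dimension plus two `f32` parameters.
struct Sq8 {
    min: f32,
    scale: f32,
    codes: Vec<u8>,
}

/// Map each component onto 256 evenly spaced levels between min and max.
fn quantize(v: &[f32]) -> Sq8 {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // A constant vector has no range; any positive scale works.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v.iter().map(|x| ((x - min) / scale).round() as u8).collect();
    Sq8 { min, scale, codes }
}

/// Approximate reconstruction; the error per component is about scale / 2 at most.
fn dequantize(q: &Sq8) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

Binary quantization follows the same idea with one bit per component instead of eight, which accounts for the 32x figure.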
Performance
Vector Operations (768D)
| Operation | Time | Throughput |
|---|---|---|
| Dot Product | 23.6 ns | 32.5 Gelem/s |
| Euclidean Distance | 22.7 ns | 33.8 Gelem/s |
| Cosine Similarity | 33.6 ns | 22.9 Gelem/s |
| Hamming Distance | 34.3 ns | — |
| Jaccard Similarity | 29.3 ns | — |
Measured March 2026 on Intel Core i9-14900KF, 64GB DDR5, Rust 1.92.0, --release, sequential on idle machine.
System Benchmarks (10K vectors, 768D)
| Benchmark | Result |
|---|---|
| HNSW Search | 42.8 µs (k=10, Balanced mode) |
| VelesQL Cache Hit | 1.06 µs (~943K QPS) |
| Sparse Search | 958 µs (MaxScore DAAT) |
| Recall@10 (Accurate) | 100% |
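
The throughput and QPS figures above appear to follow directly from the measured latencies: elements per nanosecond equals Gelem/s, and QPS is the reciprocal of the per-query time. The helper names below are hypothetical and only reproduce that arithmetic.

```rust
/// Elements processed per nanosecond, which is numerically equal to Gelem/s.
fn gelem_per_sec(dim: usize, nanos_per_op: f64) -> f64 {
    dim as f64 / nanos_per_op
}

/// Single-threaded queries per second for a given per-query latency in microseconds.
fn queries_per_sec(micros_per_query: f64) -> f64 {
    1_000_000.0 / micros_per_query
}
```

For example, `gelem_per_sec(768, 23.6)` is roughly 32.5 and `queries_per_sec(1.06)` is roughly 943,000, matching the dot product and cache-hit rows.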
Key Performance Features
- Search latency: 42.8µs for 10K/768D vectors (k=10)
- Insert throughput: 3.8-7x faster than pgvector (10K-100K vectors, benchmark)
- ColumnStore filtering: up to 130x faster than JSON scanning at scale
Recall by Configuration (Native Rust, Criterion)
| Config | Mode | ef_search | Recall@10 | Latency P50 | Status |
|---|---|---|---|---|---|
| 10K/128D | Balanced | 128 | 98.8% | 85µs | ✅ |
| 10K/128D | Accurate | 512 | 100% | 112µs | ✅ |
| 10K/128D | Perfect | 4096 | 100% | 163µs | ✅ |
| 10K/128D | Adaptive | 32-512 | 95%+ | ~40µs (easy) | ✅ |
Latency P50 = median over 100 queries. The headline "42.8µs" is for 10K/768D Balanced — higher dimensions use SIMD more efficiently. 128D benchmarks above are worst-case for recall measurement.
📊 Benchmark kit: See benchmarks/ for reproducible tests.
Understanding Collections & Metrics
Metric is Set at Collection Level
VelesDB is not a relational database. Each collection has:
- ONE vector column with a fixed dimension
- ONE distance metric (immutable after creation)
- JSON metadata (payload) for each point
// Create collection with Cosine metric (for text embeddings)
db.create_collection?;
// Create collection with Hamming metric (for binary vectors)
db.create_collection?;
// The metric is fixed - you cannot change it after creation
// To use a different metric, create a new collection
Metadata (Payload) Format
Metadata is stored as JSON (serde_json::Value). Any valid JSON structure is supported:
use serde_json::json;
// Simple flat metadata
let point1 = new;
// Nested metadata
let point2 = new;
// No metadata
let point3 = without_payload;
Querying with VelesQL
VelesQL is a SQL-like query language. The distance metric is always the one defined at collection creation.
JOIN runtime limit: `JOIN ... USING (...)` currently supports one column only. Multi-column `USING (a, b, ...)` is parsed but rejected at execution time.
-- Vector similarity search
SELECT * FROM docs WHERE VECTOR NEAR [0.1, 0.2, ...] LIMIT 5;
-- With parameter (for API)
SELECT * FROM docs WHERE VECTOR NEAR $query LIMIT 10;
-- Full-text search (BM25)
SELECT * FROM docs WHERE content MATCH 'rust programming' LIMIT 10;
-- Hybrid (vector + text)
SELECT * FROM docs
WHERE VECTOR NEAR $query AND content MATCH 'rust'
LIMIT 5;
Querying Metadata
Metadata fields can be filtered with standard SQL operators:
-- Equality
SELECT * FROM docs WHERE category = 'tech' LIMIT 10;
-- Comparison operators
SELECT * FROM docs WHERE views > 1000 LIMIT 10;
SELECT * FROM docs WHERE price >= 50 AND price <= 200 LIMIT 10;
-- String patterns
SELECT * FROM docs WHERE title LIKE '%rust%' LIMIT 10;
-- IN list
SELECT * FROM docs WHERE category IN ('tech', 'science', 'ai') LIMIT 10;
-- BETWEEN (inclusive)
SELECT * FROM docs WHERE score BETWEEN 0.5 AND 1.0 LIMIT 10;
-- NULL checks
SELECT * FROM docs WHERE author IS NOT NULL LIMIT 10;
-- Combine vector + metadata filters
SELECT * FROM docs
WHERE VECTOR NEAR [0.1, 0.2, ...]
AND category = 'tech'
AND views > 100
LIMIT 5;
WITH Clause (Query Options)
Override search parameters on a per-query basis:
-- Set search mode
SELECT * FROM docs WHERE VECTOR NEAR $v LIMIT 10
WITH (mode = 'accurate');
-- Set ef_search and timeout
SELECT * FROM docs WHERE VECTOR NEAR $v LIMIT 10
WITH (ef_search = 512, timeout_ms = 5000);
| Option | Type | Description |
|---|---|---|
| `mode` | string | `fast`, `balanced`, `accurate`, `perfect`, `adaptive` |
| `ef_search` | integer | HNSW ef_search (higher = better recall) |
| `timeout_ms` | integer | Query timeout in milliseconds |
| `rerank` | boolean | Enable result reranking |
Available Filter Operators
| Operator | SQL Syntax | Example |
|---|---|---|
| Equal | `=` | `category = 'tech'` |
| Not Equal | `!=` or `<>` | `status != 'draft'` |
| Greater Than | `>` | `views > 1000` |
| Greater or Equal | `>=` | `price >= 50` |
| Less Than | `<` | `score < 0.5` |
| Less or Equal | `<=` | `rating <= 3` |
| IN | `IN (...)` | `tag IN ('a', 'b')` |
| BETWEEN | `BETWEEN ... AND` | `age BETWEEN 18 AND 65` |
| LIKE | `LIKE` | `name LIKE '%john%'` |
| IS NULL | `IS NULL` | `email IS NULL` |
| IS NOT NULL | `IS NOT NULL` | `phone IS NOT NULL` |
| Full-text | `MATCH` | `content MATCH 'rust'` |
Sparse Vector Search
VelesDB supports sparse vectors (e.g., SPLADE, BM25 term weights) alongside dense embeddings. You can store named sparse vectors per point, search them independently, or combine dense+sparse results using Reciprocal Rank Fusion (RRF).
Upserting points with sparse vectors
use std::collections::BTreeMap;
use ;
use velesdb_core::sparse_index::SparseVector;
let db = open?;
db.create_collection?;
let collection = db.get_collection
.ok_or?;
// Build a sparse vector from (term_index, weight) pairs
let sparse = new;
// Attach named sparse vectors to a point
let mut sparse_map = new;
sparse_map.insert; // "" = default sparse index
let point = with_sparse;
collection.upsert?;
# Ok::
Sparse-only search (DAAT MaxScore)
The sparse search engine uses a DAAT (Document-At-A-Time) MaxScore algorithm for fast top-k retrieval by inner product. It automatically falls back to linear scan for high-coverage queries.
# use SparseVector;
// Build a query with term weights
let query = new;
// Search the default sparse index for top-5 results
let results = collection.sparse_search_default?;
for result in &results
# Ok::
Hybrid dense+sparse with RRF fusion
Combine dense vector search (HNSW) with sparse term matching. Both branches run in parallel via rayon, then results are fused using Reciprocal Rank Fusion (RRF) or Relative Score Fusion (RSF).
# use SparseVector;
# use FusionStrategy;
let dense_query = vec!;
let sparse_query = new;
// RRF fusion with default k=60
let strategy = rrf_default;
let results = collection.hybrid_sparse_search?;
for result in &results
# Ok::
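For readers unfamiliar with Reciprocal Rank Fusion, here is a small standalone sketch of the standard formula: each document scores the sum of `1 / (k + rank)` over every ranked list it appears in, with ranks starting at 1. The function `rrf_fuse` is a hypothetical helper for illustration, not part of the VelesDB API; `k = 60` matches the default mentioned above.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids (best first) with RRF.
/// Returns the top `top_k` documents as (doc_id, fused_score), best first.
fn rrf_fuse(rankings: &[Vec<u64>], k: f32, top_k: usize) -> Vec<(u64, f32)> {
    let mut scores: HashMap<u64, f32> = HashMap::new();
    for ranking in rankings {
        for (pos, &doc_id) in ranking.iter().enumerate() {
            let rank = (pos + 1) as f32; // ranks are 1-based
            *scores.entry(doc_id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f32)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ascending doc id for determinism.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused.truncate(top_k);
    fused
}
```

Because only ranks are used, RRF needs no score normalization between the dense and sparse branches, whose raw scores are on different scales.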
You can also use RelativeScore fusion for explicit weight control:
# use FusionStrategy;
// 70% dense, 30% sparse (validated constructor)
let strategy = relative_score?;
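As a rough illustration of weighted score fusion, the sketch below rescales each branch's scores to the range 0 to 1 and then combines them with the dense and sparse weights. The min-max normalization step is an assumption of this sketch; the source does not spell out how VelesDB normalizes scores, and both function names are hypothetical.

```rust
use std::collections::HashMap;

/// Min-max normalize scores to [0, 1]. If all scores are equal, each maps to 1.0.
fn normalize(hits: &[(u64, f32)]) -> Vec<(u64, f32)> {
    let min = hits.iter().map(|h| h.1).fold(f32::INFINITY, f32::min);
    let max = hits.iter().map(|h| h.1).fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    hits.iter()
        .map(|&(id, s)| (id, if range > 0.0 { (s - min) / range } else { 1.0 }))
        .collect()
}

/// Weighted sum of normalized dense and sparse scores (higher = better in both inputs).
fn relative_score_fuse(
    dense: &[(u64, f32)],
    sparse: &[(u64, f32)],
    dense_weight: f32,
    sparse_weight: f32,
) -> Vec<(u64, f32)> {
    let mut scores: HashMap<u64, f32> = HashMap::new();
    for (id, s) in normalize(dense) {
        *scores.entry(id).or_insert(0.0) += dense_weight * s;
    }
    for (id, s) in normalize(sparse) {
        *scores.entry(id).or_insert(0.0) += sparse_weight * s;
    }
    let mut fused: Vec<(u64, f32)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Unlike RRF, this approach keeps the size of the score gaps within each branch, so a document that wins one branch by a wide margin is rewarded for it.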
Fusion types and parameters
| Type | Path | Description |
|---|---|---|
| `SparseVector` | `velesdb_core::sparse_index` | Sorted `(u32 index, f32 weight)` pairs; deduplicates and filters zeros on construction |
| `FusionStrategy` | `velesdb_core` | `RRF { k }`, `RelativeScore { dense_weight, sparse_weight }` |
| `ScoredDoc` | `velesdb_core::sparse_index` | Raw sparse search result: `doc_id: u64`, `score: f32` |

| Method | On | Description |
|---|---|---|
| `sparse_search_default(query, k)` | `Collection` | Sparse search on the default (`""`) index |
| `sparse_search_named(query, k, name)` | `Collection` | Sparse search on a named index |
| `hybrid_sparse_search(dense, sparse, k, strategy)` | `Collection` | Dense + sparse with fusion |
| `hybrid_sparse_search_with_filter(dense, sparse, k, strategy, filter)` | `Collection` | Same with metadata filter |
Streaming Inserts
For high-throughput, continuously arriving data (IoT sensors, live embeddings, log streams),
StreamIngester provides a bounded-channel ingestion pipeline with automatic micro-batch
flushing and backpressure signaling.
Basic usage
use ;
use Point;
// Configure the pipeline
let config = StreamingConfig ;
// `collection` is a Collection obtained from db.get_collection(...)
let ingester = new;
// Send points — returns immediately
let point = new;
match ingester.try_send
// Gracefully drain remaining points before shutdown
ingester.shutdown.await;
Backpressure
try_send is non-blocking. When the bounded channel is at capacity, it returns
BackpressureError::BufferFull -- the caller should retry after a short delay or
drop the point. If the background drain task exits unexpectedly, DrainTaskDead is
returned.
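The non-blocking contract described above can be illustrated with the standard library's bounded channel. This is only an analogy: `StreamIngester` has its own channel and error types, and the `SendOutcome` enum and `offer` function below are hypothetical names used for the sketch.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// What happened to a point offered to the bounded channel.
#[derive(Debug, PartialEq)]
enum SendOutcome {
    Accepted,
    /// Channel at capacity: the caller should retry later or drop the point.
    BufferFull,
    /// The receiving side is gone, comparable to a dead drain task.
    ConsumerGone,
}

/// Try to enqueue without blocking and classify the result.
fn offer<T>(tx: &SyncSender<T>, item: T) -> SendOutcome {
    match tx.try_send(item) {
        Ok(()) => SendOutcome::Accepted,
        Err(TrySendError::Full(_)) => SendOutcome::BufferFull,
        Err(TrySendError::Disconnected(_)) => SendOutcome::ConsumerGone,
    }
}
```

With a channel of capacity 2, the third `offer` returns `BufferFull` until the consumer receives an item. The producer never blocks, so it can decide for itself whether to back off, retry, or shed load.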
Delta buffer (insert-and-search)
During an HNSW rebuild, newly inserted vectors are not yet in the index. The delta buffer accumulates these vectors and merges them into search results via brute-force scan, so freshly inserted data is searchable immediately without waiting for the rebuild to complete.
// The delta buffer is managed automatically by the streaming pipeline.
// When active, search results transparently include delta-buffered vectors.
let results = collection.search?;
// ^ includes both HNSW-indexed and delta-buffered vectors
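Conceptually, the merge step can be pictured as follows: take the hits returned by the index, score every delta-buffered vector against the query by brute force, and keep the overall best `k`. This is a simplified sketch under stated assumptions (squared Euclidean distance, hypothetical function names), not VelesDB's internal implementation.

```rust
/// Squared Euclidean distance; lower means closer.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Merge index hits `(id, distance)` with a brute-force scan of the delta buffer.
fn search_with_delta(
    index_hits: &[(u64, f32)],
    delta: &[(u64, Vec<f32>)],
    query: &[f32],
    k: usize,
) -> Vec<(u64, f32)> {
    let mut merged: Vec<(u64, f32)> = index_hits.to_vec();
    // Every buffered vector is compared against the query directly.
    merged.extend(delta.iter().map(|(id, v)| (*id, squared_l2(v, query))));
    // Closest first, then keep the top k.
    merged.sort_by(|a, b| a.1.total_cmp(&b.1));
    merged.truncate(k);
    merged
}
```

The scan costs time proportional to the buffer size, which stays affordable as long as the buffer holds only the vectors inserted during the rebuild.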
Agent Memory Patterns
The Agent Memory SDK provides three memory subsystems designed for AI agent workloads: chatbots, RAG pipelines, and autonomous learning agents. Each memory type is backed by VelesDB collections with vector similarity search, TTL-based expiration, and snapshot persistence.
Initialization
use std::sync::Arc;
use Database;
use AgentMemory;
let db = new;
let memory = new?;
# Ok::
Semantic Memory (long-term knowledge)
Stores facts as vector embeddings for similarity-based retrieval. Use this for RAG knowledge bases, persistent world knowledge, or any data your agent should "know" long-term.
// Store a fact
let embedding = vec!; // from your embedding model
memory.semantic.store?;
// Query by similarity
let query_embedding = vec!;
let results = memory.semantic.query?;
for in &results
Episodic Memory (event timeline)
Records events with timestamps for temporal and similarity-based retrieval. Use this for conversation history, user interaction logs, or any time-sequenced data.
// Record an event
let timestamp = 1710000000_i64; // Unix timestamp
let embedding = vec!;
memory.episodic.record?;
// Retrieve recent events
let recent = memory.episodic.recent?;
for in &recent
// Recall similar events
let results = memory.episodic.recall_similar?;
Procedural Memory (learned patterns)
Stores action sequences with confidence scoring and reinforcement learning. Use this for agents that learn from experience -- task automation, decision-making, or any workflow where past success/failure should influence future behavior.
// Learn a procedure
let steps = vec!;
let embedding = vec!;
memory.procedural.learn?;
// Recall matching procedures (min confidence 0.5)
let matches = memory.procedural.recall?;
for m in &matches
// Reinforce after success/failure
memory.procedural.reinforce?; // increases confidence
memory.procedural.reinforce?; // decreases confidence
TTL, eviction, and snapshots
// Set TTL on individual entries
memory.set_semantic_ttl; // expires in 1 hour
memory.set_episodic_ttl; // expires in 24 hours
// Run periodic expiration
let stats = memory.auto_expire?;
println!;
// Evict low-confidence procedures
let evicted = memory.evict_low_confidence_procedures?;
// Snapshot and restore
let memory = memory
.with_snapshots // keep last 5 snapshots
.with_eviction_config;
let version = memory.snapshot?;
memory.load_snapshot_version?;
When to use each memory type
| Memory Type | Use Case | Example |
|---|---|---|
| Semantic | Persistent knowledge that rarely changes | RAG knowledge base, world facts, documentation |
| Episodic | Time-sequenced events and interactions | Chat history, user sessions, audit logs |
| Procedural | Learned behaviors that improve over time | Task automation, decision trees, API call patterns |
Agent Memory types
| Type | Description |
|---|---|
| `AgentMemory` | Unified interface; holds `SemanticMemory`, `EpisodicMemory`, `ProceduralMemory` |
| `SemanticMemory` | `store(id, content, embedding)`, `query(embedding, k)` returns `Vec<(id, score, content)>` |
| `EpisodicMemory` | `record(id, description, timestamp, embedding)`, `recent(limit, since)`, `recall_similar(embedding, k)` |
| `ProceduralMemory` | `learn(id, name, steps, embedding, confidence)`, `recall(embedding, k, min_confidence)`, `reinforce(id, success)` |
| `ProcedureMatch` | Result struct: `id`, `name`, `steps: Vec<String>`, `confidence: f32`, `score: f32` |
| `EvictionConfig` | `consolidation_age_threshold: u64`, `min_confidence_threshold: f32`, `max_entries_per_cycle: usize` |
| `SnapshotManager` | `new(dir, max_snapshots)` -- versioned state persistence with automatic rotation |
| `ExpireResult` | Returned by `auto_expire()`: `semantic_expired`, `episodic_expired`, `episodic_consolidated` counts |
Default embedding dimension is 384 (configurable via AgentMemory::with_dimension(db, dim)).
Query Plan Cache
VelesDB automatically caches compiled query plans in a two-tier LRU cache (L1 lock-free + L2 LRU). Repeated queries skip parsing and planning entirely when the cache key matches.
How it works
- Automatic: The cache is enabled by default on every `Database` instance. No configuration required.
- Write-generation invalidation: Each collection tracks a monotonic write generation counter. When data is inserted, updated, or deleted, the generation increments. Cached plans whose key includes a stale generation are automatically bypassed -- no explicit invalidation needed.
- LRU eviction: The cache has bounded capacity. Least-recently-used plans are evicted when the cache is full.
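
The write-generation idea above can be sketched in a few lines: because the generation is part of the cache key, a write changes the key that later lookups use, and older entries are never matched again. This is a deliberately simplified, single-collection model with hypothetical names (`PlanCache`, `get_or_compile`). It omits the two tiers and LRU eviction, and stores the plan as a plain `String`.

```rust
use std::collections::HashMap;

/// Simplified cache key: query hash plus the collection's write generation.
#[derive(Hash, PartialEq, Eq, Clone)]
struct PlanKey {
    query_hash: u64,
    generation: u64,
}

struct PlanCache {
    plans: HashMap<PlanKey, String>,
    hits: u64,
    misses: u64,
}

impl PlanCache {
    fn new() -> Self {
        PlanCache { plans: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Return the cached plan for this (query, generation) pair, compiling on a miss.
    fn get_or_compile(
        &mut self,
        query_hash: u64,
        generation: u64,
        compile: impl FnOnce() -> String,
    ) -> &str {
        let key = PlanKey { query_hash, generation };
        if self.plans.contains_key(&key) {
            self.hits += 1;
        } else {
            self.misses += 1;
            self.plans.insert(key.clone(), compile());
        }
        &self.plans[&key]
    }

    /// Fraction of lookups served from the cache, in 0.0..=1.0.
    fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 { 0.0 } else { self.hits as f64 / total as f64 }
    }
}
```

In this model, entries keyed by an old generation simply stay unused until something evicts them, which is the job LRU eviction performs in a bounded cache.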
Inspecting cache behavior with EXPLAIN
The EXPLAIN output includes cache_hit and plan_reuse_count fields that show whether
a query plan was served from the cache:
EXPLAIN SELECT * FROM docs WHERE VECTOR NEAR $v LIMIT 10;
- `cache_hit: true` -- the plan was found in cache (parsing and planning were skipped).
- `cache_hit: false` -- cache miss; a fresh plan was compiled and inserted into the cache.
- `plan_reuse_count` -- how many times this cached plan has been reused across all callers.
Cache metrics
let metrics = db.plan_cache.metrics;
println!;
println!;
Cache types and parameters
| Type | Path | Description |
|---|---|---|
| `CompiledPlanCache` | `velesdb_core::cache` | Two-tier cache (L1 lock-free DashMap + L2 LRU). Default: 1K L1 / 10K L2 entries |
| `PlanKey` | `velesdb_core::cache` | Cache key: `query_hash: u64`, `schema_version: u64`, `collection_generations: SmallVec<[u64; 4]>` |
| `CompiledPlan` | `velesdb_core::cache` | Cached plan: `plan: QueryPlan`, `referenced_collections: Vec<String>`, `reuse_count: AtomicU64` |
| `PlanCacheMetrics` | `velesdb_core::cache` | `hits()`, `misses()`, `hit_rate() -> f64` (ratio 0.0--1.0) |

| Method | On | Description |
|---|---|---|
| `plan_cache()` | `Database` | Returns `&CompiledPlanCache` |
| `plan_cache().metrics()` | `CompiledPlanCache` | Returns `&PlanCacheMetrics` |
| `plan_cache().stats()` | `CompiledPlanCache` | Returns `LockFreeCacheStats` (L1/L2 sizes, hit counts) |
Public API Reference
// Core types
use ;
// Sparse vectors and fusion
use velesdb_core::sparse_index::SparseVector; // Sparse vector (indices + weights)
use velesdb_core::FusionStrategy; // RRF, RelativeScore, Average, Maximum, Weighted
// Streaming ingestion
use ;
// Agent memory
use ;
// Index types
use ;
// Query plan cache
use ;
// Filtering
use ;
// Quantization
use ;
// Metrics
use ;
License
VelesDB Core License 1.0
See LICENSE for details.