velesdb-core
High-performance vector database engine written in Rust.
Features
- Blazing Fast: HNSW index with explicit SIMD (4x faster than auto-vectorized)
- Hybrid Search: Combine vector similarity + BM25 full-text search with RRF fusion
- Persistent Storage: Memory-mapped files for efficient disk access
- Multiple Distance Metrics: Cosine, Euclidean, Dot Product, Hamming, Jaccard
- ColumnStore Filtering: 122x faster than JSON filtering at scale
- VelesQL: SQL-like query language with MATCH support for full-text search
- Bulk Operations: Optimized batch insert with parallel HNSW indexing
- Quantization: SQ8 (4x) and Binary (32x) memory compression
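
The hybrid search bullet mentions RRF (Reciprocal Rank Fusion). As a rough illustration of the idea, and not of VelesDB's internal code, the sketch below merges a vector-similarity ranking and a BM25 ranking by summing 1 / (k + rank) for each document. The function name `rrf_fuse` and the constant k = 60 are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion (hypothetical helper, not part of velesdb-core).
/// Each ranking contributes 1 / (k + rank) per document, with rank starting at 1.
fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + i as f64 + 1.0);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
    fused
}
```

A document that appears near the top of both lists outranks one that is first in only a single list, which is the property that makes RRF useful when the two score scales are not comparable.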
Installation
Quick Start
use ;
use json;
Distance Metrics
All five metrics are available through the DistanceMetric enum:
use DistanceMetric;
// Text embeddings (normalized vectors)
let cosine = DistanceMetric::Cosine;
// Image features, spatial data
let euclidean = DistanceMetric::Euclidean;
// Pre-normalized vectors, MIPS
let dot = DistanceMetric::DotProduct;
// Binary vectors, fingerprints, LSH
let hamming = DistanceMetric::Hamming;
// Set similarity, sparse vectors, tags
let jaccard = DistanceMetric::Jaccard;
| Metric | Use Case | Score Interpretation |
|---|---|---|
| Cosine | Text embeddings | Higher = more similar |
| Euclidean | Spatial data | Lower = more similar |
| DotProduct | MIPS, pre-normalized | Higher = more similar |
| Hamming | Binary vectors | Lower = more similar |
| Jaccard | Set similarity | Higher = more similar |
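
To make the score interpretations in the table concrete, here is a minimal scalar sketch of the five metrics using only the standard library. It is not the SIMD implementation the crate ships. Representing Hamming and Jaccard inputs as bit-packed `u64` words, and returning 1.0 for two empty sets, are assumptions made for this example.

```rust
/// Dot product of two equal-length vectors (higher = more similar).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity: dot product divided by the product of the norms.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let norm = |v: &[f32]| dot(v, v).sqrt();
    dot(a, b) / (norm(a) * norm(b))
}

/// Euclidean (L2) distance (lower = more similar).
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Hamming distance over bit-packed vectors: number of differing bits.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Jaccard similarity over bit-packed sets: |A ∩ B| / |A ∪ B|.
fn jaccard(a: &[u64], b: &[u64]) -> f32 {
    let inter: u32 = a.iter().zip(b).map(|(x, y)| (x & y).count_ones()).sum();
    let union: u32 = a.iter().zip(b).map(|(x, y)| (x | y).count_ones()).sum();
    if union == 0 { 1.0 } else { inter as f32 / union as f32 }
}
```

For unit-length vectors, cosine similarity and dot product give the same value, which is why DotProduct is listed for pre-normalized vectors.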
Bulk Operations
For high-throughput import (3,300+ vectors/sec):
use ;
let db = open?;
db.create_collection?;
let collection = db.get_collection.unwrap;
// Generate 10,000 vectors
let points: =
.map
.collect;
// Bulk insert with parallel HNSW indexing
let inserted = collection.upsert_bulk?;
println!;
// Explicit flush for durability (optional)
collection.flush?;
Memory-Efficient Storage (Quantization)
use ;
let db = open?;
// SQ8: 4x memory reduction, ~1% recall loss
db.create_collection_with_options?;
// Binary: 32x memory reduction, ~5-10% recall loss (IoT/Edge)
db.create_collection_with_options?;
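
The 4x and 32x figures follow from the storage width per component: an f32 takes 32 bits, an SQ8 code 8 bits, and a binary code 1 bit. The sketch below shows one common way to do both encodings. The per-vector min/max scaling, the sign-based binarization, and all names (`Sq8`, `sq8_encode`, `sq8_decode`, `binary_encode`) are assumptions for illustration and may differ from what velesdb-core does internally.

```rust
/// SQ8 sketch: map each f32 linearly onto 0..=255 using the vector's own min/max.
struct Sq8 {
    min: f32,
    scale: f32, // width of one quantization step
    codes: Vec<u8>,
}

fn sq8_encode(v: &[f32]) -> Sq8 {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // Guard against a constant (or empty) vector, where max is not above min.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v.iter().map(|x| ((x - min) / scale).round() as u8).collect();
    Sq8 { min, scale, codes }
}

/// Reconstruct approximate f32 values from the 8-bit codes.
fn sq8_decode(q: &Sq8) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}

/// Binary sketch: keep only the sign of each component, packed 8 per byte.
fn binary_encode(v: &[f32]) -> Vec<u8> {
    let mut bits = vec![0u8; (v.len() + 7) / 8];
    for (i, x) in v.iter().enumerate() {
        if *x > 0.0 {
            bits[i / 8] |= 1 << (i % 8);
        }
    }
    bits
}
```

For a 768-dimensional vector this gives 3,072 bytes as f32, 768 bytes of SQ8 codes (plus two f32 parameters in this sketch), and 96 bytes of binary codes.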
Performance
Vector Operations (768D)
| Operation | Time | Throughput |
|---|---|---|
| Dot Product | ~38 ns | 26M ops/sec |
| Euclidean Distance | ~47 ns | 21M ops/sec |
| Cosine Similarity | ~83 ns | 12M ops/sec |
| Hamming Distance | ~16 ns | 62M ops/sec |
| Jaccard Similarity | ~90 ns | 11M ops/sec |
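
The Throughput column is consistent with taking the reciprocal of the Time column, assuming operations run back to back on one thread. A small sketch of that arithmetic follows; the helper names are hypothetical, and the truncation to whole millions is inferred from the table (16 ns gives 62.5M, listed as 62M).

```rust
/// Convert a per-operation latency in nanoseconds to operations per second.
fn ops_per_sec(latency_ns: f64) -> f64 {
    1.0e9 / latency_ns
}

/// The same figure in whole millions of operations per second, truncated.
fn mops_truncated(latency_ns: f64) -> u64 {
    (ops_per_sec(latency_ns) / 1.0e6) as u64
}
```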
End-to-End Benchmark (10k vectors, 768D)
| Metric | pgvectorscale | VelesDB | Speedup |
|---|---|---|---|
| Ingest | 22.3s | 3.0s | 7.4x |
| Search Latency | 52.8ms | 4.0ms | 13x |
| Throughput | 18.9 QPS | 246.8 QPS | 13x |
Key Performance Features
- Search latency: < 5ms for 10k vectors
- Bulk import: 3,300 vectors/sec with upsert_bulk()
- ColumnStore filtering: 122x faster than JSON at 100k items
Recall by Configuration (Native Rust, Criterion)
| Config | Mode | ef_search | Recall@10 | Latency P50 | Status |
|---|---|---|---|---|---|
| 10K/128D | Balanced | 128 | 95.8% | 0.88ms | ✅ |
| 10K/128D | HighRecall | 1024 | 99.4% | 3.0ms | ✅ |
| 10K/128D | Perfect | 2048 | 100.0% | 0.61ms | ✅ |
| 100K/768D | HighRecall | 1024 | 97.0% | 71.5ms | ✅ ≥95% |
| 100K/768D | Perfect | 2048 | 100.0% | 55.4ms | ✅ |
Latency P50 = median over 100 queries. ≥95% recall guaranteed for HighRecall mode.
📊 Benchmark kit: See benchmarks/ for reproducible tests.
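
For readers who want to check the Recall@10 and Latency P50 columns on their own data, the two quantities can be computed as sketched below. This is a generic formulation, not the code in benchmarks/; the function names are assumptions, and `exact` is assumed to come from a brute-force scan of the same collection.

```rust
use std::collections::HashSet;

/// Recall@k: fraction of the exact top-k neighbours that the approximate
/// search also returned within its own top k.
fn recall_at_k(approx: &[u64], exact: &[u64], k: usize) -> f64 {
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 1.0; // nothing to find, so nothing was missed
    }
    let hits = approx.iter().take(k).filter(|id| truth.contains(*id)).count();
    hits as f64 / truth.len() as f64
}

/// Median (P50) of a set of latency samples; sorts the slice in place.
fn p50(samples: &mut [f64]) -> f64 {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort_by(|a, b| a.partial_cmp(b).expect("latencies must not be NaN"));
    let n = samples.len();
    if n % 2 == 1 {
        samples[n / 2]
    } else {
        (samples[n / 2 - 1] + samples[n / 2]) / 2.0
    }
}
```

Averaging `recall_at_k` over all queries gives the per-configuration figure; with 100 queries, `p50` is the mean of the 50th and 51st sorted samples.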
Understanding Collections & Metrics
Metric is Set at Collection Level
VelesDB is not a relational database. Each collection has:
- ONE vector column with a fixed dimension
- ONE distance metric (immutable after creation)
- JSON metadata (payload) for each point
// Create collection with Cosine metric (for text embeddings)
db.create_collection?;
// Create collection with Hamming metric (for binary vectors)
db.create_collection?;
// The metric is fixed - you cannot change it after creation
// To use a different metric, create a new collection
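
The collection rules above (one fixed dimension, one immutable metric) can be pictured with a toy model. The types below are purely hypothetical and are not the velesdb-core API; they only show how a fixed dimension and a metric without a setter could be enforced at the type level.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Metric {
    Cosine,
    Hamming,
}

#[derive(Debug, PartialEq)]
enum InsertError {
    DimensionMismatch { expected: usize, got: usize },
}

/// Toy collection: dimension and metric are set once in `new` and never change.
struct Collection {
    dimension: usize,
    metric: Metric,
    points: Vec<(u64, Vec<f32>)>,
}

impl Collection {
    fn new(dimension: usize, metric: Metric) -> Self {
        Collection { dimension, metric, points: Vec::new() }
    }

    /// Read-only accessor; there is deliberately no way to change the metric.
    fn metric(&self) -> Metric {
        self.metric
    }

    fn len(&self) -> usize {
        self.points.len()
    }

    /// Reject any vector whose length differs from the collection's dimension.
    fn insert(&mut self, id: u64, vector: Vec<f32>) -> Result<(), InsertError> {
        if vector.len() != self.dimension {
            return Err(InsertError::DimensionMismatch {
                expected: self.dimension,
                got: vector.len(),
            });
        }
        self.points.push((id, vector));
        Ok(())
    }
}
```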
Metadata (Payload) Format
Metadata is stored as JSON (serde_json::Value). Any valid JSON structure is supported:
use json;
// Simple flat metadata
let point1 = new;
// Nested metadata
let point2 = new;
// No metadata
let point3 = without_payload;
Querying with VelesQL
VelesQL is a SQL-like query language. The distance metric is always the one defined at collection creation.
-- Vector similarity search
SELECT * FROM docs WHERE VECTOR NEAR [0.1, 0.2, ...] LIMIT 5;
-- With parameter (for API)
SELECT * FROM docs WHERE VECTOR NEAR $query LIMIT 10;
-- Full-text search (BM25)
SELECT * FROM docs WHERE content MATCH 'rust programming' LIMIT 10;
-- Hybrid (vector + text)
SELECT * FROM docs
WHERE VECTOR NEAR $query AND content MATCH 'rust'
LIMIT 5;
Querying Metadata
Metadata fields can be filtered with standard SQL operators:
-- Equality
SELECT * FROM docs WHERE category = 'tech' LIMIT 10;
-- Comparison operators
SELECT * FROM docs WHERE views > 1000 LIMIT 10;
SELECT * FROM docs WHERE price >= 50 AND price <= 200 LIMIT 10;
-- String patterns
SELECT * FROM docs WHERE title LIKE '%rust%' LIMIT 10;
-- IN list
SELECT * FROM docs WHERE category IN ('tech', 'science', 'ai') LIMIT 10;
-- BETWEEN (inclusive)
SELECT * FROM docs WHERE score BETWEEN 0.5 AND 1.0 LIMIT 10;
-- NULL checks
SELECT * FROM docs WHERE author IS NOT NULL LIMIT 10;
-- Combine vector + metadata filters
SELECT * FROM docs
WHERE VECTOR NEAR [0.1, 0.2, ...]
AND category = 'tech'
AND views > 100
LIMIT 5;
Available Filter Operators
| Operator | SQL Syntax | Example |
|---|---|---|
| Equal | = | category = 'tech' |
| Not Equal | != or <> | status != 'draft' |
| Greater Than | > | views > 1000 |
| Greater or Equal | >= | price >= 50 |
| Less Than | < | score < 0.5 |
| Less or Equal | <= | rating <= 3 |
| IN | IN (...) | tag IN ('a', 'b') |
| BETWEEN | BETWEEN ... AND | age BETWEEN 18 AND 65 |
| LIKE | LIKE | name LIKE '%john%' |
| IS NULL | IS NULL | email IS NULL |
| IS NOT NULL | IS NOT NULL | phone IS NOT NULL |
| Full-text | MATCH | content MATCH 'rust' |
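
To illustrate the semantics of a few of these operators, the sketch below evaluates numeric comparisons, inclusive BETWEEN, and LIKE with the `%` wildcard. It is a hypothetical stand-alone model, not VelesQL's evaluator: the `Filter` enum and both function names are invented here, LIKE is treated as case-sensitive, and only `%` is handled, since that is the only wildcard the examples show.

```rust
/// Hypothetical numeric filter; the variants mirror rows of the operator table.
enum Filter {
    Gt(f64),
    Ge(f64),
    Lt(f64),
    Le(f64),
    Between(f64, f64), // inclusive on both ends
}

fn eval(filter: &Filter, value: f64) -> bool {
    match *filter {
        Filter::Gt(x) => value > x,
        Filter::Ge(x) => value >= x,
        Filter::Lt(x) => value < x,
        Filter::Le(x) => value <= x,
        Filter::Between(lo, hi) => value >= lo && value <= hi,
    }
}

/// LIKE with `%` matching any run of characters (including none).
/// Simple backtracking matcher; fine for short patterns, not tuned for speed.
fn like(text: &str, pattern: &str) -> bool {
    fn go(t: &[char], p: &[char]) -> bool {
        match p.split_first() {
            None => t.is_empty(),
            // `%` may consume zero or more characters of the text.
            Some((&'%', rest)) => (0..=t.len()).any(|i| go(&t[i..], rest)),
            // Any other character must match exactly.
            Some((c, rest)) => t.first() == Some(c) && go(&t[1..], rest),
        }
    }
    let t: Vec<char> = text.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    go(&t, &p)
}
```

With these definitions, `like("learning rust fast", "%rust%")` is true, and `eval(&Filter::Between(0.5, 1.0), 1.0)` is true because both bounds are included.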
Public API Reference
// Core types
use ;
// Index types
use ;
// Filtering
use ;
// Quantization
use ;
// Metrics
use ;
License
Elastic License 2.0 (ELv2)
See LICENSE for details.