# claw-vector
claw-vector is the semantic memory engine behind ClawDB. It combines SQLite-backed metadata and FTS5 keyword search, memory-mapped vector persistence, adaptive Flat and HNSW indexing, hybrid retrieval, metadata filtering, reranking, and a Python embedding microservice exposed over gRPC and HTTP.
## What It Provides
- Persistent vector storage with SQLite metadata and mmap-backed raw vectors.
- Automatic index selection: Flat for small collections, HNSW once collections grow.
- ANN, filtered, and hybrid vector plus keyword retrieval.
- Search responses that include both ranked hits and execution metrics.
- A Python embedding service built on sentence-transformers with Prometheus metrics and health probes.
- Rust library and gRPC server entrypoints for application integration.
## Crate At A Glance

- Crate name: `claw-vector`
- Current crate version: `0.1.1`
- Rust edition: 2021
- MSRV target: 1.75
- Primary entrypoint: `VectorEngine`
- Binary entrypoint: `claw-vector-server`
Install from crates.io:

```toml
[dependencies]
claw-vector = "0.1.1"
```

Or add from the CLI:

```shell
cargo add claw-vector
```
## Architecture

```
                           +----------------------------------+
                           |     Python embedding service     |
Text ingest and queries -->|   sentence-transformers / ONNX   |
                           |    gRPC :50051    HTTP :8080     |
                           +----------------+-----------------+
                                            |
                                            v
+------------------------+      +------------------------------+
|      VectorEngine      |----->|      CollectionManager       |
|   Rust API and gRPC    |      |   lifecycle, persistence,    |
|  ANN and hybrid search |      |   index selection, restore   |
+-----------+------------+      +-----------+------------------+
            |                               |
            v                               v
+--------------------------+     +--------------------------+
| ANN and rerank pipeline  |     |  SQLite metadata store   |
|      Flat or HNSW        |     |  collections, records,   |
|  filters, hybrid fusion  |     |  UUID mapping, FTS5      |
+-------------+------------+     +-------------+------------+
              |                                |
              v                                v
      +------------------+         +---------------------+
      | mmap vector file |         |  persisted indexes  |
      | raw f32 payloads |         | Flat and HNSW state |
      +------------------+         +---------------------+
```
## Storage Model
Collection definitions, text payloads, metadata, UUID mappings, and keyword-search state live in SQLite. The database is opened in WAL mode and schema migrations are applied automatically on startup.
Raw vectors are stored separately in fixed-slot memory-mapped files. That keeps metadata queries cheap while letting the engine hydrate vectors only when a request actually asks for them or when reranking needs them.
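Fixed-slot layout means a vector's byte position is a pure function of its slot index, so hydration needs no lookup table. A minimal sketch, assuming tightly packed `f32` payloads with no per-slot header (the actual on-disk layout is not documented here):

```rust
/// Byte offset of a record's vector in a fixed-slot mmap file.
/// Assumes contiguous f32 slots with no header or padding
/// (an assumption; the real file format may differ).
fn slot_offset(slot: usize, dims: usize) -> usize {
    slot * dims * std::mem::size_of::<f32>()
}
```

With the default 384 dimensions, slot 2 begins 3,072 bytes into the file.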
Each collection starts on a Flat index and automatically switches to HNSW around 1,000 vectors. The collection manager persists the chosen index type so reopen and restore follow the same search path that was active before shutdown.
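The cutover rule can be pictured as a simple threshold check. The enum and function below are illustrative, not the crate's API; the ~1,000-vector cutoff comes from the text above:

```rust
/// Index kinds the engine chooses between (illustrative, not the crate's types).
#[derive(Debug, PartialEq)]
enum IndexKind {
    Flat, // exact scan: cheap to build, fine for small collections
    Hnsw, // approximate graph search: faster queries at scale
}

/// Sketch of the selection rule: Flat below the cutover, HNSW at or beyond it.
fn select_index(vector_count: usize) -> IndexKind {
    const HNSW_CUTOVER: usize = 1_000; // approximate threshold from the docs
    if vector_count < HNSW_CUTOVER {
        IndexKind::Flat
    } else {
        IndexKind::Hnsw
    }
}
```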
## Search Model
The Rust API returns a `SearchResponse` that pairs ranked hits with execution metrics.
SearchMetrics includes query dimensionality, candidate counts, post-filter counts, and end-to-end latency in microseconds.
Metadata filters support:
`eq`, `gt`, `lt`, `contains`, `in`, `exists`, `and`, `or`, `not`
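These operators compose as nested conditions. The envelope below is illustrative only, since the exact `MetadataFilter` wire shape is not documented here; the `field`/`value` key names are assumptions:

```json
{
  "and": [
    { "eq": { "field": "topic", "value": "demo" } },
    { "gt": { "field": "created_at", "value": 1700000000 } }
  ]
}
```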
Hybrid retrieval combines ANN candidates with SQLite FTS5 keyword hits. The alpha field on HybridQuery controls the blend between vector similarity and keyword relevance, where 1.0 is vector-only and 0.0 is keyword-only.
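The alpha blend reads as a convex combination of the two scores. A minimal sketch of the semantics; how claw-vector normalizes each score before blending is an assumption left out here:

```rust
/// Blend vector similarity with keyword relevance.
/// alpha = 1.0 is vector-only; alpha = 0.0 is keyword-only.
fn hybrid_score(alpha: f32, vector_score: f32, keyword_score: f32) -> f32 {
    alpha * vector_score + (1.0 - alpha) * keyword_score
}
```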
Available rerankers:
- `diversity` for MMR-style result diversification
- `recency` for boosting newer records
- `composite` for chaining reranking passes
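MMR-style diversification trades a hit's relevance against its similarity to hits already selected. A compact, self-contained sketch over unit-normalized vectors, independent of claw-vector's actual `diversity` implementation:

```rust
/// Dot product of two equal-length vectors; equals cosine similarity
/// when inputs are unit-normalized.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Maximal Marginal Relevance: repeatedly pick the candidate maximizing
/// lambda * relevance - (1 - lambda) * max-similarity-to-selected.
/// Each candidate is (embedding, relevance score); returns selected indices.
fn mmr(candidates: &[(Vec<f32>, f32)], lambda: f32, k: usize) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    while selected.len() < k.min(candidates.len()) {
        let best = (0..candidates.len())
            .filter(|i| !selected.contains(i))
            .max_by(|&i, &j| {
                let score = |idx: usize| {
                    let (ref v, rel) = candidates[idx];
                    // Redundancy: closest similarity to anything already picked.
                    let redundancy = selected
                        .iter()
                        .map(|&s| dot(v, &candidates[s].0))
                        .fold(0.0f32, f32::max);
                    lambda * rel - (1.0 - lambda) * redundancy
                };
                score(i).partial_cmp(&score(j)).unwrap()
            })
            .unwrap();
        selected.push(best);
    }
    selected
}
```

With `lambda = 0.5`, a near-duplicate of the top hit is passed over in favor of a less relevant but more diverse result.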
## Public Rust API Overview
Core types you will typically interact with:
- `VectorEngine` for collection lifecycle and query execution.
- `VectorConfig` for runtime configuration and tuning.
- `SearchQuery` and `HybridQuery` for ANN and vector+keyword retrieval.
- `MetadataFilter` for structured filtering on JSON metadata.
- `BatchSearchQuery` and `BatchUpsertResult` for bulk operations.
Typical request flow in applications:
- Build `VectorConfig` and initialize `VectorEngine`.
- Create or restore a collection.
- Upsert records by text (service embeddings) or raw vectors.
- Query using ANN or hybrid search.
- Inspect `SearchMetrics` for latency and candidate diagnostics.
- Close the engine to flush indexes and state.
## Quick Start
### 1. Start the embedding service
With Docker Compose:
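A minimal invocation using the repository's `docker-compose.yml` (targeting a specific service by name would require the compose file's actual service names, which are not documented here):

```shell
docker compose up -d
```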
Or run it locally:
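A hedged local-run sketch; the directory name and entrypoint below are assumptions about the Python service layout, not documented commands:

```shell
cd embedding-service         # assumed directory name
pip install -r requirements.txt
python main.py               # assumed entrypoint module
```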
The service exposes gRPC on 127.0.0.1:50051 and HTTP on 127.0.0.1:8080 by default.
### 2. Use the Rust engine

A reconstructed sketch of the original example. The type names come from this document, but module paths and method signatures are assumptions and may differ from the actual crate API:

```rust
use claw_vector::{SearchQuery, VectorConfig, VectorEngine}; // paths assumed
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Builder fields mirror the configuration reference; method names are assumed.
    let config = VectorConfig::builder()
        .embedding_service_url("http://localhost:50051")
        .build();
    let engine = VectorEngine::new(config).await?;

    engine.create_collection("notes", 384).await?;
    engine
        .upsert_text("notes", "doc-1", "hello from claw-vector", json!({"topic": "demo"}))
        .await?;

    let response = engine.search("notes", SearchQuery::new("hello", 5)).await?;
    println!("{:?}", response.metrics);

    engine.close().await?; // flush indexes and state
    Ok(())
}
```
### 3. Optional: run the Rust gRPC server
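The binary entrypoint named earlier (`claw-vector-server`) can be launched through Cargo:

```shell
cargo run --release --bin claw-vector-server
```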
The server listens on `0.0.0.0:50051` by default. Override that with `CLAW_GRPC_ADDR`.
## Embedding Service
The Python service loads the embedding model during FastAPI lifespan startup, warms it up, then starts the async gRPC server used by the Rust engine.
HTTP endpoints:
- `GET /health`
- `GET /ready`
- `POST /embed`
- `POST /batch-embed`
- `GET /model-info`
- `GET /metrics`
Authentication:
- HTTP requests accept `X-Claw-Api-Key`.
- gRPC requests accept `x-claw-api-key` (Rust server) or `authorization: Bearer <key>` (Python embedding service).
- When keys are configured, unauthorized requests return `401` (HTTP) or `Unauthenticated` (gRPC).
Example request:
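A hedged example against the local HTTP endpoint; the request field name `texts` is an assumption about the `/embed` schema:

```shell
curl -s -X POST http://127.0.0.1:8080/embed \
  -H "Content-Type: application/json" \
  -H "X-Claw-Api-Key: $CLAW_API_KEY" \
  -d '{"texts": ["what does claw-vector store?"]}'
```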
Example response shape:
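A plausible shape; the field names are assumptions, and the embedding is truncated to three of the default 384 dimensions:

```json
{
  "embeddings": [[0.0123, -0.0456, 0.0789]],
  "model": "sentence-transformers/all-MiniLM-L6-v2",
  "dimensions": 384
}
```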
## Configuration Reference
### Rust engine
`VectorConfig::builder()` controls all runtime settings. The path and embedding endpoint settings can also be supplied through environment variables.
| Setting | Default | Source | Description |
|---|---|---|---|
| `db_path` | `claw_vector.db` | builder, `CLAW_VECTOR_DB_PATH` | SQLite database file for collections, metadata, and FTS5 state. |
| `index_dir` | `claw_vector_indices` | builder, `CLAW_VECTOR_INDEX_DIR` | Directory for persisted index files and mmap vector files. |
| `embedding_service_url` | `http://localhost:50051` | builder, `CLAW_EMBEDDING_URL` | gRPC endpoint for the Python embedding service. |
| `default_dimensions` | `384` | builder | Default embedding dimensionality. |
| `ef_construction` | `200` | builder | HNSW build-time recall and speed tradeoff. |
| `m_connections` | `16` | builder | HNSW graph degree. |
| `ef_search` | `50` | builder | Default HNSW search breadth. |
| `max_elements` | `1_000_000` | builder | Maximum vectors per collection index. |
| `cache_size` | `10_000` | builder | Rust-side embedding LRU cache capacity. |
| `batch_size` | `64` | builder | Max texts per embedding request. |
| `embedding_timeout_ms` | `5000` | builder | gRPC timeout for embedding calls. |
| `num_threads` | available parallelism | builder | Rayon worker count for index operations. |
| `default_workspace_id` | `default` | builder, `CLAW_DEFAULT_WORKSPACE_ID` | Default tenant/workspace id used when a request does not provide one. |
| `api_key_store_path` | `claw_vector_auth.db` | builder, `CLAW_API_KEY_STORE_PATH` | SQLite database used by the Rust gRPC auth key store. |
| `rate_limit_rps` | `100` | builder, `CLAW_RATE_LIMIT_RPS` | Default per-workspace request rate limit for Rust gRPC APIs. |
| `require_auth` | `true` in release, `false` in tests | builder, `CLAW_REQUIRE_AUTH` | Enable or disable auth checks (`false` intended for local development/testing only). |
| `CLAW_GRPC_ADDR` | `0.0.0.0:50051` | env | Listen address for the Rust gRPC server binary. |
### Python embedding service
| Environment variable | Default | Description |
|---|---|---|
| `MODEL_NAME` | `sentence-transformers/all-MiniLM-L6-v2` | Hugging Face model to load. |
| `DEVICE` | `cpu` | Runtime device such as `cpu`, `cuda`, or another supported backend. |
| `GRPC_HOST` | `0.0.0.0` | gRPC bind host. |
| `GRPC_PORT` | `50051` | gRPC bind port. |
| `HTTP_HOST` | `0.0.0.0` | HTTP bind host. |
| `HTTP_PORT` | `8080` | HTTP bind port. |
| `MAX_BATCH_SIZE` | `64` | Max texts accepted in a single embedding request. |
| `CACHE_SIZE` | `10000` | In-process embedding cache capacity. |
| `NORMALIZE_EMBEDDINGS` | `true` | Whether embeddings are normalized by default. |
| `ONNX_MODEL_PATH` | unset | Optional ONNX model path for alternate inference. |
| `MAX_SEQUENCE_LENGTH` | `256` | Max token length exposed in model metadata responses. |
| `CLAW_API_KEY` | unset | Single API key accepted by Python HTTP/gRPC endpoints. |
| `CLAW_API_KEYS` | unset | Comma-separated list of API keys accepted by Python HTTP/gRPC endpoints. |
| `EMBED_RATE_LIMIT_PER_MINUTE` | `200` | Per-API-key HTTP request limit for `/embed`. |
## Testing And Validation
Rust:
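The engine tests run under Cargo's standard test harness:

```shell
cargo test
```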
Python:
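The service tests presumably use pytest (an assumption; the runner is not named here). Run it from the Python service directory:

```shell
pytest
```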
Additional checks commonly used in CI:
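Typical gates for a Rust workspace like this one; these are common conventions, not commands taken from the project's CI configuration:

```shell
cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings
cargo test --all-features
```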
The engine test suite covers collection lifecycle, persistence across reopen, hybrid search, metadata filters, embedding cache behavior, and Flat to HNSW migration.
## Benchmarks And Performance Targets
Criterion benchmarks live in `benches/vector_bench.rs` and cover:
- single-vector upsert
- batch upsert of 100 records
- Flat search at 500 vectors
- HNSW search at 1,000 and 100,000 vectors
- filtered ANN search at 10,000 vectors
- embedding cache hit latency
- hybrid search at 1,000 text-backed records
Operational targets for the current implementation:
- HNSW search p99 under 10 ms at 100k vectors
- cache-hit embedding lookups under 1 us
- embedding throughput above 2,000 texts per minute for the default MiniLM model on suitable hardware
Use Criterion directly when you want full benchmark runs:
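The bench target name follows from the file name above:

```shell
cargo bench --bench vector_bench
```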
## Development Notes
- Rust protobuf generation is handled in `build.rs` with vendored `protoc`, so local protobuf installation is not required.
- SQLite migrations are embedded and applied automatically by the Rust store layer.
- `docker-compose.yml` provides a ready-to-run embedding service and an optional Rust dev container profile.
## Operational Guidance
- Use workspace isolation (`workspace_id`) in multi-tenant environments to separate data, indexes, and auth/rate-limit scopes.
- Keep vector dimensions fixed per collection; handle dimension drift by creating a new collection and reindexing.
- For large ingestion jobs, use batch upserts and tune `ef_construction`/`m_connections` before the initial load.
- For low-latency query paths, keep `include_vectors` off unless callers truly need full vector payloads.
- Back up both the SQLite files and the index/mmap directories to preserve full recall behavior after restore.
## Release Notes
Version 0.1.1 updates dependency constraints to the latest compatible direct versions, refreshes lockfile resolutions, and expands crate documentation for integration, operations, and CI validation.