veles-core
Core library for Veles — fast, hybrid (BM25 + semantic) local code search for AI agents and humans, written in pure Rust.
`veles-core` is the indexing and search engine. It walks a directory, chunks source files, builds a BM25 inverted index plus a dense `model2vec-rs` embedding index, and serves hybrid queries with Reciprocal Rank Fusion. Tree-sitter is used to extract definitions for symbol-level lookups.
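Reciprocal Rank Fusion combines the BM25 and semantic result lists using only ranks, so the two score scales never have to be normalised against each other. The sketch below is a generic RRF in plain Rust, not `veles-core`'s code. The function name `rrf_fuse` and the constant `k = 60` (the value commonly used in the RRF literature) are assumptions, and document ids are plain strings for illustration.

```rust
use std::collections::HashMap;

/// Damping constant; 60 is the conventional RRF value (assumed, not taken from veles-core).
const RRF_K: f64 = 60.0;

/// Fuse several ranked lists of document ids with Reciprocal Rank Fusion:
/// score(d) = sum over lists of 1 / (k + rank(d)), using 1-based ranks.
/// A document missing from a list simply gets no contribution from it.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; tie-break on id so the output order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With BM25 ranking `a.rs, b.rs, c.rs` and the dense index ranking `b.rs, d.rs, a.rs`, `b.rs` comes out first because it sits near the top of both lists, even though BM25 alone preferred `a.rs`.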
- No GPU, no transformer forward pass at query time. Static embeddings keep query latency in the tens of milliseconds on CPU.
- Persistent on-disk index under `<repo>/.veles/`, with incremental updates that reuse embeddings of unchanged files.
- Identifier-aware tokeniser that splits camelCase, snake_case, and mixed-script identifiers; multilingual model option for Cyrillic, CJK, Arabic, etc.
- Pure Rust: no Python, no `protoc`, no native ML runtime required.
Install
```toml
[dependencies]
veles-core = "0.2"
```
Quick start
```rust
use std::path::Path;
use …;
```
The first build downloads the default embedding model (~64 MB) into the Hugging Face cache (`~/.cache/huggingface/hub/`).
Persistence and incremental updates
```rust
use std::path::Path;
use …::VelesIndex;
```
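Incremental updates depend on knowing which files changed since the last build. The following is a minimal sketch, assuming a per-file content fingerprint is stored next to the embeddings: unchanged files keep their vectors and only new or edited files go through the embedding model. `CachedFile`, `refresh`, and the `embed` callback are hypothetical names. `DefaultHasher` stands in for a real content hash; its output is not guaranteed stable across Rust releases, so an on-disk format would need a fixed algorithm.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical cache entry: the fingerprint a file had when it was last embedded,
/// plus the embeddings of its chunks.
struct CachedFile {
    fingerprint: u64,
    embeddings: Vec<Vec<f32>>,
}

/// Content fingerprint. DefaultHasher is NOT guaranteed stable across Rust
/// releases; a persisted index would need a fixed hash algorithm instead.
fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Rebuild the per-file cache, calling `embed` only for new or changed files.
/// Returns the new cache and how many files were (re-)embedded.
fn refresh<F>(
    old: &HashMap<String, CachedFile>,
    files: &[(String, String)], // (path, current content)
    mut embed: F,
) -> (HashMap<String, CachedFile>, usize)
where
    F: FnMut(&str) -> Vec<Vec<f32>>,
{
    let mut next = HashMap::new();
    let mut embedded = 0;
    for (path, content) in files {
        let fp = fingerprint(content);
        let embeddings = match old.get(path) {
            // Same fingerprint as last time: reuse the stored vectors.
            Some(cached) if cached.fingerprint == fp => cached.embeddings.clone(),
            // New or edited file: run the embedding model.
            _ => {
                embedded += 1;
                embed(content.as_str())
            }
        };
        next.insert(path.clone(), CachedFile { fingerprint: fp, embeddings });
    }
    // Paths missing from `files` are not carried over, so deleted files drop out.
    (next, embedded)
}
```

Editing one of two files therefore costs one embedding call on the next refresh instead of two.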
What's in the box
| Module | Purpose |
|---|---|
| `veles_index` | The main `VelesIndex` (index + search + persist + symbols). |
| `chunker` | Line-based source chunking with overlap. |
| `tokenizer` | Identifier-aware tokeniser (camelCase / snake_case / Unicode). |
| `index::sparse` | BM25 inverted index over a corpus of tokenised docs. |
| `index::dense` | Brute-force cosine similarity (rayon-parallel, top-k via min-heap). |
| `index::search` | Semantic / BM25 / hybrid (RRF) search entry points. |
| `ranking` | Definition boosts, file-saturation decay, path penalties. |
| `symbols` | Tree-sitter symbols for Rust, Python, JavaScript, TypeScript, Go. |
| `persist` | On-disk format under `.veles/` (manifest + bincode). |
| `walker` | `.gitignore`-aware file walker. |
| `model` | `model2vec-rs` loader (default + multilingual). |
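The `index::dense` row describes brute-force cosine similarity with a min-heap for top-k selection. The sketch below shows that technique using only the standard library and a single thread (the table notes the crate parallelises with rayon). `top_k`, `cosine`, and `Scored` are hypothetical names. The heap never holds more than k + 1 entries, so selection costs O(n log k) rather than sorting all n scores.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// (cosine score, document index), totally ordered via `f32::total_cmp`.
/// On equal scores the lower index counts as the better hit.
struct Scored(f32, usize);

impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0).then_with(|| other.1.cmp(&self.1))
    }
}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Scored {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Scored {}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

/// Brute-force top-k: score every document, keep the k best in a min-heap.
fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    // Reverse turns BinaryHeap (a max-heap) into a min-heap: the root is the worst kept hit.
    let mut heap: BinaryHeap<Reverse<Scored>> = BinaryHeap::with_capacity(k + 1);
    for (i, doc) in docs.iter().enumerate() {
        heap.push(Reverse(Scored(cosine(query, doc), i)));
        if heap.len() > k {
            heap.pop(); // evict the current worst so at most k entries remain
        }
    }
    let mut hits: Vec<(usize, f32)> = heap
        .into_iter()
        .map(|Reverse(Scored(score, i))| (i, score))
        .collect();
    // Best first; the heap's internal order is not sorted.
    hits.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    hits
}
```

Parallelising this scan (for example by scoring chunks of `docs` on worker threads and merging their per-thread heaps) does not change the selection logic.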
See also
- The end-user CLI documentation and the `USAGE.md` reference for the `veles` binary.
- `veles-grpc`: tonic-based gRPC service wrapping `veles-core`.
- `veles-mcp`: MCP server exposing `search` and `find_related` over JSON-RPC for AI agents.
License
MIT