Expand description
veles-core — fast, hybrid (BM25 + semantic) local code search.
veles-core is the indexing and search engine that powers the Veles
CLI, MCP server, and gRPC service. It walks a directory, chunks source
files, builds a BM25 inverted index plus a dense
model2vec-rs embedding index, and serves hybrid queries
using Reciprocal Rank Fusion. Tree-sitter is used to extract
definitions for symbol-level lookups.
Design goals:
- No GPU, no transformer forward pass at query time. Embeddings come from a static model2vec model, so query latency stays in the tens of milliseconds on CPU.
- Persistent on-disk index. Indexes live under
<repo>/.veles/and support incremental updates that reuse embeddings of unchanged files. - Pure Rust. No Python interpreter, no protobuf compiler, no
native ML runtime —
cargo build --releaseis enough.
§Quick start
use std::path::Path;
use veles_core::{SearchMode, VelesIndex};
// Build an index from a directory. The first call downloads the
// default embedding model (~64 MB) into the HuggingFace cache.
let index = VelesIndex::from_path(Path::new("."), None, None, false)?;
// Hybrid (BM25 + semantic) search — the default for most queries.
let results = index.search(
"parse config file",
5,
SearchMode::Hybrid,
None, // alpha — auto-detect from query type
None, // language filter
None, // path filter
);
for r in results {
println!("{} [{:.3}]", r.chunk.location(), r.score);
}§Persistence
Indexes can be saved to and loaded from <repo>/.veles/:
let repo = Path::new(".");
let index = VelesIndex::from_path(repo, None, None, false)?;
index.save(repo)?;
// Later, reload without re-embedding:
let model = veles_core::model::load_model(None)?;
let mut reloaded = VelesIndex::load(repo, model)?;
// Refresh files that changed on disk; unchanged files keep their
// embeddings.
let report = reloaded.update_from_path(repo)?;
eprintln!("{} added, {} modified, {} removed",
report.added_files, report.modified_files, report.removed_files);§Module overview
veles_index— the mainVelesIndextype combining BM25, dense, symbols, and persistence.chunker— line-based source chunking with overlap.tokenizer— identifier-aware tokeniser (camelCase, snake_case, Cyrillic, CJK).index— sparse (index::sparse) and dense (index::dense) indexes,index::searchentry points, andindex::topkselection.ranking— query-type detection, definition boosts, file-path penalties, file-saturation decay.symbols— tree-sitter symbol extraction for Rust, Python, JavaScript, TypeScript, and Go.persist— on-disk format under.veles/.walker—.gitignore-aware file walker (built onignore).model— wrapper aroundmodel2vec-rsfor loading the default and multilingual static embedding models.
Re-exports§
pub use types::Chunk;pub use types::IndexStats;pub use types::SearchMode;pub use types::SearchResult;pub use veles_index::VelesIndex;
Modules§
- chunker
- Source code chunker — splits files into indexable units.
- index
- Index module — dense index, sparse index, search, and the main
VelesIndex. - model
- Model loading wrapper around model2vec-rs.
- persist
- Persistent on-disk index format.
- ranking
- Ranking module — boosting, penalties, and weighting.
- symbols
- Tree-sitter-backed symbol extraction.
- tokenizer
- Tokenizer for BM25 indexing — splits identifiers into sub-tokens.
- types
- Core types shared across the search surface.
- veles_
index - Main
VelesIndex— the central API for indexing and searching code. - walker
- File walker — walks directories, filters by extension, respects .gitignore.