Crate veles_core


veles-core — fast, hybrid (BM25 + semantic) local code search.

veles-core is the indexing and search engine that powers the Veles CLI, MCP server, and gRPC service. It walks a directory, chunks source files, builds a BM25 inverted index plus a dense model2vec-rs embedding index, and serves hybrid queries using Reciprocal Rank Fusion. Tree-sitter is used to extract definitions for symbol-level lookups.
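Reciprocal Rank Fusion merges the BM25 and dense result lists by rank position rather than by raw score, so the two scorers never need a common scale. The minimal sketch below implements the standard formula, where each chunk scores the sum of 1 / (k + rank) over every list it appears in. The constant k = 60 (the value commonly used in the literature), the `u32` chunk ids, and the `reciprocal_rank_fusion` function are assumptions for illustration, not veles-core's actual types or settings.

```rust
use std::collections::HashMap;

/// Fuse ranked lists of chunk ids (best first) with Reciprocal Rank Fusion.
/// A chunk's fused score is the sum of 1 / (k + rank) over all lists,
/// with 1-based ranks. Chunks missing from a list get nothing from it.
fn reciprocal_rank_fusion(rankings: &[Vec<u32>], k: f64) -> Vec<(u32, f64)> {
    let mut scores: HashMap<u32, f64> = HashMap::new();
    for ranking in rankings {
        for (i, &chunk) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry(chunk).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u32, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    let bm25 = vec![3, 1, 2]; // sparse ranking
    let dense = vec![1, 4, 3]; // embedding ranking
    for (chunk, score) in reciprocal_rank_fusion(&[bm25, dense], 60.0) {
        println!("chunk {chunk}: {score:.4}");
    }
}
```

Chunk 1 wins here because it ranks well in both lists, even though chunk 3 tops the BM25 list alone.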

Design goals:

No GPU, no transformer forward pass at query time. Embeddings come from a static model2vec model, so query latency stays in the tens of milliseconds on CPU.
Persistent on-disk index. Indexes live under <repo>/.veles/ and support incremental updates that reuse embeddings of unchanged files.
Pure Rust. No Python interpreter, no protobuf compiler, no native ML runtime — cargo build --release is enough.
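The first design goal rests on how static embedding models work: a text's vector is built from a fixed per-token lookup table, so embedding a query costs hash lookups and additions rather than a neural forward pass. The sketch below is a simplified, hypothetical stand-in: `StaticModel` is a toy table and tokens are split on whitespace, whereas model2vec uses a subword tokenizer and a real vocabulary. It sums the token vectors and L2-normalizes the result, which points in the same direction as mean pooling.

```rust
use std::collections::HashMap;

/// Hypothetical static embedding table: one fixed vector per token.
struct StaticModel {
    vectors: HashMap<String, Vec<f32>>,
    dim: usize,
}

impl StaticModel {
    /// Embed `text` with table lookups only: sum the vectors of known tokens,
    /// then L2-normalize. Dividing by the token count (mean pooling) would not
    /// change the direction, so it is skipped. Unknown tokens are ignored.
    fn embed(&self, text: &str) -> Vec<f32> {
        let mut sum = vec![0.0f32; self.dim];
        for token in text.split_whitespace() {
            if let Some(v) = self.vectors.get(&token.to_lowercase()) {
                for (s, x) in sum.iter_mut().zip(v) {
                    *s += *x;
                }
            }
        }
        let norm = sum.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for s in sum.iter_mut() {
                *s /= norm;
            }
        }
        sum
    }
}

/// For unit-length vectors, cosine similarity is just the dot product.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let mut vectors = HashMap::new();
    vectors.insert("parse".to_string(), vec![1.0, 0.0]);
    vectors.insert("config".to_string(), vec![0.0, 1.0]);
    vectors.insert("settings".to_string(), vec![0.1, 0.9]);
    let model = StaticModel { vectors, dim: 2 };
    let query = model.embed("parse config");
    let doc = model.embed("parse settings");
    println!("similarity = {:.3}", cosine(&query, &doc));
}
```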

Quick start

use std::path::Path;
use veles_core::{SearchMode, VelesIndex};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build an index from a directory. The first call downloads the
    // default embedding model (~64 MB) into the HuggingFace cache.
    let index = VelesIndex::from_path(Path::new("."), None, None, false)?;

    // Hybrid (BM25 + semantic) search: the default for most queries.
    let results = index.search(
        "parse config file",
        5,
        SearchMode::Hybrid,
        None, // alpha: None auto-detects the weighting from the query type
        None, // language filter
        None, // path filter
    );

    // Print each hit's location and its fused score.
    for r in results {
        println!("{} [{:.3}]", r.chunk.location(), r.score);
    }
    Ok(())
}
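In `SearchMode::Hybrid`, the sparse half of the ranking comes from BM25. For readers unfamiliar with it, here is a minimal sketch of Okapi BM25 over pre-tokenized chunks, using the common defaults k1 = 1.2 and b = 0.75 and a Lucene-style IDF that never goes negative. The `bm25_scores` function is hypothetical; veles-core's actual parameters and IDF variant are not documented here and may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Score every document against the query with Okapi BM25.
/// Documents and the query are already split into terms.
fn bm25_scores(query: &[&str], docs: &[Vec<&str>], k1: f64, b: f64) -> Vec<f64> {
    let n = docs.len() as f64;
    let avg_len = docs.iter().map(|d| d.len()).sum::<usize>() as f64 / n;

    // Document frequency: in how many documents each term appears.
    let mut df: HashMap<&str, f64> = HashMap::new();
    for doc in docs {
        let unique: HashSet<&str> = doc.iter().copied().collect();
        for term in unique {
            *df.entry(term).or_insert(0.0) += 1.0;
        }
    }

    docs.iter()
        .map(|doc| {
            let len = doc.len() as f64;
            query
                .iter()
                .map(|&q| {
                    let tf = doc.iter().filter(|&&t| t == q).count() as f64;
                    let df_q = df.get(q).copied().unwrap_or(0.0);
                    // Rare terms weigh more; the +1 keeps IDF positive.
                    let idf = ((n - df_q + 0.5) / (df_q + 0.5) + 1.0).ln();
                    // Term frequency saturates via k1; b normalizes by length.
                    idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * len / avg_len))
                })
                .sum::<f64>()
        })
        .collect()
}

fn main() {
    let docs = vec![
        vec!["parse", "config", "file"],
        vec!["read", "file"],
        vec!["parse", "args"],
    ];
    let scores = bm25_scores(&["parse", "config"], &docs, 1.2, 0.75);
    println!("{scores:?}");
}
```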

Persistence

Indexes can be saved to and loaded from <repo>/.veles/:

use std::path::Path;
use veles_core::VelesIndex;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let repo = Path::new(".");
    let index = VelesIndex::from_path(repo, None, None, false)?;
    index.save(repo)?;

    // Later, reload without re-embedding:
    let model = veles_core::model::load_model(None)?;
    let mut reloaded = VelesIndex::load(repo, model)?;

    // Refresh files that changed on disk; unchanged files keep their
    // embeddings.
    let report = reloaded.update_from_path(repo)?;
    eprintln!(
        "{} added, {} modified, {} removed",
        report.added_files, report.modified_files, report.removed_files
    );
    Ok(())
}
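One way to implement an incremental refresh like `update_from_path` is to store a content hash per file next to the index and compare it with the files currently on disk: only added or modified files need chunking and embedding. The sketch below assumes that design. `UpdateReport` mirrors only the three counters used above (`added_files`, `modified_files`, `removed_files`); the struct name, `diff_files`, and the `u64` hashes are hypothetical and do not describe veles-core's persisted format.

```rust
use std::collections::HashMap;

/// Hypothetical report with the three counters printed in the example above.
#[derive(Debug, Default, PartialEq)]
struct UpdateReport {
    added_files: usize,
    modified_files: usize,
    removed_files: usize,
}

/// Compare stored per-file hashes with hashes of the files now on disk.
/// Returns the counts plus the sorted list of files that need (re-)embedding.
fn diff_files(
    stored: &HashMap<String, u64>,
    on_disk: &HashMap<String, u64>,
) -> (UpdateReport, Vec<String>) {
    let mut report = UpdateReport::default();
    let mut to_embed = Vec::new();
    for (path, hash) in on_disk {
        match stored.get(path) {
            None => {
                report.added_files += 1;
                to_embed.push(path.clone());
            }
            Some(old) if old != hash => {
                report.modified_files += 1;
                to_embed.push(path.clone());
            }
            // Unchanged: existing chunks and embeddings are reused as-is.
            Some(_) => {}
        }
    }
    // Files that vanished from disk: their chunks would be dropped.
    report.removed_files = stored.keys().filter(|p| !on_disk.contains_key(*p)).count();
    to_embed.sort();
    (report, to_embed)
}

fn main() {
    let stored = HashMap::from([("a.rs".to_string(), 1u64), ("b.rs".to_string(), 2)]);
    let on_disk = HashMap::from([("a.rs".to_string(), 1u64), ("c.rs".to_string(), 3)]);
    let (report, to_embed) = diff_files(&stored, &on_disk);
    println!("{report:?}, re-embed: {to_embed:?}");
}
```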

Module overview

veles_index — the main VelesIndex type combining BM25, dense, symbols, and persistence.
chunker — line-based source chunking with overlap.
tokenizer — identifier-aware tokenizer (camelCase, snake_case, Cyrillic, CJK).
index — sparse (index::sparse) and dense (index::dense) indexes, index::search entry points, and index::topk selection.
ranking — query-type detection, definition boosts, file-path penalties, file-saturation decay.
symbols — tree-sitter symbol extraction for Rust, Python, JavaScript, TypeScript, and Go.
persist — on-disk format under .veles/.
walker — .gitignore-aware file walker (built on the ignore crate).
model — wrapper around model2vec-rs for loading the default and multilingual static embedding models.
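The tokenizer entry mentions identifier-aware splitting. A minimal sketch of camelCase and snake_case splitting follows; `tokenize` is a hypothetical function, not veles-core's implementation. It relies only on Unicode letter case, so Cyrillic identifiers split the same way Latin ones do, while CJK runs (which have no case) stay whole instead of receiving whatever dedicated handling the real tokenizer applies.

```rust
/// Split source text into lowercase sub-tokens. Identifiers break on
/// non-alphanumeric characters (`read_file` -> `read`, `file`), on
/// lower-to-upper transitions (`parseConfig` -> `parse`, `config`), and at
/// the end of an acronym (`HTTPServer` -> `http`, `server`).
fn tokenize(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    // Maximal runs of alphanumeric characters (Unicode-aware).
    for word in text.split(|c: char| !c.is_alphanumeric()) {
        if word.is_empty() {
            continue;
        }
        let chars: Vec<char> = word.chars().collect();
        let mut start = 0;
        for i in 1..chars.len() {
            let (prev, cur) = (chars[i - 1], chars[i]);
            let next_is_lower = chars.get(i + 1).map_or(false, |c| c.is_lowercase());
            // parse|Config, foo2|Bar
            let camel = (prev.is_lowercase() || prev.is_ascii_digit()) && cur.is_uppercase();
            // HTTP|Server: split before the capital that starts a lowercase run.
            let acronym_end = prev.is_uppercase() && cur.is_uppercase() && next_is_lower;
            if camel || acronym_end {
                tokens.push(chars[start..i].iter().collect::<String>().to_lowercase());
                start = i;
            }
        }
        tokens.push(chars[start..].iter().collect::<String>().to_lowercase());
    }
    tokens
}

fn main() {
    for s in ["parseConfigFile", "read_file", "HTTPServer", "разобратьКонфиг"] {
        println!("{s} -> {:?}", tokenize(s));
    }
}
```

Splitting identifiers this way lets a query like "config file" match `parseConfigFile` through BM25 even though the query never spells the identifier out.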

Re-exports

pub use types::Chunk;
pub use types::IndexStats;
pub use types::SearchMode;
pub use types::SearchResult;
pub use veles_index::VelesIndex;

Modules

chunker: Source code chunker — splits files into indexable units.
index: Index module — dense index, sparse index, search, and the main VelesIndex.
model: Model loading wrapper around model2vec-rs.
persist: Persistent on-disk index format.
ranking: Ranking module — boosting, penalties, and weighting.
symbols: Tree-sitter-backed symbol extraction.
tokenizer: Tokenizer for BM25 indexing — splits identifiers into sub-tokens.
types: Core types shared across the search surface.
veles_index: Main VelesIndex — the central API for indexing and searching code.
walker: File walker — walks directories, filters by extension, respects .gitignore.
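The chunker module is described as line-based chunking with overlap. A hypothetical sliding-window version is sketched below: `LineChunk` stands in for the crate's real `Chunk` type, `chunk_lines` is an illustrative name, and the window size and overlap are caller-chosen because veles-core's defaults are not stated here.

```rust
/// Hypothetical chunk: a 1-based, inclusive line range plus its text.
#[derive(Debug, PartialEq)]
struct LineChunk {
    start_line: usize,
    end_line: usize,
    text: String,
}

/// Split `source` into windows of at most `max_lines` lines; each window
/// starts `max_lines - overlap` lines after the previous one, so adjacent
/// chunks share `overlap` lines.
fn chunk_lines(source: &str, max_lines: usize, overlap: usize) -> Vec<LineChunk> {
    assert!(max_lines > 0 && overlap < max_lines, "overlap must be smaller than the window");
    let lines: Vec<&str> = source.lines().collect();
    let step = max_lines - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < lines.len() {
        let end = (start + max_lines).min(lines.len());
        chunks.push(LineChunk {
            start_line: start + 1,
            end_line: end,
            text: lines[start..end].join("\n"),
        });
        // Stop once the last line is covered, to avoid a redundant tail chunk.
        if end == lines.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    let source = "fn a() {}\nfn b() {}\nfn c() {}\nfn d() {}\nfn e() {}";
    for c in chunk_lines(source, 3, 1) {
        println!("lines {}-{}:\n{}", c.start_line, c.end_line, c.text);
    }
}
```

The overlap gives code that straddles a window boundary some surrounding context in both neighbouring chunks, at the cost of indexing those shared lines twice.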
