vecgrep
Semantic grep — like ripgrep, but with vector search.
vecgrep uses a local embedding model (all-MiniLM-L6-v2) to search your codebase by meaning rather than exact text matches. The model is embedded directly in the binary — no external services, no API keys, fully offline.
Fast by default. After the first index build, searches return instantly — vecgrep queries the cached index without waiting for re-indexing. Changed files are indexed in the background for next time. Interactive mode (-i) and the HTTP server (--serve) feel real-time: queries take ~5ms, and results update progressively as new files are indexed.
Usage
# Search for a concept
# Search with more results and a lower threshold
# Filter by file type
# Use a code snippet as query to find similar patterns
# Interactive TUI mode
# JSON output for scripting
# Combining with ripgrep — semantic search to find files, then exact match
# Reverse — use ripgrep to narrow files, then vecgrep to rank by meaning
# Index management
More examples
# HTTP server mode (load model once, query via curl)
# => Listening on http://127.0.0.1:8080
# Use with fzf for interactive fuzzy semantic search
# Security audit — find input handling code, then grep for dangerous patterns
# Find files about a concept and open them in your editor
# Count how many files deal with a concept
# Count how many chunks in each file relate to error handling
# Filter high-confidence results and format as file:line
# Find who wrote security-related code
# Recent changes to files about database access
# Pretty-print matching files with bat
# Generate a markdown TODO list from semantic matches
# Re-run tests when error-handling code changes
How it works
- Walk: discovers files on a background thread using the same engine as ripgrep (.gitignore-aware, binary detection), streaming them through a bounded channel
- Chunk: splits files into overlapping token-window chunks, snapped to line boundaries
- Embed: runs each chunk through the ONNX model to produce a 384-dimensional vector
- Index: caches embeddings in a local SQLite database (.vecgrep/index.db), keyed by BLAKE3 content hash so only changed files are re-embedded on subsequent runs
- Search: computes cosine similarity between your query embedding and all cached chunk embeddings and returns the top-k results
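The Chunk step can be pictured as a sliding window over whole lines. The following is a minimal sketch, not vecgrep's code. It approximates tokens by whitespace-separated words, whereas the real chunker presumably counts model tokens. The names `Chunk`, `chunk_lines` and `approx_tokens` are hypothetical. `chunk_size` and `overlap` play the roles of `--chunk-size` and `--chunk-overlap`.

```rust
/// One chunk: lines `start_line..end_line` of the file (0-based, end exclusive).
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub start_line: usize,
    pub end_line: usize,
    pub text: String,
}

/// Rough token count: whitespace-separated words (assumption; a real
/// chunker would count tokens with the model's tokenizer).
fn approx_tokens(line: &str) -> usize {
    line.split_whitespace().count()
}

/// Split `source` into windows of at most `chunk_size` tokens made of whole
/// lines, where consecutive windows share up to `overlap` tokens.
/// A single line longer than `chunk_size` becomes its own oversized chunk.
pub fn chunk_lines(source: &str, chunk_size: usize, overlap: usize) -> Vec<Chunk> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");
    let lines: Vec<&str> = source.lines().collect();
    let tokens: Vec<usize> = lines.iter().map(|l| approx_tokens(l)).collect();
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < lines.len() {
        // Grow the window one whole line at a time while it still fits
        // (always take at least one line so progress is guaranteed).
        let mut end = start;
        let mut count = 0;
        while end < lines.len() && (end == start || count + tokens[end] <= chunk_size) {
            count += tokens[end];
            end += 1;
        }
        chunks.push(Chunk {
            start_line: start,
            end_line: end,
            text: lines[start..end].join("\n"),
        });
        if end == lines.len() {
            break;
        }

        // Step back over whole lines until about `overlap` tokens would be shared,
        // but never back to `start`, so the next window always begins later.
        let mut next = end;
        let mut shared = 0;
        while next > start + 1 && shared + tokens[next - 1] <= overlap {
            next -= 1;
            shared += tokens[next];
        }
        // Drop the overlap if keeping it would leave no room for the next unseen line.
        if shared + tokens[end] > chunk_size {
            next = end;
        }
        start = next;
    }
    chunks
}
```

With the defaults (500 and 100), consecutive windows share up to about 100 tokens. A short snippet that straddles a window boundary, roughly up to the overlap size, therefore still lands whole in one chunk.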
Walking and indexing overlap — the embedder processes files as the walker discovers them. Searches run against the cached index immediately; changed files are indexed in the background. Use --full-index to wait for indexing to complete before searching.
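Overlapping the walk with embedding is a bounded producer/consumer pipeline. The sketch below uses std's `sync_channel` as the bounded channel and makes several substitutions:

- A pre-collected file list stands in for the ripgrep-style walker.
- An in-memory `HashMap` stands in for the SQLite index.
- Because it is std-only, `DefaultHasher` substitutes for BLAKE3. `DefaultHasher` is neither cryptographic nor stable across Rust releases, so it is only a placeholder.

`index_changed` is a hypothetical name, and the channel capacity of 16 is arbitrary.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;
use std::sync::mpsc::sync_channel;
use std::thread;

/// Placeholder for BLAKE3: std's DefaultHasher is NOT cryptographic and its
/// output may change between Rust releases, so it should not be persisted.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// Hypothetical helper: stream `(path, contents)` pairs through a bounded channel
/// and re-embed only files whose content hash differs from the cached one.
/// `index` maps path -> hash of the content whose embeddings are cached.
/// Returns the paths that would be re-embedded, in discovery order.
pub fn index_changed(files: Vec<(String, String)>, index: &mut HashMap<String, u64>) -> Vec<String> {
    // The walker blocks once 16 files are queued, so discovery cannot run
    // arbitrarily far ahead of embedding.
    let (tx, rx) = sync_channel::<(String, String)>(16);

    // Producer: stands in for the background walker thread.
    let walker = thread::spawn(move || {
        for file in files {
            if tx.send(file).is_err() {
                break; // consumer hung up
            }
        }
        // `tx` is dropped when this closure returns, which ends the consumer loop.
    });

    // Consumer: starts on the first file while the walker is still running.
    let mut reembedded = Vec::new();
    for (path, contents) in rx {
        let hash = content_hash(contents.as_bytes());
        if index.get(&path) == Some(&hash) {
            continue; // unchanged: keep the cached embeddings
        }
        // A real indexer would chunk and embed `contents` here.
        index.insert(path.clone(), hash);
        reembedded.push(path);
    }
    walker.join().expect("walker thread panicked");
    reembedded
}
```

On a second run with one edited file, only that file's path comes back. This mirrors "only changed files are re-embedded".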
Search is a single matrix dot product against cached embeddings loaded in memory — no database in the hot path. This makes interactive mode and the HTTP server responsive enough for on-every-keystroke use.
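This sketch assumes that embeddings are stored unit-normalised. Under that assumption cosine similarity reduces to a plain dot product, and scoring every cached chunk is one matrix-vector product. It also assumes a row-major `Vec<f32>` layout and that `--threshold` and `-k` mean "drop scores below the threshold, then keep the best k". `EmbeddingMatrix`, `normalize` and `search` are hypothetical names.

```rust
/// Scale `v` to unit length so that a dot product equals cosine similarity.
pub fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Row-major matrix of unit-normalised chunk embeddings, `dim` floats per row.
pub struct EmbeddingMatrix {
    pub dim: usize,
    pub data: Vec<f32>,
}

impl EmbeddingMatrix {
    /// Score every row against a unit-normalised `query` (one dot product per row),
    /// keep scores >= `threshold`, sort best-first, and return at most `k`
    /// `(row_index, score)` pairs.
    pub fn search(&self, query: &[f32], k: usize, threshold: f32) -> Vec<(usize, f32)> {
        assert_eq!(query.len(), self.dim, "query dimension mismatch");
        let mut hits: Vec<(usize, f32)> = self
            .data
            .chunks_exact(self.dim)
            .map(|row| row.iter().zip(query).map(|(a, b)| a * b).sum::<f32>())
            .enumerate()
            .filter(|&(_, score)| score >= threshold)
            .collect();
        hits.sort_by(|a, b| b.1.total_cmp(&a.1));
        hits.truncate(k);
        hits
    }
}
```

Nothing in this loop touches a database. That matches the point that the hot path only needs the embeddings already in memory.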
Why local-only?
vecgrep runs entirely on your machine. There are no API calls, no cloud services, no telemetry. Your code never leaves your computer.
This matters for:
- Privacy — proprietary codebases stay private
- Speed — no network round-trips; search is a local matrix multiply that takes <5ms
- Availability — works offline, on planes, behind firewalls, in air-gapped environments
- Cost — no API fees, no usage limits
Model choice
vecgrep embeds all-MiniLM-L6-v2 directly in the binary. This is a 22M-parameter sentence transformer that produces 384-dimensional embeddings.
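Assume each embedding is held as 384 `float32` values (the in-memory layout is not specified above). Memory use and per-query work then both grow linearly with the number of chunks. For a hypothetical index of 100,000 chunks:

```latex
\begin{aligned}
\text{bytes per embedding} &= 384 \times 4 = 1536 \\
\text{bytes for 100,000 chunks} &= 100{,}000 \times 1536 = 153{,}600{,}000 \approx 154\ \text{MB} \\
\text{multiply-adds per query} &= 100{,}000 \times 384 = 38{,}400{,}000
\end{aligned}
```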
Why this model:
- Small and fast — 90 MB (float32 ONNX), runs inference in single-digit milliseconds on CPU. No GPU required.
- Best code-search accuracy at this size — outperforms larger models on our code-search benchmark thanks to strong separation between relevant and irrelevant results.
- Standard BERT architecture — wide ONNX Runtime support across platforms (x86, ARM, with optional CoreML/CUDA acceleration).
- Battle-tested — one of the most downloaded sentence-transformers models, with well-understood behaviour.
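Here is a rough consistency check on the 90 MB figure, assuming uncompressed float32 weights at 4 bytes per parameter:

```latex
22 \times 10^{6}\ \text{parameters} \times 4\ \tfrac{\text{bytes}}{\text{parameter}} = 88 \times 10^{6}\ \text{bytes} \approx 88\ \text{MB}
```

The remaining couple of megabytes are plausibly due to "22M" being a rounded count and to ONNX graph metadata.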
The model is downloaded once at build time from HuggingFace, cached locally, and compiled into the binary via include_bytes!. The resulting binary is fully self-contained.
Install
Pre-built binaries for macOS and Linux are available on the releases page. Download the appropriate archive, extract it, and place the vecgrep binary on your PATH.
To build from source:
The first build downloads the ONNX model (~90 MB) from HuggingFace and caches it locally. Subsequent builds reuse the cached model.
Options
vecgrep [OPTIONS] <QUERY> [PATHS]...
Arguments:
<QUERY> Search query (natural language or code snippet)
[PATHS]... Files or directories to search [default: .]
Like ripgrep, you can pass multiple paths. Directories
are walked recursively, respecting .gitignore. Files
are searched directly. The index is scoped to the
project root (discovered via .git/, .vecgrep/, etc.).
Options:
-k, --top-k <N> Number of results [default: 10]
--threshold <F> Minimum similarity 0.0–1.0 [default: 0.3]
-i, --interactive Interactive TUI mode
-t, --type <TYPE> Filter by file type (rust, python, js, ...)
-T, --type-not <TYPE> Exclude file type
-g, --glob <PATTERN> Filter by glob
-C, --context <N> Context lines around match [default: 3]
-j, --threads <N> Indexing threads
-l, --files-with-matches Print only file paths with matches
-c, --count Print count of matching chunks per file
-., --hidden Search hidden files and directories
-L, --follow Follow symbolic links
-d, --max-depth <N> Limit directory traversal depth
--no-ignore Don't respect .gitignore
--type-list Show all supported file types
--color <WHEN> When to use color (auto, always, never)
--reindex Force full re-index
--full-index Wait for indexing to complete before searching
--index-only Build index without searching
--stats Show index statistics
--clear-cache Delete cached index
--show-root Print resolved project root and exit
--json JSONL output (includes "root" field)
--serve Start HTTP server mode
--port <PORT> Port for HTTP server [default: auto]
--chunk-size <N> Tokens per chunk [default: 500]
--chunk-overlap <N> Overlap tokens [default: 100]
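The `[PATHS]` note says the index is scoped to a project root found via markers such as `.git/` and `.vecgrep/`. One way to do that lookup is to climb from the search path toward the filesystem root and stop at the nearest directory containing a marker. This is a sketch under several assumptions:

- Only the two markers named above are checked; the README's "etc." is left unfilled.
- The nearest match wins.
- Falling back to the starting directory when nothing matches is a guess.

`find_project_root` is a hypothetical name. `--show-root` prints the root vecgrep actually resolves.

```rust
use std::path::{Path, PathBuf};

/// Root markers named in the README (its list continues with "etc.").
const ROOT_MARKERS: &[&str] = &[".git", ".vecgrep"];

/// Return the nearest ancestor of `start` (inclusive) that contains a root
/// marker, or `start` itself (canonicalised when possible) if none does.
pub fn find_project_root(start: &Path) -> PathBuf {
    // Resolve to an absolute path so `ancestors()` can climb above the
    // current directory when `start` is relative (e.g. ".").
    let start = start.canonicalize().unwrap_or_else(|_| start.to_path_buf());
    for dir in start.ancestors() {
        if ROOT_MARKERS.iter().any(|m| dir.join(m).exists()) {
            return dir.to_path_buf();
        }
    }
    start
}
```

Because a file path's own `join(".git")` never exists, passing a file rather than a directory simply starts the search at its parent.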
Integrations
- vecgrep.nvim: Neovim plugin for semantic search via vecgrep's --serve mode
Environment variables
- VECGREP_MODEL_CACHE: override model cache directory (default: system cache dir)
- VECGREP_LOG: enable debug logging, e.g. VECGREP_LOG=debug
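This sketch shows how an override such as `VECGREP_MODEL_CACHE` typically layers over a default. The environment lookup is passed in as a closure, so the logic can be tested without mutating the process environment.

The fallback chain is a hypothetical, Linux-style approximation of "system cache dir", not vecgrep's documented behaviour: `$XDG_CACHE_HOME`, then `$HOME/.cache`, each with a `vecgrep` subdirectory. `model_cache_dir` is likewise a hypothetical name.

```rust
use std::path::PathBuf;

/// Resolve the model cache directory. `get_env` returns the value of an
/// environment variable, or `None` if it is unset.
pub fn model_cache_dir(get_env: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    // 1. A non-empty explicit override wins.
    if let Some(dir) = get_env("VECGREP_MODEL_CACHE").filter(|d| !d.is_empty()) {
        return Some(PathBuf::from(dir));
    }
    // 2. Hypothetical default: $XDG_CACHE_HOME, else $HOME/.cache,
    //    plus a "vecgrep" subdirectory. Other platforms use other locations.
    let base = get_env("XDG_CACHE_HOME")
        .filter(|d| !d.is_empty())
        .map(PathBuf::from)
        .or_else(|| get_env("HOME").map(|h| PathBuf::from(h).join(".cache")))?;
    Some(base.join("vecgrep"))
}
```

In a real program it would be called as `model_cache_dir(|k| std::env::var(k).ok())`.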
License
MIT