ripvec
Cacheless semantic code + document search. One binary, 19 grammars, one static-encoder engine, zero setup.
ripvec finds code and documents by meaning, provides structural code intelligence across every language it knows, and ranks results by how important each file is in your project. It runs CPU-only, holds no on-disk index, and matches or exceeds transformer baselines on our benchmark matrix across code and prose.
The function is called with_retry, the variable is delay. "exponential backoff" appears nowhere in the source. grep can't find this. ripvec can, because it embeds both your query and the code into the same vector space, fuses semantic scores with path-enriched BM25, layers a structural-importance signal from a PageRank percentile boost, and reranks the top candidates through a cross-encoder.
When to use what
ripvec has three interfaces. Here's when each one matters:
| Interface | When to use it | Who uses it |
|---|---|---|
| CLI (ripvec "query" .) | Terminal search, one-shot queries | You, directly |
| MCP server (ripvec-mcp) | AI agent needs to search or understand your codebase | Claude Code, Cursor, any MCP client |
| LSP server (ripvec-mcp --lsp) | Editor/agent needs symbols, definitions, diagnostics | Claude Code's LSP tool, editors |
The MCP server gives AI agents 7 retrieval tools plus 9 LSP tools. The LSP server gives editors structural intelligence (outlines, go-to-definition, syntax diagnostics) for all 19 languages from one binary. The CLI is for humans. Same binary for all three.
If you're using Claude Code, install the plugin. It sets up both MCP and LSP automatically; Claude will use search when you ask conceptual questions and the LSP for symbol navigation.
Engine
ripvec uses a single retrieval engine across the CLI, MCP server, and LSP server:
Model2Vec static bi-encoder (minishlab/potion-base-32M, 256-dim) + path-enriched BM25 + function-level PageRank percentile boost + TinyBERT-L-2 cross-encoder rerank (gated by corpus class).
The engine is in-memory per session -- no on-disk index or persistent cache. Sub-MCPs, fresh worktrees, agent fan-out, and document archives all work without setup.
Quality and speed
Two reproducible benchmarks anchor ripvec's behavior, both run from a fresh checkout via cargo run --release --example corpus_bench. The corpora and query / target-file annotations are checked in under tests/corpus/.
Code corpus
Workload: tests/corpus/code, ~2 GB across nine codebases (tokio, redis, react, spring-boot, go, linux, ripgrep, flask, express). Query set: 20 architectural and semantic queries against tokio with file-level ground truth (tests/corpus/annotations/tokio.json). Scoring: NDCG@10, recall@10, precision@10 with suffix-path matching.
| metric | value |
|---|---|
| chunks indexed | 1,075,655 |
| index build | 65 s |
| PageRank graph build | 45 s |
| query p50 | 42 ms |
| query p90 | 168 ms |
| query p99 | 241 ms |
| NDCG@10 | 0.665 |
| recall@10 | 0.767 |
| precision@10 | 0.120 |
Prose corpus
Workload: tests/corpus/gutenberg, 10 Project Gutenberg books (~2 MB plain text). Query set: 15 natural-language queries each mapping to a single relevant book (tests/corpus/annotations/gutenberg.json). Same scoring.
| metric | value |
|---|---|
| chunks indexed | 1,652 |
| index build | 120 ms |
| query p50 | 34 ms |
| query p90 | 36 ms |
| query p99 | 36 ms |
| NDCG@10 | 1.000 |
| recall@10 | 1.000 |
| precision@10 | 0.100 (one relevant book per query, top-10) |
Every query returns the correct book at rank 1.
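The scoring used in both tables can be sketched as follows. This is a minimal illustration of binary-relevance NDCG@k and precision@k, not the harness's own code; the function names are hypothetical and the suffix-path matching that decides whether a hit is relevant is left out.

```rust
/// Discounted cumulative gain over the top `k` results.
/// `hits[i]` is true when the result at 0-based position `i` is relevant.
fn dcg_at_k(hits: &[bool], k: usize) -> f64 {
    hits.iter()
        .take(k)
        .enumerate()
        .map(|(i, &hit)| if hit { 1.0 / ((i + 2) as f64).log2() } else { 0.0 })
        .sum()
}

/// NDCG@k with binary relevance: DCG divided by the DCG of an ideal ranking
/// that places all `num_relevant` items first.
fn ndcg_at_k(hits: &[bool], num_relevant: usize, k: usize) -> f64 {
    let ideal: f64 = (0..num_relevant.min(k))
        .map(|i| 1.0 / ((i + 2) as f64).log2())
        .sum();
    if ideal == 0.0 { 0.0 } else { dcg_at_k(hits, k) / ideal }
}

/// Fraction of the top `k` slots occupied by relevant results.
fn precision_at_k(hits: &[bool], k: usize) -> f64 {
    hits.iter().take(k).filter(|&&h| h).count() as f64 / k as f64
}
```

With one relevant book per query, a hit at rank 1 gives NDCG@10 = 1.0 while precision@10 cannot exceed 1/10, which is why the prose table shows 1.000 next to 0.100.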
Comparison vs semble
semble is the closest published baseline for this stack: static-embedding bi-encoder, path-enriched BM25, ranking layer. ripvec runs semble's full published benchmark (63 repos, 19 languages, 1,251 queries) end-to-end. Full per-language tables, methodology, and raw JSON outputs live in docs/benchmarks/full_corpus.md.
Macro-averaged across languages:
| pipeline | NDCG@10 | q-p50 | q-p99 | index |
|---|---|---|---|---|
| semble (potion-code-16M) | 0.852 | 2.22 ms | 11.35 ms | 1347 ms |
| ripvec matched (same model, no PageRank, no rerank) | 0.845 | 0.33 ms | 4.20 ms | 110 ms |
| ripvec default (potion-base-32M + PageRank + auto-rerank) | 0.803 | 0.35 ms | 4.31 ms | 109 ms |
Matched-mode quality sits within 0.007 NDCG@10 of semble while running 6.7× faster at p50, 2.7× at p99, and 12.2× faster on index build. The matched cell answers "is the port faithful": same model, same algorithm shape, deltas attribute to the implementation. The default cell answers "what does a user get out of the box": ripvec's shipped configuration trades 0.049 NDCG@10 on this code-heavy corpus for the headroom the 32M model gives on prose (NDCG@10 = 1.000 on the Gutenberg benchmark above) and for the PageRank prior that helps architectural queries on import-graph-heavy codebases.
The pipelines differ on three axes:
- Embedding model. semble defaults to potion-code-16M (code-tuned). ripvec defaults to potion-base-32M (general). 16M leads 32M on this code corpus by 0.042 NDCG@10; 32M leads 16M on the prose benchmark by 0.058. The bench harness accepts --model REPO to swap the bi-encoder.
- Reranker. semble has no cross-encoder. ripvec applies ms-marco-TinyBERT-L-2-v2 on Docs and Mixed corpora when the query is natural-language; pure Code corpora skip it (the gate fires zero times across this 63-repo run, by design).
- Structural prior. ripvec computes function-level PageRank over the import / call graph and applies a percentile-based boost. semble has no equivalent.
Reproducing
# Single-corpus end-to-end harness (code, ~25 min).
# Single-corpus end-to-end harness (prose, ~30 s).
# Full semble corpus replay (63 repos, ~25 min after one-time clone).
Bench flags:
| flag | default | purpose |
|---|---|---|
| --candidates N | 50 | cap on candidates the reranker sees |
| --rerank-model REPO | cross-encoder/ms-marco-TinyBERT-L-2-v2 | swap cross-encoder |
| --model REPO | minishlab/potion-base-32M | swap bi-encoder |
| --scope {code,docs,all} | (from arg) | corpus filter intent |
| --repeats N | 5 | timing reps per query |
| --no-rerank / --rerank | auto | force the gate one way |
For matched-model semble parity, cargo run --release --example semble_bench -- <repo> <annotations.json> mirrors the harness in ~/src/semble/benchmarks/run_benchmark.py.
Workflow: orient, search, navigate
graph LR
A["🗺️ Orient<br/>get_repo_map"] --> B["🔍 Search<br/>search(scope)"]
B --> C["🧭 Navigate<br/>LSP operations"]
C -->|"need more context"| B
C -->|"found it"| D["✏️ Edit"]
Orient. get_repo_map returns a structural overview ranked by function-level importance. One tool call replaces 10+ sequential file reads. Start here when working on unfamiliar code.
Search. search(query="authentication middleware", scope="code") finds implementations by meaning across all 19 languages simultaneously. Pass scope="docs" for documentation-only retrieval (with cross-encoder rerank), scope="all" (default) to search everything and let the corpus class decide whether rerank fires. Results are ranked by relevance and structural importance.
Navigate. LSP documentSymbol shows the file outline. goToDefinition jumps to the likely definition. findReferences shows usage sites. incomingCalls/outgoingCalls traces the call graph.
Semantic search
You describe behavior, ripvec finds the implementation:
| What you want | grep / ripgrep | ripvec |
|---|---|---|
| "retry with backoff" | Nothing (code says delay *= 2) | Finds the retry handler |
| "database connection pool" | Comments mentioning "pool" | The pool implementation |
| "authentication middleware" | // TODO: add auth | The auth guard |
| "WebSocket lifecycle" | String "WebSocket" | Connect/disconnect handlers |
Search modes: --mode hybrid (default, semantic + BM25 fusion), --mode semantic (pure vector similarity), --mode keyword (pure BM25). Hybrid is usually best.
Scope: code, docs, or all
Documents about a topic (READMEs, design specs, RFCs, code comments) literally use the topic's words. Code that implements the topic usually doesn't. Semantic similarity therefore systematically ranks docs above implementations on descriptive queries, and the right answer depends on what the agent is looking for.
scope lets the caller declare intent:
| Scope | Includes | Rerank | When to pick |
|---|---|---|---|
| code | code-language extensions (.py, .rs, .ts, .go, …) | off | "Find the implementation of X." |
| docs | prose extensions (.md, .rst, .txt, .adoc, .org, .mdx) | on (NL queries) | "Find documentation about X / how X is described." |
| all (default) | everything | corpus-aware | "Search everything; let the gate decide whether rerank fires." |
include_extensions and exclude_extensions give surgical control on top of scope (e.g. scope=all, exclude_extensions=["min.js"]). Same flags on CLI: --scope, --include-ext, --exclude-ext.
The MCP search tool exposes these as JSON params; the CLI exposes them as flags.
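How scope and the extension lists might combine can be sketched as a path filter. This is an illustration under stated assumptions, not ripvec's actual logic: the names are hypothetical, extensions are matched as case-insensitive suffixes (so "min.js" matches app.min.js), include and exclude are applied before scope, and anything that is not a prose extension counts as code.

```rust
#[derive(Clone, Copy)]
enum Scope {
    Code,
    Docs,
    All,
}

/// Prose extensions as listed in the scope table.
const PROSE_EXTENSIONS: [&str; 6] = ["md", "rst", "txt", "adoc", "org", "mdx"];

/// Case-insensitive suffix match on ".<ext>" (assumed matching rule).
fn has_suffix(path: &str, ext: &str) -> bool {
    path.to_lowercase()
        .ends_with(&format!(".{}", ext.to_lowercase()))
}

/// Decide whether a file takes part in a search.
fn in_scope(path: &str, scope: Scope, include: &[&str], exclude: &[&str]) -> bool {
    // Exclusions win over everything else.
    if exclude.iter().any(|e| has_suffix(path, e)) {
        return false;
    }
    // A non-empty include list acts as an allow-list.
    if !include.is_empty() && !include.iter().any(|e| has_suffix(path, e)) {
        return false;
    }
    let is_prose = PROSE_EXTENSIONS.iter().any(|e| has_suffix(path, e));
    match scope {
        Scope::All => true,
        Scope::Docs => is_prose,
        // Simplification: every non-prose file is treated as code.
        Scope::Code => !is_prose,
    }
}
```

For example, `in_scope("dist/app.min.js", Scope::All, &[], &["min.js"])` is false while `in_scope("src/app.js", Scope::All, &[], &["min.js"])` is true.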
Multi-language LSP
ripvec serves LSP from a single binary for all 19 grammars. No per-language server installs. It provides:
- documentSymbol: file outline (functions, fields, enum variants, constants, types, headings)
- workspaceSymbol: cross-language symbol search with PageRank boost
- goToDefinition: name-based resolution ranked by structural importance
- findReferences: usage sites via hybrid search + content filtering
- hover: scope chain, signature, enriched context
- publishDiagnostics: tree-sitter syntax error detection after every edit
- incomingCalls/outgoingCalls: function-level call graph
For languages with dedicated LSPs (Rust, Python, Go, TypeScript), ripvec runs alongside them. The dedicated server handles types, ripvec handles semantic search and cross-language features. For languages without dedicated LSPs (bash, HCL, Ruby, Kotlin, Swift, Scala), ripvec is the primary code intelligence.
JSON, YAML, TOML, and Markdown get structural outlines (keys, mappings, headings) and syntax diagnostics. Useful for navigating large config files, not comparable to language-aware intelligence.
Architecture: the ripvec engine
The default engine is a four-stage composite pipeline. Each stage uses a fast, cheap-to-rebuild signal; together they match or exceed a single transformer on retrieval quality across our benchmark matrix.
graph TB
Q["Query"] --> EMB["Bi-encoder embed<br/>(Model2Vec potion-base-32M, 256-dim)"]
Q --> BM["BM25 score<br/>(path-enriched, postings-list inverted)"]
EMB --> SEM["Cosine similarity<br/>parallel sgemv across rayon row-shards<br/>top-N candidates"]
BM --> LEX["Lexical ranking<br/>par_iter over query terms<br/>top-N candidates"]
SEM --> RRF["Reciprocal Rank Fusion<br/>(k=60)"]
LEX --> RRF
RRF --> PR["× PageRank percentile boost<br/>(sigmoid curve, α=0.5)"]
PR --> GATE{"Corpus class<br/>(≥30% prose chunks?)"}
GATE -->|"Docs / Mixed"| RR["Cross-encoder rerank<br/>(ms-marco-TinyBERT-L-2-v2)<br/>top-50 candidates"]
GATE -->|"Code"| OUT["Top-k results"]
RR --> OUT
Static bi-encoder retrieval (Model2Vec). The bi-encoder is a lookup-and-mean-pool over a pretrained 256-dim embedding table (minishlab/potion-base-32M). No transformer forward pass; encoding cost is dominated by memory bandwidth, not FLOPs. About 5ms per query on a single CPU thread; ~250K chunks per second when indexing in parallel.
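The lookup-and-mean-pool step can be sketched in a few lines. This is a toy illustration of the technique, not ripvec's kernel: the table is a plain `Vec<Vec<f32>>` rather than the 256-dim Model2Vec weights, tokenization is assumed to have already produced ids, and the function names are hypothetical.

```rust
/// Embed a token sequence by averaging the table rows for its token ids.
fn mean_pool(table: &[Vec<f32>], token_ids: &[usize]) -> Vec<f32> {
    let dim = table.first().map_or(0, |row| row.len());
    let mut pooled = vec![0.0f32; dim];
    if token_ids.is_empty() {
        return pooled;
    }
    // Sum the looked-up rows component-wise.
    for &id in token_ids {
        for (acc, value) in pooled.iter_mut().zip(&table[id]) {
            *acc += value;
        }
    }
    // Divide by the token count to get the mean.
    let n = token_ids.len() as f32;
    pooled.iter_mut().for_each(|v| *v /= n);
    pooled
}

/// Cosine similarity between two vectors; 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 { 0.0 } else { dot / denom }
}
```

There is no matrix multiply per token and no attention, so the work per query is a handful of row reads and adds, which is why memory bandwidth rather than FLOPs sets the cost.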
Path-enriched BM25. Lexical scoring with a code-aware tokenizer that splits parseJsonConfig into [parse, json, config] and my_func_name into [my, func, name]. Chunk text is enriched with the file stem (doubled) and the last three directory components before tokenization, so a query like "session encoding" hits both content and sessions.py paths.
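A rough sketch of the identifier splitting and path enrichment described above. The function names and the exact layout of the enriched string are assumptions for illustration; this simple splitter also does not break acronym runs such as HTTPServer, which a production tokenizer likely handles.

```rust
use std::path::Path;

/// Split an identifier into lowercase words on non-alphanumeric separators
/// and on lowercase-to-uppercase transitions.
fn split_identifier(ident: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut prev_lower = false;
    for ch in ident.chars() {
        if !ch.is_alphanumeric() {
            // Separator such as '_' or '-': close the current token.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            prev_lower = false;
            continue;
        }
        // camelCase boundary: a capital right after a lowercase letter.
        if ch.is_uppercase() && prev_lower && !current.is_empty() {
            tokens.push(std::mem::take(&mut current));
        }
        current.extend(ch.to_lowercase());
        prev_lower = ch.is_lowercase();
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Prefix chunk text with the file stem (twice) and the last three
/// directory components, so path words take part in lexical scoring.
fn enrich_with_path(path: &str, chunk_text: &str) -> String {
    let p = Path::new(path);
    let stem = p.file_stem().and_then(|s| s.to_str()).unwrap_or("");
    let mut dirs: Vec<&str> = p
        .parent()
        .map(|d| d.iter().filter_map(|c| c.to_str()).collect())
        .unwrap_or_default();
    let start = dirs.len().saturating_sub(3);
    let dirs = dirs.split_off(start);
    format!("{stem} {stem} {} {chunk_text}", dirs.join(" "))
}
```

Doubling the stem is a cheap way to raise its term frequency, so a chunk from sessions.py scores higher on "session"-like terms than a chunk that mentions the word once in passing.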
Reciprocal Rank Fusion. Combines the semantic and lexical rankings via Cormack et al.'s rank-based fusion (k=60). Handles the scale mismatch between cosine similarity and BM25 without tuning.
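For reference, Reciprocal Rank Fusion itself is small enough to sketch in full. This version assumes document ids are plain `u32` values and each input list is ordered best-first; the function name is hypothetical and the tie-break by id is an addition for deterministic output.

```rust
use std::collections::HashMap;

/// Fuse ranked lists with Reciprocal Rank Fusion.
/// A document at 1-based rank r in a list contributes 1 / (k + r).
fn rrf_fuse(rankings: &[Vec<u32>], k: f64) -> Vec<(u32, f64)> {
    let mut scores: HashMap<u32, f64> = HashMap::new();
    for ranking in rankings {
        for (i, &doc) in ranking.iter().enumerate() {
            *scores.entry(doc).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut fused: Vec<(u32, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ascending id.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
    fused
}
```

Only ranks enter the formula, so cosine similarities in [-1, 1] and unbounded BM25 scores never need to be put on a common scale. Fusing a semantic list `[1, 2, 3]` with a lexical list `[3, 1, 4]` at k=60 orders the documents 1, 3, 2, 4: documents present in both lists rise above those present in one.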
PageRank percentile boost. A structural-importance signal on top of relevance. See the next section.
Cross-encoder rerank (prose-class corpora). When the index's corpus class is Docs or Mixed (at least 30% of indexed chunks are prose-extension files) and the query is natural-language, the top 50 candidates are re-scored by ms-marco-TinyBERT-L-2-v2: a 2-layer cross-encoder distilled from BERT-base, ~5 MB on disk, ~0.3 ms per pair on CPU. It was selected after a sweep against the larger ms-marco-MiniLM-L-12-v2 (33 MB, 12 layers): TinyBERT-L-2 holds NDCG@10 = 1.000 on the Gutenberg benchmark at 20× the throughput.
Wiring details: the BERT pooler (tanh(W_pool · cls)) runs between the trunk and the classifier head (matching the head the model was trained against). Raw classifier logits flow out (sentence-transformers Identity activation), and the ranking layer min-max normalizes both cross-encoder and bi-encoder score arrays within the candidate set before convex-combining (0.7 × cross + 0.3 × bi). Tokenizer truncation is LongestFirst at max_position_embeddings, preserving [CLS] / [SEP] on long inputs.
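The normalize-then-blend step can be sketched as follows. This is a minimal illustration with hypothetical function names; how ripvec treats a candidate set where all scores are equal is not stated, so mapping that case to 0.0 is an assumption.

```rust
/// Rescale scores to [0, 1] within the candidate set.
/// If all scores are equal, every entry maps to 0.0 (assumed behavior).
fn min_max(scores: &[f32]) -> Vec<f32> {
    let lo = scores.iter().cloned().fold(f32::INFINITY, f32::min);
    let hi = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let span = hi - lo;
    scores
        .iter()
        .map(|&s| if span > 0.0 { (s - lo) / span } else { 0.0 })
        .collect()
}

/// Convex combination of normalized cross-encoder and bi-encoder scores.
fn blend(cross_logits: &[f32], bi_scores: &[f32]) -> Vec<f32> {
    assert_eq!(cross_logits.len(), bi_scores.len());
    let cross = min_max(cross_logits);
    let bi = min_max(bi_scores);
    cross
        .iter()
        .zip(&bi)
        .map(|(c, b)| 0.7 * c + 0.3 * b)
        .collect()
}
```

Normalizing inside the candidate set matters because raw cross-encoder logits are unbounded while cosine scores are not; without it the 0.7 / 0.3 weights would not mean what they appear to mean.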
Code-class corpora skip the reranker. The cross-encoder is trained on web-prose passage retrieval and adds latency without lifting NDCG on code: on the 8-Python-library benchmark, rerank-on costs roughly 0.09 NDCG@10 vs rerank-off regardless of which cross-encoder model is plugged in.
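Put together, the rerank gate reduces to a small predicate. The sketch below is an assumption-laden illustration: the function name is hypothetical, natural-language detection is passed in as a flag because its rule is not described here, and the `force` argument stands in for the --rerank / --no-rerank bench flags.

```rust
/// Decide whether the cross-encoder should run for a query.
/// `force` models an explicit override; `None` means "auto".
fn rerank_fires(
    prose_chunks: usize,
    total_chunks: usize,
    query_is_natural_language: bool,
    force: Option<bool>,
) -> bool {
    // An explicit override wins over the automatic gate.
    if let Some(forced) = force {
        return forced;
    }
    if total_chunks == 0 {
        return false;
    }
    // Docs and Mixed corpora: at least 30% of chunks come from prose files.
    let prose_fraction = prose_chunks as f64 / total_chunks as f64;
    prose_fraction >= 0.30 && query_is_natural_language
}
```

On the Gutenberg corpus every chunk is prose, so the gate opens for natural-language queries; on a pure code corpus the prose fraction stays below the threshold and the reranker never loads.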
Function-level PageRank
graph LR
subgraph "Call Graph"
A["main()"] --> B["handle_request()"]
A --> C["init_db()"]
B --> D["authenticate()"]
B --> E["dispatch()"]
D --> F["verify_token()"]
E --> D
end
subgraph "PageRank"
D2["authenticate() ★★★"]
B2["handle_request() ★★"]
E2["dispatch() ★"]
end
ripvec extracts call expressions from every function body using tree-sitter, resolves callee names to definitions, and computes PageRank on the resulting call graph. Functions called by many others rank higher. authenticate() in the example above is more structurally important than dispatch() because more code depends on it.
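The ranking step can be illustrated with a textbook power-iteration PageRank over (caller, callee) edges. This is a generic sketch, not ripvec's implementation: the damping factor 0.85, the fixed iteration count, and the uniform redistribution of rank from functions that call nothing are all assumptions.

```rust
/// Power-iteration PageRank over a directed graph of `n` nodes.
/// Each edge is (caller, callee); rank flows from caller to callee.
fn pagerank(n: usize, edges: &[(usize, usize)], damping: f64, iters: usize) -> Vec<f64> {
    let mut out_degree = vec![0usize; n];
    for &(from, _) in edges {
        out_degree[from] += 1;
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iters {
        // Rank held by functions with no outgoing calls is spread uniformly.
        let dangling: f64 = (0..n)
            .filter(|&i| out_degree[i] == 0)
            .map(|i| rank[i])
            .sum();
        let base = (1.0 - damping) / n as f64 + damping * dangling / n as f64;
        let mut next = vec![base; n];
        // Each caller splits its rank evenly among its callees.
        for &(from, to) in edges {
            next[to] += damping * rank[from] / out_degree[from] as f64;
        }
        rank = next;
    }
    rank
}
```

Encoding the example graph as main=0, handle_request=1, init_db=2, authenticate=3, dispatch=4, verify_token=5 with edges `[(0,1), (0,2), (1,3), (1,4), (3,5), (4,3)]`, authenticate ends with a higher rank than dispatch because it receives rank from both handle_request and dispatch.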
The bi-encoder is structurally weaker than a transformer. Model2Vec doesn't model cross-token interactions and can't reliably distinguish a 1500-char canonical implementation from a 3-line example stub by dense similarity alone. Without a corrective signal, the engine ranks tests/hello_world.py competitively with src/auth/handler.py on a query like "register a route." PageRank carries the missing signal: implementations are imported by tests and callers; stubs are imported by nothing.
ripvec applies the structural prior as a sigmoid-on-percentile boost: boost(p) = 1 + α × sigmoid((p − 0.5) / s) where p is the file's PR percentile within the corpus, α=0.5 is the ceiling lift, and s=0.15 controls steepness.
| PR percentile | Example file | Boost (α=0.5) |
|---|---|---|
| 0 (not in graph) | isolated leaf file | 1.00× (no boost) |
| 0.10 (bottom decile) | rarely-imported impl | 1.04× |
| 0.25 (lower quartile) | hub of one small module | 1.08× |
| 0.50 (median) | typical impl file | 1.25× |
| 0.75 (upper quartile) | heavily-imported module | 1.42× |
| 0.95 (near top) | central trait / API surface | 1.48× |
| 1.00 (graph root) | e.g. tokio/src/lib.rs | ~1.49× (asymptote 1.5×) |
Two design constraints fall out of this curve:
- At-or-above-median PR gets a meaningfully different boost from low-PR. A median-importance impl with cosine 0.84 ends at 0.84 × 1.25 = 1.05; a near-zero-PR test with cosine 0.85 ends at 0.85 × 1.02 = 0.867. The impl flips above the test by ~21%, enough to reorder reliably when the bi-encoder is uncertain.
- The ceiling caps centers-of-universe. A graph-root file at p=1.0 gets at most 1.5×. It can't dominate when the query genuinely matches a less-central file.
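The boost curve and the reordering example above can be checked with a short sketch. The function names are hypothetical; treating "not in graph" as `None` with a boost of exactly 1.0 follows the first table row, and values computed this way may differ from the table in the last rounded digit.

```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Multiplicative boost for a file at the given PageRank percentile.
/// `None` means the file is not in the graph and gets no boost.
fn pagerank_boost(percentile: Option<f64>, alpha: f64, steepness: f64) -> f64 {
    match percentile {
        Some(p) => 1.0 + alpha * sigmoid((p - 0.5) / steepness),
        None => 1.0,
    }
}

/// Relevance score after applying the boost with alpha = 0.5, s = 0.15.
fn boosted_score(relevance: f64, percentile: Option<f64>) -> f64 {
    relevance * pagerank_boost(percentile, 0.5, 0.15)
}
```

`boosted_score(0.84, Some(0.5))` evaluates to 1.05, which beats `boosted_score(0.85, Some(0.0))` at roughly 0.86, reproducing the impl-over-test flip. Because the sigmoid is bounded by 1, the boost stays below 1 + α = 1.5 at every percentile.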
The boost is applied via a composable RankingLayer chain shared across CLI, MCP, and LSP code paths. Adding a new ranking signal (recency, file-saturation diversification) is a single new impl RankingLayer.
Performance
ripvec engine (the default and only engine). Wall time for a single query, end-to-end including model load on cold start:
| Corpus | First query (cold) | Warm | Notes |
|---|---|---|---|
| Small repo (~500 files) | ~7s | 0.3s | Model download + index build dominate cold path |
| Medium repo (~5K files, e.g. Tokio) | ~12s | 0.8s | |
| Large repo (~50K files) | ~50s | 8s | Linear in file count for indexing |
| Linux kernel (~92K files, 1.7 GB) | ~75s | n/a (in-memory drops between processes) | |
The MCP daemon holds the in-memory index for the session lifetime, so warm latency dominates after the first query. For sub-MCPs and agent fan-out where each spawn starts fresh, the cold-path numbers are what to budget against.
Memory. ~200 MB for a typical project (embedding table + chunks + BM25 index).
Where CPU goes on the ripvec engine (linux/92K corpus, sampled).
| Component | % of CPU-time |
|---|---|
| rayon worker synchronization (intrinsic par_iter joins) | ~38% |
| tokenizer Unicode normalization (upstream tokenizers crate) | ~10% |
| file I/O (read + open syscalls) | ~5% |
| pool_ids (SIMD f32x8, our kernel) | ~2% |
| tree-sitter parse | ~3% |
| BM25 build + interner | ~3% |
| useful work | ~36% |
The 38% sync floor is structural: rayon's par_iter join semantics require parking workers between stages. We've shipped what's worth shipping past that floor (mimalloc, hand-vectorized pool_ids, bounded-queue streaming pipeline, lasso term interning). Further compression would require restructuring around an async stage scheduler.
How it compares
| Tool | Type | Key difference from ripvec |
|---|---|---|
| ripgrep | Text search | No semantic understanding |
| Sourcegraph | Cloud AI platform | $49-59/user/month, code leaves your machine |
| grepai | Local semantic search | Requires Ollama for embeddings |
| mgrep | Semantic search | Uses cloud embeddings (Mixedbread AI) |
| Serena | MCP symbol navigation | Requires per-language LSP servers installed |
| Bloop | Was semantic + navigation | Archived Jan 2025 |
| VS Code anycode | Tree-sitter outlines | Editor-only, no cross-file search |
| Cursor @Codebase | IDE semantic search | Cursor-only, sends embeddings to cloud |
ripvec is self-contained (no Ollama, no cloud, no per-language setup), runs locally, and combines search + LSP + structural ranking in one binary. The cacheless default fits sub-MCP / fan-out / fresh-worktree workflows where a persistent index isn't viable.
Install
Pre-built binaries (fastest)
Requires cargo-binstall. Downloads a pre-built binary for your platform; no compilation.
From source
Claude Code plugin
The plugin auto-downloads the binary for your platform on first use and configures both MCP and LSP servers. It includes 3 skills (codebase orientation, semantic discovery, change impact analysis), 3 commands (/map, /find, /repo-index), and a code exploration agent.
Platforms
| Platform | Backend |
|---|---|
| macOS Apple Silicon | CPU (Accelerate) |
| Linux x86_64 | CPU (OpenBLAS) |
| Linux ARM64 (Graviton) | CPU (OpenBLAS) |
Model weights download automatically on first run: ~33 MB (potion-base-32M). The cross-encoder reranker (ms-marco-TinyBERT-L-2-v2, ~5 MB) downloads on first prose-class query.
Usage
CLI
MCP server
Tools (7 retrieval + 9 LSP + 2 diagnostics):
| Category | Tools |
|---|---|
| Retrieval | search (with scope / include_extensions / exclude_extensions), find_similar, find_duplicates, get_repo_map, reindex, index_status, up_to_date |
| LSP | lsp_document_symbols, lsp_workspace_symbols, lsp_hover, lsp_goto_definition, lsp_goto_implementation, lsp_references, lsp_prepare_call_hierarchy, lsp_incoming_calls, lsp_outgoing_calls |
| Diagnostics | debug_log, log_level |
A single search tool covers code and prose. The agent picks scope (code / docs / all); the corpus-aware rerank gate decides whether the cross-encoder fires on a given query. index_status reports engine: "ripvec" and cache_location: "in-memory".
LSP server
Same binary, --lsp flag selects protocol.
Supported languages
19 tree-sitter grammars, 30 file extensions:
| Language | Extensions | Extracted elements |
|---|---|---|
| Rust | .rs | functions, structs, enums, variants, fields, impls, traits, consts, mods |
| Python | .py | functions, classes, assignments |
| JavaScript | .js .jsx | functions, classes, methods, variables |
| TypeScript | .ts .tsx | functions, classes, interfaces, type aliases, enums |
| Go | .go | functions, methods, types, constants |
| Java | .java | methods, classes, interfaces, enums, fields, constructors |
| C | .c .h | functions, structs, enums, typedefs |
| C++ | .cpp .cc .cxx .hpp | functions, classes, namespaces, enums, fields |
| Bash | .sh .bash .bats | functions, variables |
| Ruby | .rb | methods, classes, modules, constants |
| HCL / Terraform | .tf .tfvars .hcl | blocks (resources, data, variables) |
| Kotlin | .kt .kts | functions, classes, objects, properties |
| Swift | .swift | functions, classes, protocols, properties |
| Scala | .scala | functions, classes, traits, objects, vals, types |
| TOML | .toml | tables, key-value pairs |
| JSON | .json | object keys |
| YAML | .yaml .yml | mapping keys |
| Markdown | .md | headings |
Unsupported file types get sliding-window plain-text chunking. The embedding model handles any language; tree-sitter just provides better chunk boundaries.
Acknowledgments
ripvec's static bi-encoder uses Model2Vec embeddings (potion-base-32M, potion-code-16M) from MinishLab, whose semble pipeline inspired the path-enriched BM25 and query-shape boosting design we ported to Rust and extended. Cross-encoder rerank uses ms-marco-TinyBERT-L-2-v2. See CREDITS.md for the full ledger of what we used, what we ported, and what we built on top.
Limitations
- goToDefinition is best-effort: resolves by name matching and structural importance, not by type system analysis. Use dedicated LSPs (rust-analyzer, pyright, gopls) when you need exact resolution for overloaded symbols.
- Call graph is approximate: common names like new, run, render may resolve to the wrong definition. Cross-crate resolution limited to workspace members.
- Static encoder top-10 coherence on long-form prose: the Model2Vec bi-encoder (256-dim, no cross-token attention) can lose coherence across positions 4-10 on narrative corpora. The cross-encoder rerank gate fires on prose-class queries and substantially recovers top-K quality (NDCG@10 = 1.000 on the Gutenberg benchmark), but on very long narrative archives the bi-encoder ranking pre-rerank sets the ceiling.
- Cold start scales linearly: first-query indexing is O(files). At 92K files (Linux kernel) it is ~75s. The index is discarded on process exit; each fresh process re-indexes.
- English-centric: the embedding model was trained primarily on English text. Queries and code comments in other languages will have lower recall.
Development
See CLAUDE.md for detailed development conventions, architecture notes, and MCP tool namespace resolution.
Architecture
Cargo workspace with three crates:
| Crate | Role |
|---|---|
| ripvec-core | Static encoder engine, CPU rerank backend, chunking, embedding, search, repo map, call graph, ranking layers |
| ripvec | CLI binary (clap + ratatui TUI) |
| ripvec-mcp | MCP + LSP server binary (rmcp + tower-lsp-server) |
Docs
- CREDITS.md: full attribution for models, libraries, and design inspiration
- Development Learnings
- Metal/MPS Architecture (archived)
- CUDA Architecture (archived)
License
Licensed under either of Apache-2.0 or MIT at your option.