ripvec-core 3.0.2

Semantic code + document search engine. Cacheless static-embedding + cross-encoder rerank by default; optional ModernBERT/BGE transformer engines with GPU backends. Tree-sitter chunking, hybrid BM25 + PageRank, composable ranking layers.
# ripvec

[![CI](https://github.com/fnordpig/ripvec/actions/workflows/ci.yml/badge.svg)](https://github.com/fnordpig/ripvec/actions/workflows/ci.yml)
[![crates.io](https://img.shields.io/crates/v/ripvec.svg)](https://crates.io/crates/ripvec)
[![docs.rs](https://docs.rs/ripvec-core/badge.svg)](https://docs.rs/ripvec-core)
[![downloads](https://img.shields.io/crates/d/ripvec.svg)](https://crates.io/crates/ripvec)
[![plugin](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fraw.githubusercontent.com%2Ffnordpig%2Fripvec%2Fmain%2Fplugins%2F.claude-plugin%2Fmarketplace.json&query=%24.metadata.version&label=plugin&color=blue)](plugins/)
[![License: MIT/Apache-2.0](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg)](LICENSE-MIT)

**Cacheless semantic code + document search. One binary, 19 grammars, one static-encoder engine, zero setup.**

ripvec finds code and documents by meaning, provides structural code intelligence across every language it knows, and ranks results by how important each file is in your project. It runs CPU-only, holds no on-disk index, and matches or exceeds transformer baselines on our benchmark matrix across code and prose.

```sh
$ ripvec "retry logic with exponential backoff" ~/src/my-project

 1. retry_handler.rs:42-78                                        [0.91]
    pub async fn with_retry<F, T>(f: F, max_attempts: u32) -> Result<T>
    where F: Fn() -> Future<Output = Result<T>> {
        let mut delay = Duration::from_millis(100);
        for attempt in 0..max_attempts {
            match f().await {
                Ok(v) => return Ok(v),
                Err(e) if attempt < max_attempts - 1 => {
                    sleep(delay).await;
                    delay *= 2;  // exponential backoff
    ...

 2. http_client.rs:156-189                                        [0.84]
    impl HttpClient {
        async fn request_with_backoff(&self, req: Request) -> Response {
    ...
```

The function is called `with_retry`, the variable is `delay`. "exponential backoff" appears nowhere in the source. grep can't find this. ripvec can, because it embeds both your query and the code into the same vector space, fuses semantic scores with path-enriched BM25, layers a structural-importance signal from a PageRank percentile boost, and reranks the top candidates through a cross-encoder.

## When to use what

ripvec has three interfaces. Here's when each one matters:

| Interface | When to use it | Who uses it |
|-----------|---------------|-------------|
| **CLI** (`ripvec "query" .`) | Terminal search, one-shot queries | You, directly |
| **MCP server** (`ripvec-mcp`) | AI agent needs to search or understand your codebase | Claude Code, Cursor, any MCP client |
| **LSP server** (`ripvec-mcp --lsp`) | Editor/agent needs symbols, definitions, diagnostics | Claude Code's LSP tool, editors |

The MCP server gives AI agents 7 retrieval tools, 9 LSP tools, and 2 diagnostics tools. The LSP server gives editors structural intelligence (outlines, go-to-definition, syntax diagnostics) for all 19 languages from one binary. The CLI is for humans. Same binary for all three.

If you're using **Claude Code**, install the plugin. It sets up both MCP and LSP automatically; Claude will use the `search` tool when you ask conceptual questions and the LSP for symbol navigation.

## Engine

ripvec uses a single retrieval engine across the CLI, MCP server, and LSP server:

**Model2Vec static bi-encoder** (`minishlab/potion-base-32M`, 256-dim) + path-enriched BM25 + function-level PageRank percentile boost + TinyBERT-L-2 cross-encoder rerank (gated by corpus class).

The engine is in-memory per session -- no on-disk index or persistent cache. Sub-MCPs, fresh worktrees, agent fan-out, and document archives all work without setup.

## Quality and speed

Two reproducible benchmarks anchor ripvec's behavior, both run from a fresh checkout via `cargo run --release --example corpus_bench`. The corpora and query / target-file annotations are checked in under `tests/corpus/`.

### Code corpus

Workload: `tests/corpus/code`, ~2 GB across nine codebases (tokio, redis, react, spring-boot, go, linux, ripgrep, flask, express). Query set: 20 architectural and semantic queries against tokio with file-level ground truth (`tests/corpus/annotations/tokio.json`). Scoring: NDCG@10, recall@10, precision@10 with suffix-path matching.

| metric | value |
|---|---:|
| chunks indexed | 1,075,655 |
| index build | 65 s |
| PageRank graph build | 45 s |
| query p50 | **42 ms** |
| query p90 | 168 ms |
| query p99 | 241 ms |
| NDCG@10 | 0.665 |
| recall@10 | 0.767 |
| precision@10 | 0.120 |
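
For readers unfamiliar with the three metrics, here is a minimal sketch of how they can be computed for one query. It assumes binary relevance (a result either matches a ground-truth file or it does not) and the common `1 / log2(rank + 1)` discount; the function names are illustrative and the checked-in harness may differ in detail, for example in how suffix-path matching produces the `hits` flags.

```rust
/// Binary-relevance NDCG@k. `hits[i]` is true when the result at rank i + 1
/// matches a ground-truth file; `num_relevant` is the ground-truth set size.
fn ndcg_at_k(hits: &[bool], num_relevant: usize, k: usize) -> f64 {
    // DCG: a hit at 0-based position i contributes 1 / log2(i + 2).
    let dcg: f64 = hits
        .iter()
        .take(k)
        .enumerate()
        .filter(|(_, hit)| **hit)
        .map(|(i, _)| 1.0 / ((i + 2) as f64).log2())
        .sum();
    // Ideal DCG: every relevant item packed at the top of the list.
    let ideal: f64 = (0..num_relevant.min(k))
        .map(|i| 1.0 / ((i + 2) as f64).log2())
        .sum();
    if ideal == 0.0 { 0.0 } else { dcg / ideal }
}

/// Fraction of the ground-truth files that appear in the top k.
fn recall_at_k(hits: &[bool], num_relevant: usize, k: usize) -> f64 {
    if num_relevant == 0 {
        return 0.0;
    }
    hits.iter().take(k).filter(|&&h| h).count() as f64 / num_relevant as f64
}

/// Fraction of the top k results that are relevant.
fn precision_at_k(hits: &[bool], k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    hits.iter().take(k).filter(|&&h| h).count() as f64 / k as f64
}
```

With a single relevant file returned at rank 1, this yields NDCG@10 = 1.0, recall@10 = 1.0, and precision@10 = 0.1, which is why precision@10 is bounded by the number of relevant files per query.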

### Prose corpus

Workload: `tests/corpus/gutenberg`, 10 Project Gutenberg books (~2 MB plain text). Query set: 15 natural-language queries each mapping to a single relevant book (`tests/corpus/annotations/gutenberg.json`). Same scoring.

| metric | value |
|---|---:|
| chunks indexed | 1,652 |
| index build | 120 ms |
| query p50 | **34 ms** |
| query p90 | 36 ms |
| query p99 | 36 ms |
| NDCG@10 | 1.000 |
| recall@10 | 1.000 |
| precision@10 | 0.100 (one relevant book per query, top-10) |

Every query returns the correct book at rank 1.

### Comparison vs semble

[semble](https://github.com/MinishLab/semble) is the closest published baseline for this stack: static-embedding bi-encoder, path-enriched BM25, ranking layer. ripvec runs semble's full published benchmark (63 repos, 19 languages, 1,251 queries) end-to-end. Full per-language tables, methodology, and raw JSON outputs live in [`docs/benchmarks/full_corpus.md`](docs/benchmarks/full_corpus.md).

Macro-averaged across languages:

| pipeline | NDCG@10 | q-p50 | q-p99 | index |
|---|---:|---:|---:|---:|
| semble (potion-code-16M) | 0.852 | 2.22 ms | 11.35 ms | 1347 ms |
| **ripvec matched** (same model, no PageRank, no rerank) | 0.845 | **0.33 ms** | **4.20 ms** | **110 ms** |
| ripvec default (potion-base-32M + PageRank + auto-rerank) | 0.803 | 0.35 ms | 4.31 ms | 109 ms |

Matched-mode quality sits within 0.007 NDCG@10 of semble while running 6.7× faster at p50, 2.7× at p99, and 12.2× faster on index build. The matched cell answers "is the port faithful": same model, same algorithm shape, deltas attribute to the implementation. The default cell answers "what does a user get out of the box": ripvec's shipped configuration trades 0.049 NDCG@10 on this code-heavy corpus for the headroom the 32M model gives on prose (NDCG@10 = 1.000 on the Gutenberg benchmark above) and for the PageRank prior that helps architectural queries on import-graph-heavy codebases.

The pipelines differ on three axes:

- **Embedding model.** semble defaults to `potion-code-16M` (code-tuned). ripvec defaults to `potion-base-32M` (general). 16M leads 32M on this code corpus by 0.042 NDCG@10; 32M leads 16M on the prose benchmark by 0.058. The bench harness accepts `--model REPO` to swap the bi-encoder.
- **Reranker.** semble has no cross-encoder. ripvec applies `ms-marco-TinyBERT-L-2-v2` on Docs and Mixed corpora when the query is natural-language; pure Code corpora skip it (the gate fires zero times across this 63-repo run, by design).
- **Structural prior.** ripvec computes function-level PageRank over the import / call graph and applies a percentile-based boost. semble has no equivalent.

### Reproducing

```sh
# Single-corpus end-to-end harness (code, ~25 min).
cargo run --release --example corpus_bench -- \
  tests/corpus/code tests/corpus/annotations/tokio.json --scope code

# Single-corpus end-to-end harness (prose, ~30 s).
cargo run --release --example corpus_bench -- \
  tests/corpus/gutenberg tests/corpus/annotations/gutenberg.json --scope docs

# Full semble corpus replay (63 repos, ~25 min after one-time clone).
cd ~/src/semble && uv run python -m benchmarks.sync_repos    # ~10 GB
cargo run --release --example semble_full_bench --features cpu-accelerate -- \
  --mode matched --out docs/benchmarks/results/ripvec_matched.json
cargo run --release --example semble_full_bench --features cpu-accelerate -- \
  --mode default --out docs/benchmarks/results/ripvec_default.json
```

Bench flags:

| flag | default | purpose |
|---|---|---|
| `--candidates N` | 50 | cap on candidates the reranker sees |
| `--rerank-model REPO` | `cross-encoder/ms-marco-TinyBERT-L-2-v2` | swap cross-encoder |
| `--model REPO` | `minishlab/potion-base-32M` | swap bi-encoder |
| `--scope {code,docs,all}` | (from arg) | corpus filter intent |
| `--repeats N` | 5 | timing reps per query |
| `--no-rerank` / `--rerank` | auto | force the gate one way |

For matched-model semble parity, `cargo run --release --example semble_bench -- <repo> <annotations.json>` mirrors the harness in `~/src/semble/benchmarks/run_benchmark.py`.

## Workflow: orient, search, navigate

```mermaid
graph LR
    A["🗺️ Orient<br/>get_repo_map"] --> B["🔍 Search<br/>search(scope)"]
    B --> C["🧭 Navigate<br/>LSP operations"]
    C -->|"need more context"| B
    C -->|"found it"| D["✏️ Edit"]
```

**Orient.** `get_repo_map` returns a structural overview ranked by function-level importance. One tool call replaces 10+ sequential file reads. Start here when working on unfamiliar code.

**Search.** `search(query="authentication middleware", scope="code")` finds implementations by meaning across all 19 languages simultaneously. Pass `scope="docs"` for documentation-only retrieval (with cross-encoder rerank), or `scope="all"` (the default) to search everything and let the corpus class decide whether rerank fires. Results are ranked by relevance and structural importance.

**Navigate.** LSP `documentSymbol` shows the file outline. `goToDefinition` jumps to the likely definition. `findReferences` shows usage sites. `incomingCalls` / `outgoingCalls` trace the call graph.

## Semantic search

You describe behavior, ripvec finds the implementation:

| What you want | grep / ripgrep | ripvec |
|---------------|----------------|--------|
| "retry with backoff" | Nothing (code says `delay *= 2`) | Finds the retry handler |
| "database connection pool" | Comments mentioning "pool" | The pool implementation |
| "authentication middleware" | `// TODO: add auth` | The auth guard |
| "WebSocket lifecycle" | String "WebSocket" | Connect/disconnect handlers |

Search modes: `--mode hybrid` (default, semantic + BM25 fusion), `--mode semantic` (pure vector similarity), `--mode keyword` (pure BM25). Hybrid is usually best.

### Scope: code, docs, or all

Documents about a topic (READMEs, design specs, RFCs, code comments) literally *use* the topic's words. Code that implements the topic usually doesn't. Semantic similarity therefore systematically ranks docs above implementations on descriptive queries, and the right answer depends on what the agent is looking for.

`scope` lets the caller declare intent:

| Scope | Includes | Rerank | When to pick |
|---|---|---|---|
| `code` | code-language extensions (`.py`, `.rs`, `.ts`, `.go`, …) | off | "Find the implementation of X." |
| `docs` | prose extensions (`.md`, `.rst`, `.txt`, `.adoc`, `.org`, `.mdx`) | on (NL queries) | "Find documentation about X / how X is described." |
| `all` (default) | everything | corpus-aware | "Search everything; let the gate decide whether rerank fires." |

`include_extensions` and `exclude_extensions` give surgical control on top of scope (e.g. `scope=all`, `exclude_extensions=["min.js"]`). Same flags on CLI: `--scope`, `--include-ext`, `--exclude-ext`.

The MCP `search` tool exposes these as JSON params; the CLI exposes them as flags.

## Multi-language LSP

ripvec serves LSP from a single binary for all 19 grammars. No per-language server installs. It provides:

- **`documentSymbol`**: file outline (functions, fields, enum variants, constants, types, headings)
- **`workspaceSymbol`**: cross-language symbol search with PageRank boost
- **`goToDefinition`**: name-based resolution ranked by structural importance
- **`findReferences`**: usage sites via hybrid search + content filtering
- **`hover`**: scope chain, signature, enriched context
- **`publishDiagnostics`**: tree-sitter syntax error detection after every edit
- **`incomingCalls` / `outgoingCalls`**: function-level call graph

For languages with dedicated LSPs (Rust, Python, Go, TypeScript), ripvec runs alongside them: the dedicated server handles types, and ripvec handles semantic search and cross-language features. For languages without dedicated LSPs (bash, HCL, Ruby, Kotlin, Swift, Scala), ripvec is the primary code intelligence.

JSON, YAML, TOML, and Markdown get structural outlines (keys, mappings, headings) and syntax diagnostics. This is useful for navigating large config files, but it is not comparable to language-aware intelligence.

## Architecture: the ripvec engine

The default engine is a four-stage composite pipeline. Each stage uses a fast cheap-to-rebuild signal; together they outperform a single transformer on retrieval quality.

```mermaid
graph TB
    Q["Query"] --> EMB["Bi-encoder embed<br/>(Model2Vec potion-base-32M, 256-dim)"]
    Q --> BM["BM25 score<br/>(path-enriched, postings-list inverted)"]
    EMB --> SEM["Cosine similarity<br/>parallel sgemv across rayon row-shards<br/>top-N candidates"]
    BM --> LEX["Lexical ranking<br/>par_iter over query terms<br/>top-N candidates"]
    SEM --> RRF["Reciprocal Rank Fusion<br/>(k=60)"]
    LEX --> RRF
    RRF --> PR["× PageRank percentile boost<br/>(sigmoid curve, α=0.5)"]
    PR --> GATE{"Corpus class<br/>(≥30% prose chunks?)"}
    GATE -->|"Docs / Mixed"| RR["Cross-encoder rerank<br/>(ms-marco-TinyBERT-L-2-v2)<br/>top-50 candidates"]
    GATE -->|"Code"| OUT["Top-k results"]
    RR --> OUT
```

**Static bi-encoder retrieval (Model2Vec).** The bi-encoder is a lookup-and-mean-pool over a pretrained 256-dim embedding table (`minishlab/potion-base-32M`). No transformer forward pass; encoding cost is dominated by memory bandwidth, not FLOPs. About 5 ms per query on a single CPU thread; ~250K chunks per second when indexing in parallel.
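
To make "lookup-and-mean-pool" concrete, here is a minimal sketch of the idea. It assumes the embedding table is a row-major `[vocab × dim]` slice, that token IDs have already been produced by a tokenizer, and that the pooled vector is L2-normalized so cosine similarity reduces to a dot product. The function names are illustrative, not ripvec's API.

```rust
/// Mean-pool the embedding rows selected by `token_ids`, then L2-normalize.
/// `table` is a row-major [vocab x dim] matrix (an assumed layout).
fn embed(table: &[f32], dim: usize, token_ids: &[usize]) -> Vec<f32> {
    let mut out = vec![0.0f32; dim];
    if token_ids.is_empty() {
        return out;
    }
    // Lookup: sum one table row per token. No matrix multiply, no attention.
    for &id in token_ids {
        let row = &table[id * dim..(id + 1) * dim];
        for (acc, &v) in out.iter_mut().zip(row) {
            *acc += v;
        }
    }
    // Mean-pool.
    let n = token_ids.len() as f32;
    out.iter_mut().for_each(|v| *v /= n);
    // Normalize so that cosine similarity is a plain dot product.
    let norm = out.iter().map(|v| v * v).sum::<f32>().sqrt();
    if norm > 0.0 {
        out.iter_mut().for_each(|v| *v /= norm);
    }
    out
}

/// Cosine similarity of two unit-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Because the per-token work is a memory read and an add, the cost scales with the number of tokens and the row width, which is consistent with the bandwidth-bound behavior described above.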

**Path-enriched BM25.** Lexical scoring with a code-aware tokenizer that splits `parseJsonConfig` into `[parse, json, config]` and `my_func_name` into `[my, func, name]`. Chunk text is enriched with the file stem (doubled) and the last three directory components before tokenization, so a query like "session encoding" hits both content and `sessions.py` paths.
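
The two ideas in this stage can be sketched as follows. This is an illustration under assumptions, not ripvec's tokenizer: it splits on non-alphanumeric characters and on lower-to-upper case transitions, and it prepends the path terms to the chunk text (the actual placement and separator are not specified above). Both function names are hypothetical.

```rust
use std::path::Path;

/// Split text into lowercase terms on separators and camelCase boundaries.
fn code_tokens(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut prev_lower = false;
    for ch in text.chars() {
        if !ch.is_alphanumeric() {
            // Underscores, dots, whitespace, and punctuation end the current term.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            prev_lower = false;
            continue;
        }
        // An uppercase letter after a lowercase letter or digit starts a new term.
        if ch.is_uppercase() && prev_lower && !current.is_empty() {
            tokens.push(std::mem::take(&mut current));
        }
        current.extend(ch.to_lowercase());
        prev_lower = ch.is_lowercase() || ch.is_numeric();
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Prefix chunk text with the file stem (twice) and the last three directories.
fn enrich_with_path(path: &str, chunk_text: &str) -> String {
    let p = Path::new(path);
    let stem = p.file_stem().and_then(|s| s.to_str()).unwrap_or("");
    let dirs = p
        .parent()
        .map(|d| {
            d.components()
                .filter_map(|c| c.as_os_str().to_str())
                .collect::<Vec<&str>>()
        })
        .unwrap_or_default();
    let tail = &dirs[dirs.len().saturating_sub(3)..];
    format!("{stem} {stem} {} {chunk_text}", tail.join(" "))
}
```

Running `code_tokens` over the output of `enrich_with_path("src/auth/sessions.py", …)` puts `sessions` into the term list twice, so a query containing "session"-like terms can match on the path even when the chunk body never uses the word.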

**Reciprocal Rank Fusion.** Combines the semantic and lexical rankings via Cormack et al.'s rank-based fusion (k=60). Handles the scale mismatch between cosine similarity and BM25 without tuning.
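
As a minimal sketch of the fusion step: each ranked list contributes `1 / (k + rank)` to a candidate's score, using 1-based ranks, and candidates are sorted by the summed score. The chunk IDs as `u32` and the function name are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.
/// Each list is ordered best-first; the result is ordered best-first too.
fn rrf_fuse(rankings: &[Vec<u32>], k: f64) -> Vec<(u32, f64)> {
    let mut scores: HashMap<u32, f64> = HashMap::new();
    for ranking in rankings {
        for (pos, &id) in ranking.iter().enumerate() {
            // Only the rank matters, so cosine and BM25 scales never meet.
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (pos + 1) as f64);
        }
    }
    let mut fused: Vec<(u32, f64)> = scores.into_iter().collect();
    // Highest fused score first; break ties by ID for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

For example, fusing a semantic ranking `[1, 2, 3]` with a lexical ranking `[3, 1, 4]` at `k = 60` puts chunk 1 first (ranks 1 and 2), then chunk 3 (ranks 3 and 1), then the chunks that appear in only one list.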

**PageRank percentile boost.** A structural-importance signal on top of relevance. See the next section.

**Cross-encoder rerank (prose-class corpora).** When the index's corpus class is `Docs` or `Mixed` (at least 30% of indexed chunks are prose-extension files) and the query is natural-language, the top 50 candidates are re-scored by `ms-marco-TinyBERT-L-2-v2`: a 2-layer cross-encoder distilled from BERT-base, ~5 MB on disk, ~0.3 ms per pair on CPU. The model was selected after a sweep against the larger `ms-marco-MiniLM-L-12-v2` (33 MB, 12 layers): TinyBERT-L-2 holds NDCG@10 = 1.000 on the Gutenberg benchmark at 20× the throughput.

Wiring details: the BERT pooler (`tanh(W_pool · cls)`) runs between the trunk and the classifier head (matching the head the model was trained against). Raw classifier logits flow out (sentence-transformers `Identity` activation), and the ranking layer min-max normalizes both cross-encoder and bi-encoder score arrays within the candidate set before convex-combining (`0.7 × cross + 0.3 × bi`). Tokenizer truncation is `LongestFirst` at `max_position_embeddings`, preserving `[CLS]` / `[SEP]` on long inputs.
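
The normalize-then-combine step can be sketched as below. The 0.7 / 0.3 weights come from the text above; mapping a constant score array to all zeros is an assumption of this sketch, and the function names are illustrative.

```rust
/// Min-max normalize scores into [0, 1] within one candidate set.
fn min_max(scores: &[f32]) -> Vec<f32> {
    let lo = scores.iter().copied().fold(f32::INFINITY, f32::min);
    let hi = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let span = hi - lo;
    scores
        .iter()
        // Assumption: when every score is equal, map them all to 0.0.
        .map(|&s| if span > 0.0 { (s - lo) / span } else { 0.0 })
        .collect()
}

/// Convex combination of normalized cross-encoder logits and bi-encoder scores.
fn blend(cross: &[f32], bi: &[f32]) -> Vec<f32> {
    assert_eq!(cross.len(), bi.len(), "one score per candidate in each array");
    let (c, b) = (min_max(cross), min_max(bi));
    c.iter().zip(&b).map(|(c, b)| 0.7 * c + 0.3 * b).collect()
}
```

Normalizing inside the candidate set matters because raw classifier logits are unbounded while cosine scores are not; without it the fixed weights would not mean the same thing from one query to the next.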

Code-class corpora skip the reranker. The cross-encoder is trained on web-prose passage retrieval and adds latency without lifting NDCG on code: on the 8-Python-library benchmark, rerank-on costs roughly 0.09 NDCG@10 vs rerank-off regardless of which cross-encoder model is plugged in.
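
Put together, the gate's decision can be sketched as a small predicate. The 30% threshold and the force flags come from this document; how a query is classified as natural-language is not described here, so the sketch takes that as an input, and all names are hypothetical.

```rust
/// True when at least 30% of indexed chunks come from prose-extension files,
/// i.e. the corpus would be classed as `Docs` or `Mixed` rather than `Code`.
fn is_prose_class(prose_chunks: usize, total_chunks: usize) -> bool {
    // Integer form of prose / total >= 0.30, avoiding float rounding.
    total_chunks > 0 && prose_chunks * 10 >= total_chunks * 3
}

/// Decide whether the cross-encoder should run for this query.
/// `forced` models an explicit override such as `--rerank` / `--no-rerank`.
fn should_rerank(
    prose_chunks: usize,
    total_chunks: usize,
    query_is_natural_language: bool,
    forced: Option<bool>,
) -> bool {
    match forced {
        Some(choice) => choice,
        None => is_prose_class(prose_chunks, total_chunks) && query_is_natural_language,
    }
}
```

On a pure code corpus the prose fraction stays below the threshold, so the predicate is false for every query unless the caller forces it.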

### Function-level PageRank

```mermaid
graph LR
    subgraph "Call Graph"
        A["main()"] --> B["handle_request()"]
        A --> C["init_db()"]
        B --> D["authenticate()"]
        B --> E["dispatch()"]
        D --> F["verify_token()"]
        E --> D
    end
    subgraph "PageRank"
        D2["authenticate() ★★★"]
        B2["handle_request() ★★"]
        E2["dispatch() ★"]
    end
```

ripvec extracts call expressions from every function body using tree-sitter, resolves callee names to definitions, and computes PageRank on the resulting call graph. Functions called by many others rank higher. `authenticate()` in the example above is more structurally important than `dispatch()` because more code depends on it.
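
For intuition, here is a textbook power-iteration PageRank run on the call graph from the diagram. It is a sketch, not ripvec's implementation: the damping factor, the iteration count, and the uniform redistribution of rank from functions that call nothing are all assumptions.

```rust
/// Call edges (caller, callee) from the diagram above. Node indices:
/// 0 main, 1 handle_request, 2 init_db, 3 authenticate, 4 dispatch, 5 verify_token.
const EXAMPLE_EDGES: [(usize, usize); 6] = [(0, 1), (0, 2), (1, 3), (1, 4), (3, 5), (4, 3)];

/// Power-iteration PageRank over a directed graph with `n` nodes.
fn pagerank(n: usize, edges: &[(usize, usize)], damping: f64, iters: usize) -> Vec<f64> {
    let mut out_degree = vec![0usize; n];
    for &(from, _) in edges {
        out_degree[from] += 1;
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iters {
        // Rank held by nodes with no outgoing calls is spread uniformly.
        let dangling: f64 = (0..n)
            .filter(|&i| out_degree[i] == 0)
            .map(|i| rank[i])
            .sum();
        let base = (1.0 - damping) / n as f64 + damping * dangling / n as f64;
        let mut next = vec![base; n];
        // Each caller splits its rank evenly among its callees.
        for &(from, to) in edges {
            next[to] += damping * rank[from] / out_degree[from] as f64;
        }
        rank = next;
    }
    rank
}
```

Calling `pagerank(6, &EXAMPLE_EDGES, 0.85, 50)` gives `authenticate` (index 3) a higher score than `dispatch` (index 4), because it receives rank from both `handle_request` and `dispatch`.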

The bi-encoder is structurally weaker than a transformer. Model2Vec doesn't model cross-token interactions and can't reliably distinguish a 1500-char canonical implementation from a 3-line example stub by dense similarity alone. Without a corrective signal, the engine ranks `tests/hello_world.py` competitively with `src/auth/handler.py` on a query like "register a route." PageRank carries the missing signal: implementations are imported by tests and callers; stubs are imported by nothing.

ripvec applies the structural prior as a **sigmoid-on-percentile boost**: `boost(p) = 1 + α × sigmoid((p − 0.5) / s)` where `p` is the file's PR percentile within the corpus, `α=0.5` is the ceiling lift, and `s=0.15` controls steepness.

| PR percentile | Example file | Boost (α=0.5) |
|---|---|---:|
| 0 (not in graph) | isolated leaf file | 1.00× (no boost) |
| 0.10 (bottom decile) | rarely-imported impl | 1.04× |
| 0.25 (lower quartile) | hub of one small module | 1.08× |
| **0.50 (median)** | typical impl file | **1.25×** |
| 0.75 (upper quartile) | heavily-imported module | 1.42× |
| 0.95 (near top) | central trait / API surface | 1.48× |
| 1.00 (graph root) | e.g. `tokio/src/lib.rs` | ~1.49× (asymptote 1.5×) |

Two design constraints fall out of this curve:

1. **At-or-above-median PR gets a meaningfully different boost from low-PR.** A median-importance impl with cosine 0.84 ends at 0.84 × 1.25 = 1.05; a near-zero-PR test with cosine 0.85 ends at 0.85 × 1.02 = 0.867. The impl flips above the test by ~21%, enough to reorder reliably when the bi-encoder is uncertain.
2. **The ceiling caps centers-of-universe.** A graph-root file at p=1.0 gets at most 1.5×. It can't dominate when the query genuinely matches a less-central file.
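
The boost formula and the first constraint can be checked with a few lines of code. This sketch follows the formula as written above; representing "not in graph" as `None` (no boost) and a near-zero percentile as `Some(0.0)` is an assumption about how the two cases are distinguished.

```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// boost(p) = 1 + alpha * sigmoid((p - 0.5) / s).
/// `None` stands for a file that is absent from the graph and gets no boost.
fn pagerank_boost(percentile: Option<f64>, alpha: f64, steepness: f64) -> f64 {
    match percentile {
        None => 1.0,
        Some(p) => 1.0 + alpha * sigmoid((p - 0.5) / steepness),
    }
}

/// Final score: relevance multiplied by the structural boost.
fn boosted_score(relevance: f64, percentile: Option<f64>) -> f64 {
    // Defaults from the text above: alpha = 0.5, s = 0.15.
    relevance * pagerank_boost(percentile, 0.5, 0.15)
}
```

With these defaults, `boosted_score(0.84, Some(0.5))` is 1.05, while `boosted_score(0.85, Some(0.0))` stays below 0.87, so the median-importance implementation overtakes the slightly more similar low-PR test. The boost never reaches `1 + alpha`, since the sigmoid is strictly below 1.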

The boost is applied via a composable [`RankingLayer`](crates/ripvec-core/src/ranking.rs) chain shared across CLI, MCP, and LSP code paths. Adding a new ranking signal (recency, file-saturation diversification) is a single new `impl RankingLayer`.

## Performance

**ripvec engine (the default and only engine).** Wall time for a single query, end-to-end including model load on cold start:

| Corpus | First query (cold) | Warm | Notes |
|---|---|---|---|
| Small repo (~500 files) | ~7s | 0.3s | Model download + index build dominate cold path |
| Medium repo (~5K files, e.g. Tokio) | ~12s | 0.8s | |
| Large repo (~50K files) | ~50s | 8s | Linear in file count for indexing |
| Linux kernel (~92K files, 1.7 GB) | ~75s | n/a (in-memory drops between processes) | |

The MCP daemon holds the in-memory index for the session lifetime, so warm latency dominates after the first query. For sub-MCPs and agent fan-out where each spawn starts fresh, the cold-path numbers are what to budget against.

**Memory.** ~200 MB for a typical project (embedding table + chunks + BM25 index).

**Where CPU goes on the ripvec engine (linux/92K corpus, sampled).**

| Component | % of CPU-time |
|---|---:|
| rayon worker synchronization (intrinsic par_iter joins) | ~38% |
| tokenizer Unicode normalization (upstream `tokenizers` crate) | ~10% |
| file I/O (read + open syscalls) | ~5% |
| pool_ids (SIMD f32x8, our kernel) | ~2% |
| tree-sitter parse | ~3% |
| BM25 build + interner | ~3% |
| useful work | ~36% |

The 38% sync floor is structural: rayon's `par_iter` join semantics require parking workers between stages. We've shipped what's worth shipping past that floor (mimalloc, hand-vectorized pool_ids, bounded-queue streaming pipeline, lasso term interning). Further compression would require restructuring around an async stage scheduler.

## How it compares

| Tool | Type | Key difference from ripvec |
|------|------|--------------------------|
| ripgrep | Text search | No semantic understanding |
| Sourcegraph | Cloud AI platform | $49-59/user/month, code leaves your machine |
| grepai | Local semantic search | Requires Ollama for embeddings |
| mgrep | Semantic search | Uses cloud embeddings (Mixedbread AI) |
| Serena | MCP symbol navigation | Requires per-language LSP servers installed |
| Bloop | Was semantic + navigation | Archived Jan 2025 |
| VS Code anycode | Tree-sitter outlines | Editor-only, no cross-file search |
| Cursor @Codebase | IDE semantic search | Cursor-only, sends embeddings to cloud |

ripvec is self-contained (no Ollama, no cloud, no per-language setup), runs locally, and combines search + LSP + structural ranking in one binary. The cacheless default fits sub-MCP / fan-out / fresh-worktree workflows where a persistent index isn't viable.

## Install

### Pre-built binaries (fastest)

```sh
cargo binstall ripvec ripvec-mcp
```

Requires [cargo-binstall](https://github.com/cargo-bins/cargo-binstall). Downloads a pre-built binary for your platform; no compilation.

### From source

```sh
cargo install ripvec ripvec-mcp
```

### Claude Code plugin

```sh
claude plugin install ripvec@fnordpig-my-claude-plugins
```

The plugin auto-downloads the binary for your platform on first use and configures both MCP and LSP servers. It includes 3 skills (codebase orientation, semantic discovery, change impact analysis), 3 commands (`/map`, `/find`, `/repo-index`), and a code exploration agent.

### Platforms

| Platform | Backend |
|----------|---------|
| macOS Apple Silicon | CPU (Accelerate) |
| Linux x86_64 | CPU (OpenBLAS) |
| Linux ARM64 (Graviton) | CPU (OpenBLAS) |

Model weights download automatically on first run: ~33 MB (`potion-base-32M`). The cross-encoder reranker (`ms-marco-TinyBERT-L-2-v2`, ~5 MB) downloads on first prose-class query.

## Usage

### CLI

```sh
ripvec "error handling" .                              # ripvec engine (Model2Vec + BM25 + PageRank)
ripvec "form validation hooks" -n 5                    # Top 5 results
ripvec "database migration" --mode keyword             # BM25 only
ripvec "session encoding" --exclude-extensions=jsonl,md  # Skip noisy extensions
```

### MCP server

```json
{ "mcpServers": { "ripvec": { "command": "ripvec-mcp" } } }
```

Tools (7 retrieval + 9 LSP + 2 diagnostics):

| Category | Tools |
|---|---|
| Retrieval | `search` (with `scope` / `include_extensions` / `exclude_extensions`), `find_similar`, `find_duplicates`, `get_repo_map`, `reindex`, `index_status`, `up_to_date` |
| LSP | `lsp_document_symbols`, `lsp_workspace_symbols`, `lsp_hover`, `lsp_goto_definition`, `lsp_goto_implementation`, `lsp_references`, `lsp_prepare_call_hierarchy`, `lsp_incoming_calls`, `lsp_outgoing_calls` |
| Diagnostics | `debug_log`, `log_level` |

A single `search` tool covers code and prose. The agent picks `scope` (`code` / `docs` / `all`); the corpus-aware rerank gate decides whether the cross-encoder fires on a given query. `index_status` reports `engine: "ripvec"` and `cache_location: "in-memory"`.

### LSP server

```sh
ripvec-mcp --lsp   # serves LSP over stdio
```

Same binary, `--lsp` flag selects protocol.

## Supported languages

19 tree-sitter grammars, 30 file extensions:

| Language | Extensions | Extracted elements |
|----------|-----------|-------------------|
| Rust | `.rs` | functions, structs, enums, variants, fields, impls, traits, consts, mods |
| Python | `.py` | functions, classes, assignments |
| JavaScript | `.js` `.jsx` | functions, classes, methods, variables |
| TypeScript | `.ts` `.tsx` | functions, classes, interfaces, type aliases, enums |
| Go | `.go` | functions, methods, types, constants |
| Java | `.java` | methods, classes, interfaces, enums, fields, constructors |
| C | `.c` `.h` | functions, structs, enums, typedefs |
| C++ | `.cpp` `.cc` `.cxx` `.hpp` | functions, classes, namespaces, enums, fields |
| Bash | `.sh` `.bash` `.bats` | functions, variables |
| Ruby | `.rb` | methods, classes, modules, constants |
| HCL / Terraform | `.tf` `.tfvars` `.hcl` | blocks (resources, data, variables) |
| Kotlin | `.kt` `.kts` | functions, classes, objects, properties |
| Swift | `.swift` | functions, classes, protocols, properties |
| Scala | `.scala` | functions, classes, traits, objects, vals, types |
| TOML | `.toml` | tables, key-value pairs |
| JSON | `.json` | object keys |
| YAML | `.yaml` `.yml` | mapping keys |
| Markdown | `.md` | headings |

Unsupported file types get sliding-window plain-text chunking. The embedding model handles any language; tree-sitter just provides better chunk boundaries.
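
A sliding-window fallback can be sketched as follows. The window and stride are counted in lines here and their values are left to the caller; ripvec's actual unit (lines, characters, or tokens) and sizes are not stated in this document, so treat every parameter as hypothetical.

```rust
/// Split plain text into overlapping line windows.
/// Returns (first_line, last_line, text) with 1-based inclusive line numbers.
fn sliding_windows(text: &str, window: usize, stride: usize) -> Vec<(usize, usize, String)> {
    assert!(window > 0 && stride > 0 && stride <= window, "stride must be in 1..=window");
    let lines: Vec<&str> = text.lines().collect();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < lines.len() {
        let end = (start + window).min(lines.len());
        chunks.push((start + 1, end, lines[start..end].join("\n")));
        // Stop once a window reaches the end, so no tail-only duplicate is emitted.
        if end == lines.len() {
            break;
        }
        start += stride;
    }
    chunks
}
```

A stride smaller than the window makes consecutive chunks overlap, so a passage that straddles a boundary still appears whole in at least one chunk. That is the property tree-sitter boundaries provide more precisely for supported languages.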

## Acknowledgments

ripvec's static bi-encoder uses [Model2Vec](https://github.com/MinishLab/model2vec) embeddings (`potion-base-32M`, `potion-code-16M`) from MinishLab, whose [semble](https://github.com/MinishLab/semble) pipeline inspired the path-enriched BM25 and query-shape boosting design we ported to Rust and extended. Cross-encoder rerank uses [`ms-marco-TinyBERT-L-2-v2`](https://huggingface.co/cross-encoder/ms-marco-TinyBERT-L-2-v2). See [CREDITS.md](CREDITS.md) for the full ledger of what we used, what we ported, and what we built on top.

## Limitations

- **goToDefinition is best-effort**: resolves by name matching and structural importance, not by type system analysis. Use dedicated LSPs (rust-analyzer, pyright, gopls) when you need exact resolution for overloaded symbols.
- **Call graph is approximate**: common names like `new`, `run`, `render` may resolve to the wrong definition. Cross-crate resolution limited to workspace members.
- **Static encoder top-10 coherence on long-form prose**: the Model2Vec bi-encoder (256-dim, no cross-token attention) can lose coherence across positions 4-10 on narrative corpora. The cross-encoder rerank gate fires on prose-class queries and substantially recovers top-K quality (NDCG@10 = 1.000 on the Gutenberg benchmark), but on very long narrative archives the bi-encoder ranking pre-rerank sets the ceiling.
- **Cold start scales linearly**: first-query indexing is O(files). At 92K files (Linux kernel) it is ~75s. The index is discarded on process exit; each fresh process re-indexes.
- **English-centric**: the embedding model was trained primarily on English text. Queries and code comments in other languages will have lower recall.

## Development

```sh
cargo fmt --check && cargo clippy --all-targets -- -D warnings && cargo test --workspace
```

See [CLAUDE.md](CLAUDE.md) for detailed development conventions, architecture notes, and MCP tool namespace resolution.

### Architecture

Cargo workspace with three crates:

| Crate | Role |
|-------|------|
| [`ripvec-core`](crates/ripvec-core) | Static encoder engine, CPU rerank backend, chunking, embedding, search, repo map, call graph, ranking layers |
| [`ripvec`](crates/ripvec) | CLI binary (clap + ratatui TUI) |
| [`ripvec-mcp`](crates/ripvec-mcp) | MCP + LSP server binary (rmcp + tower-lsp-server) |

### Docs

- [CREDITS.md](CREDITS.md): full attribution for models, libraries, and design inspiration
- [Development Learnings](docs/LEARNINGS.md)
- [Metal/MPS Architecture (archived)](docs/archive/METAL_MPS_ARCHITECTURE.md)
- [CUDA Architecture (archived)](docs/archive/CUDA_ARCHITECTURE.md)

## License

Licensed under either of [Apache-2.0](LICENSE-APACHE) or [MIT](LICENSE-MIT) at your option.