lexa-core 0.1.0

Local-first hybrid retrieval engine: BM25 + binary-quantized Matryoshka KNN + cross-encoder rerank, in one Rust crate. Pairs with `lexa-obsidian` for vault-aware MCP.
# Lexa — local Exa

> Hybrid retrieval over your local files and code, in a single static Rust
> binary. Lexa applies the architecture of [Exa](https://exa.ai/) — five
> latency-tiered search modes, hybrid BM25 + dense + RRF, two-stage
> Matryoshka KNN, binary-quantized vectors, query-aware highlights, deep
> reranking with optional query expansion, LLM-as-judge evaluation — to
> the corpus already on your disk.

```bash
lexa index ~/repos/myproject
lexa search "where does the rate limiter back off when redis is down"
```

```text
crates/api/src/limiter.rs:48-72   0.7141
  if !backend.is_healthy().await { tracing::warn!("redis down, switching to in-memory backoff");
   return self.fallback.acquire(key).await; }
```

## Highlights

- **Single static binary**, no daemon, no Python, no Docker. SQLite (with
  FTS5 and `sqlite-vec`) is the entire backend.
- **Sub-10 ms `fast` tier** on real Nomic-v1.5 embeddings (M-series
  warm-state, 2 000 docs, 500 iterations). Roughly 38× under the
  published Exa Fast latency budget.
- **Five search tiers** — `instant`, `dense`, `fast`, `deep`, `auto` — mirroring Exa's tiered API.
- **Two-stage Matryoshka KNN** (256-bit preview → 768-bit re-score), the
  same way Exa runs prefix-256 over their 4096-dim embeddings. A sketch
  follows this list.
- **Deep tier with query expansion** (`additional_queries`) and a
  sigmoid-blended cross-encoder reranker that fixes the override-RRF
  failure mode.
- **Query-aware highlights** — sentence-level span extraction, the same
  idea behind Exa's [contents API "highlights"](https://exa.ai/docs/reference/contents).
- **Five reproducible benchmark harnesses**, full-methodology JSON
  artifacts, CI gate. See [`docs/BENCHMARKS.md`](docs/BENCHMARKS.md).
- **MCP server** (`lexa-mcp`) over stdio so any Anthropic-MCP client
  (Claude Desktop, Claude Code, Cursor, etc.) gets `search_files`,
  `index_path`, `purge_path`, and friends for free.
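
The two-stage bullet above boils down to popcount arithmetic. A minimal
in-memory sketch, assuming a 4× stage-one over-fetch; in lexa-core both
stages run inside `sqlite-vec` over the `bit[256]` table rather than over
a `Vec`:

```rust
/// Sign-quantize an f32 embedding into packed bits (1 bit per dimension).
fn quantize(v: &[f32]) -> Vec<u64> {
    let mut bits = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            bits[i / 64] |= 1u64 << (i % 64);
        }
    }
    bits
}

/// Hamming distance between two packed bit vectors.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Stage 1: rank everything on the cheap 256-dim Matryoshka prefix.
/// Stage 2: re-score only the survivors on the full 768-bit code.
fn two_stage_knn(query: &[f32], corpus: &[Vec<f32>], k: usize) -> Vec<usize> {
    let q_preview = quantize(&query[..256]);
    let q_full = quantize(query);
    let mut prelim: Vec<(usize, u32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, d)| (i, hamming(&q_preview, &quantize(&d[..256]))))
        .collect();
    prelim.sort_by_key(|&(_, dist)| dist);
    prelim.truncate(4 * k); // over-fetch factor is an assumption
    let mut rescored: Vec<(usize, u32)> = prelim
        .into_iter()
        .map(|(i, _)| (i, hamming(&q_full, &quantize(&corpus[i]))))
        .collect();
    rescored.sort_by_key(|&(_, dist)| dist);
    rescored.truncate(k);
    rescored.into_iter().map(|(i, _)| i).collect()
}
```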

## How Lexa maps to Exa

| Exa concept                      | Lexa equivalent                                                                                       |
|----------------------------------|-------------------------------------------------------------------------------------------------------|
| Instant tier (<200 ms, BM25)     | `lexa search --tier instant` — FTS5 BM25, p50 ~250 µs.                                                |
| Fast tier (~350 ms, neural)      | `lexa search --tier dense` (KNN-only) or `--tier fast` (hybrid). p50 ~9 ms.                           |
| Auto tier (~1 s, intelligent)    | `lexa search --tier auto` — query router in `classify_query`. Default tier.                           |
| Deep tier (5-60 s, agentic)      | `lexa search --tier deep` + `SearchOptions::additional_queries` for [`additionalQueries`-style](https://exa.ai/blog/exa-deep) fan-out. |
| Hybrid retrieval (BM25 + dense)  | RRF (k=60) over FTS5 BM25 and binary-quantized vector KNN, run concurrently. See [Exa: Composing a Search Engine](https://exa.ai/blog/composing-a-search-engine). |
| BM25 optimizations               | FTS5's built-in BM25 implementation; OR-of-quoted-tokens query construction with a curated stopword set. (Lexa doesn't reimplement Exa's six [posting-list compression tricks](https://exa.ai/blog/bm25-optimization) — local corpora don't justify them.) |
| Matryoshka prefix                | Nomic v1.5-Q (768d, MRL-trained at `{64, 128, 256, 512, 768}`); `vectors_bin_preview bit[256]` table for first-stage KNN. See [Exa 2.0: building a web-scale vector DB](https://exa.ai/blog/exa-api-2-0). |
| Binary quantization              | `sqlite-vec`'s `vec_quantize_binary()` and `bit[N]` columns; Hamming distance via SIMD intrinsics. 32× storage shrink. |
| Cross-encoder reranking          | `BAAI/bge-reranker-base` over top-15 fused candidates, sigmoid-blended at α = 0.7 with the RRF score. |
| Highlights / contents API        | `search.rs::highlight` — query-token-overlap-scored sentence span, ~10× LLM-token reduction vs full chunks. |
| `additionalQueries`              | `SearchOptions::additional_queries: Vec<String>`; the deep tier fans out N+1 queries, RRF-fuses them, then reranks. The bench harness includes an Ollama-backed reformulation helper. |
| LLM-as-judge eval (5-dim rubric) | `lexa-bench simpleqa` — Harness E. Scores relevance, authority, content_issues, evaluator_confidence, overall in [0, 1]. Default judge is local Ollama running `qwen3:8b`. See [Exa: Evaluating Search](https://exa.ai/blog/evals-at-exa). |
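
The hybrid-retrieval and reranking rows above compress into a few lines.
A minimal sketch, assuming the RRF side is rescaled to [0, 1] before the
α = 0.7 blend (the exact normalisation is not documented here):

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over the BM25 and KNN result lists (k = 60).
fn rrf_fuse(lists: &[Vec<String>], k: f64) -> HashMap<String, f64> {
    let mut fused: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (rank, doc) in list.iter().enumerate() {
            *fused.entry(doc.clone()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    fused
}

/// Blend a cross-encoder logit with a fused score. Squashing the logit
/// through a sigmoid bounds it in (0, 1), so an unbounded logit can no
/// longer override the RRF ranking wholesale: the failure mode named in
/// the Highlights list. `rrf_norm` is the RRF score rescaled to [0, 1];
/// whether lexa-core normalises exactly this way is an assumption.
fn blend(rrf_norm: f64, rerank_logit: f64, alpha: f64) -> f64 {
    let sig = 1.0 / (1.0 + (-rerank_logit).exp());
    alpha * sig + (1.0 - alpha) * rrf_norm
}
```

With α = 0.7 the reranker dominates the ordering, but a candidate's fused
rank keeps a 30 % floor in the final score.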

What Lexa **doesn't** clone:

- Crawl freshness — Lexa indexes static local trees, not the web.
- Websets-scale entity finding — billions of records / async enrichment
  pipelines aren't a single-binary local feature.
- Authority / domain reputation signals — those are web-graph specific.

The local-first tradeoff is what makes the latency budget viable. Exa
Fast targets <500 ms because it's reaching across a planet-scale index;
Lexa Fast hits 9 ms because everything is in SQLite next to your CPU.

## Install

```bash
cargo install --path crates/lexa-cli       # the `lexa` CLI
cargo install --path crates/lexa-mcp       # the `lexa-mcp` MCP server
```

Or run from a clone:

```bash
cargo build --workspace --release
./target/release/lexa --help
```

The first time you run a real-embedding command, fastembed downloads the
Nomic v1.5-Q ONNX (~110 MB) and the BGE-reranker-base ONNX (~280 MB) into
`./.fastembed_cache/`. Subsequent runs reuse the cache.

## CLI

```text
lexa index <path> [--db <path>]
lexa search <query> [--tier instant|dense|fast|deep|auto] [--limit N] [--json] [--db <path>]
lexa purge <path> [--db <path>]
lexa status [--db <path>]
lexa watch <path> [--db <path>]
```

Default DB is `~/.lexa/index.sqlite`. `--hash-embeddings` swaps to the
deterministic FNV-1a hash backend for tests / offline runs.

`--json` produces a stable JSON shape with `path`, `line_start`,
`line_end`, `score`, `excerpt`, and a `breakdown` object exposing the
RRF inputs, rerank score (deep only), and the routed tier (auto only).
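
For Rust consumers, the documented fields map onto serde structs like
these; the `Breakdown` keys shown are illustrative guesses beyond the
three inputs the paragraph names:

```rust
use serde::Deserialize;

/// One hit from `lexa search --json`. Top-level field names are the
/// documented ones; the `Breakdown` keys are assumptions.
#[derive(Deserialize)]
struct Hit {
    path: String,
    line_start: u32,
    line_end: u32,
    score: f64,
    excerpt: String,
    breakdown: Breakdown,
}

#[derive(Deserialize)]
struct Breakdown {
    bm25_rank: Option<u32>,      // RRF input (name assumed)
    knn_rank: Option<u32>,       // RRF input (name assumed)
    rerank_score: Option<f64>,   // deep tier only
    routed_tier: Option<String>, // auto tier only
}
```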

## MCP server

Add to your MCP client config (Claude Desktop / Claude Code / Cursor):

```json
{
  "mcpServers": {
    "lexa": {
      "command": "lexa-mcp",
      "env": { "LEXA_DB": "/Users/you/.lexa/index.sqlite" }
    }
  }
}
```

Tools:

- `search_files(query, tier?, limit?)`
- `index_path(path)`
- `list_indexed_paths()`
- `purge_path(path)`
- `status()`

stderr is the only log channel; stdout is reserved for the JSON-RPC
stream so the protocol stays clean.
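
A sketch of how a Rust stdio server keeps that discipline (using
`tracing-subscriber`, an assumption about the logging stack):

```rust
fn init_logging() {
    // Everything tracing emits goes to stderr; stdout stays a pure
    // JSON-RPC channel for the MCP client.
    tracing_subscriber::fmt()
        .with_writer(std::io::stderr)
        .init();
}
```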

## Library

```toml
[dependencies]
lexa-core = "0.1"
```

```rust
use lexa_core::{open, EmbeddingConfig, SearchOptions};

let mut db = open("/tmp/lexa.sqlite", EmbeddingConfig::default())?;
db.index_path("/path/to/repo")?;

let hits = db.search(&SearchOptions::new("hybrid retrieval implementation"))?;
for hit in hits {
    println!("{}:{}-{}  {:.4}  {}", hit.path, hit.line_start, hit.line_end, hit.score, hit.excerpt);
}
```

The default `SearchOptions` uses the `auto` tier and limit 10. Set
`tier: SearchTier::Deep` and populate `additional_queries` for Exa-style
multi-query deep search.
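
For example; struct-update syntax over `SearchOptions::new` is an
assumption about the struct's shape:

```rust
use lexa_core::{SearchOptions, SearchTier};

// Deep tier with Exa-style query fan-out: the engine runs the main
// query plus each additional one, RRF-fuses, then reranks.
let opts = SearchOptions {
    tier: SearchTier::Deep,
    additional_queries: vec![
        "rate limiter fallback behaviour".into(),
        "redis health check backoff".into(),
    ],
    ..SearchOptions::new("what happens to the rate limiter when redis is down")
};
let hits = db.search(&opts)?;
```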

## Lexa for Obsidian — 60 seconds to a working setup

Ask your Obsidian vault questions through Codex, Claude Desktop,
Cursor, Claude Code, or any MCP client. **Local-first** — your notes
never leave your machine, no API keys, no cloud round-trips.

```bash
curl -fsSL https://raw.githubusercontent.com/rishiskhare/lexa/main/scripts/install.sh | sh
lexa-obsidian setup
```

`setup` is interactive: it asks for your vault path, optionally
pre-indexes (recommended for >1 000-note vaults), writes the right MCP
config block into `~/.codex/config.toml` (and Claude Desktop / Claude
Code if you opt in), and drops an `AGENTS.md` in your vault root so
agents route note questions through Lexa **without** having to be
prompted with "Use lexa-obsidian".

Restart your MCP client and try:

```text
> what did I write about <some topic>?
> list my top 10 tags
> show me backlinks for "<some note name>"
> find notes similar to "<some note>"
```

The agent picks the right tool from the natural-language phrasing.

### What the AI gets

| When you ask                          | The AI calls       | What it returns                                                  |
|---------------------------------------|--------------------|-------------------------------------------------------------------|
| "What did I write about X?"           | `search_notes`     | Top notes ranked by hybrid (BM25 + dense + reranker) score, with title, path, line range, headline excerpt, tags, and the routed tier. |
| "Show me my note titled Y"            | `get_note`         | Frontmatter + body + outgoing/incoming wiki-links + tags. Optionally a single block by `^id`. |
| "What links to Y?"                    | `find_backlinks`   | Every linking note with the alias / header / block id used.       |
| "Find notes similar to Y"             | `get_similar`      | Semantic neighbours of the seed note (excluding itself).          |
| "What tags do I use most?"            | `list_tags`        | Top tags by usage, optional prefix filter.                         |
| "Re-index" / "drop the index"         | `index_vault` / `purge_vault` | Maintenance.                                          |

### Subcommands

```text
lexa-obsidian setup            # interactive bootstrap (most users only need this)
lexa-obsidian doctor           # diagnose every common failure mode
lexa-obsidian models prefetch  # download retrieval models (~390 MB) ahead of time
lexa-obsidian --vault <path> index
lexa-obsidian --vault <path> status
lexa-obsidian --vault <path> tags [--prefix X] [--limit N]
lexa-obsidian --vault <path> backlinks <note>
lexa-obsidian --vault <path> search <query> [--tier auto|fast|deep] [--tag X] [--folder Y] [--json]
lexa-obsidian --vault <path> watch
```

`--vault` falls back to `LEXA_OBSIDIAN_VAULT`. The DB path defaults to
`~/.lexa/obsidian-<sha-of-vault>.sqlite` so two distinct vaults never
share an index.
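
In sketch form; hashing the canonicalised vault path with SHA-256 is an
assumption about the exact scheme:

```rust
use sha2::{Digest, Sha256};
use std::path::{Path, PathBuf};

/// Derive the per-vault DB path described above. The hash function and
/// truncation length are assumptions; the point is only that distinct
/// vault paths yield distinct DB files.
fn vault_db_path(vault: &Path, home: &Path) -> PathBuf {
    let canon = vault.canonicalize().unwrap_or_else(|_| vault.to_path_buf());
    let digest = Sha256::digest(canon.to_string_lossy().as_bytes());
    let hex: String = digest.iter().take(8).map(|b| format!("{b:02x}")).collect();
    home.join(".lexa").join(format!("obsidian-{hex}.sqlite"))
}
```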

### What gets parsed

- **Frontmatter** (`title:`, `aliases:`, `tags:` + arbitrary custom
  fields preserved in `note_metadata.raw_json`). Stripped *before*
  embedding so it doesn't pollute the vector representation.
- **Wiki-links** — `[[Note]]`, `[[Note|Alias]]`, `[[Note#Header]]`,
  `[[Note^block-id]]`, `![[Embed]]` (see the sketch after this list).
  Stored in `note_links`; backlinks are a single SQL JOIN.
- **Tags** — frontmatter `tags:` (string, list, or comma-string) plus
  inline `#tag` (including nested `#project/lexa`), lowercase-normalised.
  Code fences and heading lines are correctly skipped.
- **Block ids** — trailing `^block-id` markers persist into
  `note_blocks` and are queryable through `get_note { block: "^abc" }`.
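
One regex covers the whole link grammar above. A sketch with the `regex`
crate, not necessarily the parser lexa-obsidian ships:

```rust
use regex::Regex;

/// Match `[[Note]]`, `[[Note|Alias]]`, `[[Note#Header]]`,
/// `[[Note^block-id]]`, and `![[Embed]]`.
fn parse_wiki_links(body: &str) {
    let re = Regex::new(
        r"(!?)\[\[([^\[\]|#^]+)(?:#([^\[\]|^]+))?(?:\^([^\[\]|]+))?(?:\|([^\[\]]+))?\]\]",
    )
    .unwrap();
    for cap in re.captures_iter(body) {
        let kind = if &cap[1] == "!" { "embed" } else { "link" };
        println!(
            "{kind}: target={} header={:?} block={:?} alias={:?}",
            &cap[2],
            cap.get(3).map(|m| m.as_str()),
            cap.get(4).map(|m| m.as_str()),
            cap.get(5).map(|m| m.as_str()),
        );
    }
}
```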

### Schema (sidecar tables in the same SQLite file)

```sql
note_metadata (doc_id PK, title, aliases_json, raw_json)
note_links    (id PK, src_doc_id, target_name, target_path, header, block_id, alias, kind)
note_tags     (doc_id, tag, PRIMARY KEY(doc_id, tag))
note_blocks   (chunk_id PK, doc_id, block_id)
```

`ON DELETE CASCADE` rides on `documents.id`, so purging a path cleans
the sidecars automatically.

### Indexing UX

Lexa indexes in the **background** as soon as the MCP server starts.
While indexing is in flight, content-bearing tool calls (`search_notes`,
`get_note`, `get_similar`, `find_backlinks`) return a fast `{indexing:
true, notes_seen, elapsed_seconds}` payload instead of blocking — so
Codex never appears hung. For large vaults (>1 000 notes) running
`lexa-obsidian setup` once with the pre-index step (or `lexa-obsidian
index` ahead of time) eliminates the wait entirely.
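
The non-blocking gate is cheap to sketch; every name below is
illustrative rather than lexa-obsidian's internals:

```rust
use serde_json::json;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::time::Instant;

/// Shared state a content-bearing tool checks before doing real work.
struct IndexingState {
    in_flight: AtomicBool,
    notes_seen: AtomicU64,
    started: Instant,
}

impl IndexingState {
    /// If indexing is still running, return the fast progress payload
    /// instead of blocking the tool call.
    fn progress_payload(&self) -> Option<serde_json::Value> {
        self.in_flight.load(Ordering::Relaxed).then(|| {
            json!({
                "indexing": true,
                "notes_seen": self.notes_seen.load(Ordering::Relaxed),
                "elapsed_seconds": self.started.elapsed().as_secs(),
            })
        })
    }
}
```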

### Privacy + threat model

- 100 % local. Network calls: model downloads on first run (Nomic v1.5
  ONNX ~110 MB, BGE reranker ~280 MB), nothing after. No telemetry, no
  analytics, no API keys.
- Read-only on your vault. The MCP server does not create, edit, or
  delete notes.
- The MCP server only spawns the `lexa-obsidian-mcp` binary itself,
  never user-supplied subprocesses.
- Verify yourself: `tcpdump -i any host huggingface.co` for ten minutes
  of usage shows zero traffic after the model cache is hot.

For more, see [`docs/FAQ.md`](docs/FAQ.md) and
[`docs/adr/006-obsidian.md`](docs/adr/006-obsidian.md).

## Benchmarks

Five harnesses, fully reproducible. Numbers below are warm-state on
M-series macOS arm64, release build, real Nomic v1.5-Q. See
[`docs/BENCHMARKS.md`](docs/BENCHMARKS.md) for hardware details, full
methodology, and the date each number was measured.

### Harness A — Latency

2 000 synthetic Markdown docs, 500 iterations / tier, fixed query set:

| Tier    | p50      | p95      | p99      |
|---------|----------|----------|----------|
| instant | 245 µs   | 840 µs   | 861 µs   |
| dense   | 8.97 ms  | 9.82 ms  | 10.20 ms |
| fast    | 9.00 ms  | 9.92 ms  | 10.19 ms |
| deep    | 261 ms   | 298 ms   | 313 ms   |

Pairs with a Criterion bench (`cargo bench -p lexa-bench --bench latency`)
and a CI gate that fails if fast-tier p50 > 400 ms on shared GitHub
Actions runners.

### Harness B — BEIR retrieval quality (SciFact, 100 queries)

| Tier    | nDCG@10  | MRR@10 | Recall@100 | p50      | p95      |
|---------|----------|--------|------------|----------|----------|
| instant | 0.6560   | 0.6184 | 0.8680     | 3 ms     | 5 ms     |
| fast    | 0.6778   | 0.6395 | 0.8980     | 17 ms    | 22 ms    |
| deep    | **0.7042** | **0.6674** | 0.8360 | 2.57 s   | 2.81 s   |

Hybrid lifts BM25-only by +2.2 nDCG points; deep adds another +2.6 nDCG
on top, **eliminating the previous deep-tier regression** caused by
unbounded reranker logits overriding RRF. Beats the published BEIR BM25
SciFact baseline (~0.665) at p95 < 25 ms on the fast tier.

### Harness C — Agent quality (20 NL queries on this repo)

| Tool                 | Tier      | Correct | Accuracy | Median latency |
|----------------------|-----------|---------|----------|----------------|
| `lexa` (Nomic)       | **auto**  | 16 / 20 | **0.80** | 11 ms          |
| `lexa` (Nomic)       | fast      | 15 / 20 | 0.75     | 10 ms          |
| `grep -rE`           | external  |  0 / 20 | 0.00     |  8 ms          |

`auto` outperforms `fast` because the router sends single-identifier
queries (`vec_quantize_binary`, `LexaDb::open`) straight to BM25-only
`instant`, where exact-symbol lookups beat hybrid scoring.
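
A guess at the spirit of that routing rule (the real `classify_query`
may weigh different signals):

```rust
enum Tier {
    Instant,
    Fast,
}

/// Route single-identifier queries (one token, code-ish characters) to
/// BM25-only `instant`; everything else goes hybrid. Illustrative only.
fn classify_query(q: &str) -> Tier {
    let toks: Vec<&str> = q.split_whitespace().collect();
    let code_like = |t: &str| t.contains("::") || t.contains('_') || t.contains("()");
    if toks.len() == 1 && code_like(toks[0]) {
        Tier::Instant
    } else {
        Tier::Fast
    }
}
```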

### Harness D — Head-to-head against external CLIs

Wraps any external command (`grep`, `rg`, `qmd-cli`, ...) and runs the
same query set. Reports per-tool latency and match rate against expected
file paths. See `lexa-bench compare --help`.

### Harness E — SimpleQA-style LLM-as-judge

Mirrors Exa's [evaluation methodology](https://exa.ai/blog/evals-at-exa):
hand-curated factual questions, scored on the five-dim rubric (relevance,
authority, content_issues, evaluator_confidence, overall) in `[0, 1]`.

```bash
# Local-first: judge is whatever's running in Ollama.
cargo run -p lexa-bench --release -- simpleqa \
  --queries bench/simpleqa/questions.json --corpus . \
  --tier auto --judge ollama --judge-model qwen3:8b \
  --real-embeddings --json bench-results/simpleqa.json
```

A deterministic `--judge mock` backend exists for CI smoke runs that
need to verify wiring without a model download.

### Reproducers

```bash
# Harness A — latency (writes JSON, gates CI)
cargo run -p lexa-bench --release -- latency \
  --db /tmp/lexa.sqlite --docs 2000 --iterations 500 \
  --real-embeddings --json bench-results/latency-nomic.json

# Harness A — Criterion (HTML reports under target/criterion/)
cargo bench -p lexa-bench --bench latency

# Harness B — BEIR
cargo run -p lexa-bench --release -- beir scifact --download \
  --db /tmp/lexa-scifact.sqlite --real-embeddings \
  --tiers instant,dense,fast,deep --max-queries 100 \
  --json bench-results/scifact.json

# Harness C — agent (auto tier on the lexa repo)
cargo run -p lexa-bench --release -- agent \
  --queries bench/agent/queries.json --corpus . \
  --tool lexa --tier auto --real-embeddings \
  --db /tmp/lexa-agent.sqlite \
  --json bench-results/agent-auto.json

# Harness D — head-to-head with grep
cargo run -p lexa-bench --release -- compare \
  --queries bench/agent/queries.json --corpus . \
  --command "grep -rEln {query} {corpus}/crates" --label grep \
  --json bench-results/compare-grep.json

# Harness E — SimpleQA (mock judge for CI)
cargo run -p lexa-bench --release -- simpleqa \
  --queries bench/simpleqa/questions.json --corpus . \
  --tier auto --judge mock --real-embeddings \
  --json bench-results/simpleqa.json
```

## Project layout

```text
lexa/
├── Cargo.toml                     # workspace
├── README.md                      # this file
├── crates/
│   ├── lexa-core/                 # library: chunking, embed, retrieval
│   ├── lexa-cli/                  # `lexa` binary
│   ├── lexa-mcp/                  # `lexa-mcp` rmcp stdio server
│   └── lexa-bench/                # `lexa-bench` — five harnesses
├── docs/
│   ├── ARCHITECTURE.md            # the design doc
│   ├── BENCHMARKS.md              # full benchmark methodology
│   └── adr/000–005-*.md           # one-page decisions
├── bench/
│   ├── agent/queries.json         # 20 NL queries against this repo
│   ├── agent/SKILL.md             # full agent-loop spec (Anthropic API)
│   └── simpleqa/questions.json    # SimpleQA seed set
├── bench-results/                 # committed JSON artifacts
└── tests/fixtures/sample/         # tiny corpus for tests
```

## License

Dual-licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or
  <https://www.apache.org/licenses/LICENSE-2.0>)
- MIT license ([LICENSE-MIT](LICENSE-MIT) or
  <https://opensource.org/licenses/MIT>)

at your option.