# Lexa — local Exa
> Hybrid retrieval over your local files and code, in a single static Rust
> binary. Lexa applies the architecture of [Exa](https://exa.ai/) — five
> latency-tiered search modes, hybrid BM25 + dense + RRF, two-stage
> Matryoshka KNN, binary-quantized vectors, query-aware highlights, deep
> reranking with optional query expansion, LLM-as-judge evaluation — to
> the corpus already on your disk.
```bash
lexa index ~/repos/myproject
lexa search "where does the rate limiter back off when redis is down"
```
```text
crates/api/src/limiter.rs:48-72 0.7141
if !backend.is_healthy().await { tracing::warn!("redis down, switching to in-memory backoff");
return self.fallback.acquire(key).await; }
```
## Highlights
- **Single static binary**, no daemon, no Python, no Docker. SQLite (with
FTS5 and `sqlite-vec`) is the entire backend.
- **Sub-10 ms `fast` tier** on real Nomic-v1.5 embeddings (warm state,
  M-series Mac, 2 000 docs, 500 iterations), roughly 38× inside the
  published Exa Fast latency budget.
- **Five search tiers** — `instant`, `dense`, `fast`, `deep`, `auto` —
mirroring Exa's tiered API.
- **Two-stage Matryoshka KNN** (256-bit preview → 768-bit re-score), the
  same way Exa runs prefix-256 over their 4096-dim embeddings; sketched
  after this list.
- **Deep tier with query expansion** (`additional_queries`) and a
  sigmoid-blended cross-encoder reranker that fixes the override-RRF
  failure mode.
- **Query-aware highlights** — sentence-level span extraction, the same
idea behind Exa's [contents API "highlights"](https://exa.ai/docs/reference/contents).
- **Five reproducible benchmark harnesses**, full-methodology JSON
artifacts, CI gate. See [`docs/BENCHMARKS.md`](docs/BENCHMARKS.md).
- **MCP server** (`lexa-mcp`) over stdio so any Anthropic-MCP client
(Claude Desktop, Claude Code, Cursor, etc.) gets `search_files`,
`index_path`, `purge_path`, and friends for free.
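To make the two-stage idea concrete, here is a minimal sketch. It is
illustrative only: it assumes plain in-memory slices and Hamming distance
at both stages, whereas Lexa runs these scans inside SQLite via
`sqlite-vec`.

```rust
// Illustrative two-stage KNN, not the sqlite-vec implementation: stage 1
// scans cheap 256-bit preview prefixes, stage 2 re-scores the shortlist
// against the full 768-bit binary vectors.

fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn two_stage_knn(
    query_preview: &[u64; 4],       // 256-bit quantized query prefix
    query_full: &[u64; 12],         // 768-bit quantized query
    previews: &[(usize, [u64; 4])], // (doc_id, 256-bit preview)
    full: &[[u64; 12]],             // doc_id-indexed 768-bit vectors
    rescore_k: usize,
    limit: usize,
) -> Vec<(usize, u32)> {
    // Stage 1: rank everything by preview distance, keep a shortlist.
    let mut shortlist: Vec<(usize, u32)> = previews
        .iter()
        .map(|(id, bits)| (*id, hamming(query_preview, bits)))
        .collect();
    shortlist.sort_by_key(|&(_, d)| d);
    shortlist.truncate(rescore_k);

    // Stage 2: exact distance over the full-width vectors, shortlist only.
    let mut hits: Vec<(usize, u32)> = shortlist
        .into_iter()
        .map(|(id, _)| (id, hamming(query_full, &full[id])))
        .collect();
    hits.sort_by_key(|&(_, d)| d);
    hits.truncate(limit);
    hits
}
```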
## How Lexa maps to Exa
| Exa | Lexa |
|---|---|
| Instant tier (<200 ms, BM25) | `lexa search --tier instant` — FTS5 BM25, p50 ~250 µs. |
| Fast tier (~350 ms, neural) | `lexa search --tier dense` (KNN-only) or `--tier fast` (hybrid). p50 ~9 ms. |
| Auto tier (~1 s, intelligent) | `lexa search --tier auto` — query router in `classify_query`. Default tier. |
| Deep tier (5-60 s, agentic) | `lexa search --tier deep` + `SearchOptions::additional_queries` for [`additionalQueries`-style](https://exa.ai/blog/exa-deep) fan-out. |
| Hybrid retrieval (BM25 + dense) | RRF (k=60) over FTS5 BM25 and binary-quantized vector KNN, run concurrently; the fusion and blend are sketched below the table. See [Exa: Composing a Search Engine](https://exa.ai/blog/composing-a-search-engine). |
| BM25 optimizations | FTS5's built-in BM25 implementation; OR-of-quoted-tokens query construction with a curated stopword set. (Lexa doesn't reimplement Exa's six [posting-list compression tricks](https://exa.ai/blog/bm25-optimization) — local corpora don't justify them.) |
| Matryoshka prefix | Nomic v1.5-Q (768d, MRL-trained at `{64, 128, 256, 512, 768}`); `vectors_bin_preview bit[256]` table for first-stage KNN. See [Exa 2.0: building a web-scale vector DB](https://exa.ai/blog/exa-api-2-0). |
| Binary quantization | `sqlite-vec`'s `vec_quantize_binary()` and `bit[N]` columns; Hamming distance via SIMD intrinsics. 32× storage shrink. |
| Cross-encoder reranking | `BAAI/bge-reranker-base` over top-15 fused candidates, sigmoid-blended at α = 0.7 with the RRF score. |
| Highlights / contents API | `search.rs::highlight` — query-token-overlap-scored sentence span, ~10× LLM-token reduction vs full chunks. |
| `additionalQueries` | `SearchOptions::additional_queries: Vec<String>`; the deep tier fans out N+1 queries, RRF-fuses them, then reranks. The bench harness includes an Ollama-backed reformulation helper. |
| LLM-as-judge eval (5-dim rubric) | `lexa-bench simpleqa` — Harness E. Scores relevance, authority, content_issues, evaluator_confidence, overall in [0, 1]. Default judge is local Ollama running `qwen3:8b`. See [Exa: Evaluating Search](https://exa.ai/blog/evals-at-exa). |
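The fusion and blend arithmetic from the table is small enough to show in
full. The sketch below is illustrative, not Lexa's actual code; it assumes
each retriever hands back a ranked list of doc ids, and that the RRF score
is min-max normalised into [0, 1] before blending.

```rust
use std::collections::HashMap;

/// Reciprocal-rank fusion: each retriever contributes 1 / (k + rank)
/// per document, with k = 60 as in the table above.
fn rrf_fuse(ranked_lists: &[Vec<u64>], k: f64) -> HashMap<u64, f64> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in ranked_lists {
        for (i, doc) in list.iter().enumerate() {
            // i is 0-based, so rank = i + 1.
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    scores
}

/// Deep-tier blend at alpha = 0.7: squashing the unbounded cross-encoder
/// logit through a sigmoid is what stops the reranker from overriding RRF
/// outright. Assumes `rrf_score` is already normalised to [0, 1].
fn blend(rerank_logit: f64, rrf_score: f64, alpha: f64) -> f64 {
    let sig = 1.0 / (1.0 + (-rerank_logit).exp());
    alpha * sig + (1.0 - alpha) * rrf_score
}
```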
What Lexa **doesn't** clone:
- Crawl freshness — Lexa indexes static local trees, not the web.
- Websets-scale entity finding — billions of records / async enrichment
pipelines aren't a single-binary local feature.
- Authority / domain reputation signals — those are web-graph specific.
The local-first tradeoff is what makes the latency budget viable. Exa
Fast targets <500 ms because it's reaching across a planet-scale index;
Lexa Fast hits 9 ms because everything is in SQLite next to your CPU.
## Install
```bash
cargo install --path crates/lexa-cli # the `lexa` CLI
cargo install --path crates/lexa-mcp # the `lexa-mcp` MCP server
```
Or run from a clone:
```bash
cargo build --workspace --release
./target/release/lexa --help
```
The first time you run a real-embedding command, fastembed downloads the
Nomic v1.5-Q ONNX (~110 MB) and the BGE-reranker-base ONNX (~280 MB) into
`./.fastembed_cache/`. Subsequent runs reuse the cache.
## CLI
```text
lexa index <path> [--db <path>]
lexa search <query> [--tier instant|dense|fast|deep|auto] [--limit N] [--json] [--db <path>]
lexa purge <path> [--db <path>]
lexa status [--db <path>]
lexa watch <path> [--db <path>]
```
Default DB is `~/.lexa/index.sqlite`. `--hash-embeddings` swaps to the
deterministic FNV-1a hash backend for tests / offline runs.
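For the curious, a deterministic hash backend can be as simple as the
sketch below. This is illustrative and not necessarily `lexa-core`'s exact
scheme: each token's FNV-1a hash picks a dimension and a sign, and the
vector is L2-normalised at the end.

```rust
/// Illustrative FNV-1a hash embedding: deterministic, offline, and
/// useless for semantics, which is exactly what tests want.
fn hash_embed(text: &str, dims: usize) -> Vec<f32> {
    let mut v = vec![0.0_f32; dims];
    for token in text.split_whitespace() {
        let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a offset basis
        for b in token.bytes() {
            h ^= b as u64;
            h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a prime
        }
        let dim = (h % dims as u64) as usize;
        let sign = if (h >> 63) == 0 { 1.0 } else { -1.0 };
        v[dim] += sign;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt().max(1e-12);
    v.iter().map(|x| x / norm).collect()
}
```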
`--json` produces a stable JSON shape with `path`, `line_start`,
`line_end`, `score`, `excerpt`, and a `breakdown` object exposing the
RRF inputs, rerank score (deep only), and the routed tier (auto only).
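For consumers that parse the `--json` output, a serde model of that shape
might look like the following. Only the six top-level fields are
documented above; the `breakdown` member names here are assumptions,
marked as such, and `serde` with the `derive` feature is assumed as a
dependency.

```rust
use serde::Deserialize;

/// One `--json` hit, modelled on the fields documented above.
#[derive(Deserialize)]
struct Hit {
    path: String,
    line_start: u32,
    line_end: u32,
    score: f64,
    excerpt: String,
    breakdown: Breakdown,
}

#[derive(Deserialize)]
struct Breakdown {
    bm25_rank: Option<u32>,      // assumed name for one RRF input
    knn_rank: Option<u32>,       // assumed name for the other RRF input
    rerank_score: Option<f64>,   // deep tier only
    routed_tier: Option<String>, // auto tier only
}
```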
## MCP server
Add to your MCP client config (Claude Desktop / Claude Code / Cursor):
```json
{
  "mcpServers": {
    "lexa": {
      "command": "lexa-mcp",
      "env": { "LEXA_DB": "/Users/you/.lexa/index.sqlite" }
    }
  }
}
```
Tools:
- `search_files(query, tier?, limit?)`
- `index_path(path)`
- `list_indexed_paths()`
- `purge_path(path)`
- `status()`
stderr is the only log channel; stdout is reserved for the JSON-RPC
stream so the protocol stays clean.
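That discipline is one line with the `tracing-subscriber` crate; a sketch
of the idea, not necessarily what `lexa-mcp` does:

```rust
fn main() {
    // Route all diagnostics to stderr; stdout stays a pure JSON-RPC pipe.
    tracing_subscriber::fmt()
        .with_writer(std::io::stderr)
        .init();

    // ...serve the MCP protocol over stdin/stdout from here...
}
```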
## Library
```toml
[dependencies]
lexa-core = "0.1"
```
```rust
use lexa_core::{open, EmbeddingConfig, SearchOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut db = open("/tmp/lexa.sqlite", EmbeddingConfig::default())?;
    db.index_path("/path/to/repo")?;
    let hits = db.search(&SearchOptions::new("hybrid retrieval implementation"))?;
    for hit in hits {
        println!("{}:{}-{} {:.4} {}", hit.path, hit.line_start, hit.line_end, hit.score, hit.excerpt);
    }
    Ok(())
}
```
The default `SearchOptions` uses the `auto` tier and limit 10. Set
`tier: SearchTier::Deep` and populate `additional_queries` for Exa-style
multi-query deep search.
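A sketch of that configuration, assuming `tier` and `additional_queries`
are public fields (the names come from the sections above; the exact
construction API may differ):

```rust
use lexa_core::{open, EmbeddingConfig, SearchOptions, SearchTier};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut db = open("/tmp/lexa.sqlite", EmbeddingConfig::default())?;

    // Deep tier with Exa-style expansion: the main query plus two
    // reformulations, fanned out, RRF-fused, then cross-encoder reranked.
    let mut opts = SearchOptions::new("how does the deep tier rerank results");
    opts.tier = SearchTier::Deep;
    opts.additional_queries = vec![
        "cross-encoder reranking deep tier".into(),
        "sigmoid blend rerank score".into(),
    ];

    for hit in db.search(&opts)? {
        println!("{} {:.4}", hit.path, hit.score);
    }
    Ok(())
}
```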
## Lexa for Obsidian — 60 seconds to a working setup
Ask your Obsidian vault questions through Codex, Claude Desktop,
Cursor, Claude Code, or any MCP client. **Local-first** — your notes
never leave your machine, no API keys, no cloud round-trips.
```bash
lexa-obsidian setup
```
`setup` is interactive: it asks for your vault path, optionally
pre-indexes (recommended for vaults over 1 000 notes), writes the right
MCP config block into `~/.codex/config.toml` (and Claude Desktop /
Claude Code if you opt in), and drops an `AGENTS.md` in your vault root
so agents route note questions through Lexa **without** having to be
told "use lexa-obsidian" first.
Restart your MCP client and try:
```text
> what did I write about <some topic>?
> list my top 10 tags
> show me backlinks for "<some note name>"
> find notes similar to "<some note>"
```
The agent picks the right tool from the natural-language phrasing.
### What the AI gets
| When you ask | The AI calls | What it returns |
|---------------------------------------|--------------------|-------------------------------------------------------------------|
| "What did I write about X?" | `search_notes` | Top notes ranked by hybrid (BM25 + dense + reranker) score, with title, path, line range, headline excerpt, tags, and the routed tier. |
| "Show me my note titled Y" | `get_note` | Frontmatter + body + outgoing/incoming wiki-links + tags. Optionally a single block by `^id`. |
| "What links to Y?" | `find_backlinks` | Every linking note with the alias / header / block id used. |
| "Find notes similar to Y" | `get_similar` | Semantic neighbours of the seed note (excluding itself). |
| "What tags do I use most?" | `list_tags` | Top tags by usage, optional prefix filter. |
| "Re-index" / "drop the index" | `index_vault` / `purge_vault` | Maintenance. |
### Subcommands
```text
lexa-obsidian setup # interactive bootstrap (most users only need this)
lexa-obsidian doctor # diagnose every common failure mode
lexa-obsidian models prefetch # download retrieval models (~390 MB) ahead of time
lexa-obsidian --vault <path> index
lexa-obsidian --vault <path> status
lexa-obsidian --vault <path> tags [--prefix X] [--limit N]
lexa-obsidian --vault <path> backlinks <note>
lexa-obsidian --vault <path> search <query> [--tier auto|fast|deep] [--tag X] [--folder Y] [--json]
lexa-obsidian --vault <path> watch
```
`--vault` falls back to `LEXA_OBSIDIAN_VAULT`. The DB path defaults to
`~/.lexa/obsidian-<sha-of-vault>.sqlite` so two distinct vaults never
share an index.
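A sketch of how such a per-vault path can be derived, assuming the `sha2`
and `hex` crates; the real scheme may hash or truncate differently:

```rust
use sha2::{Digest, Sha256};
use std::path::{Path, PathBuf};

/// Derive a stable per-vault DB path by hashing the canonical vault path,
/// so two vaults can never collide on one index file.
fn vault_db_path(vault: &Path, home: &Path) -> PathBuf {
    let canonical = vault.canonicalize().unwrap_or_else(|_| vault.to_path_buf());
    let digest = Sha256::digest(canonical.to_string_lossy().as_bytes());
    let short = hex::encode(&digest[..8]); // 16 hex chars is plenty locally
    home.join(".lexa").join(format!("obsidian-{short}.sqlite"))
}
```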
### What gets parsed
- **Frontmatter** (`title:`, `aliases:`, `tags:` + arbitrary custom
fields preserved in `note_metadata.raw_json`). Stripped *before*
embedding so it doesn't pollute the vector representation.
- **Wiki-links** — `[[Note]]`, `[[Note|Alias]]`, `[[Note#Header]]`,
  `[[Note^block-id]]`, `![[Embed]]` (a matcher sketch follows this list).
  Stored in `note_links`; backlinks are a single SQL JOIN.
- **Tags** — frontmatter `tags:` (string, list, or comma-string) plus
  inline `#tag` (including nested `#project/lexa`), lowercase-normalised.
  Code fences and heading lines are correctly skipped.
- **Block ids** — trailing `^block-id` markers persist into
`note_blocks` and are queryable through `get_note { block: "^abc" }`.
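As a reference point for the wiki-link shapes above, here is a hedged
matcher sketch using the `regex` crate. The real parser may differ, and,
as noted, must also skip code fences:

```rust
use regex::Regex;

/// Matched pieces of one wiki-link.
struct WikiLink {
    target: String,
    header: Option<String>,
    block: Option<String>,
    alias: Option<String>,
    embed: bool,
}

fn wiki_links(md: &str) -> Vec<WikiLink> {
    // One pattern for all five shapes: [[Note]], [[Note|Alias]],
    // [[Note#Header]], [[Note^block-id]], ![[Embed]].
    let re = Regex::new(
        r"(!)?\[\[([^\[\]|#^]+)(?:#([^\[\]|^]+))?(?:\^([^\[\]|]+))?(?:\|([^\[\]]+))?\]\]",
    )
    .unwrap();
    re.captures_iter(md)
        .map(|c| WikiLink {
            target: c[2].trim().to_string(),
            header: c.get(3).map(|m| m.as_str().to_string()),
            block: c.get(4).map(|m| m.as_str().to_string()),
            alias: c.get(5).map(|m| m.as_str().to_string()),
            embed: c.get(1).is_some(),
        })
        .collect()
}
```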
### Schema (sidecar tables in the same SQLite file)
```sql
note_metadata (doc_id PK, title, aliases_json, raw_json)
note_links (id PK, src_doc_id, target_name, target_path, header, block_id, alias, kind)
note_tags (doc_id, tag, PRIMARY KEY(doc_id, tag))
note_blocks (chunk_id PK, doc_id, block_id)
```
`ON DELETE CASCADE` rides on `documents.id`, so purging a path cleans
the sidecars automatically.
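Given that schema, the backlinks lookup really is a single JOIN. A sketch
with `rusqlite`, assuming a `documents(id, path)` table as implied by the
cascade note; exact column names may differ:

```rust
use rusqlite::Connection;

/// Every note linking to `target`, with the alias used (if any).
fn backlinks(conn: &Connection, target: &str) -> rusqlite::Result<Vec<(String, Option<String>)>> {
    let mut stmt = conn.prepare(
        "SELECT d.path, l.alias
         FROM note_links l
         JOIN documents d ON d.id = l.src_doc_id
         WHERE l.target_name = ?1",
    )?;
    let rows = stmt.query_map([target], |r| Ok((r.get(0)?, r.get(1)?)))?;
    rows.collect()
}
```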
### Indexing UX
Lexa indexes in the **background** as soon as the MCP server starts.
While indexing is in flight, content-bearing tool calls (`search_notes`,
`get_note`, `get_similar`, `find_backlinks`) return a fast `{indexing:
true, notes_seen, elapsed_seconds}` payload instead of blocking — so
Codex never appears hung. For large vaults (>1 000 notes) running
`lexa-obsidian setup` once with the pre-index step (or `lexa-obsidian
index` ahead of time) eliminates the wait entirely.
### Privacy + threat model
- 100 % local. Network calls: model downloads on first run (Nomic v1.5
ONNX ~110 MB, BGE reranker ~280 MB), nothing after. No telemetry, no
analytics, no API keys.
- Read-only on your vault. The MCP server does not create, edit, or
delete notes.
- The MCP server only spawns the `lexa-obsidian-mcp` binary itself,
never user-supplied subprocesses.
- Verify yourself: `tcpdump -i any host huggingface.co` for ten minutes
of usage shows zero traffic after the model cache is hot.
For more, see [`docs/FAQ.md`](docs/FAQ.md) and
[`docs/adr/006-obsidian.md`](docs/adr/006-obsidian.md).
## Benchmarks
Five harnesses, fully reproducible. Numbers below are warm-state on
M-series macOS arm64, release build, real Nomic v1.5-Q. See
[`docs/BENCHMARKS.md`](docs/BENCHMARKS.md) for hardware details, full
methodology, and the date each number was measured.
### Harness A — Latency
2 000 synthetic Markdown docs, 500 iterations / tier, fixed query set:
| tier | p50 | p95 | p99 |
|---|---|---|---|
| instant | 245 µs | 840 µs | 861 µs |
| dense | 8.97 ms | 9.82 ms | 10.20 ms |
| fast | 9.00 ms | 9.92 ms | 10.19 ms |
| deep | 261 ms | 298 ms | 313 ms |
Pairs with a Criterion bench (`cargo bench -p lexa-bench --bench latency`)
and a CI gate that fails if fast-tier p50 > 400 ms on shared GitHub
Actions runners.
### Harness B — BEIR retrieval quality (SciFact, 100 queries)
| tier | nDCG@10 | MAP | Recall | p50 | p95 |
|---|---|---|---|---|---|
| instant | 0.6560 | 0.6184 | 0.8680 | 3 ms | 5 ms |
| fast | 0.6778 | 0.6395 | 0.8980 | 17 ms | 22 ms |
| deep | **0.7042** | **0.6674** | 0.8360 | 2.57 s | 2.81 s |
Hybrid lifts BM25-only by +2.2 nDCG points; deep adds another +2.6 nDCG
on top, **eliminating the previous deep-tier regression** caused by
unbounded reranker logits overriding RRF. Beats the published BEIR BM25
SciFact baseline (~0.665) at p95 < 25 ms on the fast tier.
### Harness C — Agent quality (20 NL queries on this repo)
| tool | tier | matches | match rate | p50 |
|---|---|---|---|---|
| `lexa` (Nomic) | **auto** | 16 / 20 | **0.80** | 11 ms |
| `lexa` (Nomic) | fast | 15 / 20 | 0.75 | 10 ms |
| `grep -rE` | external | 0 / 20 | 0.00 | 8 ms |
`auto` outperforms `fast` because the router sends single-identifier
queries (`vec_quantize_binary`, `LexaDb::open`) straight to BM25-only
`instant`, where exact-symbol lookups beat hybrid scoring.
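The routing heuristic is cheap enough to caricature. An illustrative
sketch, not the real `classify_query`:

```rust
/// Which tier the router picks; the real enum lives in lexa-core.
enum Tier {
    Instant,
    Fast,
}

/// Sketch of the routing idea: a single identifier-shaped token
/// (snake_case, ::path, CamelCase) reads as an exact-symbol lookup and
/// routes to BM25-only `instant`; prose-like queries go hybrid.
fn route(query: &str) -> Tier {
    let tokens: Vec<&str> = query.split_whitespace().collect();
    let symbolish = |t: &str| {
        t.contains('_')
            || t.contains("::")
            || (t.chars().any(char::is_uppercase) && t.chars().any(char::is_lowercase))
    };
    if tokens.len() == 1 && symbolish(tokens[0]) {
        Tier::Instant
    } else {
        Tier::Fast
    }
}
```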
### Harness D — Head-to-head against external CLIs
Wraps any external command (`grep`, `rg`, `qmd-cli`, ...) and runs the
same query set. Reports per-tool latency and match rate against expected
file paths. See `lexa-bench compare --help`.
### Harness E — SimpleQA-style LLM-as-judge
Mirrors Exa's [evaluation methodology](https://exa.ai/blog/evals-at-exa):
hand-curated factual questions, scored on the five-dim rubric (relevance,
authority, content_issues, evaluator_confidence, overall) in `[0, 1]`.
```bash
# Local-first: judge is whatever's running in Ollama.
cargo run -p lexa-bench --release -- simpleqa \
--queries bench/simpleqa/questions.json --corpus . \
--tier auto --judge ollama --judge-model qwen3:8b \
--real-embeddings --json bench-results/simpleqa.json
```
A deterministic `--judge mock` backend exists for CI smoke runs that
need to verify wiring without a model download.
### Reproducers
```bash
# Harness A — latency (writes JSON, gates CI)
cargo run -p lexa-bench --release -- latency \
--db /tmp/lexa.sqlite --docs 2000 --iterations 500 \
--real-embeddings --json bench-results/latency-nomic.json
# Harness A — Criterion (HTML reports under target/criterion/)
cargo bench -p lexa-bench --bench latency
# Harness B — BEIR
cargo run -p lexa-bench --release -- beir scifact --download \
--db /tmp/lexa-scifact.sqlite --real-embeddings \
--tiers instant,dense,fast,deep --max-queries 100 \
--json bench-results/scifact.json
# Harness C — agent (auto tier on the lexa repo)
cargo run -p lexa-bench --release -- agent \
--queries bench/agent/queries.json --corpus . \
--tool lexa --tier auto --real-embeddings \
--db /tmp/lexa-agent.sqlite \
--json bench-results/agent-auto.json
# Harness D — head-to-head with grep
cargo run -p lexa-bench --release -- compare \
--queries bench/agent/queries.json --corpus . \
--command "grep -rEln {query} {corpus}/crates" --label grep \
--json bench-results/compare-grep.json
# Harness E — SimpleQA (mock judge for CI)
cargo run -p lexa-bench --release -- simpleqa \
--queries bench/simpleqa/questions.json --corpus . \
--tier auto --judge mock --real-embeddings \
--json bench-results/simpleqa.json
```
## Project layout
```text
lexa/
├── Cargo.toml # workspace
├── README.md # this file
├── crates/
│ ├── lexa-core/ # library: chunking, embed, retrieval
│ ├── lexa-cli/ # `lexa` binary
│ ├── lexa-mcp/ # `lexa-mcp` rmcp stdio server
│ └── lexa-bench/ # `lexa-bench` — five harnesses
├── docs/
│ ├── ARCHITECTURE.md # design doc
│ ├── BENCHMARKS.md # full benchmark methodology
│ └── adr/000–005-*.md # one-page decisions
├── bench/
│ ├── agent/queries.json # 20 NL queries against this repo
│ ├── agent/SKILL.md # full agent-loop spec (Anthropic API)
│ └── simpleqa/questions.json # SimpleQA seed set
├── bench-results/ # committed JSON artifacts
└── tests/fixtures/sample/ # tiny corpus for tests
```
## License
Dual-licensed under either of:
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or
  <https://www.apache.org/licenses/LICENSE-2.0>)
- MIT license ([LICENSE-MIT](LICENSE-MIT) or
  <https://opensource.org/licenses/MIT>)
at your option.