# Lexa — local Exa
Hybrid retrieval over your local files and code, in a single static Rust binary. Lexa applies the architecture of Exa — five latency-tiered search modes, hybrid BM25 + dense + RRF, two-stage Matryoshka KNN, binary-quantized vectors, query-aware highlights, deep reranking with optional query expansion, LLM-as-judge evaluation — to the corpus already on your disk.
```
crates/api/src/limiter.rs:48-72  0.7141
if !backend.is_healthy().await {
    tracing::warn!("redis down, switching to in-memory backoff");
    return self.fallback.acquire(key).await;
}
```
## Highlights

- Single static binary, no daemon, no Python, no Docker. SQLite (with FTS5 and sqlite-vec) is the entire backend.
- Sub-10 ms `fast` tier on real Nomic-v1.5 embeddings (M-series warm-state, 2 000 docs, 500 iterations) — 38× faster than the published Exa Fast latency budget.
- Five search tiers — `instant`, `dense`, `fast`, `deep`, `auto` — mirroring Exa's tiered API.
- Two-stage Matryoshka KNN (256-bit preview → 768-bit re-score), the same way Exa runs prefix-256 over their 4096-dim embeddings.
- Deep tier with query expansion (`additional_queries`) and a sigmoid-blended cross-encoder reranker that fixes the override-RRF failure mode.
- Query-aware highlights — sentence-level span extraction, the same idea behind Exa's contents API "highlights".
- Five reproducible benchmark harnesses, full-methodology JSON artifacts, CI gate. See `docs/BENCHMARKS.md`.
- MCP server (`lexa-mcp`) over stdio so any Anthropic-MCP client (Claude Desktop, Claude Code, Cursor, etc.) gets `search_files`, `index_path`, `purge_path`, and friends for free.
## How Lexa maps to Exa
| Exa concept | Lexa equivalent |
|---|---|
| Instant tier (<200 ms, BM25) | `lexa search --tier instant` — FTS5 BM25, p50 ~250 µs. |
| Fast tier (~350 ms, neural) | `lexa search --tier dense` (KNN-only) or `--tier fast` (hybrid). p50 ~9 ms. |
| Auto tier (~1 s, intelligent) | `lexa search --tier auto` — query router in `classify_query`. Default tier. |
| Deep tier (5-60 s, agentic) | `lexa search --tier deep` + `SearchOptions::additional_queries` for `additionalQueries`-style fan-out. |
| Hybrid retrieval (BM25 + dense) | RRF (k=60) over FTS5 BM25 and binary-quantized vector KNN, run concurrently. See Exa: Composing a Search Engine. |
| BM25 optimizations | FTS5's built-in BM25 implementation; OR-of-quoted-tokens query construction with a curated stopword set. (Lexa doesn't reimplement Exa's six posting-list compression tricks — local corpora don't justify them.) |
| Matryoshka prefix | Nomic v1.5-Q (768d, MRL-trained at {64, 128, 256, 512, 768}); `vectors_bin_preview` `bit[256]` table for first-stage KNN. See Exa 2.0: building a web-scale vector DB. |
| Binary quantization | sqlite-vec's `vec_quantize_binary()` and `bit[N]` columns; Hamming distance via SIMD intrinsics. 32× storage shrink. |
| Cross-encoder reranking | BAAI/bge-reranker-base over top-15 fused candidates, sigmoid-blended at α = 0.7 with the RRF score. |
| Highlights / contents API | `search.rs::highlight` — query-token-overlap-scored sentence span, ~10× LLM-token reduction vs full chunks. |
| `additionalQueries` | `SearchOptions::additional_queries: Vec<String>`; the deep tier fans out N+1 queries, RRF-fuses them, then reranks. The bench harness includes an Ollama-backed reformulation helper. |
| LLM-as-judge eval (5-dim rubric) | `lexa-bench simpleqa` — Harness E. Scores relevance, authority, content_issues, evaluator_confidence, overall in [0, 1]. Default judge is local Ollama running qwen3:8b. See Exa: Evaluating Search. |
## What Lexa doesn't clone
- Crawl freshness — Lexa indexes static local trees, not the web.
- Websets-scale entity finding — billions of records / async enrichment pipelines aren't a single-binary local feature.
- Authority / domain reputation signals — those are web-graph specific.
The local-first tradeoff is what makes the latency budget viable. Exa Fast targets <500 ms because it's reaching across a planet-scale index; Lexa Fast hits 9 ms because everything is in SQLite next to your CPU.
## Install
Or run from a clone:
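A from-clone build with the standard Cargo workflow might look like this (the repository URL is not given in this README, and the exact commands are assumptions based on the workspace layout):

```sh
git clone <repo-url> lexa && cd lexa     # repository URL not given here
cargo install --path crates/lexa-cli     # installs the `lexa` binary
cargo install --path crates/lexa-mcp     # optional: the MCP stdio server
```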
The first time you run a real-embedding command, fastembed downloads the Nomic v1.5-Q ONNX (~110 MB) and the BGE-reranker-base ONNX (~280 MB) into `./.fastembed_cache/`. Subsequent runs reuse the cache.
## CLI

```
lexa index <path> [--db <path>]
lexa search <query> [--tier instant|dense|fast|deep|auto] [--limit N] [--json] [--db <path>]
lexa purge <path> [--db <path>]
lexa status [--db <path>]
lexa watch <path> [--db <path>]
```
Default DB is `~/.lexa/index.sqlite`. `--hash-embeddings` swaps to the deterministic FNV-1a hash backend for tests / offline runs.

`--json` produces a stable JSON shape with `path`, `line_start`, `line_end`, `score`, `excerpt`, and a `breakdown` object exposing the RRF inputs, rerank score (deep only), and the routed tier (auto only).
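As an illustration of that shape, with invented values and assumed `breakdown` key names (the text names the object but not its keys):

```json
{
  "path": "crates/api/src/limiter.rs",
  "line_start": 48,
  "line_end": 72,
  "score": 0.7141,
  "excerpt": "if !backend.is_healthy().await { ... }",
  "breakdown": {
    "bm25_rank": 1,
    "knn_rank": 3,
    "rrf": 0.0325,
    "rerank": null,
    "routed_tier": "fast"
  }
}
```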
## MCP server
Add to your MCP client config (Claude Desktop / Claude Code / Cursor):
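A minimal stdio entry, assuming the Claude-Desktop-style `mcpServers` key and the `lexa-mcp` binary on `PATH` (key names vary slightly by client):

```json
{
  "mcpServers": {
    "lexa": {
      "command": "lexa-mcp",
      "args": []
    }
  }
}
```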
Tools:

- `search_files(query, tier?, limit?)`
- `index_path(path)`
- `list_indexed_paths()`
- `purge_path(path)`
- `status()`
stderr is the only log channel; stdout is reserved for the JSON-RPC stream so the protocol stays clean.
## Library

Add `lexa-core` to your `Cargo.toml`:

```toml
[dependencies]
lexa-core = "0.1"
```

```rust
use lexa_core::{LexaDb, SearchOptions};

// The path and query arguments below are illustrative.
let mut db = LexaDb::open("~/.lexa/index.sqlite")?;
db.index_path("./src")?;
let hits = db.search("two-stage matryoshka knn", SearchOptions::default())?;
for hit in hits {
    println!("{}:{}-{}  {:.4}", hit.path, hit.line_start, hit.line_end, hit.score);
}
```
The default `SearchOptions` uses the `auto` tier and limit 10. Set `tier: SearchTier::Deep` and populate `additional_queries` for Exa-style multi-query deep search.
## Lexa for Obsidian — 60 seconds to a working setup
Ask your Obsidian vault questions through Codex, Claude Desktop, Cursor, Claude Code, or any MCP client. Local-first — your notes never leave your machine, no API keys, no cloud round-trips.
`lexa-obsidian setup` is interactive: it asks for your vault path, optionally pre-indexes (recommended for >1 000-note vaults), writes the right MCP config block into `~/.codex/config.toml` (and Claude Desktop / Claude Code if you opt in), and drops an `AGENTS.md` in your vault root so agents route note questions through Lexa without having to be prompted with "Use lexa-obsidian."
Restart your MCP client and try:
> what did I write about <some topic>?
> list my top 10 tags
> show me backlinks for "<some note name>"
> find notes similar to "<some note>"
The agent picks the right tool from the natural-language phrasing.
### What the AI gets

| When you ask | The AI calls | What it returns |
|---|---|---|
| "What did I write about X?" | `search_notes` | Top notes ranked by hybrid (BM25 + dense + reranker) score, with title, path, line range, headline excerpt, tags, and the routed tier. |
| "Show me my note titled Y" | `get_note` | Frontmatter + body + outgoing/incoming wiki-links + tags. Optionally a single block by `^id`. |
| "What links to Y?" | `find_backlinks` | Every linking note with the alias / header / block id used. |
| "Find notes similar to Y" | `get_similar` | Semantic neighbours of the seed note (excluding itself). |
| "What tags do I use most?" | `list_tags` | Top tags by usage, optional prefix filter. |
| "Re-index" / "drop the index" | `index_vault` / `purge_vault` | Maintenance. |
### Subcommands

```
lexa-obsidian setup             # interactive bootstrap (most users only need this)
lexa-obsidian doctor            # diagnose every common failure mode
lexa-obsidian models prefetch   # download retrieval models (~390 MB) ahead of time
lexa-obsidian --vault <path> index
lexa-obsidian --vault <path> status
lexa-obsidian --vault <path> tags [--prefix X] [--limit N]
lexa-obsidian --vault <path> backlinks <note>
lexa-obsidian --vault <path> search <query> [--tier auto|fast|deep] [--tag X] [--folder Y] [--json]
lexa-obsidian --vault <path> watch
```
`--vault` falls back to `LEXA_OBSIDIAN_VAULT`. The DB path defaults to `~/.lexa/obsidian-<sha-of-vault>.sqlite` so two distinct vaults never share an index.
### What gets parsed

- Frontmatter — `title:`, `aliases:`, `tags:`, plus arbitrary custom fields preserved in `note_metadata.raw_json`. Stripped before embedding so it doesn't pollute the vector representation.
- Wiki-links — `[[Note]]`, `[[Note|Alias]]`, `[[Note#Header]]`, `[[Note^block-id]]`, `![[Embed]]`. Stored in `note_links`; backlinks are a single SQL JOIN.
- Tags — frontmatter `tags:` (string, list, or comma-string) plus inline `#tag` (including nested `#project/lexa`). Lowercase-normalised. Code fences and heading lines are correctly skipped.
- Block ids — trailing `^block-id` markers persist into `note_blocks` and are queryable through `get_note { block: "^abc" }`.
### Schema (sidecar tables in the same SQLite file)

```
note_metadata (doc_id PK, title, aliases_json, raw_json)
note_links    (id PK, src_doc_id, target_name, target_path, header, block_id, alias, kind)
note_tags     (doc_id, tag, PRIMARY KEY(doc_id, tag))
note_blocks   (chunk_id PK, doc_id, block_id)
```
`ON DELETE CASCADE` rides on `documents.id`, so purging a path cleans the sidecars automatically.
### Indexing UX
Lexa indexes in the background as soon as the MCP server starts.
While indexing is in flight, content-bearing tool calls (`search_notes`, `get_note`, `get_similar`, `find_backlinks`) return a fast `{indexing: true, notes_seen, elapsed_seconds}` payload instead of blocking, so Codex never appears hung. For large vaults (>1 000 notes), running `lexa-obsidian setup` once with the pre-index step (or `lexa-obsidian index` ahead of time) eliminates the wait entirely.
### Privacy + threat model

- 100 % local. Network calls: model downloads on first run (Nomic v1.5 ONNX ~110 MB, BGE reranker ~280 MB), nothing after. No telemetry, no analytics, no API keys.
- Read-only on your vault. The MCP server does not create, edit, or delete notes.
- The MCP server only spawns the `lexa-obsidian-mcp` binary itself, never user-supplied subprocesses.
- Verify yourself: `tcpdump -i any host huggingface.co` for ten minutes of usage shows zero traffic after the model cache is hot.
For more, see `docs/FAQ.md` and `docs/adr/006-obsidian.md`.
## Benchmarks
Five harnesses, fully reproducible. Numbers below are warm-state on M-series macOS arm64, release build, real Nomic v1.5-Q. See `docs/BENCHMARKS.md` for hardware details, full methodology, and the date each number was measured.
### Harness A — Latency
2 000 synthetic Markdown docs, 500 iterations / tier, fixed query set:
| Tier | p50 | p95 | p99 |
|---|---|---|---|
| instant | 245 µs | 840 µs | 861 µs |
| dense | 8.97 ms | 9.82 ms | 10.20 ms |
| fast | 9.00 ms | 9.92 ms | 10.19 ms |
| deep | 261 ms | 298 ms | 313 ms |
Pairs with a Criterion bench (`cargo bench -p lexa-bench --bench latency`) and a CI gate that fails if fast-tier p50 > 400 ms on shared GitHub Actions runners.
### Harness B — BEIR retrieval quality (SciFact, 100 queries)
| Tier | nDCG@10 | MRR@10 | Recall@100 | p50 | p95 |
|---|---|---|---|---|---|
| instant | 0.6560 | 0.6184 | 0.8680 | 3 ms | 5 ms |
| fast | 0.6778 | 0.6395 | 0.8980 | 17 ms | 22 ms |
| deep | 0.7042 | 0.6674 | 0.8360 | 2.57 s | 2.81 s |
Hybrid lifts BM25-only by +2.2 nDCG points; deep adds another +2.6 nDCG on top, eliminating the previous deep-tier regression caused by unbounded reranker logits overriding RRF. Beats the published BEIR BM25 SciFact baseline (~0.665) at p95 < 25 ms on the fast tier.
### Harness C — Agent quality (20 NL queries on this repo)

| Tool | Tier | Correct | Accuracy | Median latency |
|---|---|---|---|---|
| lexa (Nomic) | auto | 16 / 20 | 0.80 | 11 ms |
| lexa (Nomic) | fast | 15 / 20 | 0.75 | 10 ms |
| `grep -rE` | external | 0 / 20 | 0.00 | 8 ms |
`auto` outperforms `fast` because the router sends single-identifier queries (`vec_quantize_binary`, `LexaDb::open`) straight to BM25-only `instant`, where exact-symbol lookups beat hybrid scoring.
### Harness D — Head-to-head against external CLIs

Wraps any external command (`grep`, `rg`, `qmd-cli`, ...) and runs the same query set. Reports per-tool latency and match rate against expected file paths. See `lexa-bench compare --help`.
### Harness E — SimpleQA-style LLM-as-judge
Mirrors Exa's evaluation methodology:
hand-curated factual questions, scored on the five-dim rubric (relevance,
authority, content_issues, evaluator_confidence, overall) in [0, 1].
Local-first: the judge is whatever's running in Ollama. A deterministic `--judge mock` backend exists for CI smoke runs that need to verify wiring without a model download.
### Reproducers

```sh
# Harness A — latency (writes JSON, gates CI)
# Harness A — Criterion (HTML reports under target/criterion/)
# Harness B — BEIR
# Harness C — agent (auto tier on the lexa repo)
# Harness D — head-to-head with grep
# Harness E — SimpleQA (mock judge for CI)
```
## Project layout

```
lexa/
├── Cargo.toml                    # workspace
├── README.md                     # this file
├── crates/
│   ├── lexa-core/                # library: chunking, embed, retrieval
│   ├── lexa-cli/                 # `lexa` binary
│   ├── lexa-mcp/                 # `lexa-mcp` rmcp stdio server
│   └── lexa-bench/               # `lexa-bench` — five harnesses
├── docs/
│   ├── ARCHITECTURE.md           # the design doc
│   ├── BENCHMARKS.md             # full benchmark methodology
│   └── adr/000–005-*.md          # one-page decisions
├── bench/
│   ├── agent/queries.json        # 20 NL queries against this repo
│   ├── agent/SKILL.md            # full agent-loop spec (Anthropic API)
│   └── simpleqa/questions.json   # SimpleQA seed set
├── bench-results/                # committed JSON artifacts
└── tests/fixtures/sample/        # tiny corpus for tests
```
## License
Dual-licensed under either of:
- Apache License, Version 2.0 ([LICENSE-APACHE] or https://www.apache.org/licenses/LICENSE-2.0)
- MIT license ([LICENSE-MIT] or https://opensource.org/licenses/MIT)
at your option.