# Lexa — local Exa
Hybrid retrieval over your local files and code, in a single static Rust binary. Lexa applies the architecture of Exa — five latency-tiered search modes, hybrid BM25 + dense + RRF, two-stage Matryoshka KNN, binary-quantized vectors, query-aware highlights, deep reranking with optional query expansion, LLM-as-judge evaluation — to the corpus already on your disk.
```
crates/api/src/limiter.rs:48-72  0.7141
if !backend.is_healthy().await {
    tracing::warn!("redis down, switching to in-memory backoff");
    return self.fallback.acquire(key).await;
}
```
## Highlights

- Single static binary, no daemon, no Python, no Docker. SQLite (with FTS5 and sqlite-vec) is the entire backend.
- Sub-10 ms `fast` tier on real Nomic-v1.5 embeddings (M-series warm-state, 2 000 docs, 500 iterations) — 38× faster than the published Exa Fast latency budget.
- Five search tiers — `instant`, `dense`, `fast`, `deep`, `auto` — mirroring Exa's tiered API.
- Two-stage Matryoshka KNN (256-bit preview → 768-bit re-score), the same way Exa runs prefix-256 over their 4096-dim embeddings.
- Deep tier with query expansion (`additional_queries`) and a sigmoid-blended cross-encoder reranker that fixes the override-RRF failure mode.
- Query-aware highlights — sentence-level span extraction, the same idea behind Exa's contents API "highlights".
- Five reproducible benchmark harnesses, full-methodology JSON artifacts, CI gate. See `docs/BENCHMARKS.md`.
- MCP server (`lexa-mcp`) over stdio so any Anthropic-MCP client (Claude Desktop, Claude Code, Cursor, etc.) gets `search_files`, `index_path`, `purge_path`, and friends for free.
## How Lexa maps to Exa
| Exa concept | Lexa equivalent |
|---|---|
| Instant tier (<200 ms, BM25) | `lexa search --tier instant` — FTS5 BM25, p50 ~250 µs. |
| Fast tier (~350 ms, neural) | `lexa search --tier dense` (KNN-only) or `--tier fast` (hybrid). p50 ~9 ms. |
| Auto tier (~1 s, intelligent) | `lexa search --tier auto` — query router in `classify_query`. Default tier. |
| Deep tier (5-60 s, agentic) | `lexa search --tier deep` + `SearchOptions::additional_queries` for `additionalQueries`-style fan-out. |
| Hybrid retrieval (BM25 + dense) | RRF (k=60) over FTS5 BM25 and binary-quantized vector KNN, run concurrently. See Exa: Composing a Search Engine. |
| BM25 optimizations | FTS5's built-in BM25 implementation; OR-of-quoted-tokens query construction with a curated stopword set. (Lexa doesn't reimplement Exa's six posting-list compression tricks — local corpora don't justify them.) |
| Matryoshka prefix | Nomic v1.5-Q (768d, MRL-trained at {64, 128, 256, 512, 768}); `vectors_bin_preview` `bit[256]` table for first-stage KNN. See Exa 2.0: building a web-scale vector DB. |
| Binary quantization | sqlite-vec's `vec_quantize_binary()` and `bit[N]` columns; Hamming distance via SIMD intrinsics. 32× storage shrink. |
| Cross-encoder reranking | BAAI/bge-reranker-base over top-15 fused candidates, sigmoid-blended at α = 0.7 with the RRF score. |
| Highlights / contents API | `search.rs::highlight` — query-token-overlap-scored sentence span, ~10× LLM-token reduction vs full chunks. |
| `additionalQueries` | `SearchOptions::additional_queries: Vec<String>`; the deep tier fans out N+1 queries, RRF-fuses them, then reranks. The bench harness includes an Ollama-backed reformulation helper. |
| LLM-as-judge eval (5-dim rubric) | `lexa-bench simpleqa` — Harness E. Scores relevance, authority, content_issues, evaluator_confidence, overall in [0, 1]. Default judge is local Ollama running qwen3:8b. See Exa: Evaluating Search. |
## What Lexa doesn't clone
- Crawl freshness — Lexa indexes static local trees, not the web.
- Websets-scale entity finding — billions of records / async enrichment pipelines aren't a single-binary local feature.
- Authority / domain reputation signals — those are web-graph specific.
The local-first tradeoff is what makes the latency budget viable. Exa Fast targets <500 ms because it's reaching across a planet-scale index; Lexa Fast hits 9 ms because everything is in SQLite next to your CPU.
## Install
Or run from a clone:
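A from-clone build with the standard Cargo workflow might look like this (the repository URL is not given in this README, and the exact commands are assumptions based on the workspace layout):

```sh
git clone <repo-url> lexa && cd lexa     # repository URL not given here
cargo install --path crates/lexa-cli     # installs the `lexa` binary
cargo install --path crates/lexa-mcp     # optional: the MCP stdio server
```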
The first time you run a real-embedding command, fastembed downloads the Nomic v1.5-Q ONNX (~110 MB) and the BGE-reranker-base ONNX (~280 MB) into `./.fastembed_cache/`. Subsequent runs reuse the cache.
## CLI

```
lexa index <path> [--db <path>]
lexa search <query> [--tier instant|dense|fast|deep|auto] [--limit N] [--json] [--db <path>]
lexa purge <path> [--db <path>]
lexa status [--db <path>]
lexa watch <path> [--db <path>]
```
Default DB is `~/.lexa/index.sqlite`. `--hash-embeddings` swaps to the deterministic FNV-1a hash backend for tests / offline runs.

`--json` produces a stable JSON shape with `path`, `line_start`, `line_end`, `score`, `excerpt`, and a `breakdown` object exposing the RRF inputs, rerank score (deep only), and the routed tier (auto only).
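As an illustration of that shape, with invented values and assumed `breakdown` key names (the text names the object but not its keys):

```json
{
  "path": "crates/api/src/limiter.rs",
  "line_start": 48,
  "line_end": 72,
  "score": 0.7141,
  "excerpt": "if !backend.is_healthy().await { ... }",
  "breakdown": {
    "bm25_rank": 1,
    "knn_rank": 3,
    "rrf": 0.0325,
    "rerank": null,
    "routed_tier": "fast"
  }
}
```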
## MCP server
Add to your MCP client config (Claude Desktop / Claude Code / Cursor):
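A minimal stdio entry, assuming the Claude-Desktop-style `mcpServers` key and the `lexa-mcp` binary on `PATH` (key names vary slightly by client):

```json
{
  "mcpServers": {
    "lexa": {
      "command": "lexa-mcp",
      "args": []
    }
  }
}
```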
Tools:

- `search_files(query, tier?, limit?)`
- `index_path(path)`
- `list_indexed_paths()`
- `purge_path(path)`
- `status()`
stderr is the only log channel; stdout is reserved for the JSON-RPC stream so the protocol stays clean.
## Library

Add `lexa-core` to your `Cargo.toml`:

```toml
[dependencies]
lexa-core = "0.1"
```

```rust
use lexa_core::{LexaDb, SearchOptions};

// The path and query arguments below are illustrative.
let mut db = LexaDb::open("~/.lexa/index.sqlite")?;
db.index_path("./src")?;
let hits = db.search("two-stage matryoshka knn", SearchOptions::default())?;
for hit in hits {
    println!("{}:{}-{}  {:.4}", hit.path, hit.line_start, hit.line_end, hit.score);
}
```
The default `SearchOptions` uses the `auto` tier and limit 10. Set `tier: SearchTier::Deep` and populate `additional_queries` for Exa-style multi-query deep search.
## Lexa for Obsidian — 60 seconds to a working setup
Ask your Obsidian vault questions through Codex, Claude Desktop, Cursor, Claude Code, or any MCP client. Local-first — your notes never leave your machine, no API keys, no cloud round-trips.
`lexa-obsidian setup` is interactive: it asks for your vault path, optionally pre-indexes (recommended for >1 000-note vaults), writes the right MCP config block into `~/.codex/config.toml` (and Claude Desktop / Claude Code if you opt in), and drops an `AGENTS.md` in your vault root so agents route note questions through Lexa without having to be prompted with "Use lexa-obsidian."
Restart your MCP client and try:
> what did I write about <some topic>?
> list my top 10 tags
> show me backlinks for "<some note name>"
> find notes similar to "<some note>"
The agent picks the right tool from the natural-language phrasing.
### What the AI gets

| When you ask | The AI calls | What it returns |
|---|---|---|
| "What did I write about X?" | `search_notes` | Top notes ranked by hybrid (BM25 + dense + reranker) score, with title, path, line range, headline excerpt, tags, and the routed tier. |
| "Show me my note titled Y" | `get_note` | Frontmatter + body + outgoing/incoming wiki-links + tags. Optionally a single block by `^id`. |
| "What links to Y?" | `find_backlinks` | Every linking note with the alias / header / block id used. |
| "Find notes similar to Y" | `get_similar` | Semantic neighbours of the seed note (excluding itself). |
| "What tags do I use most?" | `list_tags` | Top tags by usage, optional prefix filter. |
| "Re-index" / "drop the index" | `index_vault` / `purge_vault` | Maintenance. |
### Subcommands

```
lexa-obsidian setup             # interactive bootstrap (most users only need this)
lexa-obsidian doctor            # diagnose every common failure mode
lexa-obsidian models prefetch   # download retrieval models (~390 MB) ahead of time
lexa-obsidian --vault <path> index
lexa-obsidian --vault <path> status
lexa-obsidian --vault <path> tags [--prefix X] [--limit N]
lexa-obsidian --vault <path> backlinks <note>
lexa-obsidian --vault <path> search <query> [--tier auto|fast|deep] [--tag X] [--folder Y] [--json]
lexa-obsidian --vault <path> watch
```
`--vault` falls back to `LEXA_OBSIDIAN_VAULT`. The DB path defaults to `~/.lexa/obsidian-<sha-of-vault>.sqlite` so two distinct vaults never share an index.
### What gets parsed

- Frontmatter — `title:`, `aliases:`, `tags:`, plus arbitrary custom fields preserved in `note_metadata.raw_json`. Stripped before embedding so it doesn't pollute the vector representation.
- Wiki-links — `[[Note]]`, `[[Note|Alias]]`, `[[Note#Header]]`, `[[Note^block-id]]`, `![[Embed]]`. Stored in `note_links`; backlinks are a single SQL JOIN.
- Tags — frontmatter `tags:` (string, list, or comma-string) plus inline `#tag` (including nested `#project/lexa`). Lowercase-normalised. Code fences and heading lines are correctly skipped.
- Block ids — trailing `^block-id` markers persist into `note_blocks` and are queryable through `get_note { block: "^abc" }`.
### Schema (sidecar tables in the same SQLite file)

```
note_metadata (doc_id PK, title, aliases_json, raw_json)
note_links    (id PK, src_doc_id, target_name, target_path, header, block_id, alias, kind)
note_tags     (doc_id, tag, PRIMARY KEY(doc_id, tag))
note_blocks   (chunk_id PK, doc_id, block_id)
```
`ON DELETE CASCADE` rides on `documents.id`, so purging a path cleans the sidecars automatically.
### Indexing UX
Lexa indexes in the background as soon as the MCP server starts.
While indexing is in flight, content-bearing tool calls (`search_notes`, `get_note`, `get_similar`, `find_backlinks`) return a fast `{indexing: true, notes_seen, elapsed_seconds}` payload instead of blocking, so Codex never appears hung. For large vaults (>1 000 notes), running `lexa-obsidian setup` once with the pre-index step (or `lexa-obsidian index` ahead of time) eliminates the wait entirely.
### Privacy + threat model

- 100 % local. Network calls: model downloads on first run (Nomic v1.5 ONNX ~110 MB, BGE reranker ~280 MB), nothing after. No telemetry, no analytics, no API keys.
- Read-only on your vault. The MCP server does not create, edit, or delete notes.
- The MCP server only spawns the `lexa-obsidian-mcp` binary itself, never user-supplied subprocesses.
- Verify yourself: `tcpdump -i any host huggingface.co` for ten minutes of usage shows zero traffic after the model cache is hot.
For more, see `docs/FAQ.md` and `docs/adr/006-obsidian.md`.
## Benchmarks
Five harnesses, fully reproducible. Numbers below are warm-state on M-series macOS arm64, release build, real Nomic v1.5-Q. See `docs/BENCHMARKS.md` for hardware details, full methodology, and the date each number was measured.
### Harness A — Latency
2 000 synthetic Markdown docs, 500 iterations / tier, fixed query set:
| Tier | p50 | p95 | p99 |
|---|---|---|---|
| instant | 245 µs | 840 µs | 861 µs |
| dense | 8.97 ms | 9.82 ms | 10.20 ms |
| fast | 9.00 ms | 9.92 ms | 10.19 ms |
| deep | 261 ms | 298 ms | 313 ms |
Pairs with a Criterion bench (`cargo bench -p lexa-bench --bench latency`) and a CI gate that fails if fast-tier p50 > 400 ms on shared GitHub Actions runners.
### Harness B — BEIR retrieval quality (SciFact, 100 queries)
| Tier | nDCG@10 | MRR@10 | Recall@100 | p50 | p95 |
|---|---|---|---|---|---|
| instant | 0.6560 | 0.6184 | 0.8680 | 3 ms | 5 ms |
| fast | 0.6778 | 0.6395 | 0.8980 | 17 ms | 22 ms |
| deep | 0.7042 | 0.6674 | 0.8360 | 2.57 s | 2.81 s |
Hybrid lifts BM25-only by +2.2 nDCG points; deep adds another +2.6 nDCG on top, eliminating the previous deep-tier regression caused by unbounded reranker logits overriding RRF. Beats the published BEIR BM25 SciFact baseline (~0.665) at p95 < 25 ms on the fast tier.
### Harness C — Agent quality (20 NL queries on this repo)

| Tool | Tier | Correct | Accuracy | Median latency |
|---|---|---|---|---|
| lexa (Nomic) | auto | 16 / 20 | 0.80 | 11 ms |
| lexa (Nomic) | fast | 15 / 20 | 0.75 | 10 ms |
| `grep -rE` | external | 0 / 20 | 0.00 | 8 ms |
`auto` outperforms `fast` because the router sends single-identifier queries (`vec_quantize_binary`, `LexaDb::open`) straight to BM25-only `instant`, where exact-symbol lookups beat hybrid scoring.
### Harness D — Head-to-head against external CLIs

Wraps any external command (`grep`, `rg`, `qmd-cli`, ...) and runs the same query set. Reports per-tool latency and match rate against expected file paths. See `lexa-bench compare --help`.
### Harness E — SimpleQA-style LLM-as-judge
Mirrors Exa's evaluation methodology:
hand-curated factual questions, scored on the five-dim rubric (relevance,
authority, content_issues, evaluator_confidence, overall) in [0, 1].
Local-first: the judge is whatever's running in Ollama. A deterministic `--judge mock` backend exists for CI smoke runs that need to verify wiring without a model download.
### Reproducers

```sh
# Harness A — latency (writes JSON, gates CI)
# Harness A — Criterion (HTML reports under target/criterion/)
# Harness B — BEIR
# Harness C — agent (auto tier on the lexa repo)
# Harness D — head-to-head with grep
# Harness E — SimpleQA (mock judge for CI)
```
## Project layout

```
lexa/
├── Cargo.toml                    # workspace
├── README.md                     # this file
├── crates/
│   ├── lexa-core/                # library: chunking, embed, retrieval
│   ├── lexa-cli/                 # `lexa` binary
│   ├── lexa-mcp/                 # `lexa-mcp` rmcp stdio server
│   └── lexa-bench/               # `lexa-bench` — five harnesses
├── docs/
│   ├── ARCHITECTURE.md           # the design doc
│   ├── BENCHMARKS.md             # full benchmark methodology
│   └── adr/000–005-*.md          # one-page decisions
├── bench/
│   ├── agent/queries.json        # 20 NL queries against this repo
│   ├── agent/SKILL.md            # full agent-loop spec (Anthropic API)
│   └── simpleqa/questions.json   # SimpleQA seed set
├── bench-results/                # committed JSON artifacts
└── tests/fixtures/sample/        # tiny corpus for tests
```
## License
Dual-licensed under either of:
- Apache License, Version 2.0 ([LICENSE-APACHE] or https://www.apache.org/licenses/LICENSE-2.0)
- MIT license ([LICENSE-MIT] or https://opensource.org/licenses/MIT)
at your option.