Lexa
Local-first hybrid retrieval for your Obsidian vault and code. A single static Rust binary plus an MCP server, so Codex / Claude Desktop / Cursor / Claude Code can answer questions from your notes without anything leaving your machine.
Quick start
|
# restart Codex / Claude Desktop / Cursor, then ask:
# > what did I write about <topic>?
That's it. setup is interactive: it picks up your vault, optionally
pre-indexes it, writes the right MCP config block into
~/.codex/config.toml (and Claude Desktop / Claude Code if you opt
in), and drops an AGENTS.md in your vault root so agents route note
questions through Lexa without the "Use lexa-obsidian." prefix.
Demo
> what did I write about the rate limiter redis fallback last quarter?
Want to try it without your own vault? Point Lexa at the bundled
demo-vault/:
Features
| Feature | What it gives you |
|---|---|
| Hybrid retrieval | BM25 (FTS5) + binary-quantized 768-d Matryoshka KNN (sqlite-vec) + cross-encoder rerank, fused with RRF (k=60). |
| Five tiers | instant / dense / fast / deep / auto — mirroring Exa's API. p50 ~9 ms on the fast tier with real Nomic v1.5-Q. |
| Auto-routing | Single-identifier queries → instant (BM25); long question-form → deep; default → fast. No "Use lexa-obsidian." needed. |
| Obsidian-native | Frontmatter stripped before embedding. Wiki-links, inline #tags, ^block-ids, and ![[embeds]] parsed into sidecar tables. |
| MCP-first | Eight tools — search_notes, find_backlinks, list_tags, get_note, get_similar, index_vault, purge_vault, vault_status. |
| Background indexing | The MCP server indexes in the background; content calls return {indexing: true, ...} while in-flight, so Codex never hangs. |
| Live re-indexing | Built-in notify-debouncer-mini watcher: notes you add, edit, or delete in Obsidian appear in (or vanish from) search within ~500 ms. Idempotent on every event. |
| One static binary | No daemon, no Python, no Docker. SQLite + sqlite-vec is the entire backend. |
| 100% local | First run downloads Nomic + BGE ONNX (~390 MB). After that, zero network calls. No telemetry, no API keys. |
--offline flag |
Sets HF_HUB_OFFLINE=1 so fastembed refuses every network fetch — hard offline guarantee after models prefetch. |
Use cases
| Want to... | Do this |
|---|---|
| Ask Codex / Claude / Cursor about your vault | lexa-obsidian setup and ask in plain English. |
| Search a code repo from the CLI | lexa index ~/repo && lexa search "rate limiter fallback". |
| Wire MCP search into a custom agent | Use lexa-obsidian-mcp (rmcp stdio) or lexa-mcp (file-shaped). |
| Embed retrieval into your own Rust app | lexa_obsidian::LexaObsidianDb::open(...) — see the crate docs. |
| Try it without your own data | lexa-obsidian --vault ./demo-vault setup --no-codex --no-agents-md. |
Subcommands
lexa-obsidian setup # interactive bootstrap (most users only need this)
lexa-obsidian doctor # diagnose every common failure mode
lexa-obsidian models prefetch # download retrieval models (~390 MB) ahead of time
lexa-obsidian --vault <path> index
lexa-obsidian --vault <path> status
lexa-obsidian --vault <path> tags [--prefix X] [--limit N]
lexa-obsidian --vault <path> backlinks <note>
lexa-obsidian --vault <path> search <query> [--tier auto|fast|deep] [--tag X] [--folder Y] [--json]
lexa-obsidian --vault <path> watch
--vault falls back to LEXA_OBSIDIAN_VAULT. The DB defaults to
~/.lexa/obsidian-<sha-of-vault>.sqlite — two distinct vaults never
share an index.
Other ways to install
# Cargo (any Rust target, including Linux ARM64):
# Homebrew (macOS) — coming soon, file an issue if you want it sooner.
# Manual: grab a tarball from
# https://github.com/rishiskhare/lexa/releases/latest
Project layout
lexa/
├── crates/
│ ├── lexa-core/ # Library: chunking, embedding, hybrid retrieval, fusion.
│ ├── lexa-cli/ # `lexa` CLI for any file tree (code, docs).
│ ├── lexa-mcp/ # `lexa-mcp` rmcp stdio server (file-shaped tools).
│ ├── lexa-obsidian/ # `lexa-obsidian` CLI + `lexa-obsidian-mcp` server.
│ └── lexa-bench/ # Five reproducible benchmark harnesses.
├── docs/
│ ├── ARCHITECTURE.md # Crate map, schema, retrieval pipeline.
│ ├── BENCHMARKS.md # Latency / BEIR / agent / SimpleQA numbers.
│ ├── FAQ.md # First-run latency, vault switching, uninstall, privacy.
│ ├── THREAT_MODEL.md # Read-only-on-vault guarantee, network footprint.
│ └── adr/ # Six one-page architecture decision records.
├── bench/ # Hand-curated query sets for the agent + SimpleQA harnesses.
├── bench-results/ # Committed JSON artifacts so every published number is reproducible.
├── demo-vault/ # 6-note synthetic vault — try Lexa without your own data.
├── templates/AGENTS.md # Dropped into your vault root by `setup`.
└── scripts/install.sh # `curl | sh` installer.
Documentation
- Architecture —
docs/ARCHITECTURE.md - Benchmarks —
docs/BENCHMARKS.md: real numbers with hardware, exact corpus URL, and the command that produces them. - FAQ —
docs/FAQ.md: first-run delay, vault switching, uninstall, model swap, privacy verification withtcpdump. - Threat model —
docs/THREAT_MODEL.md: what Lexa does, what it does not do, and how to verify each claim. - Decision log —
docs/adr/: six ADRs covering name, storage, embeddings, chunking, search tiers, MCP posture, Obsidian adapter.
How Lexa maps to Exa
Lexa is local-first, but the architecture follows Exa's: tiered search, hybrid retrieval, Matryoshka prefixes, binary quantization, query-aware highlights.
| Exa concept | Lexa equivalent |
|---|---|
| Instant tier (<200 ms, BM25) | --tier instant — FTS5 BM25 only, p50 ~250 µs. |
| Fast tier (neural / hybrid) | --tier dense (KNN-only) or --tier fast (hybrid + RRF), p50 ~9 ms. |
| Auto tier | --tier auto — query router in classify_query. Default. |
| Deep tier (agentic) | --tier deep + SearchOptions::additional_queries for additionalQueries-style fan-out. |
| Hybrid retrieval (BM25 + dense) | RRF (k=60) over FTS5 BM25 and binary-quantized vector KNN, run concurrently. See Exa: Composing a Search Engine. |
| Matryoshka prefix | Nomic v1.5-Q (768d, MRL-trained at {64, 128, 256, 512, 768}) — vectors_bin_preview bit[256] table. See Exa 2.0. |
| Binary quantization | sqlite-vec's vec_quantize_binary() and bit[N] columns; SIMD Hamming distance. |
| Cross-encoder reranking | BAAI/bge-reranker-base over top-15 fused candidates, sigmoid-blended at α = 0.7 with the RRF score. |
| Highlights / contents API | search.rs::highlight — query-token-overlap-scored sentence span. |
| LLM-as-judge eval | lexa-bench simpleqa — Harness E. Five-dim rubric. Default judge qwen3:8b via local Ollama. |
Privacy
Lexa runs entirely locally. The only outbound network call is the first-run download of two ONNX models (Nomic v1.5-Q ~110 MB, BGE reranker ~280 MB) from Hugging Face. After that — zero network. No telemetry, no analytics, no API keys.
For a hard offline guarantee, run lexa-obsidian models prefetch once
on a connected machine, then use --offline (or LEXA_OFFLINE=1) to
make fastembed refuse every network call. See
docs/THREAT_MODEL.md for the verification
recipe (tcpdump-style proof) and the full posture.
Development
47 tests across 11 suites, no unsafe outside the SQLite extension
loader, all hot paths covered by Criterion benches in
crates/lexa-bench/benches/.
Contributing
Issues and PRs welcome. The decision log under
docs/adr/ is the best place to start if you want to
understand the design before changing it. For larger changes, open an
issue first so the architectural fit can be discussed.
License
Dual-licensed under either of:
- Apache License, Version 2.0 (
LICENSE-APACHE) - MIT license (
LICENSE-MIT)
at your option.