Lexa

Local-first hybrid retrieval for your Obsidian vault and code. A single static Rust binary plus an MCP server, so Codex / Claude Desktop / Cursor / Claude Code can answer questions from your notes without anything leaving your machine.

Quick start

curl -fsSL https://raw.githubusercontent.com/rishiskhare/lexa/main/scripts/install.sh | sh
lexa-obsidian setup
# restart Codex / Claude Desktop / Cursor, then ask:
#   > what did I write about <topic>?

That's it. setup is interactive: it picks up your vault, optionally pre-indexes it, writes the right MCP config block into ~/.codex/config.toml (and Claude Desktop / Claude Code if you opt in), and drops an AGENTS.md in your vault root so agents route note questions through Lexa without the "Use lexa-obsidian." prefix.

Demo

> what did I write about the rate limiter redis fallback last quarter?

[
  {
    "path": "Daily/2026-04-30.md",
    "title": "Daily 2026-04-30",
    "score": 0.7141,
    "heading": "Followups",
    "excerpt": "redis down, switching to in-memory backoff in `acquire`. The reranker latency was acceptable at p95 of 261 ms.",
    "line_start": 12,
    "line_end": 18,
    "tags": ["daily", "project/lexa"],
    "breakdown": { "routed_to": "fast" }
  }
]

Want to try it without your own vault? Point Lexa at the bundled demo-vault/:

lexa-obsidian --vault ./demo-vault setup --no-codex --no-agents-md
lexa-obsidian --vault ./demo-vault --hash-embeddings search "reranker latency"

Features

Feature	What it gives you
Hybrid retrieval	BM25 (FTS5) + binary-quantized 768-d Matryoshka KNN (sqlite-vec) + cross-encoder rerank, fused with RRF (k=60).
Five tiers	`instant` / `dense` / `fast` / `deep` / `auto` — mirroring Exa's API. p50 ~9 ms on the fast tier with real Nomic v1.5-Q.
Auto-routing	Single-identifier queries → `instant` (BM25); long question-form → `deep`; default → `fast`. No "Use lexa-obsidian." needed.
Obsidian-native	Frontmatter stripped before embedding. Wiki-links, inline `#tags`, `^block-ids`, and `![[embeds]]` parsed into sidecar tables.
MCP-first	Eight tools — `search_notes`, `find_backlinks`, `list_tags`, `get_note`, `get_similar`, `index_vault`, `purge_vault`, `vault_status`.
Background indexing	The MCP server indexes in the background; content calls return `{indexing: true, ...}` while in-flight, so Codex never hangs.
Live re-indexing	Built-in `notify-debouncer-mini` watcher: notes you add, edit, or delete in Obsidian appear in (or vanish from) search within ~500 ms. Idempotent on every event.
One static binary	No daemon, no Python, no Docker. SQLite + `sqlite-vec` is the entire backend.
100% local	First run downloads Nomic + BGE ONNX (~390 MB). After that, zero network calls. No telemetry, no API keys.
`--offline` flag	Sets `HF_HUB_OFFLINE=1` so fastembed refuses every network fetch — hard offline guarantee after `models prefetch`.

Use cases

Want to...	Do this
Ask Codex / Claude / Cursor about your vault	`lexa-obsidian setup` and ask in plain English.
Search a code repo from the CLI	`lexa index ~/repo && lexa search "rate limiter fallback"`.
Wire MCP search into a custom agent	Use `lexa-obsidian-mcp` (rmcp stdio) or `lexa-mcp` (file-shaped).
Embed retrieval into your own Rust app	`lexa_obsidian::LexaObsidianDb::open(...)` — see the crate docs.
Try it without your own data	`lexa-obsidian --vault ./demo-vault setup --no-codex --no-agents-md`.

Subcommands

lexa-obsidian setup            # interactive bootstrap (most users only need this)
lexa-obsidian doctor           # diagnose every common failure mode
lexa-obsidian models prefetch  # download retrieval models (~390 MB) ahead of time
lexa-obsidian --vault <path> index
lexa-obsidian --vault <path> status
lexa-obsidian --vault <path> tags [--prefix X] [--limit N]
lexa-obsidian --vault <path> backlinks <note>
lexa-obsidian --vault <path> search <query> [--tier auto|fast|deep] [--tag X] [--folder Y] [--json]
lexa-obsidian --vault <path> watch

--vault falls back to LEXA_OBSIDIAN_VAULT. The DB defaults to ~/.lexa/obsidian-<sha-of-vault>.sqlite — two distinct vaults never share an index.

Other ways to install

# Cargo (any Rust target, including Linux ARM64):
cargo install lexa-obsidian

# Homebrew (macOS) — coming soon, file an issue if you want it sooner.

# Manual: grab a tarball from
# https://github.com/rishiskhare/lexa/releases/latest

Project layout

lexa/
├── crates/
│   ├── lexa-core/        # Library: chunking, embedding, hybrid retrieval, fusion.
│   ├── lexa-cli/         # `lexa` CLI for any file tree (code, docs).
│   ├── lexa-mcp/         # `lexa-mcp` rmcp stdio server (file-shaped tools).
│   ├── lexa-obsidian/    # `lexa-obsidian` CLI + `lexa-obsidian-mcp` server.
│   └── lexa-bench/       # Five reproducible benchmark harnesses.
├── docs/
│   ├── ARCHITECTURE.md   # Crate map, schema, retrieval pipeline.
│   ├── BENCHMARKS.md     # Latency / BEIR / agent / SimpleQA numbers.
│   ├── FAQ.md            # First-run latency, vault switching, uninstall, privacy.
│   ├── THREAT_MODEL.md   # Read-only-on-vault guarantee, network footprint.
│   └── adr/              # Six one-page architecture decision records.
├── bench/                # Hand-curated query sets for the agent + SimpleQA harnesses.
├── bench-results/        # Committed JSON artifacts so every published number is reproducible.
├── demo-vault/           # 6-note synthetic vault — try Lexa without your own data.
├── templates/AGENTS.md   # Dropped into your vault root by `setup`.
└── scripts/install.sh    # `curl | sh` installer.

Documentation

Architecture — docs/ARCHITECTURE.md
Benchmarks — docs/BENCHMARKS.md: real numbers with hardware, exact corpus URL, and the command that produces them.
FAQ — docs/FAQ.md: first-run delay, vault switching, uninstall, model swap, privacy verification with tcpdump.
Threat model — docs/THREAT_MODEL.md: what Lexa does, what it does not do, and how to verify each claim.
Decision log — docs/adr/: six ADRs covering name, storage, embeddings, chunking, search tiers, MCP posture, Obsidian adapter.

How Lexa maps to Exa

Lexa is local-first, but the architecture follows Exa's: tiered search, hybrid retrieval, Matryoshka prefixes, binary quantization, query-aware highlights.

Exa concept	Lexa equivalent
Instant tier (<200 ms, BM25)	`--tier instant` — FTS5 BM25 only, p50 ~250 µs.
Fast tier (neural / hybrid)	`--tier dense` (KNN-only) or `--tier fast` (hybrid + RRF), p50 ~9 ms.
Auto tier	`--tier auto` — query router in `classify_query`. Default.
Deep tier (agentic)	`--tier deep` + `SearchOptions::additional_queries` for `additionalQueries`-style fan-out.
Hybrid retrieval (BM25 + dense)	RRF (k=60) over FTS5 BM25 and binary-quantized vector KNN, run concurrently. See Exa: Composing a Search Engine.
Matryoshka prefix	Nomic v1.5-Q (768d, MRL-trained at `{64, 128, 256, 512, 768}`) — `vectors_bin_preview bit[256]` table. See Exa 2.0.
Binary quantization	`sqlite-vec`'s `vec_quantize_binary()` and `bit[N]` columns; SIMD Hamming distance.
Cross-encoder reranking	`BAAI/bge-reranker-base` over top-15 fused candidates, sigmoid-blended at α = 0.7 with the RRF score.
Highlights / contents API	`search.rs::highlight` — query-token-overlap-scored sentence span.
LLM-as-judge eval	`lexa-bench simpleqa` — Harness E. Five-dim rubric. Default judge `qwen3:8b` via local Ollama.

Privacy

Lexa runs entirely locally. The only outbound network call is the first-run download of two ONNX models (Nomic v1.5-Q ~110 MB, BGE reranker ~280 MB) from Hugging Face. After that — zero network. No telemetry, no analytics, no API keys.

For a hard offline guarantee, run lexa-obsidian models prefetch once on a connected machine, then use --offline (or LEXA_OFFLINE=1) to make fastembed refuse every network call. See docs/THREAT_MODEL.md for the verification recipe (tcpdump-style proof) and the full posture.

Development

git clone https://github.com/rishiskhare/lexa
cd lexa
cargo build --workspace --release
cargo test --workspace --release
cargo clippy --workspace --all-targets --release -- -D warnings
cargo fmt --all -- --check

47 tests across 11 suites, no unsafe outside the SQLite extension loader, all hot paths covered by Criterion benches in crates/lexa-bench/benches/.

Contributing

Issues and PRs welcome. The decision log under docs/adr/ is the best place to start if you want to understand the design before changing it. For larger changes, open an issue first so the architectural fit can be discussed.

License

Dual-licensed under either of:

Apache License, Version 2.0 (LICENSE-APACHE)
MIT license (LICENSE-MIT)

at your option.

lexa-core 0.1.1