Matryoshka
Matryoshka is a Rust-first code-intelligence layer for coding agents.
It prewarms a repository into a SQLite-backed map of:
- files
- folders
- symbols
- import and dependency edges
- rich file, folder, and repo cards
- semantic search records for files, snippets, symbols, folders, and the repo
- SQLite FTS records and late-interaction token vectors for hybrid retrieval
The goal is simple: let an agent search for behavior and read rich summaries before it falls back to full-file reads.
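Late-interaction token vectors are usually scored ColBERT-style with "MaxSim": each query token takes its best match among the document's tokens, and those maxima are summed. The sketch below assumes each record stores a list of per-token vectors (ideally normalized, so a dot product equals cosine similarity). Matryoshka's actual vector layout and scoring may differ, and the function names are hypothetical.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// ColBERT-style MaxSim (hypothetical helper): for each query token vector,
/// take the best dot product against any document token vector, then sum.
fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    if doc.is_empty() {
        return 0.0; // no document tokens: nothing can match
    }
    query
        .iter()
        .map(|q| doc.iter().map(|d| dot(q, d)).fold(f32::NEG_INFINITY, f32::max))
        .sum()
}

fn main() {
    let query = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let doc = vec![vec![1.0, 0.0], vec![0.5, 0.5]];
    println!("maxsim = {}", max_sim(&query, &doc));
}
```

Because each query token is matched independently, a record can score well when different parts of it answer different parts of the query, which complements a single pooled embedding.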
What It Does
Matryoshka currently ships these core workflows:
- `index`: Parse a repository, resolve structural relationships, generate rich cards, build semantic records, and persist everything into SQLite.
- `update`: Re-run the pipeline incrementally against a changed repository and refresh affected facts, cards, and semantic records.
- `watch`: Poll a repository, run a startup freshness update, debounce changes, and trigger `update` automatically. It can run in the foreground or as a daemon.
- `prewarm`: Rebuild FTS, warm retrieval paths, optionally ensure freshness, and optionally start the watcher.
- `search`: Run hybrid retrieval over persisted semantic records using embeddings plus FTS, lexical, symbol/path, late-interaction, ownership, intent, graph, and structural boosts.
- `op`: Run task-shaped search for agent operations such as `find-symbol`, `edit-target`, `trace-dependency`, `architecture`, and `tests-for`.
- `read`: Return a rich file card with folder context, interpreted imports, dependents, blast radius, and selected snippets.
- `read-bundle`: Search, pick a primary file, select related files, and return packed read context in `brief`, `edit`, or `flow` mode.
- `rebuild-semantic`: Rebuild the semantic search layer from already-persisted facts and cards without reparsing or re-enriching the whole repository.
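One common way to merge several ranked lists (for example an embedding ranking and an FTS ranking) is reciprocal rank fusion (RRF), followed by additive boosts. This README does not document Matryoshka's actual fusion formula or boost weights, so treat the sketch below as an illustration of the general technique. `rrf`, `fuse`, and the boost map are hypothetical names.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each ranked list contributes 1 / (k + rank)
/// for every id it contains (rank starting at 1); contributions are summed.
fn rrf(rankings: &[Vec<&str>], k: f64) -> HashMap<String, f64> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in rankings {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + i as f64 + 1.0);
        }
    }
    scores
}

/// Add hypothetical per-record boosts (standing in for symbol/path or graph
/// signals), then sort best-first with an id tie-break for deterministic output.
fn fuse(rankings: &[Vec<&str>], boosts: &HashMap<&str, f64>, k: f64) -> Vec<(String, f64)> {
    let mut fused: Vec<(String, f64)> = rrf(rankings, k)
        .into_iter()
        .map(|(id, s)| {
            let b = boosts.get(id.as_str()).copied().unwrap_or(0.0);
            (id, s + b)
        })
        .collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    let embedding = vec!["src/a.rs", "src/b.rs", "src/c.rs"];
    let fts = vec!["src/b.rs", "src/d.rs"];
    let boosts = HashMap::new();
    for (id, score) in fuse(&[embedding, fts], &boosts, 60.0) {
        println!("{id}\t{score:.5}");
    }
}
```

RRF only uses ranks, so it sidesteps the problem that cosine similarities and FTS scores live on different scales. A record that appears in both lists (`src/b.rs` above) rises to the top.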
Architecture
The workspace is organized around focused Rust crates:
- `core-ir`: Shared facts, cards, semantic records, provenance, and API types.
- `parser`: Source walking and symbol/import/snippet extraction.
- `resolver`: Folder graph construction, import resolution, and dependency edge creation.
- `store-sqlite`: Canonical persistence for facts, cards, semantic records, and invalidation.
- `enricher`: Rich file, folder, and repo card generation through MLX chat or heuristic fallback.
- `embed-client`: OpenAI-compatible embeddings client plus deterministic offline embedder.
- `indexer`: Full prewarm, incremental refresh, and semantic repair orchestration.
- `search`: Hybrid semantic search and reranking.
- `read-api`: `read` and `read-bundle` assembly.
- `watcher`: Polling and debounce-based repo change detection.
- `cli`: The `matryoshka-rs` command surface.
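The `watcher` crate pairs polling with debouncing so that a burst of edits triggers one `update` rather than many. Below is a minimal sketch of that debounce rule, assuming the quiet window is measured from the most recent detected change. The `Debouncer` type and its methods are hypothetical and not the crate's actual API.

```rust
use std::time::{Duration, Instant};

/// Hypothetical debouncer: changes seen by polling are coalesced, and an
/// update fires once the repo has been quiet for `quiet`.
struct Debouncer {
    quiet: Duration,
    last_change: Option<Instant>, // most recent unprocessed change, if any
}

impl Debouncer {
    fn new(quiet: Duration) -> Self {
        Self { quiet, last_change: None }
    }

    /// Record that a poll detected changes at `now` (restarts the quiet window).
    fn on_change(&mut self, now: Instant) {
        self.last_change = Some(now);
    }

    /// True exactly once per burst, when the quiet window has fully elapsed.
    fn should_update(&mut self, now: Instant) -> bool {
        match self.last_change {
            Some(last) if now.duration_since(last) >= self.quiet => {
                self.last_change = None; // burst handled; wait for new changes
                true
            }
            _ => false,
        }
    }
}

fn main() {
    let t0 = Instant::now();
    let mut d = Debouncer::new(Duration::from_millis(500));
    d.on_change(t0);
    d.on_change(t0 + Duration::from_millis(100)); // second edit in the same burst
    for ms in [400u64, 700, 2000] {
        let fire = d.should_update(t0 + Duration::from_millis(ms));
        println!("t+{ms}ms -> trigger update: {fire}");
    }
}
```

Passing `now` explicitly keeps the rule deterministic and testable; a real poll loop would call it with `Instant::now()` on each tick.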
Storage Model
SQLite is the source of truth.
Persisted tables include:
- structural facts
  - files
  - folders
  - symbols
  - edges
- enriched artifacts
  - file cards
  - folder cards
  - repo card
- retrieval layer
  - semantic records
  - FTS records
  - late-interaction vectors
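The three layers are what make `rebuild-semantic` cheap: the retrieval layer is derived from persisted cards, so it can be dropped and regenerated without reparsing or re-enriching. The in-memory model below sketches that dependency. `Store`, its fields, and the record text format are hypothetical stand-ins, not the actual `core-ir` or `store-sqlite` types.

```rust
/// Hypothetical in-memory model of the three persisted layers.
#[derive(Default)]
struct Store {
    // structural facts (parser + resolver output)
    files: Vec<String>,
    // enriched artifacts (enricher output): (path, summary)
    file_cards: Vec<(String, String)>,
    // retrieval layer (derived data)
    semantic_records: Vec<String>,
    fts_rows: Vec<String>,
}

impl Store {
    /// Semantic-only rebuild: clear and regenerate the retrieval layer from
    /// persisted cards, leaving facts and cards untouched.
    fn rebuild_semantic(&mut self) {
        self.semantic_records.clear();
        self.fts_rows.clear();
        for (path, summary) in &self.file_cards {
            let text = format!("{path}: {summary}");
            self.semantic_records.push(text.clone());
            self.fts_rows.push(text);
        }
    }
}

fn main() {
    let mut store = Store {
        files: vec!["src/lib.rs".to_string()],
        file_cards: vec![("src/lib.rs".to_string(), "Crate root".to_string())],
        ..Default::default()
    };
    store.semantic_records.push("stale".to_string()); // e.g. left by a failed run
    store.rebuild_semantic();
    println!("{} file(s), records: {:?}", store.files.len(), store.semantic_records);
}
```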
By default, Matryoshka stores operational state under:
`<repo>/.matryoshka/`
The default SQLite DB is:
`<repo>/.matryoshka/matryoshka.db`
The watcher PID file and the JSONL logs live next to it:
- `<repo>/.matryoshka/watch.pid`
- `<repo>/.matryoshka/logs/index.jsonl`
- `<repo>/.matryoshka/logs/update.jsonl`
- `<repo>/.matryoshka/logs/prewarm.jsonl`
- `<repo>/.matryoshka/logs/semantic-rebuild.jsonl`
- `<repo>/.matryoshka/logs/watch.jsonl`
- `<repo>/.matryoshka/logs/watch.stdout.jsonl`
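Tools that inspect a Matryoshka-managed repo can derive every default location from the repo root. The helper below mirrors the layout listed above using only `std::path`. Only the paths come from this README; `StatePaths` and its methods are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper for the default operational-state layout.
struct StatePaths {
    root: PathBuf, // <repo>/.matryoshka
}

impl StatePaths {
    fn new(repo: &Path) -> Self {
        Self { root: repo.join(".matryoshka") }
    }

    /// Default SQLite database.
    fn db(&self) -> PathBuf {
        self.root.join("matryoshka.db")
    }

    /// Watcher daemon PID file.
    fn watch_pid(&self) -> PathBuf {
        self.root.join("watch.pid")
    }

    /// JSONL log by stem, e.g. "update" -> logs/update.jsonl.
    fn log(&self, name: &str) -> PathBuf {
        self.root.join("logs").join(format!("{name}.jsonl"))
    }
}

fn main() {
    let paths = StatePaths::new(Path::new("/work/my-repo"));
    println!("{}", paths.db().display());
    println!("{}", paths.watch_pid().display());
    println!("{}", paths.log("semantic-rebuild").display());
}
```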
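Because the retrieval layer is persisted separately from structural facts and enriched cards, semantic search can be rebuilt independently with `rebuild-semantic` when embeddings or late pipeline stages fail.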
Command summaries and JSONL logs now include artifact health: file and folder card summary coverage, sample paths with empty summaries, FTS row counts, embedded record counts, and late-interaction vector counts. If search results look too generic, check these logs first.
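Summary coverage and empty-summary samples are simple to compute. The sketch below assumes coverage means "fraction of cards whose summary is non-blank" and caps the number of sample paths. `CardHealth` and `card_health` are hypothetical names, and the real log fields may be defined differently.

```rust
/// Hypothetical card-health summary, modeled on the fields the logs report.
struct CardHealth {
    total: usize,
    with_summary: usize,
    empty_samples: Vec<String>, // up to `max_samples` paths with blank summaries
}

impl CardHealth {
    /// Fraction of cards with a non-blank summary; None when there are no cards.
    fn coverage(&self) -> Option<f64> {
        if self.total == 0 {
            None
        } else {
            Some(self.with_summary as f64 / self.total as f64)
        }
    }
}

/// Count cards with non-blank summaries and collect a few empty examples.
fn card_health(cards: &[(&str, &str)], max_samples: usize) -> CardHealth {
    let mut h = CardHealth { total: cards.len(), with_summary: 0, empty_samples: Vec::new() };
    for (path, summary) in cards {
        if summary.trim().is_empty() {
            if h.empty_samples.len() < max_samples {
                h.empty_samples.push(path.to_string());
            }
        } else {
            h.with_summary += 1;
        }
    }
    h
}

fn main() {
    let cards = [("src/a.rs", "Parses config"), ("src/b.rs", ""), ("src/c.rs", "  ")];
    let h = card_health(&cards, 5);
    println!("coverage={:?} empty={:?}", h.coverage(), h.empty_samples);
}
```

A low coverage value with many empty samples points at the enrichment stage rather than at search ranking.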
Local MLX Defaults
The non-offline path expects a local OpenAI-compatible MLX server:
- base URL: `http://127.0.0.1:44445`
- API key: `2508`
- embeddings model: `mlx-community--embeddinggemma-300m-bf16`
- chat model: `MercuriusDream--Qwen3.5-4B-MLX-mxfp8`
- oMLX reranker model: `mlx-community--Qwen3-Reranker-0.6B-mxfp8`
Chat enrichment disables thinking by default.
You can override the chat and embedding models per command with:
- `--model <chat-model>`
- `--embedding-model <embedding-model>`
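Conceptually, each command starts from the defaults above and applies any per-command flags on top. The struct below holds the documented default values and applies optional `--model` / `--embedding-model` overrides. `MlxConfig` and `with_overrides` are hypothetical and not the CLI's actual types.

```rust
/// Hypothetical config holding the documented MLX defaults.
#[derive(Debug, Clone)]
struct MlxConfig {
    base_url: String,
    api_key: String,
    embedding_model: String,
    chat_model: String,
    reranker_model: String,
}

impl Default for MlxConfig {
    fn default() -> Self {
        Self {
            base_url: "http://127.0.0.1:44445".to_string(),
            api_key: "2508".to_string(),
            embedding_model: "mlx-community--embeddinggemma-300m-bf16".to_string(),
            chat_model: "MercuriusDream--Qwen3.5-4B-MLX-mxfp8".to_string(),
            reranker_model: "mlx-community--Qwen3-Reranker-0.6B-mxfp8".to_string(),
        }
    }
}

impl MlxConfig {
    /// Apply per-command overrides; `None` keeps the default.
    fn with_overrides(mut self, chat: Option<&str>, embedding: Option<&str>) -> Self {
        if let Some(m) = chat {
            self.chat_model = m.to_string();
        }
        if let Some(m) = embedding {
            self.embedding_model = m.to_string();
        }
        self
    }
}

fn main() {
    // Equivalent in spirit to passing only `--model my-local-chat`.
    let cfg = MlxConfig::default().with_overrides(Some("my-local-chat"), None);
    println!("{cfg:#?}");
}
```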
Quick Start
Examples below assume `matryoshka-rs` is on your `PATH` or exported from
`target/debug`. From this repo, prefix commands with:
Index a repo:
Prewarm retrieval:
Search it:
Read a file card:
Read a packed context bundle:
Repair only the semantic layer:
Docs
- See `usage.md` for commands, flags, and examples.
- See `crates/README.md` for crate-local notes.