basemind 0.2.5

Full AI context layer over MCP — tree-sitter code-map, document RAG (PDF/Office/HTML/email + OCR + reranker), shared agent memory, on-demand web crawl, git history + blame + per-symbol diff. 300+ languages, 8 coding-agent harnesses, content-addressed Fjall + LanceDB.
docs.rs failed to build basemind-0.2.5
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

basemind

The full context layer for coding agents. One MCP server that turns any repository into a queryable code map, document library, and shared memory — so agents navigate by structure and search instead of burning context on grep and file reads.

Code map & search across 300+ languages · document processing for 90+ file formats · semantic + full-text search · git history & blame · shared agent memory · on-demand web crawl

crates.io npm PyPI CI License: MIT

Pillars · Tools · Quickstart · Performance · Install


The four pillars

Code — Tree-sitter outlines, symbol search, reference + caller + implementation graphs, call chains, git history per symbol, blame at symbol-level resolution.

Documents — Ingest + semantic search over PDFs, Office (Word/Excel/iWork), HTML, email, archives. Built-in OCR, layout detection, keyword + NER extraction, cross-encoder reranking. All ONNX bundled — no system install needed.

Memory — Per-repo scoped key-value + semantic vector storage. Clones of the same git origin automatically share memory; unrelated repos isolated.

Web — On-demand HTTP scrape + follow-link crawl. Pages chunk, embed, and land in the documents store under scope web:<host> for unified search.


Context economy

basemind tools return paths, line numbers, and signatures — not file bodies — so a structural answer costs a fraction of the tokens of reading source. The plugin ships this as the agent's default operating discipline (carried in the MCP server instructions, the basemind skill, and the SessionStart hook):

  • outline a file before opening it — then read only the span you need.
  • search_symbols instead of grep/rg for a definition.
  • find_references / find_callers instead of grepping call sites.
  • workspace_grep instead of shelling out to ripgrep for regex over content.
  • rescan after edits instead of reconnecting the server.
  • Don't re-read a file basemind already mapped.

The plugin also ships a PreToolUse guard hook that reaches the agent at the moment it reaches for search: by default (BASEMIND_GUARD=nudge) it points Grep/Glob calls at the matching basemind tool, once per session. Set BASEMIND_GUARD=redirect to enforce it (the call is blocked with a pointer to the basemind tool) or BASEMIND_GUARD=off to disable.

The live statusline surfaces the payoff: estimated tokens saved vs a grep + read baseline.


Feature table

Pillar What it does MCP tools Backend
Code intelligence Outlines, symbol search, refs/callers/callees, call graphs, impl lookup, dependents, in-tree regex outline, search_symbols, workspace_grep, find_references, find_callers, call_graph, find_implementations, dependents, list_files, status, repo_info tree-sitter × 300+ langs · Fjall LSM index · content-addressed blob store
Git intelligence Symbol-level history, blame, churn, recent changes, structural diffs across revs symbol_history, blame_file, blame_symbol, hot_files, recent_changes, commits_touching, find_commits_by_path, diff_outline, diff_file, working_tree_status gix + sha-keyed disk cache
Document RAG Ingest + semantic search over 90+ file formats — PDFs, Office (Excel/Word/HWP/iWork), HTML, XML, email, archives, images. Adds OCR (Tesseract + PaddleOCR), cross-encoder reranker, keyword extraction (YAKE/RAKE), NER (gline-rs ONNX + LLM), extractive + abstractive summarization, layout detection, page auto-rotate, redaction, language detection. All ONNX models bundled — no system install needed. search_documents kreuzberg + LanceDB
Shared memory Per-repo scoped key-value + semantic memory. Clones of the same git origin URL automatically share memory; unrelated repos isolated. memory_put, memory_get, memory_list, memory_search, memory_delete LanceDB + Fjall, scope-keyed
Web crawl On-demand HTTP scrape + link-following crawl. Crawled pages route through the documents pipeline (chunk → embed → LanceDB) under scope web:<host>. web_scrape, web_crawl, web_map kreuzcrawl (native HTTP, no chromium)
Admin Live rescan + telemetry dashboard rescan, telemetry_summary

Quickstart

Claude Code

These are two separate steps — run both, in order:

/plugin marketplace add Goldziher/basemind   # 1. register the marketplace (makes the plugin available)
/plugin install basemind@basemind            # 2. install the plugin (registers the MCP server)

Adding the marketplace alone does not give you any tools — it only makes the plugin available to install. You must run the second command (or pick Install for the basemind plugin in the /plugin menu) to register the MCP server. If no basemind tools appear after a restart, you almost certainly stopped after step 1; open /plugin, go into the basemind marketplace, and Install the plugin.

Restart the session after installing. The basemind binary installs automatically on first use (via npx, uvx, or direct download with checksum verification) — no manual cargo install needed.

Statusline

To enable the live statusline, run /bm-statusline once. This is a one-time opt-in because Claude Code plugins cannot set the main statusline — it is a platform limitation, not a basemind choice:

  • The plugin manifest (plugin.json) has no statusLine field.
  • A plugin-shipped settings.json honors only agent and subagentStatusLine; any statusLine key is silently ignored.
  • Hooks communicate via stdout/stderr/exit codes only — a SessionStart hook cannot write to ~/.claude/settings.json, so it can only nudge you to run /bm-statusline.

/bm-statusline works because Claude (the agent) performs the settings edit on your behalf, writing an absolute path into ~/.claude/settings.json ($HOME/~ are not expanded in the statusLine command field). After that it persists across sessions.

Output: ◆ basemind ● 1,247 files · 23m ago │ 47 calls · 14k saved. Counts render bright; the state dot is green (serve active / scan < 1 h), amber (idle or scan 1–24 h), or red (no serve and stale index). When a document/memory/web index is present, a third segment appears: │ 312 docs · 18 mem · 4 sites. Narrow terminals collapse to ◆ basemind ● 1.2k · 23m │ 47c · 14k saved.

Any MCP client

cargo install basemind --features full --locked

Then add to your MCP config:

{
  "mcpServers": {
    "basemind": {
      "command": "basemind",
      "args": ["serve"]
    }
  }
}

Supported harnesses: Claude Code · Cursor · Codex (CLI + App) · Gemini · OpenCode · Factory Droid · GitHub Copilot CLI · Continue · Cline. Each harness has install instructions in the Installation section below.

CLI only

basemind scan                     # index the working tree
basemind query outline path/file.rs  # inspect structure
basemind query symbol "parseQuery"   # find by name
basemind watch                    # live re-index on file change

Why basemind, specifically

vs grep / ripgrep

What ripgrep does well: blazing-fast line matching. What it misses:

  • Grep returns 50+ hits in docs, tests, comments, variable names — agent wastes context filtering noise.
  • No scope awareness: parseQuery() and parseQuery string both match; semantic signals lost.
  • Every query re-scans the disk; no pre-computed structures to leverage.

basemind: semantic-quality answers at grep speed via tree-sitter + indexed call sites.

vs vector-only RAG (LangChain / LlamaIndex DIY stacks)

What vector RAG does well: fuzzy document semantic search. What it misses:

  • Pure embeddings lose exact structure — which function calls which, which class implements which interface.
  • No line/column resolution — agent can't map vector hits back to code symbols.
  • No git history integration — "what changed recently?" and "who wrote this?" require separate systems.

basemind: code structure + git history + vector memory + document search all in one, unified scope.

vs context7 / openai-codex / Aider's repo-map

What these do well: generate code-map summaries. What they miss:

  • Static snapshots — stale after the first edit.
  • No semantic indexing — every lookup re-parses or re-scans.
  • Human-focused output (markdown) instead of agent-facing structure (JSON tools).

basemind: live-updated index with sub-millisecond MCP tools, built for agents not humans.

vs GitHub native search

What GitHub does well: repository-wide fuzzy text search. What it misses:

  • Cloud-only — your code leaves the machine, latency is network-bound.
  • No local-editor integration — agent can't query in-progress edits before commit.
  • No cross-language polyglot support — each language's search tuned separately.

basemind: local-only, always-fresh index of your working tree, 300+ languages in one sweep.


Performance

Measured on Apple Silicon, release build, --features full, default eager_l2 = true. Cold filesystem cache adds ~50% to first scan; numbers below are warm steady-state.

Scan throughput

Repo Files Language mix Time
tokio 859 Rust 0.2 s
react 7 061 TS / JSX 2.2 s
django 7 061 Python 2.5 s
requests 2 195 Python 0.7 s
gin 1 217 Go 1.0 s
ripgrep 12 851 Rust 4.0 s
ripgrep-shallow 12 851 Rust 0.16 s
TypeScript compiler 81 324 TS / JS / JSON ~22 s

The TypeScript compiler is the worst case — 81k files scanned in 22 seconds. Most real repos sit between tokio and ripgrep. Re-scans skip unchanged content hashes, so warm rescans on edited working trees are typically dominated by the changed-set size, not repo size.

Per-tool MCP latency

Against the 81k-file TypeScript index:

Latency Tools
< 1 ms outline, list_files, find_references, find_callers, find_implementations, hot_files, repo_info
3–6 ms search_symbols, call_graph
4–10 ms recent_changes, commits_touching, find_commits_by_path, symbol_history, diff_outline, diff_file
20–25 ms status
30–40 ms blame_file, blame_symbol
40–200 ms workspace_grep
~200 ms search_documents
350–600 ms working_tree_status

basemind preloads L1 outlines into RAM on serve start, so code-map queries hit no disk. The Fjall LSM inverted index handles ref/caller/impl lookups without scanning blobs. Git tools track gix walk cost; Fjall-backed tools dominate only on enormous histories.


Configuration

Full config lives at schema/basemind-config-v1.schema.json. Minimal example:

# .basemind/basemind.toml
file_watch_glob = "**/*.{rs,ts,tsx,py,go}"
eager_l2 = true

[documents]
enabled = true

Per-query MCP overrides:

{
  "query": "what does kreuzberg do?",
  "reranker_enabled": true,
  "reranker_preset": "bge-reranker-base"
}

Environment variables map mechanically: --llm-api-keyBASEMIND_LLM_API_KEY. Every MCP tool accepts per-query overrides that win over file/env/CLI layers.


Installation

Channel Command Platforms Features
Homebrew brew install Goldziher/tap/basemind macOS, Linux base
npm npm install -g basemind any Node 14+ platform base
pip pip install basemind any Python 3.8+ platform base
cargo cargo install basemind --locked any Rust platform base
cargo (full) cargo install basemind --features full --locked any Rust platform documents + memory + crawl
GH releases Download binary from releases macOS · Linux · Windows base
Harness Install command
Claude Code /plugin marketplace add Goldziher/basemind then /plugin install basemind@basemind
Cursor See Cursor docs for plugin install flow; basemind manifest at .cursor-plugin/plugin.json
Codex CLI /plugins then search for basemind
Codex App Plugins panel → Coding category → basemind → +
Gemini CLI gemini extensions install https://github.com/Goldziher/basemind
OpenCode Add { "plugin": ["basemind-opencode@latest"] } to opencode.json
Factory Droid droid plugin --help (manifest at .claude-plugin/marketplace.json)
GitHub Copilot CLI copilot plugin --help (same manifest)
Generic MCP See "Any MCP client" section above