# basemind
**Give your AI coding agent a brain for your repo.**
basemind is a code-map MCP server: it indexes your codebase into a queryable map
so AI coding agents — Claude Code, Cursor, Continue, anything that speaks
[MCP](https://modelcontextprotocol.io) — get instant semantic answers about your
code. **Where is this defined? Who calls it? When did it change? What's churning?**
Sub-millisecond queries. 300+ languages out of the box. Local-only. Built in Rust.
---
## Why your agent needs this
Today, agents read code by **grepping blind**. Ask Claude "who calls `parseQuery`?"
and it ripgreps the string — you get hits in docs, tests, comments, and 14 unrelated
files. The agent burns context filtering noise, then guesses.
LSPs are the semantic answer, but they're single-language, slow to start, and
useless across a polyglot monorepo.
**basemind is the missing layer.** One index, every language, semantic-quality answers
at grep speed — exposed to the agent over MCP as concrete tools (`find_callers`,
`find_references`, `outline`, `symbol_history`, `blame_symbol`, `hot_files`, …)
instead of "go grep again."
---
## 30-second setup
**Install** (pick one):
```bash
brew install Goldziher/tap/basemind # macOS, Linux
npm install -g basemind # any Node 14+ platform
pip install basemind # any Python 3.8+ platform
cargo install basemind --locked # build from source
```
**Index your repo:**
```bash
cd /path/to/your/repo
basemind scan
```
**Wire it into Claude Code** — drop this into `~/.claude.json` (or your project's
`.mcp.json`):
```json
{
"mcpServers": {
"basemind": {
"command": "basemind",
"args": ["serve"],
"cwd": "/abs/path/to/your/repo"
}
}
}
```
Done. Restart Claude Code, and your agent has the code-map and git tools
listed below at its fingertips.
> Same JSON shape works for Cursor, Continue, Cline, and any other MCP client.
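
If the client config already lists other servers, the entry can be merged in rather than pasted over the file. The following is a minimal sketch, assuming the config is plain JSON with a top-level `mcpServers` object as shown above; the helper name `add_basemind_server` is hypothetical and not part of basemind.

```python
import json
import tempfile
from pathlib import Path


def add_basemind_server(config_path: Path, repo_root: Path) -> dict:
    """Merge a basemind entry into an MCP client config, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["basemind"] = {
        "command": "basemind",
        "args": ["serve"],
        # The snippet above uses an absolute path, so resolve it here.
        "cwd": str(repo_root.resolve()),
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway config that already has one (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".mcp.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "other-server"}}}))
    merged = add_basemind_server(path, Path(tmp))
    print(sorted(merged["mcpServers"]))
```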
---
## What your agent gets
### Code-map tools
| Tool | Question it answers |
| --- | --- |
| `outline` | "Give me this file's structure" — symbols, line/col, signatures, imports. One call replaces five Reads. |
| `search_symbols` | "Find anything named `useAuth`" — substring match across every indexed symbol, kind-filterable. |
| `find_references` | "Where is `parseQuery` called?" — indexed call-site lookup. No regex noise. |
| `find_callers` | "Who calls `User.save()`?" — resolves the definition first, then scans. |
| `dependents` | "What imports this module?" — reverse import lookup. |
| `list_files` | "What files are in `src/auth/`?" — indexed path + language filters. |
| `status` | "What languages does this repo use?" — file count + language breakdown. |
| `repo_info` | Branch, HEAD, workdir at a glance. |
### Git-aware tools
| Tool | Question it answers |
| --- | --- |
| `symbol_history` | "When did `validateToken` actually change?" — tree-sitter × git, comment/format-stable diffs. |
| `blame_file` / `blame_symbol` | "Who wrote this and why?" — line-range or symbol-scoped blame. |
| `hot_files` | "What's been churning?" — top-K most-changed files in the last N commits. |
| `recent_changes` | "What changed recently on this branch?" |
| `commits_touching` | "Show me every commit that touched `auth.rs`." |
| `diff_outline` | "What symbols differ between `main` and `HEAD`?" — structural diff. |
| `diff_file` | "Give me the unified diff for `auth.rs` across these revs." |
| `working_tree_status` | "What's staged / unstaged / untracked right now?" |
Every tool returns JSON. Responses are capped (`limit`, default 100, max 1000) so
the agent's context doesn't explode.
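
For orientation, this is roughly what a tool invocation looks like on the wire. MCP clients call tools with a JSON-RPC 2.0 `tools/call` request; the argument names below (`symbol`, `limit`) are assumptions for illustration, since the tool descriptions served by basemind define the real schema. The clamp mirrors the documented cap, though whether the server clamps or rejects an oversized `limit` is not stated here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids


def clamp_limit(requested=None, default=100, maximum=1000):
    """Apply the documented cap: default 100, never more than 1000.

    The lower bound of 1 is an assumption made for this sketch.
    """
    if requested is None:
        return default
    return max(1, min(requested, maximum))


def tool_call(name, **arguments):
    """Build a JSON-RPC 2.0 `tools/call` message in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# `symbol` is a hypothetical argument name, not confirmed basemind schema.
request = tool_call("find_callers", symbol="User.save", limit=clamp_limit(5000))
print(json.dumps(request))
```

An MCP client handles this framing (and the `initialize` handshake that precedes it) for you; the sketch only shows what the agent ends up sending.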
---
## Performance
Measured on a 39,270-file TypeScript repo (Apple Silicon, release build), except where a row names another repo:
| Operation | Result |
| --- | --- |
| Cold scan (full index) | 12.4 s |
| Cached scan (no changes) | 1.6 s |
| MCP server startup | 3.1 s, 77 MB RSS |
| `status` query | 1.2 ms |
| `outline` (1571 symbols) | 1.9 ms |
| `search_symbols` | 1–3 ms |
| `find_references("spawn")` (tokio) | < 5 ms |
basemind preloads L1 outlines into RAM on `serve` start, so cross-file queries
are answered from memory in a few milliseconds or less. The Fjall LSM inverted
index handles reference and caller lookups without scanning blobs.
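
The idea behind an inverted index can be sketched in a few lines. This is a toy in-memory version, assuming made-up outline data; it is not basemind's on-disk format or Fjall's API. It only shows why a reference lookup becomes a key lookup instead of a scan over every file.

```python
from collections import defaultdict

# Hypothetical per-file call data: path -> list of (called_name, line).
outlines = {
    "src/api.ts": [("parseQuery", 12), ("fetch", 30)],
    "src/cli.ts": [("parseQuery", 7)],
    "src/util.ts": [("fetch", 3)],
}


def build_inverted_index(outlines):
    """Invert path -> calls into name -> list of (path, line) call sites."""
    index = defaultdict(list)
    for path, calls in outlines.items():
        for name, line in calls:
            index[name].append((path, line))
    return index


index = build_inverted_index(outlines)

# One dictionary lookup, regardless of how many files were indexed.
print(sorted(index["parseQuery"]))  # [('src/api.ts', 12), ('src/cli.ts', 7)]
```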
---
## Languages
**300+ tree-sitter grammars** ship via
[tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack).
basemind dynamically loads them on first use and caches them locally.
**First-class outlines** — full signatures, kinds, decorators, calls, imports,
docstrings — ship for:
> **Rust · Python · TypeScript · TSX · JavaScript · Go**
**Best-effort outlines** come from the tree-sitter-language-pack (TSLP) `tags.scm` fallback, which covers ~100 grammars
including **Kotlin, C#, Swift, C++, Scala, Solidity, Lua, Ruby, PHP, Java**, …
Languages without an upstream `tags.scm` (JSON, YAML, TOML) still parse and
appear in `list_files`; they just don't expose symbols.
---
## Why basemind, specifically
- **Built for agents, not humans.** Every tool exists because an agent needs it,
not because it makes a cute terminal demo.
- **Semantic quality, grep speed.** Tree-sitter parses → content-addressed blobs →
Fjall LSM inverted index → sub-millisecond MCP responses.
- **Polyglot by default.** One index, every language. No LSP-per-language
zoo. No "we don't support that yet."
- **Local-only.** No SaaS. No telemetry. No cloud round-trip. Your code never
leaves the machine.
- **Deterministic.** Content-addressed blobs (blake3), stable hashes,
reproducible across machines.
- **Pure Rust.** One static binary. No Python runtime, no Node runtime, no JVM.
`basemind serve` adds < 80 MB to your agent's stack.
---
## CLI
basemind is also a CLI — useful for piping into shell tools, CI checks, or
just inspecting a repo without spinning up an MCP server.
```text
basemind init # write .basemind/basemind.toml with defaults
basemind scan # index the working tree
basemind scan --staged # index what's in git's staging area
basemind scan --rev <REV> # index a commit / branch / sha
basemind watch # long-running watcher; index on file change
basemind serve [--view <name>] # MCP stdio server for agents
basemind query outline <path> [--l2] # symbols, imports (+ docs/calls with --l2)
basemind query symbol <needle> [--kind K] # substring search across symbols
basemind query dependents <module> # reverse-lookup via imports
basemind hook install # install pre-commit hook (--staged scan)
basemind lang {list, install, clean} # manage downloaded tree-sitter grammars
basemind cache clear # drop .basemind/git-cache/
```
Global flags: `-q/--quiet`, `-v/--verbose`, `--no-color` (the `NO_COLOR` environment variable is also honored).
---
## Architecture
A short tour. See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) for the long
version.
- **Scanner** (`src/scanner.rs`) — rayon-parallel walker over the gitignore-aware
file set. Extracts L1 (symbols + imports), L2 (calls + docs), L3 (structural
hashes) per file.
- **Content-addressed blobs** (`src/store.rs`) — msgpack at
`.basemind/blobs/<blake3>.{l1,l2,l3}.msgpack`. Two files with identical content
share the same blob. Re-scan skips unchanged hashes.
- **Inverted index** (`src/index/`) — pure-Rust [Fjall](https://github.com/fjall-rs/fjall)
LSM keyspace at `.basemind/views/<view>/index.fjall/`. Six keyspaces drive
symbol search, reference lookup, and dependents.
- **MCP surface** (`src/mcp/`) — stdio JSON-RPC via `rmcp`. Tool descriptions are
the routing surface for agents; semantics (substring vs prefix, scope-aware vs
name-only, capped) are stated honestly.
- **Git layer** (`src/git.rs`, `src/git_cache.rs`) — `gix`-backed blame, log,
diff, status. Sha-keyed disk cache (`.basemind/git-cache/`) makes warm queries
free.
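
The content-addressing step in the list above can be sketched as follows. This is an illustration under stated substitutions, not the code in `src/store.rs`: it uses `hashlib.blake2b` because blake3 is not in the Python standard library, JSON instead of msgpack, and a hypothetical `toy_extract` in place of a real parser.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def blob_path(blob_dir, content, level="l1"):
    # Stand-in hash: basemind uses blake3; blake2b is used here for portability.
    digest = hashlib.blake2b(content, digest_size=32).hexdigest()
    return blob_dir / f"{digest}.{level}.json"


def store_outline(blob_dir, content, extract):
    """Write the outline for `content` unless a blob for its hash already exists.

    Returns (path, wrote), where `wrote` is False when extraction was skipped.
    """
    path = blob_path(blob_dir, content)
    if path.exists():
        return path, False
    path.write_text(json.dumps(extract(content)))
    return path, True


def toy_extract(content):
    # Hypothetical extractor: counts lines instead of parsing symbols.
    return {"lines": content.count(b"\n")}


with tempfile.TemporaryDirectory() as tmp:
    blobs = Path(tmp)
    first = store_outline(blobs, b"fn main() {}\n", toy_extract)
    second = store_outline(blobs, b"fn main() {}\n", toy_extract)  # identical bytes
    third = store_outline(blobs, b"fn other() {}\n", toy_extract)
    blob_count = len(list(blobs.iterdir()))

print(first[1], second[1], third[1], blob_count)  # True False True 2
```

Because the key is derived from the bytes alone, a copied file or an unchanged file on re-scan maps to a blob that already exists, so the extraction work is skipped.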
### Views
A _view_ is a code map for a snapshot of the repo. Each view has its own index
under `.basemind/views/<view>/`; blobs are shared in `.basemind/blobs/`.
- **`working`** (default) — the on-disk working tree
- **`staged`** — git staging area; what's about to be committed
- **`rev-<sha7>`** — whatever you scanned with `basemind scan --rev <REV>`
The views coexist: scanning one doesn't clobber the others. The pre-commit hook
installed by `basemind hook install` indexes `staged`, so the hook reflects
exactly what's being committed.
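
The on-disk layout described above can be summarized as a pair of path helpers. This is a sketch based only on the paths quoted in this README; the function names are hypothetical, and reading `<sha7>` as the first seven characters of the commit sha is an assumption.

```python
from pathlib import PurePosixPath

BASE = PurePosixPath(".basemind")
BLOBS = BASE / "blobs"  # shared by every view


def view_name(kind, sha=None):
    """Map a scan target to its view name."""
    if kind in ("working", "staged"):
        return kind
    if kind == "rev" and sha:
        return f"rev-{sha[:7]}"  # assumption: sha7 means a 7-character prefix
    raise ValueError(f"unknown view kind: {kind!r}")


def index_dir(view):
    """Each view keeps its own index directory under views/<view>/."""
    return BASE / "views" / view / "index.fjall"


for view in (view_name("working"), view_name("staged"), view_name("rev", "0123456789abcdef")):
    print(index_dir(view))
```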
### Live refresh
Run `basemind watch` in one terminal and `basemind serve` in another: the server
watches the index, rebuilds its in-RAM map off-thread, and atomically swaps.
Queries reflect filesystem changes within ~150 ms with no `serve` restart.
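
The rebuild-then-swap pattern can be illustrated with a small sketch. This is a generic Python rendering of the idea, not basemind's Rust implementation: readers take one reference to the current snapshot, a background thread builds a fresh snapshot, and a single assignment publishes it. The `CodeMap` class and its data are made up for the example.

```python
import threading


class CodeMap:
    """Holds a snapshot that readers use without taking a lock."""

    def __init__(self, snapshot):
        self._snapshot = snapshot

    def lookup(self, name):
        # Grab one reference; a swap during this call cannot change what we read.
        snapshot = self._snapshot
        return snapshot.get(name, [])

    def refresh(self, build):
        """Build a fresh snapshot off-thread, then publish it in one assignment."""

        def worker():
            fresh = build()          # slow part happens off the query path
            self._snapshot = fresh   # the swap: a single reference rebind

        thread = threading.Thread(target=worker)
        thread.start()
        return thread


code_map = CodeMap({"parseQuery": ["src/api.ts:12"]})
before = code_map.lookup("parseQuery")

# Simulate a re-index that discovered a new call site.
code_map.refresh(lambda: {"parseQuery": ["src/api.ts:12", "src/cli.ts:7"]}).join()
after = code_map.lookup("parseQuery")

print(len(before), len(after))  # 1 2
```

The old snapshot is never mutated, so a query that started before the swap finishes against consistent data.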
---
## Hardening
basemind ships with a real-OSS hardening harness — 8 upstream repos (ripgrep,
tokio, microsoft/TypeScript, facebook/react, django, requests, gin, plus a
shallow ripgrep variant) cloned, scanned, and MCP-swept on every release. Canary
assertions catch regressions before they ship:
```sh
./scripts/harden.sh # ~10 minutes; produces /tmp/basemind-harden/results.ndjson
```
The harness is `#[ignore]`-gated, so a normal `cargo test` skips it. CI invokes
it nightly and on dispatch.
---
## Development
```sh
git clone https://github.com/Goldziher/basemind && cd basemind
task setup # cargo fetch + prek install
task check # lint + test
task build # release binary
```
Pre-commit hooks via [prek](https://github.com/j178/prek) cover Rust
(`cargo fmt`/`clippy`/`sort`/`machete`/`deny`/`rustdoc-lint`), markdown, shell,
JSON/YAML/TOML, file-safety basics, and commit-message linting via
[gitfluff](https://github.com/Goldziher/gitfluff).
Contributing guidelines: see [`CONTRIBUTING.md`](CONTRIBUTING.md).
---
## License
[MIT](LICENSE).