<div align="center">
# indxr
**A fast codebase indexer and MCP server for AI coding agents.**
[CI](https://github.com/bahdotsh/indxr/actions/workflows/ci.yml)
[crates.io](https://crates.io/crates/indxr)
[License](LICENSE)
</div>
AI coding agents waste thousands of tokens reading entire source files just to understand what's in them. indxr gives agents a structural map of your codebase — declarations, imports, relationships, and dependency graphs — so they can query for exactly what they need at a fraction of the token cost.
---
## Features
- **27 languages** — tree-sitter AST parsing for 8 languages, regex extraction for 19 more
- **18-tool MCP server** — live codebase queries over JSON-RPC: symbol lookup, file summaries, caller tracing, signature search, and more
- **Token-aware** — progressive truncation to fit context windows, ~5x reduction vs reading full files
- **Git structural diffing** — declaration-level diffs (`+` added, `-` removed, `~` changed) against any git ref
- **Dependency graphs** — file and symbol dependency visualization as DOT, Mermaid, or JSON
- **File watching** — continuous re-indexing as you edit, via `indxr watch` or `indxr serve --watch`
- **One-command agent setup** — `indxr init` configures Claude Code, Cursor, and Windsurf with MCP, instruction files, and hooks
- **Incremental caching** — mtime + xxh3 content hashing, sub-20ms indexing for most projects
- **Composable filters** — by path, kind, symbol name, visibility, and language
- **Three output formats** — Markdown (default), JSON, YAML at three detail levels
## Install
```bash
cargo install indxr
```
Or build from source:
```bash
git clone https://github.com/bahdotsh/indxr.git
cd indxr && cargo build --release
```
## Usage
```bash
indxr # index cwd → stdout
indxr ./my-project -o INDEX.md # index project → file
indxr -f json -l rust,python -o index.json # JSON, filter by language
indxr serve ./my-project # start MCP server
indxr serve ./my-project --watch # MCP server with auto-reindex
indxr watch ./my-project # watch & keep INDEX.md updated
indxr init # set up all agent configs
```
## Agent Setup
```bash
indxr init # set up for all agents
indxr init --claude # Claude Code only
indxr init --cursor # Cursor only
indxr init --windsurf # Windsurf only
```
| Agent | Files |
|---|---|
| Claude Code | `.mcp.json`, `CLAUDE.md`, `.claude/settings.json` (PreToolUse hooks) |
| Cursor | `.cursor/mcp.json`, `.cursorrules` |
| Windsurf | `.windsurf/mcp.json`, `.windsurfrules` |
| All | `.gitignore` entry, `INDEX.md` (static index) |
Agents don't always pick MCP tools over file reads on their own, so `indxr init` sets up reinforcement: PreToolUse hooks (Claude Code) intercept `Read`/`Bash` calls, and instruction files teach the exploration workflow.
## MCP Server
JSON-RPC 2.0 over stdin/stdout, 18 tools:
| Tool | Description |
|---|---|
| `search_relevant` | Multi-signal relevance search across paths, names, signatures, and docs |
| `lookup_symbol` | Find declarations by name (case-insensitive substring) |
| `explain_symbol` | Signature, doc comment, relationships, metadata (no body) |
| `get_file_summary` | Complete file overview without reading it |
| `batch_file_summaries` | Summarize multiple files in one call |
| `get_file_context` | File summary + reverse dependencies + related files |
| `get_public_api` | Public declarations with signatures for a file or directory |
| `get_callers` | Find who references a symbol across all files |
| `get_related_tests` | Find test functions by naming convention |
| `list_declarations` | List declarations in a file with optional filters |
| `search_signatures` | Search functions by signature pattern |
| `read_source` | Read source by symbol name or line range |
| `get_token_estimate` | Estimate tokens before reading |
| `get_tree` | Directory/file tree |
| `get_imports` | Import statements for a file |
| `get_stats` | File count, line count, language breakdown |
| `get_diff_summary` | Structural changes since a git ref |
| `regenerate_index` | Re-index and update INDEX.md |
List tools support `compact` mode for ~30% token savings. See [MCP Server docs](docs/mcp-server.md) for full parameter details.
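As a rough illustration of what travels over stdin/stdout, the sketch below builds one MCP `tools/call` request as a single JSON line. The envelope follows JSON-RPC 2.0 and the MCP convention of `params.name` plus `params.arguments`; the `name` argument key passed to `lookup_symbol` is an assumption, so check the MCP Server docs for the real parameter names. A real client would also complete the MCP `initialize` handshake before calling tools.

```python
import json


def make_tool_call(request_id, tool, arguments):
    """Serialize one MCP `tools/call` request as a single JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so keep the payload on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


# The "name" argument key is a guess, not taken from the indxr docs.
line = make_tool_call(1, "lookup_symbol", {"name": "parse"})
print(line, end="")
```

Writing `line` to the stdin of an `indxr serve` process and reading one line back from its stdout would be the usual way to drive a stdio MCP server.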
## Output
Default format is Markdown at `signatures` detail level:
```markdown
# Codebase Index: my-project
> Generated: 2025-03-23 | Files: 42 | Lines: 8,234
> Languages: Rust (28), Python (10), TypeScript (4)
## Directory Structure
src/
  main.rs
  parser/
    mod.rs
    rust.rs
## src/main.rs
**Declarations:**
`pub fn main() -> Result<()>`
`pub struct App`
```
| Detail level | Contents |
|---|---|
| `summary` | Directory tree + file list |
| `signatures` (default) | + declarations, imports |
| `full` | + doc comments, line numbers, body counts, metadata, relationships |
## Filtering
```bash
indxr --filter-path src/parser # subtree
indxr --kind function --public-only # public functions only
indxr --symbol "parse" # symbol name search
indxr -l rust,python # language filter
indxr --filter-path src/model --kind struct --public-only # combine
```
All filters compose. `--kind` accepts: `function`, `struct`, `class`, `trait`, `enum`, `interface`, `module`, `method`, `constant`, `impl`, `type`, `namespace`, `macro`, and more.
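One way to picture filter composition is as a logical AND over whichever filters were supplied. The sketch below is not indxr's implementation: the `matches` helper, the dict-shaped declarations, and the prefix and case-insensitive substring matching rules are all assumptions made for illustration.

```python
def matches(decl, path=None, kind=None, symbol=None, public_only=False):
    """Return True only if `decl` passes every filter that was supplied."""
    checks = [
        path is None or decl["path"].startswith(path),
        kind is None or decl["kind"] == kind,
        symbol is None or symbol.lower() in decl["name"].lower(),
        not public_only or decl["public"],
    ]
    return all(checks)


# Hypothetical declarations, shaped for this example only.
decls = [
    {"path": "src/model/user.rs", "kind": "struct", "name": "User", "public": True},
    {"path": "src/model/user.rs", "kind": "function", "name": "parse_user", "public": False},
    {"path": "src/parser/mod.rs", "kind": "struct", "name": "Parser", "public": True},
]

selected = [d["name"] for d in decls
            if matches(d, path="src/model", kind="struct", public_only=True)]
print(selected)  # ['User']
```

This mirrors the combined command above: each extra flag can only narrow the result set, never widen it.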
## Git Structural Diffing
```bash
indxr --since main
indxr --since v1.0.0
indxr --since HEAD~5
```
```
## Modified Files
### src/parser/mod.rs
+ `pub fn new_parser() -> Parser`
- `fn old_helper()`
~ `fn process(x: i32)` → `fn process(x: i32, y: i32)`
```
Markers: `+` added, `-` removed, `~` signature changed.
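The three markers can be understood as a set comparison between the declarations at two revisions. The sketch below assumes each revision has already been reduced to a `{name: signature}` map; that representation and the `structural_diff` helper are hypothetical, not indxr internals.

```python
def structural_diff(old, new):
    """Compare two {name: signature} maps and return marked diff lines."""
    lines = []
    for name in sorted(new.keys() - old.keys()):
        lines.append(f"+ `{new[name]}`")  # present only in the new revision
    for name in sorted(old.keys() - new.keys()):
        lines.append(f"- `{old[name]}`")  # present only in the old revision
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            lines.append(f"~ `{old[name]}` → `{new[name]}`")  # signature changed
    return lines


old = {"old_helper": "fn old_helper()", "process": "fn process(x: i32)"}
new = {
    "new_parser": "pub fn new_parser() -> Parser",
    "process": "fn process(x: i32, y: i32)",
}
print("\n".join(structural_diff(old, new)))
```

With these inputs the three lines reproduce the `src/parser/mod.rs` entries in the sample output above.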
## Dependency Graph
```bash
indxr --graph dot # file-level DOT graph
indxr --graph mermaid # file-level Mermaid diagram
indxr --graph json # JSON graph
indxr --graph dot --graph-level symbol # symbol-level graph
indxr --graph mermaid --filter-path src/mcp # scoped to a directory
indxr --graph dot --graph-depth 2 # limit to 2 hops
```
| Level | Description |
|---|---|
| `file` (default) | File-to-file import relationships |
| `symbol` | Symbol-to-symbol relationships (trait impls, method calls) |
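To show how one edge list maps onto two of the output formats, here is a minimal sketch that renders file-level edges as DOT and as a Mermaid flowchart. The graph name, node ids, and layout direction are arbitrary choices for this example; indxr's actual output may differ.

```python
def to_dot(edges):
    """Render (source, target) pairs as a Graphviz DOT digraph."""
    lines = ["digraph deps {"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


def to_mermaid(edges):
    """Render the same pairs as a Mermaid flowchart with generated node ids."""
    ids = {}

    def node(path):
        # Mermaid ids cannot contain slashes, so map each path to n0, n1, ...
        ids.setdefault(path, f"n{len(ids)}")
        return f'{ids[path]}["{path}"]'

    lines = ["graph LR"]
    for src, dst in edges:
        lines.append(f"  {node(src)} --> {node(dst)}")
    return "\n".join(lines)


edges = [("src/main.rs", "src/parser/mod.rs")]
print(to_dot(edges))
print(to_mermaid(edges))
```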
## Token Budget
```bash
indxr --max-tokens 4000
```
Truncation order: doc comments → private declarations → children → least-important files. Directory tree and public API surface are preserved first.
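The truncation order can be sketched as a sequence of stages, each applied only while the rendered index is still over budget. Everything here is illustrative: the data shape, the `importance` score, and the four-characters-per-token estimate are assumptions, and the real scoring is described in the Token Budget docs.

```python
import copy


def estimate_tokens(text):
    # Crude heuristic (about 4 characters per token); an assumption.
    return len(text) // 4


def render(files):
    out = []
    for f in files:
        out.append(f"## {f['path']}")
        for d in f["decls"]:
            if d["doc"]:
                out.append(f"/// {d['doc']}")
            out.append(d["sig"])
            out.extend(f"  {c}" for c in d["children"])
    return "\n".join(out)


def drop_docs(files):
    for f in files:
        for d in f["decls"]:
            d["doc"] = ""


def drop_private(files):
    for f in files:
        f["decls"] = [d for d in f["decls"] if d["public"]]


def drop_children(files):
    for f in files:
        for d in f["decls"]:
            d["children"] = []


def fit_to_budget(files, max_tokens):
    """Apply truncation stages in order until the rendered index fits."""
    files = copy.deepcopy(files)  # leave the caller's data untouched
    for stage in (drop_docs, drop_private, drop_children):
        if estimate_tokens(render(files)) <= max_tokens:
            return files
        stage(files)
    # Last resort: drop whole files, least important first.
    files.sort(key=lambda f: f["importance"], reverse=True)
    while len(files) > 1 and estimate_tokens(render(files)) > max_tokens:
        files.pop()
    return files
```

Because public signatures are only removed when a whole file is dropped, the public API surface of the remaining files survives every earlier stage.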
## Languages
8 tree-sitter (full AST) + 19 regex (structural extraction):
| Parser | Languages |
|---|---|
| tree-sitter | Rust, Python, TypeScript/TSX, JavaScript/JSX, Go, Java, C, C++ |
| regex | Shell, TOML, YAML, JSON, SQL, Markdown, Protobuf, GraphQL, Ruby, Kotlin, Swift, C#, Objective-C, XML, HTML, CSS, Gradle, CMake, Properties |
Detection is by file extension. Full details: [docs/languages.md](docs/languages.md)
## Performance
Parallel parsing via rayon. Incremental caching via mtime + xxh3.
| Project | Files | Lines | Cold | Cached |
|---|---|---|---|---|
| Small (indxr) | 47 | 19K | 17ms | 5ms |
| Medium (atuin) | 132 | 22K | 20ms | 6ms |
| Large (cloud-hypervisor) | 243 | 124K | 73ms | ~10ms |
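The mtime + content-hash check can be sketched as a two-tier test: compare the cheap timestamp first and hash the file only when the timestamp differs. This is a guess at the general technique rather than indxr's cache logic; the `needs_reparse` helper and cache layout are hypothetical, and `hashlib.blake2b` stands in for xxh3, which is not in the Python standard library.

```python
import hashlib
import os


def content_hash(path):
    # Stand-in for xxh3, which is not in the Python standard library.
    with open(path, "rb") as fh:
        return hashlib.blake2b(fh.read(), digest_size=8).hexdigest()


def needs_reparse(path, cache):
    """Return True if `path` changed since it was cached; refresh the entry."""
    mtime = os.stat(path).st_mtime_ns
    entry = cache.get(path)
    if entry and entry["mtime"] == mtime:
        return False  # cheap path: timestamp unchanged, skip hashing
    digest = content_hash(path)
    unchanged = bool(entry) and entry["hash"] == digest
    cache[path] = {"mtime": mtime, "hash": digest}
    return not unchanged
```

The hash tier matters for operations such as `git checkout` or `touch`, which can update timestamps without changing file contents.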
## Documentation
| Document | Description |
|---|---|
| [CLI Reference](docs/cli-reference.md) | Complete flag and option reference |
| [Languages](docs/languages.md) | Per-language extraction details |
| [Output Formats](docs/output-formats.md) | Format and detail level reference |
| [Filtering](docs/filtering.md) | Path, kind, symbol, visibility filters |
| [Dependency Graph](docs/dep-graph.md) | File and symbol dependency visualization |
| [Git Diffing](docs/git-diffing.md) | Structural diff since any git ref |
| [Token Budget](docs/token-budget.md) | Truncation strategy and scoring |
| [Caching](docs/caching.md) | Cache format and invalidation |
| [MCP Server](docs/mcp-server.md) | MCP tools, protocol, and client setup |
| [Agent Integration](docs/agent-integration.md) | Usage with Claude, Codex, Cursor, Copilot, etc. |
## Contributing
Contributions welcome — feel free to open an issue or submit a PR.
## License
[MIT](LICENSE)