indxr 0.2.0

A fast codebase indexer and MCP server for AI coding agents.

AI coding agents waste thousands of tokens reading entire source files just to understand what's in them. indxr gives agents a structural map of your codebase — declarations, imports, relationships, and dependency graphs — so they can query for exactly what they need at a fraction of the token cost.


Features

  • 27 languages — tree-sitter AST parsing for 8 languages, regex extraction for 19 more
  • 18-tool MCP server — live codebase queries over JSON-RPC: symbol lookup, file summaries, caller tracing, signature search, and more
  • Token-aware — progressive truncation to fit context windows, ~5x reduction vs reading full files
  • Git structural diffing — declaration-level diffs (+ added, - removed, ~ changed) against any git ref
  • Dependency graphs — file and symbol dependency visualization as DOT, Mermaid, or JSON
  • File watching — continuous re-indexing as you edit, via indxr watch or indxr serve --watch
  • One-command agent setup — indxr init configures Claude Code, Cursor, and Windsurf with MCP, instruction files, and hooks
  • Incremental caching — mtime + xxh3 content hashing, sub-20ms indexing for most projects
  • Composable filters — by path, kind, symbol name, visibility, and language
  • Three output formats — Markdown (default), JSON, YAML at three detail levels

Install

cargo install indxr

Or build from source:

git clone https://github.com/bahdotsh/indxr.git
cd indxr && cargo build --release

Usage

indxr                                        # index cwd → stdout
indxr ./my-project -o INDEX.md               # index project → file
indxr -f json -l rust,python -o index.json   # JSON, filter by language
indxr serve ./my-project                     # start MCP server
indxr serve ./my-project --watch             # MCP server with auto-reindex
indxr watch ./my-project                     # watch & keep INDEX.md updated
indxr init                                   # set up all agent configs

Agent Setup

indxr init                    # set up for all agents
indxr init --claude           # Claude Code only
indxr init --cursor           # Cursor only
indxr init --windsurf         # Windsurf only

Agent        Files created
Claude Code  .mcp.json, CLAUDE.md, .claude/settings.json (PreToolUse hooks)
Cursor       .cursor/mcp.json, .cursorrules
Windsurf     .windsurf/mcp.json, .windsurfrules
All          .gitignore entry, INDEX.md (static index)

Agents don't always pick MCP tools over file reads on their own. indxr init sets up reinforcement — PreToolUse hooks intercept Read/Bash calls and instruction files teach the exploration workflow.

MCP Server

JSON-RPC 2.0 over stdin/stdout, 18 tools:

Tool                  Description
search_relevant       Multi-signal relevance search across paths, names, signatures, and docs
lookup_symbol         Find declarations by name (case-insensitive substring)
explain_symbol        Signature, doc comment, relationships, metadata (no body)
get_file_summary      Complete file overview without reading it
batch_file_summaries  Summarize multiple files in one call
get_file_context      File summary + reverse dependencies + related files
get_public_api        Public declarations with signatures for a file or directory
get_callers           Find who references a symbol across all files
get_related_tests     Find test functions by naming convention
list_declarations     List declarations in a file with optional filters
search_signatures     Search functions by signature pattern
read_source           Read source by symbol name or line range
get_token_estimate    Estimate tokens before reading
get_tree              Directory/file tree
get_imports           Import statements for a file
get_stats             File count, line count, language breakdown
get_diff_summary      Structural changes since a git ref
regenerate_index      Re-index and update INDEX.md

List tools support compact mode for ~30% token savings. See MCP Server docs for full parameter details.
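
As a rough illustration of what a tool invocation looks like on the wire, the sketch below builds a JSON-RPC 2.0 request for one of the tools listed above. The `tools/call` method and the `name`/`arguments` layout follow the general MCP convention. The argument key passed to lookup_symbol is an assumption, since parameter names are only covered in the MCP Server docs.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP tool call as a single-line JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # generic MCP method for invoking a tool
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so keep each message on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


# "name" is a hypothetical argument key; indxr's actual schema may differ.
line = build_tool_call(1, "lookup_symbol", {"name": "parse"})
print(line, end="")
```

A real client would write this line to the stdin of indxr serve after completing the MCP initialize handshake, then read the matching response (same id) from stdout.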

Output

Default format is Markdown at signatures detail level:

# Codebase Index: my-project

> Generated: 2025-03-23 | Files: 42 | Lines: 8,234
> Languages: Rust (28), Python (10), TypeScript (4)

## Directory Structure
src/
  main.rs
  parser/
    mod.rs
    rust.rs

## src/main.rs

**Language:** Rust | **Size:** 1.2 KB | **Lines:** 45

**Declarations:**
`pub fn main() -> Result<()>`
`pub struct App`

Detail level          Content
summary               Directory tree + file list
signatures (default)  Everything in summary, plus declarations and imports
full                  Everything in signatures, plus doc comments, line numbers, body counts, metadata, relationships

Filtering

indxr --filter-path src/parser              # subtree
indxr --kind function --public-only         # public functions only
indxr --symbol "parse"                      # symbol name search
indxr -l rust,python                        # language filter
indxr --filter-path src/model --kind struct --public-only  # combine

All filters compose. --kind accepts: function, struct, class, trait, enum, interface, module, method, constant, impl, type, namespace, macro, and more.
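
To make "all filters compose" concrete, here is a minimal sketch in which each filter is a predicate and a declaration is kept only when every predicate accepts it. The `Decl` type, the helper names, and the sample data are hypothetical, and the plain prefix match on paths is a simplification. This illustrates the idea and is not indxr's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Decl:
    path: str
    kind: str
    name: str
    public: bool


Filter = Callable[[Decl], bool]


def by_path(prefix: str) -> Filter:
    return lambda d: d.path.startswith(prefix)


def by_kind(kind: str) -> Filter:
    return lambda d: d.kind == kind


def public_only() -> Filter:
    return lambda d: d.public


def apply_filters(decls: Iterable[Decl], filters: List[Filter]) -> List[Decl]:
    # A declaration is kept only if every filter accepts it (logical AND).
    return [d for d in decls if all(f(d) for f in filters)]


decls = [
    Decl("src/model/user.rs", "struct", "User", True),
    Decl("src/model/user.rs", "struct", "UserCache", False),
    Decl("src/model/user.rs", "function", "load_user", True),
    Decl("src/parser/mod.rs", "struct", "Parser", True),
]
# Mirrors: indxr --filter-path src/model --kind struct --public-only
selected = apply_filters(decls, [by_path("src/model"), by_kind("struct"), public_only()])
print([d.name for d in selected])  # ['User']
```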

Git Structural Diffing

indxr --since main
indxr --since v1.0.0
indxr --since HEAD~5

## Modified Files

### src/parser/mod.rs
+ `pub fn new_parser() -> Parser`
- `fn old_helper()`
~ `fn process(x: i32)` → `fn process(x: i32, y: i32)`

Markers: + added, - removed, ~ signature changed.
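
The three markers can be understood as a set comparison over declarations. The sketch below assumes each revision has already been reduced to a map from symbol name to signature (a simplification; how indxr matches declarations across revisions is not described here) and emits lines in the same format as the example output.

```python
from typing import Dict, List


def structural_diff(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Compare two {symbol name: signature} maps and emit marker lines."""
    lines = []
    for name in sorted(new.keys() - old.keys()):
        lines.append(f"+ `{new[name]}`")  # present only in the new revision
    for name in sorted(old.keys() - new.keys()):
        lines.append(f"- `{old[name]}`")  # present only in the old revision
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            lines.append(f"~ `{old[name]}` → `{new[name]}`")  # signature changed
    return lines


old = {"old_helper": "fn old_helper()", "process": "fn process(x: i32)"}
new = {
    "new_parser": "pub fn new_parser() -> Parser",
    "process": "fn process(x: i32, y: i32)",
}
print("\n".join(structural_diff(old, new)))
```

Run on the two maps above, this reproduces the three lines shown for src/parser/mod.rs.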

Dependency Graph

indxr --graph dot                            # file-level DOT graph
indxr --graph mermaid                        # file-level Mermaid diagram
indxr --graph json                           # JSON graph
indxr --graph dot --graph-level symbol       # symbol-level graph
indxr --graph mermaid --filter-path src/mcp  # scoped to a directory
indxr --graph dot --graph-depth 2            # limit to 2 hops

Level           Description
file (default)  File-to-file import relationships
symbol          Symbol-to-symbol relationships (trait impls, method calls)
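
For readers unfamiliar with DOT, the following sketch shows how a list of file-level import edges could be limited to a hop count and rendered as a Graphviz digraph. Measuring hops from a single root file is an assumption made for illustration, the file src/parser/queries.rs is hypothetical, and the exact DOT that indxr emits may look different.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (importing file, imported file)


def within_hops(edges: List[Edge], root: str, max_hops: int) -> List[Edge]:
    """Keep only edges that lie within `max_hops` steps of `root`."""
    adjacency: Dict[str, List[str]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    distance = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return [(s, d) for s, d in edges if s in distance and distance[s] < max_hops]


def to_dot(edges: List[Edge]) -> str:
    """Render edges as a Graphviz DOT digraph."""
    lines = ["digraph deps {"]
    for src, dst in sorted(set(edges)):
        # Quote node names because file paths contain '/' and '.'.
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


edges = [
    ("src/main.rs", "src/parser/mod.rs"),
    ("src/parser/mod.rs", "src/parser/rust.rs"),
    ("src/parser/rust.rs", "src/parser/queries.rs"),  # hypothetical file, 3 hops away
]
kept = within_hops(edges, "src/main.rs", 2)
dot = to_dot(kept)
print(dot)
```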

Token Budget

indxr --max-tokens 4000

Truncation order: doc comments → private declarations → children → least-important files. Directory tree and public API surface are preserved first.
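
The truncation order above can be sketched as a series of stages that run only while the index is still over budget. Everything here is illustrative: the data model, the `importance` score, and the rough four-characters-per-token estimate are assumptions, not indxr's actual scoring.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Decl:
    signature: str
    doc: str = ""
    public: bool = True
    children: List[str] = field(default_factory=list)


@dataclass
class FileEntry:
    path: str
    importance: float  # hypothetical score; higher means "keep longer"
    decls: List[Decl]


def render(files: List[FileEntry]) -> str:
    parts = []
    for f in files:
        parts.append(f.path)
        for d in f.decls:
            parts.append(d.signature)
            if d.doc:
                parts.append(d.doc)
            parts.extend(d.children)
    return "\n".join(parts)


def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic: about 4 characters per token


def drop_docs(f: FileEntry) -> None:
    for d in f.decls:
        d.doc = ""


def drop_private(f: FileEntry) -> None:
    f.decls = [d for d in f.decls if d.public]


def drop_children(f: FileEntry) -> None:
    for d in f.decls:
        d.children = []


def fit_to_budget(files: List[FileEntry], max_tokens: int) -> List[FileEntry]:
    files = copy.deepcopy(files)  # leave the caller's index untouched

    def over_budget() -> bool:
        return estimate_tokens(render(files)) > max_tokens

    # Stages run in the documented order and stop as soon as the index fits.
    for stage in (drop_docs, drop_private, drop_children):
        if not over_budget():
            return files
        for f in files:
            stage(f)
    files.sort(key=lambda f: f.importance, reverse=True)
    while files and over_budget():
        files.pop()  # remove the least-important file
    return files


index = [
    FileEntry("src/main.rs", 1.0, [
        Decl("pub fn main() -> Result<()>", doc="Entry point for the CLI."),
        Decl("fn helper()", public=False),
    ]),
    FileEntry("src/util.rs", 0.2, [Decl("pub struct App", children=["name: String"])]),
]
print(render(fit_to_budget(index, 25)))
```

With this sample index, a budget of 25 is met after the first stage alone, so only the doc comment disappears. A much smaller budget works through the remaining stages and finally drops src/util.rs.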

Languages

8 tree-sitter (full AST) + 19 regex (structural extraction):

Parser       Languages
tree-sitter  Rust, Python, TypeScript/TSX, JavaScript/JSX, Go, Java, C, C++
regex        Shell, TOML, YAML, JSON, SQL, Markdown, Protobuf, GraphQL, Ruby, Kotlin, Swift, C#, Objective-C, XML, HTML, CSS, Gradle, CMake, Properties

Detection is by file extension. Full details: docs/languages.md
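
Extension-based detection amounts to a lookup table from suffix to language and parser family. The sketch below covers only a handful of common extensions. The exact set of extensions indxr recognizes, and whether matching is case-insensitive, are assumptions here; docs/languages.md is the authoritative list.

```python
from pathlib import Path
from typing import Optional, Tuple

# Illustrative subset only: suffix -> (language, parser family).
EXTENSIONS = {
    ".rs": ("Rust", "tree-sitter"),
    ".py": ("Python", "tree-sitter"),
    ".ts": ("TypeScript", "tree-sitter"),
    ".tsx": ("TypeScript", "tree-sitter"),
    ".go": ("Go", "tree-sitter"),
    ".toml": ("TOML", "regex"),
    ".sql": ("SQL", "regex"),
    ".proto": ("Protobuf", "regex"),
}


def detect(path: str) -> Optional[Tuple[str, str]]:
    """Return (language, parser family) for a path, or None if unrecognized."""
    return EXTENSIONS.get(Path(path).suffix.lower())


for sample in ("src/main.rs", "Cargo.toml", "README"):
    print(sample, detect(sample))
```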

Performance

Parallel parsing via rayon. Incremental caching via mtime + xxh3.

Codebase                  Files  Lines  Cold  Cached
Small (indxr)             47     19K    17ms  5ms
Medium (atuin)            132    22K    20ms  6ms
Large (cloud-hypervisor)  243    124K   73ms  ~10ms
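
The gap between the cold and cached columns comes from skipping files that have not changed. A minimal sketch of an mtime-plus-content-hash check follows. It uses BLAKE2b from the Python standard library as a stand-in for xxh3, and the in-memory cache layout is hypothetical, so treat it as an outline of the idea and not as indxr's cache format.

```python
import hashlib
import os
import tempfile
from typing import Dict, Tuple

CacheEntry = Tuple[int, str]  # (mtime in nanoseconds, content hash)


def content_hash(path: str) -> str:
    with open(path, "rb") as fh:
        # Stand-in for xxh3: any fast, stable content hash works for this check.
        return hashlib.blake2b(fh.read(), digest_size=8).hexdigest()


def needs_reparse(path: str, cache: Dict[str, CacheEntry]) -> bool:
    """Return True if the file must be parsed again; updates the cache."""
    mtime = os.stat(path).st_mtime_ns
    entry = cache.get(path)
    if entry is not None and entry[0] == mtime:
        return False  # fast path: mtime unchanged, skip hashing entirely
    digest = content_hash(path)
    unchanged = entry is not None and entry[1] == digest
    cache[path] = (mtime, digest)
    return not unchanged


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lib.rs")
    with open(path, "w") as fh:
        fh.write("pub fn a() {}\n")
    cache: Dict[str, CacheEntry] = {}
    first = needs_reparse(path, cache)   # no cache entry yet
    second = needs_reparse(path, cache)  # mtime unchanged
    os.utime(path, ns=(1_000_000_000, 1_000_000_000))  # touch without editing
    touched = needs_reparse(path, cache)  # mtime differs, content hash matches
    with open(path, "w") as fh:
        fh.write("pub fn b() {}\n")
    os.utime(path, ns=(2_000_000_000, 2_000_000_000))
    edited = needs_reparse(path, cache)  # content really changed
print(first, second, touched, edited)
```

Checking mtime first keeps the common case cheap, while the hash avoids needless re-parsing when a file is touched (for example by a checkout) without its contents changing.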

Documentation

Document           Description
CLI Reference      Complete flag and option reference
Languages          Per-language extraction details
Output Formats     Format and detail level reference
Filtering          Path, kind, symbol, visibility filters
Dependency Graph   File and symbol dependency visualization
Git Diffing        Structural diff since any git ref
Token Budget       Truncation strategy and scoring
Caching            Cache format and invalidation
MCP Server         MCP tools, protocol, and client setup
Agent Integration  Usage with Claude, Codex, Cursor, Copilot, etc.

Contributing

Contributions welcome — feel free to open an issue or submit a PR.

License

MIT