# ImpactSense Parser
A multi-language static analysis tool written in Rust that parses source code using [Tree-Sitter](https://tree-sitter.github.io/tree-sitter/), extracts structural symbols (files, classes, functions, API endpoints), and builds a dependency graph in [Neo4j](https://neo4j.com/) for impact analysis.
Given a codebase, it answers questions like *"If I change this class, which functions and files are affected?"* by constructing a queryable graph of code relationships.
## Supported Languages
| Language | Parsing | Classes | Call Graph | File Dependencies | API Endpoints (frameworks) |
|---|---|---|---|---|---|
| Java | Full AST | Yes | Partial | Yes (imports) | Spring |
| C# | Full AST | Yes | Partial | No | ASP.NET |
| Go | Full AST | Yes | Partial | No | Chi/Gin/Echo |
| Erlang | Text | Module | Approximate | Yes | Cowboy |
| JavaScript | Full AST | No | No | No | No |
| TypeScript | Full AST | No | No | No | No |
| Python | Full AST | No | No | No | No |
| Rust | Full AST | No | No | No | No |
## Architecture
```
                ┌──────────────────────┐
                │    CLI (main.rs)     │
                │   clap arg parsing   │
                └──────────┬───────────┘
                           │
                           ▼
                ┌──────────────────────┐
                │      scanner.rs      │
                │   walkdir + rayon    │
                │ parallel file parse  │
                └──────────┬───────────┘
                           │
                    Vec<ParsedFile>
                           │
          ┌────────────────┼────────────────┐
          ▼                                 ▼
┌───────────────────┐             ┌────────────────────┐
│    JSON output    │             │      graph.rs      │
│  (--output-json)  │             │ Neo4j persistence  │
│   AST summaries   │             │ (--push-to-neo4j)  │
└───────────────────┘             └────────────────────┘
```
1. **Scan** — Recursively walks the target directory, identifies source files by extension, and filters by max file size.
2. **Parse** — Each file is parsed in parallel (via Rayon) using Tree-Sitter grammars, producing an AST per file.
3. **Extract** — Language-specific extractors pull out classes, functions, imports, call sites, API endpoints, and external API references.
4. **Persist** — Extracted symbols and relationships are written to Neo4j as a labeled property graph. Relationships are batched (3000 edges per flush) to reduce round-trips.
5. **Post-process** — `SAME_API` edges are created between internal `ApiEndpoint` nodes and `ExternalApi` nodes that share a normalized path.
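
The batching in step 4 can be pictured as a buffer that hands edges to the database writer in fixed-size chunks. The sketch below is a minimal illustration under assumptions: `Edge`, `EdgeBatcher`, and the `sink` callback are hypothetical names, not the types used in `graph.rs`; only the 3000-edge threshold comes from the description above.

```rust
/// Hypothetical edge record; the real type in `graph.rs` may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub from: String,
    pub rel: &'static str,
    pub to: String,
}

/// Buffers edges and hands them to `sink` in chunks of at most `capacity`.
pub struct EdgeBatcher<F: FnMut(&[Edge])> {
    capacity: usize,
    buffer: Vec<Edge>,
    sink: F,
}

impl<F: FnMut(&[Edge])> EdgeBatcher<F> {
    pub fn new(capacity: usize, sink: F) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, buffer: Vec::with_capacity(capacity), sink }
    }

    /// Queue one edge; flush automatically once the buffer is full.
    pub fn push(&mut self, edge: Edge) {
        self.buffer.push(edge);
        if self.buffer.len() >= self.capacity {
            self.flush();
        }
    }

    /// Send any buffered edges to the sink and clear the buffer.
    pub fn flush(&mut self) {
        if !self.buffer.is_empty() {
            (self.sink)(self.buffer.as_slice());
            self.buffer.clear();
        }
    }
}
```

With `EdgeBatcher::new(3000, sink)`, pushing 7000 edges and then calling `flush()` once would invoke `sink` three times, with 3000, 3000, and 1000 edges. In a real writer the sink would issue one database round-trip per chunk.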
## Prerequisites
- **Rust** (edition 2024) — install via [rustup](https://rustup.rs/)
- **Neo4j 5** — run via Docker (see below)
- **C compiler** — required by `build.rs` to compile the vendored Erlang Tree-Sitter grammar
## Installation
```bash
git clone http://git.redbus.com/sujal.v/impactdependency.git
cd impactdependency/parser
cargo build --release
```
The build step compiles the vendored Erlang grammar from `vendor/tree-sitter-erlang/` via `build.rs`.
## Neo4j Setup
Start a Neo4j 5 instance with Docker:
```bash
docker run -d \
  --name neo4j-parser \
  -p 7474:7474 \
  -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/parser1234 \
  neo4j:5
```
The Neo4j Browser will be available at `http://localhost:7474/`.
## Usage
### Basic — parse and output JSON
```bash
cargo run -- /path/to/repo --output-json parsed_output.json
```
### Parse and push to Neo4j
```bash
cargo run -- /path/to/repo \
  --output-json parsed_output.json \
  --push-to-neo4j
```
### Full options with custom Neo4j credentials
```bash
cargo run -- /path/to/repo \
  --output-json parsed_output.json \
  --push-to-neo4j \
  --clean \
  --neo4j-uri bolt://localhost:7687 \
  --neo4j-user neo4j \
  --neo4j-password myStrongPass123
```
### CLI Reference
| Argument | Type | Default | Description |
|---|---|---|---|
| `ROOT` | path | *(required)* | Root directory to scan |
| `--output-json` | path | *(none)* | Write AST summaries to a JSON file |
| `--push-to-neo4j` | flag | `false` | Push the parsed graph into Neo4j |
| `--clean` | flag | `false` | Delete all existing nodes before pushing |
| `--neo4j-uri` | string | `bolt://localhost:7688` | Neo4j Bolt URI |
| `--neo4j-user` | string | `neo4j` | Neo4j username |
| `--neo4j-password` | string | `parser1234` | Neo4j password |
| `--follow-symlinks` | flag | `false` | Follow symbolic links during traversal |
| `--max-file-size` | bytes | `2097152` (2 MiB) | Skip files larger than this |
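
To make the effect of `--max-file-size` concrete, here is a rough sketch of how a scanner could decide whether to parse a file. The extension-to-language mapping is an assumption based on the Supported Languages table (the real registry lives in `lib.rs` and may recognize other extensions), and `language_for` and `should_parse` are hypothetical helper names.

```rust
use std::path::Path;

/// Default size cap from the CLI reference: 2 MiB.
pub const DEFAULT_MAX_FILE_SIZE: u64 = 2 * 1024 * 1024;

/// Map a file extension to a language name.
/// The extension list is an assumption, not the parser's actual registry.
pub fn language_for(path: &Path) -> Option<&'static str> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "java" => Some("Java"),
        "cs" => Some("C#"),
        "go" => Some("Go"),
        "erl" => Some("Erlang"),
        "js" => Some("JavaScript"),
        "ts" => Some("TypeScript"),
        "py" => Some("Python"),
        "rs" => Some("Rust"),
        _ => None,
    }
}

/// Decide whether a file should be parsed, given its size in bytes.
/// Files exactly at the limit are kept; only larger files are skipped.
pub fn should_parse(path: &Path, size_bytes: u64, max_file_size: u64) -> bool {
    language_for(path).is_some() && size_bytes <= max_file_size
}
```

In a directory walk, `size_bytes` would typically come from the file's metadata, so oversized files can be skipped without reading them.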
## Graph Schema
### Node Types
| Label | Properties |
|---|---|
| `File` | `path`, `language`, `framework?`, `project_name?`, `is_test?` |
| `Module` | `name`, `path`, `language` (Erlang modules) |
| `Class` | `name`, `fqn`, `path`, `language?`, `project_name?` |
| `Function` | `name`, `fqn`, `path`, `language`, `arity?`, `return_type?`, `param_count?` |
| `ApiEndpoint` | `methods[]`, `path`, `norm_path?`, `framework?` |
| `ExternalApi` | `name`, `base_url?`, `path?`, `norm_path?`, `provider?` |
### Relationships
```
(:File)-[:DECLARES_MODULE]->(:Module)
(:File)-[:DECLARES_CLASS]->(:Class)
(:File)-[:DECLARES_FUNCTION]->(:Function)
(:Class)-[:DECLARES_FUNCTION]->(:Function)
(:Module)-[:DECLARES_FUNCTION]->(:Function)
(:File)-[:DEPENDS_ON_FILE]->(:File)
(:Function)-[:CALLS_FUNCTION]->(:Function)
(:Function)-[:USES_CLASS]->(:Class)
(:ApiEndpoint)-[:HANDLED_BY]->(:Function)
(:Function)-[:CALLS_EXTERNAL_API]->(:ExternalApi)
(:ApiEndpoint)-[:SAME_API]->(:ExternalApi)
```
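
The `SAME_API` relationship links an `ApiEndpoint` to an `ExternalApi` when their normalized paths match. The sketch below shows one way such matching could work. The normalization rules here (drop the query string, lowercase literal segments, collapse `{id}` and `:id` style parameters into `{}`) are illustrative assumptions; this README does not specify how `norm_path` is actually computed, and `normalize_path` and `same_api_pairs` are hypothetical names.

```rust
use std::collections::HashMap;

/// Normalize a URL path so differently spelled parameters compare equal.
/// These rules are assumptions for illustration, not the parser's actual ones.
pub fn normalize_path(raw: &str) -> String {
    // Drop any query string or fragment.
    let path = raw.split(|c: char| c == '?' || c == '#').next().unwrap_or("");
    let segments: Vec<String> = path
        .split('/')
        .filter(|s| !s.is_empty())
        .map(|s| {
            let is_param = s.starts_with(':') || (s.starts_with('{') && s.ends_with('}'));
            if is_param { "{}".to_string() } else { s.to_ascii_lowercase() }
        })
        .collect();
    format!("/{}", segments.join("/"))
}

/// Pair each internal endpoint path with the external API paths that share
/// its normalized form. Each returned tuple is one candidate `SAME_API` edge.
pub fn same_api_pairs<'a>(endpoints: &[&'a str], externals: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    // Index external paths by normalized form so each endpoint is one lookup.
    let mut by_norm: HashMap<String, Vec<&'a str>> = HashMap::new();
    for &ext in externals {
        by_norm.entry(normalize_path(ext)).or_default().push(ext);
    }
    let mut pairs = Vec::new();
    for &ep in endpoints {
        if let Some(matches) = by_norm.get(&normalize_path(ep)) {
            for &ext in matches {
                pairs.push((ep, ext));
            }
        }
    }
    pairs
}
```

Under these rules, `/orders/{id}` and `/Orders/:orderId` both normalize to `/orders/{}` and would be linked, while `/health` and `/users` would not.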
## Example Queries
Once the graph is in Neo4j, you can run Cypher queries for impact analysis:
```cypher
// Which functions call OrderDetail.setAmenities?
MATCH (caller:Function)-[:CALLS_FUNCTION]->(target:Function {name: "setAmenities"})
WHERE target.fqn CONTAINS "OrderDetail"
RETURN caller.fqn, caller.path;

// Which files depend on OrderDetail.java?
MATCH (f:File)-[:DEPENDS_ON_FILE]->(dep:File)
WHERE dep.path CONTAINS "OrderDetail.java"
RETURN f.path;

// All functions reachable within 3 hops from a given function
MATCH path = (start:Function {name: "processOrder"})-[:CALLS_FUNCTION*1..3]->(downstream:Function)
RETURN downstream.fqn, length(path) AS depth;

// API endpoints and their handler functions
MATCH (ep:ApiEndpoint)-[:HANDLED_BY]->(fn:Function)
RETURN ep.path, ep.methods, fn.fqn;
```
## MCP Server Integration
The parser ships with a [FastMCP](https://github.com/jlowin/fastmcp) server so it can be invoked as a tool from Cursor IDE or any MCP-compatible client.
### Setup
```bash
cd parser/mcp
pip install -r requirements.txt
python main.py
```
The MCP server exposes a `parse_repository` tool with parameters matching the CLI arguments. It runs `cargo run` as a subprocess, pipes progress logs to stderr (to keep the JSON-RPC stdout channel clean), and returns the parse results.
### Tool: `parse_repository`
| Parameter | Type | Description |
|---|---|---|
| `root_path` | string | Directory to parse |
| `follow_symlinks` | bool | Follow symlinks |
| `max_file_size` | int | Max file size in bytes |
| `push_to_neo4j` | bool | Push graph to Neo4j |
| `neo4j_uri` | string | Neo4j Bolt URI |
| `neo4j_user` | string | Neo4j username |
| `neo4j_password` | string | Neo4j password |
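
Since the tool parameters mirror the CLI arguments, the translation from a tool call to a command line is mostly mechanical. The actual service is Python (`parser_service.py`) and its code is not shown here, so the following is only a sketch of the mapping, written in Rust for consistency with the rest of the project. `ParseRequest` and `build_cli_args` are hypothetical names, and treating unset optional parameters as "use the CLI default" is an assumption.

```rust
/// Parameters of the `parse_repository` tool, mirroring the table above.
/// Optional fields fall back to the CLI defaults when `None`.
#[derive(Debug, Default, Clone)]
pub struct ParseRequest {
    pub root_path: String,
    pub follow_symlinks: bool,
    pub max_file_size: Option<u64>,
    pub push_to_neo4j: bool,
    pub neo4j_uri: Option<String>,
    pub neo4j_user: Option<String>,
    pub neo4j_password: Option<String>,
}

/// Build the argument list that would follow `cargo run --`.
pub fn build_cli_args(req: &ParseRequest) -> Vec<String> {
    let mut args = vec![req.root_path.clone()];
    if req.follow_symlinks {
        args.push("--follow-symlinks".into());
    }
    if let Some(size) = req.max_file_size {
        args.push("--max-file-size".into());
        args.push(size.to_string());
    }
    if req.push_to_neo4j {
        args.push("--push-to-neo4j".into());
        // Connection options only matter when pushing to Neo4j.
        let options = [
            ("--neo4j-uri", &req.neo4j_uri),
            ("--neo4j-user", &req.neo4j_user),
            ("--neo4j-password", &req.neo4j_password),
        ];
        for (flag, value) in options {
            if let Some(v) = value {
                args.push(flag.to_string());
                args.push(v.clone());
            }
        }
    }
    args
}
```

Passing the result as a list of separate arguments, rather than joining it into a shell string, avoids quoting problems when paths or passwords contain spaces.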
## Project Structure
```
parser/
├── Cargo.toml # Rust dependencies and build config
├── build.rs # Compiles vendored Erlang grammar (C → .a)
├── graph_schema.md # Neo4j node/relationship schema reference
├── src/
│ ├── main.rs # CLI entry point (clap)
│ ├── lib.rs # Language registry and Tree-Sitter wrapper
│ ├── scanner.rs # Directory walker + parallel parser
│ ├── graph.rs # Symbol extraction + Neo4j persistence
│ ├── edge.rs # Relationship type enum
│ ├── schema.rs # Node labels and property constants
│ ├── ir.rs # Intermediate representation for serialization
│ └── erlang.rs # FFI binding for vendored Erlang grammar
├── vendor/
│ └── tree-sitter-erlang/ # Vendored Erlang Tree-Sitter grammar (C source)
├── mcp/
│ ├── main.py # MCP server entry point
│ ├── app.py # FastMCP app definition
│ ├── services/
│ │ └── parser_service.py # Subprocess runner for cargo
│ ├── tools/
│ │ └── parser_tools.py # parse_repository tool definition
│ └── requirements.txt # Python dependencies
└── prompts/ # Prompt templates for MCP tool usage
```
## Known Limitations
- **Java imports** are filtered to `com.redbus.genai.*` by default — other internal packages are not tracked.
- **C# and Go** lack file-level dependency edges (`DEPENDS_ON_FILE`).
- **Erlang** uses regex-based text parsing instead of the Tree-Sitter AST for function extraction.
- **JS, TS, Python, Rust** only extract top-level functions — no classes, call graphs, or dependency edges.
- **Class inheritance** (`extends`/`implements`) is not tracked for any language.
- **Neo4j writes are sequential** per file, which can be slow for large codebases (10k+ files).
- **No incremental parsing in CLI** — the full codebase is re-parsed on every CLI run (MCP server supports incremental file-watcher updates).
See `shortcomings.txt` for a detailed analysis.
---
## Client-side library (in-memory graph)
Add from [crates.io](https://crates.io/crates/impactsense-parser):
```toml
[dependencies]
impactsense-parser = "0.1"
```
The `impactsense-parser` crate builds an **`InMemoryGraph`** with indexed queries for IDE/MCP use.
```rust
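use impactsense_parser::parse_project;
use impactsense_parser::pipeline::ScanOptions;
use impactsense_parser::store::GraphStore;
// Assumes the crate's error type converts into `Box<dyn std::error::Error>`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse the whole project into an in-memory graph.
    let graph = parse_project("/path/to/repo", &ScanOptions::default())?;
    // Direct callers and transitive impact, both looked up by fully qualified name.
    let callers = graph.callers("com.example.OrderService.create");
    let impact = graph.impact("com.example.OrderService.create", Default::default());
    Ok(())
}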
```
Export IR as JSON from the CLI:
```bash
cargo install impactsense-parser
impactsense-parser /path/to/repo --output-json project_ir.json
```
### Cargo features
| Feature | Default | Description |
|---|---|---|
| `neo4j` | yes | Neo4j persistence (`--push-to-neo4j`, webhook) |
| `compressor` | no | Reserved for RedCompressor integration |
---
## Cursor MCP setup
One install gives you both the CLI and the MCP server:
```bash
cargo install impactsense-parser
```
Binaries are placed in `~/.cargo/bin/`:
- `impactsense-parser` — CLI
- `impactsense-mcp` — MCP server for Cursor
Create `.cursor/mcp.json` in your project:
```json
{
  "mcpServers": {
    "impactsense": {
      "command": "/Users/YOUR_USER/.cargo/bin/impactsense-mcp",
      "args": ["--root", "${workspaceFolder}"]
    }
  }
}
```
Replace `YOUR_USER` with your username, or run `which impactsense-mcp` after install to get the exact path.
Restart Cursor. The server parses your open workspace once at startup, then keeps the graph updated as you edit files.
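
Keeping the graph current as files change usually comes down to replacing one file's contribution at a time. The sketch below illustrates that idea with a per-file symbol table; it is not the server's implementation. `SymbolIndex` and its methods are hypothetical, and a real graph would also need to update call and dependency edges.

```rust
use std::collections::BTreeMap;

/// Per-file symbol table; re-parsing a file swaps its entry wholesale.
#[derive(Default)]
pub struct SymbolIndex {
    by_file: BTreeMap<String, Vec<String>>,
}

impl SymbolIndex {
    /// Called after a file is (re)parsed: its old symbols are replaced by the new ones.
    pub fn upsert_file(&mut self, path: &str, fqns: Vec<String>) {
        self.by_file.insert(path.to_string(), fqns);
    }

    /// Called when a file is deleted.
    pub fn remove_file(&mut self, path: &str) {
        self.by_file.remove(path);
    }

    /// Substring search over fully qualified names, returning (path, fqn) pairs
    /// ordered by file path.
    pub fn find_symbol(&self, needle: &str) -> Vec<(&str, &str)> {
        let mut hits = Vec::new();
        for (path, fqns) in &self.by_file {
            for fqn in fqns {
                if fqn.contains(needle) {
                    hits.push((path.as_str(), fqn.as_str()));
                }
            }
        }
        hits
    }
}
```

Because each file owns its own entry, an edit to one file never requires re-parsing the others; stale symbols disappear as soon as the file's entry is replaced.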
### MCP tools
| Tool | Description |
|---|---|
| `find_symbol` | Search by name or FQN substring |
| `callers` / `callees` | Direct call graph neighbors |
| `file_dependencies` | Import/file deps for a path |
| `symbols_in_file` | Declared symbols in one file |
| `impact_analysis` | Transitive callers (bounded depth) |
| `graph_stats` | Node/edge counts |
The graph lives in the MCP server's process memory. After a large branch switch, restart the MCP server (or Cursor) so the graph is rebuilt from scratch.
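
A bounded-depth search for transitive callers, as `impact_analysis` is described, can be sketched as a breadth-first walk over a reverse call index. This is a minimal illustration, not the crate's `InMemoryGraph`: `CallGraph`, `add_call`, and this `impact` signature are hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Minimal call graph with a reverse index (callee -> callers).
/// A stand-in for illustration; not the crate's `InMemoryGraph`.
#[derive(Default)]
pub struct CallGraph {
    callers_of: HashMap<String, Vec<String>>,
}

impl CallGraph {
    /// Record that `caller` calls `callee`.
    pub fn add_call(&mut self, caller: &str, callee: &str) {
        self.callers_of
            .entry(callee.to_string())
            .or_default()
            .push(caller.to_string());
    }

    /// Breadth-first walk over callers, up to `max_depth` hops from `target`.
    /// Returns (function, depth) pairs in discovery order.
    pub fn impact(&self, target: &str, max_depth: usize) -> Vec<(String, usize)> {
        let mut seen: HashSet<String> = HashSet::from([target.to_string()]);
        let mut queue = VecDeque::from([(target.to_string(), 0usize)]);
        let mut result = Vec::new();
        while let Some((current, depth)) = queue.pop_front() {
            if depth == max_depth {
                continue; // do not expand past the depth bound
            }
            for caller in self.callers_of.get(&current).into_iter().flatten() {
                // `insert` returns false for visited nodes, which guards against cycles.
                if seen.insert(caller.clone()) {
                    result.push((caller.clone(), depth + 1));
                    queue.push_back((caller.clone(), depth + 1));
                }
            }
        }
        result
    }
}
```

The reverse index makes each hop a single map lookup, and the visited set keeps recursive or mutually recursive functions from being reported twice.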