ImpactSense Parser

A multi-language static analysis tool written in Rust that parses source code using Tree-Sitter, extracts structural symbols (files, classes, functions, API endpoints), and builds a dependency graph in Neo4j for impact analysis.

Given a codebase, it answers questions like "If I change this class, which functions and files are affected?" by constructing a queryable graph of code relationships.

Supported Languages

Language	Parsing	Classes	Call Graph	File Dependencies	API Endpoints
Java	Full AST	Yes	Partial	Yes (imports)	Spring
C#	Full AST	Yes	Partial	No	ASP.NET
Go	Full AST	Yes	Partial	No	Chi/Gin/Echo
Erlang	Text	Module	Approximate	Yes	Cowboy
JavaScript	Full AST	No	No	No	No
TypeScript	Full AST	No	No	No	No
Python	Full AST	No	No	No	No
Rust	Full AST	No	No	No	No
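During the Scan step, files are matched to these languages by extension. The Rust sketch below illustrates such a lookup. The `Language` enum, the `detect_language` function and the exact extension list (for example `.hrl` for Erlang headers or `.tsx` for TypeScript) are assumptions, not code taken from lib.rs.

```rust
use std::path::Path;

/// Languages from the support table (illustrative enum, not the crate's registry).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Java,
    CSharp,
    Go,
    Erlang,
    JavaScript,
    TypeScript,
    Python,
    Rust,
}

/// Map a path to a language by its extension.
/// The extension list is an assumption made for this sketch.
fn detect_language(path: &Path) -> Option<Language> {
    // Files without a (UTF-8) extension are skipped.
    let ext = path.extension()?.to_str()?;
    match ext {
        "java" => Some(Language::Java),
        "cs" => Some(Language::CSharp),
        "go" => Some(Language::Go),
        "erl" | "hrl" => Some(Language::Erlang),
        "js" | "jsx" | "mjs" | "cjs" => Some(Language::JavaScript),
        "ts" | "tsx" => Some(Language::TypeScript),
        "py" => Some(Language::Python),
        "rs" => Some(Language::Rust),
        _ => None,
    }
}

fn main() {
    for p in ["src/OrderService.java", "src/cowboy_handler.erl", "README.md"] {
        // Prints e.g. "src/OrderService.java -> Some(Java)"
        println!("{p} -> {:?}", detect_language(Path::new(p)));
    }
}
```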

Architecture

                          ┌──────────────────────┐
                          │   CLI  (main.rs)     │
                          │   clap arg parsing   │
                          └──────────┬───────────┘
                                     │
                                     ▼
                          ┌──────────────────────┐
                          │  scanner.rs          │
                          │  walkdir + rayon     │
                          │  parallel file parse │
                          └──────────┬───────────┘
                                     │
                              Vec<ParsedFile>
                                     │
                   ┌─────────────────┴─────────────────┐
                   ▼                                   ▼
         ┌───────────────────┐              ┌─────────────────────┐
         │  JSON output      │              │  graph.rs           │
         │  (--output-json)  │              │  Neo4j persistence  │
         │  AST summaries    │              │  (--push-to-neo4j)  │
         └───────────────────┘              └─────────────────────┘

Scan — Recursively walks the target directory, identifies source files by extension, and filters by max file size.
Parse — Each file is parsed in parallel (via Rayon) using Tree-Sitter grammars, producing an AST per file.
Extract — Language-specific extractors pull out classes, functions, imports, call sites, API endpoints, and external API references.
Persist — Extracted symbols and relationships are written to Neo4j as a labeled property graph. Relationships are batched (3000 edges per flush) to reduce round-trips.
Post-process — SAME_API edges are created between internal ApiEndpoint nodes and ExternalApi nodes that share a normalized path.
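The batching in the Persist step can be pictured as a buffer that is handed to the database once it holds 3000 edges, with a final flush for the remainder. This std-only sketch uses a hypothetical `Edge` struct and a `flush` callback standing in for the actual Neo4j write in graph.rs; it shows the batching pattern only, not the crate's code.

```rust
/// Hypothetical edge record; a real writer would send these fields to Neo4j.
#[allow(dead_code)]
struct Edge {
    rel: &'static str,
    from: String,
    to: String,
}

/// Buffers edges and hands them to `flush` in batches of `batch_size`.
struct EdgeBatcher<F: FnMut(&[Edge])> {
    buf: Vec<Edge>,
    batch_size: usize,
    flush: F,
}

impl<F: FnMut(&[Edge])> EdgeBatcher<F> {
    fn new(batch_size: usize, flush: F) -> Self {
        Self { buf: Vec::with_capacity(batch_size), batch_size, flush }
    }

    /// Queue one edge; send the batch as soon as it is full.
    fn push(&mut self, edge: Edge) {
        self.buf.push(edge);
        if self.buf.len() >= self.batch_size {
            self.flush_now();
        }
    }

    /// Send whatever is buffered (one round-trip) and clear the buffer.
    fn flush_now(&mut self) {
        if !self.buf.is_empty() {
            (self.flush)(self.buf.as_slice());
            self.buf.clear();
        }
    }
}

fn main() {
    let mut batch_sizes = Vec::new();
    {
        let mut batcher = EdgeBatcher::new(3000, |batch: &[Edge]| batch_sizes.push(batch.len()));
        for i in 0..7000 {
            batcher.push(Edge {
                rel: "CALLS_FUNCTION",
                from: format!("fn{i}"),
                to: format!("fn{}", i + 1),
            });
        }
        batcher.flush_now(); // send the final, partial batch
    }
    println!("{batch_sizes:?}"); // [3000, 3000, 1000]
}
```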

Prerequisites

Rust 1.85 or newer (the first release that supports edition 2024), installed via rustup
Neo4j 5, needed only for --push-to-neo4j; run it via Docker (see below)
C compiler — required by build.rs to compile the vendored Erlang Tree-Sitter grammar

Installation

git clone http://git.redbus.com/sujal.v/impactdependency.git
cd impactdependency/parser
cargo build --release

The build step compiles the vendored Erlang grammar from vendor/tree-sitter-erlang/ via build.rs.

Neo4j Setup

Start a Neo4j 5 instance with Docker:

docker run -d \
  --name neo4j-parser \
  -p 7474:7474 \
  -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/parser1234 \
  neo4j:5

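The Neo4j Browser will be available at http://localhost:7474/, and Bolt connections go to bolt://localhost:7687. Note that the CLI's default --neo4j-uri is bolt://localhost:7688, so pass --neo4j-uri bolt://localhost:7687 when using this container (the default password, parser1234, already matches NEO4J_AUTH).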

Usage

Basic — parse and output JSON

cargo run --release -- /path/to/repo --output-json parsed_output.json

Parse and push to Neo4j

cargo run --release -- /path/to/repo \
  --output-json parsed_output.json \
  --push-to-neo4j

Clean push with custom Neo4j credentials (--clean first deletes all existing nodes)

cargo run --release -- /path/to/repo \
  --output-json parsed_output.json \
  --push-to-neo4j \
  --clean \
  --neo4j-uri bolt://localhost:7687 \
  --neo4j-user neo4j \
  --neo4j-password myStrongPass123

CLI Reference

Argument	Type	Default	Description
`ROOT`	path	(required)	Root directory to scan
`--output-json`	path	—	Write AST summaries to a JSON file
`--push-to-neo4j`	flag	`false`	Push the parsed graph into Neo4j
`--clean`	flag	`false`	Delete all existing nodes before pushing
`--neo4j-uri`	string	`bolt://localhost:7688`	Neo4j Bolt URI
`--neo4j-user`	string	`neo4j`	Neo4j username
`--neo4j-password`	string	`parser1234`	Neo4j password
`--follow-symlinks`	flag	`false`	Follow symbolic links during traversal
`--max-file-size`	bytes	2 MiB (2097152)	Skip files larger than this

Graph Schema

Node Types

A trailing `?` marks an optional property that may be absent on a node.

Label	Key Properties
`File`	`path`, `language`, `framework?`, `project_name?`, `is_test?`
`Module`	`name`, `path`, `language` (Erlang modules)
`Class`	`name`, `fqn`, `path`, `language?`, `project_name?`
`Function`	`name`, `fqn`, `path`, `language`, `arity?`, `return_type?`, `param_count?`
`ApiEndpoint`	`methods[]`, `path`, `norm_path?`, `framework?`
`ExternalApi`	`name`, `base_url?`, `path?`, `norm_path?`, `provider?`

Relationships

(:File)-[:DECLARES_MODULE]->(:Module)
(:File)-[:DECLARES_CLASS]->(:Class)
(:File)-[:DECLARES_FUNCTION]->(:Function)
(:Class)-[:DECLARES_FUNCTION]->(:Function)
(:Module)-[:DECLARES_FUNCTION]->(:Function)
(:File)-[:DEPENDS_ON_FILE]->(:File)
(:Function)-[:CALLS_FUNCTION]->(:Function)
(:Function)-[:USES_CLASS]->(:Class)
(:ApiEndpoint)-[:HANDLED_BY]->(:Function)
(:Function)-[:CALLS_EXTERNAL_API]->(:ExternalApi)
(:ApiEndpoint)-[:SAME_API]->(:ExternalApi)
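The README does not spell out how paths are normalized before SAME_API edges are created. One plausible scheme, sketched here purely as an assumption, lowercases literal segments, drops the query string and empty segments, and replaces path parameters (`{id}` as in Spring, ASP.NET and Chi, `:id` as in Gin, Echo and Cowboy, or a purely numeric segment) with a common placeholder, so that a route template and a concrete client URL compare equal. `normalize_api_path` is a hypothetical name.

```rust
/// Hypothetical normalization used to decide whether an ApiEndpoint and an
/// ExternalApi refer to the same route (the real rules in graph.rs may differ).
fn normalize_api_path(raw: &str) -> String {
    // Keep only the path part: drop any query string or fragment.
    let path = raw.split(|c: char| c == '?' || c == '#').next().unwrap_or("");
    let segments: Vec<String> = path
        .split('/')
        // Skipping empty pieces collapses "//" and ignores leading/trailing '/'.
        .filter(|s| !s.is_empty())
        .map(|s| {
            let is_param = (s.starts_with('{') && s.ends_with('}'))
                || s.starts_with(':')
                || s.chars().all(|c| c.is_ascii_digit());
            if is_param { "{}".to_string() } else { s.to_lowercase() }
        })
        .collect();
    format!("/{}", segments.join("/"))
}

fn main() {
    // A Spring-style route template and a concrete client URL.
    let endpoint = normalize_api_path("/api/Orders/{orderId}/");
    let external = normalize_api_path("/api/orders/42?expand=items");
    println!("{endpoint} | {external} | same_api = {}", endpoint == external);
}
```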

Example Queries

Once the graph is in Neo4j, you can run Cypher queries for impact analysis:

// Which functions call OrderDetail.setAmenities?
MATCH (caller:Function)-[:CALLS_FUNCTION]->(target:Function {name: "setAmenities"})
WHERE target.fqn CONTAINS "OrderDetail"
RETURN caller.fqn, caller.path

// Which files depend on OrderDetail.java?
MATCH (f:File)-[:DEPENDS_ON_FILE]->(dep:File)
WHERE dep.path CONTAINS "OrderDetail.java"
RETURN f.path

// All functions reachable within 3 hops from a given function (shortest depth for each)
MATCH path = (start:Function {name: "processOrder"})-[:CALLS_FUNCTION*1..3]->(downstream:Function)
RETURN downstream.fqn, min(length(path)) AS depth
ORDER BY depth

// API endpoints and their handler functions
MATCH (ep:ApiEndpoint)-[:HANDLED_BY]->(fn:Function)
RETURN ep.path, ep.methods, fn.fqn
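The 3-hop traversal in the third query has a simple in-memory counterpart: a breadth-first search over CALLS_FUNCTION edges that stops at a depth limit. Because BFS first reaches every node along a shortest path, each function is reported once with its minimum depth, which is what min(length(path)) expresses in Cypher. The adjacency map and `reachable_within` are illustrative, not part of the crate.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Functions reachable from `start` within 1..=max_depth call hops, each with
/// the smallest hop count. `calls` maps caller FQN -> callee FQNs.
fn reachable_within(
    calls: &HashMap<&str, Vec<&str>>,
    start: &str,
    max_depth: usize,
) -> Vec<(String, usize)> {
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::from([(start, 0)]);
    let mut out = Vec::new();

    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // do not expand beyond the hop limit
        }
        for &callee in calls.get(node).map(Vec::as_slice).unwrap_or(&[]) {
            // BFS dequeues nodes in depth order, so the first visit is the shortest.
            if seen.insert(callee) {
                out.push((callee.to_string(), depth + 1));
                queue.push_back((callee, depth + 1));
            }
        }
    }
    out
}

fn main() {
    let calls: HashMap<&str, Vec<&str>> = HashMap::from([
        ("processOrder", vec!["validate", "charge"]),
        ("charge", vec!["callGateway", "validate"]),
        ("callGateway", vec!["logRequest"]),
        ("logRequest", vec!["formatLine"]),
    ]);
    // Like CALLS_FUNCTION*1..3 from processOrder; formatLine is 4 hops away and is excluded.
    for (fqn, depth) in reachable_within(&calls, "processOrder", 3) {
        println!("{fqn} at depth {depth}");
    }
}
```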

MCP Server Integration

The parser ships with a FastMCP server so it can be invoked as a tool from Cursor IDE or any MCP-compatible client.

Setup

From the repository root:

cd parser/mcp
pip install -r requirements.txt
python main.py

The MCP server exposes a parse_repository tool with parameters matching the CLI arguments. It runs cargo run as a subprocess, pipes progress logs to stderr (to keep the JSON-RPC stdout channel clean), and returns the parse results.

Tool: `parse_repository`

Parameter	Type	Description
`root_path`	string	Directory to parse
`follow_symlinks`	bool	Follow symlinks
`max_file_size`	int	Max file size in bytes
`push_to_neo4j`	bool	Push graph to Neo4j
`neo4j_uri`	string	Neo4j Bolt URI
`neo4j_user`	string	Neo4j username
`neo4j_password`	string	Neo4j password

Project Structure

parser/
├── Cargo.toml                 # Rust dependencies and build config
├── build.rs                   # Compiles vendored Erlang grammar (C → .a)
├── graph_schema.md            # Neo4j node/relationship schema reference
├── src/
│   ├── main.rs                # CLI entry point (clap)
│   ├── lib.rs                 # Language registry and Tree-Sitter wrapper
│   ├── scanner.rs             # Directory walker + parallel parser
│   ├── graph.rs               # Symbol extraction + Neo4j persistence
│   ├── edge.rs                # Relationship type enum
│   ├── schema.rs              # Node labels and property constants
│   ├── ir.rs                  # Intermediate representation for serialization
│   └── erlang.rs              # FFI binding for vendored Erlang grammar
├── vendor/
│   └── tree-sitter-erlang/    # Vendored Erlang Tree-Sitter grammar (C source)
├── mcp/
│   ├── main.py                # MCP server entry point
│   ├── app.py                 # FastMCP app definition
│   ├── services/
│   │   └── parser_service.py  # Subprocess runner for cargo
│   ├── tools/
│   │   └── parser_tools.py    # parse_repository tool definition
│   └── requirements.txt       # Python dependencies
└── prompts/                   # Prompt templates for MCP tool usage

Known Limitations

Java imports are filtered to com.redbus.genai.* by default — other internal packages are not tracked.
C# and Go lack file-level dependency edges (DEPENDS_ON_FILE).
Erlang uses regex-based text parsing instead of the Tree-Sitter AST for function extraction.
JS, TS, Python, Rust only extract top-level functions — no classes, call graphs, or dependency edges.
Class inheritance (extends/implements) is not tracked for any language.
Neo4j writes are sequential per file, which can be slow for large codebases (10k+ files).
No incremental parsing in CLI — the full codebase is re-parsed on every CLI run (MCP server supports incremental file-watcher updates).

See shortcomings.txt for a detailed analysis.

Client-side library (in-memory graph)

Add from crates.io:

[dependencies]
impactsense-parser = "0.1"

The impactsense-parser crate can also build an InMemoryGraph entirely in RAM, with indexed queries for IDE and MCP use; no Neo4j instance is needed.

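use impactsense_parser::parse_project;
use impactsense_parser::pipeline::ScanOptions;
use impactsense_parser::store::GraphStore; // assumed to be the trait providing callers() and impact()

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse the repository into an in-memory graph.
    let graph = parse_project("/path/to/repo", &ScanOptions::default())?;
    // Direct callers of a function, looked up by fully qualified name.
    let callers = graph.callers("com.example.OrderService.create");
    // Transitive impact using the default options.
    let impact = graph.impact("com.example.OrderService.create", Default::default());
    Ok(())
}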
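The internals of InMemoryGraph are not described here. As a rough illustration of what indexed call queries can look like, the sketch below keeps a forward (callees) and a reverse (callers) index keyed by FQN, so each neighbor lookup is a single hash-map access rather than a scan over all edges. `CallIndex` is a hypothetical type, not the crate's API.

```rust
use std::collections::HashMap;

/// Hypothetical call-graph index: both directions are precomputed so each
/// neighbor query is one HashMap lookup.
#[derive(Default)]
struct CallIndex {
    callees: HashMap<String, Vec<String>>, // caller FQN -> callee FQNs
    callers: HashMap<String, Vec<String>>, // callee FQN -> caller FQNs
}

impl CallIndex {
    /// Record one CALLS_FUNCTION edge in both indexes.
    fn add_call(&mut self, caller: &str, callee: &str) {
        self.callees.entry(caller.to_string()).or_default().push(callee.to_string());
        self.callers.entry(callee.to_string()).or_default().push(caller.to_string());
    }

    /// Direct callers of `fqn` (empty if none are known).
    fn callers(&self, fqn: &str) -> &[String] {
        self.callers.get(fqn).map(Vec::as_slice).unwrap_or(&[])
    }

    /// Direct callees of `fqn` (empty if none are known).
    fn callees(&self, fqn: &str) -> &[String] {
        self.callees.get(fqn).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let mut index = CallIndex::default();
    index.add_call("com.example.OrderController.post", "com.example.OrderService.create");
    index.add_call("com.example.CheckoutJob.run", "com.example.OrderService.create");
    index.add_call("com.example.OrderService.create", "com.example.OrderRepo.save");

    println!("{:?}", index.callers("com.example.OrderService.create"));
    println!("{:?}", index.callees("com.example.OrderService.create"));
}
```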

Export IR as JSON from the CLI:

cargo install impactsense-parser
impactsense-parser /path/to/repo --output-json project_ir.json

Cargo features

Feature	Default	Description
`neo4j`	yes	Neo4j persistence (`--push-to-neo4j`, webhook)
`compressor`	no	Reserved for RedCompressor integration

Cursor MCP setup

One install gives you both the CLI and the MCP server:

cargo install impactsense-parser

Binaries are placed in ~/.cargo/bin/:

impactsense-parser — CLI
impactsense-mcp — MCP server for Cursor

Create .cursor/mcp.json in your project:

{
  "mcpServers": {
    "impactsense": {
      "command": "/Users/YOUR_USER/.cargo/bin/impactsense-mcp",
      "args": ["--root", "${workspaceFolder}"]
    }
  }
}

Replace YOUR_USER with your username, or run which impactsense-mcp after install to get the exact path.

Restart Cursor. The server parses your open workspace once at startup, then keeps the graph updated as you edit files.
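Keeping the graph current as you edit usually means replacing only what came from the changed file instead of re-parsing the workspace. A minimal sketch of that bookkeeping, assuming a hypothetical `Symbol` type and leaving out the file watcher and the parser: symbols are grouped by path, so an edit swaps one entry and a deletion removes it. A real implementation would also have to rebuild edges pointing into or out of the changed file, which this sketch omits.

```rust
use std::collections::HashMap;

/// Hypothetical symbol record produced by parsing one file.
#[derive(Debug, Clone)]
struct Symbol {
    fqn: String,
}

fn sym(fqn: &str) -> Symbol {
    Symbol { fqn: fqn.to_string() }
}

/// Symbols grouped by source path, so one file can be replaced on its own.
#[derive(Default)]
struct WorkspaceGraph {
    by_file: HashMap<String, Vec<Symbol>>,
}

impl WorkspaceGraph {
    /// File created or edited and re-parsed: its old symbols are replaced wholesale.
    fn update_file(&mut self, path: &str, symbols: Vec<Symbol>) {
        self.by_file.insert(path.to_string(), symbols);
    }

    /// File deleted: drop everything it declared.
    fn remove_file(&mut self, path: &str) {
        self.by_file.remove(path);
    }

    /// Path of the file that declares `fqn`, if any.
    fn file_of(&self, fqn: &str) -> Option<&str> {
        self.by_file
            .iter()
            .find(|(_, syms)| syms.iter().any(|s| s.fqn == fqn))
            .map(|(path, _)| path.as_str())
    }

    fn symbol_count(&self) -> usize {
        self.by_file.values().map(Vec::len).sum()
    }
}

fn main() {
    let mut graph = WorkspaceGraph::default();
    graph.update_file("src/order.rs", vec![sym("order::create"), sym("order::cancel")]);

    // The user renames `cancel` to `void`; only this file is re-parsed.
    graph.update_file("src/order.rs", vec![sym("order::create"), sym("order::void")]);
    println!("{:?}", graph.file_of("order::cancel")); // None
    println!("{:?}", graph.file_of("order::void")); // Some("src/order.rs")

    graph.remove_file("src/order.rs");
    println!("{}", graph.symbol_count()); // 0
}
```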

MCP tools

Tool	Description
`find_symbol`	Search by name or FQN substring
`callers` / `callees`	Direct call graph neighbors
`file_dependencies`	Import/file deps for a path
`symbols_in_file`	Declared symbols in one file
`impact_analysis`	Transitive callers (bounded depth)
`graph_stats`	Node/edge counts
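The matching rules of find_symbol are not documented beyond "name or FQN substring". The sketch below assumes case-insensitive matching and lists exact name matches before other substring hits; `SymbolEntry` and `find_symbol` here are illustrative, not the server's implementation.

```rust
/// Hypothetical searchable symbol.
#[derive(Debug)]
struct SymbolEntry {
    name: String,
    fqn: String,
}

/// Case-insensitive search by exact name or FQN substring (assumed semantics).
/// Exact name matches come first; otherwise input order is kept.
fn find_symbol<'a>(symbols: &'a [SymbolEntry], query: &str) -> Vec<&'a SymbolEntry> {
    let q = query.to_lowercase();
    let mut hits: Vec<&SymbolEntry> = symbols
        .iter()
        .filter(|s| s.name.to_lowercase() == q || s.fqn.to_lowercase().contains(&q))
        .collect();
    // `false < true` and sort_by_key is stable, so exact matches move to the front.
    hits.sort_by_key(|s| s.name.to_lowercase() != q);
    hits
}

fn entry(name: &str, fqn: &str) -> SymbolEntry {
    SymbolEntry { name: name.to_string(), fqn: fqn.to_string() }
}

fn main() {
    let symbols = vec![
        entry("createOrder", "com.example.OrderApi.createOrder"),
        entry("create", "com.example.OrderService.create"),
        entry("cancel", "com.example.OrderService.cancel"),
    ];
    for hit in find_symbol(&symbols, "create") {
        println!("{}", hit.fqn);
    }
}
```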

The graph lives in the MCP server's process memory. After a large branch switch, restart the MCP server (or Cursor) to rebuild it from scratch.

impactsense-parser 0.1.0