impactsense-parser 0.1.0

Multi-language static analysis: parse codebases into an in-memory dependency graph for impact analysis

ImpactSense Parser

A multi-language static analysis tool written in Rust that parses source code using Tree-Sitter, extracts structural symbols (files, classes, functions, API endpoints), and builds a dependency graph in Neo4j for impact analysis.

Given a codebase, it answers questions like "If I change this class, which functions and files are affected?" by constructing a queryable graph of code relationships.

Supported Languages

Language    Parsing   Classes  Call Graph   File Dependencies  API Endpoints
Java        Full AST  Yes      Partial      Yes (imports)      Spring
C#          Full AST  Yes      Partial      No                 ASP.NET
Go          Full AST  Yes      Partial      No                 Chi/Gin/Echo
Erlang      Text      Module   Approximate  Yes                Cowboy
JavaScript  Full AST  No       No           No                 No
TypeScript  Full AST  No       No           No                 No
Python      Full AST  No       No           No                 No
Rust        Full AST  No       No           No                 No

Architecture

                          ┌──────────────────────┐
                          │   CLI  (main.rs)     │
                          │   clap arg parsing   │
                          └──────────┬───────────┘
                                     │
                                     ▼
                          ┌──────────────────────┐
                          │  scanner.rs           │
                          │  walkdir + rayon      │
                          │  parallel file parse  │
                          └──────────┬───────────┘
                                     │
                          Vec<ParsedFile>
                                     │
                    ┌────────────────┼────────────────┐
                    ▼                                  ▼
         ┌───────────────────┐             ┌────────────────────┐
         │  JSON output       │             │  graph.rs           │
         │  (--output-json)   │             │  Neo4j persistence  │
         │  AST summaries     │             │  (--push-to-neo4j)  │
         └───────────────────┘             └────────────────────┘
  1. Scan — Recursively walks the target directory, identifies source files by extension, and filters by max file size.
  2. Parse — Each file is parsed in parallel (via Rayon) using Tree-Sitter grammars, producing an AST per file.
  3. Extract — Language-specific extractors pull out classes, functions, imports, call sites, API endpoints, and external API references.
  4. Persist — Extracted symbols and relationships are written to Neo4j as a labeled property graph. Relationships are batched (3000 edges per flush) to reduce round-trips.
  5. Post-process — SAME_API edges are created between internal ApiEndpoint nodes and ExternalApi nodes that share a normalized path.
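
The batching mentioned in step 4 can be sketched as follows. This is a minimal illustration rather than the crate's actual code: the Edge struct, flush_in_batches, and the flush callback are hypothetical names, and only the batch size of 3000 comes from the description above.

```rust
/// Hypothetical edge record; the crate's real types may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub from: String,
    pub rel: &'static str,
    pub to: String,
}

/// Batch size taken from the pipeline description (3000 edges per flush).
pub const BATCH_SIZE: usize = 3000;

/// Hands edges to `flush` in slices of at most `BATCH_SIZE`, so the number of
/// round-trips is ceil(edges.len() / BATCH_SIZE) instead of one per edge.
/// Returns the number of flushes performed.
pub fn flush_in_batches<F>(edges: &[Edge], mut flush: F) -> usize
where
    F: FnMut(&[Edge]),
{
    let mut flushes = 0;
    for batch in edges.chunks(BATCH_SIZE) {
        // Assumption: a real implementation would issue one write query here.
        flush(batch);
        flushes += 1;
    }
    flushes
}
```

With 7000 edges this sketch flushes three times (3000, 3000, 1000), which is the round-trip saving the batching is meant to provide.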

Prerequisites

  • Rust 1.85 or later (the crate uses edition 2024) — install via rustup
  • Neo4j 5 — run via Docker (see below)
  • C compiler — required by build.rs to compile the vendored Erlang Tree-Sitter grammar

Installation

git clone http://git.redbus.com/sujal.v/impactdependency.git
cd impactdependency/parser
cargo build --release

The build step compiles the vendored Erlang grammar from vendor/tree-sitter-erlang/ via build.rs.

Neo4j Setup

Start a Neo4j 5 instance with Docker:

docker run -d \
  --name neo4j-parser \
  -p 7474:7474 \
  -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/parser1234 \
  neo4j:5

The Neo4j Browser will be available at http://localhost:7474/.

Usage

Basic — parse and output JSON

cargo run -- /path/to/repo --output-json parsed_output.json

Parse and push to Neo4j

cargo run -- /path/to/repo \
  --output-json parsed_output.json \
  --push-to-neo4j

Full options with custom Neo4j credentials

cargo run -- /path/to/repo \
  --output-json parsed_output.json \
  --push-to-neo4j \
  --clean \
  --neo4j-uri bolt://localhost:7687 \
  --neo4j-user neo4j \
  --neo4j-password myStrongPass123

CLI Reference

Argument           Type    Default                Description
ROOT               path    (required)             Root directory to scan
--output-json      path    (none)                 Write AST summaries to a JSON file
--push-to-neo4j    flag    false                  Push the parsed graph into Neo4j
--clean            flag    false                  Delete all existing nodes before pushing
--neo4j-uri        string  bolt://localhost:7688  Neo4j Bolt URI
--neo4j-user       string  neo4j                  Neo4j username
--neo4j-password   string  parser1234             Neo4j password
--follow-symlinks  flag    false                  Follow symbolic links during traversal
--max-file-size    bytes   2 MiB                  Skip files larger than this
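
To make the effect of --max-file-size concrete, here is a rough sketch of the kind of check the scan step could apply to each file. The function names and the extension-to-language mapping are assumptions for illustration; the crate's real language registry lives in lib.rs and may differ.

```rust
use std::path::Path;

/// Default size cap from the CLI reference: 2 MiB.
pub const DEFAULT_MAX_FILE_SIZE: u64 = 2 * 1024 * 1024;

/// Maps a file extension to a language name.
/// The extension list is an assumption, not the crate's actual registry.
pub fn language_for(path: &Path) -> Option<&'static str> {
    let ext = path.extension()?.to_str()?;
    match ext {
        "java" => Some("java"),
        "cs" => Some("csharp"),
        "go" => Some("go"),
        "erl" => Some("erlang"),
        "js" => Some("javascript"),
        "ts" => Some("typescript"),
        "py" => Some("python"),
        "rs" => Some("rust"),
        _ => None,
    }
}

/// True when the file has a recognised extension and is not larger than
/// `max_size` bytes (files above the cap are skipped).
pub fn should_parse(path: &Path, size_bytes: u64, max_size: u64) -> bool {
    language_for(path).is_some() && size_bytes <= max_size
}
```

Keeping the check as a pure function of path and size makes it easy to test without touching the filesystem; a caller would obtain size_bytes from the file's metadata during the walk.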

Graph Schema

Node Types

Label        Key Properties
File         path, language, framework?, project_name?, is_test?
Module       name, path, language (Erlang modules)
Class        name, fqn, path, language?, project_name?
Function     name, fqn, path, language, arity?, return_type?, param_count?
ApiEndpoint  methods[], path, norm_path?, framework?
ExternalApi  name, base_url?, path?, norm_path?, provider?
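
The norm_path property is what lets an ApiEndpoint be matched to an ExternalApi. The exact normalization rules are not documented here, so the sketch below assumes one plausible scheme (drop the query string, ignore empty segments, lowercase literals, and replace path parameters such as {id} or :id with {}). normalize_path and same_api_pairs are hypothetical names.

```rust
use std::collections::HashMap;

/// Assumed normalisation: strips the query string, drops empty segments,
/// lowercases literal segments, and replaces `{param}` / `:param` with `{}`.
pub fn normalize_path(path: &str) -> String {
    let without_query = path.split('?').next().unwrap_or("");
    let segments: Vec<String> = without_query
        .split('/')
        .filter(|s| !s.is_empty())
        .map(|s| {
            let is_param = s.starts_with(':') || (s.starts_with('{') && s.ends_with('}'));
            if is_param { "{}".to_string() } else { s.to_lowercase() }
        })
        .collect();
    format!("/{}", segments.join("/"))
}

/// Returns (endpoint index, external index) pairs whose normalised paths are
/// equal; each pair corresponds to one SAME_API edge.
pub fn same_api_pairs(endpoints: &[&str], externals: &[&str]) -> Vec<(usize, usize)> {
    // Index internal endpoints by normalised path.
    let mut by_norm: HashMap<String, Vec<usize>> = HashMap::new();
    for (i, p) in endpoints.iter().enumerate() {
        by_norm.entry(normalize_path(p)).or_default().push(i);
    }
    // Look up each external API path in that index.
    let mut pairs = Vec::new();
    for (j, p) in externals.iter().enumerate() {
        if let Some(idxs) = by_norm.get(&normalize_path(p)) {
            pairs.extend(idxs.iter().map(|&i| (i, j)));
        }
    }
    pairs
}
```

Under these assumptions, a Spring-style /orders/{id} and an Express-style /orders/:id normalize to the same string and would be linked.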

Relationships

(:File)-[:DECLARES_MODULE]->(:Module)
(:File)-[:DECLARES_CLASS]->(:Class)
(:File)-[:DECLARES_FUNCTION]->(:Function)
(:Class)-[:DECLARES_FUNCTION]->(:Function)
(:Module)-[:DECLARES_FUNCTION]->(:Function)
(:File)-[:DEPENDS_ON_FILE]->(:File)
(:Function)-[:CALLS_FUNCTION]->(:Function)
(:Function)-[:USES_CLASS]->(:Class)
(:ApiEndpoint)-[:HANDLED_BY]->(:Function)
(:Function)-[:CALLS_EXTERNAL_API]->(:ExternalApi)
(:ApiEndpoint)-[:SAME_API]->(:ExternalApi)

Example Queries

Once the graph is in Neo4j, you can run Cypher queries for impact analysis:

// Which functions call OrderDetail.setAmenities?
MATCH (caller:Function)-[:CALLS_FUNCTION]->(target:Function {name: "setAmenities"})
WHERE target.fqn CONTAINS "OrderDetail"
RETURN caller.fqn, caller.path

// Which files depend on OrderDetail.java?
MATCH (f:File)-[:DEPENDS_ON_FILE]->(dep:File)
WHERE dep.path CONTAINS "OrderDetail.java"
RETURN f.path

// All functions reachable within 3 hops from a given function
MATCH path = (start:Function {name: "processOrder"})-[:CALLS_FUNCTION*1..3]->(downstream:Function)
RETURN downstream.fqn, length(path) AS depth

// API endpoints and their handler functions
MATCH (ep:ApiEndpoint)-[:HANDLED_BY]->(fn:Function)
RETURN ep.path, ep.methods, fn.fqn

MCP Server Integration

The parser ships with a FastMCP server so it can be invoked as a tool from Cursor IDE or any MCP-compatible client.

Setup

cd parser/mcp
pip install -r requirements.txt
python main.py

The MCP server exposes a parse_repository tool with parameters matching the CLI arguments. It runs cargo run as a subprocess, pipes progress logs to stderr (to keep the JSON-RPC stdout channel clean), and returns the parse results.

Tool: parse_repository

Parameter        Type    Description
root_path        string  Directory to parse
follow_symlinks  bool    Follow symlinks
max_file_size    int     Max file size in bytes
push_to_neo4j    bool    Push graph to Neo4j
neo4j_uri        string  Neo4j Bolt URI
neo4j_user       string  Neo4j username
neo4j_password   string  Neo4j password

Project Structure

parser/
├── Cargo.toml                 # Rust dependencies and build config
├── build.rs                   # Compiles vendored Erlang grammar (C → .a)
├── graph_schema.md            # Neo4j node/relationship schema reference
├── src/
│   ├── main.rs                # CLI entry point (clap)
│   ├── lib.rs                 # Language registry and Tree-Sitter wrapper
│   ├── scanner.rs             # Directory walker + parallel parser
│   ├── graph.rs               # Symbol extraction + Neo4j persistence
│   ├── edge.rs                # Relationship type enum
│   ├── schema.rs              # Node labels and property constants
│   ├── ir.rs                  # Intermediate representation for serialization
│   └── erlang.rs              # FFI binding for vendored Erlang grammar
├── vendor/
│   └── tree-sitter-erlang/    # Vendored Erlang Tree-Sitter grammar (C source)
├── mcp/
│   ├── main.py                # MCP server entry point
│   ├── app.py                 # FastMCP app definition
│   ├── services/
│   │   └── parser_service.py  # Subprocess runner for cargo
│   ├── tools/
│   │   └── parser_tools.py    # parse_repository tool definition
│   └── requirements.txt       # Python dependencies
└── prompts/                   # Prompt templates for MCP tool usage

Known Limitations

  • Java imports are filtered to com.redbus.genai.* by default — other internal packages are not tracked.
  • C# and Go lack file-level dependency edges (DEPENDS_ON_FILE).
  • Erlang uses regex-based text parsing instead of the Tree-Sitter AST for function extraction.
  • For JavaScript, TypeScript, Python, and Rust, only top-level functions are extracted — no classes, call graphs, or dependency edges.
  • Class inheritance (extends/implements) is not tracked for any language.
  • Neo4j writes are sequential per file, which can be slow for large codebases (10k+ files).
  • No incremental parsing in CLI — the full codebase is re-parsed on every CLI run (MCP server supports incremental file-watcher updates).

See shortcomings.txt for a detailed analysis.


Client-side library (in-memory graph)

Add from crates.io:

[dependencies]
impactsense-parser = "0.1"

The impactsense-parser crate builds an InMemoryGraph with indexed queries for IDE and MCP use.

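use impactsense_parser::parse_project;
use impactsense_parser::pipeline::ScanOptions;
use impactsense_parser::store::GraphStore;

// Wrapped in main so that `?` compiles; this assumes the crate's error type
// converts into Box<dyn Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse the repository into an in-memory graph.
    let graph = parse_project("/path/to/repo", &ScanOptions::default())?;
    // Direct callers and transitive impact, looked up by fully qualified name.
    let callers = graph.callers("com.example.OrderService.create");
    let impact = graph.impact("com.example.OrderService.create", Default::default());
    Ok(())
}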

Export IR as JSON from the CLI:

cargo install impactsense-parser
impactsense-parser /path/to/repo --output-json project_ir.json

Cargo features

Feature     Default  Description
neo4j       yes      Neo4j persistence (--push-to-neo4j, webhook)
compressor  no       Reserved for RedCompressor integration

Cursor MCP setup

One install gives you both the CLI and the MCP server:

cargo install impactsense-parser

Binaries are placed in ~/.cargo/bin/:

  • impactsense-parser — CLI
  • impactsense-mcp — MCP server for Cursor

Create .cursor/mcp.json in your project:

{
  "mcpServers": {
    "impactsense": {
      "command": "/Users/YOUR_USER/.cargo/bin/impactsense-mcp",
      "args": ["--root", "${workspaceFolder}"]
    }
  }
}

Replace YOUR_USER with your username, or run which impactsense-mcp after install to get the exact path.

Restart Cursor. The server parses your open workspace once at startup, then keeps the graph updated as you edit files.

MCP tools

Tool               Description
find_symbol        Search by name or FQN substring
callers / callees  Direct call graph neighbors
file_dependencies  Import/file deps for a path
symbols_in_file    Declared symbols in one file
impact_analysis    Transitive callers (bounded depth)
graph_stats        Node/edge counts
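
As an illustration of what "transitive callers (bounded depth)" means for impact_analysis, the following sketch runs a breadth-first search over reversed call edges. It is not the crate's implementation: transitive_callers and its plain (caller, callee) edge list are hypothetical, standing in for whatever indexes InMemoryGraph maintains.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first walk over reversed call edges.
/// `calls` holds (caller, callee) pairs. Returns (function, depth) for every
/// function that reaches `target` within `max_depth` hops, sorted by depth
/// and then by name. Cycles are handled by the `seen` set.
pub fn transitive_callers(
    calls: &[(&str, &str)],
    target: &str,
    max_depth: usize,
) -> Vec<(String, usize)> {
    // Reverse index: callee -> direct callers.
    let mut callers_of: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(caller, callee) in calls {
        callers_of.entry(callee).or_default().push(caller);
    }

    let mut seen: HashSet<&str> = HashSet::new();
    seen.insert(target);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::new();
    queue.push_back((target, 0));
    let mut out: Vec<(String, usize)> = Vec::new();

    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // depth bound reached; do not expand further
        }
        if let Some(direct) = callers_of.get(node) {
            for &c in direct {
                if seen.insert(c) {
                    out.push((c.to_string(), depth + 1));
                    queue.push_back((c, depth + 1));
                }
            }
        }
    }
    out.sort_by(|a, b| a.1.cmp(&b.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```

The depth bound keeps the result set and the running time predictable on large graphs, at the cost of missing callers that are further away than max_depth.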

The graph lives in MCP process memory. Restart MCP/Cursor to re-bootstrap after large branch switches.