impactsense-parser 0.1.0

Multi-language static analysis: parse codebases into an in-memory dependency graph for impact analysis
Documentation
cargo run -- ../src/main --output-json parsed_output.json
NEO4J_URI=bolt://localhost:7688
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=parser1234
NEO4J_DATABASE=neo4j   # or parserdb after you create it
Neo4j Browser: http://localhost:7475/browser/
File node
Properties:
path, language
Relationships (preferred over lists):
(:File)-[:DECLARES_CLASS]->(:Class)
(:File)-[:DECLARES_FUNCTION]->(:Function) (for top-level/static functions)
Class node
Properties:
name, fqn, path
Relationships:
(:Class)-[:DECLARES_FUNCTION]->(:Function) (instance and static methods)
docker run --hostname=0a436be43eda \
  --env=NEO4J_AUTH=neo4j/parser1234 \
  --env=PATH=/var/lib/neo4j/bin:/opt/java/openjdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  --env=JAVA_HOME=/opt/java/openjdk \
  --env=NEO4J_SHA256=c8231e90dfd1bfe04a943bc51df9493f96cdccdc8702e9eccaeb16c3495dedc0 \
  --env=NEO4J_TARBALL=neo4j-community-5.26.18-unix.tar.gz \
  --env=NEO4J_EDITION=community \
  --env=NEO4J_HOME=/var/lib/neo4j \
  --env=LANG=C.UTF-8 \
  --volume=/data --volume=/logs \
  --network=bridge --workdir=/var/lib/neo4j \
  -p 7475:7474 -p 7688:7687 \
  --restart=no --runtime=runc \
  -d neo4j:5
docker run --hostname=6a2f09bbcf07 \
  --env=NEO4J_AUTH=neo4j/test1234 \
  --env=PATH=/var/lib/neo4j/bin:/opt/java/openjdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  --env=JAVA_HOME=/opt/java/openjdk \
  --env=NEO4J_SHA256=c8231e90dfd1bfe04a943bc51df9493f96cdccdc8702e9eccaeb16c3495dedc0 \
  --env=NEO4J_TARBALL=neo4j-community-5.26.18-unix.tar.gz \
  --env=NEO4J_EDITION=community \
  --env=NEO4J_HOME=/var/lib/neo4j \
  --env=LANG=C.UTF-8 \
  --volume=/data --volume=/logs \
  --network=bridge --workdir=/var/lib/neo4j \
  -p 7476:7474 -p 7689:7687 \
  --restart=no --runtime=runc \
  -d neo4j:5
docker run -d \
  --name neo4j-parser-third \
  -p 7477:7474 \
  -p 7690:7687 \
  -e NEO4J_AUTH=neo4j/third1234 \
  neo4j:5
cargo run -- ../src/main --output-json parsed_output.json --push-to-neo4j
cargo run -- \
  /Users/sujal.v/Desktop/omega+capi/omega \
  --output-json parsed_output_batch_test.json \
  --push-to-neo4j \
  --neo4j-uri bolt://localhost:7689 \
  --neo4j-user neo4j \
  --neo4j-password test1234

Short answer
The schema design covers all the file/class/function combinations you care about; the implementation covers most of them, with some conservative limits.
Schema capability (design in neo4j-code-schema.md)
Supports:
File ↔ File (DEPENDS_ON_FILE)
File ↔ Class / Function (DECLARES_CLASS, DECLARES_FUNCTION)
Function ↔ Function (CALLS_FUNCTION)
And (optionally) Function/Class ↔ Class (USES_CLASS)
With this, you can express:
File → files, File → functions, Function → files, Class → functions/files, and all multi-hop combinations.
Current implementation in graph.rs
Implemented:
File, Class, Function nodes; DECLARES_*; DEPENDS_ON_FILE.
CALLS_FUNCTION for Java methods (intra-class / same-package, based on method_invocation).
Top-level Function nodes for non-Java languages (no call graph yet).
Not yet implemented:
USES_CLASS relationships.
Full Java resolution across classes/interfaces and all non-Java call graphs.
So: conceptually, yes, the schema supports all combinations. In practice, your current code already answers many impact questions, especially for Java. Fully precise impact analysis across all combinations and all languages still needs the USES_CLASS edges and richer call resolution, which are the next things to add.
Function → file / function impact
Q1: “If I change OrderDetail.setAmenities, which other functions in the system might break?”
Q2: “Which Java files contain functions that directly or indirectly call OrderDetail.setAmenities?”
Q3: “Show me all functions in CancellationInformationAgent and CaseDetailsAgent that call OrderDetail.setAmenities.”
Q4: “List all call chains ending at OrderDetail.setAmenities, starting from any *Agent class.”
File → file / function / class impact
Q5: “If OrderDetail.java changes, which other Java files are affected through imports and method calls?”
Q6: “Show all Function nodes declared in files that depend on OrderDetail.java.”
Q7: “Which orchestrator files depend (directly or indirectly) on OrderDetail.java?”
Q8: “If FAQAgentNew.java is modified, which utility classes and functions (EmbeddingService, GenericTools, AgentModelConfig, etc.) could be impacted?”
Class → function / file impact
Q9: “If the OrderDetail class changes (fields or behavior), which agent classes are at risk?”
Q10: “List all methods of OrderDetail that are used anywhere in DuringCustomerIssueEscalationAgent and PreCustomerIssueEscalationAgent.”
Q11: “Which files declare functions that use the BusOperatorCancellationResult class?”
Q12: “If I add a new field to BusOperatorCancellationResult, which functions across the codebase currently read or write its fields?”
Mixed file / class / function impact
Q13: “For a change in BusOperatorCancellationResult.isCapiCancellationSuccess, list affected functions in BusOperatorCancellation.java and all *EscalationAgent classes.”
Q14: “Starting from FAQAgentNew methods, show me all functions they call in helpers and utils packages, and the underlying model classes those functions touch.”
Q15: “If I refactor CancellationInformationAgent, which model classes (OrderDetail, CapiCancellationDetails, etc.) and their methods are actually used by its functions?”
Q16: “Given a change in GdsStatusResponse model, list all functions in GdsStatusFetcher and downstream orchestrators that might be impacted.”
Q17: “What parts of the system depend on com.redbus.genai.model.OrderDetail either directly or through other models and utils?”
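Most of the function-level questions above (Q1, Q2, Q4) reduce to one operation: start at the changed function and walk CALLS_FUNCTION edges backwards until nothing new is reached. A minimal in-memory sketch of that walk follows. It assumes the call graph is available as plain (caller, callee) pairs; the function name impacted_callers and that edge representation are illustrative, not taken from graph.rs.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Returns every function that directly or indirectly calls `changed`.
/// Each edge is (caller, callee), mirroring
/// (:Function)-[:CALLS_FUNCTION]->(:Function). Hypothetical helper.
pub fn impacted_callers(edges: &[(&str, &str)], changed: &str) -> BTreeSet<String> {
    // Reverse adjacency: callee -> its direct callers.
    let mut callers_of: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(caller, callee) in edges {
        callers_of.entry(callee).or_default().push(caller);
    }

    // Breadth-first walk over the reversed edges, starting at the change.
    let mut impacted = BTreeSet::new();
    let mut queue = VecDeque::from([changed]);
    while let Some(current) = queue.pop_front() {
        for &caller in callers_of.get(current).into_iter().flatten() {
            // `insert` returns false for already-seen callers, so cycles terminate.
            if caller != changed && impacted.insert(caller.to_string()) {
                queue.push_back(caller);
            }
        }
    }
    impacted
}
```

Calling impacted_callers(&edges, "OrderDetail.setAmenities") gives the candidate answer set for Q1. Q2 would then map each returned function to its declaring file via DECLARES_FUNCTION. The result is only as complete as the CALLS_FUNCTION edges that were extracted.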



cargo run -- /Users/sujal.v/Desktop/omega+capi/omega --output-json parsed_output.json


docker run -d \
  --name neo4j-parser \
  -p 7475:7474 \
  -p 7688:7687 \
  -e NEO4J_AUTH=neo4j/parser1234 \
  neo4j:5

docker run -d \
  --name neo4j-parser-batch-test \
  -p 7476:7474 \
  -p 7689:7687 \
  -e NEO4J_AUTH=neo4j/test1234 \
  neo4j:5
neo4j-parser (auth neo4j/parser1234):
http://localhost:7475/
neo4j-parser-batch-test (auth neo4j/test1234):
http://localhost:7476/

--------------------------------------------------
High-level: what this command does
cargo run -- ../src/main --output-json parsed_output.json --push-to-neo4j
cargo run -- \
  /Users/sujal.v/Desktop/omega+capi/omega \
  --output-json parsed_output_batch_test.json \
  --push-to-neo4j \
  --neo4j-uri bolt://localhost:7689 \
  --neo4j-user neo4j \
  --neo4j-password test1234


1) Scan + parse the repo
- Walk ../src/main recursively and pick supported files (.java, .js, .ts, .tsx, .py, .rs, .go, .erl/.hrl, .cs).
- Read each file and parse it with Tree-Sitter in parallel → build ParsedFile { path, language, tree, source } list.
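The file-selection step above can be sketched as a pure extension lookup. The extension list comes from step 1 and the name language_from_extension from the call tree at the end of these notes. The LanguageId variants other than Erlang, and the Option return type, are assumptions made for illustration.

```rust
use std::path::Path;

/// Languages the scanner accepts. Variant names other than `Erlang`
/// are assumed; only the extension list comes from the notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LanguageId {
    Java,
    JavaScript,
    TypeScript,
    Python,
    Rust,
    Go,
    Erlang,
    CSharp,
}

/// Maps a file path to a supported language, or `None` to skip the file.
pub fn language_from_extension(path: &Path) -> Option<LanguageId> {
    // Files without an extension (or with a non-UTF-8 one) are skipped.
    let ext = path.extension()?.to_str()?;
    match ext {
        "java" => Some(LanguageId::Java),
        "js" => Some(LanguageId::JavaScript),
        "ts" | "tsx" => Some(LanguageId::TypeScript),
        "py" => Some(LanguageId::Python),
        "rs" => Some(LanguageId::Rust),
        "go" => Some(LanguageId::Go),
        "erl" | "hrl" => Some(LanguageId::Erlang),
        "cs" => Some(LanguageId::CSharp),
        _ => None,
    }
}
```

Keeping the lookup free of I/O means the directory walk can filter paths cheaply before any file is read or handed to Tree-Sitter.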

2) Write parser output to JSON
- For each ParsedFile, write a JsonFileSummary with { path, language, root_sexp }.
- Serialize all summaries into parsed_output.json for inspection/offline use.

3) Push a graph view into Neo4j
- Connect to Neo4j using bolt://localhost:7688, user neo4j, password parser1234.
- For every ParsedFile:
  - MERGE/UPDATE a (:File { path, language }) node.
  - If Java: create (:Class) and (:Function) nodes, DECLARES_* edges, basic CALLS_FUNCTION edges, and DEPENDS_ON_FILE edges based on internal imports.
  - If Erlang: create (:Module), (:Function), (:ApiEndpoint), (:ExternalApi) nodes, DECLARES_* edges, CALLS_FUNCTION, CALLS_EXTERNAL_API, and DEPENDS_ON_FILE edges.
  - Other languages: create top-level (:Function) nodes and (:File)-[:DECLARES_FUNCTION]->(:Function).

Why pushing to Neo4j feels slow
- It processes files one by one and for each file runs multiple separate Cypher queries (no batching), so there are many network round-trips.
- Java/Erlang files generate extra work: classes, functions, imports, endpoints, external APIs, and call-graph edges → more queries per file.
- With ~1500+ files this easily becomes tens of thousands of queries, so total time is dominated by round-trips and Neo4j’s own write cost, especially when Neo4j runs in Docker on a laptop.
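One common way to cut the round-trips described above is to send many rows per query with UNWIND instead of one query per node. The sketch below only shows the batching arithmetic and the shape of such a query. FileRow, MERGE_FILES_BATCH, into_batches and round_trips are hypothetical names, and wiring the batches into the actual Neo4j driver calls in graph.rs is left out.

```rust
/// One row per file, intended to be sent as the `$rows` list parameter.
/// Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct FileRow {
    pub path: String,
    pub language: String,
}

/// Cypher that upserts a whole batch of File nodes in one round-trip.
pub const MERGE_FILES_BATCH: &str =
    "UNWIND $rows AS row MERGE (f:File {path: row.path}) SET f.language = row.language";

/// Splits rows into batches of at most `batch_size` (an assumed tuning knob).
pub fn into_batches(rows: &[FileRow], batch_size: usize) -> Vec<&[FileRow]> {
    assert!(batch_size > 0, "batch_size must be positive");
    rows.chunks(batch_size).collect()
}

/// Number of queries needed to send `row_count` rows at a given batch size.
pub fn round_trips(row_count: usize, batch_size: usize) -> usize {
    assert!(batch_size > 0, "batch_size must be positive");
    // Ceiling division without relying on newer std helpers.
    (row_count + batch_size - 1) / batch_size
}
```

With 1500 File rows, a batch size of 500 means 3 queries instead of 1500. The same pattern would apply to Class, Function and relationship rows, each with its own UNWIND query.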

Progress logs / stderr
- The Rust CLI prints progress (e.g. "Neo4j: processing file X/Y (...)") to stdout.
- When invoked via the FastMCP server, these lines are captured and forwarded to the MCP process's **stderr** (see parser/mcp/services/parser_service.py).
- Reason: MCP uses stdout strictly for JSON-RPC, so any non-JSON on stdout breaks the protocol. Stderr is the designated channel for human-readable logs and progress.
- To watch parsing / Neo4j progress while using the MCP parse_repository tool, look at:
  - The terminal where you run python parser/mcp/main.py, or
  - Cursor's MCP log / console view for the impact server.
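The stdout/stderr split above can be illustrated with a small router that keeps protocol messages and human-readable logs on separate sinks. In this setup the forwarding is done by the Python service (parser_service.py), so the Rust version below is only a sketch of the rule, and Output and emit are hypothetical names.

```rust
use std::io::{self, Write};

/// What a process wants to emit: a JSON-RPC message or a log line.
pub enum Output {
    Rpc(String),
    Log(String),
}

/// Writes protocol messages to `stdout` and everything else to `stderr`,
/// so that no non-JSON text ever reaches the protocol channel.
pub fn emit<O: Write, E: Write>(stdout: &mut O, stderr: &mut E, out: &Output) -> io::Result<()> {
    match out {
        Output::Rpc(json) => writeln!(stdout, "{}", json),
        Output::Log(text) => writeln!(stderr, "{}", text),
    }
}
```

In a real process the two sinks would be io::stdout() and io::stderr(). Taking them as generic writers keeps the routing rule testable with in-memory buffers.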




Production
docker run -d --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/myStrongPass123 \
  neo4j:5
cargo run -- \
  /home/ubuntu/redbus_repositories \
  --output-json parsed_output_production.json \
  --push-to-neo4j \
  --neo4j-uri bolt://10.5.10.12:7687 \
  --neo4j-user neo4j \
  --neo4j-password myStrongPass123


Production (neo4j-parser-clone, data mounted from /home/ubuntu/neo4j-data)
docker run -d --name neo4j-parser-clone \
  -p 7474:7474 \
  -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/test1234 \
  -v /home/ubuntu/neo4j-data:/data \
  neo4j:5


rsync -avh --progress -e "ssh -i /Users/sujal.v/Desktop/ImpactSenseProduction/rb-capi-qa-key-pair.pem" neo4j-data.tar.gz ubuntu@10.5.10.12:/home/ubuntu/
Transfer starting: 1 files
neo4j-data.tar.gz
       54034053 100%    1.11MB/s   00:00:46 (xfer#1, to-check=0/1)

sent 50545k bytes  received 11112 bytes  1045k bytes/sec
total size is 54034k  speedup is 1.07





main()
  ├─ scan_and_parse(config)                          [scanner.rs]
  │    ├─ discover_files(config)
  │    │    └─ language_from_extension(path)          → LanguageId::Erlang for .erl/.hrl
  │    └─ [parallel via Rayon, per file]:
  │         ├─ parse_once(Erlang, source)             [lib.rs]
  │         │    └─ erlang_language()                  [erlang.rs] → FFI to C grammar
  │         └─ is_test_file(path)                     → bool
  │              ↓
  │         produces ParsedFile { path, Erlang, tree, source, is_test }
  │
  └─ persist_files_to_neo4j(cfg, root, files, clean) [graph.rs]
       └─ [per Erlang file, up to 8 concurrent]:
            ├─ CREATE (:File) node
            └─ persist_erlang_structure(graph, file, source, project)
                 │
                 ├── extract_erlang_module_name(source)
                 │     or guess_erlang_module_name_from_path(path)
                 │   → CREATE (:Module), (:File)-[:DECLARES_MODULE]->(:Module)
                 │
                 ├── extract_erlang_functions(module, source)
                 │   → CREATE (:Function), (:File)-[:DECLARES_FUNCTION]->(:Function)
                 │                          (:Module)-[:DECLARES_FUNCTION]->(:Function)
                 │
                 ├── extract_erlang_api_endpoints(source)
                 │   → CREATE (:ApiEndpoint)-[:HANDLED_BY]->(:Function{init/2, handle/2})
                 │
                 ├── extract_external_http_urls(source)
                 │   → CREATE (:ExternalApi)
                 │     (:Function)-[:CALLS_EXTERNAL_API]->(:ExternalApi)  [all fns × all urls]
                 │
                 ├── extract_erlang_called_function_names(source)
                 │   → CREATE (:Function)-[:CALLS_FUNCTION]->(:Function) [intra-module only]
                 │
                 └── extract_erlang_called_modules(source)
                       └── guess_erlang_file_path_from_module(path, mod)
                     → CREATE (:File)-[:DEPENDS_ON_FILE]->(:File)
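
The first branch of persist_erlang_structure (read the module name from the source, else fall back to the file path) could look roughly like the sketch below. The two function names come from the tree above. Their signatures, the line-based matching, and the module_name wrapper are assumptions; the real code may instead work from the Tree-Sitter tree.

```rust
use std::path::Path;

/// Looks for a `-module(name).` attribute and returns `name`.
/// Simplified: assumes the attribute sits on its own line with no
/// whitespace inside `-module(`.
pub fn extract_erlang_module_name(source: &str) -> Option<String> {
    for line in source.lines() {
        if let Some(rest) = line.trim().strip_prefix("-module(") {
            // Take everything up to the closing paren; drop quotes around quoted atoms.
            let name = rest.split(')').next()?.trim().trim_matches('\'');
            if !name.is_empty() {
                return Some(name.to_string());
            }
        }
    }
    None
}

/// Fallback: Erlang modules are conventionally named after their file.
pub fn guess_erlang_module_name_from_path(path: &Path) -> Option<String> {
    path.file_stem()?.to_str().map(String::from)
}

/// Hypothetical wrapper combining both strategies in the order shown in the tree.
pub fn module_name(source: &str, path: &Path) -> Option<String> {
    extract_erlang_module_name(source).or_else(|| guess_erlang_module_name_from_path(path))
}
```

The path fallback matters mostly for .hrl headers, which usually carry no -module attribute.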