impactsense-parser 0.1.0

# ImpactSense Parser - Critical Analysis & Shortcomings
# =====================================================
# Date: February 2026
# Scope: C#, Erlang, Go, Java parsing and Neo4j integration

================================================================================
EXECUTIVE SUMMARY
================================================================================

The ImpactSense project parses source code using Tree-Sitter and builds a
dependency graph in Neo4j. It parses Java, C#, Erlang, Go, JavaScript,
TypeScript, Python, and Rust, but extracts symbols and relationships only for
the first four (see Section 9). This document outlines critical shortcomings
that limit accuracy and completeness for real-world impact analysis.

================================================================================
1. JAVA PARSING SHORTCOMINGS
================================================================================

[HIGH] Hardcoded Namespace Filter
---------------------------------
Location: graph.rs lines 1211-1235
Problem: Only imports starting with "com.redbus.genai.*" are tracked.
         All other internal packages are completely ignored.
Impact: Cross-package dependencies outside this namespace are invisible.
Fix: Make the namespace filter configurable or remove it entirely.
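Sketch: One possible shape for a configurable filter. The function name and
        the "internal_prefixes" setting are hypothetical and do not exist in
        graph.rs; an empty prefix list is taken to mean "track everything".

```rust
/// Returns true when `import_path` falls under one of the configured
/// internal package prefixes. An empty prefix list disables filtering.
/// `internal_prefixes` is a hypothetical setting, not an existing option.
fn is_tracked_import(import_path: &str, internal_prefixes: &[&str]) -> bool {
    if internal_prefixes.is_empty() {
        return true;
    }
    internal_prefixes.iter().any(|&prefix| {
        // Match the package itself or anything below it, but not a sibling
        // package that merely shares a textual prefix.
        import_path == prefix
            || import_path
                .strip_prefix(prefix)
                .map_or(false, |rest| rest.starts_with('.'))
    })
}
```

        Matching on a "." boundary keeps "com.redbus.genaix" from being
        treated as part of "com.redbus.genai".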

[HIGH] No Inheritance/Interface Tracking
----------------------------------------
Problem: Class hierarchies (extends) and interface implementations are not
         tracked. No EXTENDS or IMPLEMENTS relationship types exist.
Impact: Cannot analyze impact when base class or interface changes.
Fix: Add AST traversal for extends/implements clauses.

[MEDIUM] Wildcard Imports Ignored
---------------------------------
Problem: Wildcard imports such as "import com.redbus.genai.utils.*" are not
         expanded to the individual types they bring into scope.
Impact: Type resolution fails for wildcard-imported classes.
Fix: Expand wildcards by scanning the target package directory.

[MEDIUM] No Annotation Processing
---------------------------------
Problem: Annotations beyond Spring mappings (@GetMapping, etc.) are ignored.
         Examples: @Autowired, @Override, @Transactional
Impact: Framework-specific impact analysis is incomplete.
Fix: Add annotation relationship tracking.

[MEDIUM] No Constructor Tracking
--------------------------------
Problem: Constructor declarations and invocations are not captured.
Impact: Object creation dependencies are missed.
Fix: Handle "constructor_declaration" AST nodes.

[MEDIUM] No Field Tracking
--------------------------
Problem: Field declarations and field access patterns are not tracked.
Impact: Data flow analysis is impossible.
Fix: Add FIELD_OF and ACCESSES_FIELD relationships.

[LOW] No Lambda/Method Reference Handling
-----------------------------------------
Problem: Modern Java patterns like stream.map(OrderDetail::getName) produce
         no call graph edges.
Impact: Functional-style code has incomplete call graphs.
Fix: Handle "method_reference" and "lambda_expression" AST nodes.

================================================================================
2. C# PARSING SHORTCOMINGS
================================================================================

[HIGH] No "using" Directive Tracking
------------------------------------
Location: graph.rs - persist_csharp_structure function
Problem: No DEPENDS_ON_FILE edges are created for C# files.
         Unlike Java, C# imports are completely ignored.
Impact: File-level dependency analysis is non-functional for C#.
Fix: Parse "using" directives and create dependency edges.

[HIGH] No Cross-Namespace Resolution
------------------------------------
Problem: Type FQN resolution only uses the current namespace.
         Types brought into scope by "using" directives are not resolved.
Impact: USES_CLASS edges point to non-existent types.
Fix: Build an import map from "using" directives, as the Java
     implementation does.

[HIGH] Oversimplified Type Resolution
-------------------------------------
Location: graph.rs - extract_csharp_used_classes function
Problem: Assumes all types are in the same namespace as the current file.
Impact: Many generated FQNs are incorrect.
Fix: Implement proper namespace resolution with using directives.
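Sketch: A simplified resolution order, assuming the parser already has the
        set of type FQNs found across the codebase. All names here are
        hypothetical. Real C# lookup also considers enclosing namespaces,
        aliases, and nested types, which this sketch leaves out.

```rust
use std::collections::HashSet;

/// Resolves a simple C# type name to a fully qualified name by trying the
/// current namespace first and then each namespace imported with `using`.
/// Returns None instead of guessing when no known type matches.
fn resolve_csharp_type(
    simple_name: &str,
    current_namespace: &str,
    using_namespaces: &[&str],
    known_types: &HashSet<String>,
) -> Option<String> {
    std::iter::once(current_namespace)
        .chain(using_namespaces.iter().copied())
        .map(|ns| format!("{ns}.{simple_name}"))
        .find(|candidate| known_types.contains(candidate))
}
```

        Returning None for unknown names avoids emitting USES_CLASS edges to
        FQNs that do not exist.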

[MEDIUM] No Partial Class Support
---------------------------------
Problem: "partial class" declarations across files are not unified.
Impact: Class appears as multiple separate entities.
Fix: Merge partial class nodes based on FQN.
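Sketch: Grouping fragments by FQN before persisting, so one class node is
        written per FQN. The ClassFragment and MergedClass shapes are
        hypothetical stand-ins for whatever the parser collects per file.

```rust
use std::collections::BTreeMap;

/// One `partial class` fragment as seen in a single file (hypothetical shape).
struct ClassFragment {
    fqn: String,
    file: String,
    methods: Vec<String>,
}

/// A class unified across all of its fragments.
#[derive(Debug, Default, PartialEq)]
struct MergedClass {
    files: Vec<String>,
    methods: Vec<String>,
}

/// Groups fragments by FQN so each class yields exactly one merged entry.
fn merge_partial_classes(fragments: Vec<ClassFragment>) -> BTreeMap<String, MergedClass> {
    let mut merged: BTreeMap<String, MergedClass> = BTreeMap::new();
    for fragment in fragments {
        let entry = merged.entry(fragment.fqn).or_default();
        entry.files.push(fragment.file);
        entry.methods.extend(fragment.methods);
    }
    merged
}
```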

[MEDIUM] Attribute Detection is Line-Based
------------------------------------------
Location: graph.rs - extract_csharp_api_endpoints function
Problem: Multi-line attributes are missed because parsing is line-by-line.
Impact: API endpoints with complex attributes are not detected.
Fix: Use AST-based attribute extraction.

[LOW] No Async/Await Tracking
-----------------------------
Problem: Async call chains are not specially handled.
Impact: Task continuations are invisible in call graph.
Fix: Track async method patterns.

================================================================================
3. GO PARSING SHORTCOMINGS
================================================================================

[HIGH] No Import Statement Tracking
-----------------------------------
Location: graph.rs - persist_go_structure function
Problem: No DEPENDS_ON_FILE edges are created for Go files.
Impact: Cannot trace dependencies between Go files/packages.
Fix: Parse import statements and resolve to file paths.

[HIGH] No Cross-Package Resolution
----------------------------------
Problem: Calls to other packages (e.g., utils.Helper()) aren't resolved
         to their declaring file or function.
Impact: Cross-package call graph is broken.
Fix: Build import map and resolve package-qualified calls.

[MEDIUM] Package-Only FQN
-------------------------
Problem: FQN is "package.FuncName" but Go's actual FQN includes module
         path (e.g., "github.com/org/repo/pkg.Func").
Impact: FQNs are not globally unique across projects.
Fix: Parse go.mod for module path and use full import paths.
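Sketch: Reading the module path from go.mod text and composing a full FQN.
        Both function names are hypothetical. Only the common single-line
        "module <path>" form is handled; the parenthesized form is not.

```rust
/// Extracts the module path from the text of a go.mod file, if present.
fn go_module_path(go_mod: &str) -> Option<&str> {
    go_mod.lines().find_map(|line| {
        let rest = line.trim().strip_prefix("module")?;
        // Require whitespace after the keyword so "modulex" does not match.
        if !rest.starts_with(char::is_whitespace) {
            return None;
        }
        // Drop a trailing comment and optional quotes around the path.
        let path = rest.split("//").next()?.trim().trim_matches('"');
        if path.is_empty() { None } else { Some(path) }
    })
}

/// Builds a function FQN from the module path, the package directory
/// relative to the module root, and the function name.
fn go_function_fqn(module_path: &str, package_dir: &str, func_name: &str) -> String {
    let dir = package_dir.trim_matches('/');
    if dir.is_empty() {
        format!("{module_path}.{func_name}")
    } else {
        format!("{module_path}/{dir}.{func_name}")
    }
}
```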

[MEDIUM] Interface Implementation Not Tracked
---------------------------------------------
Problem: Go's implicit interface satisfaction is not detected.
Impact: Cannot answer "which types implement interface X?"
Fix: Compare method signatures against interface definitions.
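Sketch: Interface satisfaction as a subset check over method signatures.
        It assumes signatures have already been normalized to comparable
        strings, and it ignores the pointer/value receiver distinction in
        Go method sets. Names are hypothetical.

```rust
use std::collections::HashSet;

/// A method signature reduced to a comparable string,
/// e.g. "Read([]byte) (int, error)".
type MethodSig = String;

/// Returns the names of the types whose method set contains every method
/// the interface requires.
fn implementors<'a>(
    interface_methods: &HashSet<MethodSig>,
    type_method_sets: &'a [(String, HashSet<MethodSig>)],
) -> Vec<&'a str> {
    type_method_sets
        .iter()
        .filter(|(_, methods)| interface_methods.is_subset(methods))
        .map(|(name, _)| name.as_str())
        .collect()
}
```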

[MEDIUM] No Embedded Struct Tracking
------------------------------------
Problem: Struct composition via embedding is ignored.
Impact: Embedded method promotion is invisible.
Fix: Track embedded types and create composition relationships.

[LOW] No init() Function Handling
---------------------------------
Problem: Special init() functions are not distinguished.
Impact: Initialization order analysis is impossible.
Fix: Tag init functions with a special property.

================================================================================
4. ERLANG PARSING SHORTCOMINGS
================================================================================

[HIGH] Textual Parsing Instead of AST
-------------------------------------
Location: graph.rs - extract_erlang_functions function
Problem: Uses regex/string parsing instead of Tree-Sitter AST.
Impact: Many valid Erlang constructs are missed or misparsed.
Fix: Use Tree-Sitter Erlang grammar for all extraction.

[HIGH] No Macro Expansion
-------------------------
Problem: Macros like ?MODULE or ?FUNCTION_NAME are not resolved.
Impact: Macro-heavy code has broken references.
Fix: Pre-process macro definitions or track macro usage.

[MEDIUM] Inaccurate Arity Calculation
-------------------------------------
Location: graph.rs lines 1354-1361
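Problem: Arity is derived by counting commas in the parameter list, so
         commas inside nested tuples or lists are miscounted as argument
         separators. For example, f({A, B}, C) has arity 2, yet a flat
         comma count treats {A, B} as two parameters.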
Impact: Function FQNs may have wrong arity.
Fix: Properly parse parameter structure.
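Sketch: A depth-aware comma count as an interim improvement over flat
        counting. The function name is hypothetical. Quoted atoms and
        character literals such as $, are not handled; AST-based
        extraction would remove the need for this entirely.

```rust
/// Counts the arguments in an Erlang parameter list such as "{A, B}, [C | T]".
/// Commas nested inside (), {}, [], or <<>> and commas inside string literals
/// are not treated as argument separators.
fn erlang_arity(params: &str) -> usize {
    if params.trim().is_empty() {
        return 0;
    }
    let mut depth = 0usize;
    let mut in_string = false;
    let mut arity = 1;
    let mut chars = params.chars().peekable();
    while let Some(c) = chars.next() {
        if in_string {
            match c {
                '\\' => { chars.next(); } // skip the escaped character
                '"' => in_string = false,
                _ => {}
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '(' | '{' | '[' => depth += 1,
            ')' | '}' | ']' => depth = depth.saturating_sub(1),
            '<' if chars.peek() == Some(&'<') => { chars.next(); depth += 1; }
            '>' if chars.peek() == Some(&'>') => { chars.next(); depth = depth.saturating_sub(1); }
            ',' if depth == 0 => arity += 1,
            _ => {}
        }
    }
    arity
}
```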

[MEDIUM] No -spec Type Extraction
---------------------------------
Problem: Function types from -spec annotations are ignored.
Impact: return_type and param_types properties are always empty.
Fix: Parse -spec directives and populate type properties.

[MEDIUM] No -behaviour Detection
--------------------------------
Problem: OTP behaviors (gen_server, supervisor) are not tracked.
Impact: Cannot query for all gen_servers or supervisors.
Fix: Parse -behaviour attributes and create a relationship.

[MEDIUM] Conservative Call Graph Over-Approximation
---------------------------------------------------
Location: graph.rs lines 1099-1111
Problem: If a function name appears ANYWHERE in the module, a call edge
         is created from EVERY function to it.
Impact: Many false positive call edges.
Fix: Track call sites within function bodies specifically.
Status: Section 7 lists this as resolved for intra-module CALLS_FUNCTION
        edges.

================================================================================
5. SCHEMA & RELATIONSHIP SHORTCOMINGS
================================================================================

[HIGH] Missing EXTENDS/IMPLEMENTS Edges
---------------------------------------
Problem: No class hierarchy tracking for any language.
Impact: Cannot analyze impact when base class changes.
Fix: Add EXTENDS and IMPLEMENTS relationship types.

[HIGH] DEPENDS_ON_FILE Only for Java/Erlang
-------------------------------------------
Problem: C# and Go have no file-level dependency edges.
Impact: File impact analysis fails for 2 of 4 main languages.
Fix: Implement import tracking for C# and Go.

[MEDIUM] No ANNOTATED_WITH Relationship
---------------------------------------
Problem: Annotations/attributes are not tracked as relationships.
Impact: Cannot query "all @Transactional methods" etc.
Fix: Add annotation relationship type.

[MEDIUM] No FIELD_ACCESS Relationship
-------------------------------------
Problem: No tracking of which fields are read/written by which functions.
Impact: Data flow analysis is impossible.
Fix: Track field reads and writes per function, e.g. via the ACCESSES_FIELD
     relationship proposed in Section 1.

[MEDIUM] No OVERRIDES Relationship
----------------------------------
Problem: Method overrides in class hierarchies are not tracked.
Impact: Cannot trace polymorphic dispatch.
Fix: Add override detection for OOP languages.

================================================================================
6. PERFORMANCE SHORTCOMINGS
================================================================================

[HIGH] Non-Batched Node Creation
--------------------------------
Location: graph.rs - persist_java_structure, persist_csharp_structure, etc.
Problem: Each class/function node is created with a separate Cypher query.
         Only relationships use batching (BATCH_FLUSH_THRESHOLD = 3000).
Impact: Very slow for large codebases (10,000+ files).
Fix: Batch node creation similar to relationship batching.
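Sketch: A buffer that flushes node rows in chunks, mirroring the threshold
        idea behind BATCH_FLUSH_THRESHOLD. NodeBatcher, the (fqn, name) row
        shape, the flush callback, and the "Class" label in the Cypher text
        are all assumptions; the actual driver call is left to the callback.

```rust
/// Buffers node rows and hands them to `flush_fn` in chunks, so each chunk can
/// be written with one parameterized query instead of one query per node.
struct NodeBatcher<F: FnMut(&[(String, String)])> {
    rows: Vec<(String, String)>, // (fqn, name) pairs; a hypothetical row shape
    threshold: usize,
    flush_fn: F,
}

impl<F: FnMut(&[(String, String)])> NodeBatcher<F> {
    fn new(threshold: usize, flush_fn: F) -> Self {
        NodeBatcher { rows: Vec::new(), threshold: threshold.max(1), flush_fn }
    }

    /// Adds one row and flushes automatically once the threshold is reached.
    fn push(&mut self, fqn: &str, name: &str) {
        self.rows.push((fqn.to_string(), name.to_string()));
        if self.rows.len() >= self.threshold {
            self.flush();
        }
    }

    /// Sends any buffered rows; call once more after the last push.
    fn flush(&mut self) {
        if !self.rows.is_empty() {
            (self.flush_fn)(&self.rows);
            self.rows.clear();
        }
    }
}

/// Cypher that a flush could send, with the chunk bound to `$rows`.
/// The `Class` label and property names are assumptions about the schema.
const BATCH_CLASS_CYPHER: &str =
    "UNWIND $rows AS row MERGE (c:Class {fqn: row.fqn}) SET c.name = row.name";
```

        With UNWIND, one round trip writes a whole chunk of nodes, which is
        the same effect the relationship batching already aims for.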

[MEDIUM] Sequential File Processing
-----------------------------------
Location: graph.rs - persist_files_to_neo4j function
Problem: Despite parallel parsing, Neo4j writes are sequential per file.
Impact: Underutilizes Neo4j's write capacity.
Fix: Use concurrent writes with connection pooling.

[MEDIUM] No Transaction Batching
--------------------------------
Problem: Each query runs in its own implicit transaction.
Impact: High transaction overhead.
Fix: Group multiple queries into explicit transactions.

[MEDIUM] No Index Creation
--------------------------
Problem: No explicit indexes on frequently-queried properties.
Impact: Slow queries on large graphs.
Fix: Create indexes on fqn, path, and name properties.
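Sketch: Generating idempotent index statements at startup. The function
        name, the index naming scheme, and the labels passed in are
        assumptions; the labels must match what the parser actually
        writes. The statement form assumes a Neo4j version that supports
        "IF NOT EXISTS" on CREATE INDEX.

```rust
/// Generates one idempotent index statement per (label, property) pair.
/// The labels passed in are assumptions about the graph schema.
fn index_statements(targets: &[(&str, &str)]) -> Vec<String> {
    targets
        .iter()
        .map(|(label, prop)| {
            format!(
                "CREATE INDEX {}_{}_idx IF NOT EXISTS FOR (n:{label}) ON (n.{prop})",
                label.to_lowercase(),
                prop.to_lowercase()
            )
        })
        .collect()
}
```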

================================================================================
7. CONSERVATIVE OVER-APPROXIMATION
================================================================================

[HIGH] External API Linking Too Broad
-------------------------------------
Location: graph.rs lines 634-641
Problem: ALL methods in a file are linked to every external API URL found
         anywhere in that file's source code.
Impact: Massive false positives in CALLS_EXTERNAL_API edges.
Fix: Track which function body contains the URL reference.
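Sketch: Attributing each URL to the function whose body contains it, using
        byte ranges of the kind an AST provides. FunctionSpan and the
        function name are hypothetical; URLs outside any function body
        yield no edge.

```rust
/// Byte range of one function body within a source file.
struct FunctionSpan {
    name: String,
    start: usize, // inclusive
    end: usize,   // exclusive
}

/// Pairs each URL occurrence with the innermost function whose body contains
/// it. URLs outside every function body produce no pair.
fn link_urls_to_functions<'a>(
    functions: &'a [FunctionSpan],
    url_hits: &'a [(usize, String)], // (byte offset, url)
) -> Vec<(&'a str, &'a str)> {
    url_hits
        .iter()
        .filter_map(|(offset, url)| {
            functions
                .iter()
                .filter(|f| f.start <= *offset && *offset < f.end)
                // Prefer the smallest enclosing span (nested functions).
                .min_by_key(|f| f.end.saturating_sub(f.start))
                .map(|f| (f.name.as_str(), url.as_str()))
        })
        .collect()
}
```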

[RESOLVED] Erlang Intra-Module Calls
------------------------------------
Previous problem: Every function was assumed to potentially call every other
                  function whose name appeared as a call site anywhere in the
                  module.
Current behavior: CALLS_FUNCTION edges are derived from AST call nodes. Each
                  call is attributed to its enclosing function_clause (the
                  caller) and to a concrete callee signature/arity.
Result: N*M over-approximation removed for intra-module call graph edges.

================================================================================
8. MISSING FEATURES
================================================================================

[HIGH] No Incremental Parsing
-----------------------------
Problem: The entire codebase is re-parsed on every run.
Impact: Very slow for large codebases with small changes.
Fix: Track file hashes and only re-parse changed files.
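Sketch: Comparing content hashes against the previous run. Function names
        are hypothetical. FNV-1a is used only because it needs no external
        crate; a cryptographic hash would be the safer choice. Deleted
        files and persistence of the hash map are not covered.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a hash of a file's bytes.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Returns the paths whose content hash is new or differs from the previous
/// run, and updates `seen` so the next run compares against this one.
fn files_to_reparse(seen: &mut HashMap<String, u64>, files: &[(&str, &[u8])]) -> Vec<String> {
    let mut changed = Vec::new();
    for (path, content) in files {
        let hash = fnv1a64(content);
        if seen.insert(path.to_string(), hash) != Some(hash) {
            changed.push(path.to_string());
        }
    }
    changed
}
```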

[HIGH] No Git Integration
-------------------------
Problem: Cannot determine what changed between commits.
Impact: Cannot answer "what is impacted by this PR?"
Fix: Integrate with git diff to identify changed files.

[MEDIUM] No Dead Code Detection
-------------------------------
Problem: Cannot identify unused functions/classes.
Impact: Cannot clean up codebase safely.
Fix: Add reachability analysis from entry points.
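Sketch: Breadth-first reachability over call edges, run in memory for
        illustration. Names are hypothetical. Results are only as good as
        the call graph: functions invoked via reflection or framework
        wiring would be reported as unreachable.

```rust
use std::collections::{BTreeSet, HashMap, HashSet, VecDeque};

/// Returns every function that is not reachable from any entry point by
/// following call edges. `calls` maps a caller to its callees.
fn unreachable_functions(
    all_functions: &[&str],
    calls: &HashMap<&str, Vec<&str>>,
    entry_points: &[&str],
) -> BTreeSet<String> {
    let mut reached: HashSet<&str> = entry_points.iter().copied().collect();
    let mut queue: VecDeque<&str> = entry_points.iter().copied().collect();
    while let Some(current) = queue.pop_front() {
        for &callee in calls.get(current).into_iter().flatten() {
            // `insert` returns true only the first time a node is seen.
            if reached.insert(callee) {
                queue.push_back(callee);
            }
        }
    }
    all_functions
        .iter()
        .filter(|name| !reached.contains(*name))
        .map(|name| name.to_string())
        .collect()
}
```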

[MEDIUM] No Cyclic Dependency Detection
---------------------------------------
Problem: Graph has no built-in cycle detection.
Impact: Cannot identify problematic circular dependencies.
Fix: Add cycle detection algorithm.
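Sketch: Depth-first search with in-progress marking, shown on an in-memory
        dependency map. Names are hypothetical. The recursion is kept for
        brevity; very deep graphs would call for an explicit stack.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy)]
enum Mark {
    InProgress,
    Done,
}

/// Returns one dependency cycle as a path (first node repeated at the end),
/// or None if the graph reachable from the map's keys is acyclic.
fn find_cycle(deps: &HashMap<String, Vec<String>>) -> Option<Vec<String>> {
    fn visit(
        node: &str,
        deps: &HashMap<String, Vec<String>>,
        marks: &mut HashMap<String, Mark>,
        path: &mut Vec<String>,
    ) -> Option<Vec<String>> {
        marks.insert(node.to_string(), Mark::InProgress);
        path.push(node.to_string());
        for next in deps.get(node).into_iter().flatten() {
            match marks.get(next.as_str()).copied() {
                Some(Mark::InProgress) => {
                    // Back edge: the cycle is the path suffix starting at `next`.
                    let start = path.iter().position(|n| n == next)?;
                    let mut cycle = path[start..].to_vec();
                    cycle.push(next.clone());
                    return Some(cycle);
                }
                Some(Mark::Done) => {}
                None => {
                    if let Some(cycle) = visit(next, deps, marks, path) {
                        return Some(cycle);
                    }
                }
            }
        }
        path.pop();
        marks.insert(node.to_string(), Mark::Done);
        None
    }

    let mut marks = HashMap::new();
    let mut keys: Vec<&String> = deps.keys().collect();
    keys.sort(); // deterministic start order
    for key in keys {
        if !marks.contains_key(key.as_str()) {
            if let Some(cycle) = visit(key, deps, &mut marks, &mut Vec::new()) {
                return Some(cycle);
            }
        }
    }
    None
}
```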

[MEDIUM] No Call Depth Tracking
-------------------------------
Problem: No way to query "functions within N calls of X".
Impact: Hard to scope impact analysis.
Fix: Add depth property to call graph traversals.

================================================================================
9. LANGUAGE SUPPORT MATRIX
================================================================================

Language    | Parsing | Classes | Call Graph | Dependencies | Endpoints
------------|---------|---------|------------|--------------|----------
Java        | ✓       | ✓       | Partial    | Partial*     | ✓
C#          | ✓       | ✓       | Partial    | NONE         | ✓
Go          | ✓       | ✓       | Partial    | NONE         | ✓
Erlang      | Text**  | Module  | Approx     | ✓            | ✓
JavaScript  | ✓       | NONE    | NONE       | NONE         | NONE
TypeScript  | ✓       | NONE    | NONE       | NONE         | NONE
Python      | ✓       | NONE    | NONE       | NONE         | NONE
Rust        | ✓       | NONE    | NONE       | NONE         | NONE

* Java dependencies only for com.redbus.genai.* namespace
** Erlang uses text parsing instead of AST

================================================================================
10. SECURITY CONCERNS
================================================================================

[HIGH] Hardcoded Default Credentials
------------------------------------
Location: main.rs - CLI argument defaults
Problem: The default Neo4j password is "parser1234".
Impact: Accidental exposure of development credentials.
Fix: Remove default password, require explicit configuration.

[MEDIUM] Credentials in Command Line
------------------------------------
Problem: Passwords passed as CLI args are visible in process list.
Impact: Credential exposure in multi-user systems.
Fix: Support environment variables or config files for credentials.
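Sketch: Resolving the password from a config value or the environment, and
        failing when neither is set instead of falling back to a default.
        The function name and the NEO4J_PASSWORD variable name are
        hypothetical. The environment lookup is passed in as a closure so
        the logic can be exercised without touching the process
        environment.

```rust
/// Picks the Neo4j password from a config-file value first, then from an
/// environment lookup. Empty values are rejected and there is no default.
/// `NEO4J_PASSWORD` is a hypothetical variable name, not an existing option.
fn resolve_password<F>(config_value: Option<&str>, env_lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    config_value
        .map(str::to_string)
        .or_else(|| env_lookup("NEO4J_PASSWORD"))
        .filter(|password| !password.is_empty())
        .ok_or_else(|| "no Neo4j password configured; set NEO4J_PASSWORD".to_string())
}

// In the binary, the lookup could be wired to the real environment:
//     resolve_password(None, |key| std::env::var(key).ok())
```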

[MEDIUM] No SSL/TLS Configuration
---------------------------------
Problem: Only the unencrypted "bolt://" scheme is supported; there is no
         "bolt+s://" option.
Impact: Credentials are sent in clear text over the network.
Fix: Add SSL/TLS configuration options.

================================================================================
11. TESTING GAPS
================================================================================

[HIGH] Minimal Test Coverage
----------------------------
Location: lib.rs lines 158-170
Problem: Only ONE trivial test exists in the entire codebase.
Impact: No confidence in correctness of any extraction logic.
Fix: Add comprehensive tests for:
     - Each language's symbol extraction
     - Call graph accuracy
     - Neo4j integration
     - Edge case handling
     - API endpoint detection

================================================================================
12. PRIORITY RECOMMENDATIONS
================================================================================

HIGH PRIORITY (Critical for Production Use):
1. Remove hardcoded namespace filter for Java imports
2. Add "using" directive tracking for C#
3. Add "import" tracking for Go
4. Use Tree-Sitter AST for Erlang instead of text parsing
5. Add class inheritance/interface tracking (EXTENDS/IMPLEMENTS relationship
   types) for all OOP languages
6. Create Neo4j indexes for fqn, path, and name properties
7. Batch node creation along with relationship creation
8. Add comprehensive test suite

MEDIUM PRIORITY (Significant Improvements):
1. Add incremental parsing with file hash tracking
2. Implement git diff integration for change detection
3. Implement proper FQN resolution using import maps
4. Add field access tracking for data flow analysis
5. Narrow external API linking to specific function bodies

LOW PRIORITY (Nice to Have):
1. Add visualization support
2. Build pre-made impact analysis queries
3. Add dead code detection
4. Add annotation/decorator tracking
5. Handle lambdas and method references

================================================================================
END OF DOCUMENT
================================================================================