impactsense-parser 0.1.0

Multi-language static analysis: parse codebases into an in-memory dependency graph for impact analysis
## Graph schema: node declarations and relationships

This document describes the Neo4j graph produced by the ImpactSense parser
(`src/graph.rs`). The schema is language- and framework-agnostic so Java,
Erlang, C#, Go, JS/TS, Python, and Rust can share one graph model.

**Source of truth in code:** `src/schema.rs` (node labels/properties),
`src/edge.rs` (relationship types), `src/ir.rs` (serialized IR), and
`persist_files_to_neo4j` in `src/graph.rs`.

---

## Stable keys (MERGE / idempotency)

| Label         | Stable key                          |
|---------------|-------------------------------------|
| `File`        | `path`                              |
| `Module`      | `name` + `path`                     |
| `Class`       | `fqn`                               |
| `Property`    | `fqn`                               |
| `Function`    | `fqn`                               |
| `Behaviour`   | `name`                              |
| `Callback`    | `fqn`                               |
| `ApiEndpoint` | `path`                              |
| `ExternalApi` | `base_url` + `norm_path` (batched writes) |

C# partial types (CRM-3591): `path` on `Class`, `Property`, and `Function`
is set in the **ON CREATE** clause only; later partial files refresh metadata
in **ON MATCH** but do not overwrite `path`.
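
The create-versus-match behaviour for partial types can be illustrated without a database. The sketch below is only an in-memory analogy of MERGE keyed by `fqn`; `ClassNode` and `merge_class` are hypothetical names, not the parser's API, and relationships are not modelled.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a `Class` node (assumption, not the real schema type).
#[derive(Debug, Clone, PartialEq)]
struct ClassNode {
    path: String,
    kind: Option<String>,
}

/// Upsert keyed by `fqn`, mimicking MERGE with ON CREATE / ON MATCH.
/// Returns true when a new node was created.
fn merge_class(
    graph: &mut HashMap<String, ClassNode>,
    fqn: &str,
    path: &str,
    kind: Option<&str>,
) -> bool {
    // ON MATCH: refresh metadata but keep the `path` recorded at creation.
    if let Some(existing) = graph.get_mut(fqn) {
        existing.kind = kind.map(str::to_string);
        return false;
    }
    // ON CREATE: the first partial file seen fixes `path`.
    graph.insert(
        fqn.to_string(),
        ClassNode {
            path: path.to_string(),
            kind: kind.map(str::to_string),
        },
    );
    true
}
```

Merging `MyApp.Order` first from `src/Order.cs` and then from `src/Order.Partial.cs` leaves one node whose `path` is still `src/Order.cs`.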

---

## File node

- **Label**: `File`
- **Purpose**: One source file in a repository.
- **Properties**:
  - **`path`** (string, required): Repo-relative path.
  - **`language`** (string, required): e.g. `"erlang"`, `"java"`, `"c_sharp"`,
    `"typescript"`, `"python"`, `"rust"`, `"go"`, `"javascript"`.
  - **`framework`** (string, optional): e.g. `"cowboy"`, `"spring"`.
  - **`project_name`** (string, optional): Logical service/repo name.
  - **`is_test`** (boolean, optional): Test file detected from path/name
    heuristics (see [Test file detection](#test-file-detection)).

---

## Module node

- **Label**: `Module`
- **Purpose**: Logical grouping of functions (primarily Erlang).
- **Properties**:
  - **`name`** (string, required): e.g. `"omega_app"`.
  - **`path`** (string, required): Declaring file path.
  - **`language`** (string, required): Usually `"erlang"`.
  - **`framework`** (string, optional): e.g. `"cowboy"`.
  - **`project_name`** (string, optional).
  - **`code_bytes`** (byte array, optional): Zstd-compressed full module
    source (RedCompressor wire format).

---

## Class node

- **Label**: `Class`
- **Purpose**: Named type, such as a Java/C# class or interface, a Go struct,
  or a C# struct/enum/record.
- **Properties**:
  - **`fqn`** (string, required): Fully qualified name.
  - **`name`** (string, required): Simple type name.
  - **`path`**, **`language`**, **`project_name`**: Same role as on `File`.
  - **`kind`** (string, optional): C# — `"class"`, `"interface"`, `"struct"`,
    `"enum"`, `"record"`.
  - **`code_bytes`** (byte array, optional): Zstd-compressed type declaration.

---

## Property node

- **Label**: `Property`
- **Purpose**: C# property (CRM-3587). Getters/setters are separate
  `Function` nodes for call-graph alignment.
- **Properties**:
  - **`fqn`** (string, required): Typically `{ClassFQN}.{propertyName}`.
  - **`name`** (string, required).
  - **`path`**, **`language`**, **`project_name`**: Same role as on `Class`.
  - **`declared_type`** (string, optional): Raw type text from declaration.
  - **`code_bytes`** (byte array, optional): Zstd-compressed property snippet.

---

## Function node

- **Label**: `Function`
- **Purpose**: Callable function or method.
- **Properties**:
  - **`name`** (string, required).
  - **`fqn`** (string, required):
    - Erlang: `"module:name/arity"`.
    - Java: `"com.example.pkg.ClassName.methodName"`.
    - C#: `"MyApp.Services.OrderService.GetOrder"`.
    - Python/JS/TS/Rust: file-scoped FQN from path + logical name.
  - **`path`** (string, required).
  - **`language`** (string, required).
  - **`framework`** (string, optional).
  - **`project_name`** (string, optional).
  - **`arity`** (integer, optional): Always set for Erlang; optional for other languages.
  - **`return_type`** (string, optional).
  - **`param_count`** (integer, optional).
  - **`param_types`** (list of string, optional).
  - **`modifiers`** (list of string, optional): e.g. `["public", "static", "async"]` (C#).
  - **`code_bytes`** (byte array, optional): Zstd-compressed body snippet via
    RedCompressor `POST /v1/compress`.
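
As a rough illustration of the `fqn` shapes listed above, the helpers below build the Erlang and dotted (Java/C#) forms. The function names are hypothetical, and the handling of an empty namespace is an assumption; the file-scoped form used for Python/JS/TS/Rust is not sketched because its exact format is not specified here.

```rust
/// Erlang FQN in the documented `module:name/arity` form.
fn erlang_fqn(module: &str, name: &str, arity: usize) -> String {
    format!("{}:{}/{}", module, name, arity)
}

/// Dotted FQN as in the Java and C# examples: package or namespace, type, member.
/// Assumption: an empty namespace simply drops the leading segment.
fn dotted_fqn(namespace: &str, type_name: &str, member: &str) -> String {
    if namespace.is_empty() {
        format!("{}.{}", type_name, member)
    } else {
        format!("{}.{}.{}", namespace, type_name, member)
    }
}
```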

---

## Behaviour node

- **Label**: `Behaviour`
- **Purpose**: OTP or custom behaviour contract (Erlang), e.g. `gen_server`.
- **Properties**:
  - **`name`** (string, required): Behaviour name.
  - **`path`** (string, optional): File declaring a custom behaviour.
  - **`language`** (string, optional).
  - **`project_name`** (string, optional).

---

## Callback node

- **Label**: `Callback`
- **Purpose**: Callback contract declared by a behaviour (Erlang OTP).
- **Properties**:
  - **`name`** (string, required): e.g. `"handle_call"`.
  - **`fqn`** (string, required): e.g. `"gen_server:handle_call/3"`.
  - **`arity`** (integer, required).
  - **`optional`** (boolean): Whether the callback is optional for the behaviour.
  - **`language`** (string, optional).
  - **`project_name`** (string, optional).

---

## ApiEndpoint node

- **Label**: `ApiEndpoint`
- **Purpose**: HTTP/RPC endpoint exposed by the system.
- **Properties**:
  - **`methods`** (list of string, required): e.g. `["GET"]`, `["GET","POST"]`.
  - **`path`** (string, required): Canonical path template.
  - **`norm_path`** (string, optional): Normalized path for cross-service matching.
  - **`protocol`** (string, optional): e.g. `"http"`, `"https"`.
  - **`framework`** (string, optional): e.g. `"cowboy"`, `"spring"`, `"gin"`.
  - **`project_name`** (string, optional).

Connected to handlers via `(:ApiEndpoint)-[:HANDLED_BY]->(:Function)`.

---

## ExternalApi node

- **Label**: `ExternalApi`
- **Purpose**: Remote API or service called from internal code.
- **Properties**:
  - **`name`** (string, optional): Logical name; often derived from host.
  - **`base_url`** (string, optional): e.g. `"https://api.vendorx.com"`.
  - **`path`** (string, optional): URL path component.
  - **`norm_path`** (string, optional): Normalized path for `SAME_API` matching.
  - **`protocol`** (string, optional).
  - **`provider`** (string, optional).
  - **`service`** (string, optional).

**MERGE key in batched writes:** `{ base_url, norm_path }`.

Connected from functions via `(:Function)-[:CALLS_EXTERNAL_API]->(:ExternalApi)`.

---

## Relationship types

Enumerated in `src/edge.rs` as `RelType`. Note: `Function→Class` and
`Class→Class` both use the relationship name **`USES_CLASS`** — distinguish
by node labels in Cypher.

### Structural

| Relationship       | From      | To        | Meaning                                      |
|--------------------|-----------|-----------|----------------------------------------------|
| `DECLARES_MODULE`  | File      | Module    | File declares module (Erlang)                |
| `DECLARES_CLASS`   | File      | Class     | File declares type                           |
| `DECLARES_FUNCTION`| File      | Function  | File declares top-level function             |
| `DECLARES_FUNCTION`| Class     | Function  | Class declares method                        |
| `DECLARES_FUNCTION`| Module    | Function  | Module groups functions (Erlang)             |
| `DECLARES_PROPERTY`| Class     | Property  | Class declares C# property                   |

### Dependencies and call graph

| Relationship        | From     | To          | Meaning                                     |
|---------------------|----------|-------------|---------------------------------------------|
| `DEPENDS_ON_FILE`   | File     | File        | Import/include/using dependency             |
| `DEPENDS_ON_FILE`   | Module   | File        | Erlang module depends on another module file|
| `CALLS_FUNCTION`    | Function | Function    | Call graph edge                             |
| `USES_CLASS`        | Function | Class       | Function references a type                  |
| `USES_CLASS`        | Class    | Class       | Inheritance / interface (Java, C#) or Go embedding |

### API and external systems

| Relationship         | From        | To          | Meaning                                    |
|----------------------|-------------|-------------|--------------------------------------------|
| `HANDLED_BY`         | ApiEndpoint | Function    | Endpoint handled by function               |
| `CALLS_EXTERNAL_API` | Function    | ExternalApi | Function calls external service            |
| `SAME_API`           | ApiEndpoint | ExternalApi | Internal endpoint matches external call (post-process, by `norm_path`) |

### Erlang OTP (behaviour contracts)

| Relationship          | From      | To        | Meaning                                  |
|-----------------------|-----------|-----------|------------------------------------------|
| `IMPLEMENTS_BEHAVIOUR`| Module    | Behaviour | Module implements OTP behaviour          |
| `DECLARES_BEHAVIOUR`  | File      | Behaviour | File declares custom behaviour           |
| `DECLARES_CALLBACK`   | Behaviour | Callback  | Behaviour declares callback contract     |
| `IMPLEMENTS_CALLBACK` | Function  | Callback  | Function implements callback             |
| `EXTENDS_BEHAVIOUR`   | Behaviour | Behaviour | Behaviour extends another                |
| `OVERRIDES_CALLBACK`  | Function  | Callback  | Function explicitly overrides callback   |
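
The tables above can be read as a set of allowed `(from, relationship, to)` triples. A minimal sketch of such a check follows; `is_documented_edge` is a hypothetical helper that works on label strings and is not the `RelType` enum from `src/edge.rs`. It also shows why `USES_CLASS` has to be distinguished by the source label.

```rust
/// Returns true when (from, rel, to) appears in the relationship tables of this document.
fn is_documented_edge(from: &str, rel: &str, to: &str) -> bool {
    matches!(
        (from, rel, to),
        ("File", "DECLARES_MODULE", "Module")
            | ("File", "DECLARES_CLASS", "Class")
            | ("File" | "Class" | "Module", "DECLARES_FUNCTION", "Function")
            | ("Class", "DECLARES_PROPERTY", "Property")
            | ("File" | "Module", "DEPENDS_ON_FILE", "File")
            | ("Function", "CALLS_FUNCTION", "Function")
            // Same relationship name for two different source labels.
            | ("Function" | "Class", "USES_CLASS", "Class")
            | ("ApiEndpoint", "HANDLED_BY", "Function")
            | ("Function", "CALLS_EXTERNAL_API", "ExternalApi")
            | ("ApiEndpoint", "SAME_API", "ExternalApi")
            | ("Module", "IMPLEMENTS_BEHAVIOUR", "Behaviour")
            | ("File", "DECLARES_BEHAVIOUR", "Behaviour")
            | ("Behaviour", "DECLARES_CALLBACK", "Callback")
            | ("Function", "IMPLEMENTS_CALLBACK" | "OVERRIDES_CALLBACK", "Callback")
            | ("Behaviour", "EXTENDS_BEHAVIOUR", "Behaviour")
    )
}
```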

---

## Post-processing: SAME_API

After all files are persisted, one global Cypher pass runs:

```cypher
MATCH (ep:ApiEndpoint)
MATCH (ext:ExternalApi)
WHERE ep.norm_path IS NOT NULL
  AND ext.norm_path IS NOT NULL
  AND ep.norm_path = ext.norm_path
MERGE (ep)-[:SAME_API]->(ext)
```
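
The join semantics of this pass (equal, non-null `norm_path` on both sides) can be sketched in memory. This is an illustration only: `same_api_pairs` is a hypothetical function, and each slice holds just the `norm_path` of the corresponding node.

```rust
use std::collections::HashMap;

/// Returns (endpoint index, external index) pairs whose `norm_path` values are
/// both present and equal, mirroring the WHERE clause of the Cypher pass.
fn same_api_pairs(
    endpoint_norms: &[Option<&str>],
    external_norms: &[Option<&str>],
) -> Vec<(usize, usize)> {
    // Index external calls by norm_path so the join is not a nested loop.
    let mut by_norm: HashMap<&str, Vec<usize>> = HashMap::new();
    for (j, norm) in external_norms.iter().enumerate() {
        if let Some(np) = norm {
            by_norm.entry(*np).or_default().push(j);
        }
    }
    let mut pairs = Vec::new();
    for (i, norm) in endpoint_norms.iter().enumerate() {
        if let Some(np) = norm {
            if let Some(matches) = by_norm.get(*np) {
                pairs.extend(matches.iter().map(|&j| (i, j)));
            }
        }
    }
    pairs
}
```

Entries with a missing `norm_path` never pair with anything, including each other.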

---

## Incremental graph updates

On incremental parse (`cleanup_incremental_targets_in_neo4j`), nodes with
matching `path` are deleted before re-upsert:

- `File`, `Module`, `Class`, `Function` where `path` is in `cleanup_targets`
  (deleted, modified, or renamed-old paths from git delta).

Property nodes tied to a deleted class may remain until the class path is
cleaned; C# properties use class linkage via `DECLARES_PROPERTY`.
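
A minimal in-memory sketch of the cleanup rule described above, assuming a flat list of labelled nodes. `Node` and `cleanup_incremental` are hypothetical; the real function operates on Neo4j, and relationships are not modelled here.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory node: a label plus the `path` property.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    label: &'static str,
    path: String,
}

/// Drops `File`, `Module`, `Class`, and `Function` nodes whose `path` is in
/// `cleanup_targets`. Other labels (for example `Property`) are left alone.
/// Returns how many nodes were removed.
fn cleanup_incremental(nodes: &mut Vec<Node>, cleanup_targets: &HashSet<String>) -> usize {
    const PATH_SCOPED: [&str; 4] = ["File", "Module", "Class", "Function"];
    let before = nodes.len();
    nodes.retain(|n| !(PATH_SCOPED.contains(&n.label) && cleanup_targets.contains(&n.path)));
    before - nodes.len()
}
```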

---

## code_bytes compression

When RedCompressor is enabled (`--compress-codeblocks` or `--push-to-neo4j`),
symbol snippets are compressed via `POST /v1/compress` and stored as
`code_bytes` on:

- `Module`, `Class`, `Property`, `Function` (language-dependent coverage)

Default compressor URL: `http://10.166.1.220:8787` (override via
`REDCOMPRESSOR_URL` or `--compressor-url`).
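
The override chain might be resolved along these lines. The precedence shown (flag, then environment variable, then default) is an assumption, since the text above does not state which override wins; `resolve_compressor_url` and `compress_endpoint` are hypothetical helpers that take the already-read values so no environment access is needed.

```rust
const DEFAULT_COMPRESSOR_URL: &str = "http://10.166.1.220:8787";

/// Treats blank values as unset.
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Picks the compressor base URL.
/// Assumed precedence: `--compressor-url`, then `REDCOMPRESSOR_URL`, then the default.
fn resolve_compressor_url(cli_flag: Option<&str>, env_value: Option<&str>) -> String {
    non_empty(cli_flag)
        .or(non_empty(env_value))
        .unwrap_or(DEFAULT_COMPRESSOR_URL)
        .trim_end_matches('/')
        .to_string()
}

/// Builds the documented compress endpoint from a base URL.
fn compress_endpoint(base_url: &str) -> String {
    format!("{}/v1/compress", base_url)
}
```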

---

## Language coverage matrix

| Capability            | Java     | C#      | Go      | Erlang   | Python     | JS/TS      | Rust      |
|-----------------------|----------|---------|---------|----------|------------|------------|-----------|
| File node             | Yes      | Yes     | Yes     | Yes      | Yes        | Yes        | Yes       |
| Class / struct        | Yes      | Yes     | Yes     |          |            |            |           |
| Property              |          | Yes     |         |          |            |            |           |
| Function (methods)    | Yes      | Yes     | Yes     | Yes      | Top-level  | Top-level  | Top-level |
| Module                |          |         |         | Yes      |            |            |           |
| Behaviour / Callback  |          |         |         | Yes      |            |            |           |
| ApiEndpoint           | Yes      | Yes     | Yes     | Yes      |            |            |           |
| ExternalApi           | Yes      | Yes     | Yes     | Yes      |            |            |           |
| DEPENDS_ON_FILE       | Partial* | Yes**   | Yes***  | Yes      | Yes        | Yes        |           |
| CALLS_FUNCTION        | Partial  | Partial | Partial | Yes****  | Intra-file | Intra-file |           |
| USES_CLASS (Function) | Yes      | Yes     | Yes     |          |            |            |           |
| USES_CLASS (Class)    | Yes      | Yes     | Embed   |          |            |            |           |
| code_bytes            | Yes      | Yes     | Yes     | Yes      | Yes        | Yes        | Yes       |

\* Java: imports filtered to `com.redbus.genai.*` only; target must be in scanned file set.

\** C#: from `using` namespaces (excludes `System.*` / `Microsoft.*`); resolved via batch namespace index.

\*** Go: from `import` statements resolved via `go.mod` module path and `replace` directives.

\**** Erlang: intra-module AST call sites; not cross-module unless resolved by FQN.

---

## Erlang / Cowboy walker

Extraction uses the Tree-Sitter Erlang AST (not line-based regex).

- **Module**: from `-module(Name).` AST or file stem fallback.
- **Function**: from top-level `function_clause` nodes; FQN `module:name/arity`.
- **CALLS_FUNCTION**: each AST call site is attributed to its enclosing caller
  and matched to the callee by arity (no module-wide N×M over-approximation).
- **DEPENDS_ON_FILE**: from remote module calls; also
  `(:Module)-[:DEPENDS_ON_FILE]->(:File)` for behaviour/module deps.
- **OTP**: `-behaviour`, callback declarations, `IMPLEMENTS_BEHAVIOUR`,
  `DECLARES_CALLBACK`, `IMPLEMENTS_CALLBACK`, `EXTENDS_BEHAVIOUR`,
  `DECLARES_BEHAVIOUR`, `OVERRIDES_CALLBACK`.
- **ApiEndpoint**: Cowboy route tuples `{"/path", handler_module, ...}`.
- **ExternalApi**: HTTP(S) URL literals; **CALLS_EXTERNAL_API** links all
  functions in the module to each external URL found (conservative over-approximation).
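
The arity-based `CALLS_FUNCTION` rule in the list above can be sketched as follows for local calls. This is an assumption-laden illustration rather than the walker's code: `CallSite` and `resolve_local_calls` are hypothetical, and the Tree-Sitter traversal that would produce the call sites is omitted.

```rust
use std::collections::HashSet;

/// A local call site as a walker might record it (hypothetical shape).
struct CallSite<'a> {
    caller_fqn: &'a str, // enclosing function, e.g. "orders:init/1"
    callee_name: &'a str,
    arg_count: usize,
}

/// Emits (caller, callee) FQN pairs only when `module:name/arity` is a
/// function actually defined in the module, so no N×M edges are produced.
fn resolve_local_calls(
    module: &str,
    defined: &HashSet<String>,
    sites: &[CallSite<'_>],
) -> Vec<(String, String)> {
    sites
        .iter()
        .filter_map(|site| {
            let callee = format!("{}:{}/{}", module, site.callee_name, site.arg_count);
            if defined.contains(&callee) {
                Some((site.caller_fqn.to_string(), callee))
            } else {
                None
            }
        })
        .collect()
}
```

A call to `load` with one argument produces no edge when only `orders:load/2` is defined.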

---

## Java / Spring walker

- **Class / Function**: AST `class_declaration`, `interface_declaration`,
  `method_declaration`; FQN includes package.
- **ApiEndpoint**: Spring annotations (`@GetMapping`, `@PostMapping`,
  `@RequestMapping`, etc.); class-level base path supported.
- **DEPENDS_ON_FILE**: `import com.redbus.genai.*` only (configurable filter in code).
- **CALLS_FUNCTION**: method invocations with same-class / import-aware resolution.
- **USES_CLASS**: type references on functions; **Class→Class** for inheritance
  and `@Autowired`-style injected dependencies.
- **ExternalApi**: HTTP URL literals; linked from functions containing the URL.

---

## C# / ASP.NET walker

- **Namespace**: AST `namespace_declaration` / `file_scoped_namespace_declaration`.
- **Class / Property / Function**: batched MERGE; nested types supported;
  enums as `Class { kind: "enum" }`.
- **DEPENDS_ON_FILE**: from `using` namespace imports (non-system).
- **CALLS_FUNCTION**: invocations in methods, constructors, property accessors.
- **USES_CLASS**: type resolution via usings + C# batch index; inheritance via
  `base_list` (`Class→Class USES_CLASS`).
- **ApiEndpoint**: `[HttpGet]`, `[HttpPost]`, `[Route]`, etc.
- **ExternalApi**: HTTP URLs; **CALLS_EXTERNAL_API** scoped to functions whose
  body spans contain the URL.
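
Span-scoped linking for `CALLS_EXTERNAL_API` can be pictured as a containment test on byte offsets. The sketch below assumes half-open `[start, end)` body spans; `FnSpan` and `functions_containing` are hypothetical names.

```rust
/// Byte span of a function body within a file (hypothetical shape, end exclusive).
struct FnSpan<'a> {
    fqn: &'a str,
    start: usize,
    end: usize,
}

/// Returns the FQNs of functions whose body span contains the URL literal's offset.
fn functions_containing<'a>(spans: &[FnSpan<'a>], url_offset: usize) -> Vec<&'a str> {
    spans
        .iter()
        .filter(|span| span.start <= url_offset && url_offset < span.end)
        .map(|span| span.fqn)
        .collect()
}
```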

---

## Go walker

- **Class** (struct): `type_declaration` / struct; FQN `package.StructName`.
- **Function**: top-level and methods with receiver in FQN.
- **DEPENDS_ON_FILE**: imports resolved via `go.mod` + known scanned paths.
- **CALLS_FUNCTION**: function/method calls; goroutine-spawned calls tracked separately.
- **USES_CLASS**: struct type usage; embedded structs via `Class→Class USES_CLASS`.
- **ApiEndpoint**: `http.HandleFunc`, Chi, Gin, Echo patterns.
- **ExternalApi**: HTTP URL literals.

---

## Python / JavaScript / TypeScript / Rust walker

Handled by `persist_non_java_functions`:

- **Function**: top-level functions only (Python skips functions inside classes).
- **DEPENDS_ON_FILE**: Python imports and JS/TS import specifiers resolved to
  known scanned files.
- **CALLS_FUNCTION**: intra-file calls (Python, JS/TS).
- **code_bytes**: supported when compression enabled.
- No `Class`, `ApiEndpoint`, or `ExternalApi` extraction for these languages today.

---

## Test file detection

Files are flagged `is_test: true` based on:

- **Directories**: `/test/`, `/tests/`, `/__tests__/`, `/spec/`, `/src/test/`, `/t/`.
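- **Filenames**: `test_*.py`, `*_test.go`, `*Test.java`, `*.spec.ts`, etc.

Example query listing functions declared in files explicitly marked
`is_test: false` (files where the property is unset do not match this pattern):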

```cypher
MATCH (f:File {is_test: false})-[:DECLARES_FUNCTION]->(fn:Function)
RETURN fn.fqn, fn.path
```
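
A heuristic of this kind could look roughly like the sketch below. It covers only the directory and filename patterns listed above (the "etc." is not expanded), and `is_test_path` is a hypothetical name. Prefixing a slash so that a leading `tests/` directory also matches is an assumption.

```rust
/// Heuristic `is_test` check over a repo-relative path, using only the listed patterns.
fn is_test_path(path: &str) -> bool {
    const TEST_DIRS: [&str; 6] = [
        "/test/", "/tests/", "/__tests__/", "/spec/", "/src/test/", "/t/",
    ];
    // Normalize separators and prefix a slash so "tests/x.py" matches "/tests/".
    let normalized = format!("/{}", path.replace('\\', "/"));
    if TEST_DIRS.iter().any(|dir| normalized.contains(*dir)) {
        return true;
    }
    // Fall back to filename patterns on the last path segment.
    let file_name = normalized.rsplit('/').next().unwrap_or("");
    (file_name.starts_with("test_") && file_name.ends_with(".py"))
        || file_name.ends_with("_test.go")
        || file_name.ends_with("Test.java")
        || file_name.ends_with(".spec.ts")
}
```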

---

## Example impact queries

**Functions calling a target method:**

```cypher
MATCH (caller:Function)-[:CALLS_FUNCTION]->(target:Function {name: "setAmenities"})
WHERE target.fqn CONTAINS "OrderDetail"
RETURN caller.fqn, caller.path
```

**Files depending on a changed file:**

```cypher
MATCH (f:File)-[:DEPENDS_ON_FILE]->(dep:File)
WHERE dep.path CONTAINS "OrderDetail.java"
RETURN f.path
```

**API endpoint to handler:**

```cypher
MATCH (ep:ApiEndpoint)-[:HANDLED_BY]->(fn:Function)
RETURN ep.path, ep.methods, fn.fqn
```

**Cross-service API link:**

```cypher
MATCH (ep:ApiEndpoint)-[:SAME_API]->(ext:ExternalApi)
RETURN ep.path, ext.base_url, ext.norm_path
```

**Erlang module implementing a behaviour:**

```cypher
MATCH (m:Module)-[:IMPLEMENTS_BEHAVIOUR]->(b:Behaviour {name: "gen_server"})
RETURN m.name, m.path
```
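
The caller query above returns direct callers only. For transitive impact, the same `CALLS_FUNCTION` edges can be walked in reverse. The sketch below does this breadth-first over an in-memory edge list; `transitive_callers` is a hypothetical helper, not part of the parser.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// All direct and transitive callers of `target`, given (caller, callee) edges.
/// Callers are returned in breadth-first discovery order; cycles are handled.
fn transitive_callers<'a>(edges: &[(&'a str, &'a str)], target: &str) -> Vec<&'a str> {
    // Reverse adjacency: callee -> callers.
    let mut callers_of: HashMap<&str, Vec<&'a str>> = HashMap::new();
    for &(caller, callee) in edges {
        callers_of.entry(callee).or_default().push(caller);
    }
    let mut seen: HashSet<&str> = HashSet::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    let mut impacted = Vec::new();
    seen.insert(target);
    queue.push_back(target);
    while let Some(current) = queue.pop_front() {
        if let Some(callers) = callers_of.get(current) {
            for &caller in callers {
                // `insert` returns false for already-visited nodes.
                if seen.insert(caller) {
                    impacted.push(caller);
                    queue.push_back(caller);
                }
            }
        }
    }
    impacted
}
```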

---

## Persistence notes

- Relationship writes are batched (`BATCH_FLUSH_THRESHOLD = 3000`).
- C# class/property/function nodes use batched MERGE (`CSHARP_NODE_BATCH_FLUSH_THRESHOLD = 500`).
- Erlang module writes run concurrently (capped in-flight tasks).
- Bootstrap mode supports `--clean`, which deletes all existing nodes and
  relationships (`MATCH (n) DETACH DELETE n`).
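
Threshold-based batching as described in these notes can be sketched with a small buffer type. `BatchBuffer` is a hypothetical illustration; the parser's actual batching code and its Neo4j write path are not shown, and the caller is assumed to perform the write for each returned batch.

```rust
/// Accumulates items and hands back a full batch once `threshold` is reached.
struct BatchBuffer<T> {
    items: Vec<T>,
    threshold: usize,
}

impl<T> BatchBuffer<T> {
    fn new(threshold: usize) -> Self {
        assert!(threshold > 0, "threshold must be positive");
        BatchBuffer {
            items: Vec::with_capacity(threshold),
            threshold,
        }
    }

    /// Adds one item; returns a batch to write when the threshold is hit.
    fn push(&mut self, item: T) -> Option<Vec<T>> {
        self.items.push(item);
        if self.items.len() >= self.threshold {
            let fresh = Vec::with_capacity(self.threshold);
            Some(std::mem::replace(&mut self.items, fresh))
        } else {
            None
        }
    }

    /// Returns whatever is left; call once after the last item.
    fn finish(self) -> Vec<T> {
        self.items
    }
}
```

With `BatchBuffer::new(3000)`, every 3000th `push` yields a batch, and `finish` returns the remainder for a final flush.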