## Graph schema: node declarations and relationships

This document describes the Neo4j graph produced by the ImpactSense parser
(`src/graph.rs`). The schema is language- and framework-agnostic, so Java,
Erlang, C#, Go, JS/TS, Python, and Rust can share one graph model.

**Source of truth in code:** `src/schema.rs` (node labels/properties),
`src/edge.rs` (relationship types), `src/ir.rs` (serialized IR), and
`persist_files_to_neo4j` in `src/graph.rs`.

---
## Stable keys (MERGE / idempotency)

| Label | MERGE key |
|-------|-----------|
| `File` | `path` |
| `Module` | `name` + `path` |
| `Class` | `fqn` |
| `Property` | `fqn` |
| `Function` | `fqn` |
| `Behaviour` | `name` |
| `Callback` | `fqn` |
| `ApiEndpoint` | `path` |
| `ExternalApi` | `base_url` + `norm_path` (batched writes) |

C# partial types (CRM-3591): `path` on `Class`, `Property`, and `Function`
is set only in **ON CREATE**. When a later partial file MERGEs the same node,
**ON MATCH** refreshes its other metadata but does not overwrite `path`.
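
One way to express that upsert, sketched as a hypothetical Rust helper `class_merge_cypher` that returns a parameterized Cypher string. The parameter names (`$fqn`, `$path`, `$name`, and so on) are illustrative assumptions, not copied from `src/graph.rs`:

```rust
/// Hypothetical helper: builds a parameterized MERGE for a `Class` node keyed on `fqn`.
/// `path` is written only in ON CREATE, so a later C# partial file that MERGEs
/// the same type keeps the path recorded by the first file that created it.
fn class_merge_cypher() -> String {
    [
        "MERGE (c:Class {fqn: $fqn})",
        "ON CREATE SET c.path = $path, c.name = $name, c.language = $language,",
        "              c.project_name = $project_name, c.kind = $kind",
        // ON MATCH refreshes metadata but deliberately omits c.path.
        "ON MATCH SET c.name = $name, c.language = $language,",
        "             c.project_name = $project_name, c.kind = $kind",
    ]
    .join("\n")
}
```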

---
## File node
- **Label**: `File`
- **Purpose**: One source file in a repository.
- **Properties**:
  - **`path`** (string, required): Repo-relative path.
  - **`language`** (string, required): e.g. `"erlang"`, `"java"`, `"c_sharp"`,
    `"typescript"`, `"python"`, `"rust"`, `"go"`, `"javascript"`.
  - **`framework`** (string, optional): e.g. `"cowboy"`, `"spring"`.
  - **`project_name`** (string, optional): Logical service/repo name.
  - **`is_test`** (boolean, optional): Test file detected from path/name
    heuristics (see [Test file detection](#test-file-detection)).
---
## Module node
- **Label**: `Module`
- **Purpose**: Logical grouping of functions (primarily Erlang).
- **Properties**:
  - **`name`** (string, required): e.g. `"omega_app"`.
  - **`path`** (string, required): Declaring file path.
  - **`language`** (string, required): Usually `"erlang"`.
  - **`framework`** (string, optional): e.g. `"cowboy"`.
  - **`project_name`** (string, optional).
  - **`code_bytes`** (byte array, optional): Zstd-compressed full module
    source (RedCompressor wire format).
---
## Class node
- **Label**: `Class`
- **Purpose**: A named type, such as a Java/C# class or interface, a Go struct,
  or a C# struct/enum/record.
- **Properties**:
  - **`fqn`** (string, required): Fully qualified name.
  - **`name`** (string, required): Simple type name.
  - **`path`**, **`language`**, **`project_name`**: Same role as on `File`.
  - **`kind`** (string, optional, C# only): `"class"`, `"interface"`, `"struct"`,
    `"enum"`, `"record"`.
  - **`code_bytes`** (byte array, optional): Zstd-compressed type declaration.
---
## Property node
- **Label**: `Property`
- **Purpose**: C# property (CRM-3587). Getters/setters are separate
`Function` nodes for call-graph alignment.
- **Properties**:
  - **`fqn`** (string, required): Typically `{ClassFQN}.{propertyName}`.
  - **`name`** (string, required).
  - **`path`**, **`language`**, **`project_name`**: Same role as on `Class`.
  - **`declared_type`** (string, optional): Raw type text from declaration.
  - **`code_bytes`** (byte array, optional): Zstd-compressed property snippet.
---
## Function node
- **Label**: `Function`
- **Purpose**: Callable function or method.
- **Properties**:
  - **`name`** (string, required).
  - **`fqn`** (string, required):
    - Erlang: `"module:name/arity"`.
    - Java: `"com.example.pkg.ClassName.methodName"`.
    - C#: `"MyApp.Services.OrderService.GetOrder"`.
    - Python/JS/TS/Rust: file-scoped FQN from path + logical name.
  - **`path`** (string, required).
  - **`language`** (string, required).
  - **`framework`** (string, optional).
  - **`project_name`** (string, optional).
  - **`arity`** (integer, optional): Always set for Erlang; optional for other languages.
  - **`return_type`** (string, optional).
  - **`param_count`** (integer, optional).
  - **`param_types`** (list of string, optional).
  - **`modifiers`** (list of string, optional): e.g. `["public", "static", "async"]` (C#).
  - **`code_bytes`** (byte array, optional): Zstd-compressed body snippet via
    RedCompressor `POST /v1/compress`.
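
The Erlang form `module:name/arity` is regular enough to build and parse mechanically, which is handy when matching `Callback` FQNs such as `gen_server:handle_call/3` against `Function` FQNs. A sketch with hypothetical helpers `erlang_fqn` and `parse_erlang_fqn` (not names from the codebase):

```rust
/// Hypothetical helper: formats an Erlang function FQN as `module:name/arity`.
fn erlang_fqn(module: &str, name: &str, arity: u32) -> String {
    format!("{}:{}/{}", module, name, arity)
}

/// Hypothetical inverse: splits `module:name/arity` into its parts.
/// Returns None if a separator is missing or the arity is not a number.
fn parse_erlang_fqn(fqn: &str) -> Option<(&str, &str, u32)> {
    let (module, rest) = fqn.split_once(':')?;
    // rsplit_once so a name containing '/' (e.g. a quoted atom) still splits on the arity separator.
    let (name, arity) = rest.rsplit_once('/')?;
    Some((module, name, arity.parse().ok()?))
}
```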
---
## Behaviour node
- **Label**: `Behaviour`
- **Purpose**: OTP or custom behaviour contract (Erlang), e.g. `gen_server`.
- **Properties**:
  - **`name`** (string, required): Behaviour name.
  - **`path`** (string, optional): File declaring a custom behaviour.
  - **`language`** (string, optional).
  - **`project_name`** (string, optional).
---
## Callback node
- **Label**: `Callback`
- **Purpose**: Callback contract declared by a behaviour (Erlang OTP).
- **Properties**:
  - **`name`** (string, required): e.g. `"handle_call"`.
  - **`fqn`** (string, required): e.g. `"gen_server:handle_call/3"`.
  - **`arity`** (integer, required).
  - **`optional`** (boolean): Whether the callback is optional for the behaviour.
  - **`language`** (string, optional).
  - **`project_name`** (string, optional).
---
## ApiEndpoint node
- **Label**: `ApiEndpoint`
- **Purpose**: HTTP/RPC endpoint exposed by the system.
- **Properties**:
  - **`methods`** (list of string, required): e.g. `["GET"]`, `["GET","POST"]`.
  - **`path`** (string, required): Canonical path template.
  - **`norm_path`** (string, optional): Normalized path for cross-service matching.
  - **`protocol`** (string, optional): e.g. `"http"`, `"https"`.
  - **`framework`** (string, optional): e.g. `"cowboy"`, `"spring"`, `"gin"`.
  - **`project_name`** (string, optional).

Connected to handlers via `(:ApiEndpoint)-[:HANDLED_BY]->(:Function)`.

---
## ExternalApi node
- **Label**: `ExternalApi`
- **Purpose**: Remote API or service called from internal code.
- **Properties**:
  - **`name`** (string, optional): Logical name; often derived from host.
  - **`base_url`** (string, optional): e.g. `"https://api.vendorx.com"`.
  - **`path`** (string, optional): URL path component.
  - **`norm_path`** (string, optional): Normalized path for `SAME_API` matching.
  - **`protocol`** (string, optional).
  - **`provider`** (string, optional).
  - **`service`** (string, optional).

**MERGE key in batched writes:** `{ base_url, norm_path }`.

Connected from functions via `(:Function)-[:CALLS_EXTERNAL_API]->(:ExternalApi)`.

---
## Relationship types
Enumerated in `src/edge.rs` as `RelType`. Note that `Function→Class` and
`Class→Class` edges share the relationship name **`USES_CLASS`**; distinguish
them by node labels in Cypher.

### Structural

| Relationship | From | To | Meaning |
|--------------|------|----|---------|
| `DECLARES_MODULE` | File | Module | File declares module (Erlang) |
| `DECLARES_CLASS` | File | Class | File declares type |
| `DECLARES_FUNCTION` | File | Function | File declares top-level function |
| `DECLARES_FUNCTION` | Class | Function | Class declares method |
| `DECLARES_FUNCTION` | Module | Function | Module groups functions (Erlang) |
| `DECLARES_PROPERTY` | Class | Property | Class declares C# property |

### Dependencies and call graph

| Relationship | From | To | Meaning |
|--------------|------|----|---------|
| `DEPENDS_ON_FILE` | File | File | Import/include/using dependency |
| `DEPENDS_ON_FILE` | Module | File | Erlang module depends on another module file |
| `CALLS_FUNCTION` | Function | Function | Call graph edge |
| `USES_CLASS` | Function | Class | Function references a type |
| `USES_CLASS` | Class | Class | Inheritance / interface (Java, C#) or Go embedding |

### API and external systems

| Relationship | From | To | Meaning |
|--------------|------|----|---------|
| `HANDLED_BY` | ApiEndpoint | Function | Endpoint handled by function |
| `CALLS_EXTERNAL_API` | Function | ExternalApi | Function calls external service |
| `SAME_API` | ApiEndpoint | ExternalApi | Internal endpoint matches external call (post-process, by `norm_path`) |

### Erlang OTP (behaviour contracts)

| Relationship | From | To | Meaning |
|--------------|------|----|---------|
| `IMPLEMENTS_BEHAVIOUR` | Module | Behaviour | Module implements OTP behaviour |
| `DECLARES_BEHAVIOUR` | File | Behaviour | File declares custom behaviour |
| `DECLARES_CALLBACK` | Behaviour | Callback | Behaviour declares callback contract |
| `IMPLEMENTS_CALLBACK` | Function | Callback | Function implements callback |
| `EXTENDS_BEHAVIOUR` | Behaviour | Behaviour | Behaviour extends another |
| `OVERRIDES_CALLBACK` | Function | Callback | Function explicitly overrides callback |
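
Because one relationship name can connect several label pairs, code that writes or validates edges has to check the `(from, to)` pair, not just the type name. A sketch with a hypothetical `Rel` enum covering a subset of the rows above; it is an illustration, not the actual `RelType` definition in `src/edge.rs`:

```rust
/// Hypothetical mirror of a few relationship types from the tables above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rel {
    DeclaresFunction,
    DependsOnFile,
    UsesClass,
    SameApi,
}

impl Rel {
    /// Relationship name as written to Neo4j.
    fn as_str(self) -> &'static str {
        match self {
            Rel::DeclaresFunction => "DECLARES_FUNCTION",
            Rel::DependsOnFile => "DEPENDS_ON_FILE",
            Rel::UsesClass => "USES_CLASS",
            Rel::SameApi => "SAME_API",
        }
    }

    /// (from, to) label pairs allowed for this type, per the tables above.
    fn endpoints(self) -> &'static [(&'static str, &'static str)] {
        match self {
            Rel::DeclaresFunction => &[("File", "Function"), ("Class", "Function"), ("Module", "Function")],
            Rel::DependsOnFile => &[("File", "File"), ("Module", "File")],
            Rel::UsesClass => &[("Function", "Class"), ("Class", "Class")],
            Rel::SameApi => &[("ApiEndpoint", "ExternalApi")],
        }
    }

    /// True if an edge of this type may run from `from` to `to`.
    fn allows(self, from: &str, to: &str) -> bool {
        self.endpoints().iter().any(|&(f, t)| f == from && t == to)
    }
}
```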

---
## Post-processing: SAME_API
After all files are persisted, one global Cypher pass runs:
```cypher
MATCH (ep:ApiEndpoint)
MATCH (ext:ExternalApi)
WHERE ep.norm_path IS NOT NULL
AND ext.norm_path IS NOT NULL
AND ep.norm_path = ext.norm_path
MERGE (ep)-[:SAME_API]->(ext)
```
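
The pass above is an equality join on `norm_path`. The same logic done in memory as a hash join looks roughly like this. The normalization rule used here (lowercasing, trimming a trailing `/`, and collapsing `{id}` or `:id` segments to `{}`) is an assumption for illustration, not the rule the parser applies:

```rust
use std::collections::HashMap;

/// Hypothetical normalization: lowercase, drop a trailing '/', and replace
/// path parameters written as `{name}` or `:name` with a `{}` placeholder.
fn norm_path(path: &str) -> String {
    let segments: Vec<String> = path
        .trim_end_matches('/')
        .split('/')
        .map(|seg| {
            if (seg.starts_with('{') && seg.ends_with('}')) || seg.starts_with(':') {
                "{}".to_string()
            } else {
                seg.to_lowercase()
            }
        })
        .collect();
    segments.join("/")
}

/// Pairs every endpoint with every external call that shares its normalized path,
/// mirroring `WHERE ep.norm_path = ext.norm_path MERGE (ep)-[:SAME_API]->(ext)`.
fn same_api_pairs<'a>(endpoints: &[&'a str], externals: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    // Build side: index external paths by normalized form.
    let mut by_norm: HashMap<String, Vec<&'a str>> = HashMap::new();
    for &ext in externals {
        by_norm.entry(norm_path(ext)).or_default().push(ext);
    }
    // Probe side: look up each endpoint's normalized path.
    let mut pairs = Vec::new();
    for &ep in endpoints {
        if let Some(matches) = by_norm.get(&norm_path(ep)) {
            for &ext in matches {
                pairs.push((ep, ext));
            }
        }
    }
    pairs
}
```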
---
## Incremental graph updates
On incremental parse (`cleanup_incremental_targets_in_neo4j`), nodes with
matching `path` are deleted before re-upsert:
- `File`, `Module`, `Class`, `Function` where `path` is in `cleanup_targets`
(deleted, modified, or renamed-old paths from git delta).

`Property` nodes are not matched by `path` in this pass, so a property tied to a
deleted class may remain until the class path is cleaned; C# properties are
linked to their class via `DECLARES_PROPERTY`.
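
The selection of `cleanup_targets` (deleted, modified, and renamed-old paths) can be sketched as below, assuming a hypothetical `Change` enum for git delta entries. Added files are presumably skipped because no nodes exist under their path yet:

```rust
use std::collections::BTreeSet;

/// Hypothetical representation of one entry in a git delta.
enum Change {
    Added(String),
    Modified(String),
    Deleted(String),
    Renamed { old: String, new: String },
}

/// Paths whose File/Module/Class/Function nodes are deleted before re-upsert:
/// deleted and modified paths, plus the *old* side of each rename.
fn cleanup_targets(delta: &[Change]) -> BTreeSet<String> {
    let mut targets = BTreeSet::new();
    for change in delta {
        match change {
            Change::Modified(p) | Change::Deleted(p) => {
                targets.insert(p.clone());
            }
            Change::Renamed { old, .. } => {
                targets.insert(old.clone());
            }
            // New files have no existing nodes to remove.
            Change::Added(_) => {}
        }
    }
    targets
}
```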

---
## code_bytes compression
When RedCompressor is enabled (`--compress-codeblocks` or `--push-to-neo4j`),
symbol snippets are compressed via `POST /v1/compress` and stored as
`code_bytes` on:
- `Module`, `Class`, `Property`, `Function` (language-dependent coverage)

Default compressor URL: `http://10.166.1.220:8787` (override via
`REDCOMPRESSOR_URL` or `--compressor-url`).
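
The precedence between the flag and the environment variable is not spelled out here. A sketch assuming the common convention (command-line flag first, then `REDCOMPRESSOR_URL`, then the default), using a hypothetical `resolve_compressor_url` helper that takes both values as arguments so it is easy to test; a binary would call it as `resolve_compressor_url(flag, std::env::var("REDCOMPRESSOR_URL").ok().as_deref())`:

```rust
/// Default endpoint documented above.
const DEFAULT_COMPRESSOR_URL: &str = "http://10.166.1.220:8787";

/// Assumed resolution order: `--compressor-url`, then `REDCOMPRESSOR_URL`, then the default.
/// Empty values are ignored; a trailing '/' is trimmed so the endpoint path joins cleanly.
fn resolve_compressor_url(flag: Option<&str>, env: Option<&str>) -> String {
    flag.filter(|s| !s.is_empty())
        .or(env.filter(|s| !s.is_empty()))
        .unwrap_or(DEFAULT_COMPRESSOR_URL)
        .trim_end_matches('/')
        .to_string()
}

/// Full URL of the compress endpoint for a resolved base URL.
fn compress_endpoint(base: &str) -> String {
    format!("{}/v1/compress", base)
}
```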

---
## Language coverage matrix
| Capability | Java | C# | Go | Erlang | Python | JS/TS | Rust |
|------------|------|----|----|--------|--------|-------|------|
| File node | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Class / struct | Yes | Yes | Yes | — | — | — | — |
| Property | — | Yes | — | — | — | — | — |
| Function (methods) | Yes | Yes | Yes | Yes | Top-level | Top-level | Top-level |
| Module | — | — | — | Yes | — | — | — |
| Behaviour / Callback | — | — | — | Yes | — | — | — |
| ApiEndpoint | Yes | Yes | Yes | Yes | — | — | — |
| ExternalApi | Yes | Yes | Yes | Yes | — | — | — |
| DEPENDS_ON_FILE | Partial\* | Yes\*\* | Yes\*\*\* | Yes | Yes | Yes | — |
| CALLS_FUNCTION | Partial | Partial | Partial | Yes\*\*\*\* | Intra-file | Intra-file | — |
| USES_CLASS (Function) | Yes | Yes | Yes | — | — | — | — |
| USES_CLASS (Class) | Yes | Yes | Embed | — | — | — | — |
| code_bytes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |

\* Java: imports filtered to `com.redbus.genai.*` only; target must be in scanned file set.

\*\* C#: from `using` namespaces (excludes `System.*` / `Microsoft.*`); resolved via batch namespace index.

\*\*\* Go: from `import` statements resolved via `go.mod` module path and `replace` directives.

\*\*\*\* Erlang: intra-module AST call sites; not cross-module unless resolved by FQN.
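
The Java and C# filters in these footnotes amount to prefix checks on the imported name. A sketch under that reading, with hypothetical function names; resolving a kept import to a scanned file is a separate step not shown here:

```rust
/// Java: keep only imports under the `com.redbus.genai` package tree.
fn keep_java_import(import: &str) -> bool {
    import.starts_with("com.redbus.genai.")
}

/// C#: drop framework namespaces (`System`, `Microsoft`, and their children).
fn keep_csharp_using(namespace: &str) -> bool {
    !["System", "Microsoft"]
        .iter()
        .any(|root| namespace == *root || namespace.starts_with(&format!("{}.", root)))
}
```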

---
## Erlang / Cowboy walker
Extraction uses the Tree-sitter Erlang AST (not line-based regex).
- **Module**: from `-module(Name).` AST or file stem fallback.
- **Function**: from top-level `function_clause` nodes; FQN `module:name/arity`.
- **CALLS_FUNCTION**: from AST call sites → enclosing caller + callee arity
(no module-wide N×M over-approximation).
- **DEPENDS_ON_FILE**: from remote module calls; also
`(:Module)-[:DEPENDS_ON_FILE]->(:File)` for behaviour/module deps.
- **OTP**: `-behaviour`, callback declarations, `IMPLEMENTS_BEHAVIOUR`,
`DECLARES_CALLBACK`, `IMPLEMENTS_CALLBACK`, `EXTENDS_BEHAVIOUR`,
`DECLARES_BEHAVIOUR`, `OVERRIDES_CALLBACK`.
- **ApiEndpoint**: Cowboy route tuples `{"/path", handler_module, ...}`.
- **ExternalApi**: HTTP(S) URL literals; **CALLS_EXTERNAL_API** links all
functions in the module to each external URL found (conservative over-approximation).
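
Attributing each call site to its enclosing function is what avoids module-wide N×M edges. A sketch using byte-offset spans; `FnSpan` and `CallSite` are hypothetical stand-ins for what the Tree-sitter walk produces, and only local (same-module) calls are handled:

```rust
/// A top-level function clause with its byte span in the source file.
struct FnSpan {
    name: String,
    arity: usize,
    start: usize,
    end: usize, // exclusive
}

/// A local call `callee(Args...)` found at a byte offset.
struct CallSite {
    callee: String,
    arg_count: usize,
    offset: usize,
}

/// Emits (caller_fqn, callee_fqn) pairs: each call is attributed only to the function
/// whose span contains it, and the callee arity comes from the call's argument count.
fn call_edges(module: &str, fns: &[FnSpan], calls: &[CallSite]) -> Vec<(String, String)> {
    let mut edges = Vec::new();
    for call in calls {
        if let Some(caller) = fns.iter().find(|f| f.start <= call.offset && call.offset < f.end) {
            edges.push((
                format!("{}:{}/{}", module, caller.name, caller.arity),
                format!("{}:{}/{}", module, call.callee, call.arg_count),
            ));
        }
    }
    edges
}
```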
---
## Java / Spring walker
- **Class / Function**: AST `class_declaration`, `interface_declaration`,
`method_declaration`; FQN includes package.
- **ApiEndpoint**: Spring annotations (`@GetMapping`, `@PostMapping`,
`@RequestMapping`, etc.); class-level base path supported.
- **DEPENDS_ON_FILE**: `import com.redbus.genai.*` only (configurable filter in code).
- **CALLS_FUNCTION**: method invocations with same-class / import-aware resolution.
- **USES_CLASS**: type references on functions; **Class→Class** for inheritance
and `@Autowired`-style injected dependencies.
- **ExternalApi**: HTTP URL literals; linked from functions containing the URL.
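
Combining a class-level base path with a method-level mapping is mostly slash handling. A sketch with a hypothetical `join_spring_paths` helper; it is not Spring's own resolution logic, which also handles multiple values, patterns, and property placeholders:

```rust
/// Joins a class-level `@RequestMapping` value with a method-level mapping value,
/// producing a leading '/' and exactly one '/' between the two parts.
fn join_spring_paths(class_base: Option<&str>, method_path: &str) -> String {
    let base = class_base.unwrap_or("").trim_matches('/');
    let method = method_path.trim_matches('/');
    match (base.is_empty(), method.is_empty()) {
        (true, true) => "/".to_string(),
        (true, false) => format!("/{}", method),
        (false, true) => format!("/{}", base),
        (false, false) => format!("/{}/{}", base, method),
    }
}
```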
---
## C# / ASP.NET walker
- **Namespace**: AST `namespace_declaration` / `file_scoped_namespace_declaration`.
- **Class / Property / Function**: batched MERGE; nested types supported;
enums as `Class { kind: "enum" }`.
- **DEPENDS_ON_FILE**: from `using` namespace imports (non-system).
- **CALLS_FUNCTION**: invocations in methods, constructors, property accessors.
- **USES_CLASS**: type resolution via usings + C# batch index; inheritance via
`base_list` (`Class→Class USES_CLASS`).
- **ApiEndpoint**: `[HttpGet]`, `[HttpPost]`, `[Route]`, etc.
- **ExternalApi**: HTTP URLs; **CALLS_EXTERNAL_API** scoped to functions whose
body spans contain the URL.
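
Span scoping means a function gets a `CALLS_EXTERNAL_API` edge only when a URL literal's position falls inside that function's body. A sketch with hypothetical `BodySpan` records and `(offset, url)` literal pairs using byte offsets; in this sketch, nested bodies (e.g. local functions) whose spans both contain the literal would each get an edge:

```rust
use std::collections::BTreeSet;

/// Body span of a C# method, constructor, or accessor (byte offsets, end exclusive).
struct BodySpan<'a> {
    fqn: &'a str,
    start: usize,
    end: usize,
}

/// Returns (function_fqn, url) pairs where the URL literal sits inside the body span.
/// Unlike the Erlang walker, functions that do not contain the URL get no edge.
fn external_api_edges<'a>(
    bodies: &[BodySpan<'a>],
    urls: &[(usize, &'a str)], // (offset of the literal, URL text)
) -> BTreeSet<(&'a str, &'a str)> {
    let mut edges = BTreeSet::new();
    for &(offset, url) in urls {
        for body in bodies {
            if body.start <= offset && offset < body.end {
                edges.insert((body.fqn, url));
            }
        }
    }
    edges
}
```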
---
## Go walker
- **Class** (struct): `type_declaration` / struct; FQN `package.StructName`.
- **Function**: top-level and methods with receiver in FQN.
- **DEPENDS_ON_FILE**: imports resolved via `go.mod` + known scanned paths.
- **CALLS_FUNCTION**: function/method calls; goroutine-spawned calls tracked separately.
- **USES_CLASS**: struct type usage; embedded structs via `Class→Class USES_CLASS`.
- **ApiEndpoint**: `http.HandleFunc`, Chi, Gin, Echo patterns.
- **ExternalApi**: HTTP URL literals.
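
Mapping an import path to repository files via `go.mod` is essentially a prefix strip: an import under module path `M` lives in the directory named by the rest of the path. A sketch assuming a single `go.mod` at the repo root and ignoring `replace` directives (which the walker also handles, per the matrix footnote); the helper names are hypothetical:

```rust
/// Maps a Go import path to a repo-relative package directory, if it belongs to
/// this module. `module_path` is the `module` line from go.mod.
fn import_to_dir(module_path: &str, import: &str) -> Option<String> {
    if import == module_path {
        return Some(String::new()); // the module's root package
    }
    import
        .strip_prefix(module_path)?
        .strip_prefix('/')
        .map(str::to_string)
}

/// Scanned .go files that live directly in the imported package directory
/// (Go packages are per-directory, so subdirectories are excluded).
fn files_for_import<'a>(module_path: &str, import: &str, scanned: &[&'a str]) -> Vec<&'a str> {
    let Some(dir) = import_to_dir(module_path, import) else { return Vec::new() };
    scanned
        .iter()
        .copied()
        .filter(|p| {
            let parent = p.rsplit_once('/').map_or("", |(d, _)| d);
            parent == dir && p.ends_with(".go")
        })
        .collect()
}
```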
---
## Python / JavaScript / TypeScript / Rust walker
Handled by `persist_non_java_functions`:
- **Function**: top-level functions only (Python skips functions inside classes).
- **DEPENDS_ON_FILE**: Python imports and JS/TS import specifiers resolved to
known scanned files.
- **CALLS_FUNCTION**: intra-file calls (Python, JS/TS).
- **code_bytes**: supported when compression enabled.
- No `Class`, `ApiEndpoint`, or `ExternalApi` extraction for these languages today.
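
Resolving a relative JS/TS import specifier to a scanned file usually means joining it with the importer's directory and probing a few extensions. A sketch under that assumption; the extension list and `index` fallback follow common Node/TypeScript conventions and are not taken from `persist_non_java_functions`:

```rust
use std::collections::HashSet;

/// Joins `spec` (e.g. "./util" or "../lib/api") onto the importer's directory,
/// normalizing "." and ".." segments. Returns None for bare package specifiers
/// or for paths that climb above the repo root.
fn join_relative(importer: &str, spec: &str) -> Option<String> {
    if !(spec.starts_with("./") || spec.starts_with("../")) {
        return None; // e.g. "react": not a repo file
    }
    let mut parts: Vec<&str> = importer.split('/').collect();
    parts.pop(); // drop the importer's file name
    for seg in spec.split('/') {
        match seg {
            "." | "" => {}
            ".." => {
                parts.pop()?;
            }
            s => parts.push(s),
        }
    }
    Some(parts.join("/"))
}

/// Probes candidate files for a relative specifier against the scanned set.
fn resolve_import(importer: &str, spec: &str, scanned: &HashSet<&str>) -> Option<String> {
    let base = join_relative(importer, spec)?;
    let candidates = [
        base.clone(),
        format!("{}.ts", base),
        format!("{}.tsx", base),
        format!("{}.js", base),
        format!("{}/index.ts", base),
        format!("{}/index.js", base),
    ];
    candidates.into_iter().find(|c| scanned.contains(c.as_str()))
}
```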
---
## Test file detection
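Files are flagged `is_test: true` based on:
- **Directories**: `/test/`, `/tests/`, `/__tests__/`, `/spec/`, `/src/test/`, `/t/`.
- **Filenames**: `test_*.py`, `*_test.go`, `*Test.java`, `*.spec.ts`, etc.

Example: functions, including methods, defined in non-test files. Because
`is_test` is optional, a missing value is treated as `false`:

```cypher
MATCH (f:File)
WHERE coalesce(f.is_test, false) = false
MATCH (fn:Function {path: f.path})
RETURN fn.fqn, fn.path
```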
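
The directory and filename rules above can be expressed as a single predicate. A sketch covering only the patterns listed (the list ends with "etc.", so the real heuristic may match more); `is_test_path` is a hypothetical name:

```rust
/// Returns true if a repo-relative path looks like a test file under the
/// directory and filename rules listed above.
fn is_test_path(path: &str) -> bool {
    // Prefix with '/' so a top-level "tests/foo.py" matches "/tests/" too.
    let p = format!("/{}", path.trim_start_matches('/'));
    const DIRS: [&str; 6] = ["/test/", "/tests/", "/__tests__/", "/spec/", "/src/test/", "/t/"];
    if DIRS.iter().any(|d| p.contains(d)) {
        return true;
    }
    let file = p.rsplit('/').next().unwrap_or("");
    (file.starts_with("test_") && file.ends_with(".py"))
        || file.ends_with("_test.go")
        || file.ends_with("Test.java")
        || file.ends_with(".spec.ts")
}
```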
---
## Example impact queries
**Functions calling a target method:**
```cypher
MATCH (caller:Function)-[:CALLS_FUNCTION]->(target:Function {name: "setAmenities"})
WHERE target.fqn CONTAINS "OrderDetail"
RETURN caller.fqn, caller.path
```
**Files depending on a changed file:**
```cypher
MATCH (f:File)-[:DEPENDS_ON_FILE]->(dep:File)
WHERE dep.path CONTAINS "OrderDetail.java"
RETURN f.path
```
**API endpoint to handler:**
```cypher
MATCH (ep:ApiEndpoint)-[:HANDLED_BY]->(fn:Function)
RETURN ep.path, ep.methods, fn.fqn
```
**Cross-service API link:**
```cypher
MATCH (ep:ApiEndpoint)-[:SAME_API]->(ext:ExternalApi)
RETURN ep.path, ext.base_url, ext.norm_path
```
**Erlang module implementing a behaviour:**
```cypher
MATCH (m:Module)-[:IMPLEMENTS_BEHAVIOUR]->(b:Behaviour {name: "gen_server"})
RETURN m.name, m.path
```
---
## Persistence notes
- Relationship writes are batched (`BATCH_FLUSH_THRESHOLD = 3000`).
- C# class/property/function nodes use batched MERGE (`CSHARP_NODE_BATCH_FLUSH_THRESHOLD = 500`).
- Erlang module writes run concurrently (capped in-flight tasks).
- Bootstrap mode supports `--clean` (`MATCH (n) DETACH DELETE n`).
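
`BATCH_FLUSH_THRESHOLD` suggests the usual pattern of buffering rows and writing them once the buffer fills, plus a final flush for the remainder. A minimal sketch of that pattern with a hypothetical generic `RelBuffer`; how a flushed batch becomes a Cypher write (for example an `UNWIND` over the rows) is not shown and is not taken from the code:

```rust
/// Mirrors the documented relationship-batch threshold.
const BATCH_FLUSH_THRESHOLD: usize = 3000;

/// Hypothetical write buffer: collects rows and hands back a full batch
/// whenever the threshold is reached.
struct RelBuffer<T> {
    rows: Vec<T>,
    threshold: usize,
}

impl<T> RelBuffer<T> {
    fn new(threshold: usize) -> Self {
        Self { rows: Vec::with_capacity(threshold), threshold }
    }

    /// Adds a row; returns Some(batch) when the buffer is full, leaving it empty.
    fn push(&mut self, row: T) -> Option<Vec<T>> {
        self.rows.push(row);
        if self.rows.len() >= self.threshold {
            Some(std::mem::take(&mut self.rows))
        } else {
            None
        }
    }

    /// Returns whatever is left at the end of a run (may be empty).
    fn finish(self) -> Vec<T> {
        self.rows
    }
}
```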