tokensave 4.0.6

Code intelligence tool that builds a semantic knowledge graph from Rust, Go, Java, Scala, TypeScript, Python, C, C++, Kotlin, C#, Swift, and many more codebases
# CodeGraph Language Verification Guide

You are verifying that CodeGraph fully supports a specific programming language. The user will give you a path to a real-world, popular open-source codebase cloned locally. Your job is to run a battery of realistic prompts against it using CodeGraph's API and verify the results are good enough to say that language is **covered and supported**.

A language is NOT verified until an LLM can reliably use CodeGraph's MCP tools to navigate that codebase — finding the right symbols, understanding call chains, exploring subsystems, and getting useful context for real tasks.

## Setup

### 1. Build and index

```bash
npm run build
rm -rf <codebase_path>/.codegraph
node dist/bin/codegraph.js init -iv <codebase_path>
```

The `-iv` flag gives verbose output showing extraction progress, node/edge counts, and timing.

### 2. Quick sanity check

```bash
# Verify nodes were extracted with proper qualified names
sqlite3 <codebase_path>/.codegraph/codegraph.db \
  "SELECT name, kind, qualified_name FROM nodes WHERE kind = 'method' LIMIT 10;"

# GOOD: file.go::StructName::method_name  (owner type present)
# BAD:  file.go::file.go::method_name     (owner type missing — needs getReceiverType)

# Check edge counts
sqlite3 <codebase_path>/.codegraph/codegraph.db \
  "SELECT kind, COUNT(*) FROM edges GROUP BY kind ORDER BY COUNT(*) DESC;"

# Check node kind distribution
sqlite3 <codebase_path>/.codegraph/codegraph.db \
  "SELECT kind, COUNT(*) FROM nodes GROUP BY kind ORDER BY COUNT(*) DESC;"
```

If methods are missing their owner type in `qualified_name`, fix that first (see [Adding getReceiverType](#adding-getreceivertype)) before proceeding with the full test battery.
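The GOOD/BAD distinction in the sanity check can also be tested mechanically. The following is a minimal sketch, assuming qualified names follow the `file::Owner::method` layout shown in the comments above; `hasOwnerType` is a hypothetical helper for illustration, not part of CodeGraph.

```typescript
// Hypothetical helper, not part of CodeGraph. Decides whether a method's
// qualified_name carries an owner type, assuming the `file::Owner::method`
// layout shown in the sanity check comments.
function hasOwnerType(qualifiedName: string): boolean {
  const parts = qualifiedName.split("::");
  // Expect at least three segments: file, owner, method.
  if (parts.length < 3) return false;
  const file = parts[0];
  const owner = parts[parts.length - 2];
  // In the BAD shape the file name is repeated where the owner should be.
  return owner !== undefined && owner.length > 0 && owner !== file;
}

const qualifiedNameSamples = [
  "file.go::StructName::method_name",
  "file.go::file.go::method_name",
];
for (const q of qualifiedNameSamples) {
  console.log(`${hasOwnerType(q) ? "GOOD" : "BAD "}  ${q}`);
}
```

Feeding it the `qualified_name` column from the first query gives a quick count of methods that still need an owner type.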

## The Test Battery

Run **all** of the following test categories against the codebase. Use the Node.js API directly — the test scripts below are templates. Adapt the queries to match real types, methods, and subsystems in the codebase you're testing.

**Pass criteria for each test:** Does the result give an LLM enough correct information to answer the question or complete the task? Would you trust these results if you were the LLM?

---

### Test 1: `codegraph_explore` — Deep Exploration (MOST IMPORTANT)

This is the primary tool LLMs use. It must return relevant source code grouped by file, with correct relationships, for a natural language query. Test it with **at least 5 different query types**:

```bash
node -e "
const { CodeGraph } = require('./dist/index.js');
async function test() {
  const cg = await CodeGraph.open('<codebase_path>');

  const queries = [
    // A. Subsystem exploration — broad topic, should find the right files and key classes
    'How does the caching system work?',

    // B. Specific class/type deep dive — should return that class, its methods, and related types
    'CacheBuilder configuration and build process',

    // C. Cross-cutting concern — should find implementations across multiple files
    'How are errors handled and propagated?',

    // D. Data flow question — should trace through multiple layers
    'How does data flow from input to storage?',

    // E. Implementation detail — specific method behavior
    'How does eviction decide which entries to remove?',
  ];

  for (const query of queries) {
    console.log(\`\n========================================\`);
    console.log(\`QUERY: \${query}\`);
    console.log(\`========================================\`);

    const subgraph = await cg.findRelevantContext(query, {
      searchLimit: 8, traversalDepth: 3, maxNodes: 80, minScore: 0.2,
    });

    // Show entry points — these are what the LLM sees first
    console.log(\`\nEntry points (\${subgraph.roots.length}):\`);
    for (const rootId of subgraph.roots.slice(0, 8)) {
      const node = subgraph.nodes.get(rootId);
      if (node) console.log(\`  \${node.name} (\${node.kind}) — \${node.filePath}:\${node.startLine}\`);
    }

    // Show file distribution — are the right files surfacing?
    const fileGroups = new Map();
    for (const node of subgraph.nodes.values()) {
      if (!fileGroups.has(node.filePath)) fileGroups.set(node.filePath, []);
      fileGroups.get(node.filePath).push(node.name);
    }
    console.log(\`\nFiles (\${fileGroups.size}):\`);
    for (const [file, nodes] of [...fileGroups.entries()].sort((a,b) => b[1].length - a[1].length).slice(0, 8)) {
      console.log(\`  \${file} (\${nodes.length} symbols): \${nodes.slice(0, 6).join(', ')}\`);
    }

    // Show edge distribution — are relationships being captured?
    const edgeKinds = new Map();
    for (const edge of subgraph.edges) {
      edgeKinds.set(edge.kind, (edgeKinds.get(edge.kind) || 0) + 1);
    }
    console.log(\`\nEdges (\${subgraph.edges.length}):\`);
    for (const [kind, count] of [...edgeKinds.entries()].sort((a,b) => b[1] - a[1])) {
      console.log(\`  \${kind}: \${count}\`);
    }

    console.log(\`\nTotal: \${subgraph.nodes.size} nodes, \${subgraph.edges.length} edges, \${fileGroups.size} files\`);
  }

  await cg.close();
}
test().catch(console.error);
"
```

**What to check for each query:**
- Do the entry points make sense for the question?
- Are the right files surfacing (not just test files or unrelated code)?
- Is there a mix of edge types (calls, contains, extends, implements) — not just `contains`?
- Does the node count feel right? Too few (<5) means search failed. Too many irrelevant ones means noise.
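Two of these checks are numeric and can be automated as a first pass. The sketch below assumes you copy the node total and per-kind edge counts from the script output into a plain object; `ExploreSummary` and `reviewExploreSummary` are hypothetical names, and the relevance of entry points and files still needs a human read.

```typescript
// Hypothetical summary shape: plain counts copied from the Test 1 output.
interface ExploreSummary {
  nodeCount: number;
  edgeCounts: Record<string, number>; // edge kind -> count
}

// Returns human-readable warnings; an empty list means nothing looked off
// for the two numeric checks covered here.
function reviewExploreSummary(summary: ExploreSummary): string[] {
  const warnings: string[] = [];
  // Threshold taken from the checklist: fewer than 5 nodes means search failed.
  if (summary.nodeCount < 5) {
    warnings.push("fewer than 5 nodes: search likely failed");
  }
  const kinds = Object.keys(summary.edgeCounts).filter(
    (kind) => (summary.edgeCounts[kind] ?? 0) > 0,
  );
  if (kinds.length === 0) {
    warnings.push("no edges at all");
  } else if (kinds.length === 1 && kinds[0] === "contains") {
    warnings.push("only `contains` edges: relationships are not being captured");
  }
  return warnings;
}

console.log(reviewExploreSummary({ nodeCount: 3, edgeCounts: { contains: 2 } }));
```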

---

### Test 2: `codegraph_search` — Symbol Lookup

Test that searching for specific symbols returns the right results ranked correctly.

```bash
node -e "
const { CodeGraph } = require('./dist/index.js');
async function test() {
  const cg = await CodeGraph.open('<codebase_path>');

  const searches = [
    // A. Class by name
    { query: 'CacheBuilder', kinds: ['class'], desc: 'Find a specific class' },

    // B. Method on a specific type (the classic disambiguation test)
    { query: 'CacheBuilder build', kinds: ['method'], desc: 'Method on specific class' },

    // C. Common method name — should still find relevant ones
    { query: 'get', kinds: ['method'], desc: 'Common method name' },

    // D. Interface/trait
    { query: 'Cache', kinds: ['interface'], desc: 'Find an interface' },

    // E. Enum
    { query: 'Strength', kinds: ['enum'], desc: 'Find an enum' },
  ];

  for (const s of searches) {
    console.log(\`\n--- \${s.desc}: \"\${s.query}\" (kinds: \${s.kinds}) ---\`);
    const results = cg.searchNodes(s.query, { limit: 10, kinds: s.kinds });
    for (const r of results) {
      console.log(\`  \${r.score.toFixed(1)} | \${r.node.name} (\${r.node.kind}) | \${r.node.qualifiedName}\`);
    }
    if (results.length === 0) console.log('  *** NO RESULTS ***');
  }

  await cg.close();
}
test().catch(console.error);
"
```

**What to check:**
- Does the target symbol rank in the top 3?
- For common names like `get`, do the results include qualified names that help disambiguate?
- Are there zero-result queries? That's a bug.
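The "top 3" criterion is easy to check in code once you know which symbol you expect. This is a minimal sketch over plain `{ name, score }` pairs; `ScoredHit`, `rankOfTarget`, and `passesTopThree` are hypothetical names, and the sample scores are made up for illustration.

```typescript
// Hypothetical shape: one entry per search result, reduced to name and score.
interface ScoredHit {
  name: string;
  score: number;
}

// Returns the 1-based rank of `target` when hits are ordered by descending
// score, or -1 when the target is absent.
function rankOfTarget(hits: ScoredHit[], target: string): number {
  const names = hits
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.score - a.score)
    .map((hit) => hit.name);
  const index = names.indexOf(target);
  return index === -1 ? -1 : index + 1;
}

function passesTopThree(hits: ScoredHit[], target: string): boolean {
  const rank = rankOfTarget(hits, target);
  return rank >= 1 && rank <= 3;
}

// Made-up scores standing in for real search output.
const sampleHits: ScoredHit[] = [
  { name: "CacheLoader", score: 4.2 },
  { name: "CacheBuilder", score: 9.8 },
  { name: "CacheStats", score: 3.1 },
];
console.log(rankOfTarget(sampleHits, "CacheBuilder"), passesTopThree(sampleHits, "CacheBuilder"));
```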

---

### Test 3: `codegraph_callers` / `codegraph_callees` — Call Chain Tracing

Test that call relationships were extracted correctly.

```bash
node -e "
const { CodeGraph } = require('./dist/index.js');
async function test() {
  const cg = await CodeGraph.open('<codebase_path>');

  // Pick 3-4 important methods and check their call graphs
  const symbols = ['build', 'get', 'put', 'invalidate'];

  for (const sym of symbols) {
    // Find the symbol
    const results = cg.searchNodes(sym, { limit: 5, kinds: ['method'] });
    if (results.length === 0) { console.log(\`\${sym}: not found\`); continue; }

    const node = results[0].node;
    console.log(\`\n--- \${node.name} (\${node.qualifiedName}) ---\`);

    // Check callees (what does it call?)
    const callees = cg.getCallees(node.id);
    console.log(\`  Callees (\${callees.length}): \${callees.slice(0, 10).map(c => c.node.name).join(', ')}\`);

    // Check callers (what calls it?)
    const callers = cg.getCallers(node.id);
    console.log(\`  Callers (\${callers.length}): \${callers.slice(0, 10).map(c => c.node.name).join(', ')}\`);
  }

  await cg.close();
}
test().catch(console.error);
"
```

**What to check:**
- Do methods have callers AND callees? If a method has 0 of both, edge extraction may be broken.
- Do the callers/callees make sense? A `build()` method should call constructor-like things, and be called by setup/initialization code.
- Are the counts reasonable? A core method in a popular codebase should have multiple callers.
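The first check can be turned into a small classifier over the caller and callee counts printed by the script. This is a sketch with hypothetical names (`CallHealth`, `classifyCallCounts`); only the "isolated" case is the red flag described above, since entry points may legitimately have no callers and leaf methods no callees.

```typescript
type CallHealth = "ok" | "no-callers" | "no-callees" | "isolated";

// Classifies a method from its caller and callee counts.
// "isolated" (zero of both) is the case that suggests broken edge extraction.
function classifyCallCounts(callers: number, callees: number): CallHealth {
  if (callers === 0 && callees === 0) return "isolated";
  if (callers === 0) return "no-callers";
  if (callees === 0) return "no-callees";
  return "ok";
}

console.log(classifyCallCounts(0, 0));
console.log(classifyCallCounts(12, 4));
```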

---

### Test 4: `codegraph_impact` — Change Impact Analysis

Test that the impact radius correctly identifies affected code.

```bash
node -e "
const { CodeGraph } = require('./dist/index.js');
async function test() {
  const cg = await CodeGraph.open('<codebase_path>');

  // Pick a core class or interface that many things depend on
  const results = cg.searchNodes('<CoreClass>', { limit: 1, kinds: ['class', 'interface'] });
  if (results.length === 0) { console.log('Not found'); await cg.close(); return; }

  const node = results[0].node;
  console.log(\`Impact analysis for: \${node.name} (\${node.kind}) — \${node.filePath}\`);

  const impact = cg.getImpactRadius(node.id, 2);
  console.log(\`\nAffected nodes: \${impact.nodes.size}\`);
  console.log(\`Affected edges: \${impact.edges.length}\`);

  // Group by file
  const files = new Map();
  for (const n of impact.nodes.values()) {
    if (!files.has(n.filePath)) files.set(n.filePath, []);
    files.get(n.filePath).push(n.name);
  }
  console.log(\`Affected files: \${files.size}\`);
  for (const [file, nodes] of [...files.entries()].sort((a,b) => b[1].length - a[1].length).slice(0, 10)) {
    console.log(\`  \${file}: \${nodes.slice(0, 5).join(', ')}\`);
  }

  await cg.close();
}
test().catch(console.error);
"
```

**What to check:**
- Does changing a core interface/class show a wide impact radius?
- Are the affected files reasonable (things that import/extend/use it)?
- Is the impact radius non-empty? Zero impact on a core type means edges are missing.

---

### Test 5: Edge Extraction Quality

Directly verify that the major edge types are being extracted for this language.

```bash
node -e "
const { CodeGraph } = require('./dist/index.js');
async function test() {
  const cg = await CodeGraph.open('<codebase_path>');

  // Relationship kinds that should be present for most languages.
  // Each query is printed so it can be run with sqlite3, as in the sanity check section.
  // Inheritance (extends/implements) is verified through these counts rather than per class.
  console.log('=== Edge checks (run each with sqlite3) ===');
  const checks = [
    { desc: 'contains edges (class → method)', query: \"SELECT COUNT(*) FROM edges WHERE kind = 'contains'\" },
    { desc: 'calls edges', query: \"SELECT COUNT(*) FROM edges WHERE kind = 'calls'\" },
    { desc: 'imports edges', query: \"SELECT COUNT(*) FROM edges WHERE kind = 'imports'\" },
    { desc: 'extends edges', query: \"SELECT COUNT(*) FROM edges WHERE kind = 'extends'\" },
    { desc: 'implements edges', query: \"SELECT COUNT(*) FROM edges WHERE kind = 'implements'\" },
  ];
  for (const c of checks) {
    console.log(\`\${c.desc}: \${c.query};\`);
  }

  await cg.close();
}
test().catch(console.error);
"
```

```bash
sqlite3 <codebase_path>/.codegraph/codegraph.db "
  SELECT kind, COUNT(*) as cnt FROM edges GROUP BY kind ORDER BY cnt DESC;
"
```

**What to check:**
- `contains` should be the most common (structural hierarchy).
- `calls` should be plentiful — if near zero, call extraction is broken for this language.
- `imports` should exist — if zero, import parsing is broken.
- `extends` and `implements` should exist if the language has inheritance — if zero, `extractInheritance()` may not handle this language's AST.
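These rules can be applied to the `GROUP BY kind` output in one pass. The sketch below assumes the counts have been copied into a plain object keyed by edge kind; `reviewEdgeCounts` is a hypothetical helper. It only flags counts that are exactly zero, so "near zero" `calls` still needs a judgment call, and it treats inheritance as missing only when both `extends` and `implements` are zero.

```typescript
// Hypothetical helper: applies the edge-distribution checklist to a map of
// edge kind -> count. `hasInheritance` says whether the language has
// inheritance at all.
function reviewEdgeCounts(
  counts: Record<string, number>,
  hasInheritance: boolean,
): string[] {
  const count = (kind: string): number => counts[kind] ?? 0;
  const problems: string[] = [];
  const largest = Object.keys(counts).reduce(
    (max, kind) => Math.max(max, count(kind)),
    0,
  );
  if (count("contains") < largest) {
    problems.push("`contains` is not the most common edge kind");
  }
  if (count("calls") === 0) {
    problems.push("no `calls` edges: call extraction is likely broken");
  }
  if (count("imports") === 0) {
    problems.push("no `imports` edges: import parsing is likely broken");
  }
  if (hasInheritance && count("extends") === 0 && count("implements") === 0) {
    problems.push("no `extends` or `implements` edges: check extractInheritance()");
  }
  return problems;
}

// Made-up counts for illustration.
console.log(reviewEdgeCounts({ contains: 100, calls: 0, imports: 0 }, true));
```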

---

### Test 6: Node Extraction Completeness

Verify all expected node kinds are being extracted.

```bash
sqlite3 <codebase_path>/.codegraph/codegraph.db "
  SELECT kind, COUNT(*) as cnt FROM nodes GROUP BY kind ORDER BY cnt DESC;
"
```

**What to check for each language:**

| Node Kind | Expected? | Notes |
|-----------|-----------|-------|
| `file` | Always | One per source file |
| `class` | If language has classes | |
| `method` | If language has methods | Should include owner type in `qualified_name` |
| `function` | If language has top-level functions | |
| `interface` | If language has interfaces/protocols | |
| `enum` | If language has enums | |
| `enum_member` | If language has enums | Values inside enums |
| `import` | Always | One per import statement |
| `variable` / `field` | Usually | Fields, constants, top-level vars |
| `struct` | If language has structs | Go, Rust, C, Swift |
| `trait` | If language has traits | Rust |

If an expected node kind has 0 count, the language extractor is missing that AST type.
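The same comparison can be scripted once you decide which kinds a language should produce. This is a minimal sketch; `missingNodeKinds` is a hypothetical helper, the Go expectation list is an illustrative reading of the table above, and the counts are made up.

```typescript
// Hypothetical helper: returns the expected kinds whose count is zero or
// absent in a map of node kind -> count.
function missingNodeKinds(
  counts: Record<string, number>,
  expectedKinds: string[],
): string[] {
  return expectedKinds.filter((kind) => (counts[kind] ?? 0) === 0);
}

// Illustrative expectation for a Go codebase, drawn from the table above.
const goExpectedKinds = ["file", "import", "function", "method", "struct", "interface"];

// Made-up counts standing in for the sqlite3 output.
const sampleNodeCounts: Record<string, number> = {
  file: 120,
  import: 640,
  function: 900,
  method: 1500,
  struct: 210,
};

// With these sample counts, `interface` is reported as missing.
console.log(missingNodeKinds(sampleNodeCounts, goExpectedKinds));
```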

---

### Test 7: Real-World LLM Prompts

This is the final test and the one closest to real usage. Simulate the kinds of questions a developer would actually ask an LLM that's using CodeGraph. For each prompt, run `findRelevantContext` (which powers `codegraph_explore`) and evaluate whether the returned context would let an LLM give a correct, complete answer.

**Run at least 5 of these prompt styles, adapted to the actual codebase:**

```bash
node -e "
const { CodeGraph } = require('./dist/index.js');
async function test() {
  const cg = await CodeGraph.open('<codebase_path>');

  const prompts = [
    // 1. \"How does X work?\" — subsystem understanding
    'How does the cache eviction policy work?',

    // 2. \"Where is X implemented?\" — symbol location
    'Where is the LRU eviction logic implemented?',

    // 3. \"What calls X?\" — usage discovery
    'What code triggers cache invalidation?',

    // 4. \"I want to change X, what breaks?\" — impact assessment
    'If I change the Cache interface, what else is affected?',

    // 5. \"How do X and Y interact?\" — cross-component relationships
    'How does CacheBuilder connect to LocalCache?',

    // 6. \"Show me the flow from A to B\" — data/control flow
    'What happens when a cache entry expires?',

    // 7. \"What are all the implementations of X?\" — polymorphism
    'What classes implement the Cache interface?',

    // 8. Bug investigation prompt
    'Cache entries are not being evicted when they should be — where should I look?',
  ];

  for (const prompt of prompts) {
    console.log(\`\n========================================\`);
    console.log(\`PROMPT: \${prompt}\`);
    console.log(\`========================================\`);

    const subgraph = await cg.findRelevantContext(prompt, {
      searchLimit: 8, traversalDepth: 3, maxNodes: 80, minScore: 0.2,
    });

    console.log(\`Result: \${subgraph.nodes.size} nodes, \${subgraph.edges.length} edges, \${subgraph.roots.length} entry points\`);

    console.log('Entry points:');
    for (const rootId of subgraph.roots.slice(0, 5)) {
      const node = subgraph.nodes.get(rootId);
      if (node) console.log(\`  \${node.name} (\${node.kind}) — \${node.filePath}:\${node.startLine}\`);
    }

    const fileGroups = new Map();
    for (const node of subgraph.nodes.values()) {
      if (!fileGroups.has(node.filePath)) fileGroups.set(node.filePath, []);
      fileGroups.get(node.filePath).push(node.name);
    }
    console.log('Top files:');
    for (const [file, nodes] of [...fileGroups.entries()].sort((a,b) => b[1].length - a[1].length).slice(0, 5)) {
      console.log(\`  \${file} (\${nodes.length}): \${nodes.slice(0, 5).join(', ')}\`);
    }

    // PASS/FAIL judgment
    const hasEntryPoints = subgraph.roots.length > 0;
    const hasEdges = subgraph.edges.length > 0;
    const hasMultipleFiles = fileGroups.size > 1;
    console.log(\`\\nVERDICT: \${hasEntryPoints && hasEdges && hasMultipleFiles ? 'PASS' : 'FAIL — needs investigation'}\`);
  }

  await cg.close();
}
test().catch(console.error);
"
```

**What to check for each prompt:**
- Does it return entry points? Zero entry points = total failure.
- Are the entry points **relevant** to the question? (Not just random symbols that happen to share a word.)
- Does it span multiple files? Most real questions involve cross-file understanding.
- Are relationships present? An LLM needs to understand how symbols connect, not just a list of names.
- Would **you** be able to answer the question from this context?

---

## Diagnosing Failures

| Symptom | Likely Cause | Where to Fix |
|---------|-------------|--------------|
| Method missing owner type in `qualified_name` | Language needs `getReceiverType` | `src/extraction/languages/<lang>.ts` |
| `codegraph_explore` returns irrelevant files | Common names flooding FTS; co-location boost not helping | `src/db/queries.ts: findNodesByExactName`, `src/context/index.ts` |
| Zero `calls` edges | `callTypes` missing or wrong AST node type | `src/extraction/languages/<lang>.ts: callTypes` |
| Zero `extends`/`implements` edges | `extractInheritance()` doesn't handle this language's AST | `src/extraction/tree-sitter.ts: extractInheritance()` |
| Missing node kinds (no enums, no interfaces) | AST type not listed in extractor | `src/extraction/languages/<lang>.ts: enumTypes`, `interfaceTypes`, etc. |
| Search term dropped from query | Term is in the stop words list | `src/search/query-utils.ts: STOP_WORDS` |
| `qualified_name` missing class for nested methods | Extraction not walking parent stack correctly | `src/extraction/tree-sitter.ts: visitNode()` |
| Import edges missing | `extractImport` returns null for this syntax | `src/extraction/languages/<lang>.ts: extractImport` |
| C++ classes/structs/enums missing from macro namespaces | Macros like `NLOHMANN_JSON_NAMESPACE_BEGIN` cause tree-sitter to misparse namespace blocks as `function_definition` | `src/extraction/languages/c-cpp.ts: isMisparsedFunction` filters bad names; `src/extraction/tree-sitter.ts: visitFunctionBody` extracts structural nodes |
| C++ classes missing from `.h` headers | `.h` files default to `c` language which has `classTypes: []` | `src/extraction/grammars.ts: looksLikeCpp()` — content-based heuristic promotes `.h` files to `cpp` when C++ patterns detected |
| Ruby methods inside modules missing owner in `qualified_name` | Ruby `module` AST nodes not being extracted | `src/extraction/languages/ruby.ts: visitNode` hook extracts modules; `src/extraction/tree-sitter.ts: isInsideClassLikeNode` includes `module` kind |
| TypeScript abstract classes missing | `abstract_class_declaration` not in `classTypes` | `src/extraction/languages/typescript.ts: classTypes` — add `abstract_class_declaration` |
| Single-expression arrow functions silently dropped | `extractName` finds identifier in expression body instead of returning `<anonymous>` | `src/extraction/tree-sitter.ts: extractName` — skip identifier search for `arrow_function`/`function_expression` nodes |
| Kotlin interfaces/enums extracted as classes | `class_declaration` matches `classTypes` first; `interfaceTypes`/`enumTypes` never fire | `src/extraction/languages/kotlin.ts: classifyClassNode` detects `interface`/`enum` keywords in AST children |
| Kotlin functions have zero calls extracted | Tree-sitter grammar doesn't use field names, so `getChildByField(node, 'function_body')` returns null | `src/extraction/languages/kotlin.ts: resolveBody` finds body by type (`function_body`, `class_body`, `enum_class_body`) |
| Kotlin `navigation_expression` calls not resolved cleanly | `navigation_expression` fell through to `getNodeText` producing messy names with parentheses | `src/extraction/tree-sitter.ts: extractCall` — handle `navigation_expression` by extracting method name from `navigation_suffix > simple_identifier` |
| Kotlin `fun interface` declarations invisible | Tree-sitter-kotlin doesn't support `fun interface` syntax (Kotlin 1.4+), producing ERROR or misparse as `function_declaration` | `src/extraction/languages/kotlin.ts: visitNode` detects three misparse patterns: (1) ERROR node + lambda body, (2) function_declaration with `user_type("interface")` direct child + name in ERROR child, (3) function_declaration with ERROR child containing `user_type("interface")` + name. `isFunInterfaceNode` checks both direct and ERROR-nested `user_type` children |
| Kotlin class/interface methods missing when nested `fun interface` present | Tree-sitter misparsed parent body as ERROR (starting with `{`) + class_body (nested interface body); `resolveBody` found wrong body | `src/extraction/languages/kotlin.ts: resolveBody` prefers ERROR bodies starting with `{`; `visitNode` excludes body-like ERROR from `fun interface` detection |
| Svelte `$props()` destructuring produces ugly variable names | `let { x, y } = $props()` has `object_pattern` as variable name node; `getNodeText` returns full pattern | `src/extraction/tree-sitter.ts: extractVariable` skips `object_pattern`/`array_pattern` named declarators |
| Svelte template function calls invisible (e.g. `class={cn(...)}`) | SvelteExtractor only parsed `<script>` blocks, missing calls in template markup | `src/extraction/svelte-extractor.ts: extractTemplateCalls` scans `{expression}` blocks in template for call patterns |
| Svelte `$state`/`$derived` rune calls creating noise | Runes are compiler builtins, not real function calls | `src/extraction/svelte-extractor.ts` filters `SVELTE_RUNES` set from unresolved references |
| Object literal getters/setters extracted as standalone functions | `method_definition` inside `object` literals treated same as class methods | `src/extraction/tree-sitter.ts: extractMethod` skips `method_definition` nodes whose parent is `object`/`object_expression` |
| JavaScript `class extends` produces zero inheritance edges | JS tree-sitter uses `class_heritage → identifier` (bare), not `class_heritage → extends_clause → identifier` like TypeScript | `src/extraction/tree-sitter.ts: extractInheritance` — handle bare `identifier`/`type_identifier` children when parent is `class_heritage` |
| PHP traits extracted as classes | `trait_declaration` in `classTypes` but `extractClass` hardcodes `class` kind | `src/extraction/languages/php.ts: classifyClassNode` returns `'trait'` for `trait_declaration`; `src/extraction/tree-sitter-types.ts` adds `'trait'` to return type |
| PHP class properties missing (0 field nodes) | `extractField` looks for `variable_declarator` children; PHP uses `property_element > variable_name > name` | `src/extraction/tree-sitter.ts: extractField` — handle `property_element` children with `variable_name > name` path |
| PHP class constants skipped inside classes | `variableTypes` check has `!isInsideClassLikeNode()` guard, so `const_declaration` inside classes falls through | `src/extraction/languages/php.ts: visitNode` hook catches `const_declaration`, extracts `const_element > name` as `constant` kind |
| PHP `use TraitName` inside classes invisible | `use_declaration` nodes in class body not processed for edges | `src/extraction/languages/php.ts: visitNode` hook extracts trait names from `use_declaration` and creates `implements` unresolved references |

## After Fixing Issues

```bash
npm run build
rm -rf <codebase_path>/.codegraph
node dist/bin/codegraph.js init -iv <codebase_path>
# Re-run the failing tests from above
```

Always run the full test suite before marking a language as verified:

```bash
npm test
```

## Adding `getReceiverType`

**Only needed for languages where methods are top-level or outside their owner type in the AST.** If the language nests methods inside class/struct bodies (Python, Java, TypeScript, C#), the qualified name already includes the parent — verify with the sanity check before adding anything.

### 1. Add the hook to the language extractor

In `src/extraction/languages/<lang>.ts`, add `getReceiverType` to the extractor object:

```typescript
getReceiverType: (node, source) => {
  // Extract the owner type name from the method's AST node.
  // Return the type name string, or undefined if not applicable.
  //
  // The core extractMethod() in tree-sitter.ts will use this to set:
  //   qualifiedName = `${filePath}::${receiverType}::${methodName}`
},
```

### 2. Reference: Go implementation

```typescript
// src/extraction/languages/go.ts
getReceiverType: (node, source) => {
  const receiver = getChildByField(node, 'receiver');
  if (!receiver) return undefined;
  const text = getNodeText(receiver, source);
  const match = text.match(/\*?\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)/);
  return match?.[1];
},
```
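To see what the regular expression in the Go reference accepts, it can be tried on plain receiver strings without tree-sitter. The sketch below reuses the same pattern; `receiverTypeFromText` is a hypothetical stand-alone wrapper, and the receiver strings are illustrative inputs. Note that a generic receiver such as `(s *Stack[T])` does not match this pattern, so it yields `undefined`.

```typescript
// Same regular expression as in the Go reference above, applied to plain
// strings so it can be tried without a parser.
const RECEIVER_TYPE_RE = /\*?\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)/;

// Hypothetical stand-alone wrapper: takes the receiver text, e.g. "(sl *SkipList)",
// and returns the type name, or undefined when the pattern does not match.
function receiverTypeFromText(receiverText: string): string | undefined {
  const match = receiverText.match(RECEIVER_TYPE_RE);
  return match ? match[1] : undefined;
}

const receiverSamples = ["(sl *SkipList)", "(s Server)", "(*Type)", "(s *Stack[T])"];
for (const text of receiverSamples) {
  console.log(`${text} -> ${receiverTypeFromText(text)}`);
}
```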

### 3. Where it's consumed

`src/extraction/tree-sitter.ts` in `extractMethod()`:

```typescript
const receiverType = this.extractor.getReceiverType?.(node, this.source);
if (receiverType) {
  extraProps.qualifiedName = `${this.filePath}::${receiverType}::${name}`;
}
```

## Key Files

| File | Role |
|------|------|
| `src/extraction/languages/<lang>.ts` | Language extractor — node types, call types, `getReceiverType` |
| `src/extraction/tree-sitter.ts` | Core extraction — `extractMethod()`, `extractCall()`, `extractInheritance()` |
| `src/extraction/tree-sitter-types.ts` | `LanguageExtractor` interface definition |
| `src/search/query-utils.ts` | `STOP_WORDS`, `extractSearchTerms`, `scorePathRelevance` |
| `src/db/queries.ts` | `searchNodesFTS` (BM25), `findNodesByExactName` (co-location boost) |
| `src/context/index.ts` | `findRelevantContext` — hybrid search + graph traversal |
| `src/mcp/tools.ts` | MCP tool handlers — `codegraph_explore` implementation |

## Language Status

### Verified

- [x] **Go** — `getReceiverType` extracts receiver from `func (sl *Type) method()`
- [x] **Swift** — NOT needed. Tree-sitter nests methods inside class/extension bodies
- [x] **Java** — NOT needed. Methods nested in class body. Verified against Guava
- [x] **Python** — NOT needed. Methods nested in class body. Verified against Flask
- [x] **Rust** — `getReceiverType` walks up to parent `impl_item` to extract type name. Also adds `contains` edges from struct to impl methods. Verified against Deno
- [x] **C** — NOT needed. No methods in C. Strong function/struct/enum extraction with excellent call edge density. Verified against Redis
- [x] **C++** — NOT needed for header-only libs. `isMisparsedFunction` hook filters macro-caused misparse artifacts (e.g. `NLOHMANN_JSON_NAMESPACE_BEGIN`). `visitFunctionBody` now extracts structural nodes (classes/structs/enums) inside macro-confused "function" bodies. Content-based `.h` detection (`looksLikeCpp` in `grammars.ts`) promotes C++ headers to `cpp` language so classes in `.h` files are extracted. Verified against nlohmann/json and gRPC. Note: out-of-class `Type::method()` definitions would need `getReceiverType` but are uncommon in header-only codebases.
- [x] **C#** — NOT needed. Methods nested in class body. Added `base_list` handling in `extractInheritance` for C#'s `: Parent, IInterface` syntax. Added `propertyTypes` support for C# `property_declaration` nodes. Fixed `extractField` to handle C#'s nested `variable_declaration > variable_declarator` structure. Verified against Jellyfin
- [x] **Ruby** — NOT needed for `getReceiverType`. Methods nested in class body. Added `visitNode` hook to extract Ruby `module` nodes (concerns, namespaces) with proper containment and qualified names. Methods inside modules get `Module::method` qualified names. Also wired up the `ExtractorContext` with `pushScope`/`popScope` for language hooks. Verified against Discourse
- [x] **TypeScript** — NOT needed for `getReceiverType`. Methods nested in class body. Added `abstract_class_declaration` to `classTypes` so abstract classes are properly extracted. Fixed single-expression arrow function extraction (`const fn = () => expr` was silently dropped because `extractName` picked up the body identifier instead of returning `<anonymous>` for parent name resolution). Verified against Grafana
- [x] **Dart** — NOT needed for `getReceiverType`. Methods nested in class body. Added bare call extraction for selector-based method calls (e.g. `object.method()`). Verified against Flutter
- [x] **Kotlin** — `getReceiverType` extracts receiver from extension functions `fun Type.method()`. Added `classifyClassNode` to distinguish interfaces/enums from classes (all use `class_declaration` AST node). Added `resolveBody` hook since Kotlin's tree-sitter grammar doesn't use field names. Added `navigation_expression` handling for method call extraction. Added `object_declaration` via `extraClassNodeTypes`. Added `delegation_specifier` handling in `extractInheritance` for Kotlin's `: Parent, Interface` syntax. Also fixed `extractInterface` to visit body children (interface methods were not being extracted). Added `visitNode` hook to handle `fun interface` (SAM) declarations — tree-sitter-kotlin doesn't support this Kotlin 1.4+ syntax, producing ERROR or function_declaration misparse; the hook detects both patterns and extracts the interface. Verified against Koin, LeakCanary
- [x] **Svelte** — Custom `SvelteExtractor` delegates `<script>` blocks to TS/JS parser; creates `component` nodes for each `.svelte` file. Added template expression call extraction: scans `{expression}` blocks in markup for function calls (e.g. `class={cn(...)}`), creating call edges from component to callees — increased Svelte call edges from 29 to 387. Filtered Svelte 5 rune calls (`$state`, `$props`, `$derived`, `$effect`, `$bindable`). Also fixed: destructured `$props()` patterns (e.g. `let { x, y } = $props()`) no longer extracted as ugly multi-line variable names (skip `object_pattern`/`array_pattern` in `extractVariable`). Object literal getter/setter methods no longer extracted as standalone functions. Verified against shadcn-svelte
- [x] **PHP** — NOT needed for `getReceiverType`. Methods nested in class body. Added `classifyClassNode` to distinguish traits from classes (`trait_declaration` → `trait` kind). Added `'trait'` to `classifyClassNode` return type in `tree-sitter-types.ts` and handling in visitor. Fixed PHP property extraction: `extractField` now handles `property_element > variable_name > name` AST structure (added 4,366 field nodes). Added `visitNode` hook for class constants (`const_declaration` inside classes was skipped by `variableTypes` guard) and trait `use` declarations (`use HasFactory, SoftDeletes;` creates `implements` edges — increased from 636 to 1,514). Verified against Laravel

### Needs Verification

(none currently)