tldr-core 0.1.6

# Adversarial Premortem: Reference-Counting Dead Code Detector

**Version:** 1.0
**Date:** 2026-02-12
**Author:** architect-agent (3-pass adversarial analysis)
**Status:** Pre-implementation risk assessment

## Overview

This document identifies failure modes, edge cases, and correctness risks in the
refcount-based dead code detector defined in `REFCOUNT_SPEC.md` and implemented
in `refcount.rs` + `dead.rs::dead_code_analysis_refcount()`. Each risk is
categorized by severity and includes a concrete mitigation. The "Already handled?"
row of each table is verified by reading the actual implementation code.

---

## Pass 1: Edge Cases

These are inputs or conditions the algorithm may encounter that could cause
incorrect results (false positives = flagging live code as dead; false negatives =
missing truly dead code).

### 1.1 Function name matches a language built-in

| | |
|---|---|
| **Risk** | Python has built-in names that are also valid identifiers: `type`, `list`, `dict`, `set`, `filter`, `map`, `input`, `print`. A user-defined function `def type(x):` will have an inflated ref_count because tree-sitter parses every usage of `type(...)` as an `(identifier)` node. The function will be falsely rescued (false negative -- missed dead code). |
| **Severity** | **MEDIUM** |
| **Concrete example** | `def type(data): ...` defined once, never called. But `type(x)` appears 200 times in the codebase (calling the built-in). `ref_counts["type"] = 201`, so the function is rescued. |
| **Mitigation** | Add a per-language **built-in name exclusion list** to `is_rescued_by_refcount()`. When the function name is in the built-in list, do NOT rescue it -- instead fall through to the dead/possibly-dead classification. This is analogous to C3 (short name skip) but for semantically ambiguous names. Alternatively, mark these as **LOW confidence** rescues rather than silently rescuing. |
| **Already handled?** | **NO.** `is_rescued_by_refcount()` at `refcount.rs:125-155` only checks name length (< 3 chars). No built-in name filtering exists. The C3 short-name check partially helps (e.g., `id` is 2 chars and skipped) but misses `type` (4 chars), `list` (4 chars), `dict` (4 chars), `print` (5 chars). |
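
The built-in exclusion described above could look roughly like the following sketch. It is not the code in `refcount.rs`: the `Lang` enum, `shadows_builtin`, `is_rescued`, and the built-in list (taken from the names in this section, deliberately incomplete) are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical language tag; the real crate has its own `Language` type.
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum Lang {
    Python,
    Other,
}

/// Illustrative, incomplete subset of Python built-in names.
const PYTHON_BUILTINS: &[&str] = &[
    "type", "list", "dict", "set", "filter", "map", "input", "print",
];

/// Returns true when `name` shadows a built-in of `lang`.
pub fn shadows_builtin(lang: Lang, name: &str) -> bool {
    match lang {
        Lang::Python => PYTHON_BUILTINS.contains(&name),
        Lang::Other => false,
    }
}

/// Sketch of a rescue check: a count above 1 rescues the function unless
/// the name is too short (C3) or shadows a built-in.
pub fn is_rescued(lang: Lang, name: &str, ref_counts: &HashMap<String, usize>) -> bool {
    // Bare name = last segment of a qualified name such as `Class.method`.
    let bare = name.rsplit('.').next().unwrap_or(name);
    if bare.len() < 3 || shadows_builtin(lang, bare) {
        return false;
    }
    ref_counts.get(bare).copied().unwrap_or(0) > 1
}
```

With this shape, `def type(data)` is no longer rescued by the 200 calls to the built-in and falls through to the dead/possibly-dead classification.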

### 1.2 Tree-sitter fails to parse a file (malformed syntax)

| | |
|---|---|
| **Risk** | If a file has syntax errors, tree-sitter returns a tree with `ERROR` nodes. The `count_identifiers_in_tree()` function at `refcount.rs:55-110` walks ALL nodes including children of `ERROR` nodes. This could cause: (a) identifiers inside error regions to be miscounted, or (b) identifiers to be missed if the parse tree is severely corrupted. Both lead to wrong ref_counts. |
| **Severity** | **LOW** |
| **Concrete example** | A Python file with an unclosed parenthesis: `def foo(x:`. Tree-sitter may parse `foo` as an identifier inside an ERROR node, or may fail to produce it at all. Either way, the ref_count for `foo` is wrong by +/- 1. |
| **Mitigation** | Before counting identifiers, check `tree.root_node().has_error()`. If true, either (a) skip the file entirely and log a warning, or (b) still count but mark any function definitions extracted from this file as **LOW confidence**. Option (a) is safer -- a skipped file cannot produce false counts. The current `count_identifiers_in_tree` does not check for parse errors. |
| **Already handled?** | **NO.** `count_identifiers_in_tree()` at `refcount.rs:64` calls `tree.walk()` unconditionally. No error checking on the tree. |

### 1.3 Generated code (protobuf, codegen) inflating ref_counts

| | |
|---|---|
| **Risk** | Generated code (e.g., `*.pb.go`, `*_generated.rs`, `*_pb2.py`) often contains hundreds of identifiers that match user-defined function names. This inflates ref_counts, causing false negatives (dead code not detected because the generated file "references" the name). |
| **Severity** | **MEDIUM** |
| **Concrete example** | User defines `def serialize(data):` (never called). A protobuf-generated file `messages_pb2.py` contains `serialize` 50 times as method stubs. `ref_counts["serialize"] = 51`, rescuing the dead function. |
| **Mitigation** | Add a **file exclusion pattern list** for generated code. Common patterns: `*_pb2.py`, `*.pb.go`, `*_generated.*`, `*_gen.*`, `*.g.dart`, files containing `// Code generated by` or `# Generated by` in the first 5 lines. Skip these files during the `count_identifiers_in_tree()` aggregation pass. Alternatively, allow user configuration via `--exclude-patterns`. |
| **Already handled?** | **NO.** No file filtering exists in the refcount path. The aggregation layer (Phase 1 in the spec) has not been implemented yet -- this is the right time to add it. |
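
A minimal sketch of the generated-file filter, assuming the aggregation layer has the path and source text at hand before counting. The function names are hypothetical, and the patterns are only the ones listed in the mitigation above.

```rust
use std::path::Path;

/// Header markers named in the mitigation above.
const GENERATED_MARKERS: &[&str] = &["// Code generated by", "# Generated by"];

/// Returns true if the file name matches a common generated-code pattern.
pub fn has_generated_name(path: &Path) -> bool {
    let file_name = match path.file_name().and_then(|n| n.to_str()) {
        Some(n) => n,
        None => return false,
    };
    // Stem = file name up to the first '.', so `user.pb.go` -> `user`.
    let stem = file_name.split('.').next().unwrap_or(file_name);
    file_name.ends_with("_pb2.py")
        || file_name.ends_with(".pb.go")
        || file_name.ends_with(".g.dart")
        || stem.ends_with("_generated")
        || stem.ends_with("_gen")
}

/// Returns true if any of the first 5 lines carries a generated-code marker.
pub fn has_generated_header(source: &str) -> bool {
    source
        .lines()
        .take(5)
        .any(|line| GENERATED_MARKERS.iter().any(|m| line.contains(*m)))
}

/// Combined check: skip the file during identifier counting if either test fires.
pub fn should_skip_for_refcount(path: &Path, source: &str) -> bool {
    has_generated_name(path) || has_generated_header(source)
}
```

A user-supplied `--exclude-patterns` list could be checked in the same place; that flag is a proposal in this document, not an existing option.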

### 1.4 String interpolation containing function names

| | |
|---|---|
| **Risk** | In f-strings, template literals, and string interpolation, function names may appear as identifiers in the AST. For example, `f"Calling {process_data}..."` in Python puts `process_data` into the tree as an `(identifier)` node inside an `(interpolation)` node. This inflates ref_count, which is actually *correct behavior* (the string references the function). However, `f"process_data took {time}s"` does NOT produce an identifier -- it is a string literal. So this is not a risk of inflation, but a risk of **inconsistency**: some string references count, others do not. |
| **Severity** | **LOW** |
| **Concrete example** | `log(f"Running {handler}")` -- tree-sitter parses `handler` as an identifier inside interpolation. This is correct and actually helps (the function IS referenced). No action needed for this case. |
| **Mitigation** | No mitigation needed. This is actually a **strength** of the tree-sitter approach over grep -- it correctly identifies live references inside interpolation expressions. Document this as expected behavior. |
| **Already handled?** | **YES**, implicitly. Tree-sitter's depth-first walk at `refcount.rs:67-107` visits all identifier nodes regardless of parent context, including interpolation contexts. |

### 1.5 Macro-generated functions (Rust, Elixir, C/C++)

| | |
|---|---|
| **Risk** | Macros can generate function definitions that are invisible to tree-sitter because the macro is opaque. (a) If a macro generates a function definition, that function won't appear in `all_functions`, so it can never be flagged as dead (acceptable). (b) If a macro generates a *call* to a function, tree-sitter may see the macro invocation but not the expanded identifier. This means the callee gets ref_count that is too low, potentially causing a **false positive** (flagging a live function as dead). |
| **Severity** | **HIGH** |
| **Concrete example** | Rust: `lazy_static! { static ref HANDLER: fn() = my_handler; }`. Tree-sitter sees `lazy_static`, `HANDLER`, and `my_handler` as identifiers, so `my_handler` is counted. By contrast, `#[derive(Serialize)]` expands at compile time, and tree-sitter only sees source, so any `serialize()` calls the derive generates never appear in the tree. Derive output therefore cannot inflate a ref_count, but it also cannot rescue a function that is referenced only from the expansion. |
| **Concrete risky example** | Rust: `register_handlers!(handle_auth, handle_login, handle_logout);` -- if the macro invocation puts these names inside the macro call arguments, tree-sitter DOES see them as identifiers. This is fine. But if the macro uses `concat_idents!` or string-based token generation, tree-sitter does NOT see the generated identifier. |
| **Mitigation** | For Rust: functions with `#[no_mangle]`, `#[export_name]`, or any proc-macro attribute should be excluded (already handled by C8 decorator exclusion). For C/C++: functions defined through `#define` are NOT visible to tree-sitter in expanded form, because tree-sitter parses raw source and never runs the preprocessor. Given `#define DEFINE_HANDLER(name) void name() {}`, tree-sitter sees `DEFINE_HANDLER` and `name` but NOT the expanded function. Mitigation: detect common macro invocation patterns and mark functions inside macro arguments as LOW confidence. |
| **Already handled?** | **PARTIALLY.** C8 (decorator exclusion) at `dead.rs:182-184` handles functions with decorators/attributes, which covers `#[test]`, `#[no_mangle]`, etc. But it does NOT handle (a) functions whose ONLY reference comes through a macro expansion, or (b) C/C++ `#define`-generated function bodies. |

### 1.6 Comments and string literals containing function names

| | |
|---|---|
| **Risk** | Tree-sitter's `(identifier)` nodes do NOT appear inside comments or plain string literals. So `# unused_func is deprecated` does NOT inflate ref_count for `unused_func`. This is correct. However, there is a subtle case: **doc-test code blocks** in Python (`>>> unused_func()`) or Rust (`/// # unused_func()`) may or may not parse as identifiers depending on the language grammar. |
| **Severity** | **LOW** |
| **Concrete example** | A Rust doc comment ``/// Use [`my_func`] for details`` -- tree-sitter for Rust does NOT parse doc comments as code. The `my_func` inside backticks is part of a `(line_comment)` node, not an `(identifier)`. So ref_count is correct. |
| **Mitigation** | No mitigation needed. Tree-sitter naturally excludes identifiers from comments and string literals. This is a strength over grep-based approaches. |
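| **Already handled?** | **YES**, inherently by tree-sitter's grammar. `(identifier)` nodes never appear inside `(comment)` nodes or in the literal text of `(string)` nodes; the one exception is interpolation expressions inside strings, covered in 1.4. |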

### 1.7 Function not present in ref_counts map at all

| | |
|---|---|
| **Risk** | If a function definition is extracted from a file that was NOT included in the identifier-counting pass (different file set, race condition, or file added after counting), the function name will not appear in `ref_counts` at all. `ref_counts.get(name)` returns `None`. In `is_rescued_by_refcount()`, this returns `false` (not rescued). In `dead_code_analysis_refcount()`, the function then falls through to dead/possibly-dead classification. This is actually CORRECT behavior -- a function with no references is dead. But it could also happen if the function name contains unusual characters that prevent matching. |
| **Severity** | **LOW** |
| **Concrete example** | Function `my_func` defined in `new_file.py`, but `new_file.py` was not included in the tree-sitter parsing pass. `ref_counts` has no entry for `my_func`. `is_rescued_by_refcount("my_func", &ref_counts)` returns `false`. The function is classified as dead. If it IS truly dead, this is correct. If it IS called from within `new_file.py` (which was missed), this is a false positive. |
| **Mitigation** | Ensure the file set for identifier counting is a SUPERSET of the file set for function extraction. The aggregation layer should use the same file list for both passes. Add an assertion or warning if a function's own file was not parsed for identifiers. |
| **Already handled?** | **NO**, but the risk is mitigated by design: the spec says (C11) both passes happen in the same loop over the same files. As long as the implementation follows this, the sets will be consistent. However, there is no runtime assertion to verify this. |

---

## Pass 2: Name Collisions

These are cases where the scope-agnostic name counting produces incorrect results
because the same identifier string refers to different entities in different contexts.

### 2.1 Go: `New`, `String`, `Error` across many structs

| | |
|---|---|
| **Risk** | Go convention produces many methods named `New`, `String`, `Error` across different types. `func (u *User) String() string` and `func (p *Product) String() string` are DIFFERENT functions but share the name `String`. If `User.String()` is dead but `Product.String()` is alive, the ref_count for `"String"` will be high, rescuing both. **False negative**: dead `User.String()` is not detected. |
| **Severity** | **HIGH** |
| **Concrete example** | 50 structs each implement `String()`. Only 10 are actually called. ref_count("String") = 60+ (50 definitions + 10+ calls). ALL 50 are rescued, including the 40 dead ones. |
| **Mitigation** | **Scope-qualified counting**: For Go, count `TypeName.MethodName` as a separate identifier from bare `MethodName`. The `collect_all_functions()` function at `dead.rs:347` already creates qualified names like `ClassName.method`. The ref_count map should also index by qualified name where possible. For Go, when tree-sitter encounters a `(call_expression (selector_expression object: (identifier) @type field: (field_identifier) @method))`, count `type.method` as a combined reference. |
| **Alternative mitigation** | Add Go-specific common names (`New`, `String`, `Error`, `Close`, `Init`, `Reset`, `Len`, `Less`, `Swap`) to the C3 "high collision risk" exclusion list. These names should not be rescued by bare-name refcount. They should only be rescued by QUALIFIED refcount or not at all. |
| **Already handled?** | **PARTIALLY.** `is_rescued_by_refcount()` at `refcount.rs:127-131` extracts the bare name from `MyClass.method` and checks `ref_counts` for the bare name. This CAUSES the problem -- `ref_counts["String"]` aggregates all `String` references across all types. The qualified name check at `refcount.rs:146-151` checks the full qualified name, but `ref_counts` is built from tree-sitter identifiers which are bare names, so `ref_counts["User.String"]` will be 0 (no tree-sitter node produces that compound name). The bare name fallback then kicks in and rescues incorrectly. |

### 2.2 Java/Kotlin: `get`/`set` bean conventions

| | |
|---|---|
| **Risk** | Java beans generate `getX()` and `setX()` methods for every field. The bare names `get` and `set` are not the issue here (at exactly 3 chars they pass the C3 length check, which only skips names shorter than 3 chars). The issue is names like `getName`, `setName`, `getValue`, `setValue` -- each bean class produces these, and the bean convention means many classes share identical method names. |
| **Severity** | **MEDIUM** |
| **Concrete example** | 30 classes each have `getName()`. 25 are dead (fields accessed directly, getter unused). `ref_counts["getName"] = 35` (30 defs + 5 actual calls). All 30 rescued. 25 false negatives. |
| **Mitigation** | For Java/Kotlin: Add `get*`/`set*`/`is*` bean method patterns to the high-collision list. These methods should require QUALIFIED refcount (i.e., `ClassName.getName` appearing in `ref_counts`) to be rescued. Bare-name rescue for bean methods produces too many false negatives. |
| **Already handled?** | **NO.** The spec (C3) lists `get`, `set` as common names but only for EXACT matches. `getName`, `setValue`, etc. are not in any exclusion list. |
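
One way to recognize the `get*`/`set*`/`is*` pattern is sketched below. The helper `is_bean_accessor` is hypothetical; it assumes an accessor is a prefix followed directly by an uppercase letter, which keeps ordinary words such as `getter` or `issue` out of the high-collision list.

```rust
/// Returns true for names shaped like JavaBean accessors:
/// `get`, `set`, or `is` followed by an uppercase letter.
pub fn is_bean_accessor(name: &str) -> bool {
    ["get", "set", "is"].iter().any(|prefix| {
        name.strip_prefix(*prefix)
            // First character after the prefix, if any.
            .and_then(|rest| rest.chars().next())
            .map_or(false, |c| c.is_uppercase())
    })
}
```

Names that match would then skip the bare-name rescue and require a qualified reference instead.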

### 2.3 Rust: `new`, `from`, `into`, `default`, `fmt` trait conventions

| | |
|---|---|
| **Risk** | Rust convention means every struct has `fn new(...)`, many implement `From::from()`, `Default::default()`, `Display::fmt()`. These method names are shared across hundreds of impl blocks. A dead `MyStruct::new()` is rescued because `new` appears everywhere. |
| **Severity** | **HIGH** |
| **Concrete example** | 100 structs each have `fn new()`. 60 are unused. `ref_counts["new"] = 200+`. All rescued. 60 false negatives. |
| **Mitigation** | Two-pronged: (1) Add Rust convention names (`new`, `from`, `into`, `default`, `fmt`, `drop`, `deref`, `deref_mut`, `next`, `clone`, `eq`, `partial_cmp`, `cmp`, `hash`, `index`) to a "never rescue by bare name" list. (2) For these names, only rescue if the QUALIFIED name (`StructName.new`) appears in ref_counts with count > 1. Since tree-sitter produces bare identifiers, the qualified rescue will typically fail, so these methods fall through to dead/possibly-dead classification, which is correct because C6 (trait methods) should catch the legitimate ones. |
| **Already handled?** | **NO.** `new` is 3 chars (passes C3 length check). `from` is 4 chars. `default` is 7 chars. None are in any exclusion list. The spec acknowledges this risk in C3's "Per-Language High-Risk Common Names" table but the implementation does NOT enforce it. The table is documentation only -- no code reads it. |
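
The two-pronged rule can be sketched as follows, assuming qualified names use the `StructName.method` form. `is_rescued_rust` is a hypothetical stand-in, not the existing `is_rescued_by_refcount()`, and the name list is copied from the mitigation above.

```rust
use std::collections::HashMap;

/// Rust convention names that should never be rescued by bare-name count.
const RUST_CONVENTION_NAMES: &[&str] = &[
    "new", "from", "into", "default", "fmt", "drop", "deref", "deref_mut",
    "next", "clone", "eq", "partial_cmp", "cmp", "hash", "index",
];

/// Sketch: convention names need a qualified count above 1 to be rescued;
/// every other name falls back to the bare-name count.
pub fn is_rescued_rust(qualified: &str, ref_counts: &HashMap<String, usize>) -> bool {
    let count = |key: &str| ref_counts.get(key).copied().unwrap_or(0);
    let bare = qualified.rsplit('.').next().unwrap_or(qualified);
    if RUST_CONVENTION_NAMES.contains(&bare) {
        // Only the qualified key may rescue a convention name.
        return count(qualified) > 1;
    }
    count(bare) > 1
}
```

Because the current map holds bare identifiers only, the qualified lookup will normally miss, so convention-named methods fall through to the later checks as intended.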

### 2.4 Python decorators that rename functions

| | |
|---|---|
| **Risk** | Python decorators can rename functions: `@functools.wraps(original)` preserves the name, but custom decorators may not. `@register("custom_name")` can register a function under a different name than its definition. The function's AST name is `def handler():` but it is called as `dispatch["custom_name"]()`. Neither the definition name nor the registered name will match in ref_counts. |
| **Severity** | **LOW** |
| **Concrete example** | `@app.command(name="deploy") def deploy_handler():` -- the function is registered as `"deploy"` but defined as `deploy_handler`. `ref_counts["deploy_handler"] = 1` (only def). BUT: C8 (decorator exclusion) at `dead.rs:182-184` already skips all decorated functions. |
| **Mitigation** | Already mitigated by C8. Any function with a decorator is excluded from dead code analysis entirely, regardless of refcount. |
| **Already handled?** | **YES.** C8 exclusion at `dead.rs:182-184` skips `func_ref.has_decorator`. As long as the AST extractor correctly sets `has_decorator = true` for decorated functions, renamed functions are safe. |

### 2.5 Cross-file same-name collisions (scope-agnostic counting)

| | |
|---|---|
| **Risk** | Two files define a function with the same name. File A defines `process()` (dead). File B defines `process()` (alive, called 10 times). `ref_counts["process"] = 12` (2 defs + 10 calls). Dead `process()` in File A is rescued by alive `process()` in File B. **False negative.** |
| **Severity** | **HIGH** -- This is the fundamental limitation of scope-agnostic refcounting, acknowledged by the spec and shared by Vulture (Python) and the `unused` tool. |
| **Concrete example** | `utils/string_utils.py` has `def process(s):` (dead). `core/data_pipeline.py` has `def process(data):` (alive, called from 10 places). `ref_counts["process"] = 12`. Both rescued. |
| **Mitigation** | Three options, in increasing implementation cost: (1) **Accept the false negatives** -- this is what Vulture does. The spec acknowledges it: "scope-agnostic counting... causes false negatives but eliminates false positives." (2) **File-qualified counting**: Build `ref_counts` as `HashMap<(PathBuf, String), usize>` -- count `(file_a.py, "process")` separately from `(file_b.py, "process")`. This requires knowing which file each reference targets, which is NOT possible with bare identifier counting. (3) **Hybrid approach**: Use bare-name refcount as the primary check, but add a secondary pass that checks if ALL references to a name come from the same file as the definition -- if so, the function may be dead despite ref_count > 1 (references could be recursive self-calls). |
| **Already handled?** | **NO**, and this is BY DESIGN. The spec explicitly chooses scope-agnostic counting (Phase 2 in REFCOUNT_SPEC.md) for simplicity and speed. The trade-off is documented: fewer false positives, more false negatives. However, the magnitude of this issue for common names is larger than the spec suggests. |

### 2.6 PHP magic method names colliding with user methods

| | |
|---|---|
| **Risk** | PHP uses `__construct`, `__destruct`, `__call`, `__get`, `__set`, `__toString`, etc. These are analogous to Python dunders. If user code defines a method named `__call` (which IS a magic method), the C5 dunder check at `dead.rs:166-169` only matches `__X__` (double underscore on BOTH sides). PHP magic methods like `__construct` have double underscore only on the LEFT side. They will NOT be caught by the dunder check. |
| **Severity** | **MEDIUM** |
| **Concrete example** | `class Foo { function __construct() {} }` -- `__construct` starts with `__` but ends with `t`, not `__`. The dunder check `bare_name.starts_with("__") && bare_name.ends_with("__")` returns FALSE. The function is NOT excluded. If ref_count == 1, it is flagged as dead. But `__construct` is called implicitly by `new Foo()`. **False positive.** |
| **Mitigation** | Add a PHP-specific magic method exclusion list that fires BEFORE the dunder check. The list should include all PHP magic methods from the spec (C12, "PHP" section): `__construct`, `__destruct`, `__call`, `__callStatic`, `__get`, `__set`, `__isset`, `__unset`, `__sleep`, `__wakeup`, `__serialize`, `__unserialize`, `__toString`, `__invoke`, `__set_state`, `__clone`, `__debugInfo`. Check: `if language == PHP && name starts with "__"` then exclude. |
| **Already handled?** | **NO.** The dunder check at `dead.rs:167` requires BOTH `starts_with("__")` AND `ends_with("__")`. PHP magic methods only have leading `__`. They are NOT excluded. The spec documents the PHP magic methods in the "Additional Language-Specific Exclusion Patterns" section but the implementation does not enforce them. |
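
A sketch of the PHP exclusion, placed next to the dunder predicate quoted above to show why `__construct` slips through. Both function names are hypothetical. This version uses the explicit list rather than the broader "any name starting with `__`" rule, so a user helper such as `__helper` is still analyzed.

```rust
/// PHP magic methods listed in the mitigation above.
const PHP_MAGIC_METHODS: &[&str] = &[
    "__construct", "__destruct", "__call", "__callStatic", "__get", "__set",
    "__isset", "__unset", "__sleep", "__wakeup", "__serialize", "__unserialize",
    "__toString", "__invoke", "__set_state", "__clone", "__debugInfo",
];

/// Returns true if the bare method name is a PHP magic method.
pub fn is_php_magic_method(name: &str) -> bool {
    let bare = name.rsplit('.').next().unwrap_or(name);
    PHP_MAGIC_METHODS.contains(&bare)
}

/// The Python-style dunder predicate as quoted in this section.
pub fn is_python_style_dunder(bare: &str) -> bool {
    bare.starts_with("__") && bare.ends_with("__")
}
```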

### 2.7 Ruby metaprogramming: `method_missing`, `define_method`, `send`

| | |
|---|---|
| **Risk** | Ruby can define methods dynamically via `define_method(:method_name)` and call methods via `send(:method_name)` or `public_send(:method_name)`. In both cases, the method name appears as a symbol literal (`:method_name`), not as an identifier. Tree-sitter for Ruby will parse `:method_name` as a `(symbol)` node, not an `(identifier)` node. The refcount system only counts `(identifier)` and `(constant)` nodes for Ruby (per `refcount.rs:34`). So dynamic method calls via symbols are invisible. |
| **Severity** | **MEDIUM** |
| **Concrete example** | `define_method(:calculate) { \|x\| x * 2 }` -- defines a method `calculate`. Later: `obj.send(:calculate, 5)` -- calls it. Tree-sitter sees `:calculate` as a `(symbol)`, NOT counted. `ref_counts["calculate"]` might be 0 or just the `define_method` call. If the method is not also called normally, it appears dead. **False positive.** |
| **Mitigation** | For Ruby, also count `(symbol)` nodes that match known function names. Or: add `(simple_symbol)` to the Ruby identifier node types in `identifier_node_types()`. Current Ruby types are `["identifier", "constant"]` at `refcount.rs:34`. Adding `"simple_symbol"` would capture `:method_name` references. Trade-off: this also counts non-method symbols, slightly inflating ref_counts (more false negatives). |
| **Already handled?** | **NO.** `identifier_node_types(Language::Ruby)` at `refcount.rs:34` returns `["identifier", "constant"]`. Symbols are not counted. |

---

## Pass 3: Performance + Correctness

### 3.1 Large repos (50k+ files): HashMap memory usage

| | |
|---|---|
| **Risk** | `count_identifiers_in_tree()` returns `HashMap<String, usize>` per file. The aggregation layer (not yet implemented) will merge these into a single global `HashMap<String, usize>`. For a 50k-file repo with an average of 500 unique identifiers per file, the global map could have ~1-5 million entries (many shared names). Each entry is ~50 bytes (String heap alloc + usize + HashMap overhead). Total: ~50-250 MB. This is acceptable for most systems. |
| **Severity** | **LOW** |
| **Concrete example** | Linux kernel: ~28k .c/.h files, ~800k unique identifiers. HashMap: ~40 MB. Acceptable. Monorepo with 200k files: ~5M unique identifiers. HashMap: ~250 MB. Borderline on CI machines with 512 MB RAM limits. |
| **Mitigation** | (1) Use `FxHashMap` from `rustc-hash` instead of `std::collections::HashMap` -- its hash is usually faster than the default SipHash for short string keys, with the same memory footprint. (2) Intern strings with an interner crate such as `string-interner` to deduplicate string allocations (many filenames and identifiers are shared). (3) For truly massive repos, consider a two-pass approach: first collect function definition names, then scan identifiers only for those names (filtering during walk). |
| **Already handled?** | **NO.** `refcount.rs:61` uses `HashMap<String, usize>`. No FxHashMap, no string interning. The spec mentions FxHashMap as an optimization (Phase 2, line 853) but the implementation uses standard HashMap. |
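
Option (3), filtering during the walk, can be sketched without tree-sitter by treating the identifiers of a file as a plain iterator of strings. `count_defined_only` is a hypothetical helper; in the real code the iterator would be the identifier nodes visited by `count_identifiers_in_tree()`.

```rust
use std::collections::{HashMap, HashSet};

/// Counts only identifiers that are also function definition names,
/// so the map size is bounded by the number of defined functions.
pub fn count_defined_only<'a, I>(
    defined: &HashSet<String>,
    identifiers: I,
) -> HashMap<String, usize>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts = HashMap::new();
    for ident in identifiers {
        // Skip locals, parameters, and other names that can never be dead functions.
        if defined.contains(ident) {
            *counts.entry(ident.to_string()).or_insert(0) += 1;
        }
    }
    counts
}
```

The cost is a first pass to build `defined`, which conflicts with the single-loop design in C11; that trade-off would need to be decided in the spec.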

### 3.2 Unicode identifiers

| | |
|---|---|
| **Risk** | Rust, Python 3, Go, and others support Unicode identifiers. `fn cafe\u{0301}()` (e with combining accent) vs `fn caf\u{00e9}()` (precomposed e-acute) are visually identical but different byte sequences. If a function is DEFINED with one form and CALLED with another, the string match in `ref_counts` will fail. The definition gets ref_count == 1 (only definition) and is flagged dead. **False positive.** |
| **Severity** | **LOW** (rare in practice -- most codebases use ASCII identifiers) |
| **Concrete example** | Python: `def na\u00efve():` (precomposed) called as `nai\u0308ve()` (decomposed). `ref_counts["na\u00efve"] = 1`, `ref_counts["nai\u0308ve"] = 1`. Both appear only once. Function flagged dead despite being called. |
| **Mitigation** | Apply NFC normalization (Unicode Normalization Form C) to all identifier strings before inserting into the ref_counts map. Rust's `unicode-normalization` crate provides this. Cost: negligible for ASCII strings (fast path), slight overhead for non-ASCII. |
| **Already handled?** | **NO.** `refcount.rs:78` does `text.to_string()` on raw UTF-8 bytes. No normalization. `std::str::from_utf8` at line 76 handles valid UTF-8 but does not normalize. |

### 3.3 Empty files and binary files mixed in

| | |
|---|---|
| **Risk** | Empty files produce an empty tree -- `count_identifiers_in_tree` returns an empty HashMap. This is correct and harmless. Binary files (images, compiled .pyc, .class files) may be passed to tree-sitter if the file-discovery layer does not filter them. Tree-sitter will fail to parse, returning a tree full of ERROR nodes. The `from_utf8` call at `refcount.rs:76` will fail for non-UTF8 sequences, causing those identifiers to be silently skipped (the `if let Ok(text)` check handles this). |
| **Severity** | **LOW** |
| **Concrete example** | A `.pyc` file in the project directory. Tree-sitter attempts to parse it as Python, gets garbage. Most nodes are ERROR. Any accidental identifier matches in binary data are filtered by `from_utf8` (binary data is unlikely to be valid UTF-8 at identifier boundaries). |
| **Mitigation** | The file-discovery layer should filter binary files before passing to the refcount system. Check for null bytes in the first 8KB of the file, or use file extension allowlists. The current implementation handles this gracefully (silent skip via `from_utf8` check), but it wastes time parsing binary files. |
| **Already handled?** | **PARTIALLY.** The `from_utf8` check at `refcount.rs:76` prevents binary data from being counted as identifiers. But the file is still fully parsed by tree-sitter, wasting CPU time. No binary-file pre-filter exists in the refcount module. |
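
The null-byte heuristic from the mitigation is small enough to sketch directly. `looks_binary` is a hypothetical name, and the caller is assumed to have the file contents (or at least its first 8 KB) in memory.

```rust
/// Returns true if a null byte occurs in the first 8 KB of `bytes`.
/// Source files almost never contain NUL, while most binary formats do.
pub fn looks_binary(bytes: &[u8]) -> bool {
    const SNIFF_LEN: usize = 8 * 1024;
    let head = &bytes[..bytes.len().min(SNIFF_LEN)];
    head.contains(&0)
}
```

Files that pass this check but are not valid UTF-8 would still be handled by the existing `from_utf8` guard.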

### 3.4 Symlink cycles

| | |
|---|---|
| **Risk** | If the project contains symlink cycles (e.g., `src/link -> src/`), the file-discovery layer may loop infinitely or process the same file multiple times. Processing a file twice doubles its identifiers' ref_counts, potentially rescuing dead functions. **False negative.** |
| **Severity** | **LOW** (symlink cycles are rare and typically caught by file-discovery layers) |
| **Mitigation** | The file-discovery layer (not in `refcount.rs`) should track visited inodes or canonical paths to avoid processing the same file twice. This is a concern for the aggregation layer, not the per-file counting function. |
| **Already handled?** | **N/A** -- the aggregation layer does not exist yet. When implementing it, use `PathBuf::canonicalize()` and a `HashSet<PathBuf>` of processed files. |
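
A sketch of the canonical-path deduplication, using `std::fs::canonicalize` and a `HashSet<PathBuf>`. `dedupe_by_canonical_path` is a hypothetical helper for the future aggregation layer. It only prevents double counting; the directory walker still needs its own cycle guard to terminate.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Returns each distinct file once, keyed by its canonical path.
/// Paths that cannot be canonicalized (e.g. missing files) are skipped.
pub fn dedupe_by_canonical_path<P: AsRef<Path>>(paths: &[P]) -> Vec<PathBuf> {
    let mut seen = HashSet::new();
    let mut unique = Vec::new();
    for p in paths {
        // canonicalize resolves symlinks as well as `.` and `..` components.
        let canonical = match std::fs::canonicalize(p) {
            Ok(c) => c,
            Err(_) => continue,
        };
        // `insert` returns false when the path was already seen.
        if seen.insert(canonical.clone()) {
            unique.push(canonical);
        }
    }
    unique
}
```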

### 3.5 Function definition name not matching ref_count key

| | |
|---|---|
| **Risk** | The function name in `FunctionRef.name` may not match the key used in `ref_counts`. Several mismatch patterns exist: (a) `FunctionRef.name = "MyClass.method"` (qualified) but `ref_counts` has `"method"` (bare) -- handled by `is_rescued_by_refcount`'s bare-name extraction. (b) `FunctionRef.name = "method"` but `ref_counts` has `"method"` -- exact match, works. (c) `FunctionRef.name = "MyClass.method"` but the actual tree-sitter identifier for the definition is `"method"` -- the bare-name check handles this. (d) Go: `FunctionRef.name = "(*T).Method"` but `ref_counts` has `"Method"` -- if the qualified name uses Go receiver syntax with `*` or parentheses, the bare-name extraction via `rsplit('.')` will produce `"Method"` from `"(*T).Method"`. This is correct. |
| **Severity** | **LOW** |
| **Concrete example** | Go function ref: `"(*UserService).GetUser"`. Bare name extraction: `rsplit('.') -> "GetUser"`. `ref_counts["GetUser"]` has the correct count. Works correctly. |
| **Mitigation** | The current `rsplit('.')` bare-name extraction handles the common cases. Edge case: Lua names like `module:method` use `:` not `.`. The `rsplit('.')` at `refcount.rs:128` will not split on `:`. |
| **Already handled?** | **MOSTLY YES.** `rsplit('.')` handles Python `Class.method`, Go `Type.Method`, Rust `Struct.method`. But **NOT** Lua's `module:method` syntax. For Lua, add `rsplit(':')` as a secondary check if `rsplit('.')` does not find a separator. |
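
A sketch of a bare-name extraction that also covers Lua. `bare_name` is hypothetical and differs slightly from the suggestion above: it splits on the last `.` or `:` in one step, so a name such as `a.b:method` also yields `method`. It assumes no supported language uses `:` inside a bare function name.

```rust
/// Returns the segment after the last `.` or `:` separator,
/// or the whole name when no separator is present.
pub fn bare_name(name: &str) -> &str {
    name.rsplit(|c: char| c == '.' || c == ':')
        .next()
        .unwrap_or(name)
}
```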

### 3.6 `handle` / `Handle` prefix in `is_entry_point_name` is too aggressive

| | |
|---|---|
| **Risk** | The entry point check at `dead.rs:278` excludes ANY function starting with `handle` or `Handle`. This is extremely aggressive. Functions like `handlebar_escape()`, `handle_count` (a variable-like name), or `Handler` (the type name, not a function) are all excluded. This causes **false negatives** -- truly dead functions with "handle" prefix are never reported. |
| **Severity** | **MEDIUM** |
| **Concrete example** | `def handlebar_template_render(template):` -- never called, but excluded because it starts with `handle`. Dead code is silently missed. |
| **Mitigation** | Tighten the prefix patterns. Instead of `starts_with("handle")`, use `starts_with("handle_")` (with underscore) for snake_case languages and `starts_with("Handle") && name[6..].starts_with(uppercase)` for camelCase languages. Or: remove `handle*` from entry point patterns entirely and rely on the decorator check (C8) to catch framework handlers, since most handler functions have decorators like `@app.route`. |
| **Already handled?** | **N/A (intentional).** The broad prefix match is the current behavior at `dead.rs:278-279`. It is a deliberate design decision rather than a bug, but the pattern may be too broad and should be revisited. |
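
The tightened prefix test could be sketched as below. `is_handler_name` is hypothetical; it requires the prefix to be followed by `_` or an uppercase letter, and in this sketch the bare name `handle` on its own does not match.

```rust
/// Returns true for handler-shaped names such as `handle_request`,
/// `handleRequest`, or `HandleLogin`, but not `handlebar_escape` or `Handler`.
pub fn is_handler_name(name: &str) -> bool {
    let rest = match name
        .strip_prefix("handle")
        .or_else(|| name.strip_prefix("Handle"))
    {
        Some(rest) => rest,
        None => return false,
    };
    // The prefix must end at a word boundary: `_` or a capital letter.
    match rest.chars().next() {
        Some('_') => true,
        Some(c) => c.is_uppercase(),
        None => false,
    }
}
```

Variable-like names such as `handle_count` still match, so this narrows the false negatives without removing them.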

### 3.7 The `is_entry_point_name` list includes common verbs

| | |
|---|---|
| **Risk** | The standard entry point list at `dead.rs:226-245` includes very common names: `run`, `start`, `load`, `configure`, `request`, `response`, `error`, `invoke`, `call`, `execute`, `init`, `destroy`, `service`, `app`. Any function with these exact names is excluded from dead code reports. In a large codebase, many of these are genuinely dead helper functions, not entry points. |
| **Severity** | **MEDIUM** |
| **Concrete example** | `def execute(query, params):` -- a dead database helper function. Excluded because `execute` is in the standard patterns. Never reported as dead. |
| **Mitigation** | Make the entry point list language-aware. For example, `init` should be an entry point for Go (called automatically) but not necessarily for Python. `ServeHTTP` is Go-specific. `doGet`/`doPost` are Java-specific. The current list applies ALL patterns to ALL languages, producing false negatives for languages where a name is not an entry point. Add a `language` parameter to `is_entry_point_name()`. |
| **Already handled?** | **NO.** `is_entry_point_name()` at `dead.rs:224` does not take a `language` parameter. All patterns apply universally. |
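
A language-aware signature might look like this sketch. The `Lang` enum and the per-language tables are hypothetical and deliberately tiny; they contain only the examples named in this section plus `main` for Go and Java, which is an assumption rather than something read from `dead.rs`.

```rust
/// Hypothetical language tag; the real crate has its own `Language` type.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Lang {
    Go,
    Java,
    Python,
}

/// Sketch: a name is an entry point only in languages where the
/// runtime or a standard framework calls it implicitly.
pub fn is_entry_point_name(name: &str, lang: Lang) -> bool {
    match lang {
        Lang::Go => matches!(name, "main" | "init" | "ServeHTTP"),
        Lang::Java => matches!(name, "main" | "doGet" | "doPost"),
        // No names are listed for Python in this sketch.
        Lang::Python => false,
    }
}
```

Generic verbs such as `execute` or `run` would then be reported unless a specific language table lists them.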

### 3.8 Cyclic dead code clusters (mutual recursion)

| | |
|---|---|
| **Risk** | Two or more functions that call each other but are never called from live code. Example: `fn a() { b(); }` and `fn b() { a(); }` -- neither is reachable from `main`. ref_count("a") = 2 (def + call from b), ref_count("b") = 2 (def + call from a). Both are rescued. **False negative.** The call-graph-based `dead_code_analysis()` at `dead.rs:37` also misses this because both appear in `called_functions`. |
| **Severity** | **HIGH** |
| **Concrete example** | A deprecated state machine with 5 mutually-recursive functions. None called from anywhere else. All have ref_count >= 2. None detected as dead by either algorithm. |
| **Mitigation** | Implement a cycle-detection post-pass: (1) After the main refcount analysis, collect all functions with ref_count > 1 that are NOT in the definitely-alive set (i.e., not called from entry points, not called from functions with high fan-in). (2) For each such function, check if ALL its references come from OTHER functions in this "suspicious" set. (3) If yes, the entire cluster is dead. This is equivalent to finding strongly-connected components (SCCs) in the call graph that have no incoming edges from outside the SCC. Tarjan's algorithm is already available at `analysis/tarjan.rs`. |
| **Already handled?** | **NO.** Neither algorithm detects cyclic dead code. The spec acknowledges this in "Open Questions" item 1. `tarjan.rs` exists but is not used for dead code analysis. |

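The SCC-based post-pass from 3.8 can be sketched as below. This is a self-contained illustration that reimplements Tarjan's algorithm over a plain adjacency list; it does not use the API of `analysis/tarjan.rs`, and the function names, the index-based graph representation, and the `entry_points` parameter are all assumptions.

```rust
/// Tarjan's algorithm: returns the SCC id of every node and the SCC count.
/// `adj[v]` lists the callees of function `v`.
fn scc_ids(adj: &[Vec<usize>]) -> (Vec<usize>, usize) {
    struct State<'a> {
        adj: &'a [Vec<usize>],
        index: Vec<Option<usize>>,
        low: Vec<usize>,
        on_stack: Vec<bool>,
        stack: Vec<usize>,
        next_index: usize,
        comp: Vec<usize>,
        comp_count: usize,
    }

    fn visit(v: usize, s: &mut State<'_>) {
        s.index[v] = Some(s.next_index);
        s.low[v] = s.next_index;
        s.next_index += 1;
        s.stack.push(v);
        s.on_stack[v] = true;

        let adj = s.adj; // copy the shared reference so `s` stays mutable
        for &w in &adj[v] {
            let w_index = s.index[w];
            match w_index {
                None => {
                    visit(w, s);
                    s.low[v] = s.low[v].min(s.low[w]);
                }
                Some(i) if s.on_stack[w] => s.low[v] = s.low[v].min(i),
                Some(_) => {}
            }
        }

        if s.index[v] == Some(s.low[v]) {
            // `v` is the root of an SCC: pop its members off the stack.
            loop {
                let w = s.stack.pop().expect("root is on the stack");
                s.on_stack[w] = false;
                s.comp[w] = s.comp_count;
                if w == v {
                    break;
                }
            }
            s.comp_count += 1;
        }
    }

    let n = adj.len();
    let mut s = State {
        adj,
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next_index: 0,
        comp: vec![0; n],
        comp_count: 0,
    };
    for v in 0..n {
        if s.index[v].is_none() {
            visit(v, &mut s);
        }
    }
    (s.comp, s.comp_count)
}

/// Functions in SCCs that contain no entry point and have no caller outside the SCC.
pub fn dead_clusters(adj: &[Vec<usize>], entry_points: &[usize]) -> Vec<usize> {
    let (comp, count) = scc_ids(adj);
    let mut externally_referenced = vec![false; count];
    for &e in entry_points {
        externally_referenced[comp[e]] = true;
    }
    for (caller, callees) in adj.iter().enumerate() {
        for &callee in callees {
            if comp[caller] != comp[callee] {
                externally_referenced[comp[callee]] = true;
            }
        }
    }
    (0..adj.len())
        .filter(|&v| !externally_referenced[comp[v]])
        .collect()
}
```

One limitation of this single pass: a cluster that is called only from another dead cluster still has an incoming edge and is not flagged. Repeating the pass after removing flagged nodes, or switching to plain reachability from entry points, would cover that case.
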
### 3.9 Inconsistent qualified name formats between definition and reference

| | |
|---|---|
| **Risk** | `collect_all_functions()` at `dead.rs:347` creates qualified names as `ClassName.method_name`. But tree-sitter identifier counting produces bare names only (`method_name`). The `is_rescued_by_refcount()` function at `refcount.rs:125` extracts the bare name via `rsplit('.')` and checks BOTH the bare name AND the full qualified name against `ref_counts`. Since `ref_counts` only has bare names, the qualified check always fails, and the bare check is the actual decision maker. This means ALL class methods are evaluated by their bare name, regardless of class context. |
| **Severity** | **Informational** (not a bug, but important to understand) |
| **Concrete example** | `MyClass.process` -- `is_rescued_by_refcount` checks `ref_counts["process"]` (the bare name). This aggregates ALL references to `process` across all classes. If ANY other class has a method called `process`, the ref_count is inflated. This is the same issue as 2.1/2.5 but seen from the implementation angle. |
| **Mitigation** | Same as 2.1/2.5 -- implement scope-qualified counting in the tree-sitter walk when class context is available. |
| **Already handled?** | **NO**, by design. See 2.5. |

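The lookup behavior described in 3.9 can be modeled in a few lines. This is a simplified model, not the code in `refcount.rs`: it omits the short-name check, and the `> 1` threshold (definition plus at least one other reference) is an assumption taken from the examples in 3.8.

```rust
use std::collections::HashMap;

/// Simplified model of the qualified-then-bare lookup; `is_rescued` is a
/// hypothetical name, not the real `is_rescued_by_refcount()` signature.
pub fn is_rescued(qualified: &str, ref_counts: &HashMap<String, usize>) -> bool {
    // `MyClass.process` -> `process`; names without a dot are returned unchanged.
    let bare = qualified.rsplit('.').next().unwrap_or(qualified);
    let count = |key: &str| ref_counts.get(key).copied().unwrap_or(0);
    // `ref_counts` only ever holds bare names, so the first check never fires
    // for class methods and the bare-name check decides the outcome.
    count(qualified) > 1 || count(bare) > 1
}
```

Because the map is keyed by bare name, `MyClass.process` and `OtherClass.process` always receive the same verdict.
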
---

## Summary: Severity Matrix

| ID | Risk | Severity | Type | Already Handled? |
|----|------|----------|------|-----------------|
| 1.1 | Language built-in names inflating refcount | MEDIUM | False Negative | NO |
| 1.2 | Tree-sitter parse errors | LOW | Correctness | NO |
| 1.3 | Generated code inflating refcount | MEDIUM | False Negative | NO |
| 1.4 | String interpolation identifiers | LOW | N/A (correct) | YES |
| 1.5 | Macro-generated functions | HIGH | False Positive/Negative | PARTIAL |
| 1.6 | Comments/strings containing names | LOW | N/A (correct) | YES |
| 1.7 | Function not in ref_counts map | LOW | False Positive | NO (by design) |
| 2.1 | Go `New`/`String`/`Error` collision | HIGH | False Negative | NO |
| 2.2 | Java bean `get*`/`set*` collision | MEDIUM | False Negative | NO |
| 2.3 | Rust `new`/`from`/`default` collision | HIGH | False Negative | NO |
| 2.4 | Python decorator renaming | LOW | N/A | YES (C8) |
| 2.5 | Cross-file same-name collision | HIGH | False Negative | NO (by design) |
| 2.6 | PHP magic methods (`__construct`) | MEDIUM | False Positive | NO |
| 2.7 | Ruby symbol-based method dispatch | MEDIUM | False Positive | NO |
| 3.1 | Large repo HashMap memory | LOW | Performance | NO |
| 3.2 | Unicode normalization | LOW | False Positive | NO |
| 3.3 | Empty/binary files | LOW | Performance | PARTIAL |
| 3.4 | Symlink cycles | LOW | False Negative | N/A |
| 3.5 | Qualified name format mismatch | LOW | Correctness | MOSTLY YES |
| 3.6 | `handle*` prefix too aggressive | MEDIUM | False Negative | YES (intentional) |
| 3.7 | Entry point list too broad | MEDIUM | False Negative | NO |
| 3.8 | Cyclic dead code clusters | HIGH | False Negative | NO |
| 3.9 | Qualified vs bare name in ref_counts | Informational | Architecture | NO (by design) |

---

## Recommended Priority Actions

### P0: Must Fix Before Shipping (HIGH severity, causing wrong results)

1. **[2.1, 2.3, 2.5] Per-language common-name exclusion from bare-name rescue.**
   Add a `HIGH_COLLISION_NAMES` map per language in `refcount.rs`. Names in this map
   are NOT rescued by bare-name refcount. They can only be rescued by qualified-name
   refcount or not at all. Implementation: ~30 lines in `is_rescued_by_refcount()`.

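2. **[2.6] PHP magic method exclusion.**
   The dunder check uses `starts_with("__") && ends_with("__")`, which misses PHP
   magic methods such as `__construct` that have no trailing underscores. Add a
   language-aware check: if PHP and `starts_with("__")`, exclude. Rated MEDIUM in
   the matrix, but it is the only P0 item that reports live code as dead (false
   positive). Implementation: ~10 lines in `dead_code_analysis_refcount()`.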

3. **[3.8] Cyclic dead code detection post-pass.**
   Use `tarjan.rs` to find SCCs in the call graph, then check if any SCC has no
   incoming edges from outside. Mark all functions in such SCCs as dead.
   Implementation: ~50 lines as a post-pass after the main refcount analysis.
   NOTE: This requires keeping the call graph available alongside ref_counts.

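P0 items 1 and 2 might be sketched as follows. The `Lang` enum, the function signatures, and the `> 1` rescue threshold are assumptions for illustration; the name lists contain only the names cited in 2.1 and 2.3, and a real `HIGH_COLLISION_NAMES` table would be longer.

```rust
use std::collections::HashMap;

/// Hypothetical language enum, standing in for whatever the crate already uses.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Lang {
    Go,
    Rust,
    Php,
    Other,
}

/// Names too common to be rescued by a bare-name count (from risks 2.1 and 2.3).
fn high_collision_names(lang: Lang) -> &'static [&'static str] {
    match lang {
        Lang::Go => &["New", "String", "Error"],
        Lang::Rust => &["new", "from", "default"],
        _ => &[],
    }
}

/// Sketch of a language-aware rescue check.
pub fn is_rescued_by_refcount(
    qualified: &str,
    lang: Lang,
    ref_counts: &HashMap<String, usize>,
) -> bool {
    let count = |key: &str| ref_counts.get(key).copied().unwrap_or(0);
    // A qualified-name match always rescues (once qualified counting exists).
    if count(qualified) > 1 {
        return true;
    }
    let bare = qualified.rsplit('.').next().unwrap_or(qualified);
    // High-collision names are never rescued by the bare-name count.
    let collides = high_collision_names(lang).iter().any(|&n| n == bare);
    !collides && count(bare) > 1
}

/// PHP magic methods start with `__` but do not end with it.
pub fn is_magic_method(name: &str, lang: Lang) -> bool {
    match lang {
        Lang::Php => name.starts_with("__"),
        _ => name.starts_with("__") && name.ends_with("__"),
    }
}
```

Keeping the name lists in a function that returns a static slice avoids building a map at startup; a `match` on the language is enough at this size.
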
### P1: Should Fix (MEDIUM severity, significant false negatives)

4. **[1.1] Language built-in name exclusion list.**
   Python: `type`, `list`, `dict`, `set`, `map`, `filter`, `input`, `print`, `open`, `range`, `len`, `str`, `int`, `float`, `bool`, `bytes`, `tuple`, `object`, `super`, `property`, `staticmethod`, `classmethod`.
   Implementation: add to `is_rescued_by_refcount()` with a language parameter.

5. **[2.2] Java/Kotlin bean method pattern exclusion.**
   Methods matching `get[A-Z]*`, `set[A-Z]*`, `is[A-Z]*` should not be rescued by
   bare-name refcount. Implementation: ~15 lines.

6. **[3.7] Language-aware entry point list.**
   Split the entry point list by language so Go-specific entry points (`init`, `ServeHTTP`)
   don't suppress dead code detection in Python/Java.

7. **[2.7] Ruby symbol counting.**
   Add `"simple_symbol"` or equivalent to `identifier_node_types(Language::Ruby)`.

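The name checks in P1 items 4 and 5 are small enough to sketch directly. Both function names are hypothetical, and the built-in list is copied from item 4 rather than from any existing table in the codebase.

```rust
/// Python built-in names from item 4; a user-defined function with one of these
/// names should not be rescued by its bare-name count.
const PYTHON_BUILTINS: &[&str] = &[
    "type", "list", "dict", "set", "map", "filter", "input", "print", "open",
    "range", "len", "str", "int", "float", "bool", "bytes", "tuple", "object",
    "super", "property", "staticmethod", "classmethod",
];

pub fn is_python_builtin(name: &str) -> bool {
    PYTHON_BUILTINS.contains(&name)
}

/// Matches the `get[A-Z]*`, `set[A-Z]*`, `is[A-Z]*` bean accessor patterns.
pub fn is_bean_accessor(name: &str) -> bool {
    ["get", "set", "is"].iter().any(|prefix| {
        name.strip_prefix(*prefix)
            .and_then(|rest| rest.chars().next())
            .map_or(false, |c| c.is_ascii_uppercase())
    })
}
```

Requiring an uppercase letter after the prefix keeps ordinary words such as `settings` or `island` out of the accessor pattern.
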
### P2: Nice to Have (LOW severity, edge cases)

8. **[1.2] Tree-sitter error node check.** Skip or warn on files with parse errors.
9. **[1.3] Generated code file exclusion.** Filter `*_pb2.py`, `*.pb.go`, `*_generated.*`.
10. **[3.1] FxHashMap optimization.** Replace `HashMap` with `FxHashMap`.
11. **[3.2] Unicode NFC normalization.** Normalize identifier strings.
12. **[3.5] Lua colon-separated method names.** Add `rsplit(':')` fallback.

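Two of the P2 items (9 and 12) reduce to short string checks, sketched below. The function names are hypothetical, and the file patterns are exactly the three globs listed in item 9.

```rust
use std::path::Path;

/// True for file names matching `*_pb2.py`, `*.pb.go`, or `*_generated.*`.
pub fn is_generated_file(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| {
            name.ends_with("_pb2.py")
                || name.ends_with(".pb.go")
                || name.contains("_generated.")
        })
}

/// Bare method name, splitting on `.` or on Lua's `:` method separator.
pub fn bare_name(qualified: &str) -> &str {
    qualified
        .rsplit(|c: char| c == '.' || c == ':')
        .next()
        .unwrap_or(qualified)
}
```

Files matched by `is_generated_file` would be skipped before identifier counting, so their references no longer inflate `ref_counts`.
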
---

## Test Cases to Add

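The skeletons below cover the P0 items [2.1], [2.3], [2.6], and [3.8], plus P1 item [1.1]. Assertions are left as comments because they depend on the language-aware `is_rescued_by_refcount()` and the cycle post-pass, neither of which exists yet: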

```rust
use std::collections::HashMap;

// P0: [2.1] Go common name collision
#[test]
fn test_go_common_name_not_rescued() {
    let mut ref_counts = HashMap::new();
    ref_counts.insert("String".to_string(), 50); // 50 structs implement String()
    // "String" should NOT be rescued despite high refcount (Go common name)
    // Need language-aware is_rescued_by_refcount
}

// P0: [2.3] Rust common name collision
#[test]
fn test_rust_new_not_rescued_by_bare_name() {
    let mut ref_counts = HashMap::new();
    ref_counts.insert("new".to_string(), 100); // 100 structs have new()
    // "MyStruct.new" should NOT be rescued by bare name "new"
}

// P0: [2.6] PHP magic methods
#[test]
fn test_php_magic_methods_excluded() {
    // __construct should be excluded even though it doesn't match __X__ pattern
    // __call, __get, __set, __toString, etc.
}

// P0: [3.8] Cyclic dead code
#[test]
fn test_cyclic_dead_code_detected() {
    // fn a() { b(); } fn b() { a(); } -- neither reachable from main
    // Both should be flagged as dead despite ref_count > 1
}

// P1: [1.1] Python built-in name collision
#[test]
fn test_python_builtin_name_not_rescued() {
    let mut ref_counts = HashMap::new();
    ref_counts.insert("type".to_string(), 200); // built-in used everywhere
    // User-defined "type" function should NOT be rescued
}
```

---

**END OF ADVERSARIAL PREMORTEM**