## Agents
- Use `jq` to analyze the JSON AST output from solc.
- [poolmanager.json](./poolmanager.json) - AST generated by solc (single-file build fixture for PoolManager).
## Prerequisites: AST Identity Model
Before working on any feature that touches goto-definition, references, call hierarchy,
implementation, rename, or the caching system, you **must** understand how file IDs and
node IDs work. Getting this wrong causes cross-build collisions that are extremely hard
to debug (a function in one build silently maps to a completely different function in
another build).
### The two ID types
| Type | Defined at | Representation | Assigned by | Stable across compilations? |
|---|---|---|---|---|
| `FileId(u64)` | `types.rs:24` | unsigned 64-bit | `PathInterner` (canonical) | **Yes**: same path always gets same ID |
| `NodeId(i64)` | `types.rs:4` | signed 64-bit | solc (per-compilation) | **No**: same function can get different IDs |
There is also `SolcFileId(String)` — a string wrapper used as HashMap keys matching
solc's JSON output (e.g. `"0"`, `"34"`). It is the stringified form of a file ID.
**Node IDs are signed** because solc uses negative IDs for built-in symbols (`-1` for
`abi`, `-15` for `msg`, `-18` for `require`, `-28` for `this`).
### The `src` string format
Every AST node has a `src` field in the format `"offset:length:fileId"`:
- `offset` — byte offset from the start of the source file
- `length` — byte length of the source range
- `fileId` — which source file this location belongs to
Parsed by `SourceLoc::parse()` in `types.rs`. After canonicalization, `fileId` is a
canonical `FileId` from the `PathInterner`, not solc's original per-compilation ID.
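
As a rough illustration of the format, the sketch below splits a `src` string into its three components. `SrcLoc` and `parse_src` are hypothetical names used only for this example; they are not the real `SourceLoc::parse()` and may differ from it in types and error handling.

```rust
/// Hypothetical stand-in for the real `SourceLoc`; field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SrcLoc {
    offset: usize, // byte offset from the start of the file
    length: usize, // byte length of the range
    file_id: u64,  // file the range belongs to
}

/// Parse `"offset:length:fileId"`.
/// Returns `None` unless the input is exactly three non-negative integers.
fn parse_src(src: &str) -> Option<SrcLoc> {
    let mut parts = src.split(':');
    let offset = parts.next()?.parse::<usize>().ok()?;
    let length = parts.next()?.parse::<usize>().ok()?;
    let file_id = parts.next()?.parse::<u64>().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(SrcLoc { offset, length, file_id })
}
```

With this sketch, `parse_src("1234:56:7")` yields offset 1234, length 56, and file ID 7, while a malformed string such as `"12:34"` yields `None`.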
### Why file IDs are unstable from solc
Solc assigns file IDs sequentially based on input order. If you compile `Foo.sol` first,
it gets ID 0. If you compile `Bar.sol` first, `Foo.sol` gets a different ID. A single-file
build of `PoolManager.sol` produces different file IDs than a full project build that
includes all 160 source files.
### How we solve file ID instability: PathInterner
`PathInterner` (`types.rs:718`) is a project-wide, append-only table that assigns canonical
`FileId` values from file paths. It lives on `ForgeLsp` behind `Arc<RwLock<PathInterner>>`.
**The invariant:** Once a path is interned, it keeps the same ID for the lifetime of the
session. Every `CachedBuild::new()` call (the only production constructor for fresh builds)
does this:
1. Calls `interner.build_remap(&solc_id_to_path)` — for each file in this compilation,
interns its path and builds a translation table `{solc_file_id → canonical_FileId}`.
2. Calls `canonicalize_node_info()` on every `NodeInfo` — rewrites the `fileId` component
in `src`, `name_location`, `name_locations`, and `member_location` strings.
3. Rewrites `external_refs` keys the same way.
4. Sets `id_to_path_map = interner.to_id_to_path_map()` — the canonical map.
After this, **all builds share the same file-ID space**. You can safely resolve any `src`
string from any build using any build's `id_to_path_map`. This is the foundation that
makes merging builds and cross-build `src` lookups safe.
**Key code path:** `goto.rs` → `CachedBuild::new()` → `build_remap()` + `canonicalize_node_info()`
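
The remapping steps above can be sketched as a small append-only interner. This is a simplified model, not the real `PathInterner`: the `Interner` type, the use of plain `u64` keys for solc file IDs (the real code uses `SolcFileId(String)`), and the `canonicalize_src` helper are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified interner; the real `PathInterner` may differ.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u64>,
    paths: Vec<String>, // index == canonical ID, entries are never removed
}

impl Interner {
    /// Return the existing ID for `path`, or append it and return the new ID.
    fn intern(&mut self, path: &str) -> u64 {
        if let Some(&id) = self.ids.get(path) {
            return id;
        }
        let id = self.paths.len() as u64;
        self.paths.push(path.to_string());
        self.ids.insert(path.to_string(), id);
        id
    }

    /// Build `{solc_file_id -> canonical_id}` for one compilation.
    fn build_remap(&mut self, solc_id_to_path: &HashMap<u64, String>) -> HashMap<u64, u64> {
        // Sort so that new paths are interned in a deterministic order.
        let mut entries: Vec<_> = solc_id_to_path.iter().collect();
        entries.sort();
        entries
            .into_iter()
            .map(|(&solc_id, path)| (solc_id, self.intern(path)))
            .collect()
    }
}

/// Rewrite the trailing file-ID component of a `src` string.
fn canonicalize_src(src: &str, remap: &HashMap<u64, u64>) -> Option<String> {
    let (prefix, file_id) = src.rsplit_once(':')?;
    let canonical = remap.get(&file_id.parse::<u64>().ok()?)?;
    Some(format!("{}:{}", prefix, canonical))
}
```

Two compilations that list the same files in a different order get different remap tables, but both tables send a given path to the same canonical ID, which is what makes `src` strings comparable across builds.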
### Why node IDs are unstable across compilations
Solc assigns node IDs as a monotonically increasing counter during AST construction. The
counter's value depends on how many nodes have been processed before a given declaration.
When the compilation closure changes (different files in scope), the same function gets a
different numeric ID.
**Concrete example from our debugging:**
- File build of `PoolManager.sol`: `swap` function = node ID 616
- Sub-cache build of a library: node ID 616 = a completely different function
- Searching all files for bare node ID 616 across builds would return the wrong function
This is explicitly documented in code:
- `references.rs:408`: "Node IDs are not stable across builds, but byte offsets within a file are."
- `lsp.rs:103`: "Each sub-cache has its own node ID space — matching across caches is done by absolute file path + byte offset, not by node ID."
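
To make the counter behavior concrete, here is a deliberately simplified toy model. It is not how solc is implemented: it numbers only named declarations, in input order, with one shared counter. `assign_ids` and the file and function names in the usage note are hypothetical.

```rust
use std::collections::HashMap;

/// Toy model: number declarations with a single counter, in input order.
/// Keys are `(file, declaration_name)`; values are the assigned IDs.
fn assign_ids<'a>(files: &[(&'a str, &[&'a str])]) -> HashMap<(&'a str, &'a str), i64> {
    let mut next_id = 0;
    let mut ids = HashMap::new();
    for &(file, decls) in files {
        for &decl in decls {
            ids.insert((file, decl), next_id);
            next_id += 1;
        }
    }
    ids
}
```

In this model, compiling only `PoolManager.sol` with declarations `initialize` and `swap` gives `swap` ID 1. Compiling only a three-function library gives its second function ID 1 as well. The same number therefore names two unrelated functions in two compilations.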
### The stable anchor: file path + byte offset
The server uses `(absolute_file_path, byte_offset)` as the cross-build-safe identifier
for any source location. This pair is stable because:
- File paths don't change between compilations
- Byte offsets are properties of the source text, not the compilation
**The pattern for cross-build lookups:**
```
Step 1: In the originating build, resolve to (abs_path, byte_offset)
→ resolve_target_location() in references.rs:411
Step 2: In each target build, re-resolve to that build's node ID
→ byte_to_id(build.nodes, abs_path, byte_offset) in references.rs:131
```
`byte_to_id()` finds the innermost AST node at a byte position using span containment:
for every node in the file, checks `offset <= position < offset + length`, then picks the
narrowest (smallest `length`) match. This gives you the build-local `NodeId` for the same
source location.
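
A minimal sketch of the span-containment rule described above, assuming the nodes of one file have already been flattened into a slice. `Span` and `innermost_node` are hypothetical names; the real `byte_to_id()` takes the build's node map and a path instead.

```rust
/// Hypothetical flattened node record; the real `NodeInfo` carries more fields.
struct Span {
    node_id: i64,
    offset: usize,
    length: usize,
}

/// Innermost node containing `position`: among all spans with
/// `offset <= position < offset + length`, pick the smallest `length`.
fn innermost_node(spans: &[Span], position: usize) -> Option<i64> {
    spans
        .iter()
        .filter(|s| s.offset <= position && position < s.offset + s.length)
        .min_by_key(|s| s.length)
        .map(|s| s.node_id)
}
```

Because every enclosing node also contains the position, the smallest-length filter is what selects the identifier or declaration instead of the surrounding contract.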
### When bare node IDs ARE safe
Within a single build's data, node IDs are unique across every file in that build and safe to use freely.
All of these are safe:
- `build.nodes[abs_path][node_id]` — lookup within one build
- `build.decl_index[node_id]` — typed declaration lookup within one build
- `node_info.referenced_declaration` — following a reference within one build
- `build.base_function_implementation[node_id]` — equivalence lookup within one build
- `find_node_info(&build.nodes, node_id)` — search all files within one build
### When bare node IDs are DANGEROUS
Any time you hold a `NodeId` from build A and look it up in build B:
- `builds.iter().find_map(|b| find_node_info(&b.nodes, node_id))` — **WRONG**, leaks
node IDs across builds. A sub-cache may have a completely different function at the
same numeric ID.
- `other_build.decl_index.get(&node_id)` — **WRONG** unless you know both builds compiled
the same file and solc assigned the same IDs (which is true for file build vs project
build of the same file, but NOT for sub-caches).
### Node identity verification
A `NodeId` alone is ambiguous across builds, but a `NodeId` **plus its `NodeInfo`**
carries enough metadata to prove identity. Every node has:
- `name_location` — `"offset:length:canonicalFileId"`, a globally unique position
- The source text at that position — the node's name
Since canonical file IDs are stable (PathInterner) and byte offsets are properties of
the source text, checking `(file_path, name_offset, name_text)` is an O(1) identity
proof. This is implemented as `verify_node_identity()` in `call_hierarchy.rs`:
```rust
// O(1) identity check: does node_id in this build refer to the expected entity?
verify_node_identity(
    &build.nodes,
    node_id,
    expected_abs_path,    // which file
    expected_name_offset, // byte offset of name_location
    expected_name,        // function/modifier/contract name
) -> bool
```
The check is: look up `build.nodes[abs_path][node_id]`, parse its `name_location`
offset, compare against the expected offset, then read the source bytes at that span
to confirm the name matches. If all three match (file + offset + name), this is
guaranteed to be the same source entity regardless of which compilation produced
the build.
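
The three-part check can be sketched as follows. This is an illustration under stated assumptions, not the code in `call_hierarchy.rs`: `NameSpan`, `verify_identity`, and the extra `sources` map (path to file text) are hypothetical, and the real function may obtain the source bytes differently.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down node record; only the name span is modeled.
struct NameSpan {
    offset: usize,
    length: usize,
}

/// True only if the node exists under `abs_path`, its name starts at
/// `expected_offset`, and the source bytes at that span spell `expected_name`.
fn verify_identity(
    nodes: &HashMap<String, HashMap<i64, NameSpan>>,
    sources: &HashMap<String, String>, // assumption: path -> file text
    node_id: i64,
    abs_path: &str,
    expected_offset: usize,
    expected_name: &str,
) -> bool {
    // File + ID: two hash lookups.
    let span = match nodes.get(abs_path).and_then(|file| file.get(&node_id)) {
        Some(span) => span,
        None => return false,
    };
    // Offset: the name must start where the caller expects.
    if span.offset != expected_offset {
        return false;
    }
    // Name: compare the source bytes at the name span.
    sources
        .get(abs_path)
        .and_then(|text| text.as_bytes().get(span.offset..span.offset + span.length))
        .map_or(false, |bytes| bytes == expected_name.as_bytes())
}
```

A mismatch in any one of the three components rejects the candidate, so a sub-cache node that merely shares the numeric ID does not pass.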
### The resolve pattern (call hierarchy)
When iterating multiple builds to find a target function, use
`resolve_target_in_build()` (`call_hierarchy.rs`):
```rust
for build in &builds {
    let ids = resolve_target_in_build(
        build, node_id, target_abs, target_name, target_name_offset,
    );
    // ids is empty if this build doesn't contain the target,
    // or contains the verified node ID(s) for the target.
}
```
This uses a two-tier strategy:
1. **Fast path (O(1)):** `verify_node_identity()` — if the numeric ID exists and
passes identity verification, accept it immediately.
2. **Slow path (O(n)):** `byte_to_id()` — if the ID doesn't exist or fails
verification (e.g. sub-cache with a different function at the same numeric ID),
re-resolve by byte offset using span containment.
This replaces the older pattern of `contains_key` + inline name/position scan.
It is used in both `callHierarchy/incomingCalls` and `callHierarchy/outgoingCalls`.
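
The two-tier strategy can be sketched for a single file as below. This is a reduced model rather than the real `resolve_target_in_build()`: it returns at most one ID instead of a list, checks only the name offset in the fast path (no path or name-text comparison), and the `Node` and `resolve_in_build` names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical node record: declaration span plus the offset of its name.
#[derive(Clone, Copy)]
struct Node {
    offset: usize,
    length: usize,
    name_offset: usize,
}

/// Return the build-local ID of the declaration whose name starts at `name_offset`.
fn resolve_in_build(
    nodes: &HashMap<i64, Node>,
    candidate_id: i64,
    name_offset: usize,
) -> Option<i64> {
    // Fast path: the candidate ID exists and its name sits at the expected offset.
    if nodes
        .get(&candidate_id)
        .map_or(false, |n| n.name_offset == name_offset)
    {
        return Some(candidate_id);
    }
    // Slow path: scan every node and keep the narrowest span containing the offset.
    nodes
        .iter()
        .filter(|(_, n)| n.offset <= name_offset && name_offset < n.offset + n.length)
        .min_by_key(|(_, n)| n.length)
        .map(|(&id, _)| id)
}
```

When the candidate ID points at a different function, as in a sub-cache, the fast path fails on the offset comparison and the scan returns whichever ID that build assigned to the same source position.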
### Deduplication across builds
When the same function appears in multiple builds (file build + project build both
contain `PoolManager.swap`), the results will have different `NodeId`s but the same
source position. Always dedup by **source position** (e.g. `selectionRange.start`),
never by node ID.
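
A minimal sketch of position-based deduplication, assuming results have already been collected from all builds. `Hit` and `dedup_by_position` are hypothetical; `start` stands in for `selectionRange.start`, and including the file path in the key is an assumption made so that equal positions in different files are not merged.

```rust
use std::collections::HashSet;

/// Hypothetical result item collected from one build.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    node_id: i64,      // build-local, NOT used for dedup
    path: String,      // absolute file path
    start: (u32, u32), // (line, character) of the selection range start
}

/// Keep the first hit for each `(path, start)` pair, preserving input order.
fn dedup_by_position(hits: Vec<Hit>) -> Vec<Hit> {
    let mut seen = HashSet::new();
    hits.into_iter()
        .filter(|h| seen.insert((h.path.clone(), h.start)))
        .collect()
}
```

Two hits for the same function from a file build and another build collapse into one here even though their `node_id` values differ.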
### TreeSitter as a complementary system
TreeSitter operates on the **live buffer text** (including unsaved edits) and is
completely independent of solc's AST and its IDs. It is used for:
- Dirty-file goto-definition fallback (when AST byte offsets are stale)
- Document symbols, semantic tokens, folding ranges, selection ranges
- Signature help (finding the enclosing call expression)
- Code actions, highlight, rename (identifier collection)
TreeSitter nodes are identified by byte ranges in the current buffer, not by any
persistent ID. They are always re-parsed from the current text.
### The three build types
| Build type | Created via | Scope | Node ID space |
|---|---|---|---|
| **File build** | `get_or_fetch_build()` → `CachedBuild::new()` | Target file + its imports | Shared with project build for same file |
| **Project build** | `ensure_project_cached_build()` | All src + test + script files | Shared with file builds for overlapping files |
| **Sub-cache** | `load_lib_cache()` → `from_reference_index()` | Library sub-project files | **Isolated**: different IDs for same functions |
The `builds` vector in LSP handlers is typically:
```rust
let mut builds = vec![&file_build];
if let Some(ref pb) = project_build { builds.push(pb); }
for sc in sub_caches.iter() { builds.push(sc); }
```
**Key rule:** File build and project build share node IDs for the same file.
Sub-caches do NOT — always use the scoped lookup pattern for sub-caches.
### Summary of safe patterns
| Scenario | Safe pattern | Unsafe pattern |
|---|---|---|
| Cross-build function lookup | `resolve_target_in_build()` (verify + fallback) | Bare `NodeId` across builds |
| Cross-build node identity | `verify_node_identity(nodes, id, path, offset, name)` | `contains_key()` without validation |
| Cross-build `src` resolution | Any build's `id_to_path_map` (canonical) | Raw solc file IDs |
| Dedup across builds | Source position (`Range.start`) | Node ID comparison |
| Sub-cache node lookup | `verify_node_identity()` → `byte_to_id()` fallback | `find_node_info()` across all files |
| Within single build | Free use of `NodeId` everywhere | N/A |
### Fixture reference
- [poolmanager.json](./poolmanager.json) — single-file solc AST output for `PoolManager.sol`.
Node IDs in this fixture are from a file-level build. In a full project build, the same
functions will have the same IDs (same file), but a sub-cache build of a different library
will have a completely different mapping of IDs to functions.
Use `jq` to explore the AST:
```sh
# Find all FunctionDefinition nodes and their IDs
jq '[.. | objects | select(.nodeType == "FunctionDefinition") | {id, name}]' poolmanager.json
# Find a specific node's referencedDeclaration targets
# Show the source_id_to_path mapping (solc's per-compilation file IDs)
```
## Testing and Debugging
Always use `lsp-bench` as the first choice when you want to debug LSP methods and their output. The lsp-bench repo is https://github.com/mmsaki/lsp-bench (local clone path: `../lsp-bench`).
There are many examples in `./benchmarks` showing how to write a simple YAML config for your needs.
### lsp-bench tips for cross-build features
For features that depend on cross-file data (references, call hierarchy, implementation),
you need a full project index. Add this to your benchmark config:
```yaml
initializeSettings:
  projectIndex:
    fullProjectScan: true
```
Then use `waitForProgressToken` to wait for the index to complete:
```yaml
- method: callHierarchy/incomingCalls
  waitForProgressToken: "solidity/projectIndexFull"
```
Phase 1 (`solidity/projectIndex`) covers src-only files. Phase 2 (`solidity/projectIndexFull`)
covers src + test + script. Cross-file incoming callers require phase 2.
## Building
Always build with the `--release` flag.
## Documentation Sync Rule
When adding or changing a struct field, LSP method, named data structure, or feature behavior in `src/`, update the corresponding reference page in `docs/pages/reference/` in the same commit.
Also keep these files in sync with each other whenever LSP methods or features change:
- `FEATURES.md` (root) and `docs/pages/docs/features.md` must always match
- `CHANGELOG.md` (root) and `docs/pages/changelog.md` must always match
After any doc changes, run `bun run docs:publish` to deploy to Cloudflare Pages.