# Documentation
This project produces four binaries:

| Binary | Source | Purpose |
|--------|--------|---------|
| `lsp-bench` | `src/main.rs` | Run LSP benchmarks, produce JSON snapshots |
| `gen-readme` | `src/gen_readme.rs` | Read a JSON snapshot, generate `README.md` |
| `gen-analysis` | `src/gen_analysis.rs` | Read a JSON snapshot, generate an analysis report |
| `gen-delta` | `src/gen_delta.rs` | Read a JSON snapshot, generate a compact delta comparison table |
## Quick Start
```sh
git clone --recursive https://github.com/mmsaki/solidity-lsp-benchmarks.git
cd solidity-lsp-benchmarks
cargo build --release
./target/release/lsp-bench init # generates benchmark.yaml
```
Edit `benchmark.yaml` to add your servers and choose which benchmarks to run, then:
```sh
./target/release/lsp-bench # run benchmarks (generates README if configured)
```
To generate a README manually from a specific JSON snapshot:
```sh
./target/release/gen-readme benchmarks/2026-02-13T01-45-26Z.json
```
The generated config uses `examples/Counter.sol` (included in the repo) as the default benchmark target -- a small contract with NatSpec comments and intentional unused variables to trigger diagnostics.
## Prerequisites
Install any LSP servers you want to benchmark. You only need the ones listed in your config:
- [solidity-language-server](https://github.com/mmsaki/solidity-language-server): `cargo install solidity-language-server`
- [solc](https://docs.soliditylang.org/en/latest/installing-solidity.html)
- [nomicfoundation-solidity-language-server](https://github.com/NomicFoundation/hardhat-vscode): `npm i -g @nomicfoundation/solidity-language-server`
- [vscode-solidity-server](https://github.com/juanfranblanco/vscode-solidity): `npm i -g vscode-solidity-server`
- [solidity-ls](https://github.com/qiuxiang/solidity-ls): `npm i -g solidity-ls`
Servers not found on `$PATH` are automatically skipped during benchmarks.
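Before running, you can check which of these binaries are actually on `$PATH` (adjust the names to match the `cmd` values in your config):
```sh
# Report which server binaries are installed; missing ones will simply be skipped
for cmd in solidity-language-server solc \
           nomicfoundation-solidity-language-server \
           vscode-solidity-server solidity-ls; do
  command -v "$cmd" >/dev/null && echo "found:   $cmd" || echo "missing: $cmd (will be skipped)"
done
```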
## Commands
| Command | Description |
|---------|-------------|
| `lsp-bench` | Run benchmarks from config |
| `lsp-bench init` | Generate a `benchmark.yaml` template (won't overwrite an existing file) |
## Configuration
Benchmarks are configured via a YAML file. By default, `lsp-bench` looks for `benchmark.yaml` in the current directory. Use `-c` to point to a different config.
### Generating a config
```sh
lsp-bench init # creates benchmark.yaml
lsp-bench init -c my-bench.yaml # creates at a custom path
```
This writes a commented template targeting `examples/Counter.sol` with placeholder server entries. Edit it to add your servers and (optionally) point to a different project/file.
### Config structure
```yaml
# Project root containing the Solidity files
project: examples
# Target file to benchmark (relative to project root)
file: Counter.sol
# Target position for position-based benchmarks (0-based, see below)
line: 21
col: 8
# Benchmark settings
iterations: 10
warmup: 2
timeout: 10 # seconds per request
index_timeout: 15 # seconds for server to index/warm up
output: benchmarks # directory for JSON results
# Which benchmarks to run
benchmarks:
- all
# Generate a report after benchmarks (omit to skip)
# report: REPORT.md
# report_style: delta # delta (default), readme, or analysis
# LSP servers to benchmark
servers:
- label: mmsaki
  description: Solidity Language Server by mmsaki
  link: https://github.com/mmsaki/solidity-language-server
  cmd: solidity-language-server
  args: []
- label: solc
  description: Official Solidity compiler LSP
  link: https://docs.soliditylang.org
  cmd: solc
  args: ["--lsp"]
```
### Config fields
| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `project` | yes | -- | Path to the project root (e.g. a git submodule) |
| `file` | yes | -- | Solidity file to benchmark, relative to `project` |
| `line` | no | 102 | Target line for position-based benchmarks (0-based) |
| `col` | no | 15 | Target column for position-based benchmarks (0-based) |
| `iterations` | no | 10 | Number of measured iterations per benchmark |
| `warmup` | no | 2 | Number of warmup iterations (discarded) |
| `timeout` | no | 10 | Timeout per LSP request in seconds |
| `index_timeout` | no | 15 | Time for server to index/warm up in seconds |
| `output` | no | `benchmarks` | Directory for JSON result files |
| `benchmarks` | no | all | List of benchmarks to run (see below) |
| `report` | no | -- | Output path for the generated report (omit to skip report generation) |
| `report_style` | no | `delta` | Report format: `delta`, `readme`, or `analysis` |
| `response` | no | `80` | Response output: `full` (no truncation) or a number (truncate to N chars) |
| `servers` | yes | -- | List of LSP servers to benchmark |
### Selecting benchmarks
The `benchmarks` field controls which benchmarks to run. Use `all` to run everything, or list specific ones:
```yaml
# Run all benchmarks
benchmarks:
- all
# Or pick specific ones
benchmarks:
- initialize
- textDocument/diagnostic
- textDocument/definition
- textDocument/hover
```
If omitted, all benchmarks are run.
Valid benchmark names: `all`, `initialize`, `textDocument/diagnostic`, `textDocument/definition`, `textDocument/declaration`, `textDocument/hover`, `textDocument/references`, `textDocument/documentSymbol`, `textDocument/documentLink`.
### Response truncation
The `response` field controls how much of each LSP response is stored in the JSON output. By default, responses are truncated to 80 characters.
```yaml
# Full response, no truncation
response: full
# Truncate to 200 characters
response: 200
```
When omitted, defaults to 80.
This affects both the per-iteration `response` field and the top-level `response` summary in the JSON output. Use `response: full` when you need to inspect the complete LSP response for correctness (e.g. verifying that Go to Definition returns the right location).
### Server fields
| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `label` | yes | -- | Short name shown in results (e.g. `solc`) |
| `description` | no | `""` | Longer description for the README |
| `link` | no | `""` | URL to the server's project page |
| `cmd` | yes | -- | Command to spawn the server (also the binary name when using `commit`) |
| `args` | no | `[]` | Command-line arguments passed to `cmd` |
| `commit` | no | -- | Git ref (branch, tag, or SHA) to check out and build from |
| `repo` | no | -- | Path to the git repo to build from (required when `commit` is set) |
### Building from commit
When `commit` is set on a server, `lsp-bench` will:
1. `git checkout <commit>` in the `repo` directory
2. `cargo build --release`
3. Use the built binary at `<repo>/target/release/<cmd>`
4. Restore the repo to its original branch/ref afterward
This is useful for comparing performance across branches or commits without manually building each one.
```yaml
servers:
- label: baseline
  cmd: solidity-language-server
  commit: main
  repo: /path/to/solidity-language-server
- label: my-branch
  cmd: solidity-language-server
  commit: fix/position-encoding
  repo: /path/to/solidity-language-server
```
The `cmd` field is used as the binary name inside `target/release/`. The `repo` field must point to a Rust project with a `Cargo.toml`. Both servers can share the same repo -- `lsp-bench` builds them sequentially and restores the original ref after each build.
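By hand, the build-from-commit flow corresponds roughly to the steps below (paths and refs are the placeholders from the example above; unlike `lsp-bench`, this sketch does not restore your original ref afterward):
```sh
# Rough manual equivalent of one server entry with `commit` set
cd /path/to/solidity-language-server          # the `repo` path
git checkout fix/position-encoding            # the `commit` ref
cargo build --release
./target/release/solidity-language-server     # the binary named by `cmd`
```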
### Target position (line and col)
`line` and `col` use **0-based indexing**, matching the [LSP specification](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#position). This means they are offset by 1 from what your editor displays:
| Config value | Editor display |
|--------------|----------------|
| `line: 0` | line 1 |
| `line: 102` | line 103 |
| `col: 0` | column 1 |
| `col: 15` | column 16 |
To find the right values, open the file in your editor, place the cursor on the identifier you want to benchmark, and subtract 1 from both the line and column numbers.
For example, targeting `number` inside `setNumber` in Counter.sol:
```
line 22 (editor): number = newNumber;
col 9 (editor):   ^
```
In the config, this becomes `line: 21`, `col: 8`.
Another example -- targeting `TickMath` in Pool.sol:
```
line 103 (editor): tick = TickMath.getTickAtSqrtPrice(sqrtPriceX96);
col 16 (editor):          ^
```
In the config: `line: 102`, `col: 15`.
The position should land on an identifier that LSP methods can act on -- a type name, function call, variable, etc. This is used by `textDocument/definition`, `textDocument/declaration`, `textDocument/hover`, and `textDocument/references` benchmarks. The `initialize`, `textDocument/diagnostic`, `textDocument/documentSymbol`, and `textDocument/documentLink` benchmarks ignore the position.
### Example configs
**Minimal** -- single server, just initialize and diagnostics:
```yaml
project: examples
file: Counter.sol
line: 21
col: 8
benchmarks:
- initialize
- textDocument/diagnostic
servers:
- label: solc
  cmd: solc
  args: ["--lsp"]
```
**Quick iteration** -- fast feedback during development:
```yaml
project: examples
file: Counter.sol
line: 21
col: 8
iterations: 1
warmup: 0
timeout: 5
index_timeout: 10
benchmarks:
- initialize
- textDocument/hover
servers:
- label: mmsaki
  cmd: solidity-language-server
```
**Full suite** -- all benchmarks against Uniswap V4-core:
```yaml
project: v4-core
file: src/libraries/Pool.sol
line: 102 # "TickMath" (editor line 103, col 16)
col: 15
iterations: 10
warmup: 2
output: benchmarks/v4-core
benchmarks:
- all
report: benchmarks/v4-core/README.md
report_style: readme
servers:
- label: mmsaki
  cmd: solidity-language-server
- label: solc
  cmd: solc
  args: ["--lsp"]
```
**Per-commit comparison** -- benchmark two branches of the same server with a delta table:
```yaml
project: examples
file: Counter.sol
line: 21
col: 8
report: DELTA.md
servers:
- label: baseline
  cmd: solidity-language-server
  commit: main
  repo: /path/to/solidity-language-server
- label: my-branch
  cmd: solidity-language-server
  commit: fix/position-encoding
  repo: /path/to/solidity-language-server
```
**Long timeouts** -- for slow servers that need more indexing time:
```yaml
project: v4-core
file: src/libraries/Pool.sol
line: 102
col: 15
timeout: 30
index_timeout: 60
benchmarks:
- all
servers:
- label: nomicfoundation
  description: Hardhat/Nomic Foundation Solidity Language Server
  link: https://github.com/NomicFoundation/hardhat-vscode
  cmd: nomicfoundation-solidity-language-server
  args: ["--stdio"]
```
### Running benchmarks
```sh
lsp-bench # uses benchmark.yaml in current directory
lsp-bench -c pool.yaml # uses a different config file
lsp-bench -c configs/fast.yaml # config can be in any path
lsp-bench -s solc -s mmsaki # only run solc and mmsaki from config
```
### CLI overrides
Some config values can be overridden from the command line. CLI flags take precedence over the config file.
| Flag | Overrides |
|------|-----------|
| `-c, --config <PATH>` | Config file path (default: `benchmark.yaml`) |
| `-n, --iterations <N>` | `iterations` |
| `-w, --warmup <N>` | `warmup` |
| `-t, --timeout <SECS>` | `timeout` |
| `-T, --index-timeout <SECS>` | `index_timeout` |
| `-s, --server <NAME>` | Filters `servers` list (substring match, repeatable) |
| `-f, --file <PATH>` | `file` |
| `--line <N>` | `line` |
| `--col <N>` | `col` |
```sh
lsp-bench -n 1 -w 0 # override iterations/warmup from config
lsp-bench -s solc -s mmsaki # only run solc and mmsaki from config
lsp-bench -T 30 # give servers 30s to index (overrides config)
```
## Methodology
### How benchmarks work
Each benchmark sends real LSP requests over JSON-RPC (stdio) and measures wall-clock response time. Every request includes an `id`, and the tool waits for the server to return a response with that same `id` before recording the time and moving on. Requests are **sequential** -- the next iteration only starts after the previous one completes (or times out).
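For illustration, a position-based request and the response it waits for look roughly like this (message shapes follow the LSP 3.17 spec; the URI, `id`, and result locations are example values, and the `Content-Length` framing headers are omitted):
```json
{ "jsonrpc": "2.0", "id": 42, "method": "textDocument/definition",
  "params": { "textDocument": { "uri": "file:///path/to/examples/Counter.sol" },
              "position": { "line": 21, "character": 8 } } }

{ "jsonrpc": "2.0", "id": 42,
  "result": [ { "uri": "file:///path/to/examples/Counter.sol",
                "range": { "start": { "line": 7, "character": 4 },
                           "end":   { "line": 7, "character": 10 } } } ] }
```
The timer starts when the request is written and stops when the response carrying the same `id` (42 here) is read back.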
### Two timeouts
There are two separate timeouts that serve different purposes:
- **Index timeout** (`index_timeout`, default 15s): How long the server gets to index the project after opening a file. This is the "warm up" phase where the server analyzes the codebase, builds its AST, resolves imports, etc. This only applies to the diagnostics wait step.
- **Request timeout** (`timeout`, default 10s): How long each individual LSP method call (definition, hover, etc.) gets to respond. Once a server has finished indexing, this is the budget for each request.
### Warmup iterations
Warmup iterations (`warmup`, default 2) run the exact same benchmark but **discard the timing results**. This eliminates one-time costs from the measurements:
- **JIT compilation**: Node.js-based servers (nomicfoundation, juanfranblanco, qiuxiang) use V8, which interprets code on first run and optimizes hot paths later. The first 1-2 calls may be slower.
- **Internal caches**: Some servers cache symbol tables or analysis results after the first request.
- **OS-level caches**: First file reads hit disk; subsequent reads hit the page cache.
For `initialize` and `textDocument/diagnostic` benchmarks, a fresh server is started for every iteration, so warmup has less effect. For method benchmarks (`textDocument/definition`, `textDocument/hover`, etc.), the server stays alive across iterations, so warmup helps measure steady-state performance.
Set `warmup: 0` in your config (or `-w 0` on the CLI) to measure real-world "first call" performance.
### Benchmark types
Benchmarks are named after their official LSP method names:
**initialize**: Starts a fresh server process and performs the LSP `initialize`/`initialized` handshake. Measures cold-start time. A fresh server is spawned for every iteration.
**textDocument/diagnostic**: Starts a fresh server, opens the target file, and waits for the server to publish diagnostics. Measures how long the server takes to analyze the file. Uses `index_timeout`. A fresh server is spawned for every iteration.
**textDocument/definition**, **textDocument/declaration**, **textDocument/hover**, **textDocument/references**: Start a single server, open the target file, wait for diagnostics (using `index_timeout`), then send repeated LSP method requests at the target position (`line`/`col`). Only the method request time is measured -- the indexing phase is not included in the timings.
**textDocument/documentSymbol**, **textDocument/documentLink**: Same as above but these are document-level methods that don't use the target position.
### Result statuses
Each server gets one of three statuses per benchmark:
| Status | Meaning |
|--------|---------|
| **ok** | Server responded with valid, non-empty results. Latency stats (p50, p95, mean) are recorded. |
| **invalid** | Server responded, but the result was empty, null, or an error (e.g. `"Unknown method"`). The server doesn't support this feature. |
| **fail** | Server didn't respond in time (timeout), crashed (EOF), or couldn't be spawned. The error reason is recorded. |
### Statistics
For successful benchmarks, three latency metrics are reported:
- **p50** (median): The typical response time. Half of the iterations were faster, half slower.
- **p95**: A near-worst-case response time: 95% of iterations were faster, so only the slowest 5% exceed it.
- **mean**: The arithmetic average across all measured iterations.
### Memory measurement
Each benchmark measures the server's **Resident Set Size (RSS)** -- the amount of physical memory the process is using. RSS is sampled via `ps -o rss= -p <pid>` after the server finishes indexing (post-diagnostics).
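The sampling is a plain `ps` query; for example (the PID is hypothetical, and the value shown matches the `rss_kb` in the JSON example later in this document):
```sh
# Sample the server's resident set size in kilobytes
ps -o rss= -p 12345
# => 40944        (40944 kB / 1024 ≈ 40 MB)
```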
Memory is measured in all outcomes:
| Benchmark | When memory is sampled |
|-----------|------------------------|
| `textDocument/diagnostic` (success) | After diagnostics complete, before the server is killed. Peak RSS across all iterations is recorded. |
| `textDocument/diagnostic` (timeout/crash) | Right before returning the failure. The server is still alive, so RSS reflects memory consumed while stuck. |
| Method benchmarks (success) | Once after indexing completes, before the request loop begins. |
| Method benchmarks (timeout/crash) | Right before returning the failure. |
| `initialize` | Not measured (process is too short-lived). |
This means even servers that time out or crash will have their memory usage recorded. For example, a Node.js server that times out after 15 seconds of indexing will show how much memory it consumed before giving up.
The value is stored as `rss_kb` (kilobytes) in the JSON output. Both `gen-readme` and `gen-analysis` display it in megabytes.
## Generate README
After running benchmarks, generate the README from JSON data:
```sh
./target/release/gen-readme benchmarks/2026-02-13T01-45-26Z.json # write to README.md, print to stdout
./target/release/gen-readme benchmarks/2026-02-13T01-45-26Z.json results.md # custom output path
./target/release/gen-readme benchmarks/snapshot.json -q # write file only (quiet)
./target/release/gen-readme --help # show help
```
By default, `gen-readme` prints the generated README to stdout and writes the file. Use `-q` / `--quiet` to suppress stdout output.
To auto-generate after benchmarks, set `report` and `report_style: readme` in your config.
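For example, in `benchmark.yaml`:
```yaml
report: README.md
report_style: readme
```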
## Generate Analysis
Generate a detailed analysis report from benchmark JSON:
```sh
./target/release/gen-analysis benchmarks/v4-core/snapshot.json # write ANALYSIS.md, print to stdout
./target/release/gen-analysis benchmarks/v4-core/snapshot.json report.md # custom output path
./target/release/gen-analysis benchmarks/v4-core/snapshot.json --base mmsaki # head-to-head from mmsaki's perspective
./target/release/gen-analysis benchmarks/v4-core/snapshot.json -q # write file only (quiet)
./target/release/gen-analysis --help # show help
```
The analysis report is organized per-feature. Each LSP method gets its own section with all stats aggregated into a single table:
- **Capability Matrix** -- Global overview: which servers succeed, fail, time out, or crash on each benchmark, with a success rate summary.
- **Per-feature sections** (one per benchmark, e.g. `initialize`, `textDocument/definition`, etc.) -- Each section contains a table with servers as rows and dynamic columns:
  - **Status** -- ok, empty, no, timeout, crash
  - **Mean** -- average latency
  - **p50 / p95 / Spread / Spike** -- consistency metrics (shown when percentile data exists)
  - **Min / Max / Range** -- per-iteration range (shown when iteration data exists)
  - **Overhead** -- multiplier vs the fastest server (shown when >1 server succeeded)
  - **RSS** -- memory usage in MB (shown when RSS data exists)
  - **vs Base** -- head-to-head comparison against the base server (shown when >1 server)
- **Peak Memory (RSS)** -- Global summary of peak RSS per server across all benchmarks. Only shown when RSS data is present.
### CLI options
| Option | Description |
|--------|-------------|
| `-o, --output <path>` | Output file path (default: `ANALYSIS.md`) |
| `--base <server>` | Server for head-to-head comparison (default: first server) |
| `-q, --quiet` | Don't print analysis to stdout |
## Generate Delta
Generate a compact delta comparison table from benchmark JSON:
```sh
./target/release/gen-delta benchmarks/snapshot.json # compare first two servers, print to stdout
./target/release/gen-delta benchmarks/snapshot.json -o DELTA.md # write to file
./target/release/gen-delta benchmarks/snapshot.json --base baseline --head pr # choose which servers to compare
./target/release/gen-delta benchmarks/snapshot.json -q -o DELTA.md # write file only (quiet)
./target/release/gen-delta --help # show help
```
The delta table shows a side-by-side comparison of two servers with a relative speed column:
```
| initialize | 4.05ms | 3.05ms | 1.3x faster |
| textDocument/diagnostic | 123.80ms | 124.10ms | 1.0x (tied) |
| textDocument/hover | 2.30ms | 2.21ms | 1.0x (tied) |
| textDocument/definition | 8.95ms | 8.90ms | 1.0x (tied) |
| textDocument/documentSymbol | 8.72ms | 12.40ms | 1.4x slower |
```
Delta thresholds: differences within 5% are reported as "tied".
By default, `gen-delta` compares the first two servers in the JSON. Use `--base` and `--head` to pick specific servers.
Delta is the default `report_style`. To auto-generate after benchmarks, just set `report: DELTA.md` in your config.
### CLI options
| Option | Description |
|--------|-------------|
| `-o, --output <path>` | Output file path (default: stdout only) |
| `--base <server>` | Baseline server (default: first server) |
| `--head <server>` | Head server to compare (default: second server) |
| `-q, --quiet` | Don't print table to stdout |
## Output
`lsp-bench` produces JSON snapshots in the `output` directory (default `benchmarks/`):
- `<output>/<timestamp>.json` -- all runs go to the same directory
During a run, partial results are saved to `<output>/partial/` after each benchmark completes. These are cleaned up automatically when the full run finishes.
If `report` is set in the config, the report is automatically generated from the final JSON snapshot using the chosen `report_style` (default: `delta`).
### JSON structure
Each result stores per-iteration data in an `iterations` array. For successful benchmarks (`status: "ok"`), every iteration records its latency and the server's response:
```json
{
  "server": "mmsaki",
  "status": "ok",
  "mean_ms": 8.8,
  "p50_ms": 8.8,
  "p95_ms": 10.1,
  "rss_kb": 40944,
  "response": "{ ... }",
  "iterations": [
    { "ms": 8.80, "response": "{ \"uri\": \"file:///...TickMath.sol\", ... }" },
    { "ms": 8.45, "response": "{ \"uri\": \"file:///...TickMath.sol\", ... }" },
    { "ms": 8.55, "response": "{ \"uri\": \"file:///...TickMath.sol\", ... }" }
  ]
}
```
For `initialize` benchmarks, the response is `"ok"` for each iteration and `rss_kb` is omitted (process is too short-lived). For `textDocument/diagnostic` benchmarks, `rss_kb` is the peak RSS across all iterations (each iteration spawns a fresh server). For method benchmarks (`textDocument/definition`, `textDocument/hover`, etc.), `rss_kb` is measured once after indexing completes. The top-level `response` field duplicates the first iteration's response for backward compatibility.
Failed or unsupported benchmarks (`status: "fail"` or `"invalid"`) have no `iterations` array:
```json
{
  "server": "solc",
  "status": "invalid",
  "response": "[]"
}
```
The per-iteration data enables warmup curve analysis, response consistency checks across iterations, and detection of performance degradation over time.
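For example, the per-iteration timings for one server can be pulled out of a snapshot with `jq`. The query below recurses through the document rather than assuming a particular top-level layout, since only the per-result shape is documented above; the server label and snapshot path are examples:
```sh
# List per-iteration latencies (ms) for every result belonging to "mmsaki"
jq '.. | objects | select(.server? == "mmsaki" and .iterations?)
    | {server, ms: [.iterations[].ms]}' benchmarks/2026-02-13T01-45-26Z.json
```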
`gen-readme` reads a JSON snapshot and writes `README.md` with:
- Summary results table with medals and trophy
- Medal tally and overall winner
- Feature support matrix
- Detailed per-benchmark latency tables (mean/p50/p95)
- Collapsible response details showing actual server responses
## Example files
The repo includes test resources in `examples/`:
- **`examples/Counter.sol`** -- A simple Solidity contract with NatSpec doc comments and intentional unused variables (`unused`, `owner`, `old`, `temp`) that trigger diagnostics warnings from LSP servers. Used as the default benchmark target by `lsp-bench init`.
For larger benchmarks, the repo also includes [Uniswap V4-core](https://github.com/Uniswap/v4-core) as a git submodule at `v4-core/` (618-line `Pool.sol`). Clone with `--recursive` to include it.