# libgraphql-parser Benchmark Optimizations
Tracker for performance optimization opportunities in the lexer and parser.
Each entry documents the problem, fix, trade-offs, and (once implemented)
benchmark results.
---
## B1: `peek_char()` uses `remaining().chars().nth(0)` on every character [CRITICAL]
**Status:** Completed
**Priority:** 1 (highest bang-for-buck)
**File:** `src/token_source/str_to_graphql_token_source.rs`
**Date:** 2026-02-08
**Problem:** Every character peek constructs a `&str` slice via `remaining()`,
creates a `Chars` iterator, and walks to the nth element. Called millions of
times for large inputs (every `consume()`, `skip_whitespace()`, `lex_name()`,
`lex_comment()`, `lex_block_string()`, `next_token()`).
**Change made:** Replaced both `peek_char()` and `consume()` with ASCII fast
paths. `peek_char()` does direct byte indexing + `is_ascii()` check instead
of creating a `Chars` iterator. `consume()` skips `ch.len_utf8()` and
`ch.len_utf16()` calls for ASCII (known to be 1 byte / 1 code unit). Non-ASCII
falls back to full UTF-8 decoding.
**Trade-offs:** Adds ASCII vs non-ASCII branch; branch prediction strongly
favors ASCII. `peek_char_nth(n)` for n>0 still needs iterator approach.
**Benchmark results (clean run, both before/after on AC power):**
Lexer-only (isolated lexer performance, most reliable signal):
| lexer/github_schema | 8.089ms | 7.552ms | **-6.6%** |
| lexer/large_schema | 6.483ms | 6.234ms | **-3.8%** |
| lexer/starwars_schema | 40.60us | 37.20us | **-6.6%** |
| lexer/medium_schema | 1.381ms | 1.335ms | **-3.5%** |
| lexer/small_schema | 28.95us | 28.15us | **-2.9%** |
Full schema parse (lexer + parser combined):
| schema_parse/github | 23.01ms | 22.33ms | **-3.0%** |
| schema_parse/large | 24.71ms | 24.73ms | ~0% |
| schema_parse/medium | 4.961ms | 4.985ms | ~0% |
| schema_parse/starwars | 87.23us | 91.38us | +4.8% (*) |
| schema_parse/small | 44.34us | 43.83us | -1.3% |
Cross-parser comparison (libgraphql_parser only):
| compare_schema_parse/.../github | 22.89ms | 22.47ms | **-1.9%** |
| compare_schema_parse/.../large | 24.59ms | 24.61ms | ~0% |
| compare_schema_parse/.../medium | 4.997ms | 4.974ms | -0.5% |
| compare_schema_parse/.../starwars | 77.14us | 85.22us | +10.5% (*) |
| compare_schema_parse/.../small | 83.99us | 81.21us | -2.6% |
(*) The starwars parse regression is anomalous: the lexer for starwars
clearly improved by -6.6%, and control parsers (graphql_parser,
apollo_parser) showed 0-1.5% random drift. This appears to be
measurement noise on the small (~4KB) fixture where variance is high.
**Machine:** Apple M2 Max, 12 cores, 64 GB RAM, macOS (Darwin 23.6.0, arm64)
**Rust:** rustc 1.90.0-nightly (0d9592026 2025-07-19)
**Verdict:** Consistent 3-7% lexer improvement across all fixture sizes.
Full parse shows ~2-3% improvement on the largest real-world input
(github). Keeping.
---
## B2: `consume()` does per-character position tracking [HIGH]
**Status:** Completed (all 4 sub-optimizations kept)
**Priority:** 6
**File:** `src/token_source/str_to_graphql_token_source.rs`
**Date:** 2026-02-09
**Problem:** Every character consumed updates 5-6 fields (peek_char, newline
check, curr_col_utf8, curr_col_utf16, last_char_was_cr, curr_byte_offset).
For a name like `PullRequestReviewCommentConnection` that's 36 chars x 6 ops.
**Change made:** Implemented byte-scanning fast paths for 4 hot lexer
methods. Each scans raw bytes in a tight loop (one branch per byte)
and batch-updates position tracking once at the end. Shared helper
`compute_columns_for_span()` handles ASCII fast path for column
computation (ASCII byte count = char count = UTF-16 unit count).
The approach is safe for multi-byte UTF-8 because the sentinel bytes
(`"`, `\`, `\n`, `\r`) are all ASCII (<0x80) and can never appear as
continuation bytes in multi-byte UTF-8 sequences (which are >=0x80).
Sub-optimizations (each a separate commit):
1. **`lex_name()`** — Byte-scan `[_0-9A-Za-z]` pattern. Names are
ASCII-only by spec, no newlines, so column = byte count.
2. **`skip_whitespace()`** — Byte-scan ` `, `\t`, `\n`, `\r`, BOM.
Tracks newline positions and BOM count for column computation.
3. **`lex_comment()`** — Byte-scan to `\n`/`\r`/EOF. Comments are
single-line, so only column advances. Uses
`compute_columns_for_span()` for potential non-ASCII content.
4. **`lex_block_string()`** — Byte-scan for `"`, `\`, `\n`, `\r`
sentinels, skip everything else with `i += 1`. Tracks newlines
for position reconstruction via `compute_columns_for_span()`.
**Trade-offs:** More complex position tracking logic (batch vs
per-char). `skip_whitespace()` tracks BOM count for correct column
math. `lex_block_string()` uses `compute_columns_for_span()` which
iterates after-last-newline range (but has ASCII fast path).
**Benchmark results (back-to-back, both on AC power):**
Machine: Apple M2 Max, 12 cores, 64 GB RAM, macOS (Darwin 23.6.0, arm64)
Rust: rustc 1.90.0-nightly (0d9592026 2025-07-19)
Controls: graphql_parser ±0-1.6%, apollo_parser ±0-1.9% — clean
measurement, all changes attributable to our code.
### Net results (all 4 sub-optimizations combined vs B5 baseline)
Schema parse:
| small | 43.0 µs | 37.8 µs | **-12.1%** |
| medium | 2.07 ms | 1.81 ms | **-12.6%** |
| large | 9.65 ms | 8.40 ms | **-12.9%** |
| starwars | 53.4 µs | 42.8 µs | **-19.5%** |
| github | 12.6 ms | 10.5 ms | **-16.9%** |
Executable parse:
| simple_query | 1.94 µs | 1.73 µs | **-10.9%** |
| complex_query | 35.8 µs | 31.7 µs | **-11.2%** |
| nested_depth_10 | 7.72 µs | 6.2 µs | **-19.8%** |
| nested_depth_30 | 28.1 µs | 18.5 µs | **-34.2%** |
| many_ops_50 | 141 µs | 131 µs | **-7.2%** |
Lexer-only (isolates lexer changes):
| small_schema | 28.5 µs | 25.8 µs | **-6.3%** |
| medium_schema | 1.35 ms | 1.22 ms | **-9.3%** |
| large_schema | 6.30 ms | 5.69 ms | **-9.6%** |
| starwars_schema | 37.6 µs | 29.9 µs | **-20.2%** |
| github_schema | 7.68 ms | 5.85 ms | **-23.8%** |
Cross-parser comparison (schema parse, after B2):
| small | **37.9 µs** | 47.1 µs | 48.8 µs |
| medium | **1.82 ms** | 2.09 ms | 2.24 ms |
| large | **8.41 ms** | 9.63 ms | 10.7 ms |
| starwars | **42.8 µs** | 52.9 µs | 58.4 µs |
| github | 10.5 ms | **9.46 ms** | 14.1 ms |
Cross-parser comparison (executable parse, after B2):
| simple | **1.74 µs** | 3.02 µs | 3.17 µs |
| complex | **31.7 µs** | 41.9 µs | 41.0 µs |
### Bisection — marginal contribution of each sub-optimization
Marginal % = this commit's incremental effect (cumulative minus
previous cumulative). Values within ±3% are within measurement
noise and marked with ~.
schema_parse marginals:
| small | **-5.9%** | ~ | ~ | **-7.0%** |
| medium | **-7.9%** | ~ | ~ | **-3.1%** |
| large | **-10.5%** | ~ | ~ | **-4.6%** |
| starwars | **-4.1%** | ~ | **-12.9%** | ~ |
| github | ~ | ~ | ~ | **-10.5%** |
executable_parse marginals:
| simple_query | **-6.8%** | ~ | ~ | ~ |
| complex_query | **-8.9%** | ~ | ~ | ~ |
| nested_depth_10 | **-7.9%** | **-10.1%** | ~ | ~ |
| nested_depth_30 | **-3.2%** | **-29.3%** | ~ | ~ |
| many_ops_50 | **-5.9%** | ~ | ~ | ~ |
lexer marginals:
| small_schema | ~ | ~ | ~ | ~ |
| medium_schema | **-4.1%** | ~ | ~ | **-3.3%** |
| large_schema | **-3.6%** | ~ | ~ | ~ |
| starwars_schema | ~ | ~ | **-20.8%** | ~ |
| github_schema | **-4.0%** | ~ | ~ | **-18.8%** |
### Per sub-optimization assessment
**lex_name:** Broad, consistent 4-11% improvement across schema and
executable parsing. Names are the most frequent token type — every
identifier, type name, field name, keyword benefits.
**skip_whitespace:** Dramatic 10-29% improvement on deeply-nested
executable parsing (depth_10, depth_30) where whitespace-heavy
indentation dominates. Negligible on other fixtures. The nested
fixtures have proportionally more whitespace due to deep indentation.
**lex_comment:** 13-21% improvement on starwars fixture (comment-
heavy). Negligible on other fixtures which have few `#` comments.
The starwars schema has extensive `#`-style comments throughout.
**lex_block_string:** 3-19% improvement on schema fixtures with
block string descriptions (github has 3,246 descriptions). The
github lexer improvement (-18.8%) is particularly striking. No
effect on executable parsing (queries don't typically contain block
strings).
**Verdict:** All 4 sub-optimizations kept. Each targets a different
token type and shows clear signal above noise on fixtures where that
token type is prevalent. No regressions detected on any fixture.
libgraphql-parser now leads graphql_parser and apollo_parser on all
schema fixtures except github (where graphql_parser still leads by
~10%). On executable parsing, libgraphql-parser leads by ~1.7-1.8x.
---
## B3: Block string parsing allocates heavily [HIGH]
**Status:** Completed
**Priority:** 2
**File:** `src/token/graphql_token_kind.rs`
**Date:** 2026-02-08
**Problem:** `parse_block_string()` is called for every description. For the
GitHub schema (~3,246 block strings), each call does:
1. `content.replace("\\\"\"\"", "\"\"\"")` — always allocates even when no
escaped triple quotes exist (common case)
2. `content.lines().collect::<Vec<&str>>()` — allocates a Vec
3. `Vec::with_capacity(lines.len())` of `String` — allocates Vec of Strings
4. `.to_string()` per line — heap allocation per line
5. `result_lines.remove(0)` — O(n) shift
6. `result_lines.join("\n")` — final allocation
~6 allocations per description x 3,246 = ~19,000+ heap allocations.
**Change made:** Rewrote `parse_block_string()` as a two-pass, low-allocation
algorithm:
- `Cow::Borrowed` fast path skips `replace()` when no `\"""` present
- Pass 1 iterates `str::lines()` lazily to compute common indent and
first/last non-blank line indices (no Vec allocation)
- Pass 2 iterates `str::lines()` again, writing stripped lines directly
into a single pre-allocated `String`
- No `Vec<String>`, no `remove(0)`, no `join()` — just one `String`
allocation for the entire result
**Trade-offs:** More complex two-pass logic (two `str::lines()` iterations
instead of one collect). Must preserve exact spec semantics for edge cases.
**Benchmark results (clean back-to-back runs):**
Schema parse (full parse, lexer + parser):
| schema_parse/github | 22.35ms | 21.55ms | **-3.6%** |
| schema_parse/large | 24.61ms | 23.96ms | **-2.6%** |
| schema_parse/medium | 4.917ms | 4.816ms | **-2.0%** |
| schema_parse/small | 43.70us | 41.84us | **-4.0%** |
| schema_parse/starwars | 90.61us | 93.14us | +2.6% (*) |
Cross-parser comparison (libgraphql_parser only):
| compare_.../libgraphql_.../github | 22.72ms | 21.35ms | **-3.0%** |
| compare_.../libgraphql_.../large | 24.49ms | 24.28ms | ~0% |
| compare_.../libgraphql_.../medium | 4.916ms | 4.834ms | ~0% |
| compare_.../libgraphql_.../small | 79.67us | 75.22us | **-6.0%** |
Executable parse (B3 benefits string-heavy queries):
| executable_parse/complex | 72.30us | 54.46us | **-25.4%** |
| compare_.../complex | 72.93us | 68.41us | **-4.3%** |
Lexer (expected: no change — B3 is parser-level):
| lexer/github_schema | 7.550ms | 7.526ms | ~0% |
| lexer/large_schema | 6.212ms | 6.220ms | ~0% |
(*) The starwars regression is anomalous: control parsers showed 0%
drift, and the starwars schema has few descriptions (B3 should be
irrelevant there). This appears to be measurement noise.
**Machine:** Apple M2 Max, 12 cores, 64 GB RAM, macOS (Darwin 23.6.0, arm64)
**Rust:** rustc 1.90.0-nightly (0d9592026 2025-07-19)
**Verdict:** Clear improvement on description-heavy inputs (github
-3%, complex query -25%). Lexer unaffected as expected. Keeping.
---
## B4: `name.into_owned()` forces heap allocation for every identifier [DEFERRED]
**Status:** Pending (deferred — requires significant architectural changes)
**Priority:** — (future work)
**File:** `src/graphql_parser.rs`, `src/ast.rs`
**Problem:** AST types use `String` for identifiers. Every name is converted
from `Cow::Borrowed(&str)` to owned `String` via `into_owned()`. For the
GitHub schema with ~70,000+ identifiers, that's ~70,000 heap allocations.
**Suggested fix (long-term):** Define native AST types with `Cow<'src, str>`.
**Suggested fix (short-term):** String interning / arena allocation.
**Trade-offs:** Major architectural refactor. Lifetime `'src` propagates
through all AST consumers.
**Est. impact:** HIGHEST — but deferred due to scope
---
## B5: Token clone in `expect()` [CRITICAL]
**Status:** Completed
**Priority:** 5
**File:** `src/graphql_parser.rs`, `src/graphql_token_stream.rs`
**Date:** 2026-02-09
**Problem:** `expect()` cloned the peeked token before consuming.
Clone included `GraphQLTokenKind` enum (with `Cow<str>`),
`SmallVec<[GraphQLTriviaToken; 2]>` trivia, `GraphQLSourceSpan`.
Called for every punctuator — tens of thousands of times per schema.
**Change made:** Replaced `Vec`+index buffer in `GraphQLTokenStream`
with `VecDeque` ring buffer. `consume()` now returns
`Option<GraphQLToken>` (owned) via O(1) `pop_front()`. Eliminated:
- Token clone in `expect()` (cloned full GraphQLToken per punctuator)
- `Cow<str>` clone in `expect_name_only()` (now moves from owned token)
- Span clone in `expect_keyword()` success path
- Span clone in `parse_description()` error path
Removed `compact_buffer()` (VecDeque naturally discards consumed
tokens) and `current_token()` (callers use `consume_token()` return
directly). Parser tracks `last_end_position: Option<SourcePosition>`
for EOF error anchoring.
**Trade-offs:** Changed `GraphQLTokenStream` API — removed
`current_token()` and `compact_buffer()`. All ~50
`self.token_stream.consume()` call sites updated to
`self.consume_token()` wrapper that tracks end position.
**Benchmark results (clean back-to-back, both on AC power):**
Machine: Apple M-series arm64, macOS
Rust: rustc 1.90.0-nightly (0d9592026 2025-07-19)
Controls confirm clean measurement: lexer benchmarks all within
±0.3–1.8% (no parser changes expected). Competitor parsers
(graphql_parser, apollo_parser) within ±0–3% noise.
Standalone schema_parse (libgraphql only):
| small | 42.1 µs | 42.1 µs | ~0% |
| medium | 5.25 ms | 2.04 ms | **-61.1%** |
| large | 20.3 ms | 9.50 ms | **-53.2%** |
| starwars | 92.7 µs | 52.1 µs | **-42.8%** |
| github | 21.7 ms | 12.4 ms | **-42.7%** |
Standalone executable_parse (libgraphql only):
| simple_query | 1.93 µs | 1.91 µs | -1.6% |
| complex_query | 70.8 µs | 34.9 µs | **-51.1%** |
| nested_depth_10 | 8.25 µs | 7.54 µs | **-9.0%** |
| nested_depth_30 | 61.1 µs | 27.6 µs | **-54.8%** |
| many_ops_50 | 198.7 µs | 138.3 µs | **-30.1%** |
Cross-parser comparison (schema parse, after B5):
| small | **42.0 µs** | 46.6 µs | 48.3 µs |
| medium | **2.05 ms** | 2.06 ms | 2.20 ms |
| large | **9.49 ms** | 9.48 ms | 10.6 ms |
| starwars | 52.1 µs | **52.6 µs** | 57.6 µs |
| github | 12.5 ms | **9.35 ms** | 13.9 ms |
Cross-parser comparison (executable parse, after B5):
| simple | **1.91 µs** | 3.03 µs | 3.13 µs |
| complex | **34.9 µs** | 41.3 µs | 40.6 µs |
**Verdict:** Massive improvement — original "MODERATE" estimate was
dramatically wrong. Token cloning was the dominant parser bottleneck.
libgraphql-parser is now competitive with or faster than both
graphql_parser and apollo_parser on most benchmarks. The original
2–2.5x gap is closed to 1.0–1.3x across all fixtures.
---
## B6: `starts_with()` in block string lexing [MODERATE]
**Status:** Skipped (no measurable improvement — reverted)
**Priority:** 4
**File:** `src/token_source/str_to_graphql_token_source.rs`
**Date:** 2026-02-09
**Problem:** Inside the block string lexer, every character checks:
```rust
self.remaining().starts_with("\\\"\"\"")
self.remaining().starts_with("\"\"\"")
```
`remaining()` creates a new slice each time. Adds up for long block strings.
Also replaced in `lex_string()` for the block string detection check.
**Change attempted:** Added `next_is_triple_quote()` and
`next_is_escaped_triple_quote()` helper methods that use direct byte
indexing into `self.source.as_bytes()`. Also removed unnecessary
`#[inline]` from `peek_char()` (inlining is already handled by LLVM
for crate-local calls).
**Benchmark results (two independent back-to-back runs):**
Lexer-only (B6 targets lexer — most direct signal):
| lexer/github_schema | 7.650ms | 7.741ms | 7.543ms | 7.562ms |
| lexer/large_schema | 6.287ms | 6.327ms | 6.277ms | 6.230ms |
| lexer/medium_schema | 1.328ms | 1.345ms | 1.321ms | 1.349ms |
| lexer/small_schema | 28.73us | 28.13us | 28.06us | 27.83us |
| lexer/starwars_schema | 37.60us | 37.83us | 37.10us | 37.20us |
Cross-parser comparison (controlled — all parsers in same run):
| compare_.../libgraphql_.../github | 21.48ms | 21.35ms | 21.30ms | 21.33ms |
| compare_.../libgraphql_.../large | 24.50ms | 24.36ms | 24.08ms | 24.11ms |
| compare_.../libgraphql_.../medium | 4.823ms | 4.746ms | 4.813ms | 4.878ms |
| compare_.../graphql_parser/github | 9.600ms | 9.434ms | 9.463ms | 9.392ms |
| compare_.../graphql_parser/large | 9.661ms | 9.587ms | 9.533ms | 9.541ms |
| compare_.../apollo_parser/github | 14.93ms | 14.01ms | 15.01ms | 14.04ms |
| compare_.../apollo_parser/large | 11.05ms | 10.74ms | 10.70ms | 10.74ms |
Control parsers show the same magnitude of drift as libgraphql_parser
across both runs, confirming B6 has no effect above the noise floor.
**Machine:** Apple M2 Max, 12 cores, 64 GB RAM, macOS (Darwin 23.6.0, arm64)
**Rust:** rustc 1.90.0-nightly (0d9592026 2025-07-19)
**Verdict:** No measurable performance change across two independent
back-to-back runs. `starts_with()` for short literal patterns is
already well-optimized by the compiler. Code changes reverted.
---
## B7: `shrink_to_fit()` in `compact_buffer()` [LOW-MODERATE]
**Status:** Completed
**Priority:** 3
**File:** `src/graphql_token_stream.rs`
**Date:** 2026-02-08
**Problem:** After every buffer compaction (once per top-level definition),
`shrink_to_fit()` may trigger a reallocation to shrink Vec capacity, only for
the buffer to grow again for the next definition. For 1000+ definitions,
that's 1000+ potential realloc cycles.
**Change made:** Removed `shrink_to_fit()` call. Buffer retains capacity
between definitions.
**Trade-offs:** Slightly higher peak memory (~few KB retained). Negligible.
**Benchmark results (clean back-to-back runs):**
Schema parse (full parse, lexer + parser):
| schema_parse/github | 21.55ms | 19.43ms | **-9.9%** |
| schema_parse/large | 24.34ms | 21.91ms | **-10.0%** |
| schema_parse/medium | 4.769ms | 4.155ms | **-12.9%** |
| schema_parse/small | 42.29us | 42.41us | ~0% |
| schema_parse/starwars | 91.03us | 89.64us | -1.3% |
Lexer (expected: no change — B7 is parser-level):
| lexer/github_schema | 7.679ms | 7.646ms | ~0% |
| lexer/large_schema | 6.430ms | 6.318ms | -1.7% |
Impact scales with number of top-level definitions: medium (~200
types) and large (~1000 types) showed the biggest gains. Small
schemas with few definitions showed no change, confirming that the
improvement comes from reduced realloc churn in the compaction loop.
**Machine:** Apple M2 Max, 12 cores, 64 GB RAM, macOS (Darwin 23.6.0, arm64)
**Rust:** rustc 1.90.0-nightly (0d9592026 2025-07-19)
**Verdict:** Unexpectedly large improvement (10-13% on medium/large
schemas). Much bigger than the estimated LOW-MODERATE. Keeping.
---
## B8: No `[profile.bench]` in workspace Cargo.toml [LOW — MEASUREMENT ONLY]
**Status:** Pending
**Priority:** 7
**File:** `Cargo.toml` (workspace root)
**Problem:** No benchmark-specific profile. Adding LTO and single codegen unit
helps cross-crate inlining.
**Suggested fix:**
```toml
[profile.bench]
lto = "thin"
codegen-units = 1
```
**Trade-offs:** Slower benchmark compilation. Affects all parsers equally in
comparative benchmarks.
**Est. impact:** LOW for relative comparisons, potentially MODERATE for absolute
---
## B9: Dual UTF-16 column tracking on every character [LOW]
**Status:** Pending (subsumed by B2 if adopted)
**Priority:** 9
**File:** `src/token_source/str_to_graphql_token_source.rs:202`
**Problem:** Every non-newline char updates `curr_col_utf16 += ch.len_utf16()`.
For ASCII `len_utf16()` always returns 1. Small per-char overhead.
**Suggested fix:** Make UTF-16 tracking opt-in via constructor flag, or defer
to B2's lazy position computation.
**Trade-offs:** API change; consumers needing UTF-16 must opt in.
**Est. impact:** LOW standalone — subsumed by B2