libgraphql-parser 0.0.4

A GraphQL parsing library to parse schema documents, executable documents, and documents that mix both together.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
# libgraphql-parser Benchmark Optimizations

Tracker for performance optimization opportunities in the lexer and parser.
Each entry documents the problem, fix, trade-offs, and (once implemented)
benchmark results.

Status legend: **Pending** | **Completed** | **Skipped**

---

## B1: `peek_char()` uses `remaining().chars().nth(0)` on every character [CRITICAL]

**Status:** Completed
**Priority:** 1 (highest bang-for-buck)
**File:** `src/token_source/str_to_graphql_token_source.rs`
**Date:** 2026-02-08

**Problem:** Every character peek constructs a `&str` slice via `remaining()`,
creates a `Chars` iterator, and walks to the nth element. Called millions of
times for large inputs (every `consume()`, `skip_whitespace()`, `lex_name()`,
`lex_comment()`, `lex_block_string()`, `next_token()`).

**Change made:** Replaced both `peek_char()` and `consume()` with ASCII fast
paths. `peek_char()` does direct byte indexing + `is_ascii()` check instead
of creating a `Chars` iterator. `consume()` skips `ch.len_utf8()` and
`ch.len_utf16()` calls for ASCII (known to be 1 byte / 1 code unit). Non-ASCII
falls back to full UTF-8 decoding.

**Trade-offs:** Adds ASCII vs non-ASCII branch; branch prediction strongly
favors ASCII. `peek_char_nth(n)` for n>0 still needs iterator approach.

**Benchmark results (clean run, both before/after on AC power):**

Lexer-only (isolated lexer performance, most reliable signal):

| Fixture               | Before   | After    | Change     |
|-----------------------|----------|----------|------------|
| lexer/github_schema   | 8.089ms  | 7.552ms  | **-6.6%**  |
| lexer/large_schema    | 6.483ms  | 6.234ms  | **-3.8%**  |
| lexer/starwars_schema | 40.60us  | 37.20us  | **-6.6%**  |
| lexer/medium_schema   | 1.381ms  | 1.335ms  | **-3.5%**  |
| lexer/small_schema    | 28.95us  | 28.15us  | **-2.9%**  |

Full schema parse (lexer + parser combined):

| Fixture              | Before   | After    | Change     |
|----------------------|----------|----------|------------|
| schema_parse/github  | 23.01ms  | 22.33ms  | **-3.0%**  |
| schema_parse/large   | 24.71ms  | 24.73ms  | ~0%        |
| schema_parse/medium  | 4.961ms  | 4.985ms  | ~0%        |
| schema_parse/starwars | 87.23us | 91.38us  | +4.8% (*)  |
| schema_parse/small   | 44.34us  | 43.83us  | -1.3%      |

Cross-parser comparison (libgraphql_parser only):

| Fixture                             | Before   | After    | Change     |
|-------------------------------------|----------|----------|------------|
| compare_schema_parse/.../github     | 22.89ms  | 22.47ms  | **-1.9%**  |
| compare_schema_parse/.../large      | 24.59ms  | 24.61ms  | ~0%        |
| compare_schema_parse/.../medium     | 4.997ms  | 4.974ms  | -0.5%      |
| compare_schema_parse/.../starwars   | 77.14us  | 85.22us  | +10.5% (*) |
| compare_schema_parse/.../small      | 83.99us  | 81.21us  | -2.6%      |

(*) The starwars parse regression is anomalous: the lexer for starwars
clearly improved by -6.6%, and control parsers (graphql_parser,
apollo_parser) showed 0-1.5% random drift. This appears to be
measurement noise on the small (~4KB) fixture where variance is high.

**Machine:** Apple M2 Max, 12 cores, 64 GB RAM, macOS (Darwin 23.6.0, arm64)
**Rust:** rustc 1.90.0-nightly (0d9592026 2025-07-19)

**Verdict:** Consistent 3-7% lexer improvement across all fixture sizes.
Full parse shows ~2-3% improvement on the largest real-world input
(github). Keeping.

---

## B2: `consume()` does per-character position tracking [HIGH]

**Status:** Completed (all 4 sub-optimizations kept)
**Priority:** 6
**File:** `src/token_source/str_to_graphql_token_source.rs`
**Date:** 2026-02-09

**Problem:** Every character consumed updates 5-6 fields (peek_char, newline
check, curr_col_utf8, curr_col_utf16, last_char_was_cr, curr_byte_offset).
For a name like `PullRequestReviewCommentConnection` that's 36 chars x 6 ops.

**Change made:** Implemented byte-scanning fast paths for 4 hot lexer
methods. Each scans raw bytes in a tight loop (one branch per byte)
and batch-updates position tracking once at the end. Shared helper
`compute_columns_for_span()` handles ASCII fast path for column
computation (ASCII byte count = char count = UTF-16 unit count).

The approach is safe for multi-byte UTF-8 because the sentinel bytes
(`"`, `\`, `\n`, `\r`) are all ASCII (<0x80) and can never appear as
continuation bytes in multi-byte UTF-8 sequences (which are >=0x80).

Sub-optimizations (each a separate commit):
1. **`lex_name()`** — Byte-scan `[_0-9A-Za-z]` pattern. Names are
   ASCII-only by spec, no newlines, so column = byte count.
2. **`skip_whitespace()`** — Byte-scan ` `, `\t`, `\n`, `\r`, BOM.
   Tracks newline positions and BOM count for column computation.
3. **`lex_comment()`** — Byte-scan to `\n`/`\r`/EOF. Comments are
   single-line, so only column advances. Uses
   `compute_columns_for_span()` for potential non-ASCII content.
4. **`lex_block_string()`** — Byte-scan for `"`, `\`, `\n`, `\r`
   sentinels, skip everything else with `i += 1`. Tracks newlines
   for position reconstruction via `compute_columns_for_span()`.

**Trade-offs:** More complex position tracking logic (batch vs
per-char). `skip_whitespace()` tracks BOM count for correct column
math. `lex_block_string()` uses `compute_columns_for_span()` which
iterates after-last-newline range (but has ASCII fast path).

**Benchmark results (back-to-back, both on AC power):**

Machine: Apple M2 Max, 12 cores, 64 GB RAM, macOS (Darwin 23.6.0, arm64)
Rust: rustc 1.90.0-nightly (0d9592026 2025-07-19)

Controls: graphql_parser ±0-1.6%, apollo_parser ±0-1.9% — clean
measurement, all changes attributable to our code.

### Net results (all 4 sub-optimizations combined vs B5 baseline)

Schema parse:

| Fixture  | Before    | After     | Change       |
|----------|-----------|-----------|--------------|
| small    | 43.0 µs   | 37.8 µs   | **-12.1%**   |
| medium   | 2.07 ms   | 1.81 ms   | **-12.6%**   |
| large    | 9.65 ms   | 8.40 ms   | **-12.9%**   |
| starwars | 53.4 µs   | 42.8 µs   | **-19.5%**   |
| github   | 12.6 ms   | 10.5 ms   | **-16.9%**   |

Executable parse:

| Fixture          | Before    | After     | Change       |
|------------------|-----------|-----------|--------------|
| simple_query     | 1.94 µs   | 1.73 µs   | **-10.9%**   |
| complex_query    | 35.8 µs   | 31.7 µs   | **-11.2%**   |
| nested_depth_10  | 7.72 µs   | 6.2 µs    | **-19.8%**   |
| nested_depth_30  | 28.1 µs   | 18.5 µs   | **-34.2%**   |
| many_ops_50      | 141 µs    | 131 µs    | **-7.2%**    |

Lexer-only (isolates lexer changes):

| Fixture         | Before    | After     | Change       |
|-----------------|-----------|-----------|--------------|
| small_schema    | 28.5 µs   | 25.8 µs   | **-6.3%**    |
| medium_schema   | 1.35 ms   | 1.22 ms   | **-9.3%**    |
| large_schema    | 6.30 ms   | 5.69 ms   | **-9.6%**    |
| starwars_schema | 37.6 µs   | 29.9 µs   | **-20.2%**   |
| github_schema   | 7.68 ms   | 5.85 ms   | **-23.8%**   |

Cross-parser comparison (schema parse, after B2):

| Fixture  | libgraphql   | graphql_parser | apollo_parser |
|----------|--------------|----------------|---------------|
| small    | **37.9 µs**  | 47.1 µs        | 48.8 µs       |
| medium   | **1.82 ms**  | 2.09 ms        | 2.24 ms       |
| large    | **8.41 ms**  | 9.63 ms        | 10.7 ms       |
| starwars | **42.8 µs**  | 52.9 µs        | 58.4 µs       |
| github   | 10.5 ms      | **9.46 ms**    | 14.1 ms       |

Cross-parser comparison (executable parse, after B2):

| Fixture  | libgraphql   | graphql_parser | apollo_parser |
|----------|--------------|----------------|---------------|
| simple   | **1.74 µs**  | 3.02 µs        | 3.17 µs       |
| complex  | **31.7 µs**  | 41.9 µs        | 41.0 µs       |

### Bisection — marginal contribution of each sub-optimization

Marginal % = this commit's incremental effect (cumulative minus
previous cumulative). Values within ±3% are within measurement
noise and marked with ~.

schema_parse marginals:

| Fixture  | lex_name   | skip_ws  | lex_comment  | block_string |
|----------|------------|----------|--------------|--------------|
| small    | **-5.9%**  | ~        | ~            | **-7.0%**    |
| medium   | **-7.9%**  | ~        | ~            | **-3.1%**    |
| large    | **-10.5%** | ~        | ~            | **-4.6%**    |
| starwars | **-4.1%**  | ~        | **-12.9%**   | ~            |
| github   | ~          | ~        | ~            | **-10.5%**   |

executable_parse marginals:

| Fixture          | lex_name   | skip_ws     | lex_comment | block_string |
|------------------|------------|-------------|-------------|--------------|
| simple_query     | **-6.8%**  | ~           | ~           | ~            |
| complex_query    | **-8.9%**  | ~           | ~           | ~            |
| nested_depth_10  | **-7.9%**  | **-10.1%**  | ~           | ~            |
| nested_depth_30  | **-3.2%**  | **-29.3%**  | ~           | ~            |
| many_ops_50      | **-5.9%**  | ~           | ~           | ~            |

lexer marginals:

| Fixture         | lex_name   | skip_ws  | lex_comment  | block_string  |
|-----------------|------------|----------|--------------|---------------|
| small_schema    | ~          | ~        | ~            | ~             |
| medium_schema   | **-4.1%**  | ~        | ~            | **-3.3%**     |
| large_schema    | **-3.6%**  | ~        | ~            | ~             |
| starwars_schema | ~          | ~        | **-20.8%**   | ~             |
| github_schema   | **-4.0%**  | ~        | ~            | **-18.8%**    |

### Per sub-optimization assessment

**lex_name:** Broad, consistent 4-11% improvement across schema and
executable parsing. Names are the most frequent token type — every
identifier, type name, field name, keyword benefits.

**skip_whitespace:** Dramatic 10-29% improvement on deeply-nested
executable parsing (depth_10, depth_30) where whitespace-heavy
indentation dominates. Negligible on other fixtures. The nested
fixtures have proportionally more whitespace due to deep indentation.

**lex_comment:** 13-21% improvement on starwars fixture (comment-
heavy). Negligible on other fixtures which have few `#` comments.
The starwars schema has extensive `#`-style comments throughout.

**lex_block_string:** 3-19% improvement on schema fixtures with
block string descriptions (github has 3,246 descriptions). The
github lexer improvement (-18.8%) is particularly striking. No
effect on executable parsing (queries don't typically contain block
strings).

**Verdict:** All 4 sub-optimizations kept. Each targets a different
token type and shows clear signal above noise on fixtures where that
token type is prevalent. No regressions detected on any fixture.
libgraphql-parser now leads graphql_parser and apollo_parser on all
schema fixtures except github (where graphql_parser still leads by
~10%). On executable parsing, libgraphql-parser leads by ~1.7-1.8x.

---

## B3: Block string parsing allocates heavily [HIGH]

**Status:** Completed
**Priority:** 2
**File:** `src/token/graphql_token_kind.rs`
**Date:** 2026-02-08

**Problem:** `parse_block_string()` is called for every description. For the
GitHub schema (~3,246 block strings), each call does:
1. `content.replace("\\\"\"\"", "\"\"\"")` — always allocates even when no
   escaped triple quotes exist (common case)
2. `content.lines().collect::<Vec<&str>>()` — allocates a Vec
3. `Vec::with_capacity(lines.len())` of `String` — allocates Vec of Strings
4. `.to_string()` per line — heap allocation per line
5. `result_lines.remove(0)` — O(n) shift
6. `result_lines.join("\n")` — final allocation

~6 allocations per description x 3,246 = ~19,000+ heap allocations.

**Change made:** Rewrote `parse_block_string()` as a two-pass, low-allocation
algorithm:
- `Cow::Borrowed` fast path skips `replace()` when no `\"""` present
- Pass 1 iterates `str::lines()` lazily to compute common indent and
  first/last non-blank line indices (no Vec allocation)
- Pass 2 iterates `str::lines()` again, writing stripped lines directly
  into a single pre-allocated `String`
- No `Vec<String>`, no `remove(0)`, no `join()` — just one `String`
  allocation for the entire result

**Trade-offs:** More complex two-pass logic (two `str::lines()` iterations
instead of one collect). Must preserve exact spec semantics for edge cases.

**Benchmark results (clean back-to-back runs):**

Schema parse (full parse, lexer + parser):

| Fixture               | Before   | After    | Change    |
|-----------------------|----------|----------|-----------|
| schema_parse/github   | 22.35ms  | 21.55ms  | **-3.6%** |
| schema_parse/large    | 24.61ms  | 23.96ms  | **-2.6%** |
| schema_parse/medium   | 4.917ms  | 4.816ms  | **-2.0%** |
| schema_parse/small    | 43.70us  | 41.84us  | **-4.0%** |
| schema_parse/starwars | 90.61us  | 93.14us  | +2.6% (*) |

Cross-parser comparison (libgraphql_parser only):

| Fixture                            | Before   | After    | Change    |
|------------------------------------|----------|----------|-----------|
| compare_.../libgraphql_.../github  | 22.72ms  | 21.35ms  | **-3.0%** |
| compare_.../libgraphql_.../large   | 24.49ms  | 24.28ms  | ~0%       |
| compare_.../libgraphql_.../medium  | 4.916ms  | 4.834ms  | ~0%       |
| compare_.../libgraphql_.../small   | 79.67us  | 75.22us  | **-6.0%** |

Executable parse (B3 benefits string-heavy queries):

| Fixture                  | Before   | After    | Change       |
|--------------------------|----------|----------|--------------|
| executable_parse/complex | 72.30us  | 54.46us  | **-25.4%**   |
| compare_.../complex      | 72.93us  | 68.41us  | **-4.3%**    |

Lexer (expected: no change — B3 is parser-level):

| Fixture             | Before   | After    | Change |
|---------------------|----------|----------|--------|
| lexer/github_schema | 7.550ms  | 7.526ms  | ~0%    |
| lexer/large_schema  | 6.212ms  | 6.220ms  | ~0%    |

(*) The starwars regression is anomalous: control parsers showed 0%
drift, and the starwars schema has few descriptions (B3 should be
irrelevant there). This appears to be measurement noise.

**Machine:** Apple M2 Max, 12 cores, 64 GB RAM, macOS (Darwin 23.6.0, arm64)
**Rust:** rustc 1.90.0-nightly (0d9592026 2025-07-19)

**Verdict:** Clear improvement on description-heavy inputs (github
-3%, complex query -25%). Lexer unaffected as expected. Keeping.

---

## B4: `name.into_owned()` forces heap allocation for every identifier [DEFERRED]

**Status:** Pending (deferred — requires significant architectural changes)
**Priority:** — (future work)
**File:** `src/graphql_parser.rs`, `src/ast.rs`

**Problem:** AST types use `String` for identifiers. Every name is converted
from `Cow::Borrowed(&str)` to owned `String` via `into_owned()`. For the
GitHub schema with ~70,000+ identifiers, that's ~70,000 heap allocations.

**Suggested fix (long-term):** Define native AST types with `Cow<'src, str>`.
**Suggested fix (short-term):** String interning / arena allocation.

**Trade-offs:** Major architectural refactor. Lifetime `'src` propagates
through all AST consumers.

**Est. impact:** HIGHEST — but deferred due to scope

---

## B5: Token clone in `expect()` [CRITICAL]

**Status:** Completed
**Priority:** 5
**File:** `src/graphql_parser.rs`, `src/graphql_token_stream.rs`
**Date:** 2026-02-09

**Problem:** `expect()` cloned the peeked token before consuming.
Clone included `GraphQLTokenKind` enum (with `Cow<str>`),
`SmallVec<[GraphQLTriviaToken; 2]>` trivia, `GraphQLSourceSpan`.
Called for every punctuator — tens of thousands of times per schema.

**Change made:** Replaced `Vec`+index buffer in `GraphQLTokenStream`
with `VecDeque` ring buffer. `consume()` now returns
`Option<GraphQLToken>` (owned) via O(1) `pop_front()`. Eliminated:

- Token clone in `expect()` (cloned full GraphQLToken per punctuator)
- `Cow<str>` clone in `expect_name_only()` (now moves from owned token)
- Span clone in `expect_keyword()` success path
- Span clone in `parse_description()` error path

Removed `compact_buffer()` (VecDeque naturally discards consumed
tokens) and `current_token()` (callers use `consume_token()` return
directly). Parser tracks `last_end_position: Option<SourcePosition>`
for EOF error anchoring.

**Trade-offs:** Changed `GraphQLTokenStream` API — removed
`current_token()` and `compact_buffer()`. All ~50
`self.token_stream.consume()` call sites updated to
`self.consume_token()` wrapper that tracks end position.

**Benchmark results (clean back-to-back, both on AC power):**

Machine: Apple M-series arm64, macOS
Rust: rustc 1.90.0-nightly (0d9592026 2025-07-19)

Controls confirm clean measurement: lexer benchmarks all within
±0.3–1.8% (no parser changes expected). Competitor parsers
(graphql_parser, apollo_parser) within ±0–3% noise.

Standalone schema_parse (libgraphql only):

| Fixture  | Before    | After    | Change       |
|----------|-----------|----------|--------------|
| small    | 42.1 µs   | 42.1 µs  | ~0%          |
| medium   | 5.25 ms   | 2.04 ms  | **-61.1%**   |
| large    | 20.3 ms   | 9.50 ms  | **-53.2%**   |
| starwars | 92.7 µs   | 52.1 µs  | **-42.8%**   |
| github   | 21.7 ms   | 12.4 ms  | **-42.7%**   |

Standalone executable_parse (libgraphql only):

| Fixture          | Before    | After     | Change       |
|------------------|-----------|-----------|--------------|
| simple_query     | 1.93 µs   | 1.91 µs   | -1.6%        |
| complex_query    | 70.8 µs   | 34.9 µs   | **-51.1%**   |
| nested_depth_10  | 8.25 µs   | 7.54 µs   | **-9.0%**    |
| nested_depth_30  | 61.1 µs   | 27.6 µs   | **-54.8%**   |
| many_ops_50      | 198.7 µs  | 138.3 µs  | **-30.1%**   |

Cross-parser comparison (schema parse, after B5):

| Fixture  | libgraphql  | graphql_parser | apollo_parser |
|----------|-------------|----------------|---------------|
| small    | **42.0 µs** | 46.6 µs        | 48.3 µs       |
| medium   | **2.05 ms** | 2.06 ms        | 2.20 ms       |
| large    | **9.49 ms** | 9.48 ms        | 10.6 ms       |
| starwars | 52.1 µs     | **52.6 µs**    | 57.6 µs       |
| github   | 12.5 ms     | **9.35 ms**    | 13.9 ms       |

Cross-parser comparison (executable parse, after B5):

| Fixture | libgraphql  | graphql_parser | apollo_parser |
|---------|-------------|----------------|---------------|
| simple  | **1.91 µs** | 3.03 µs        | 3.13 µs       |
| complex | **34.9 µs** | 41.3 µs        | 40.6 µs       |

**Verdict:** Massive improvement — original "MODERATE" estimate was
dramatically wrong. Token cloning was the dominant parser bottleneck.
libgraphql-parser is now competitive with or faster than both
graphql_parser and apollo_parser on most benchmarks. The original
2–2.5x gap is closed to 1.0–1.3x across all fixtures.

---

## B6: `starts_with()` in block string lexing [MODERATE]

**Status:** Skipped (no measurable improvement — reverted)
**Priority:** 4
**File:** `src/token_source/str_to_graphql_token_source.rs`
**Date:** 2026-02-09

**Problem:** Inside the block string lexer, every character checks:
```rust
self.remaining().starts_with("\\\"\"\"")
self.remaining().starts_with("\"\"\"")
```
`remaining()` creates a new slice each time. Adds up for long block strings.
Also replaced in `lex_string()` for the block string detection check.

**Change attempted:** Added `next_is_triple_quote()` and
`next_is_escaped_triple_quote()` helper methods that use direct byte
indexing into `self.source.as_bytes()`. Also removed unnecessary
`#[inline]` from `peek_char()` (inlining is already handled by LLVM
for crate-local calls).

**Benchmark results (two independent back-to-back runs):**

Lexer-only (B6 targets lexer — most direct signal):

| Fixture               | Before (r1) | After (r1) | Before (r2) | After (r2) |
|-----------------------|-------------|------------|-------------|------------|
| lexer/github_schema   | 7.650ms     | 7.741ms    | 7.543ms     | 7.562ms    |
| lexer/large_schema    | 6.287ms     | 6.327ms    | 6.277ms     | 6.230ms    |
| lexer/medium_schema   | 1.328ms     | 1.345ms    | 1.321ms     | 1.349ms    |
| lexer/small_schema    | 28.73us     | 28.13us    | 28.06us     | 27.83us    |
| lexer/starwars_schema | 37.60us     | 37.83us    | 37.10us     | 37.20us    |

Cross-parser comparison (controlled — all parsers in same run):

| Fixture                           | Before (r1) | After (r1) | Before (r2) | After (r2) |
|-----------------------------------|-------------|------------|-------------|------------|
| compare_.../libgraphql_.../github | 21.48ms     | 21.35ms    | 21.30ms     | 21.33ms    |
| compare_.../libgraphql_.../large  | 24.50ms     | 24.36ms    | 24.08ms     | 24.11ms    |
| compare_.../libgraphql_.../medium | 4.823ms     | 4.746ms    | 4.813ms     | 4.878ms    |
| compare_.../graphql_parser/github | 9.600ms     | 9.434ms    | 9.463ms     | 9.392ms    |
| compare_.../graphql_parser/large  | 9.661ms     | 9.587ms    | 9.533ms     | 9.541ms    |
| compare_.../apollo_parser/github  | 14.93ms     | 14.01ms    | 15.01ms     | 14.04ms    |
| compare_.../apollo_parser/large   | 11.05ms     | 10.74ms    | 10.70ms     | 10.74ms    |

Control parsers show the same magnitude of drift as libgraphql_parser
across both runs, confirming B6 has no effect above the noise floor.

**Machine:** Apple M2 Max, 12 cores, 64 GB RAM, macOS (Darwin 23.6.0, arm64)
**Rust:** rustc 1.90.0-nightly (0d9592026 2025-07-19)

**Verdict:** No measurable performance change across two independent
back-to-back runs. `starts_with()` for short literal patterns is
already well-optimized by the compiler. Code changes reverted.

---

## B7: `shrink_to_fit()` in `compact_buffer()` [LOW-MODERATE]

**Status:** Completed
**Priority:** 3
**File:** `src/graphql_token_stream.rs`
**Date:** 2026-02-08

**Problem:** After every buffer compaction (once per top-level definition),
`shrink_to_fit()` may trigger a reallocation to shrink Vec capacity, only for
the buffer to grow again for the next definition. For 1000+ definitions,
that's 1000+ potential realloc cycles.

**Change made:** Removed `shrink_to_fit()` call. Buffer retains capacity
between definitions.

**Trade-offs:** Slightly higher peak memory (~few KB retained). Negligible.

**Benchmark results (clean back-to-back runs):**

Schema parse (full parse, lexer + parser):

| Fixture               | Before   | After    | Change      |
|-----------------------|----------|----------|-------------|
| schema_parse/github   | 21.55ms  | 19.43ms  | **-9.9%**   |
| schema_parse/large    | 24.34ms  | 21.91ms  | **-10.0%**  |
| schema_parse/medium   | 4.769ms  | 4.155ms  | **-12.9%**  |
| schema_parse/small    | 42.29us  | 42.41us  | ~0%         |
| schema_parse/starwars | 91.03us  | 89.64us  | -1.3%       |

Lexer (expected: no change — B7 is parser-level):

| Fixture             | Before   | After    | Change |
|---------------------|----------|----------|--------|
| lexer/github_schema | 7.679ms  | 7.646ms  | ~0%    |
| lexer/large_schema  | 6.430ms  | 6.318ms  | -1.7%  |

Impact scales with number of top-level definitions: medium (~200
types) and large (~1000 types) showed the biggest gains. Small
schemas with few definitions showed no change, confirming that the
improvement comes from reduced realloc churn in the compaction loop.

**Machine:** Apple M2 Max, 12 cores, 64 GB RAM, macOS (Darwin 23.6.0, arm64)
**Rust:** rustc 1.90.0-nightly (0d9592026 2025-07-19)

**Verdict:** Unexpectedly large improvement (10-13% on medium/large
schemas). Much bigger than the estimated LOW-MODERATE. Keeping.

---

## B8: No `[profile.bench]` in workspace Cargo.toml [LOW — MEASUREMENT ONLY]

**Status:** Pending
**Priority:** 7
**File:** `Cargo.toml` (workspace root)

**Problem:** No benchmark-specific profile. Adding LTO and single codegen unit
helps cross-crate inlining.

**Suggested fix:**
```toml
[profile.bench]
lto = "thin"
codegen-units = 1
```

**Trade-offs:** Slower benchmark compilation. Affects all parsers equally in
comparative benchmarks.

**Est. impact:** LOW for relative comparisons, potentially MODERATE for absolute

---

## B9: Dual UTF-16 column tracking on every character [LOW]

**Status:** Pending (subsumed by B2 if adopted)
**Priority:** 9
**File:** `src/token_source/str_to_graphql_token_source.rs:202`

**Problem:** Every non-newline char updates `curr_col_utf16 += ch.len_utf16()`.
For ASCII `len_utf16()` always returns 1. Small per-char overhead.

**Suggested fix:** Make UTF-16 tracking opt-in via constructor flag, or defer
to B2's lazy position computation.

**Trade-offs:** API change; consumers needing UTF-16 must opt in.

**Est. impact:** LOW standalone — subsumed by B2