Incremental parser with lexer checkpointing
This module provides a fully incremental parser that uses lexer checkpoints to efficiently re-lex only the changed portions of the input.
§Pipeline integration
Token caching and Parser::from_tokens are now wired together:
- parse_with_checkpoints collects parser tokens (trivia-filtered, kind-converted) and caches them alongside the lexer checkpoints.
- reparse_from_checkpoint_two_sided assembles a mixed token list from:
  - cached tokens before the left checkpoint (unchanged prefix)
  - freshly-lexed tokens between left and right checkpoints (affected region)
  - cached tokens after the right checkpoint, with byte shift applied (unchanged suffix)
  It then calls Parser::from_tokens to skip re-lexing the unchanged portions.
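The assembly step described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the Token struct, assemble_tokens, and shift are hypothetical names, and the sketch assumes checkpoint offsets are expressed in old-source byte positions and that delta is the edit's new length minus its old length.

```rust
/// Hypothetical parser token: a kind plus a byte span in the source.
/// (The crate's real token type is not shown in this documentation.)
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: &'static str,
    pub start: usize,
    pub end: usize,
}

/// Shift a byte offset by a signed delta.
/// Panics in debug builds if the shift would go below zero.
fn shift(pos: usize, delta: isize) -> usize {
    if delta >= 0 {
        pos + delta as usize
    } else {
        pos - delta.unsigned_abs()
    }
}

/// Build the token list handed to the parser after an edit.
///
/// `left` and `right` are checkpoint offsets in the OLD source,
/// `fresh` holds the tokens re-lexed from the NEW source for the
/// affected region, and `delta` is (new length - old length) of the edit.
pub fn assemble_tokens(
    cached: &[Token],
    fresh: Vec<Token>,
    left: usize,
    right: usize,
    delta: isize,
) -> Vec<Token> {
    let mut out = Vec::with_capacity(cached.len() + fresh.len());

    // Unchanged prefix: cached tokens that end at or before the left checkpoint.
    out.extend(cached.iter().filter(|t| t.end <= left).cloned());

    // Affected region: freshly lexed tokens, already in new-source coordinates.
    out.extend(fresh);

    // Unchanged suffix: cached tokens that start at or after the right
    // checkpoint, moved into new-source coordinates by the byte shift.
    out.extend(cached.iter().filter(|t| t.start >= right).map(|t| Token {
        kind: t.kind,
        start: shift(t.start, delta),
        end: shift(t.end, delta),
    }));

    out
}
```

For example, replacing "b" with "xyz" in "a b c" with checkpoints at offsets 2 and 4 keeps the token for "a" untouched, takes the freshly lexed "xyz" token, and moves the cached token for "c" two bytes to the right.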
§Two-Sided Checkpoint Window
The incremental parser uses a two-sided checkpoint window approach (#3527):
- Left checkpoint: The nearest checkpoint at or before the edit start
- Right checkpoint: The nearest checkpoint at or after the edit end
This replaces the previous fixed heuristic (+100 bytes) with checkpoint-based bounds, providing more precise re-lexing regions and better cache utilization.
The three-phase reparse algorithm ensures correctness by:
- Reusing cached tokens from the start up to the left checkpoint
- Re-lexing from the left checkpoint through the edit to the right checkpoint
- Reusing cached tokens from the right checkpoint to the end
Edge cases are handled gracefully:
- No checkpoint before edit → relex from position 0
- No checkpoint after edit → relex to source.len()
- Checkpoint at edit boundary → minimal re-lexing scope
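The window selection and its edge cases can be illustrated with a short sketch. The function name checkpoint_window and the representation of checkpoints as a sorted slice of byte offsets are assumptions made for this example; the crate's actual checkpoint type likely carries lexer state as well.

```rust
/// Pick the re-lex window for an edit spanning `edit_start..edit_end`
/// (old-source byte offsets).
///
/// `checkpoints` is assumed to be sorted in ascending order.
/// Returns `(left, right)`:
/// - `left` is the nearest checkpoint at or before `edit_start`,
///   or 0 when no such checkpoint exists;
/// - `right` is the nearest checkpoint at or after `edit_end`,
///   or `source_len` when no such checkpoint exists.
pub fn checkpoint_window(
    checkpoints: &[usize],
    edit_start: usize,
    edit_end: usize,
    source_len: usize,
) -> (usize, usize) {
    // Scan from the back to find the last checkpoint not past the edit start.
    let left = checkpoints
        .iter()
        .copied()
        .rev()
        .find(|&c| c <= edit_start)
        .unwrap_or(0);

    // Scan from the front to find the first checkpoint not before the edit end.
    let right = checkpoints
        .iter()
        .copied()
        .find(|&c| c >= edit_end)
        .unwrap_or(source_len);

    (left, right)
}
```

With checkpoints at 10, 20, and 30 in a 40-byte source, an edit covering 12..18 yields the window (10, 20), an edit covering 5..8 falls back to (0, 10), and an edit covering 32..35 falls back to (30, 40). A linear scan keeps the sketch short; a binary search over the sorted offsets would serve the same purpose for large checkpoint lists.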
§Segment-Level Metrics
The incremental parser tracks segment-level metrics to diagnose cache efficiency and fallback behavior. These metrics help understand how well the segment-based caching system is working:
- segments_reused_before: Count of segments reused before the edit. A high value indicates good cache coverage for the unchanged prefix.
- segments_reused_after: Count of segments reused after the edit. A high value indicates good cache coverage for the unchanged suffix.
- segments_invalidated: Count of segments invalidated during the edit. A high value indicates high cache churn, which may suggest the need for more granular segments or a different checkpoint placement strategy.
- full_tail_fallbacks: Count of times the entire tail had to be re-lexed because no cache hit was found after the edit. A high value indicates cache coverage gaps, which may suggest the need for more checkpoints in the tail region.
These metrics can be accessed via CheckpointedIncrementalParser::stats() and formatted using the Display implementation for debugging and monitoring.
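To make the four counters concrete, here is a small stand-in for the statistics type. SegmentStats, its reuse_ratio helper, and the exact Display format are all hypothetical; only the four field names come from the description above, and the real IncrementalStats returned by CheckpointedIncrementalParser::stats() may use different field types and output.

```rust
use std::fmt;

/// Hypothetical stand-in for the segment-level counters described above.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct SegmentStats {
    pub segments_reused_before: usize,
    pub segments_reused_after: usize,
    pub segments_invalidated: usize,
    pub full_tail_fallbacks: usize,
}

impl SegmentStats {
    /// Fraction of touched segments served from the cache.
    /// Returns 0.0 when no segment has been touched yet.
    pub fn reuse_ratio(&self) -> f64 {
        let reused = self.segments_reused_before + self.segments_reused_after;
        let total = reused + self.segments_invalidated;
        if total == 0 {
            0.0
        } else {
            reused as f64 / total as f64
        }
    }
}

impl fmt::Display for SegmentStats {
    /// One-line summary suitable for logs (format is an assumption).
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "reused_before={} reused_after={} invalidated={} tail_fallbacks={}",
            self.segments_reused_before,
            self.segments_reused_after,
            self.segments_invalidated,
            self.full_tail_fallbacks
        )
    }
}
```

A derived ratio like this is one way to watch cache efficiency over a session: a falling ratio alongside a rising full_tail_fallbacks count would point at sparse checkpoints in the tail rather than at segment granularity.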
Structs§
- CheckpointedIncrementalParser: Incremental parser with lexer checkpointing
- IncrementalStats: Statistics for incremental parsing
- SimpleEdit: Simple edit structure for demos