Skip to main content

Module incremental_checkpoint

Module incremental_checkpoint 

Source
Expand description

Incremental parser with lexer checkpointing

This module provides a fully incremental parser that uses lexer checkpoints to efficiently re-lex only the changed portions of the input.

§Pipeline integration

Token caching and Parser::from_tokens are now wired together:

  1. parse_with_checkpoints collects parser tokens (trivia-filtered, kind-converted) and caches them alongside the lexer checkpoints.
  2. reparse_from_checkpoint_two_sided assembles a mixed token list from:
    • cached tokens before the left checkpoint (unchanged prefix)
    • freshly-lexed tokens between left and right checkpoints (affected region)
    • cached tokens after the right checkpoint, with byte shift applied (unchanged suffix)
    • Then calls Parser::from_tokens to skip re-lexing the unchanged portions.

§Two-Sided Checkpoint Window

The incremental parser uses a two-sided checkpoint window approach (#3527):

  • Left checkpoint: The nearest checkpoint at or before the edit start
  • Right checkpoint: The nearest checkpoint at or after the edit end

This replaces the previous fixed heuristic (+100 bytes) with checkpoint-based bounds, providing more precise re-lexing regions and better cache utilization.

The three-phase reparse algorithm ensures correctness by:

  1. Reusing cached tokens from the start up to the left checkpoint
  2. Re-lexing from the left checkpoint through the edit to the right checkpoint
  3. Reusing cached tokens from the right checkpoint to the end

Edge cases are handled gracefully:

  • No checkpoint before edit → relex from position 0
  • No checkpoint after edit → relex to source.len()
  • Checkpoint at edit boundary → minimal re-lexing scope

§Segment-Level Metrics

The incremental parser tracks segment-level metrics to diagnose cache efficiency and fallback behavior. These metrics help understand how well the segment-based caching system is working:

  • segments_reused_before: Count of segments reused before the edit. A high value indicates good cache coverage for the unchanged prefix.

  • segments_reused_after: Count of segments reused after the edit. A high value indicates good cache coverage for the unchanged suffix.

  • segments_invalidated: Count of segments invalidated during edit. A high value indicates high cache churn, which may suggest the need for more granular segments or different checkpoint placement strategies.

  • full_tail_fallbacks: Count of times we had to relex the entire tail because no cache hit was found after the edit. A high value indicates cache coverage gaps, which may suggest the need for more checkpoints in the tail region.

These metrics can be accessed via CheckpointedIncrementalParser::stats() and formatted using the Display implementation for debugging and monitoring.

Structs§

CheckpointedIncrementalParser
Incremental parser with lexer checkpointing
IncrementalStats
Statistics for incremental parsing
SimpleEdit
Simple edit structure for demos