Module incremental_checkpoint

Expand description

Incremental parser with lexer checkpointing

This module provides a fully incremental parser that uses lexer checkpoints to efficiently re-lex only the changed portions of the input.

§Pipeline integration

Token caching and Parser::from_tokens are now wired together:

parse_with_checkpoints collects parser tokens (trivia-filtered, kind-converted) and caches them alongside the lexer checkpoints.
reparse_from_checkpoint_two_sided assembles a mixed token list from:
- cached tokens before the left checkpoint (unchanged prefix)
- freshly-lexed tokens between left and right checkpoints (affected region)
- cached tokens after the right checkpoint, with byte shift applied (unchanged suffix)
- Then calls Parser::from_tokens to skip re-lexing the unchanged portions.

§Two-Sided Checkpoint Window

The incremental parser uses a two-sided checkpoint window approach (#3527):

Left checkpoint: The nearest checkpoint at or before the edit start
Right checkpoint: The nearest checkpoint at or after the edit end

This replaces the previous fixed heuristic (+100 bytes) with checkpoint-based bounds, providing more precise re-lexing regions and better cache utilization.

The three-phase reparse algorithm ensures correctness by:

Reusing cached tokens from the start up to the left checkpoint
Re-lexing from the left checkpoint through the edit to the right checkpoint
Reusing cached tokens from the right checkpoint to the end

Edge cases are handled gracefully:

No checkpoint before edit → relex from position 0
No checkpoint after edit → relex to source.len()
Checkpoint at edit boundary → minimal re-lexing scope

§Segment-Level Metrics

The incremental parser tracks segment-level metrics to diagnose cache efficiency and fallback behavior. These metrics help understand how well the segment-based caching system is working:

segments_reused_before: Count of segments reused before the edit. A high value indicates good cache coverage for the unchanged prefix.
segments_reused_after: Count of segments reused after the edit. A high value indicates good cache coverage for the unchanged suffix.
segments_invalidated: Count of segments invalidated during edit. A high value indicates high cache churn, which may suggest the need for more granular segments or different checkpoint placement strategies.
full_tail_fallbacks: Count of times we had to relex the entire tail because no cache hit was found after the edit. A high value indicates cache coverage gaps, which may suggest the need for more checkpoints in the tail region.

These metrics can be accessed via CheckpointedIncrementalParser::stats() and formatted using the Display implementation for debugging and monitoring.

Structs§

CheckpointedIncrementalParser: Incremental parser with lexer checkpointing
IncrementalStats: Statistics for incremental parsing
SimpleEdit: Simple edit structure for demos