kcl-syntax 0.2.164

Lossless syntax trees for KCL

KCL Syntax

Crate for the lossless KCL lexer and future parser.

Current Status

The lexer comes first. It is kept compatible with the existing scanner (ES) used by the winnow implementation.

  • The Logos implementation defines the tokens and returns a LexedSource.
  • kcl-lib depends on kcl-syntax only for running compatibility tests.
  • To make the lexer lossless and to prepare for a lossless parser, it differs from ES in the following ways:
    • New tokens capture incomplete or erroneous input, for example UnterminatedString and UnterminatedBlockComment.
    • Open/close delimiters have separate tokens. Compatibility tests map them to the same token in ES.
    • Keywords get their own tokens. Compatibility tests map them to the same Keyword token in ES. This will help with lossless parsing.
    • Type and Function tokens from ES are removed. These tokens are semantic classifications and can be reintroduced in a later semantic-token phase.
    • import is always a keyword. ES has special treatment for import because it can be used as a function name. The lexer should not have this logic; the parser will disambiguate.
    • A backslash followed by a newline inside a string, such as "a\\\n", goes through string recovery: the string token stops at the line boundary.
    • Unsupported Unicode scalar values become Unknown tokens instead of the lexical errors that ES reports.
  • A manual Big List of Naughty Strings (BLNS) robustness runner checks that the lexer preserves the input text, does not panic, and does not hang. See README and the accompanying justfile.
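
The lexer properties listed above can be illustrated with a small hand-written sketch. It does not use Logos and is not the crate's API: Kind, Token, LegacyKind, lex, round_trip, and to_legacy are hypothetical names chosen for this example, and the token set is deliberately tiny. It only shows the ideas: every byte of input lands in some token, an unterminated string stops at the line boundary, unsupported characters become Unknown, and separate open/close delimiter tokens can be mapped back to one coarser legacy token for compatibility checks.

```rust
/// Hypothetical token kinds; the real crate defines many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Whitespace,
    Word,
    Number,
    String,
    UnterminatedString,
    OpenParen,
    CloseParen,
    Unknown,
}

/// A token is a kind plus a byte range into the source, so no text is lost.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: Kind,
    pub start: usize,
    pub end: usize,
}

/// Hypothetical stand-in for the coarser classification of an older scanner.
#[derive(Debug, PartialEq, Eq)]
pub enum LegacyKind {
    Brace,
    Other,
}

/// Length in bytes of the leading run of characters accepted by `pred`.
fn run(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.find(|c: char| !pred(c)).unwrap_or(s.len())
}

/// `s` starts with the opening quote. Recovery rule assumed here: a string
/// that reaches a newline (escaped or not) or end of input is unterminated
/// and ends just before the newline.
fn lex_string(s: &str) -> (Kind, usize) {
    let mut chars = s.char_indices().skip(1).peekable();
    while let Some((i, c)) = chars.next() {
        match c {
            '"' => return (Kind::String, i + 1),
            '\n' => return (Kind::UnterminatedString, i),
            '\\' => {
                // An escape consumes the next character unless it is a newline.
                if matches!(chars.peek(), Some(&(_, n)) if n != '\n') {
                    chars.next();
                }
            }
            _ => {}
        }
    }
    (Kind::UnterminatedString, s.len())
}

/// Never fails: anything unrecognized becomes a one-character Unknown token.
pub fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while let Some(first) = src[pos..].chars().next() {
        let rest = &src[pos..];
        let (kind, len) = match first {
            '(' => (Kind::OpenParen, 1),
            ')' => (Kind::CloseParen, 1),
            '"' => lex_string(rest),
            c if c.is_whitespace() => (Kind::Whitespace, run(rest, char::is_whitespace)),
            c if c.is_ascii_digit() => (Kind::Number, run(rest, |c| c.is_ascii_digit())),
            c if c.is_ascii_alphabetic() || c == '_' => {
                (Kind::Word, run(rest, |c| c.is_ascii_alphanumeric() || c == '_'))
            }
            c => (Kind::Unknown, c.len_utf8()),
        };
        tokens.push(Token { kind, start: pos, end: pos + len });
        pos += len;
    }
    tokens
}

/// Lossless check: concatenating all token texts must reproduce the input.
pub fn round_trip(src: &str) -> String {
    lex(src).iter().map(|t| &src[t.start..t.end]).collect()
}

/// Compatibility mapping: distinct open/close tokens collapse to one legacy token.
pub fn to_legacy(kind: Kind) -> LegacyKind {
    match kind {
        Kind::OpenParen | Kind::CloseParen => LegacyKind::Brace,
        _ => LegacyKind::Other,
    }
}
```

A robustness runner in the spirit of the BLNS check could feed each input line to lex and compare round_trip(line) with line; any mismatch would indicate dropped or duplicated text.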

Design

The goal is a lossless lexer and parser for KCL that can be used by both the LSP and the evaluator.

At a high level, the main data structure is an immutable tree (Concrete Syntax Tree or CST) that will hold all information from the KCL text: comments, whitespace, code, and erroneous input. The CST will then provide typed AST wrapper APIs for access and manipulation, such as an AST for the evaluator and a syntax tree for text query and manipulation.
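
As a rough illustration of that shape, the sketch below builds a tiny immutable tree by hand and layers one typed wrapper over it. The parser does not exist yet, so nothing here reflects the crate's actual design: SyntaxKind, Element, Node, CallExpr, and sample_tree are hypothetical names, and the tree for `foo( 1 ) // hi` is constructed manually rather than parsed.

```rust
use std::rc::Rc;

/// Hypothetical node and token kinds for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyntaxKind {
    Root,
    CallExpr,
    Word,
    Number,
    OpenParen,
    CloseParen,
    Whitespace,
    LineComment,
}

/// A child is either a nested node or a token that owns its exact source text.
#[derive(Debug)]
pub enum Element {
    Node(Rc<Node>),
    Token { kind: SyntaxKind, text: String },
}

/// Nodes are never mutated after construction; sharing happens through Rc.
#[derive(Debug)]
pub struct Node {
    pub kind: SyntaxKind,
    pub children: Vec<Element>,
}

impl Node {
    /// Rebuilds the original text, including whitespace and comments.
    pub fn text(&self) -> String {
        let mut out = String::new();
        self.write_text(&mut out);
        out
    }

    fn write_text(&self, out: &mut String) {
        for child in &self.children {
            match child {
                Element::Node(node) => node.write_text(out),
                Element::Token { text, .. } => out.push_str(text),
            }
        }
    }
}

/// Typed wrapper: a view over an untyped node that exposes only what a
/// consumer such as an evaluator would care about.
pub struct CallExpr(Rc<Node>);

impl CallExpr {
    /// Succeeds only when the untyped node has the matching kind.
    pub fn cast(node: Rc<Node>) -> Option<Self> {
        if node.kind == SyntaxKind::CallExpr {
            Some(CallExpr(node))
        } else {
            None
        }
    }

    /// The first Word token is treated as the callee name in this sketch.
    pub fn callee(&self) -> Option<&str> {
        self.0.children.iter().find_map(|child| match child {
            Element::Token { kind: SyntaxKind::Word, text } => Some(text.as_str()),
            _ => None,
        })
    }
}

fn tok(kind: SyntaxKind, text: &str) -> Element {
    Element::Token { kind, text: text.to_string() }
}

/// Hand-built tree for the source text `foo( 1 ) // hi`.
pub fn sample_tree() -> Rc<Node> {
    let call = Rc::new(Node {
        kind: SyntaxKind::CallExpr,
        children: vec![
            tok(SyntaxKind::Word, "foo"),
            tok(SyntaxKind::OpenParen, "("),
            tok(SyntaxKind::Whitespace, " "),
            tok(SyntaxKind::Number, "1"),
            tok(SyntaxKind::Whitespace, " "),
            tok(SyntaxKind::CloseParen, ")"),
        ],
    });
    Rc::new(Node {
        kind: SyntaxKind::Root,
        children: vec![
            Element::Node(call),
            tok(SyntaxKind::Whitespace, " "),
            tok(SyntaxKind::LineComment, "// hi"),
        ],
    })
}
```

The untyped tree keeps trivia such as the trailing comment, so sample_tree().text() reproduces the source exactly, while CallExpr::callee lets a consumer skip past whitespace without knowing the child layout.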

The high-level architecture and tooling are inspired by projects such as rust-analyzer and Roslyn.