
Module token


Core token types and helpers shared across the lexer, parser, and tooling.

This module provides the token types used throughout the lexing and parsing pipeline.
Lex opts to handle more complexity in the lexing stage in order to keep the parsing
stage very simple. This results in greater token complexity, which is why this module
defines several token types.

Token Layers

Even though the grammar operates mostly over lines, we have multiple layers of tokens:

Structural Tokens:
    Indent, Dedent. These are semantic tokens that represent indentation level changes,
    similar to opening/closing braces in C-style languages. They are produced by the
    semantic indentation transformation from raw Indentation tokens. See
    [semantic_indentation](crate::lex::lexing::transformations::semantic_indentation).
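    The transformation from raw indentation widths to Indent/Dedent tokens can be
    sketched as follows. This is an illustrative stand-in, not the crate's actual
    API: the names `Structural` and `emit_structural` are invented here; it only
    shows the usual stack-based algorithm for such a transformation.

    ```rust
    #[derive(Debug, PartialEq)]
    enum Structural {
        Indent,
        Dedent,
    }

    /// Compare each line's indentation width against a stack of open levels,
    /// emitting one Indent when it grows and one Dedent per closed level.
    fn emit_structural(widths: &[usize]) -> Vec<Structural> {
        let mut stack = vec![0usize];
        let mut out = Vec::new();
        for &w in widths {
            if w > *stack.last().unwrap() {
                stack.push(w);
                out.push(Structural::Indent);
            } else {
                while w < *stack.last().unwrap() {
                    stack.pop();
                    out.push(Structural::Dedent);
                }
            }
        }
        // Close any levels still open at end of input.
        while stack.len() > 1 {
            stack.pop();
            out.push(Structural::Dedent);
        }
        out
    }

    fn main() {
        // Widths 0, 4, 8, 0: two indents, then two dedents back to column 0.
        let toks = emit_structural(&[0, 4, 8, 0]);
        assert_eq!(
            toks,
            vec![
                Structural::Indent,
                Structural::Indent,
                Structural::Dedent,
                Structural::Dedent
            ]
        );
    }
    ```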

Core Tokens:
    Character/word level tokens. They are produced by the logos lexer. See [core](core) module
    for the complete list of core tokens. Grammar: [specs/v1/grammar-core.lex].

Line Tokens:
    A group of core tokens from a single line, used in the actual parsing. See
    [line](line) module. The LineType enum is the definitive set of line classifications
    (blank, annotation start/end, data, subject, list, subject-or-list-item, paragraph,
    dialog, indent, dedent). Grammar: [specs/v1/grammar-line.lex].
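    The idea of collapsing a whole line into a single classification can be
    sketched as below. The `LineType` shown here is a cut-down, hypothetical
    stand-in for the crate's enum, and `classify` with these ad-hoc rules is
    not the real lexer; it only illustrates line-level classification.

    ```rust
    #[derive(Debug, PartialEq)]
    enum LineType {
        Blank,
        ListItem,
        Subject,
        Paragraph,
    }

    /// Assign one classification per line based on its leading/trailing shape.
    fn classify(line: &str) -> LineType {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            LineType::Blank
        } else if trimmed.starts_with("- ") {
            LineType::ListItem
        } else if trimmed.ends_with(':') {
            LineType::Subject
        } else {
            LineType::Paragraph
        }
    }

    fn main() {
        assert_eq!(classify(""), LineType::Blank);
        assert_eq!(classify("- milk"), LineType::ListItem);
        assert_eq!(classify("Groceries:"), LineType::Subject);
        assert_eq!(classify("Just some text."), LineType::Paragraph);
    }
    ```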

Inline Tokens:
    Span-based tokens that operate at the character level within text content. Unlike
    line-based tokens, inline tokens can start and end at arbitrary positions and can be
    nested within each other. See [inline](inline) module. Grammar: [specs/v1/grammar-inline.lex].
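    Modelling inline tokens as byte ranges makes nesting a simple containment
    check. The `Span` type and `contains` method below are invented for
    illustration and are not the crate's API.

    ```rust
    #[derive(Debug)]
    struct Span {
        start: usize, // byte offset where the inline token opens
        end: usize,   // byte offset where it closes (exclusive)
    }

    impl Span {
        /// True when `other` lies entirely inside `self`, i.e. it is nested.
        fn contains(&self, other: &Span) -> bool {
            self.start <= other.start && other.end <= self.end
        }
    }

    fn main() {
        // "*bold _inner_*": the inner emphasis span nests inside the bold span.
        let bold = Span { start: 0, end: 14 };
        let inner = Span { start: 6, end: 13 };
        assert!(bold.contains(&inner));
        assert!(!inner.contains(&bold));
    }
    ```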

Line Container Tokens:
    A vector of line tokens or other line container tokens. This is a tree representation
    of each level's lines. This is created and used by the parser. See [to_line_container]
    module.
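    Building the tree from a flat line stream can be sketched as below. `Item`,
    `Node`, and `build_tree` are hypothetical stand-ins for the crate's
    LineToken, LineContainer, and to_line_container machinery; the sketch only
    shows the stack-based construction, opening a new level on Indent and
    folding it into its parent on Dedent.

    ```rust
    #[derive(Debug, PartialEq)]
    enum Item {
        Line(&'static str),
        Indent,
        Dedent,
    }

    #[derive(Debug, PartialEq)]
    enum Node {
        Line(&'static str),
        Container(Vec<Node>),
    }

    fn build_tree(items: &[Item]) -> Vec<Node> {
        let mut stack: Vec<Vec<Node>> = vec![Vec::new()];
        for item in items {
            match item {
                Item::Line(s) => stack.last_mut().unwrap().push(Node::Line(*s)),
                Item::Indent => stack.push(Vec::new()),
                Item::Dedent => {
                    // Close the current level and attach it to its parent.
                    let children = stack.pop().unwrap();
                    stack.last_mut().unwrap().push(Node::Container(children));
                }
            }
        }
        stack.pop().unwrap()
    }

    fn main() {
        let items = [
            Item::Line("parent"),
            Item::Indent,
            Item::Line("child"),
            Item::Dedent,
        ];
        assert_eq!(
            build_tree(&items),
            vec![
                Node::Line("parent"),
                Node::Container(vec![Node::Line("child")])
            ]
        );
    }
    ```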

Synthetic Tokens:
    Tokens that are not produced by the logos lexer, but are created by the lexing pipeline
    to capture context information from parent to child elements so that parsing can be
    done in a regular single pass.

    Context Injection: Synthetic tokens enable single-pass parsing by injecting parent context
    into child scopes. This avoids making the grammar context-sensitive and eliminates the need
    for tree walking during parsing.

    Example - Session Preceding Blank Lines: Sessions require preceding blank lines, but for a
    session that is the first element in its parent, that preceding blank line belongs to the
    parent session's scope. A synthetic BlankLine token is injected at the start of the child
    scope to represent this parent context, allowing the parser to check for the required
    preceding blank line without looking upward in the tree.

    Properties: Synthetic tokens are not consumed during parsing and do not become AST nodes.
    They exist solely to inform parsing decisions. Since they have no source text, they carry
    no byte range information.
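    The injection described above can be sketched as follows. The names `Tok`
    and `inject_parent_context` are invented for illustration; the sketch only
    shows the idea of prepending a synthetic blank-line token (carrying no byte
    range) to a child scope, so the parser can check the "preceded by blank
    line" rule locally instead of walking up the tree.

    ```rust
    #[derive(Debug, Clone, PartialEq)]
    enum Tok {
        SyntheticBlankLine, // injected; no source text, no byte range
        Line(&'static str), // a real line token from the source
    }

    /// Prepend a synthetic blank line to the child scope when the parent
    /// context supplies one, before handing the tokens to the parser.
    fn inject_parent_context(parent_ended_blank: bool, child: &[Tok]) -> Vec<Tok> {
        let mut out = Vec::new();
        if parent_ended_blank {
            out.push(Tok::SyntheticBlankLine);
        }
        out.extend_from_slice(child);
        out
    }

    fn main() {
        let child = [Tok::Line("2024-01-01: Session")];
        let scoped = inject_parent_context(true, &child);
        assert_eq!(scoped[0], Tok::SyntheticBlankLine);
        assert_eq!(scoped.len(), 2);
    }
    ```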

Re-exports

pub use core::Token;
pub use formatting::detokenize;
pub use formatting::ToLexString;
pub use inline::InlineKind;
pub use line::LineContainer;
pub use line::LineToken;
pub use line::LineType;
pub use normalization::utilities;

Modules

core
Token definitions for the lex format
formatting
Detokenizer for the lex format
inline
Inline token types and specifications
line
Line-based token types for the lexer pipeline
normalization
Token normalization utilities
testing
Test factories for creating locations and spanned tokens succinctly.
to_line_container
Tree Builder - Builds hierarchical LineContainer tree from LineTokens