Trait lady_deirdre::syntax::Node

pub trait Node: Sized + 'static {
    type Token: Token;
    type Error: From<SyntaxError> + Sized + 'static;

    // Required method
    fn new<'code>(
        rule: SyntaxRule,
        session: &mut impl SyntaxSession<'code, Node = Self>
    ) -> Self;

    // Provided method
    fn parse<'code>(
        cursor: impl TokenCursor<'code, Token = Self::Token>
    ) -> SyntaxBuffer<Self> { ... }
}

A trait that specifies the syntax tree node kind and provides a syntax grammar parser.

An API user implements this trait to specify a programming language's syntax grammar and the type of the syntax tree node.

This trait is intended to be implemented on a Rust enum type whose variants represent the tree node kinds, but this is not a strict requirement. Functionally, the main purpose of a Node implementation is to provide a syntax parser that (re-)parses sequences of Tokens by interacting with an arbitrary SyntaxSession interface which, in turn, manages the parsing process.

An API user is encouraged to implement this trait using the helper Node macro-derive on enum types, specifying the syntax grammar directly on the enum variants through the macro's attributes.

use lady_deirdre::{
    syntax::{Node, SyntaxError, SyntaxTree},
    lexis::{SimpleToken, TokenRef},
    Document,
};

#[derive(Node, PartialEq, Debug)]
#[token(SimpleToken)]
#[error(SyntaxError)]
#[skip($Whitespace)]
enum NumbersInParens {
    #[root]
    #[rule($ParenOpen & (numbers: $Number)*{$Symbol} & $ParenClose)]
    Root {
        numbers: Vec<TokenRef>,
    },
}

let doc = Document::<NumbersInParens>::from("(3, 4, 5)");

let root = doc.root().deref(&doc).unwrap();

match root {
    NumbersInParens::Root { numbers } => {
        assert_eq!(
            numbers.iter().map(|num| num.string(&doc).unwrap()).collect::<Vec<_>>(),
            vec!["3", "4", "5"],
        );
    },
}

An API user can also implement the Node trait manually, for example by using third-party parser libraries. See the Node::new function specification for details.

Required Associated Types

type Token: Token

Describes the programming language's lexical grammar.

type Error: From<SyntaxError> + Sized + 'static

Describes the syntax/semantic error type of this programming language's grammar.
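
For instance, a grammar can route the crate's syntax errors into a custom error type, as long as that type implements From<SyntaxError>. A minimal sketch (the MyError type and its variants are hypothetical, not part of this crate):

use lady_deirdre::syntax::SyntaxError;

// A hypothetical error type that wraps the crate's SyntaxError and adds
// a custom semantic error kind on top of it.
enum MyError {
    Syntax(SyntaxError),
    Semantic(String),
}

impl From<SyntaxError> for MyError {
    fn from(error: SyntaxError) -> Self {
        Self::Syntax(error)
    }
}

// A Node implementation would then select this type as its associated Error:
// type Error = MyError;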

Required Methods

fn new<'code>( rule: SyntaxRule, session: &mut impl SyntaxSession<'code, Node = Self> ) -> Self

Parses a branch of the syntax tree from a sequence of Tokens using the specified parse rule, and returns an instance of the branch's top node.

This is a low-level API function.

An API user is encouraged to use the Node macro-derive to implement this trait automatically based on a set of LL(1) grammar rules, but it can also be implemented manually.

You only need to call this function manually if you want to implement an extension API to this crate. In that case you should also prepare a custom implementation of the SyntaxSession trait. See the SyntaxSession documentation for details.

Algorithm Specification:

  • The Algorithm behind this implementation is a top-down parser that parses a context-free language of the LL grammar class with potentially unlimited lookahead. Note that, due to the unlimited lookahead, this covers a wide class of recursive-descent grammars, including PEG grammars.
  • The Algorithm reads as many tokens from the input sequence as needed, using the session's TokenCursor lookahead operations, to recognize the appropriate parse rule.
  • The Algorithm advances the TokenCursor by exactly as many tokens as needed to match the parsed rule.
  • To descend into a parsing subrule, the Algorithm calls the session's descend function, which consumes the subrule's kind and returns a weak reference to the subrule's parsed Node.
  • The Algorithm never calls the descend function with ROOT_RULE. The root rule is not recursive by design.
  • The Specification does not limit the way the Algorithm maps rule values to specific parsing functions under the hood. This mapping is fully encapsulated by the Algorithm's internals. In other words, the "external" caller of the new function does not have to be aware of the mapping between rule values and the types of produced nodes. The only exception is the ROOT_RULE value: if the "external" caller invokes new with the ROOT_RULE parameter, the Algorithm is guaranteed to enter the parsing procedure for the entire syntax tree.
  • When the new function is invoked, the Algorithm is guaranteed to complete the parsing procedure regardless of the input sequence, and to return a valid instance of Node. If the input sequence contains syntax errors, the Algorithm recovers from these errors in a way that is not specified. In this case the Algorithm may call the session's error function to register syntax errors.
use lady_deirdre::{
    syntax::{Node, NodeRef, SyntaxSession, SyntaxRule, SyntaxError, SyntaxTree, ROOT_RULE},
    lexis::{SimpleToken, TokenCursor},
    Document,
};

// A syntax of nested parentheses: `(foo (bar) baz)`.
enum Parens {
   Root { inner: Vec<NodeRef> },
   Parens { inner: Vec<NodeRef> },
   Other,
}

const PARENS_RULE: SyntaxRule = &1;
const OTHER_RULE: SyntaxRule = &2;

impl Node for Parens {
    type Token = SimpleToken;
    type Error = SyntaxError;

    fn new<'code>(
        rule: SyntaxRule,
        session: &mut impl SyntaxSession<'code, Node = Self>,
    ) -> Self {
        // Rule dispatcher that delegates parsing control flow to specialized parse
        // functions.

        if rule == ROOT_RULE {
            return Self::parse_root(session);
        }

        if rule == PARENS_RULE {
            return Self::parse_parens(session);
        }

        // Otherwise the `rule` is an `OTHER_RULE`.

        Self::parse_other(session)
    }

}

impl Parens {
    fn parse_root<'code>(session: &mut impl SyntaxSession<'code, Node = Self>) -> Self {
        let mut inner = vec![];

        loop {
            // Analyzing the next incoming token.
            match session.token(0) {
                Some(&SimpleToken::ParenOpen) => {
                    inner.push(session.descend(PARENS_RULE));
                }

                Some(_) => {
                    inner.push(session.descend(OTHER_RULE));
                }

                None => break,
            }
        }

        Self::Root { inner }
    }

    // Parsing a pair of parentheses (`(...)`).
    fn parse_parens<'code>(session: &mut impl SyntaxSession<'code, Node = Self>) -> Self {
        let mut inner = vec![];

        // The first token is an open parenthesis ("("). Consuming it.
        session.advance();

        loop {
            // Analyzing the next incoming token.
            match session.token(0) {
                Some(&SimpleToken::ParenOpen) => {
                    inner.push(session.descend(PARENS_RULE));
                }

                // Close parenthesis (")") found. The parsing process finished successfully.
                Some(&SimpleToken::ParenClose) => {
                    // Consuming this token.
                    session.advance();

                    return Self::Parens { inner };
                }

                Some(_) => {
                    inner.push(session.descend(OTHER_RULE));
                }

                None => break,
            }
        }

        // The parse process has failed: we didn't find a closing parenthesis.

        // Registering a syntax error.
        let span = session.site_ref(0)..session.site_ref(0);
        session.error(SyntaxError::UnexpectedEndOfInput {
            span,
            context: "Parse Parens",
        });

        // Returning what we have parsed so far.
        Self::Parens { inner }
    }

    // Parsing any sequence of tokens except parentheses (`foo bar`).
    fn parse_other<'code>(session: &mut impl SyntaxSession<'code, Node = Self>) -> Self {
        // The first token is not a parenthesis token. Consuming it.
        session.advance();

        loop {
            // Analyzing the next incoming token.
            match session.token(0) {
                Some(&SimpleToken::ParenOpen) | Some(&SimpleToken::ParenClose) | None => {
                    break;
                }

                Some(_) => {
                    // The next token is not a parenthesis token. Consuming it.
                    session.advance();
                },
            }
        }

        Self::Other
    }
}

let doc = Document::<Parens>::from("foo (bar (baz) (aaa) ) bbb");

// The input text has been parsed without errors.
assert_eq!(doc.errors().count(), 0);
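
As an illustrative follow-up, here is a sketch of traversing the resulting tree through the NodeRef::deref pattern used in the first example; it assumes the Parens enum and the doc from the code above:

// Dereferencing the root node of the `doc` parsed above.
match doc.root().deref(&doc) {
    Some(Parens::Root { inner }) => {
        // Each child NodeRef can be dereferenced against the same Document.
        for child in inner {
            assert!(child.deref(&doc).is_some());
        }
    }

    _ => unreachable!("The root node is always a Parens::Root."),
}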

Provided Methods

fn parse<'code>( cursor: impl TokenCursor<'code, Token = Self::Token> ) -> SyntaxBuffer<Self>

A helper function to immediately parse a subsequence of tokens in a non-incremental way.

use lady_deirdre::{
    lexis::{SimpleToken, Token, SourceCode},
    syntax::{SimpleNode, Node, SyntaxTree},
};

let tokens = SimpleToken::parse("(foo bar)");

let sub_sequence = tokens.cursor(0..5); // A cursor into the "(foo bar" substring.

let syntax = SimpleNode::parse(sub_sequence);

// Close parenthesis is missing in this subsequence, so the syntax tree of the subsequence
// has syntax errors.
assert!(syntax.errors().count() > 0);

Implementors