Crate teleparse

§teleparse

Work in progress: a proc-macro powered LL(1) parsing library

This library is comparable to serde, but for parsing: all you need to do is define the syntax as data types and call parse() on the root type.

Features:

  • Syntax tree defined by macro attributes on structs and enums - no separate grammar file
  • Proc-macro powered - no separate build step to generate parser code
  • Provides a #[test] to ensure the grammar is LL(1); without it, a grammar that is not LL(1) fails at runtime
  • Utils for parsing components into primitives like tuples, options, and delimited lists
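
To make the LL(1) requirement above concrete, here is a minimal, std-only sketch of what such a check verifies. It is not teleparse's implementation, and every name in it (`Grammar`, `first_of`, `is_ll1`, the two sample grammars) is hypothetical. It computes FIRST and FOLLOW sets by fixed-point iteration and then requires the predict sets of each nonterminal's alternatives to be pairwise disjoint. It assumes that `"$"` is not used as a terminal and that the start symbol has at least one rule.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical representation: nonterminal -> alternatives, each a symbol list.
// An empty alternative stands for ε. Symbols without rules are terminals.
type Grammar = BTreeMap<&'static str, Vec<Vec<&'static str>>>;
type Sets = BTreeMap<&'static str, BTreeSet<&'static str>>;

/// FIRST of a symbol sequence, plus whether the sequence can derive ε.
fn first_of(
    seq: &[&'static str],
    g: &Grammar,
    first: &Sets,
    nullable: &BTreeSet<&'static str>,
) -> (BTreeSet<&'static str>, bool) {
    let mut out = BTreeSet::new();
    for &sym in seq {
        if !g.contains_key(sym) {
            out.insert(sym); // a terminal ends the scan
            return (out, false);
        }
        out.extend(first[sym].iter().copied());
        if !nullable.contains(sym) {
            return (out, false);
        }
    }
    (out, true)
}

/// True when no nonterminal has two alternatives that share a lookahead token.
fn is_ll1(g: &Grammar, start: &'static str) -> bool {
    let mut nullable = BTreeSet::new();
    let mut first: Sets = g.keys().map(|&k| (k, BTreeSet::new())).collect();
    let mut follow: Sets = first.clone();
    follow.get_mut(start).unwrap().insert("$"); // end-of-input marker

    // Grow nullable, FIRST and FOLLOW until nothing changes.
    let mut changed = true;
    while changed {
        changed = false;
        for (&lhs, alts) in g {
            for alt in alts {
                let (f, eps) = first_of(alt, g, &first, &nullable);
                let set = first.get_mut(lhs).unwrap();
                for t in f {
                    changed |= set.insert(t);
                }
                if eps {
                    changed |= nullable.insert(lhs);
                }
                for (i, &sym) in alt.iter().enumerate() {
                    if !g.contains_key(sym) {
                        continue; // FOLLOW is only tracked for nonterminals
                    }
                    let (mut add, rest_eps) = first_of(&alt[i + 1..], g, &first, &nullable);
                    if rest_eps {
                        add.extend(follow[lhs].iter().copied());
                    }
                    let set = follow.get_mut(sym).unwrap();
                    for t in add {
                        changed |= set.insert(t);
                    }
                }
            }
        }
    }

    // Predict set of A -> α is FIRST(α), plus FOLLOW(A) when α can derive ε.
    for (&lhs, alts) in g {
        let mut seen = BTreeSet::new();
        for alt in alts {
            let (mut predict, eps) = first_of(alt, g, &first, &nullable);
            if eps {
                predict.extend(follow[lhs].iter().copied());
            }
            for t in predict {
                if !seen.insert(t) {
                    return false; // two alternatives compete for the same token
                }
            }
        }
    }
    true
}

/// The classic right-recursive expression grammar.
fn expression_grammar() -> Grammar {
    Grammar::from([
        ("E", vec![vec!["T", "E'"]]),
        ("E'", vec![vec!["+", "T", "E'"], vec![]]),
        ("T", vec![vec!["F", "T'"]]),
        ("T'", vec![vec!["*", "F", "T'"], vec![]]),
        ("F", vec![vec!["(", "E", ")"], vec!["id"]]),
    ])
}

/// A left-recursive variant, which cannot be LL(1).
fn left_recursive_grammar() -> Grammar {
    Grammar::from([
        ("E", vec![vec!["E", "+", "T"], vec!["T"]]),
        ("T", vec![vec!["id"]]),
    ])
}
```

With these definitions, `is_ll1(&expression_grammar(), "E")` returns `true`, while `is_ll1(&left_recursive_grammar(), "E")` returns `false` because both alternatives of `E` can start with `id`.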

Credits:

  • The lexer implementation is backed by the ridiculously fast logos library
  • The “Dragon Book”, Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman

Progress:

  • Lexer/Tokens
    • Macro for terminals
  • Parser
    • LL(1) stuff
    • Macros
    • Semantic Tokens (token type applied later by the parser)
      • Tests
      • Documentation
    • Tests
    • Documentation
    • Hooks
  • Utility types (tp)
  • Static Metadata
    • Bench
    • Test
    • Documentation
  • mdBook
    • Chapters
      • derive_lexicon
      • derive_syntax
      • using tp
      • semantic tokens
      • hooks (1.1)
      • using parser data
    • second iteration to add links
  • Usability testing
  • crate documentation linking to the book

Grammars that are traditionally written with recursion can also be simplified using the built-in syntax types.

// with recursion
E  => T E'
E' => + T E' | ε
T  => F T'
T' => * F T' | ε
F  => ( E ) | id

// simplified
E  => T ( + T )*
T  => F ( * F )*
F  => ( E ) | id
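
For comparison, the simplified grammar maps directly onto loops in a hand-written recursive-descent recognizer. The sketch below uses only the standard library and none of teleparse's API; all names (`Tok`, `lex`, `Cursor`, `recognize`) are hypothetical, and the lexer only approximates an identifier-plus-operators token set. Each `( + T )*` or `( * F )*` repetition becomes a `while` loop, which is what removes the need for the helper rules E' and T'.

```rust
// Hand-written recognizer for the simplified grammar (illustration only):
//   E => T ( + T )*
//   T => F ( * F )*
//   F => ( E ) | id

#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Add,
    Mul,
    Open,
    Close,
}

/// Split the input into tokens, skipping whitespace.
/// Returns None on a character that fits no token.
fn lex(src: &str) -> Option<Vec<Tok>> {
    let mut toks = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_alphanumeric() || c == '_' {
            // Gather a whole identifier.
            let mut word = String::new();
            while let Some(&w) = chars.peek() {
                if w.is_alphanumeric() || w == '_' {
                    word.push(w);
                    chars.next();
                } else {
                    break;
                }
            }
            toks.push(Tok::Ident(word));
        } else {
            toks.push(match c {
                '+' => Tok::Add,
                '*' => Tok::Mul,
                '(' => Tok::Open,
                ')' => Tok::Close,
                _ => return None,
            });
            chars.next();
        }
    }
    Some(toks)
}

struct Cursor {
    toks: Vec<Tok>,
    pos: usize,
}

impl Cursor {
    /// Consume the next token if it equals `t`.
    fn eat(&mut self, t: &Tok) -> bool {
        if self.toks.get(self.pos) == Some(t) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    // E => T ( + T )*
    fn e(&mut self) -> bool {
        if !self.t() {
            return false;
        }
        while self.eat(&Tok::Add) {
            if !self.t() {
                return false;
            }
        }
        true
    }

    // T => F ( * F )*
    fn t(&mut self) -> bool {
        if !self.f() {
            return false;
        }
        while self.eat(&Tok::Mul) {
            if !self.f() {
                return false;
            }
        }
        true
    }

    // F => ( E ) | id
    fn f(&mut self) -> bool {
        let next = self.toks.get(self.pos).cloned();
        match next {
            Some(Tok::Ident(_)) => {
                self.pos += 1;
                true
            }
            Some(Tok::Open) => {
                self.pos += 1;
                self.e() && self.eat(&Tok::Close)
            }
            _ => false,
        }
    }
}

/// True when the whole input is a sentence of E.
fn recognize(src: &str) -> bool {
    match lex(src) {
        Some(toks) => {
            let mut c = Cursor { toks, pos: 0 };
            c.e() && c.pos == c.toks.len()
        }
        None => false,
    }
}
```

With these definitions, `recognize("(a+b)*(c+d)")` returns `true`, while `recognize("a+")` and `recognize("(a+b")` return `false`. A recognizer like this only answers yes or no; building a typed syntax tree from the same rules is the part the library's derive macros are meant to take over.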

Which can then be implemented as:

use teleparse::prelude::*;

#[derive_lexicon]
#[teleparse(ignore(r"\s+"))]
pub enum TokenType {
    #[teleparse(regex(r"\w+"), terminal(Ident))]
    Ident,
    #[teleparse(terminal(
        OpAdd = "+",
        OpMul = "*",
    ))]
    Op,
    /// Parentheses
    #[teleparse(terminal(
        ParenOpen = "(",
        ParenClose = ")"
    ))]
    Paren,
}

#[derive_syntax]
#[teleparse(root)]
struct E(tp::Split<T, OpAdd>); // E -> T ( + T )*
#[derive_syntax]
struct T(tp::Split<F, OpMul>); // T -> F ( * F )*
#[derive_syntax]
enum F {
    Ident(Ident),
    Paren((ParenOpen, Box<E>, ParenClose)),
}

fn main() -> Result<(), teleparse::GrammarError> {
    let source = "(a+b)*(c+d)";
    let _expr = E::parse(source)?;
    
    Ok(())
}

Modules§

lex
Lexical Analysis
parser
prelude
Prelude with all traits and commonly used items needed when working with this library
syntax
tp

Macros§

first_set
Macro for creating FirstSet from a list of terminals
follow_set
Macro for creating FollowSet from a list of terminals
terminal_set
token_set
Macro to create a token set from a list of token types

Structs§

Parser
Span
A span of source code
Token
Item produced by a lexer, which holds the token type and the source span

Enums§

GrammarError
Error when constructing the grammar (i.e. not actually parsing yet).

Traits§

Lexer
Trait for lexer
Lexicon
Trait for defining the token types of a grammar
Produce
Production
An AST node
Root
ToSpan
Trait for types that can be converted to a Span

Type Aliases§

Pos
Position in the source code

Attribute Macros§

derive_lexicon
Transform an enum into a token type (a lexicon)
derive_syntax
Transform an enum or struct into a parse tree node and derive the production rule (the AST nodes)

Derive Macros§

ToSpan
Derive ToSpan for a type that stores a ToSpan as its first field