§teleparse
Work in progress: a proc-macro powered LL(1) parsing library.
This library is comparable to serde, but for parsing: all you need to do is define the syntax as data types and call parse() on the root type.
Features:
- Syntax tree defined by macro attributes on structs and enums - no separate grammar file
- Proc-macro powered - no separate build step to generate parser code
- Provides a #[test] to ensure the grammar is LL(1); without it, a grammar that is not LL(1) fails at runtime
- Utils for parsing components into primitives like tuples, options, and delimited lists
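To make the LL(1) requirement concrete, the sketch below shows a textbook FIRST/FOLLOW check in plain Rust. It is not teleparse's implementation and does not use its API: `Rule`, `first_sets`, `follow_sets`, `predict`, `is_ll1`, and `expr_grammar` are hypothetical names for this illustration, and grammars are written as plain string rules rather than derived types.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Sym = &'static str;
type Rule = (Sym, Vec<Sym>); // (left side, right side); an empty right side is ε
type Sets = BTreeMap<Sym, BTreeSet<Sym>>;
const EPS: Sym = "ε";
const EOF: Sym = "$";

/// A symbol is a nonterminal if it appears on the left side of some rule.
fn is_nonterminal(rules: &[Rule], s: Sym) -> bool {
    rules.iter().any(|(lhs, _)| *lhs == s)
}

/// FIRST of a symbol sequence; contains EPS when the whole sequence can derive ε.
fn first_of_seq(rules: &[Rule], first: &Sets, seq: &[Sym]) -> BTreeSet<Sym> {
    let mut out = BTreeSet::new();
    for &s in seq {
        if !is_nonterminal(rules, s) {
            out.insert(s);
            return out;
        }
        let f = first.get(s).cloned().unwrap_or_default();
        out.extend(f.iter().copied().filter(|t| *t != EPS));
        if !f.contains(EPS) {
            return out;
        }
    }
    out.insert(EPS);
    out
}

/// Fixed-point computation of FIRST for every nonterminal.
fn first_sets(rules: &[Rule]) -> Sets {
    let mut first = Sets::new();
    loop {
        let mut changed = false;
        for (lhs, rhs) in rules {
            let add = first_of_seq(rules, &first, rhs);
            let entry = first.entry(*lhs).or_default();
            for t in add { changed |= entry.insert(t); }
        }
        if !changed { return first; }
    }
}

/// Terminals that may start `seq`, plus FOLLOW(lhs) when `seq` can derive ε.
fn predict(rules: &[Rule], first: &Sets, follow: &Sets, lhs: Sym, seq: &[Sym]) -> BTreeSet<Sym> {
    let f = first_of_seq(rules, first, seq);
    let mut out: BTreeSet<Sym> = f.iter().copied().filter(|t| *t != EPS).collect();
    if f.contains(EPS) {
        out.extend(follow.get(lhs).cloned().unwrap_or_default());
    }
    out
}

/// Fixed-point computation of FOLLOW for every nonterminal.
fn follow_sets(rules: &[Rule], first: &Sets, start: Sym) -> Sets {
    let mut follow = Sets::new();
    follow.entry(start).or_default().insert(EOF);
    loop {
        let mut changed = false;
        for (lhs, rhs) in rules {
            for (i, &s) in rhs.iter().enumerate() {
                if !is_nonterminal(rules, s) { continue; }
                // What can follow `s` here is what the rest of the rule predicts.
                let add = predict(rules, first, &follow, *lhs, &rhs[i + 1..]);
                let entry = follow.entry(s).or_default();
                for t in add { changed |= entry.insert(t); }
            }
        }
        if !changed { return follow; }
    }
}

/// LL(1) holds when no two rules for the same nonterminal share a predict terminal.
fn is_ll1(rules: &[Rule], start: Sym) -> bool {
    let first = first_sets(rules);
    let follow = follow_sets(rules, &first, start);
    for (i, (a, alpha)) in rules.iter().enumerate() {
        for (b, beta) in &rules[i + 1..] {
            if a != b { continue; }
            let pa = predict(rules, &first, &follow, *a, alpha);
            let pb = predict(rules, &first, &follow, *b, beta);
            if !pa.is_disjoint(&pb) { return false; }
        }
    }
    true
}

/// A recursive expression grammar (E, E', T, T', F) in this sketch's rule format.
fn expr_grammar() -> Vec<Rule> {
    vec![
        ("E", vec!["T", "E'"]), ("E'", vec!["+", "T", "E'"]), ("E'", vec![]),
        ("T", vec!["F", "T'"]), ("T'", vec!["*", "F", "T'"]), ("T'", vec![]),
        ("F", vec!["(", "E", ")"]), ("F", vec!["id"]),
    ]
}
```

In this sketch, `is_ll1(&expr_grammar(), "E")` accepts the grammar, while a left-recursive variant such as `E => E + id | id` is rejected because both alternatives can start with `id`.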
Credits:
- The lexer implementation is backed by the ridiculously fast logos library
- The “Dragon Book”, Compilers: Principles, Techniques, and Tools, by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman
Progress:
- Lexer/Tokens
- Macro for terminals
- Parser
- LL(1) stuff
- Macros
- Semantic Tokens (token type applied later by the parser)
- Tests
- Documentation
- Tests
- Documentation
- Hooks
- Utility types tp
- Static Metadata
- Bench
- Test
- Documentation
- mdBook
- Chapters
- derive_lexicon
- derive_syntax
- using tp
- semantic tokens
- hooks (1.1)
- using parser data
- second iteration to add links
- Chapters
- Usability testing
- crate documentation linking to the book
Grammars that are traditionally written with recursion can also be simplified with the built-in syntax types.
// with recursion
E => T E'
E' => + T E' | ε
T => F T'
T' => * F T' | ε
F => ( E ) | id
// simplified
E => T ( + T )*
T => F ( * F )*
F => ( E ) | id
Which can then be implemented as:
use teleparse::prelude::*;

#[derive_lexicon]
#[teleparse(ignore(r"\s+"))]
pub enum TokenType {
    #[teleparse(regex(r"\w+"), terminal(Ident))]
    Ident,
    #[teleparse(terminal(
        OpAdd = "+",
        OpMul = "*",
    ))]
    Op,
    /// Parentheses
    #[teleparse(terminal(
        ParenOpen = "(",
        ParenClose = ")"
    ))]
    Paren,
}

#[derive_syntax]
#[teleparse(root)]
struct E(tp::Split<T, OpAdd>); // E -> T ( + T )*

#[derive_syntax]
struct T(tp::Split<F, OpMul>); // T -> F ( * F )*

#[derive_syntax]
enum F {
    Ident(Ident),
    Paren((ParenOpen, Box<E>, ParenClose)),
}

fn main() -> Result<(), teleparse::GrammarError> {
    let source = "(a+b)*(c+d)";
    let _expr = E::parse(source)?;
    Ok(())
}
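For comparison, here is a minimal hand-written sketch of a parser for the same simplified grammar, using only the standard library. It is meant to show how each `( + T )*` or `( * F )*` repetition becomes a loop instead of a recursive helper rule. It does not use teleparse, and `Tok`, `Parser`, `lex`, and `parse` are names made up for this illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok { Id(String), Plus, Star, LParen, RParen }

// Syntax tree shaped like the simplified grammar (types are local to this sketch).
#[derive(Debug, PartialEq)]
struct E(Vec<T>); // E => T ( + T )*
#[derive(Debug, PartialEq)]
struct T(Vec<F>); // T => F ( * F )*
#[derive(Debug, PartialEq)]
enum F { Id(String), Paren(Box<E>) } // F => ( E ) | id

/// Split the input into tokens, skipping whitespace.
fn lex(src: &str) -> Result<Vec<Tok>, String> {
    let mut toks = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '+' => toks.push(Tok::Plus),
            '*' => toks.push(Tok::Star),
            '(' => toks.push(Tok::LParen),
            ')' => toks.push(Tok::RParen),
            c if c.is_whitespace() => {}
            c if c.is_alphanumeric() || c == '_' => {
                // Keep consuming while the next character continues the identifier.
                let mut name = String::from(c);
                while let Some(d) = chars.next_if(|d| d.is_alphanumeric() || *d == '_') {
                    name.push(d);
                }
                toks.push(Tok::Id(name));
            }
            other => return Err(format!("unexpected character {other:?}")),
        }
    }
    Ok(toks)
}

struct Parser { toks: Vec<Tok>, pos: usize }

impl Parser {
    fn peek(&self) -> Option<&Tok> { self.toks.get(self.pos) }

    /// Consume the next token if it equals `t`.
    fn eat(&mut self, t: &Tok) -> bool {
        let hit = self.peek() == Some(t);
        if hit { self.pos += 1; }
        hit
    }

    fn parse_e(&mut self) -> Result<E, String> {
        let mut terms = vec![self.parse_t()?];
        // The ( + T )* repetition is a loop, not a recursive E' rule.
        while self.eat(&Tok::Plus) { terms.push(self.parse_t()?); }
        Ok(E(terms))
    }

    fn parse_t(&mut self) -> Result<T, String> {
        let mut factors = vec![self.parse_f()?];
        // Likewise for ( * F )*.
        while self.eat(&Tok::Star) { factors.push(self.parse_f()?); }
        Ok(T(factors))
    }

    fn parse_f(&mut self) -> Result<F, String> {
        // One token of lookahead decides which alternative of F applies.
        match self.peek().cloned() {
            Some(Tok::Id(name)) => { self.pos += 1; Ok(F::Id(name)) }
            Some(Tok::LParen) => {
                self.pos += 1;
                let inner = self.parse_e()?;
                if self.eat(&Tok::RParen) {
                    Ok(F::Paren(Box::new(inner)))
                } else {
                    Err(format!("expected ')' at token index {}", self.pos))
                }
            }
            other => Err(format!("expected identifier or '(', found {other:?}")),
        }
    }
}

/// Parse a whole input string, rejecting trailing tokens.
fn parse(src: &str) -> Result<E, String> {
    let mut p = Parser { toks: lex(src)?, pos: 0 };
    let e = p.parse_e()?;
    if p.pos == p.toks.len() { Ok(e) } else { Err(format!("trailing input at token index {}", p.pos)) }
}
```

Calling `parse("(a+b)*(c+d)")` in this sketch yields an `E` holding a single `T` with two parenthesized factors. The derived types in the teleparse example describe the same shape declaratively, so none of this boilerplate has to be written by hand.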
Modules§
- lex: Lexical Analysis
- parser
- prelude: Prelude for all traits and common traits when working with this library
- syntax
- tp
Macros§
- first_set: Macro for creating FirstSet from a list of terminals
- follow_set: Macro for creating FollowSet from a list of terminals
- terminal_set
- token_set: Macro to create a token set from a list of token types
Structs§
- Parser
- Span: A span of source code
- Token: Item produced by a lexer, which holds the token type and the source span
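As a rough illustration of the idea behind Span and Token, the sketch below pairs a token kind with a byte range into the source. The field names (`lo`, `hi`, `kind`, `span`), the `Kind` enum, and the `tokenize` function are assumptions made for this example, not the crate's actual definitions.

```rust
/// Half-open byte range `lo..hi` into the source text (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span { lo: usize, hi: usize }

/// Token kinds used by this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind { Ident, Op, Paren }

/// A token stores its kind and where it came from, not a copy of the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token { kind: Kind, span: Span }

impl Token {
    /// Recover the token text by slicing the original source with the span.
    fn text<'s>(&self, src: &'s str) -> &'s str {
        &src[self.span.lo..self.span.hi]
    }
}

/// Produce tokens with spans; whitespace and unknown characters are skipped.
fn tokenize(src: &str) -> Vec<Token> {
    let mut out = Vec::new();
    let mut iter = src.char_indices().peekable();
    while let Some((lo, c)) = iter.next() {
        let kind = match c {
            '+' | '*' => Kind::Op,
            '(' | ')' => Kind::Paren,
            c if c.is_alphanumeric() || c == '_' => {
                // Extend the identifier over all following word characters.
                while iter.next_if(|&(_, d)| d.is_alphanumeric() || d == '_').is_some() {}
                Kind::Ident
            }
            _ => continue,
        };
        // The token ends where the next unread character starts.
        let hi = iter.peek().map_or(src.len(), |&(i, _)| i);
        out.push(Token { kind, span: Span { lo, hi } });
    }
    out
}
```

Storing spans rather than owned strings keeps tokens small and `Copy`, and the text can always be recovered with `token.text(src)` as long as the source string is still available.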
Enums§
- GrammarError: Error when constructing the grammar (i.e. not actually parsing yet).
Traits§
- Lexer: Trait for lexer
- Lexicon: Trait for defining the token types of a grammar
- Produce
- Production: An AST node
- Root
- ToSpan: Trait for types that can be converted to a Span
Type Aliases§
- Pos: Position in the source code
Attribute Macros§
- derive_lexicon: Transform an enum into a token type (a lexicon)
- derive_syntax: Transform an enum or struct into a parse tree node, as well as deriving the production rule (the AST nodes)
Derive Macros§
- ToSpan: Derive ToSpan from a type that stores a ToSpan as its first thing