Crate pest_typed_derive


Derive statically typed nodes and a parser from a pest grammar. This crate aims to enhance pest for those who need it.

When using this crate, remember to add pest_typed as a dependency.

  • Refer to pest for pest’s syntax and built-in rules.
  • Refer to the underlying crate pest_typed_generator for how it generates code.
  • Refer to the derive macro derive_typed_parser to see what it generates and how to use it.

§Auto-skipped Rules

When a rule is not atomic, inner content that matches COMMENT or WHITESPACE is skipped automatically, with COMMENT taking priority over WHITESPACE.
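The skipping order can be pictured with a small hand-written sketch. This is not the generated code: the functions `match_comment`, `match_whitespace` and `skip_trivia` are hypothetical, and the sketch assumes that COMMENT is a `/* ... */` block and WHITESPACE is a single space. It only illustrates one reading of the ordering, namely that COMMENT is tried before WHITESPACE at every position.

```rust
/// Hypothetical COMMENT rule: a `/* ... */` block at the start of `input`.
/// Returns the matched length in bytes, if any.
fn match_comment(input: &str) -> Option<usize> {
    if !input.starts_with("/*") {
        return None;
    }
    // Look for the terminator after the opening `/*`.
    input[2..].find("*/").map(|end| 2 + end + 2)
}

/// Hypothetical WHITESPACE rule: a single space at the start of `input`.
fn match_whitespace(input: &str) -> Option<usize> {
    input.starts_with(' ').then_some(1)
}

/// Skips leading trivia, trying COMMENT before WHITESPACE at each position,
/// and returns the remaining input.
fn skip_trivia(mut input: &str) -> &str {
    while let Some(len) = match_comment(input).or_else(|| match_whitespace(input)) {
        input = &input[len..];
    }
    input
}
```

An unterminated comment is not matched by either stand-in rule, so `skip_trivia` leaves it in place.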

Note that skipped items are taken into consideration when using core::hash::Hash, PartialEq or Eq.
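As a rough illustration of what this means in practice, the sketch below uses a hand-written node that stores the skipped pieces next to its content. The type `Pair` and the helpers `pair` and `hash_of` are hypothetical and do not reflect the real generated layout; the point is only that two nodes with the same meaningful content but different skipped content do not compare equal.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical node for `left ~ right` that keeps whatever was skipped
/// between the two sides.
#[derive(Debug, PartialEq, Eq, Hash)]
struct Pair<'i> {
    left: &'i str,
    skipped: Vec<&'i str>,
    right: &'i str,
}

/// Builds a `Pair` from its parts.
fn pair<'i>(left: &'i str, skipped: &[&'i str], right: &'i str) -> Pair<'i> {
    Pair { left, skipped: skipped.to_vec(), right }
}

/// Hashes any value with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

With these definitions, `pair("a", &[" "], "b")` and `pair("a", &[" ", " "], "b")` are different values, because the derived implementations compare the `skipped` field as well.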

§Generation

We generate documentation for the automatically generated types. Hover over those types, or view them in your project's documentation, to see it.

§Enumeration of Rules

The same as in pest.

It implements Clone, Copy, Debug, Eq, Hash, Ord, PartialEq, PartialOrd.
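The listed traits make the enumeration usable as a key in ordered or hashed collections. The sketch below uses a hand-written stand-in named `Rule` with three hypothetical variants; the real enumeration is generated from your grammar, and `distinct_rules` is only an illustrative helper.

```rust
use std::collections::BTreeSet;

/// Hand-written stand-in for a generated rule enumeration.
/// The variants are hypothetical and named after grammar rules.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
enum Rule {
    file,
    row,
    item,
}

/// Returns the distinct rules in `seen`, sorted by declaration order.
/// `Copy` allows `copied()`, and `Ord` allows the use of `BTreeSet`.
fn distinct_rules(seen: &[Rule]) -> Vec<Rule> {
    seen.iter().copied().collect::<BTreeSet<_>>().into_iter().collect()
}
```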

§APIs

Note: to use pest_typed_derive as a dependency, pest_typed is also needed.

§Pairs API

Note: the simulated Pairs API behaves slightly differently from the original version. An atomic rule will not contain inner pairs.

§Accesser API

See derive_typed_parser for how to enable Accesser API.

The Accesser API is a group of functions, called Accesser Functions, that access referenced rules (or tags, if enabled).

An accesser function has the same name as the rule it accesses.

For example, if you have

foo = { bar }

you can access bar from an instance f of foo by calling f.bar().

Given the following pest grammar:

// See https://datatracker.ietf.org/doc/html/rfc4180.html.
file                 = { row ~ (NEWLINE ~ row)* ~ NEWLINE? }
row                  = { item ~ (comma ~ item)* }
item                 = { escaped_item | non_escaped_item }
escaped_item_content = { (legal_ascii | comma | NEWLINE | double_quote{2})* }
escaped_item         = { double_quote ~ escaped_item_content ~ double_quote }
non_escaped_item     = { legal_ascii+ }
legal_ascii          = { '\x20'..'\x21' | '\x23'..'\x2B' | '\x2D'..'\x7E' }
comma                = { "," }
double_quote         = { "\"" }

Here is a basic example of how to access and process the rules referenced by a rule using the Accesser API:

extern crate alloc;
use alloc::vec::Vec;
use anyhow::Error;
use core::{iter, result::Result};
use pest_typed::TypedParser;
use pest_typed_derive::TypedParser;

/// See https://datatracker.ietf.org/doc/html/rfc4180.html for CSV's format.
#[derive(TypedParser)]
#[grammar = "examples/csv.pest"]
#[emit_rule_reference]
struct Parser;

fn main() -> Result<(), Error> {
    // Prepare input. Output syntax will depend on this.
    let input = "name,age\nTom,10\nJerry,20";
    // Parser output. We may need some extra operations to make it into a table.
    let file = Parser::try_parse::<pairs::file>(input)?;
    // Separate file by row. As pest doesn't have a separator construct yet, this may be very common.
    // As `file` has 2 references to `row` in total,
    // return value type will also be a tuple with 2 elements.
    // Hide other rules, we'll get `row ~ (_ ~ row)* ~ _?`.
    // Therefore the return value type will be `(&row, Vec<&row>)`.
    let (first_row, following_rows) = file.row();
    // Join rows as a single iterator.
    let rows = iter::once(first_row).chain(following_rows);
    // Sheet.
    let table = rows
        .map(|row| {
            // Separate each row by column.
            // Hide other rules, we'll get `item ~ (_ ~ item)*`.
            // Therefore the return value type will be `(&item, Vec<&item>)`.
            let (first_item, following_items) = row.item();
            // Join columns as a single iterator.
            let items = iter::once(first_item).chain(following_items);
            // Extract string from each cell.
            let row = items.map(|cell| cell.span.as_str()).collect::<Vec<_>>();
            row
        })
        .collect::<Vec<_>>();
    // Recover input from sheet.
    let recovered = table
        .into_iter()
        .map(|row| row.join(","))
        .collect::<Vec<_>>()
        .join("\n");
    assert_eq!(recovered, input);
    Ok(())
}
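To see why `file.row()` returns `(&row, Vec<&row>)`, it can help to write the shape by hand. The sketch below is not generated code: `Row`, `File`, `parse_file` and `row_texts` are hypothetical names, and naive line splitting stands in for real parsing. It only mirrors the pattern `row ~ (_ ~ row)*`: one mandatory reference followed by a repeated one.

```rust
/// Hypothetical node for a single row.
struct Row<'i> {
    text: &'i str,
}

/// Hypothetical node for `file = { row ~ (NEWLINE ~ row)* ~ NEWLINE? }`.
struct File<'i> {
    first: Row<'i>,
    rest: Vec<Row<'i>>,
}

impl<'i> File<'i> {
    /// One `row` outside the repetition and one inside it,
    /// hence a tuple of a reference and a vector of references.
    fn row(&self) -> (&Row<'i>, Vec<&Row<'i>>) {
        (&self.first, self.rest.iter().collect())
    }
}

/// Naive stand-in for parsing: every line becomes a row.
/// Returns `None` when there is no first row.
fn parse_file(input: &str) -> Option<File<'_>> {
    let mut lines = input.lines();
    let first = Row { text: lines.next()? };
    let rest = lines.map(|text| Row { text }).collect();
    Some(File { first, rest })
}

/// Joins the first row and the following rows into a single sequence.
fn row_texts<'i>(file: &File<'i>) -> Vec<&'i str> {
    let (first, following) = file.row();
    std::iter::once(first).chain(following).map(|row| row.text).collect()
}
```

The `iter::once(first).chain(following)` idiom is the same one used in the example above to treat all rows uniformly.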

§Rule Structs

We generate a Rule Struct for each rule. The inner structure is generated from the grammar structure inside the rule (or parsing expression grammar).

The pest grammar is also displayed in abbreviated form in doc comments using core::fmt::Display, so you can view the structure without switching to .pest files.

§Emitted Fields for Rule Structs

There are three cases related to fields of a generated struct:

  • Emit inner nodes and a span (normal rule, non-atomic rule and compound atomic rule in pest).
  • Emit a span (atomic rule in pest).
  • Emit inner expression (silent rule in pest).
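The three cases can be sketched with hand-written structs. These are simplified, hypothetical shapes, not the generated code: `Span` here is a plain wrapper around a string slice, and the field names `content` and `span` follow the names used in the examples on this page.

```rust
/// Simplified stand-in for a span: the slice of input that a node matched.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span<'i> {
    text: &'i str,
}

/// Atomic rule: only a span is kept.
struct Atomic<'i> {
    span: Span<'i>,
}

/// Normal, non-atomic or compound atomic rule: inner nodes plus a span.
struct Normal<'i> {
    content: Box<(Atomic<'i>, Atomic<'i>)>,
    span: Span<'i>,
}

/// Silent rule: only the inner expression, with no span of its own.
struct Silent<'i> {
    content: (Atomic<'i>, Atomic<'i>),
}

/// Splits `input` after its first character into two atomic nodes.
fn split_pair(input: &str) -> (Atomic<'_>, Atomic<'_>) {
    let mid = input.chars().next().map_or(0, char::len_utf8);
    let (left, right) = input.split_at(mid);
    (Atomic { span: Span { text: left } }, Atomic { span: Span { text: right } })
}

/// Builds the "inner nodes and a span" shape.
fn normal(input: &str) -> Normal<'_> {
    Normal { content: Box::new(split_pair(input)), span: Span { text: input } }
}

/// Builds the "inner expression only" shape.
fn silent(input: &str) -> Silent<'_> {
    Silent { content: split_pair(input) }
}
```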

§Example for Rule Structs

use anyhow::Error;
use pest_typed::{RuleStruct, Span, Storage, TypedParser};
use pest_typed_derive::{match_choices, TypedParser};

#[derive(TypedParser)]
#[grammar_inline = r#"
a =  { (b | c) ~ d }
b = _{ "b" ~ b? }
c = @{ "c" }
d = ${ "d" }
"#]
#[emit_rule_reference]
struct Parser;

fn parse(input: &'static str) -> Result<(), Error> {
    let a = Parser::try_parse::<pairs::a>(input)?;
    // With accesser API.
    // Call `b()` to get reference to `b`.
    // Call `c()` to get reference to `c`.
    // Call `d()` to get reference to `d`.
    if let Some(b) = a.b() {
        // `b` is a silent rule, it only contains inner expressions.
        // Its content may be wrapped in a Box when it's one of the nodes that is in a cycle with minimal length.
        // Then its size will always be the size of a Box.
        assert_eq!(std::mem::size_of_val(b), std::mem::size_of::<Box<usize>>());
    } else if let Some(c) = a.c() {
        assert_eq!(c.span.as_str(), "c");
    }
    let d = a.d();
    assert_eq!(d.span.as_str(), "d");

    // With structural API.
    use generics::Choice2;
    // Call `get_matched` to destruct the sequence.
    let (b_or_c, d) = a.get_matched();
    match b_or_c {
        Choice2::_0(b) => assert_eq!(
            std::mem::size_of_val(b.ref_inner()),
            std::mem::size_of::<Box<rules::b>>()
        ),
        Choice2::_1(c) => assert_eq!(std::mem::size_of_val(c), std::mem::size_of::<Span>()),
    }
    // Or match_choices from `pest_typed_derive`.
    // Note that if module `generics` is not in current scope,
    // you should import `generics` from somewhere.
    // This may be very easy to use, but may have a worse experience with IDE.
    match_choices!(b_or_c {
        b => println!("{b:?}"),
        c => println!("{c:?}"),
    });
    assert_eq!(d.content.get_content(), "d");

    Ok(())
}

fn main() -> Result<(), Error> {
    parse("bd")?;
    parse("cd")?;
    Ok(())
}

§Tag Structs

We generate a Tag Struct for each tag. The inner structure is generated from the grammar structure inside the tag.

§Emitted Fields for Tag Structs

Fields:

  • Inner content.
  • Span for matched input.

§Example for Tag Structs

An example using node tags.

use anyhow::Error;
use pest_typed::TypedParser;
use pest_typed_derive::TypedParser;

#[derive(TypedParser)]
#[grammar_inline = r#"
a  = { "a" ~ #b = (b1 | b2) }
b1 = { "bbb" }
b2 = { "cc" }
item = { "x" }
c  = { #a = item ~ ("," ~ #a = item)* }
"#]
#[emit_rule_reference]
#[emit_tagged_node_reference]
struct Parser;

fn main() -> Result<(), Error> {
    let a = Parser::try_parse::<pairs::a>("abbb")?;
    // Tags enabled.
    #[cfg(feature = "grammar-extras")]
    {
        // Access tag `b` with `b()`.
        let _b = a.b();
        // Tag `b` also has accesser functions.

        // if let Some(b1) = b.b1() {
        //     assert_eq!(b1.span.as_str(), "bbb");
        // } else if let Some(b2) = b.b2() {
        //     assert_eq!(b2.span.as_str(), "cc");
        // }
    }
    // Tags disabled.
    #[cfg(not(feature = "grammar-extras"))]
    {
        // Tag `b` is transparent.
        if let Some(b1) = a.b1() {
            assert_eq!(b1.span.as_str(), "bbb");
        } else if let Some(b2) = a.b2() {
            assert_eq!(b2.span.as_str(), "cc");
        }
    }
    Ok(())
}

An example using nested node tags.

use anyhow::Error;
use pest_typed::TypedParser;
use pest_typed_derive::TypedParser;

#[derive(TypedParser)]
#[grammar_inline = r#"a = { "a" ~ #b = (b1 ~ #c = (b2 ~ b3)) } b1 = { "b" } b2 = { "bb" } b3 = { "bbb" }"#]
#[emit_rule_reference]
#[emit_tagged_node_reference]
struct Parser;

fn main() -> Result<(), Error> {
    let a = Parser::try_parse::<pairs::a>("abbbbbb")?;
    #[cfg(feature = "grammar-extras")]
    {
        // With node tags, one can access inner nodes more precisely without defining many rules.
        // This may be especially useful when you have several references to the same rule.

        let _b = a.b();
        // assert_eq!(b.span.as_str(), "bbbbbb");
        // let b1 = b.b1();
        // assert_eq!(b1.span.as_str(), "b");
        // let c = b.c();
        // assert_eq!(c.span.as_str(), "bbbbb");
        // let b2 = c.b2();
        // assert_eq!(b2.span.as_str(), "bb");
        // let b3 = c.b3();
        // assert_eq!(b3.span.as_str(), "bbb");
    }
    #[cfg(not(feature = "grammar-extras"))]
    {
        let b1 = a.b1();
        assert_eq!(b1.span.as_str(), "b");
        let b2 = a.b2();
        assert_eq!(b2.span.as_str(), "bb");
        let b3 = a.b3();
        assert_eq!(b3.span.as_str(), "bbb");
    }
    Ok(())
}

§Normal Nodes

We can handle more complex problems with the lower-level API (also called the Structural API).

But note that, for a specific grammar, the structure of a Rule Struct indirectly depends on the optimizer in pest, as it uses pest_meta::optimizer::OptimizedExpr, so it may change in the future.

We may switch to pest_meta::ast::Expr by default in the future.

| Node Type | Fields | Functions |
| --- | --- | --- |
| Non-silent rule | Matched content (wrapped in a Box), which can be used to access matched expression; matched span. | See Accesser API. |
| Exact string (case-sensitive) | | const fn get_content(&self) to get original string, which requires trait pest_typed::Storage. |
| Exact string (case-insensitive) | Matched content (an &'i str). | const fn get_content(&self) to get original string, which requires trait pest_typed::Storage. |
| Sequence T, Res... | Matched content as a tuple. | get_matched(&self), which returns references to all elements (&elements...). |
| Choices T, Res... | An enum, whose variants are choices. | if_then(&self), several functions _0, _1, etc. |
| Optional (wrapped in an Option) | | |
| Repetition of T | Matched content wrapped in a Vec<T>. | iter_matched and iter_all (by reference); into_iter_matched and into_iter_all (by value). |
| Positive predicate | Matched content (not consumed). | |
| Negative predicate | | |
| PUSH and PEEK | Matched content. | |
| POP and POP_ALL | Popped span. | |
| DROP | | |

§Sequence

One can use get_matched by reference (or into_matched by value) to access elements within a sequence directly.

Both functions return a tuple.
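The difference between the two functions can be sketched with a hand-written two-element sequence. `Seq2` and `describe` are hypothetical stand-ins, not the generated types; only the method names `get_matched` and `into_matched` are taken from the text above.

```rust
/// Hypothetical stand-in for a sequence node with two elements.
struct Seq2<T0, T1> {
    content: (T0, T1),
}

impl<T0, T1> Seq2<T0, T1> {
    /// By reference: the node stays usable afterwards.
    fn get_matched(&self) -> (&T0, &T1) {
        (&self.content.0, &self.content.1)
    }

    /// By value: the node is consumed and its elements are moved out.
    fn into_matched(self) -> (T0, T1) {
        self.content
    }
}

/// Borrows the elements first, then takes ownership of them.
fn describe(seq: Seq2<String, u32>) -> String {
    let (name, age) = seq.get_matched();
    let summary = format!("{name}:{age}");
    let (owned_name, _) = seq.into_matched();
    format!("{summary} ({} bytes)", owned_name.len())
}
```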

§Choices

Choices can be matched using match, as long as you know where the type is defined. Auto-generated choice types are named Choice{n}, where n is the number of choices, and every generic type used can be found in mod generics.
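A minimal sketch of the idea, assuming a hand-written `Choice2` with the same variant naming (`_0`, `_1`) as the generated types. The enum below is a stand-in defined locally, and `label` is a hypothetical helper.

```rust
/// Hand-written stand-in for a generated two-way choice.
#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq)]
enum Choice2<T0, T1> {
    _0(T0),
    _1(T1),
}

/// Matches on the choice like on any other enum.
/// Each variant carries the node of the corresponding alternative.
fn label(choice: &Choice2<u8, &str>) -> String {
    match choice {
        Choice2::_0(number) => format!("number {number}"),
        Choice2::_1(text) => format!("text {text}"),
    }
}
```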

Similarly, we provide a proc macro match_choices that handles choices with slightly simpler syntax. Note that you need to import module generics to use the macro.

In addition, we provide several functions that simulate control structures: if (if_then(f)), else-if (else_if(f)) and else (else_then(f)).

Each of these functions accepts a function f as its argument, and f is called if and only if the corresponding branch is the one that matched.

The structure must start with if_then(f). And else_if is only available when there are at least two cases that haven’t been handled, so if it’s the last case, use else_then(f) instead.

While else_then(f) returns the final result, if_then(f) and else_if(f) return a temporary helper object.

Using these functions, one can handle those cases one by one in order.
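One way such a chain could be built is sketched below. This is an assumption about the general technique, not the crate's actual implementation: `Choice3`, `AfterFirst` and `AfterSecond` are hand-written, hypothetical types. Each step either carries the result of an earlier branch or tries its own branch, and only the last helper offers `else_then`.

```rust
/// Hand-written stand-in for a generated three-way choice.
#[allow(non_camel_case_types)]
enum Choice3<T0, T1, T2> {
    _0(T0),
    _1(T1),
    _2(T2),
}

/// Temporary helper returned by `if_then`; two cases are still unhandled.
struct AfterFirst<'n, T0, T1, T2, R> {
    choice: &'n Choice3<T0, T1, T2>,
    result: Option<R>,
}

/// Temporary helper returned by `else_if`; one case is still unhandled.
struct AfterSecond<'n, T0, T1, T2, R> {
    choice: &'n Choice3<T0, T1, T2>,
    result: Option<R>,
}

impl<T0, T1, T2> Choice3<T0, T1, T2> {
    /// Calls `f` only if the first alternative matched.
    fn if_then<R>(&self, f: impl FnOnce(&T0) -> R) -> AfterFirst<'_, T0, T1, T2, R> {
        let result = match self {
            Choice3::_0(value) => Some(f(value)),
            _ => None,
        };
        AfterFirst { choice: self, result }
    }
}

impl<'n, T0, T1, T2, R> AfterFirst<'n, T0, T1, T2, R> {
    /// Calls `f` only if the second alternative matched.
    fn else_if(self, f: impl FnOnce(&T1) -> R) -> AfterSecond<'n, T0, T1, T2, R> {
        let AfterFirst { choice, result } = self;
        let result = match (result, choice) {
            (Some(done), _) => Some(done),
            (None, Choice3::_1(value)) => Some(f(value)),
            (None, _) => None,
        };
        AfterSecond { choice, result }
    }
}

impl<'n, T0, T1, T2, R> AfterSecond<'n, T0, T1, T2, R> {
    /// Handles the last alternative and returns the final result.
    fn else_then(self, f: impl FnOnce(&T2) -> R) -> R {
        match (self.result, self.choice) {
            (Some(done), _) => done,
            (None, Choice3::_2(value)) => f(value),
            (None, _) => unreachable!("an earlier branch must have produced a result"),
        }
    }
}

/// Handles the three cases one by one, in order.
fn describe(choice: &Choice3<u8, &str, bool>) -> String {
    choice
        .if_then(|number| format!("number {number}"))
        .else_if(|text| format!("text {text}"))
        .else_then(|flag| format!("flag {flag}"))
}
```

Because `AfterSecond` has no `else_if`, the type system enforces that the last case is handled with `else_then` in this sketch.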

§Example

use anyhow::Error;
use pest_typed::{Storage as _, TypedParser as _};
use pest_typed_derive::{match_choices, TypedParser};

#[derive(TypedParser)]
#[grammar_inline = r#"
a  = { "a" ~ (b1 | b2 | b3) ~ ^"c" }
b1 = { "bbb" }
b2 = { "cc" }
b3 = { "d" }
"#]
#[emit_rule_reference]
struct Parser;

fn parse(input: &str) -> Result<(), Error> {
    let a = Parser::try_parse::<pairs::a>(input)?;
    let (str_a, var_b, c) = a.as_ref();
    assert_eq!(str_a.get_content(), "a");
    match_choices!(var_b {
        b1 => assert_eq!(b1.get_content(), "bbb"),
        b2 => assert_eq!(b2.get_content(), "cc"),
        b3 => assert_eq!(b3.get_content(), "d"),
    });
    // Or, equivalently, the code below. Sometimes you may need to call `.deref()`.
    use generics::Choice3;
    match var_b {
        Choice3::_0(b1) => assert_eq!(b1.get_content(), "bbb"),
        Choice3::_1(b2) => assert_eq!(b2.get_content(), "cc"),
        Choice3::_2(b3) => assert_eq!(b3.get_content(), "d"),
    };
    // Or the code below. Note that the Rust compiler is not aware
    // that exactly one of the closures you pass will be called,
    // so it will sometimes reject this form.
    // This method is no longer recommended and may be deprecated in the future.
    // However, it is currently the only way to place a type annotation after the identifier.
    var_b
        .if_then(|b1: &pairs::b1<'_>| assert_eq!(b1.get_content(), "bbb"))
        .else_if(|b2: &pairs::b2<'_>| assert_eq!(b2.get_content(), "cc"))
        .else_then(|b3: &pairs::b3<'_>| assert_eq!(b3.get_content(), "d"));

    assert_eq!(c.get_content(), "c");
    assert!(c.content == "C" || c.content == "c");
    Ok(())
}

fn main() -> Result<(), Error> {
    parse("abbbc")?;
    parse("abbbC")?;
    parse("accc")?;
    parse("accC")?;
    parse("adc")?;
    parse("adC")?;
    Ok(())
}

§Lifetime

Structs have fields that contain references borrowed from the input, so each of them has a lifetime parameter 'i.

Sometimes you may encounter a lifetime error. Don't panic; read the error carefully. If it turns out to be caused by bad API design, we'll fix it.

use anyhow::Error;
use pest_typed::TypedParser;
use pest_typed_derive::TypedParser;

#[derive(TypedParser)]
#[grammar_inline = r#"
a  = { "a" ~ (b1 | b2) }
b1 = { "bbb" }
b2 = { "cc" }
"#]
#[emit_rule_reference]
struct Parser;

fn parse(input: &'_ str) -> Result<&'_ str, Error> {
    let a = Parser::try_parse::<pairs::a>(input)?;
    let res = if let Some(b1) = a.b1() {
        b1.span.as_str()
    } else if let Some(b2) = a.b2() {
        b2.span.as_str()
    } else {
        unreachable!("All branches failed in succeeded matching");
    };
    Ok(res)
}

fn main() -> Result<(), Error> {
    let res = parse("abbb")?;
    println!("{}", res);
    Ok(())
}
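The example above compiles because the returned string borrows from the input, not from the local node `a`. The sketch below shows the same effect with hand-written, hypothetical types (`Span`, `Node`, `parse_node`, `matched_text`); trimming stands in for real parsing. The key assumption is that `as_str` returns `&'i str`, a slice tied to the input lifetime 'i.

```rust
/// Simplified stand-in for a span that borrows from the input.
#[derive(Clone, Copy)]
struct Span<'i> {
    text: &'i str,
}

impl<'i> Span<'i> {
    /// The result lives as long as the input, not as long as `self`.
    fn as_str(&self) -> &'i str {
        self.text
    }
}

/// Hypothetical node that only stores its span.
struct Node<'i> {
    span: Span<'i>,
}

/// Trimming stands in for real parsing here.
fn parse_node(input: &str) -> Node<'_> {
    Node { span: Span { text: input.trim() } }
}

/// `node` is dropped at the end of this function, yet the returned
/// slice stays valid because it points into `input`.
fn matched_text(input: &str) -> &str {
    let node = parse_node(input);
    node.span.as_str()
}
```

If `as_str` returned a slice tied to `&self` instead, `matched_text` would be rejected, since the result would borrow from the local `node`.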
