Parse

Trait Parse 

Source
pub trait Parse: Parser {
    // Provided methods
    fn parse(
        tokens: &mut ShadowCountedIter<'_, <TokenStream as IntoIterator>::IntoIter>,
    ) -> Result<Self, Error> { ... }
    fn parse_all(
        tokens: &mut ShadowCountedIter<'_, <TokenStream as IntoIterator>::IntoIter>,
    ) -> Result<Self, Error> { ... }
    fn parse_with<T>(
        tokens: &mut ShadowCountedIter<'_, <TokenStream as IntoIterator>::IntoIter>,
        f: impl FnOnce(Self, &mut ShadowCountedIter<'_, <TokenStream as IntoIterator>::IntoIter>) -> Result<T, Error>,
    ) -> Result<T, Error> { ... }
}
Expand description

This trait provides the user facing API to parse grammatical entities. It is implemented for anything that implements the Parser trait. The methods here encapsulating the iterator that is used for parsing into a transaction. This iterator is always Copy. Instead using a peekable iterator or implementing deeper peeking, parse clones this iterator to make access transactional, when parsing succeeds then the transaction becomes committed, otherwise it is rolled back.

This trait cannot be implemented by user code.

§Cookbook

§Parsing

Parsing is done over a ShadowCountedIter<<TokenStream as IntoIterator>::IntoIter> which is shortened as TokenIter. TokenIter are created with ToTokens::to_token_iter() or ToTokens::into_token_iter(), which should be implemented for for anyhing you need.

The main trait for parsing a TokenIter is the Parse trait. This traits methods are all default implemented and can be used as is. Parse is implemented for all types that implement Parser and ToTokens. Parser is the trait that has to be implemented for each type that should be parsed.

The IParse trait is implemented for TokenIter, this calls Parse::parse() in a convenient way when types can be inferred.

ToTokens complements Parser as it will turn any parseable entity back into a TokenStream.

§Parse::parse_with() transformations

The Parse::parse_with() method is used for parsing in more complex situations. In the simplest case it can be used to validate the values of a parsed type. More complex usage will fill in HiddenState and other not parsed members or construct completely new types from parsed entities.

§ToTokens

The ToTokens trait is responsible for emiting tokens to a TokenStream. Unlike the trait from the quote crate we define ToTokens for a lot more types and provide more methods.

Notably it provides the ToTokens::to_token_iter() and ToTokens::into_token_iter() methods which create the entry points for parsing.

When textual representation of a parsed entity is required then ToTokens::tokens_to_string can be used. The standard Display trait is implemented on top of that, as such every type that has ToTokens implented can be printed as text.

§Composition and Type Aliases

For moderately complex types it is possible to use composition with Cons, Either and other container types instead defining new enums or structures.

It is recommended to alias such composed types to give them useful names. This can be used for creating grammars on the fly without any boilerplate code.

§The [unsynn!{}] Macro

The easiest way to describe grammars is to use the [unsynn!{}] macro. This allows to define grammars just by defining enums and structures. The macro will generate the necessary implementations for Parser and ToTokens but offers no flexibility in what is generated. It is possible to add HiddenState<T> members to add non syntactic entries to there custom structs.

§Implementing Parsers

§Transactions

The Parse Trait parses items within a transaction. This is with the Transaction::transaction() method. Internally this clones the iterator, calls the Parser::parser() method and copies the cloned iterator back to the original on success. This means that if a parser fails, the input is reset to the state before the parser was called. For efficiency reasons the Parser::parser() methods are themself not transactional, when they fail they leave the input in a consumed state.

When one wants to manually parse alternatives within a Parser (like in a enum) it must be manually called within a transaction. This is only necessary when the parsed entity is compound and not the final alternative. For simple parsable entities one can just call the Parse::parse() method which already provides the transaction.

enum MyEnum {
    // Complex variant
    Tuple(i32, i32),
    // Simple variant
    Simple(i32),
    // Another simple variant
    Another(Ident),
}

impl Parser for MyEnum {
    fn parser(input: &mut TokenIter) -> Result<Self> {
        // Use [`Transaction::transaction()`] to parse the tuple variant
        if let Ok(tuple) = input.transaction(
            |mut trans_input|
            Ok(MyEnum::Tuple(
                i32::parser(&mut trans_input)?,
                i32::parser(&mut trans_input)?,
            ))
        ) {
            Ok(tuple)
        } else
        // Try to parse the simple variant
        // can use the `Parse::parse()` or `IParse::parse()` method directly since a
        // single entity will be put in a transaction by those.
        if let Ok(i) = input.parse() {
            Ok(MyEnum::Simple(i))
        } else {
            // Try to parse the last variant
            // this can use the `Parser::parser()` method since this is the final alternative
            Ok(MyEnum::Another(Ident::parser(input)?))
        }
    }
}

§Different ways to implement Parsers

There are different approaches how one can implement parsers. Each has its own advantages and disadvantages. unsynn supports both, so one can freely mix whatever makes most sense in a particular case.

unsynn uses a rather simple, first come - first served approach when parsing. Parsers may be a subset or share a common prefix with other parsers. This needs some attention.

In the case where parsers are subsets to other parsers and one puts them into a disjunction in a unsynn! { enum ..} or in a Either combinator the more specific case must come first, otherwise it will never match.

// Ident must come first since TokenTree matches any token.
type Example = Either<Ident, TokenTree>;

For the other case where parsers sharing longer prefixes (this should rarely happen in practice) it may benefit performance to break these into a type with the shared prefix that dispatches on the distinct parts.

§Exact AST Representation

One approach is to define a structures that reflects the AST of the grammar exactly. This is what the [unsynn!{}] macro and composition does. The program later works with the parsed structure directly. The advantage is that Parser and ToTokens are simple and come for free and that the source structure of the AST stays available.


unsynn!{
    // define a list of Ident = "LiteralString",.. assignments
    struct Assignment {
        id: Ident,
        _equal: Assign,
        value: LiteralString,
    }

    struct AssignmentList {
        list: DelimitedVec<Assignment, Comma>
    }
}

When the implementation generated by the [unsynn!{}] macro is not sufficient one can implement Parser and ToTokens for custom structs and enums manually.

§High level representation

Another approach is to represent the data more in the way further processing requires. This simplifies working with the data but one has to implement the Parser and ToTokens traits manually. Sometimes the Parse::parse_with() method will become useful in such cases.

// We could go with `unsynn!{struct Assignment{...}}` as above here. But lets use composition
// as example here. This stays internal so its complexity isnt exposed.
type Assignment = Cons<Ident, Assign, LiteralString>;

// Here we'll parse the list of assignments into a structure that represents the
// data in a way thats easier to use from a rust program
#[derive(Default)]
struct AssignmentList {
    // each 'Ident = LiteralString'
    list: Vec<(Ident, String)>,
    // We want to have a fast lookup to the entries
    lookup: HashMap<Ident, usize>,
}

impl Parser for AssignmentList {
    fn parser(input: &mut TokenIter) -> Result<Self> {
        let mut assignment_list = AssignmentList::default();

        // We construct the `AssignmentList` by parsing the content, appending and processing it.
        while let Ok(assignment) = Delimited::<Assignment, Comma>::parse(input) {
            assignment_list.list.push((
                assignment.value.first.clone(),
                // Create a String without the enclosing double quotes
                assignment.value.third.as_str().to_string()
            ));
            // add it to the lookup
            assignment_list.lookup.insert(
                assignment.value.first.clone(),
                assignment_list.list.len()-1
            );
            // No Comma, no more assignments
            if assignment.delimiter.is_none() {
                break;
            }
        }
        Ok(assignment_list)
    }
}

impl ToTokens for AssignmentList {
    fn to_tokens(&self, output: &mut TokenStream) {
        for a in &self.list {
            a.0.to_tokens(output);
            Assign::new().to_tokens(output);
            LiteralString::from_str(&a.1).to_tokens(output);
            Comma::new().to_tokens(output);
        }
    }
}

§Errors

Unsynn parsers return the first error encountered. It does not try error recovery on its own or being smart about what may have caused an error (typos, missing semicolons etc). The only exception to this is when parsing disjunct entities (Either or other enums) where errors are expected to happen on the first branches. When any branch succeeds the error is dropped and parsing goes on, when all branches fail then that error which made the most progress is returned. Progress is tracked with the ShadowCountedIter. This is implemented for enums created with the unsynn! macro as well for the Either::parser() method. This covers all normal cases.

When one needs to implement disunct parsers manually this has to be taken into account. This is then done by creating an Error with ErrorKind::NoError by let mut err = Error::no_error() within the Parser::parse implementation. Then any parser that is called subsequently tries to err.upgrade(Item::parser(..)) which handles storing the error which made the most progress. Eventually a Ok(...) or the upgraded Err(err) is returned. For details look at the source of Either::parser.

Errors carry a iterator starting at the location where the error happened. This can be used for further inspection.

Some parser types in unsynn are ZST’s this means they don’t carry the token they parsed and consequently the have no Span thus the location of an error will be unavailable for them. If that poses to be a problem this might be revised in future unsynn versions.

In other cases where spans are wrong this is considered a happy accident, please fill a Bug or send a PR fixing this issue.

§Error Recovery

Trying to recover from syntax errors is usually not required/advised for compiled languages. When there is a syntax error, report it to the user and let them fix the source code.

Recoverable parsing would need knowledge of the grammar (or even context) being parsed. This can not be supported by unsynn itself as it does not define any grammars. When one needs recoverable parsers then this has to implemented into the grammar definition. Future versions of unsynn may provide some tools to assist with this. The actual approach is still in discussion.

§Implementation/Performance Notes

Unsynn is (as of now) implemented as recursive descent PEG with backtracking. This has worst-case exponential complexity, there is a plan to fix that in future releases. Currently to avoid these cases it is recommended to formulate disjunctive parsers so that they fail early and don’t share long prefixes.

For example things like Either<Cons<LongValidCode, OneThing>, Cons<LongValidCode, OtherThing>> should be rewritten as Cons<LongValidCode, Either<OneThing, OtherThing>>.

§Stability Guarantees

Unsynn is in development, nevertheless we give some stability promises. These will only be broken when we discover technical reasons that make it infeasible to keep them up. Eventually things in the unstable category below will move up to stable.

§Stable

  • Operator name definitions in operator::names
    These follow common (rust) naming, if not this will be fixed, otherwise you can rely on these to be stable.
  • Functionality
    Existing types and parsers are there to stay. A few things especially when they are freshly added may be shaken out and refined/extended but existing functionality should be preserved. In some cases changes may require minor adjustments on code using it.

§Unstable

  • modules
    The module organization is still in flux and may be refactored at any time. This shouldn’t matter because unsynn reexports everything at the crate root. The only exception here is the operators::names which we try to keep stable.
  • internal representation
    Don’t rely on the internal representations there are some plans and ideas to change these. Some types that are currently ZST may become stateful, collections may become wrapped as Rc<Vec<T>> etc.
  • traits
    Currently there are the main traits Parse, Parser, IParse and ToTokens. In future these may become refactored and split into smaller traits and some more may be added. This will then require some changes for the user. The existing functionality will be preserved nevertheless.
  • trait bounds
    We don’t have extra trait bounds on parsed type in the current implementation. This will likely change in the future for improving the parser. Expect that types eventually should be at least Clone + Hash + PartialEq. Possibly some helper trait needs to be implemented too. Details will be worked out later.

Provided Methods§

Source

fn parse( tokens: &mut ShadowCountedIter<'_, <TokenStream as IntoIterator>::IntoIter>, ) -> Result<Self, Error>

This is the user facing API to parse grammatical entities. Calls a parser() within a transaction. Commits changes on success and returns the parsed value.

§Errors

When the parser returns an error the transaction is rolled back and the error is returned.

Source

fn parse_all( tokens: &mut ShadowCountedIter<'_, <TokenStream as IntoIterator>::IntoIter>, ) -> Result<Self, Error>

Exhaustive parsing within a transaction. This is a convenience method that implies a EndOfStream at the end. Thus it will error if parsing is not exhaustive.

§Errors

When the parser returns an error or there are tokens left in the stream the transaction is rolled back and a error is returned.

Source

fn parse_with<T>( tokens: &mut ShadowCountedIter<'_, <TokenStream as IntoIterator>::IntoIter>, f: impl FnOnce(Self, &mut ShadowCountedIter<'_, <TokenStream as IntoIterator>::IntoIter>) -> Result<T, Error>, ) -> Result<T, Error>

Parse a value in a transaction, pass it to a FnOnce(Self, &mut TokenIter) -> Result<T> closure which creates a new result or returns an Error.

This method is a very powerful tool as it allows anything from simple validations to complete transformations into a new type. You may find this useful to implement parsers for complex types that need some runtime logic.

The closures first argument is the parsed value and the second argument is the transactional iterator pointing after parsing Self. This can be used to create errors or parse further. In many cases it can be ignored with _.

§Example
// A parser that parses a comma delimited list of anything but commas
// and stores these lexical sorted.
struct OrderedStrings {
    strings: Vec<String>
}

impl Parser for OrderedStrings {
    fn parser(tokens: &mut TokenIter) -> Result<Self> {
        // Our input is CommaDelimitedVec<String>, we'll transform that into
        // OrderedStrings.
        Parse::parse_with(tokens, |this : CommaDelimitedVec<String>, _| {
            let mut strings: Vec<String> = this.into_iter()
                .map(|s| s.value)
                .collect();
            strings.sort();
            Ok(OrderedStrings { strings })
        })
    }
}
let mut input = "a, d, b, e, c,".to_token_iter();
let ordered_strings: OrderedStrings = input.parse().unwrap();
assert_eq!(ordered_strings.strings, vec!["a", "b", "c", "d", "e"]);
§Errors

When the parser or the closure returns an error, the transaction is rolled back and the error is returned.

Dyn Compatibility§

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.

Implementors§

Source§

impl<T> Parse for T
where T: Parser,

Parse is implemented for anything that implements Parser.