A unique feature of unsynn is that one can define a parser as a composition of other
parsers on the fly without the need to define custom structures. This is done by using the
Cons and Either types. The Cons type is used to define a parser that is a
conjunction of two to four other parsers, while the Either type is used to define a
parser that is a disjunction of two to four other parsers.
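The composition idea can be illustrated with a minimal, self-contained sketch. This is not unsynn's actual implementation; here a "parser" is just a function over `&str`, with conjunction built as a pair and disjunction as a first-match-wins alternative:

```rust
// Conceptual sketch of Cons (conjunction) and Either (disjunction).
// A parser maps input to Option<(value, remaining input)>.
type PResult<'a, T> = Option<(T, &'a str)>;

fn literal<'a>(s: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'static str> {
    move |input| input.strip_prefix(s).map(|rest| (s, rest))
}

// Cons: both parsers must match, in order.
fn cons<'a, A, B>(
    a: impl Fn(&'a str) -> PResult<'a, A>,
    b: impl Fn(&'a str) -> PResult<'a, B>,
) -> impl Fn(&'a str) -> PResult<'a, (A, B)> {
    move |input| {
        let (va, rest) = a(input)?;
        let (vb, rest) = b(rest)?;
        Some(((va, vb), rest))
    }
}

// Either: the first matching alternative wins.
fn either<'a, T>(
    a: impl Fn(&'a str) -> PResult<'a, T>,
    b: impl Fn(&'a str) -> PResult<'a, T>,
) -> impl Fn(&'a str) -> PResult<'a, T> {
    move |input| a(input).or_else(|| b(input))
}

fn main() {
    // Composed on the fly, no custom struct needed.
    let kw = either(literal("fn"), literal("struct"));
    let item = cons(kw, literal("!"));
    assert_eq!(item("fn!"), Some((("fn", "!"), "")));
    assert_eq!(item("struct!rest"), Some((("struct", "!"), "rest")));
    assert_eq!(item("enum!"), None);
}
```

unsynn's `Cons` and `Either` apply the same shape to token streams rather than strings.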
This module provides parsers for types that contain possibly multiple values. This
includes stdlib types like Option, Vec, Box, Rc, RefCell and types
for delimited and repeated values with numbered repeats.
For easier composition we define the Delimited type here, which is a T
followed by an optional delimiting entity D. This is used by the
DelimitedVec type to parse a list of entities separated by a delimiter.
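A minimal sketch of the Delimited idea over plain strings (not unsynn's real token-based implementation): each item may be followed by a delimiter, and the delimiter is optional on the last item, so lists with and without a trailing delimiter parse the same way.

```rust
// Sketch of Delimited/DelimitedVec: item plus optional delimiter,
// repeated. The delimiter may be absent after the final element.
fn delimited_vec(input: &str, delim: char) -> Vec<String> {
    let mut items = Vec::new();
    let mut rest = input;
    loop {
        // one item: everything up to the delimiter or end of input
        let end = rest.find(delim).unwrap_or(rest.len());
        let item = &rest[..end];
        if item.is_empty() {
            break;
        }
        items.push(item.to_string());
        // the delimiter itself is optional (e.g. after the last item)
        rest = match rest[end..].strip_prefix(delim) {
            Some(r) => r,
            None => break,
        };
    }
    items
}

fn main() {
    assert_eq!(delimited_vec("a,b,c", ','), ["a", "b", "c"]);
    // a trailing delimiter parses to the same list
    assert_eq!(delimited_vec("a,b,c,", ','), ["a", "b", "c"]);
}
```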
This module contains the fundamental parsers. These are the basic tokens from
proc_macro2/proc_macro
and a few other ones defined by unsynn. These are the terminal entities when parsing tokens.
Being able to parse TokenTree and TokenStream allows one to parse opaque entities where
internal details are left out. The Cached type is used to cache the string representation
of the parsed entity. The Nothing type is used to match without consuming any tokens.
The Except type is used to match when the next token does not match the given type.
The EndOfStream type is used to match the end of the stream when no tokens are left.
The HiddenState type is used to hold additional information that is not part of the parsed syntax.
Groups are a way to group tokens together. They are used to represent the contents between
(), {}, [] or no delimiters at all. This module provides parser implementations for
opaque group types with defined delimiters and the GroupContaining types that parse the
surrounding delimiters and content of a group type.
This module provides a set of literal types that can be used to parse and tokenize
literals. The literals are parsed from the token stream and can be used to represent the
parsed value. unsynn defines only simplified literals, such as integers, characters and
strings. The literals here are not full rust syntax, which will be defined in the
unsynn-rust crate. There are Literal* types for Integer, Character and String to parse simple
literals, and ConstInteger&lt;V&gt; and ConstCharacter&lt;V&gt;, which must match an exact value.
The latter two also implement Default and can thus be used to create constant tokens.
There is no ConstString; constant literal strings can be constructed with
IntoLiteralString<T>.
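The Const* pattern can be sketched in isolation: a const generic parameter pins the expected value, so a matching token can be constructed from nothing via Default. This is only an illustration of the pattern; unsynn's real types additionally implement its Parser and ToTokens traits.

```rust
// Sketch of the Const* pattern with a const generic character.
// (Illustrative stand-in, not unsynn's actual type.)
#[derive(Default, Debug, PartialEq)]
struct ConstCharacter<const C: char>;

impl<const C: char> ConstCharacter<C> {
    // stand-in for parsing: succeed only on the exact character
    fn parse(input: char) -> Option<Self> {
        (input == C).then_some(Self)
    }

    fn value(&self) -> char {
        C
    }
}

fn main() {
    // constructed without parsing, thanks to Default
    let plus: ConstCharacter<'+'> = Default::default();
    assert_eq!(plus.value(), '+');
    // parsing only matches the exact character
    assert!(ConstCharacter::<'+'>::parse('+').is_some());
    assert!(ConstCharacter::<'+'>::parse('-').is_none());
}
```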
This module contains types for punctuation tokens. These are used to represent single and
multi character punctuation tokens. For single character punctuation tokens, there are
the PunctAny, PunctAlone and PunctJoint types.
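The difference between the three matchers comes down to spacing. A minimal model (mirroring proc_macro2's Spacing, where Joint means the punctuation character is immediately followed by another punctuation character) looks like this:

```rust
// Sketch of the three single-character punct matchers over a
// (character, spacing) pair; not unsynn's actual implementation.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Spacing {
    Alone,
    Joint,
}

fn punct_any(tok: (char, Spacing), c: char) -> bool {
    tok.0 == c // spacing does not matter
}

fn punct_alone(tok: (char, Spacing), c: char) -> bool {
    tok.0 == c && tok.1 == Spacing::Alone
}

fn punct_joint(tok: (char, Spacing), c: char) -> bool {
    tok.0 == c && tok.1 == Spacing::Joint
}

fn main() {
    // "->" tokenizes as '-' (Joint) followed by '>' (Alone)
    let minus = ('-', Spacing::Joint);
    assert!(punct_any(minus, '-'));
    assert!(punct_joint(minus, '-'));
    assert!(!punct_alone(minus, '-'));
}
```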
Helper macro that asserts that two entities implementing ToTokens result in the same
TokenStream. Used in tests to ensure that the output of parsing is as expected. This
macro allows two forms:
Generates a Literal from a format specification. Unlike format_literal_string!, this does not
add quotes and can be used to create any kind of literal, such as integers or floats.
unsynn provides its own quote!{} macro that translates tokens into a TokenStream while
interpolating variables prefixed with a Pound sign (#). This is similar to what the quote macro from
the quote crate does but not as powerful. There is no #(...) repetition (yet).
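The substitution idea can be modeled with a toy string interpolator. The real quote!{} operates on TokenStreams and hygienic tokens, not strings; this sketch only illustrates how #name lookups splice values into output:

```rust
use std::collections::HashMap;

// Toy model of #-interpolation: replace each #identifier in the
// template with the value bound to that name. Not the real macro.
fn interpolate(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(pos) = rest.find('#') {
        out.push_str(&rest[..pos]);
        rest = &rest[pos + 1..];
        // take the identifier following '#'
        let end = rest
            .find(|c: char| !c.is_alphanumeric() && c != '_')
            .unwrap_or(rest.len());
        let name = &rest[..end];
        out.push_str(vars.get(name).copied().unwrap_or(""));
        rest = &rest[end..];
    }
    out.push_str(rest);
    out
}

fn main() {
    let vars = HashMap::from([("name", "foo"), ("ty", "u32")]);
    assert_eq!(interpolate("fn #name() -> #ty", &vars), "fn foo() -> u32");
}
```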
Getting the underlying string is expensive, as it always allocates a new String.
This type caches the string representation of a given entity. Note that this is
only reliable for fundamental entities that represent a single token. Spacing between
composed tokens is not stable and should be considered informal only.
This is used when one wants to parse a list of entities separated by delimiters. The
delimiter is optional and can be None, e.g. when the entity is the last in the
list. Usually the delimiter will be some simple punctuation token, but it is not limited
to that.
Succeeds when the next token matches T. The token will be removed from the stream but not stored.
Consequently the ToTokens implementations will panic with a message that it can not be emitted.
This can only be used when a token should be present but not stored and never emitted.
Parses a T (default: Nothing). Allows one to replace it at runtime, after parsing, with
anything else implementing ToTokens. This is backed by an Rc. One can replace any
cloned occurrences or only the current one.
Succeeds when the next token does not match T. Will not consume any tokens. Usually
this has to be followed with a conjunctive match such as Cons<Except<T>, U> or followed
by another entry in a struct or tuple.
Sometimes one wants to compose types or create structures for unsynn that have members that
are not part of the parsed syntax but add some additional information. This struct can be
used to hold such members while still using the Parser and ToTokens trait
implementations automatically generated by the [unsynn!{}] macro or composition syntax.
HiddenState will not consume any tokens when parsing and will not emit any tokens when
generating a TokenStream. On parsing it is initialized with a default value. It has
Deref and DerefMut implemented to access the inner value.
Parses T and concatenates all its elements into a single identifier by removing all characters
that are not valid in identifiers. When T implements Default, such as single string
(non group) keywords, operators and Const* literals, it can be used to create an
IntoIdentifier on the fly. Note that construction may still fail when one tries to
create an invalid identifier, such as one starting with digits.
Parses T and creates a LiteralString from it. When T implements Default, such as
single string (non group) keywords, operators and Const* literals, it can be used to
create an IntoLiteralString on the fly.
Parses T and keeps it as an opaque TokenStream. This is useful when one wants to parse a
sequence of tokens and keep it as an opaque unit or re-parse it later as something else.
A Vec&lt;T&gt; that is filled up to the first appearance of a terminating S. This S may
be a subset of T, thus parsing becomes lazy. This is the same as
Cons&lt;Vec&lt;Cons&lt;Except&lt;S&gt;,T&gt;&gt;,S&gt; but more convenient and efficient.
A Vec&lt;T&gt; that is filled up to the first appearance of a terminating S. This S may
be a subset of T, thus parsing becomes lazy. Unlike LazyVec this variant does not consume
the final terminator. This is the same as Vec&lt;Cons&lt;Except&lt;S&gt;,T&gt;&gt; but more convenient.
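Lazy repetition can be sketched over characters instead of tokens (a conceptual model only, not unsynn's implementation). Here T is "any character" and S is a semicolon, so S is a subset of T; a greedy Vec&lt;T&gt; would swallow the terminator, while the lazy variant stops at its first appearance:

```rust
// Sketch of LazyVec: collect items until the first match of the
// terminator, which is consumed; fail if no terminator appears
// (matching the Cons<Vec<Cons<Except<S>,T>>,S> equivalence).
fn lazy_vec(input: &str) -> Option<(Vec<char>, &str)> {
    let mut items = Vec::new();
    for (i, c) in input.char_indices() {
        if c == ';' {
            // the terminator is consumed and parsing stops
            return Some((items, &input[i + c.len_utf8()..]));
        }
        items.push(c); // Except<S> holds here, so c is taken as an item
    }
    None // the final terminator is required
}

fn main() {
    // stops at the first ';' instead of greedily consuming everything
    assert_eq!(lazy_vec("ab;cd"), Some((vec!['a', 'b'], "cd")));
    assert_eq!(lazy_vec("abcd"), None); // no terminator, no match
}
```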
A literal string ("hello"), byte string (b"hello"), character ('a'),
byte character (b'a'), an integer or floating point number with or without
a suffix (1, 1u8, 2.3, 2.3f32).
A simple unsigned 128 bit integer. This is the simplest form of integer parsing. Note
that only decimal integers without any other characters, signs or suffixes are supported;
this is not full rust syntax.
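The accepted subset is easy to characterize: a non-empty run of ASCII decimal digits, nothing else. A sketch of the validation (an illustration, not unsynn's code):

```rust
// Sketch of the simplified integer literal: plain decimal digits only,
// no sign, no underscores, no suffix.
fn parse_literal_integer(s: &str) -> Option<u128> {
    if !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit()) {
        s.parse().ok()
    } else {
        None
    }
}

fn main() {
    assert_eq!(parse_literal_integer("12345"), Some(12345));
    assert_eq!(parse_literal_integer("1_000"), None); // underscores rejected
    assert_eq!(parse_literal_integer("1u8"), None); // suffixes rejected
    assert_eq!(parse_literal_integer("-1"), None); // signs rejected
}
```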
A double quoted string literal ("hello"). The quotes are included in the value. Note
that this is a simplified string literal, and only double quoted strings are supported;
this is not full rust syntax, e.g. byte and C string literals are not supported.
NonEmptyOption<T> prevents Option from matching when T can succeed with empty
input. It ensures None is returned when no tokens remain, regardless of whether T
could succeed on an empty stream. This is crucial when parsing optional trailing content
that should only match if tokens are actually available to consume.
A unit that can not be parsed. This is useful as diagnostic placeholder for parsers that
are (yet) unimplemented. The nonparseable feature flag controls whether Parser and ToTokens
will be implemented for it. This is useful in release builds that should not have any
NonParseable left behind.
A unit that always matches without consuming any tokens. This is required when one wants
to parse a Repeats without a delimiter. Note that using Nothing as primary entity
in a Vec, LazyVec, DelimitedVec or Repeats will result in an infinite
loop.
Operators made from up to four ASCII punctuation characters. Unused characters default to \0.
Custom operators can be defined with the crate::operator! macro. All but the last character are
Spacing::Joint. Attention must be paid when operators have the same prefix; the shorter
ones need to be tried first.
Skips over expected tokens. Will parse and consume the tokens but not store them.
Consequently the ToTokens implementations will not output any tokens.
This trait provides the user facing API to parse grammatical entities. It is implemented
for anything that implements the Parser trait. The methods here encapsulate the
iterator that is used for parsing into a transaction. This iterator is always
Clone. Instead of using a peekable iterator or implementing deeper peeking, parse clones
this iterator to make access transactional: when parsing succeeds the transaction
is committed, otherwise it is rolled back.
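The clone-and-commit scheme can be shown with a plain character iterator standing in for the token iterator (a conceptual sketch, not unsynn's API):

```rust
// Sketch of transactional parsing by cloning the iterator: attempt
// the parse on a clone and write the clone back only on success.
use std::str::Chars;

fn parse_keyword(iter: &mut Chars, kw: &str) -> bool {
    let mut trans = iter.clone(); // begin transaction
    for expected in kw.chars() {
        if trans.next() != Some(expected) {
            return false; // roll back: `iter` was never advanced
        }
    }
    *iter = trans; // commit: advance the real iterator
    true
}

fn main() {
    let mut iter = "fn main".chars();
    assert!(!parse_keyword(&mut iter, "struct")); // failed, nothing consumed
    assert!(parse_keyword(&mut iter, "fn")); // succeeded, "fn" consumed
    assert_eq!(iter.as_str(), " main");
}
```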
A trait for parsing a repeating T with a minimum and maximum limit.
Sometimes the number of elements to be parsed is determined at runtime, e.g. a number of
header items needs a matching number of values.
Helper Trait for refining error type names. Every parser type in unsynn eventually tries
to parse one of the fundamental types. When parsing fails, that fundamental type name
is recorded as the expected type name in the error. Often this is not desired; a user wants to
know the type of the parser that actually failed. Since we don't want to keep a stack/vec of
errors for simplicity and performance reasons we provide a way to register refined type
names in errors. Note that this refinement should only be applied to leaves in the
AST. Refining errors on composed types will lead to unexpected results.
unsynn defines its own ToTokens trait to be able to implement it for std container types.
This is similar to the ToTokens from the quote crate but adds some extra methods and is
implemented for more types. Moreover, the to_token_iter() method is the main entry point
for creating an iterator that can be used for parsing.
We track the position of the error by counting tokens. This trait is implemented for
references to shadow counted TokenIter, and for usize. The latter allows one to pass in a
position directly, or to use usize::MAX when no position data is available (which will
make this error the final one when upgrading).
Parses a T and replaces it with its default value. This is a zero sized type. It can be
used for allocation-free replacement elements in a Vec, since Vec has an optimization for
zero-sized types where it won't allocate any memory but just acts as a counter.
DelimitedVec&lt;T,D&gt; with a minimum and maximum (inclusive) number of elements.
Parsing will succeed when at least the minimum number of elements is
reached and stops at the maximum number. The delimiter D defaults to Nothing to
parse sequences which don't have delimiters.
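Bounded repetition can be sketched with const generics (an illustration of the semantics, not unsynn's implementation; items here are single characters and the delimiter is omitted, as with Nothing):

```rust
// Sketch of min/max repetition: take at least MIN and at most MAX
// occurrences of `item`, stopping early once MAX is reached.
fn repeats<const MIN: usize, const MAX: usize>(
    input: &str,
    item: char,
) -> Option<(usize, &str)> {
    let mut count = 0;
    let mut rest = input;
    while count < MAX {
        match rest.strip_prefix(item) {
            Some(r) => {
                rest = r;
                count += 1;
            }
            None => break,
        }
    }
    // succeed only when the minimum was reached
    (count >= MIN).then_some((count, rest))
}

fn main() {
    assert_eq!(repeats::<2, 3>("aaaa", 'a'), Some((3, "a"))); // stops at MAX
    assert_eq!(repeats::<2, 3>("ab", 'a'), None); // below MIN
}
```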