§facet-macros-parse
Parses Rust syntax for facet-macros
§Sponsors
Thanks to all individual sponsors:
…along with corporate sponsors:
…without whom this work could not exist.
§Special thanks
The facet logo was drawn by Misiasart.
§License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Modules§
- CHANGELOG
- Changelog
- combinator
- A unique feature of unsynn is that one can define a parser as a composition of other parsers on the fly without the need to define custom structures. This is done by using the Cons and Either types. The Cons type is used to define a parser that is a conjunction of two to four other parsers, while the Either type is used to define a parser that is a disjunction of two to four other parsers.
- container
- This module provides parsers for types that contain possibly multiple values. This includes stdlib types like Option, Vec, Box, Rc, RefCell and types for delimited and repeated values with numbered repeats.
- debug
- Debug utilities for token stream inspection
- delimited
- For easier composition we define the Delimited type here which is a T followed by an optional delimiting entity D. This is used by the DelimitedVec type to parse a list of entities separated by a delimiter.
- dynamic
- This module contains the types for dynamic transformations after parsing.
- expressions
- Expression parser building blocks for creating operator precedence parsers.
- fundamental
- This module contains the fundamental parsers. These are the basic tokens from proc_macro2/proc_macro and a few other ones defined by unsynn. These are the terminal entities when parsing tokens. Being able to parse TokenTree and TokenStream allows one to parse opaque entities where internal details are left out. The Cached type is used to cache the string representation of the parsed entity. The Nothing type is used to match without consuming any tokens. The Except type is used to match when the next token does not match the given type. The EndOfStream type is used to match the end of the stream when no tokens are left. The HiddenState type is used to hold additional information that is not part of the parsed syntax.
- group
- Groups are a way to group tokens together. They are used to represent the contents between (), {}, [] or no delimiters at all. This module provides parser implementations for opaque group types with defined delimiters and the GroupContaining types that parse the surrounding delimiters and content of a group type.
- literal
- This module provides a set of literal types that can be used to parse and tokenize literals. The literals are parsed from the token stream and can be used to represent the parsed value. unsynn defines only simplified literals, such as integers, characters and strings. The literals here are not full Rust syntax, which will be defined in the unsynn-rust crate. There are Literal* for Integer, Character, String to parse simple literals and ConstInteger<V> and ConstCharacter<V> which must match an exact value. The latter two also implement Default, thus they can be used to create constant tokens. There is no ConstString; constant literal strings can be constructed with IntoLiteralString<T>.
- operator
- Combined punctuation tokens are represented by Operator. The crate::operator! macro can be used to define custom operators.
- predicates
- Parse predicates for compile-time parser control.
- punct
- This module contains types for punctuation tokens. These are used to represent single and multi character punctuation tokens. For single character punctuation tokens, there are PunctAny, PunctAlone and PunctJoint types.
- rust_types
- Parsers for Rust types.
- transform
- This module contains the transforming parsers. These are the parsers that add, remove, replace or reorder tokens while parsing.
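To make the conjunction and disjunction idea from the combinator module concrete, here is a minimal self-contained sketch. It does not use unsynn: MiniParse, Word, Number, Pair and OneOfTwo are hypothetical stand-ins (Pair plays the role of a two-element Cons, OneOfTwo that of a two-alternative Either), and the input is a plain slice of string tokens rather than a TokenStream.

```rust
// Hypothetical miniature model of parser composition; not the unsynn API.
trait MiniParse: Sized {
    /// On success returns the parsed value and the remaining tokens.
    fn parse<'a>(input: &'a [&'a str]) -> Option<(Self, &'a [&'a str])>;
}

#[derive(Debug, PartialEq)]
struct Word(String);

#[derive(Debug, PartialEq)]
struct Number(u64);

impl MiniParse for Word {
    fn parse<'a>(input: &'a [&'a str]) -> Option<(Self, &'a [&'a str])> {
        let (first, rest) = input.split_first()?;
        if !first.is_empty() && first.chars().all(char::is_alphabetic) {
            Some((Word(first.to_string()), rest))
        } else {
            None
        }
    }
}

impl MiniParse for Number {
    fn parse<'a>(input: &'a [&'a str]) -> Option<(Self, &'a [&'a str])> {
        let (first, rest) = input.split_first()?;
        first.parse::<u64>().ok().map(|n| (Number(n), rest))
    }
}

/// Conjunction: A followed by B.
#[derive(Debug, PartialEq)]
struct Pair<A, B>(A, B);

impl<A: MiniParse, B: MiniParse> MiniParse for Pair<A, B> {
    fn parse<'a>(input: &'a [&'a str]) -> Option<(Self, &'a [&'a str])> {
        let (a, rest) = A::parse(input)?;
        let (b, rest) = B::parse(rest)?;
        Some((Pair(a, b), rest))
    }
}

/// Disjunction: A or B, tried in that order on the same input.
#[derive(Debug, PartialEq)]
enum OneOfTwo<A, B> {
    First(A),
    Second(B),
}

impl<A: MiniParse, B: MiniParse> MiniParse for OneOfTwo<A, B> {
    fn parse<'a>(input: &'a [&'a str]) -> Option<(Self, &'a [&'a str])> {
        if let Some((a, rest)) = A::parse(input) {
            return Some((OneOfTwo::First(a), rest));
        }
        B::parse(input).map(|(b, rest)| (OneOfTwo::Second(b), rest))
    }
}
```

With these definitions a composed parser is just a type expression such as Pair<Word, Number> or OneOfTwo<Number, Word>; no dedicated struct has to be written for each grammar rule.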
Macros§
- assert_tokens_eq
- Helper macro that asserts that two entities implementing ToTokens result in the same TokenStream. Used in tests to ensure that the output of parsing is as expected. This macro allows two forms:
- format_cached_ident
- Generates a CachedIdent from a format specification.
- format_ident
- Generates an Ident from a format specification.
- format_literal
- Generates a Literal from a format specification. Unlike format_literal_string!, this does not add quotes and can be used to create any kind of literal, such as integers or floats.
- format_literal_string
- Generates a LiteralString from a format specification. Quote characters around the string are automatically added.
- operator
- Define types matching operators (punctuation sequences).
- quote
- unsynn provides its own quote!{} macro that translates tokens into a TokenStream while interpolating variables prefixed with a Pound sign (#). This is similar to what the quote macro from the quote crate does but not as powerful. There is no #(...) repetition (yet).
Structs§
- AllOf
- Logical AND: 2-4 predicates must all succeed.
- AngleTokenTree
- Parses either a TokenTree or <...> grouping (which is not a Group as far as proc-macros are concerned).
- AnyOf
- Logical OR: 2-4 predicates, at least one must succeed.
- Attribute
- Represents an attribute annotation on a field, typically in the form #[attr].
- BraceGroup
- An opaque group of tokens within a Brace
- BraceGroupContaining
- Parseable content within a Brace
- BracketGroup
- An opaque group of tokens within a Bracket
- BracketGroupContaining
- Parseable content within a Bracket
- Cached
- Getting the underlying string is expensive as it always allocates a new String. This type caches the string representation of a given entity. Note that this is only reliable for fundamental entities that represent a single token. Spacing between composed tokens is not stable and should be considered informal only.
- ChildInner
- Inner value for #[facet(child)]
- Cons
- Conjunctive A followed by B and optional C and D. When C and D are not used, they are set to Nothing.
- ConstCharacter
- A constant char of value V. Must match V and also has Default implemented to create a LiteralCharacter with value V.
- ConstInteger
- A constant u128 integer of value V. Must match V and also has Default implemented to create a LiteralInteger with value V.
- DefaultEqualsInner
- Inner value for #[facet(default = …)]
- Delimited
- This is used when one wants to parse a list of entities separated by delimiters. The delimiter is optional and can be None, e.g. when the entity is the last in the list. Usually the delimiter will be some simple punctuation token, but it is not limited to that.
- DelimitedVec
- Since the delimiter in Delimited<T,D> is optional, a Vec<Delimited<T,D>> would parse consecutive values even without delimiters. DelimitedVec<T, D, MIN, MAX, P> will stop parsing by MIN/MAX number of elements and depending on the policy defined by P which can be one of TrailingDelimiter.
- DeserializeWithInner
- Inner value for #[facet(deserialize_with = …)]
- Disable
- Always fails without consuming tokens.
- Discard
- Succeeds when the next token matches T. The token will be removed from the stream but not stored. Consequently the ToTokens implementations will panic with a message that it can not be emitted. This can only be used when a token should be present but not stored and never emitted.
- DocInner
- Represents documentation for an item.
- DynNode
- Parses a T (default: Nothing). Allows one to replace it at runtime, after parsing, with anything else implementing ToTokens. This is backed by an Rc. One can replace any cloned occurrences or only the current one.
- Enable
- Always succeeds without consuming tokens.
- EndOfStream
- Matches the end of the stream when no tokens are left.
- Enum
- Represents an enum definition. e.g., #[repr(u8)] pub enum MyEnum<T> where T: Clone { Variant1, Variant2(T) }.
- EnumVariantLike
- Represents a variant of an enum, including the optional discriminant value
- Error
- Error type for parsing.
- Except
- Succeeds when the next token does not match T. Will not consume any tokens. Usually this has to be followed with a conjunctive match such as Cons<Except<T>, U> or followed by another entry in a struct or tuple.
- Expect
- Succeeds when the next token would match T. Will not consume any tokens. This is similar to peeking.
- FacetAttr
- Represents a facet attribute that can contain specialized metadata.
- FlattenInner
- Inner value for #[facet(flatten)]
- GenericParams
- Represents the generic parameters of a struct or enum definition, enclosed in angle brackets. e.g., <'a, T: Trait, const N: usize>.
- Group
- A delimited token stream.
- GroupContaining
- Any kind of Group G with parseable content C. The content C must parse exhaustively; an EndOfStream is automatically implied.
- HiddenState
- Sometimes one wants to compose types or create structures for unsynn that have members that are not part of the parsed syntax but add some additional information. This struct can be used to hold such members while still using the Parser and ToTokens trait implementations automatically generated by the unsynn!{} macro or composition syntax. HiddenState will not consume any tokens when parsing and will not emit any tokens when generating a TokenStream. On parsing it is initialized with a default value. It has Deref and DerefMut implemented to access the inner value.
- Ident
- A word of Rust code, which may be a keyword or legal variable name.
- Insert
- Injects tokens without parsing anything.
- IntoIdent
- Parses T and concatenates all its elements into a single identifier by removing all characters that are not valid in identifiers. When T implements Default, such as single string (non group) keywords, operators and Const* literals, it can be used to create IntoIdentifier on the fly. Note that construction may still fail when one tries to create an invalid identifier, such as one starting with digits.
- IntoLiteralString
- Parses T and creates a LiteralString from it. When T implements Default, such as single string (non group) keywords, operators and Const* literals, it can be used to create IntoLiteralString on the fly.
- IntoTokenStream
- Parses T and keeps it as an opaque TokenStream. This is useful when one wants to parse a sequence of tokens and keep it as an opaque unit or re-parse it later as something else.
- Invalid
- A unit that always fails to match. This is useful as default for generics. See how Either<A, B, C, D> uses this for unused alternatives.
- InvariantInner
- Represents invariants for a type.
- KChild
- The “child” keyword. Matches: child
- KConst
- The “const” keyword. Matches: const
- KCrate
- The “crate” keyword. Matches: crate
- KDefault
- The “default” keyword. Matches: default
- KDenyUnknownFields
- The “deny_unknown_fields” keyword. Matches: deny_unknown_fields
- KDeserializeWith
- The “deserialize_with” keyword. Matches: deserialize_with
- KDoc
- The “doc” keyword. Matches: doc
- KEnum
- The “enum” keyword. Matches: enum
- KFacet
- The “facet” keyword. Matches: facet
- KFlatten
- The “flatten” keyword. Matches: flatten
- KIn
- The “in” keyword. Matches: in
- KInvariants
- The “invariants” keyword. Matches: invariants
- KMut
- The “mut” keyword. Matches: mut
- KOpaque
- The “opaque” keyword. Matches: opaque
- KPub
- The “pub” keyword. Matches: pub
- KRename
- The “rename” keyword. Matches: rename
- KRenameAll
- The “rename_all” keyword. Matches: rename_all
- KRepr
- The “repr” keyword. Matches: repr
- KSensitive
- The “sensitive” keyword. Matches: sensitive
- KSerializeWith
- The “serialize_with” keyword. Matches: serialize_with
- KSkipSerializing
- The “skip_serializing” keyword. Matches: skip_serializing
- KSkipSerializingIf
- The “skip_serializing_if” keyword. Matches: skip_serializing_if
- KStruct
- The “struct” keyword. Matches: struct
- KTransparent
- The “transparent” keyword. Matches: transparent
- KTypeTag
- The “type_tag” keyword. Matches: type_tag
- KWhere
- The “where” keyword. Matches: where
- LazyVec
- A Vec<T> that is filled up to the first appearance of a terminating S. This S may be a subset of T, thus parsing becomes lazy. This is the same as Cons<Vec<Cons<Except<S>,T>>,S> but more convenient and efficient.
- LazyVecUntil
- A Vec<T> that is filled up to the first appearance of a terminating S. This S may be a subset of T, thus parsing becomes lazy. Unlike LazyVec this variant does not consume the final terminator. This is the same as Vec<Cons<Except<S>,T>> but more convenient.
- LeftAssocExpr
- Left-associative infix operator expression.
- Lifetime
- Represents a lifetime annotation, like 'a.
- Literal
- A literal string ("hello"), byte string (b"hello"), character ('a'), byte character (b'a'), an integer or floating point number with or without a suffix (1, 1u8, 2.3, 2.3f32).
- LiteralCharacter
- A single quoted character literal ('x').
- LiteralInteger
- A simple unsigned 128 bit integer. This is the simplest form to parse integers. Note that only decimal integers without any other characters, signs or suffixes are supported; this is not full Rust syntax.
- LiteralString
- A double quoted string literal ("hello"). The quotes are included in the value. Note that this is a simplified string literal, and only double quoted strings are supported; this is not full Rust syntax, e.g. byte and C string literals are not supported.
- NonEmptyOption
- NonEmptyOption<T> prevents Option from matching when T can succeed with empty input. It ensures None is returned when no tokens remain, regardless of whether T could succeed on an empty stream. This is crucial when parsing optional trailing content that should only match if tokens are actually available to consume.
- NonEmptyTokenStream
- Since parsing a TokenStream succeeds even when no tokens are left, this type is used to parse a TokenStream that is not empty.
- NonParseable
- A unit that can not be parsed. This is useful as diagnostic placeholder for parsers that are (yet) unimplemented. The nonparseable feature flag controls if Parser and ToTokens will be implemented for it. This is useful in release builds that should not have any NonParseable left behind.
- NoneGroup
- An opaque group of tokens within a None
- NoneGroupContaining
- Parseable content within a None
- Not
- Logical NOT: succeeds if inner predicate fails.
- Nothing
- A unit that always matches without consuming any tokens. This is required when one wants to parse a Repeats without a delimiter. Note that using Nothing as primary entity in a Vec, LazyVec, DelimitedVec or Repeats will result in an infinite loop.
- OneOf
- Logical XOR: exactly one of 2-4 predicates must succeed.
- Operator
- Operators made from up to four ASCII punctuation characters. Unused characters default to \0. Custom operators can be defined with the crate::operator! macro. All but the last character are Spacing::Joint. Attention must be paid when operators have the same prefix; the shorter ones need to be tried first.
- ParenthesisGroup
- An opaque group of tokens within a Parenthesis
- ParenthesisGroupContaining
- Parseable content within a Parenthesis
- PostfixExpr
- Postfix unary operator expression.
- PredicateCmp
- Predicate that compares type A with type B at runtime.
- PrefixExpr
- Prefix unary operator expression.
- Punct
- A Punct is a single punctuation character like +, - or #.
- PunctAlone
- A single character punctuation token which is not followed by another punctuation character.
- PunctAny
- A single character punctuation token with any kind of Spacing.
- PunctJoint
- A single character punctuation token where the lexer joined it with the next Punct or a single quote followed by an identifier (Rust lifetime).
- RenameAllInner
- Inner value for #[facet(rename_all = …)]
- RenameInner
- Inner value for #[facet(rename = …)]
- ReprInner
- Represents the inner content of a repr attribute, typically used for specifying memory layout or representation hints.
- RightAssocExpr
- Right-associative infix operator expression.
- SerializeWithInner
- Inner value for #[facet(serialize_with = …)]
- Skip
- Skips over expected tokens. Will parse and consume the tokens but not store them. Consequently the ToTokens implementations will not output any tokens.
- SkipSerializingIfInner
- Inner value for #[facet(skip_serializing_if = …)]
- SkipSerializingInner
- Inner value for #[facet(skip_serializing)]
- Span
- A region of source code, along with macro expansion information.
- StderrLog
- A no-op debug parser that does nothing when the debug_grammar feature is disabled.
- Struct
- Represents a struct definition.
- StructEnumVariant
- Represents a struct-like enum variant. e.g., MyVariant { field1: u32, field2: String }.
- StructField
- Represents a field within a regular struct definition. e.g., pub name: String.
- Swap
- Swaps the order of two entities.
- TokenIter
- Iterator type for parsing token streams.
- TokenStream
- An abstract stream of tokens, or more concretely a sequence of token trees.
- TokensRemain
- Succeeds only when tokens remain in the stream.
- TupleField
- Represents a field within a tuple struct definition. e.g., pub String.
- TupleVariant
- Represents a tuple-like enum variant. e.g., MyVariant(u32, String).
- TypeTagInner
- Inner value for #[facet(type_tag = …)]
- UnitVariant
- Represents a unit-like enum variant. e.g., MyVariant.
- VerbatimDisplay
- Display the verbatim tokens until the given token.
- WhereClause
- Represents a single predicate within a where clause. e.g., T: Trait or 'a: 'b.
- WhereClauses
- Represents a where clause attached to a definition. e.g., where T: Trait, 'a: 'b.
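The difference between LazyVec and LazyVecUntil described above (collect items up to a terminator, then either consume the terminator or leave it in place) can be sketched without unsynn. The functions take_until and take_through below are hypothetical helpers over a slice of string tokens. Their behaviour when the terminator never appears is an assumption derived only from the equivalences stated in the list, not from the library's implementation.

```rust
// Hypothetical helpers; they model the stated equivalences, not unsynn itself.

/// Models the "until" variant: collects items before the first `end` token
/// and leaves `end` (if present) at the front of the remaining input.
fn take_until<'a>(input: &'a [&'a str], end: &str) -> (Vec<&'a str>, &'a [&'a str]) {
    let split = input
        .iter()
        .position(|tok| *tok == end)
        .unwrap_or(input.len());
    (input[..split].to_vec(), &input[split..])
}

/// Models the consuming variant: same collection step, but the terminator
/// must be present and is removed from the remaining input.
fn take_through<'a>(input: &'a [&'a str], end: &str) -> Option<(Vec<&'a str>, &'a [&'a str])> {
    let (items, rest) = take_until(input, end);
    // `rest` is empty when the terminator never appeared, so this fails.
    let (_terminator, after_end) = rest.split_first()?;
    Some((items, after_end))
}
```

For the input a b ; c with ; as terminator, both helpers collect a and b; take_until leaves ; c behind, while take_through leaves only c.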
Enums§
- AdtDecl
- Represents an algebraic data type (ADT) declaration, which can be either a struct or enum.
- AttributeInner
- Represents the inner content of an attribute annotation.
- ConstOrMut
- Represents either the const or mut keyword, often used with pointers.
- Delimiter
- Describes how a sequence of token trees is delimited.
- Either
- Disjunctive A or B or optional C or D, tried in that order. When C and D are not used, they are set to Invalid.
- EnumVariantData
- Represents the different kinds of variants an enum can have.
- ErrorKind
- Actual kind of an error.
- Expr
- Represents a simple expression, currently only integer literals. Used potentially for const generic default values.
- FacetInner
- Represents the inner content of a facet attribute.
- GenericParam
- Represents a single generic parameter within a GenericParams list.
- LifetimeOrTt
- A lifetime or a token tree, used to gather lifetimes in type definitions
- Spacing
- Whether a Punct is followed immediately by another Punct or followed by another token or whitespace.
- StructKind
- Represents the kind of a struct definition.
- TokenTree
- A single token or a delimited sequence of token trees (e.g. [1, (), ..]).
- Vis
- Represents visibility modifiers for items.
Traits§
- DynamicTokens
- Trait alias for any type that can be used in dynamic ToTokens contexts.
- GroupDelimiter
- Access to the surrounding Delimiter of a GroupContaining and its variants.
- IParse
- Extension trait for TokenIter that calls Parse::parse().
- IntoTokenIter
- Extension trait to convert iterators into TokenIter.
- Parse
- This trait provides the user facing API to parse grammatical entities. It is implemented for anything that implements the Parser trait. The methods here encapsulate the iterator that is used for parsing into a transaction. This iterator is always Clone. Instead of using a peekable iterator or implementing deeper peeking, parse clones this iterator to make access transactional: when parsing succeeds the transaction is committed, otherwise it is rolled back.
- Parser
- The Parser trait that must be implemented by anything we want to parse. We are parsing over a TokenIter (TokenStream iterator).
- PredicateOp
- Marker trait for compile-time parser predicates.
- RangedRepeats
- A trait for parsing a repeating T with a minimum and maximum limit. Sometimes the number of elements to be parsed is determined at runtime, e.g. a number of header items needs a matching number of values.
- RefineErr
- Helper trait for refining error type names. Every parser type in unsynn eventually tries to parse one of the fundamental types. When parsing fails then that fundamental type name is recorded as expected type name of the error. Often this is not desired; a user wants to know the type of parser that actually failed. Since we don’t want to keep a stack/vec of errors for simplicity and performance reasons we provide a way to register refined type names in errors. Note that this refinement should only be applied to leaves in the AST. Refining errors on composed types will lead to unexpected results.
- ToTokenIter
- Extension trait to convert TokenStreams into TokenIter.
- ToTokens
- unsynn defines its own ToTokens trait to be able to implement it for std container types. This is similar to the ToTokens from the quote crate but adds some extra methods and is implemented for more types. Moreover the to_token_iter() method is the main entry point for creating an iterator that can be used for parsing.
- TokenCount
- We track the position of the error by counting tokens. This trait is implemented for references to shadow counted TokenIter, and usize. The latter allows passing in a position directly or using usize::MAX in case no position data is available (which will make this error be the final one when upgrading).
- Transaction
- Helper trait to make TokenIter transactional
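The clone-based transaction idea described for the Parse trait (try a parse on a copy of the iterator, commit on success, roll back on failure) can be illustrated with standard-library iterators alone. This is a sketch under stated assumptions, not unsynn's implementation: transaction and expect_str are hypothetical names, and std::str::Chars stands in for TokenIter because it is also a cheaply clonable iterator.

```rust
use std::str::Chars;

/// Hypothetical helper: runs `step` on a clone of `iter`.
/// On success the advanced clone replaces `iter` (commit);
/// on failure `iter` is left untouched (rollback).
fn transaction<'a, T>(
    iter: &mut Chars<'a>,
    step: impl FnOnce(&mut Chars<'a>) -> Option<T>,
) -> Option<T> {
    let mut attempt = iter.clone();
    let value = step(&mut attempt)?;
    *iter = attempt;
    Some(value)
}

/// Hypothetical leaf parser: consumes `expected` character by character.
/// It may consume part of the input before failing, which is exactly
/// why it should be wrapped in `transaction`.
fn expect_str(iter: &mut Chars<'_>, expected: &str) -> Option<()> {
    for want in expected.chars() {
        if iter.next()? != want {
            return None;
        }
    }
    Some(())
}
```

Because a failed attempt only ever advanced the clone, the caller can immediately try another alternative from the same position without any explicit peeking or backtracking bookkeeping.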
Type Aliases§
- And
- &
- AndAnd
- &&
- AndEq
- &=
- Any
- Any number of T delimited by D or Nothing
- Apostrophe
- Represents the apostrophe ‘'’ operator.
- AsDefault
- Parse a T and replace it with its default value. This is a zero sized type. It can be used for no-allocation replacement elements in a Vec since it has an optimization for zero-sized types where it won't allocate any memory but just acts as a counter.
- Assign
- =
- At
- @
- AtLeast
- At least N of T delimited by D or Nothing
- AtMost
- At most N of T delimited by D or Nothing
- Backslash
- \
- Bang
- !
- Bounds
- Represents type bounds, consisting of a colon followed by tokens until a comma, equals sign, or closing angle bracket is encountered.
- CachedGroup
- Group with cached string representation.
- CachedIdent
- Ident with cached string representation.
- CachedLiteral
- Literal with cached string representation.
- CachedLiteralInteger
- LiteralInteger with cached string representation.
- CachedLiteralString
- LiteralString with cached string representation.
- CachedPunct
- Punct with cached string representation.
- CachedTokenTree
- TokenTree (any token) with cached string representation.
- Caret
- ^
- CaretEq
- ^=
- Colon
- :
- ColonDelimited
- T followed by an optional :
- ColonDelimitedVec
- DelimitedVec of T delimited by : with P as policy for the last delimiter.
- Comma
- ,
- CommaDelimited
- T followed by an optional ,
- CommaDelimitedVec
- DelimitedVec of T delimited by , with P as policy for the last delimiter.
- Dollar
- $
- Dot
- .
- DotDelimited
- T followed by an optional .
- DotDelimitedVec
- DelimitedVec of T delimited by . with P as policy for the last delimiter.
- DotDot
- ..
- DotDotEq
- ..=
- DoubleSemicolon
- Represents the double semicolon ‘::’ operator.
- Ellipsis
- ...
- Eq
- Represents the ‘=’ operator.
- Equal
- ==
- Exactly
- Exactly N of T delimited by D or Nothing
- FatArrow
- =>
- Ge
- >=
- Gt
- >
- InfixExpr
- Generic infix operator expression.
- LArrow
- <-
- Le
- <=
- LifetimeTick
- ' with Spacing::Joint
- Lt
- <
- Many
- One or more of T delimited by D or Nothing
- Minus
- -
- MinusEq
- -=
- ModPath
- Represents a module path, consisting of an optional path separator followed by a path-separator-delimited sequence of identifiers.
- NonAssocExpr
- Type alias for non-associative binary operators.
- NotEqual
- !=
- Optional
- Zero or one of T delimited by D or Nothing
- Or
- |
- OrDefault
- Tries to parse a T or inserts a D when that fails.
- OrEq
- |=
- OrOr
- ||
- PathSep
- ::
- PathSepDelimited
- T followed by an optional ::
- PathSepDelimitedVec
- DelimitedVec of T delimited by :: with P as policy for the last delimiter.
- Percent
- %
- PercentEq
- %=
- Plus
- +
- PlusEq
- +=
- Pound
- #
- Question
- ?
- RArrow
- ->
- Repeats
- DelimitedVec<T,D> with a minimum and maximum (inclusive) number of elements at first without defaults. Parsing will succeed when at least the minimum number of elements is reached and stop at the maximum number. The delimiter D defaults to Nothing to parse sequences which don’t have delimiters.
- Replace
- Parse-skips a T and inserts a U: Default in place. This is a zero sized type.
- Result
- Result type for parsing.
- Semi
- Represents the ‘;’ operator.
- Semicolon
- ;
- SemicolonDelimited
- T followed by an optional ;
- SemicolonDelimitedVec
- DelimitedVec of T delimited by ; with P as policy for the last delimiter.
- Shl
- <<
- ShlEq
- <<=
- Shr
- >>
- ShrEq
- >>=
- Slash
- /
- SlashEq
- /=
- Star
- *
- StarEq
- *=
- Tilde
- ~
- TokenStreamUntil
- Parses a TokenStream until, but excluding T. The presence of T is mandatory.
- VerbatimUntil
- Parses tokens and groups until C is found on the current token tree level.
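The minimum/maximum behaviour described for Repeats (succeed once at least the minimum number of elements is reached, stop at the maximum) can be sketched with const generics over a plain slice. This is only an illustration of the counting rule: bounded_repeat is a hypothetical function, there is no delimiter handling, and the element type is fixed to u32 for brevity.

```rust
/// Hypothetical helper: takes between MIN and MAX (inclusive) leading items
/// accepted by `accept`. Returns the taken items and the untouched rest,
/// or None when fewer than MIN items match.
fn bounded_repeat<const MIN: usize, const MAX: usize, F>(
    input: &[u32],
    accept: F,
) -> Option<(&[u32], &[u32])>
where
    F: Fn(u32) -> bool,
{
    // Never look at more than MAX items, and stop at the first rejected one.
    let matched = input
        .iter()
        .take(MAX)
        .take_while(|item| accept(**item))
        .count();
    if matched < MIN {
        return None;
    }
    Some(input.split_at(matched))
}
```

Called as bounded_repeat::<1, 3, _> on the values 2, 4, 6, 8, 1 with an "is even" predicate, the helper takes 2, 4, 6 and leaves 8, 1, because it stops at the maximum even though 8 would also match.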