Parses Rust syntax for facet-macros
§License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Modules§
- combinator
- A unique feature of unsynn is that one can define a parser as a composition of other parsers on the fly without the need to define custom structures. This is done by using the Cons and Either types. The Cons type is used to define a parser that is a conjunction of two to four other parsers, while the Either type is used to define a parser that is a disjunction of two to four other parsers.
- container
- This module provides parsers for types that can contain multiple values. This includes stdlib types like Option, Vec, Box, Rc, RefCell and types for delimited and repeated values with numbered repeats.
- delimited
- For easier composition we define the Delimited type here, which is a T followed by an optional delimiting entity D. This is used by the DelimitedVec type to parse a list of entities separated by a delimiter.
- fundamental
- This module contains the fundamental parsers. These parsers are the basic tokens from proc_macro2 and a few other ones defined by unsynn. These are the terminal entities when parsing tokens. Being able to parse TokenTree and TokenStream allows one to parse opaque entities where internal details are left out. The Cached type is used to cache the string representation of the parsed entity. The Nothing type is used to match without consuming any tokens. The Except type is used to match when the next token does not match the given type. The EndOfStream type is used to match the end of the stream when no tokens are left. The HiddenState type is used to hold additional information that is not part of the parsed syntax.
- group
- Groups are a way to group tokens together. They are used to represent the contents between (), {}, [] or no delimiters at all. This module provides parser implementations for opaque group types with defined delimiters and the GroupContaining types that parse the surrounding delimiters and content of a group type.
- literal
- This module provides a set of literal types that can be used to parse and tokenize literals. The literals are parsed from the token stream and can be used to represent the parsed value. unsynn defines only simplified literals, such as integers, characters and strings. The literals here are not full Rust syntax, which will be defined in the unsynn-rust crate.
- operator
- Combined punctuation tokens are represented by Operator. The crate::operator! macro can be used to define custom operators.
- punct
- This module contains types for punctuation tokens. These are used to represent single and multi character punctuation tokens. For single character punctuation tokens, there are PunctAny, PunctAlone and PunctJoint types.
- rust_types
- Parsers for Rust's types.
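The combinator description above can be illustrated with a small std-only model. This is only a sketch of the idea of conjunction and disjunction over a token slice: the names `Input`, `Parsed`, `OneOf`, `token`, `both`, `one_of` and `pub_decl` are hypothetical and are not unsynn's API, and the real Cons and Either types work on token iterators rather than string slices.

```rust
// Std-only model of conjunctive and disjunctive parser composition.
// Every name below is hypothetical; none of it is unsynn's actual API.

/// Remaining input: a slice of already-split tokens.
type Input<'a> = &'a [&'a str];
/// On success a parser yields its value and the unconsumed rest.
type Parsed<'a, T> = Option<(T, Input<'a>)>;

/// Result of a two-way disjunction.
#[derive(Debug, PartialEq)]
enum OneOf<A, B> {
    First(A),
    Second(B),
}

/// Terminal parser: matches exactly one expected token.
fn token<'a>(expected: &str, input: Input<'a>) -> Parsed<'a, &'a str> {
    match input.split_first() {
        Some((first, rest)) if *first == expected => Some((*first, rest)),
        _ => None,
    }
}

/// Conjunction: `a` followed by `b`; fails if either one fails.
fn both<'a, A, B>(
    a: impl Fn(Input<'a>) -> Parsed<'a, A>,
    b: impl Fn(Input<'a>) -> Parsed<'a, B>,
    input: Input<'a>,
) -> Parsed<'a, (A, B)> {
    let (left, rest) = a(input)?;
    let (right, rest) = b(rest)?;
    Some(((left, right), rest))
}

/// Disjunction: try `a` first, then `b` on the same input.
fn one_of<'a, A, B>(
    a: impl Fn(Input<'a>) -> Parsed<'a, A>,
    b: impl Fn(Input<'a>) -> Parsed<'a, B>,
    input: Input<'a>,
) -> Parsed<'a, OneOf<A, B>> {
    if let Some((value, rest)) = a(input) {
        return Some((OneOf::First(value), rest));
    }
    let (value, rest) = b(input)?;
    Some((OneOf::Second(value), rest))
}

/// Composed on the fly: "pub" followed by either "struct" or "enum".
fn pub_decl<'a>(input: Input<'a>) -> Parsed<'a, (&'a str, OneOf<&'a str, &'a str>)> {
    both(
        |i| token("pub", i),
        |i| one_of(|j| token("struct", j), |j| token("enum", j), i),
        input,
    )
}

fn main() {
    let tokens = ["pub", "enum", "Color"];
    println!("{:?}", pub_decl(&tokens));
}
```

No dedicated structure is defined for the `pub_decl` grammar; it is assembled from existing parsers, which is the property the combinator module describes.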
Macros§
- operator
- Define types matching operators (punctuation sequences).
Structs§
- AngleTokenTree
- Parses either a TokenTree or a <...> grouping (which is not a Group as far as proc-macros are concerned).
- Attribute
- Represents an attribute annotation on a field, typically in the form #[attr].
- BraceGroup
- An opaque group of tokens within a Brace
- BraceGroupContaining
- Parseable content within a Brace
- BracketGroup
- An opaque group of tokens within a Bracket
- BracketGroupContaining
- Parseable content within a Bracket
- Cached
- Getting the underlying string is expensive as it always allocates a new String. This type caches the string representation of a given entity. Note that this is only reliable for fundamental entities that represent a single token. Spacing between composed tokens is not stable and should be considered informal only.
- ChildInner
- Inner value for #[facet(child)]
- Cons
- Conjunctive A followed by B and optional C and D. When C and D are not used, they are set to Nothing.
- DefaultEqualsInner
- Inner value for #[facet(default = …)]
- Delimited
- This is used when one wants to parse a list of entities separated by delimiters. The delimiter is optional and can be None, e.g. when the entity is the last in the list. Usually the delimiter will be some simple punctuation token, but it is not limited to that.
- DelimitedVec
- Since the delimiter in Delimited<T,D> is optional, a Vec<Delimited<T,D>> would parse consecutive values even without delimiters. DelimitedVec<T,D> will stop parsing after the first value without a delimiter.
- Discard
- Succeeds when the next token matches T. The token will be removed from the stream but not stored. Consequently the ToTokens implementations will panic with a message that it cannot be emitted. This can only be used when a token should be present but not stored and never emitted.
- DocInner
- Represents documentation for an item.
- EndOfStream
- Matches the end of the stream when no tokens are left.
- Enum
- Represents an enum definition, e.g., #[repr(u8)] pub enum MyEnum<T> where T: Clone { Variant1, Variant2(T) }.
- EnumVariantLike
- Represents a variant of an enum, including the optional discriminant value
- Error
- Error type for parsing.
- Except
- Succeeds when the next token does not match T. Will not consume any tokens.
- Expect
- Succeeds when the next token would match T. Will not consume any tokens. This is similar to peeking.
- FacetAttr
- Represents a facet attribute that can contain specialized metadata.
- FlattenInner
- Inner value for #[facet(flatten)]
- GenericParams
- Represents the generic parameters of a struct or enum definition, enclosed in angle brackets, e.g., <'a, T: Trait, const N: usize>.
- Group
- A delimited token stream.
- GroupContaining
- Any kind of Group G with parseable content C. The content C must parse exhaustively; an EndOfStream is automatically implied.
- HiddenState
- Sometimes one wants to compose types or create structures for unsynn that have members that are not part of the parsed syntax but add some additional information. This struct can be used to hold such members while still using the Parser and ToTokens trait implementations automatically generated by the unsynn!{} macro or composition syntax. HiddenState will not consume any tokens when parsing and will not emit any tokens when generating a TokenStream. On parsing it is initialized with a default value. It has Deref and DerefMut implemented to access the inner value.
- Ident
- A word of Rust code, which may be a keyword or legal variable name.
- Invalid
- A unit that always fails to match. This is useful as default for generics. See how Either<A, B, C, D> uses this for unused alternatives.
- InvariantInner
- Represents invariants for a type.
- KChild
- The “child” keyword. Matches: child.
- KConst
- The “const” keyword. Matches: const.
- KCrate
- The “crate” keyword. Matches: crate.
- KDefault
- The “default” keyword. Matches: default.
- KDenyUnknownFields
- The “deny_unknown_fields” keyword. Matches: deny_unknown_fields.
- KDoc
- The “doc” keyword. Matches: doc.
- KEnum
- The “enum” keyword. Matches: enum.
- KFacet
- The “facet” keyword. Matches: facet.
- KFlatten
- The “flatten” keyword. Matches: flatten.
- KIn
- The “in” keyword. Matches: in.
- KInvariants
- The “invariants” keyword. Matches: invariants.
- KMut
- The “mut” keyword. Matches: mut.
- KOpaque
- The “opaque” keyword. Matches: opaque.
- KPub
- The “pub” keyword. Matches: pub.
- KRename
- The “rename” keyword. Matches: rename.
- KRenameAll
- The “rename_all” keyword. Matches: rename_all.
- KRepr
- The “repr” keyword. Matches: repr.
- KSensitive
- The “sensitive” keyword. Matches: sensitive.
- KSkipSerializing
- The “skip_serializing” keyword. Matches: skip_serializing.
- KSkipSerializingIf
- The “skip_serializing_if” keyword. Matches: skip_serializing_if.
- KStruct
- The “struct” keyword. Matches: struct.
- KTransparent
- The “transparent” keyword. Matches: transparent.
- KTypeTag
- The “type_tag” keyword. Matches: type_tag.
- KWhere
- The “where” keyword. Matches: where.
- LazyVec
- A Vec<T> that is filled up to the first appearance of a terminating S. This S may be a subset of T, thus parsing becomes lazy. This is the same as Cons<Vec<Cons<Except<S>,T>>,S> but more convenient and efficient.
- Lifetime
- Represents a lifetime annotation, like 'a.
- Literal
- A literal string ("hello"), byte string (b"hello"), character ('a'), byte character (b'a'), an integer or floating point number with or without a suffix (1, 1u8, 2.3, 2.3f32).
- LiteralCharacter
- A single quoted character literal ('x').
- LiteralInteger
- A simple unsigned 128 bit integer. This is the simplest form to parse integers. Note that only decimal integers without any other characters, signs or suffixes are supported; this is not full Rust syntax.
- LiteralString
- A double quoted string literal ("hello"). The quotes are included in the value. Note that this is a simplified string literal, and only double quoted strings are supported. This is not full Rust syntax, e.g. byte and C string literals are not supported.
- NonEmptyTokenStream
- Since parsing a TokenStream succeeds even when no tokens are left, this type is used to parse a TokenStream that is not empty.
- NoneGroup
- An opaque group of tokens within a None
- NoneGroupContaining
- Parseable content within a None
- Nothing
- A unit that always matches without consuming any tokens. This is required when one wants to parse a Repeats without a delimiter. Note that using Nothing as primary entity in a Vec, LazyVec, DelimitedVec or Repeats will result in an infinite loop.
- Operator
- Operators made from up to four ASCII punctuation characters. Unused characters default to \0. Custom operators can be defined with the crate::operator! macro. All but the last character are Spacing::Joint. Attention must be paid when operators have the same prefix; the shorter ones need to be tried first.
- ParenthesisGroup
- An opaque group of tokens within a Parenthesis
- ParenthesisGroupContaining
- Parseable content within a Parenthesis
- Punct
- A Punct is a single punctuation character like +, - or #.
- PunctAlone
- A single character punctuation token which is not followed by another punctuation character.
- PunctAny
- A single character punctuation token with any kind of Spacing.
- PunctJoint
- A single character punctuation token where the lexer joined it with the next Punct or a single quote followed by an identifier (Rust lifetime).
- RenameAllInner
- Inner value for #[facet(rename_all = …)]
- RenameInner
- Inner value for #[facet(rename = …)]
- Repeats
- Like DelimitedVec<T,D> but with a minimum and maximum (inclusive) number of elements. Parsing will succeed when at least the minimum number of elements is reached and stop at the maximum number. The delimiter D defaults to Nothing to parse sequences which don’t have delimiters.
- ReprInner
- Represents the inner content of a repr attribute, typically used for specifying memory layout or representation hints.
- Skip
- Skips over expected tokens. Will parse and consume the tokens but not store them. Consequently the ToTokens implementations will not output any tokens.
- SkipSerializingIfInner
- Inner value for #[facet(skip_serializing_if = …)]
- SkipSerializingInner
- Inner value for #[facet(skip_serializing)]
- Span
- A region of source code, along with macro expansion information.
- Struct
- Represents a struct definition.
- StructEnumVariant
- Represents a struct-like enum variant, e.g., MyVariant { field1: u32, field2: String }.
- StructField
- Represents a field within a regular struct definition, e.g., pub name: String.
- TokenStream
- An abstract stream of tokens, or more concretely a sequence of token trees.
- TupleField
- Represents a field within a tuple struct definition, e.g., pub String.
- TupleVariant
- Represents a tuple-like enum variant, e.g., MyVariant(u32, String).
- TypeTagInner
- Inner value for #[facet(type_tag = …)]
- UnitVariant
- Represents a unit-like enum variant, e.g., MyVariant.
- VerbatimDisplay
- Display the verbatim tokens until the given token.
- WhereClause
- Represents a single predicate within a where clause, e.g., T: Trait or 'a: 'b.
- WhereClauses
- Represents a where clause attached to a definition, e.g., where T: Trait, 'a: 'b.
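The difference between a Vec<Delimited<T,D>> and a DelimitedVec<T,D> described in the list above can be modelled without the library. The following is a std-only sketch under simplifying assumptions: tokens are plain strings, the delimiter is always ",", and the function names `delimited`, `vec_of_delimited` and `delimited_vec` are hypothetical stand-ins, not unsynn's implementation.

```rust
// Std-only model of the behaviour described for Delimited and DelimitedVec.
// Function names are hypothetical; this is not unsynn's implementation.

type Input<'a> = &'a [&'a str];

/// One value followed by an optional "," delimiter.
/// Returns the value, whether a delimiter followed, and the remaining input.
fn delimited<'a>(input: Input<'a>) -> Option<(&'a str, bool, Input<'a>)> {
    let (value, rest) = input.split_first()?;
    if *value == "," {
        return None; // a bare delimiter is not a value
    }
    match rest.split_first() {
        Some((next, after)) if *next == "," => Some((*value, true, after)),
        _ => Some((*value, false, rest)),
    }
}

/// Models a plain Vec of delimited values: keeps going even when
/// a value had no delimiter after it.
fn vec_of_delimited<'a>(mut input: Input<'a>) -> (Vec<&'a str>, Input<'a>) {
    let mut values = Vec::new();
    while let Some((value, _, rest)) = delimited(input) {
        values.push(value);
        input = rest;
    }
    (values, input)
}

/// Models the stricter list: stops after the first value without a delimiter.
fn delimited_vec<'a>(mut input: Input<'a>) -> (Vec<&'a str>, Input<'a>) {
    let mut values = Vec::new();
    while let Some((value, had_delimiter, rest)) = delimited(input) {
        values.push(value);
        input = rest;
        if !had_delimiter {
            break;
        }
    }
    (values, input)
}

fn main() {
    let tokens = ["a", ",", "b", "c", ",", "d"];
    println!("plain vec:     {:?}", vec_of_delimited(&tokens));
    println!("delimited vec: {:?}", delimited_vec(&tokens));
}
```

In this model the stricter variant leaves `c , d` unconsumed because `b` was not followed by a delimiter, while the plain vector swallows all four values.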
Enums§
- AdtDecl
- Represents an algebraic data type (ADT) declaration, which can be either a struct or enum.
- AttributeInner
- Represents the inner content of an attribute annotation.
- ConstOrMut
- Represents either the const or mut keyword, often used with pointers.
- Delimiter
- Describes how a sequence of token trees is delimited.
- Either
- Disjunctive A or B or optional C or D tried in that order. When C and D are not used, they are set to Invalid.
- EnumVariantData
- Represents the different kinds of variants an enum can have.
- ErrorKind
- Actual kind of an error.
- Expr
- Represents a simple expression, currently only integer literals. Used potentially for const generic default values.
- FacetInner
- Represents the inner content of a facet attribute.
- GenericParam
- Represents a single generic parameter within a GenericParams list.
- LifetimeOrTt
- A lifetime or a token tree, used to gather lifetimes in type definitions
- Spacing
- Whether a Punct is followed immediately by another Punct or followed by another token or whitespace.
- StructKind
- Represents the kind of a struct definition.
- TokenTree
- A single token or a delimited sequence of token trees (e.g. [1, (), ..]).
- Vis
- Represents visibility modifiers for items.
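The Spacing entry above distinguishes a punctuation character that is immediately followed by another one from a punctuation character that stands alone. The sketch below is a deliberately simplified, std-only model of that rule: the local `Spacing` enum only mirrors the two cases, the punctuation set in `is_punct` is an assumption chosen for illustration, and the lifetime tick case is left out. It is not how the real lexer is implemented.

```rust
// Simplified, std-only model of joint versus alone punctuation.
// `Spacing` here is a local stand-in, and `is_punct` uses an assumed
// character set; neither is taken from proc_macro2.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Spacing {
    /// Followed by whitespace, another kind of token, or nothing.
    Alone,
    /// Followed immediately by another punctuation character.
    Joint,
}

/// Assumed set of operator characters for this sketch.
fn is_punct(c: char) -> bool {
    "+-*/%^!&|=<>@.,;:#$?~".contains(c)
}

/// Lists every punctuation character in `source` with its spacing.
fn punct_spacing(source: &str) -> Vec<(char, Spacing)> {
    let chars: Vec<char> = source.chars().collect();
    let mut result = Vec::new();
    for (index, &c) in chars.iter().enumerate() {
        if !is_punct(c) {
            continue;
        }
        let followed_by_punct = chars
            .get(index + 1)
            .map_or(false, |&next| is_punct(next));
        let spacing = if followed_by_punct {
            Spacing::Joint
        } else {
            Spacing::Alone
        };
        result.push((c, spacing));
    }
    result
}

fn main() {
    println!("{:?}", punct_spacing("a += b"));
    println!("{:?}", punct_spacing("a + = b"));
}
```

This is why a multi character operator can be recognised as a run of joint characters ending in one alone character, whereas the same characters separated by whitespace do not form an operator.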
Traits§
- GroupDelimiter
- Access to the surrounding Delimiter of a GroupContaining and its variants.
- IParse
- Extension trait for TokenIter that calls Parse::parse().
- Parse
- This trait provides the user facing API to parse grammatical entities. It is implemented for anything that implements the Parser trait. The methods here encapsulate the iterator that is used for parsing into a transaction. This iterator is always Copy. Instead of using a peekable iterator or implementing deeper peeking, parse clones this iterator to make access transactional: when parsing succeeds the transaction is committed, otherwise it is rolled back.
- Parser
- The Parser trait that must be implemented by anything we want to parse. We are parsing over a TokenIter (proc_macro2::TokenStream iterator).
- RangedRepeats
- A trait for parsing a repeating T with a minimum and maximum limit. Sometimes the number of elements to be parsed is determined at runtime, e.g. a number of header items needs a matching number of values.
- ToTokens
- unsynn defines its own ToTokens trait to be able to implement it for std container types. This is similar to the ToTokens from the quote crate but adds some extra methods and is implemented for more types. Moreover, the to_token_iter() method is the main entry point for creating an iterator that can be used for parsing.
- TokenCount
- We track the position of the error by counting tokens. This trait is implemented for references to shadow counted TokenIter, and usize. The latter allows passing in a position directly or using usize::MAX in case no position data is available (which will make this error the final one when upgrading).
- Transaction
- Helper trait to make TokenIter transactional
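The transactional parsing described for the Parse trait (clone the iterator, commit on success, roll back on failure) can be sketched with a cheap cursor over a slice. Everything in this example is an assumption made for illustration: `Cursor`, `next_token`, `transaction` and `pub_struct` are hypothetical names and do not reproduce the signatures of unsynn's Parse or Transaction traits.

```rust
// Std-only sketch of clone-based transactional parsing.
// All names are hypothetical stand-ins, not unsynn's traits or types.

/// A cheap-to-clone position in a token slice.
#[derive(Clone)]
struct Cursor<'a> {
    tokens: &'a [&'a str],
    position: usize,
}

impl<'a> Cursor<'a> {
    fn new(tokens: &'a [&'a str]) -> Self {
        Cursor { tokens, position: 0 }
    }

    /// Consumes and returns the next token, if any.
    fn next_token(&mut self) -> Option<&'a str> {
        let token = self.tokens.get(self.position).copied()?;
        self.position += 1;
        Some(token)
    }
}

/// Runs `parser` on a clone of the cursor. On success the clone replaces
/// the original (commit); on failure the original is left untouched (rollback).
fn transaction<'a, T>(
    cursor: &mut Cursor<'a>,
    parser: impl FnOnce(&mut Cursor<'a>) -> Option<T>,
) -> Option<T> {
    let mut attempt = cursor.clone();
    let value = parser(&mut attempt)?; // early return keeps `cursor` as it was
    *cursor = attempt;
    Some(value)
}

/// Example parser that consumes tokens before it can know whether it matches.
fn pub_struct<'a>(cursor: &mut Cursor<'a>) -> Option<(&'a str, &'a str)> {
    let first = cursor.next_token().filter(|t| *t == "pub")?;
    let second = cursor.next_token().filter(|t| *t == "struct")?;
    Some((first, second))
}

fn main() {
    let tokens = ["pub", "enum", "Color"];
    let mut cursor = Cursor::new(&tokens);
    let result = transaction(&mut cursor, pub_struct);
    println!("{:?} at position {}", result, cursor.position);
}
```

Because the failed attempt only advanced the clone, no peeking machinery is needed: the caller can simply try another parser from the same position.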
Type Aliases§
- And
&
- AndAnd
&&
- AndEq
&=
- Any
- Any number of T delimited by D or Nothing
- Apostrophe
- Represents the apostrophe ‘'’ operator.
'
- Assign
=
- At
@
- AtLeast
- At least N of T delimited by D or Nothing
- AtMost
- At most N of T delimited by D or Nothing
- Backslash
\
- Bang
!
- Bounds
- Represents type bounds, consisting of a colon followed by tokens until a comma, equals sign, or closing angle bracket is encountered.
- CachedGroup
- Group with cached string representation.
- CachedIdent
- Ident with cached string representation.
- CachedLiteral
- Literal with cached string representation.
- CachedPunct
- Punct with cached string representation.
- CachedTokenTree
- TokenTree (any token) with cached string representation.
- Caret
^
- CaretEq
^=
- Colon
:
- ColonDelimited
- T followed by an optional :
- ColonDelimitedVec
- Vector of T delimited by :
- Comma
,
- CommaDelimited
- T followed by an optional ,
- CommaDelimitedVec
- Vector of T delimited by ,
- Dollar
$
- Dot
.
- DotDelimited
- T followed by an optional .
- DotDelimitedVec
- Vector of T delimited by .
- DotDot
..
- DotDotEq
..=
- DoubleSemicolon
- Represents the double semicolon ‘::’ operator.
::
- Ellipsis
...
- Eq
- Represents the ‘=’ operator.
=
- Equal
==
- Exactly
- Exactly N of T delimited by D or Nothing
- FatArrow
=>
- Ge
>=
- Gt
>
- LArrow
<-
- Le
<=
- LifetimeTick
- ' with Spacing::Joint
- Lt
<
- Many
- One or more of T delimited by D or Nothing
- Minus
-
- MinusEq
-=
- ModPath
- Represents a module path, consisting of an optional path separator followed by a path-separator-delimited sequence of identifiers.
- NotEqual
!=
- Optional
- Zero or one of T delimited by D or Nothing
- Or
|
- OrEq
|=
- OrOr
||
- PathSep
::
- PathSepDelimited
- T followed by an optional ::
- PathSepDelimitedVec
- Vector of T delimited by ::
- Percent
%
- PercentEq
%=
- Plus
+
- PlusEq
+=
- Pound
#
- Question
?
- RArrow
->
- Result
- Result type for parsing.
- Semi
- Represents the ‘;’ operator.
;
- Semicolon
;
- SemicolonDelimited
- T followed by an optional ;
- SemicolonDelimitedVec
- Vector of T delimited by ;
- Shl
<<
- ShlEq
<<=
- Shr
>>
- ShrEq
>>=
- Slash
/
- SlashEq
/=
- Star
*
- StarEq
*=
- Tilde
~
- TokenIter
- Type alias for the iterator type we use for parsing. This Iterator is Clone and produces &TokenTree. The shadow counter counts tokens in the background to track progress which is used to keep the error that made the most progress in disjunctive parsers.
- Underscore
_
- VerbatimUntil
- Parses tokens and groups until C is found on the current token tree level.
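The repetition aliases in this list (Any, Many, Optional, Exactly, AtLeast, AtMost) all read as a minimum and an inclusive maximum, which is how Repeats is described. The following std-only sketch spells out that reading; the `Repeat` enum, its `bounds` method and `parse_repeats` are hypothetical, the mapping to bounds is inferred from the one-line descriptions rather than taken from unsynn's source, and delimiters are left out entirely.

```rust
// Std-only sketch: repetition aliases read as (min, max) bounds.
// `Repeat`, `bounds` and `parse_repeats` are hypothetical names, and the
// bounds are inferred from the alias descriptions, not from unsynn's code.

#[derive(Debug, Clone, Copy)]
enum Repeat {
    Any,            // any number
    Many,           // one or more
    Optional,       // zero or one
    Exactly(usize), // exactly N
    AtLeast(usize), // at least N
    AtMost(usize),  // at most N
}

impl Repeat {
    /// Minimum and inclusive maximum number of elements.
    fn bounds(self) -> (usize, usize) {
        match self {
            Repeat::Any => (0, usize::MAX),
            Repeat::Many => (1, usize::MAX),
            Repeat::Optional => (0, 1),
            Repeat::Exactly(n) => (n, n),
            Repeat::AtLeast(n) => (n, usize::MAX),
            Repeat::AtMost(n) => (0, n),
        }
    }
}

/// Consumes leading occurrences of `item`, stopping at the maximum.
/// Succeeds only if at least the minimum number was found.
fn parse_repeats<'a>(
    input: &'a [&'a str],
    item: &str,
    repeat: Repeat,
) -> Option<(usize, &'a [&'a str])> {
    let (min, max) = repeat.bounds();
    let mut count = 0;
    while count < max && input.get(count).map_or(false, |t| *t == item) {
        count += 1;
    }
    if count < min {
        return None;
    }
    Some((count, &input[count..]))
}

fn main() {
    let tokens = ["x", "x", "x", "y"];
    println!("{:?}", parse_repeats(&tokens, "x", Repeat::Any));
    println!("{:?}", parse_repeats(&tokens, "x", Repeat::Exactly(4)));
}
```

Note that reaching the maximum stops parsing without failing, so in this model `AtMost(2)` on three matching tokens succeeds and leaves the third one in the input.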