Expand description
unsynn (from German ‘Unsinn’ for nonsense) is a minimalist Rust parser library. It achieves this by leaving out the actual grammar implementations, which live in distinct crates. Still it comes with batteries included: there are parsers, combinators and transformers to solve most parsing tasks.
In exchange it offers simple, composable parsers and declarative parser construction. Grammars will be implemented in their own crates (see unsynn-rust).
Its primary intended use is creating proc macros for Rust that define their own grammar or need only sparse Rust parsing.
Other uses include building parsers for grammars outside a Rust/proc-macro context. Unsynn can
parse any &str data (the tokenizer step relies on proc_macro2).
§Examples
§Creating and Parsing Custom Types
The unsynn!{} macro generates the Parser and ToTokens implementations for your types.
Notice that unsynn implements Parser and ToTokens for many standard Rust types, such as
the u32 we use in this example.
let mut token_iter = "foo ( 1, 2, 3 )".to_token_iter();
unsynn!{
struct IdentThenParenthesisedNumbers {
ident: Ident,
numbers: ParenthesisGroupContaining::<CommaDelimitedVec<u32>>,
}
}
// iter.parse() is from the IParse trait
let ast: IdentThenParenthesisedNumbers = token_iter.parse().unwrap();
assert_tokens_eq!(ast, "foo(1,2,3)");In case the automatiically generated Parser and ToTokens implementations are not
sufficient the macro supports custom parsing and token emission through parse_with and
to_tokens clauses:
- from Type: Parse from a different type before transformation (requires parse_with)
- parse_with: Transform or validate values during parsing (used alone for validation, with from for transformation)
- to_tokens: Customize how types are emitted back to tokens (independent)
The parse_with and to_tokens clauses are independent and optional. The from clause must be used together with parse_with. When more control is needed, the implementations can also be written manually.
§Custom Parsing and ToTokens
Example of custom parsing and token emission - parse emoji as bools, emit as emoji:
unsynn! {
struct ThumbVote(bool) from LiteralCharacter:
parse_with |value, _tokens| {
Ok(Self(value.value() == '👍'))
}
to_tokens |s, tokens| {
Literal::character(if s.0 {'👍'} else {'👎'}).to_tokens(tokens);
};
}
See the COOKBOOK for more details on parse_with
and to_tokens clauses.
§Using Composition
Composition can be used without defining new datatypes. This is useful for simple parsers or
when one wants to parse things on the fly which are deconstructed immediately. See the
combinator module for more composition types.
// We parse this below
let mut token_iter = "foo ( 1, 2, 3 )".to_token_iter();
// Type::parse() is from the Parse trait
let ast =
Cons::<Ident, ParenthesisGroupContaining::<CommaDelimitedVec<u32>>>
::parse(&mut token_iter).unwrap();
assert_tokens_eq!(ast, "foo ( 1, 2, 3 )");§Custom Operators and Keywords
Keywords and operators can be defined within the unsynn!{} macro:
unsynn! {
keyword Calc = "CALC";
operator Add = "+";
operator Substract = "-";
operator Multiply = "*";
operator Divide = "/";
}
// Build expression parser with proper precedence
type Expression = Cons<Calc, AdditiveExpr, Semicolon>;
type AdditiveExpr = LeftAssocExpr<MultiplicativeExpr, Either<Add, Substract>>;
type MultiplicativeExpr = LeftAssocExpr<LiteralInteger, Either<Multiply, Divide>>;
let ast = "CALC 2*3+4*5 ;".to_token_iter()
.parse::<Expression>().expect("syntax error");
Keywords and operators can also be defined using the standalone keyword!{} and operator!{} macros.
See the operator names reference for predefined operators.
For more details on building expression parsers with proper precedence and associativity,
see the expressions module documentation.
§Feature Flags
- proc_macro2:
  Controls whether unsynn uses the proc_macro2 crate or the built-in proc_macro crate for token handling. This is enabled by default. When enabled, unsynn can parse from strings (via &str::to_token_iter()), convert tokens to strings (via tokens_to_string()), and be used in any context (tests, examples, etc.). When disabled, unsynn uses only the built-in proc_macro crate and can only be used from proc-macro crates (with proc-macro = true in Cargo.toml). This creates leaner proc macros without the proc_macro2 dependency.
  APIs disabled without proc_macro2:
  - String parsing: ToTokens for &str/String (parsing strings into tokens)
  - Format macros: format_ident!(), format_literal!(), format_literal_string!()
  - Transform types: IntoIdent<T> (requires string parsing for validation)
  - Test helper: assert_tokens_eq!() (requires string parsing)
  - String-based constructors: Cached::new(), Cached::from_string() (require string parsing)
  APIs that remain available:
  - Token to string conversion: tokens_to_string(), to_token_iter(), into_token_iter()
  - All parsing functionality (works with TokenStream from proc macro input)
  - All ToTokens implementations (except for &str/String)
  - Transform types: IntoLiteralString<T> (uses the Literal::string() constructor)
  - Type: Cached<T> (but not the string-based constructors)
- hash_keywords:
  Enables hash tables for larger keyword groups. This is enabled by default since it guarantees fast lookup in all use-cases and the extra dependency it introduces is very small. Nevertheless this feature can be disabled when keyword grouping is not or only rarely used, to remove the dependency on rust_hash. Keyword lookups then fall back to a binary search implementation. Note that the implementation already optimizes the cases where only one or only a few keywords are in a group.
- criterion:
  Enables the criterion benchmarking framework for performance benchmarks. This is disabled by default to keep the dependency tree light. Use cargo bench --features criterion to run the criterion benchmarks. Without this feature, only non-criterion benchmarks will run.
- docgen:
  The unsynn!{}, keyword!{} and operator!{} macros will automatically generate some additional docs. This is enabled by default.
- nonparsable:
  Enables the implementation of Parser and ToTokens for the NonParseable type. When not set, any use of it will result in a compile error. One may disable this for release builds to prevent any NonParsable left used in the code, thus checking for completeness (NonParseable is used for marking unimplemented types) and avoiding potential panics at runtime. This is enabled by default; consider disabling it in release builds.
- debug_grammar:
  Enables the StderrLog<T, N> debug type that prints type information and token sequences to stderr during parsing. This is useful for debugging complex grammars and understanding parser behavior. This is disabled by default; when disabled, StderrLog becomes a zero-cost no-op. Enable it during development with cargo test --features debug_grammar or cargo build --features debug_grammar. See the COOKBOOK for usage examples.
- trait_methods_track_caller:
  Adds #[track_caller] to Parse, Parser, IParse and ToTokens trait methods. The idea is to make unsynn more transparent in case of a panic and point closer to the user's code that caused the problem. This has a negligible performance impact and is an experimental feature. If it has bad side effects, please report them. This is enabled by default.
- extra_asserts:
  Enables expensive runtime sanity checks for unsynn internals. Enabled while developing unsynn. This adds diagnostics to data structures and makes unsynn slower and bigger. Should be disabled when unsynn is used by another crate. Currently enabled by default, which may (and eventually will) change for stable releases.
- extra_tests:
  Enables expensive tests that check semantics that should be taken ‘for granted’; this makes the test suite slower. Even without these tests enabled we aim for full (cargo-mutants) test coverage with extra_asserts enabled. This is disabled by default. Many of these tests are kept from development to assert correct semantics but are covered elsewhere.
§Cookbook
While unsynn is pretty lean, most code should have documentation and examples. Still, some
things need an explanation of how they are used efficiently, which is given here.
§Parsing
Parsing is done over a TokenIter which iterates over the token stream. TokenIter can be
created with to_token_iter() or into_token_iter(), which are implemented for TokenStream
and string types (string impls require the proc_macro2 feature).
The main trait for parsing a TokenIter is the Parse trait. This trait's methods are all
default implemented and can be used as is. Parse is implemented for all types that
implement Parser and ToTokens. Parser is the trait that has to be implemented for
each type that should be parsed.
The IParse trait is implemented for TokenIter, this calls Parse::parse() in a
convenient way when types can be inferred.
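A small sketch of the two entry points (assuming the proc_macro2 feature for string parsing):
let mut tokens = "foo bar".to_token_iter();
// IParse::parse() on the iterator, target type inferred from the binding
let first: Ident = tokens.parse().unwrap();
// Parse::parse() called on the type itself
let second = Ident::parse(&mut tokens).unwrap();
assert_eq!(first.to_string(), "foo");
assert_eq!(second.to_string(), "bar");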
§Parse::parse_with() transformations
The Parse::parse_with() method is used for parsing in more complex situations. In the
simplest case it can be used to validate the values of a parsed type. More complex usage will
fill in HiddenState and other non-parsed members or construct completely new types from
parsed entities. See also the transform module for transformation helpers.
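As a small, hedged sketch of ad-hoc validation at the call site (the closure arguments and the Error::other() call mirror the unsynn!{} examples further below; treat the exact signatures as assumptions and check the method documentation):
let mut tokens = "foo".to_token_iter();
// accept only identifiers starting with 'f', otherwise report an error
let ident = Ident::parse_with(&mut tokens, |ident, tokens| {
    if ident.to_string().starts_with('f') {
        Ok(ident)
    } else {
        Error::other(None, tokens, "identifier starting with 'f' expected".into())
    }
}).unwrap();
assert_eq!(ident.to_string(), "foo");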
§ToTokens
The ToTokens trait complements Parser by turning parsed entities back into a
TokenStream. Unlike the trait from the quote crate, we define ToTokens for many more types
and provide additional methods.
Notably it provides methods to create the entry points for parsing (when the proc_macro2
feature is enabled: to_token_iter() and into_token_iter()).
When textual representation of a parsed entity is required then tokens_to_string() can be
used (requires proc_macro2 feature). The standard Display trait is implemented on top of
that, as such every type that has ToTokens implemented can be printed as text.
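A small sketch (assuming the proc_macro2 feature):
let ast: Cons<Ident, Comma, Ident> = "foo , bar".to_token_iter().parse().unwrap();
// canonical textual form; whitespace may differ from the input
let text = ast.tokens_to_string();
// Display delegates to the token representation
assert_eq!(text, format!("{}", ast));
assert_tokens_eq!(ast, "foo , bar");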
§Composition and Type Aliases
For moderately complex types it is possible to use composition with Cons, Either and
other container types instead of defining new enums or structures.
It is recommended to alias such composed types to give them useful names. This can be used for creating grammars on the fly without any boilerplate code.
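A small sketch, assuming the predefined Colon operator from the names module:
// give composed types readable names instead of defining new structs
type KeyValue = Cons<Ident, Colon, LiteralString>;
type KeyValueList = CommaDelimitedVec<KeyValue>;
let config: KeyValueList = r#" name : "unsynn" , kind : "parser" "#
    .to_token_iter().parse().unwrap();
assert_tokens_eq!(config, r#" name : "unsynn" , kind : "parser" "#);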
§The unsynn!{} Macro
The recommended way to describe grammars is to use the unsynn!{} macro. This allows one to define
grammars by defining enums and structures. The macro will generate the necessary implementations
for Parser and ToTokens in a safe/optimized way. It is possible to add
HiddenState<T> members to add non-syntactic entries to such custom structs.
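A minimal sketch of a HiddenState member; it is default-initialized while parsing, never consumes or emits tokens, and derefs to the inner value (the field names here are made up for illustration):
unsynn!{
    struct CountedIdent {
        ident: Ident,
        // filled in by the program later, not part of the syntax
        uses: HiddenState<u32>,
    }
}
let counted: CountedIdent = "foo".to_token_iter().parse().unwrap();
assert_eq!(*counted.uses, 0); // Deref to the default value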
The macro supports generic types and static lifetime parameters with trait bounds, including
simple trait names (Clone), qualified paths (std::fmt::Debug), lifetime bounds
('static), and multiple bounds combined with + (T: Clone + std::fmt::Debug + 'static). For complete syntax details, see the unsynn! macro
documentation.
Custom parsing and token emission can be implemented through parse_with and to_tokens
clauses.
§Custom Parsing with parse_with
Use parse_with alone for validation, or with from Type for transformation:
Transformation (requires from Type):
unsynn! {
// Transform: parse integer as bool (from Type required)
struct BoolInt(bool) from LiteralInteger:
parse_with |value, _tokens| {
Ok(Self(value.value() != 0))
};
}
Validation (without from):
unsynn! {
// Validate: ensure positive integers only
struct PositiveInt(LiteralInteger);
parse_with |this, tokens| {
if this.0.value() > 0 {
Ok(this)
} else {
Error::other(None, tokens, "must be positive".into())
}
};
}
§Custom Token Emission with to_tokens
Use to_tokens to customize how types are emitted back to tokens:
unsynn! {
// Custom output: emit bools as YES/NO
struct YesNo(bool);
to_tokens |s, tokens| {
if s.0 {
Literal::string("YES").to_tokens(tokens);
} else {
Literal::string("NO").to_tokens(tokens);
}
};
}
§Combining parse_with and to_tokens
Both clauses are independent and can be used together:
unsynn! {
// Transformation + custom output
struct BoolInt(bool) from LiteralInteger:
parse_with |value, _tokens| { Ok(Self(value.value() != 0)) }
to_tokens |s, tokens| {
Literal::u64_unsuffixed(if s.0 {1} else {0}).to_tokens(tokens);
};
}
unsynn! {
// Validation + custom output
struct PositiveInt(LiteralInteger);
parse_with |this, tokens| {
if this.0.value() > 0 { Ok(this) }
else { Error::other(None, tokens, "must be positive".into()) }
}
to_tokens |s, tokens| {
Punct::new('+', Spacing::Alone).to_tokens(tokens);
s.0.to_tokens(tokens);
};
}
§Implementing Parsers
§Transactions
The Parse trait parses items within a transaction. This is done with the
Transaction::transaction() method. Internally this clones the iterator, calls the
Parser::parser() method and copies the cloned iterator back to the original on success.
This means that if a parser fails, the input is reset to the state before the parser was
called. For efficiency reasons the Parser::parser() methods are not transactional;
when they fail they leave the input in a partially consumed state.
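A small sketch of that difference:
let mut tokens = "foo 42".to_token_iter();
// The transactional Parse::parse() fails here (42 is not an Ident) …
assert!(<Cons<Ident, Ident>>::parse(&mut tokens).is_err());
// … but it rolled the iterator back, so the same input still parses as something else:
let ok: Cons<Ident, LiteralInteger> = tokens.parse().unwrap();
assert_tokens_eq!(ok, "foo 42");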
NOTE: For conjunctive parsers (sequential parsing without alternatives/backtracking),
calling Parser::parser() directly instead of Parse::parse() eliminates transaction overhead.
This is also safe for final elements in a disjunction, but dangerous for non-final
alternatives that need rollback on failure. Benchmarks show over 30% speedup depending on
complexity. When in doubt, calling the Parse::parse() method is always on the
safe side, but comes with some performance penalty.
When one wants to manually parse alternatives within a Parser (like in an enum) these
must be manually called within a transaction. This is only necessary when the parsed entity
is compound and not the final alternative.
enum MyEnum {
// Complex variant
Tuple(i32, i32),
// Simple variant
Simple(i32),
// Another simple variant
Another(Ident),
}
impl Parser for MyEnum {
fn parser(input: &mut TokenIter) -> Result<Self> {
// Use [`Transaction::transaction()`] to parse the tuple variant
if let Ok(tuple) = input.transaction(
|trans_input| -> Result<MyEnum> {
Ok(MyEnum::Tuple(
i32::parser(trans_input)?,
i32::parser(trans_input)?,
))
}
) {
Ok(tuple)
} else
// Try to parse the simple variant
// can use the `Parse::parse()` or `IParse::parse()` method directly since a
// single entity will be put in a transaction by those.
if let Ok(i) = input.parse() {
Ok(MyEnum::Simple(i))
} else {
// Try to parse the last variant
// this can use the `Parser::parser()` method since this is the final alternative
Ok(MyEnum::Another(Ident::parser(input)?))
}
}
}
§Different ways to design Parsers
There are different approaches to how one can implement parsers. Each has its own advantages and disadvantages. unsynn is agnostic; one can freely mix whatever makes most sense in a particular case.
unsynn uses a rather simple, first come, first served approach when parsing. Parsers may be a subset of or share a common prefix with other parsers. This needs some attention.
In the case where parsers are subsets to other parsers and one puts them into a disjunction in
an unsynn! { enum ..} or in an Either combinator, the more specific case must come first,
otherwise it will never match.
// Ident must come first since TokenTree matches any token.
type Example = Either<Ident, TokenTree>;
For the other case, where parsers share longer prefixes (this should rarely happen in practice), it may benefit performance to break these into a type with the shared prefix that dispatches on the distinct parts.
§Exact AST Representation
One approach is to define a structure that reflects the AST of the grammar exactly. This is
what the unsynn!{} macro and composition does. The program later works with the parsed
structure directly. The advantage is that Parser and ToTokens are simple and come for
free and that the source structure of the AST stays available.
unsynn!{
// define a list of Ident = "LiteralString",.. assignments
struct Assignment {
id: Ident,
_equal: Assign,
value: LiteralString,
}
struct AssignmentList {
list: DelimitedVec<Assignment, Comma>
}
}
When the implementation generated by the unsynn!{} macro is not sufficient one can
implement Parser and ToTokens for custom structs and enums manually.
§High level representation
Another approach is to represent the data more in the way further processing requires. This
simplifies working with the data but one has to implement the Parser and ToTokens
traits manually. Often the Parse::parse_with() method will become useful in such
cases. The transform module provides parsers that can transform the input already when parsing.
// We could go with `unsynn!{struct Assignment{...}}` as above. But let's use composition
// as an example here. This stays internal so its complexity isn't exposed.
type Assignment = Cons<Ident, Assign, LiteralString>;
// Here we'll parse the list of assignments into a structure that represents the
// data in a way that's easier to use from a Rust program
#[derive(Default)]
struct AssignmentList {
// each 'Ident = LiteralString'
list: Vec<(Ident, String)>,
// We want to have a fast lookup to the entries
lookup: HashMap<Ident, usize>,
}
impl Parser for AssignmentList {
fn parser(input: &mut TokenIter) -> Result<Self> {
let mut assignment_list = AssignmentList::default();
// We construct the `AssignmentList` by parsing the content, appending and processing it.
while let Ok(assignment) = Delimited::<Assignment, Comma>::parse(input) {
assignment_list.list.push((
assignment.value.first.clone(),
// Create a String without the enclosing double quotes
assignment.value.third.as_str().to_string()
));
// add it to the lookup
assignment_list.lookup.insert(
assignment.value.first.clone(),
assignment_list.list.len()-1
);
// No Comma, no more assignments
if assignment.delimiter.is_none() {
break;
}
}
Ok(assignment_list)
}
}
impl ToTokens for AssignmentList {
fn to_tokens(&self, output: &mut TokenStream) {
for a in &self.list {
a.0.to_tokens(output);
Assign::new().to_tokens(output);
LiteralString::from_str(&a.1).to_tokens(output);
Comma::new().to_tokens(output);
}
}
}
§Generic Types with Trait Bounds
The unsynn!{} macro supports comprehensive generic type syntax with lifetime parameters
and trait bounds. You can use:
- Lifetime parameters: <'a>, <'a, 'b>, or mixed <'a, T>
- Simple trait names: T: Clone
- Qualified paths: T: std::fmt::Debug (no imports needed)
- Lifetime bounds: T: 'static
- Multiple bounds: T: Clone + std::fmt::Debug + 'static
- Where clauses: where T: Clone + 'static
§Examples
unsynn! {
// Simple trait bounds
struct SimpleGeneric<T: Clone>(T);
// Qualified paths (no import needed!)
struct WithDebug<T: std::fmt::Debug>(T);
// Multiple bounds with lifetime bounds
struct Complex<T: Clone + std::fmt::Display + 'static>(T);
// Where clauses work too
struct WhereClause<T>(T) where T: Clone + std::fmt::Debug;
}
Lifetime parameters are also supported:
unsynn! {
// Single lifetime parameter
struct WithLifetime<'a> {
name: LiteralString,
_marker: std::marker::PhantomData<&'a ()>,
}
// Mixed lifetime and type parameters
struct MixedParams<'a, T: Clone> {
data: T,
_marker: std::marker::PhantomData<&'a ()>,
}
}
Limitations:
- Generic type arguments in bounds (e.g., T: Trait<U>) are not supported
- HRTB (Higher-Ranked Trait Bounds like for<'a>) are not yet supported
- Lifetime parameters on tuple structs and enums with inline impl blocks are not supported (use separate impl blocks outside the macro instead)
For implementation details, see the unsynn! macro documentation.
§Parse Predicates
Parse predicates provide compile-time control over parser behavior using zero-sized types
(ZSTs). They enable context-sensitive parsing by acting as compile-time guards on generic
types. The predicateflag keyword creates newtype wrappers that can implement custom traits
for type-safe context validation.
See the predicates module for the full API reference and the
unsynn! macro Parse Predicates section for syntax details.
§Type-Safe Context Predicates
The predicatetrait macro creates custom context marker traits for compile-time type safety.
It automatically generates trait definitions and implementations for universal predicates
(Enable, Disable, TokensRemain) and logical operators, eliminating boilerplate while
maintaining type safety.
unsynn! {
// Define context traits - automatically implements for universal predicates
predicatetrait ExpressionContext;
predicatetrait StatementContext;
// Create context-specific predicates
predicateflag InExpression for ExpressionContext;
predicateflag InStatement for StatementContext;
predicateflag InBothContexts for ExpressionContext, StatementContext;
keyword KwIf = "if";
keyword KwWhile = "while";
// Type parameters use custom context traits
pub struct IfExpr<P: ExpressionContext = InExpression> {
_guard: P,
kw_if: KwIf,
condition: ParenthesisGroup,
body: BraceGroup,
}
pub struct WhileStmt<P: StatementContext = InStatement> {
_guard: P,
kw_while: KwWhile,
condition: ParenthesisGroup,
body: BraceGroup,
}
// Conditional field using logical operators
pub struct ConditionalField<P: ExpressionContext = InExpression> {
_guard: P,
name: Ident,
}
}
// Context-specific predicates work as expected
let mut tokens = "if (x > 0) { return true; }".to_token_iter();
let result = IfExpr::<InExpression>::parse(&mut tokens);
assert!(result.is_ok());
// Universal predicates (Enable/Disable/TokensRemain) work in ANY context
let mut tokens = "if (true) { }".to_token_iter();
let result = IfExpr::<Enable>::parse(&mut tokens); // Enable implements ExpressionContext
assert!(result.is_ok());
// Logic operators automatically implement context traits when operands do
let mut tokens = "if (x) { y }".to_token_iter();
let result = IfExpr::<AllOf<InExpression, InBothContexts>>::parse(&mut tokens);
assert!(result.is_ok()); // AllOf implements ExpressionContext because both operands do
// Not inverts the logic while preserving trait implementations
let mut tokens = "if (x) { y }".to_token_iter();
let result = IfExpr::<Not<Disable>>::parse(&mut tokens);
assert!(result.is_ok()); // Not<Disable> implements ExpressionContext and succeeds
// OneOf: exactly one predicate must succeed
type OnlyInExpression = OneOf<InExpression, Disable>;
let mut tokens = "value".to_token_iter();
let result = ConditionalField::<OnlyInExpression>::parse(&mut tokens);
assert!(result.is_ok()); // Succeeds: exactly one (InExpression) succeeds
Base predicates: Enable (always succeeds), Disable (always fails), TokensRemain (succeeds only when tokens remain).
Logical operators: AllOf (AND), AnyOf (OR), OneOf (XOR), Not (negation) - each accepts 2-4 operands.
Each predicateflag creates a newtype tuple struct that automatically implements Parser,
ToTokens, Clone, Debug, PredicateOp, and Default (when the base is Enable or
TokensRemain), plus any custom traits specified after impl.
§Type Identity Checking with PredicateCmp
The PredicateCmp<A, B, Same = Enable, Different = Disable> predicate enables distinguishing between different
predicate flags at compile time. It behaves like Same when the types A and B are identical, and
like Different when they differ.
This is useful when you need to accept a specific predicate flag while rejecting others that implement the same trait:
unsynn! {
predicatetrait Context;
predicateflag InExpr = Enable for Context;
predicateflag InStmt = Enable for Context;
predicateflag InType = Enable for Context;
// Only accepts InExpr specifically
struct StructLiteral<C: Context> {
guard: PredicateCmp<C, InExpr>,
name: Ident,
fields: BraceGroup,
}
// Accepts InExpr but NOT InStmt
struct TernaryExpr<C: Context> {
guard: AllOf<PredicateCmp<C, InExpr>, Not<PredicateCmp<C, InStmt>>>,
condition: Ident,
if_true: Ident,
if_false: Ident,
}
// Accepts either InExpr or InType, but not InStmt
struct GenericPath<C: Context> {
guard: AnyOf<PredicateCmp<C, InExpr>, PredicateCmp<C, InType>>,
segments: PathSepDelimitedVec<Ident>,
}
}
// ✅ StructLiteral accepts InExpr
let mut tokens = "Point { x: 1, y: 2 }".to_token_iter();
assert!(StructLiteral::<InExpr>::parse(&mut tokens).is_ok());
// ❌ StructLiteral rejects InStmt (even though InStmt implements Context)
let mut tokens = "Point { x: 1, y: 2 }".to_token_iter();
assert!(StructLiteral::<InStmt>::parse(&mut tokens).is_err());
// ✅ TernaryExpr accepts InExpr and explicitly rejects InStmt
let mut tokens = "cond val1 val2".to_token_iter();
assert!(TernaryExpr::<InExpr>::parse(&mut tokens).is_ok());
// ❌ TernaryExpr rejects InStmt due to Not<PredicateCmp<C, InStmt>>
let mut tokens = "cond val1 val2".to_token_iter();
assert!(TernaryExpr::<InStmt>::parse(&mut tokens).is_err());
// ✅ GenericPath accepts InExpr
let mut tokens = "std::vec::Vec".to_token_iter();
assert!(GenericPath::<InExpr>::parse(&mut tokens).is_ok());
// ✅ GenericPath accepts InType
let mut tokens = "std::vec::Vec".to_token_iter();
assert!(GenericPath::<InType>::parse(&mut tokens).is_ok());
// ❌ GenericPath rejects InStmt
let mut tokens = "std::vec::Vec".to_token_iter();
assert!(GenericPath::<InStmt>::parse(&mut tokens).is_err());
Requirements:
- Type parameters must implement PredicateOp (which includes a 'static bound)
- Zero runtime overhead (usually optimized to a compile-time constant)
Combining with logical operators:
// Accept A or B, but not C
type AcceptAOrBNotC<T> = AllOf<
AnyOf<PredicateCmp<T, A>, PredicateCmp<T, B>>,
Not<PredicateCmp<T, C>>
>;
// Accept exactly A (not B, not C)
type OnlyA<T> = AllOf<
PredicateCmp<T, A>,
Not<AnyOf<PredicateCmp<T, B>, PredicateCmp<T, C>>>
>;
See PredicateCmp for more details.
§Expression Parsing
Unsynn provides expression building blocks for creating operator precedence parsers. These types handle the common patterns found in expression grammars:
- expressions::PrefixExpr - Unary prefix operators (e.g., -x, !flag, *ptr)
- expressions::PostfixExpr - Unary postfix operators (e.g., x?, x!)
- expressions::InfixExpr (or expressions::NonAssocExpr) - Non-associative binary operators (e.g., a == b)
- expressions::LeftAssocExpr - Left-associative operators (e.g., a + b + c)
- expressions::RightAssocExpr - Right-associative operators (e.g., a = b = c)
§Basic Usage
// Define precedence levels (highest to lowest binding)
type MultiplicativeExpr = LeftAssocExpr<LiteralInteger, Star>;
type AdditiveExpr = LeftAssocExpr<MultiplicativeExpr, Plus>;
// Parse with proper precedence: "1 + 2 * 3" → 1 + (2 * 3)
let mut tokens = "1 + 2 * 3".to_token_iter();
let expr: AdditiveExpr = tokens.parse().unwrap();
assert_eq!(expr.len(), 2); // Two operands at addition level: 1 and (2*3)
§No Nesting Required
All expression types use DelimitedVec internally, so you don’t need to nest them:
// ✅ Correct: Single PrefixExpr handles multiple operators (--x)
type MyPrefixExpr = PrefixExpr<Minus, Ident>;
// ❌ Wrong: Don't nest - it's redundant!
// type MyPrefixExpr = PrefixExpr<Minus, PrefixExpr<Minus, Ident>>;
§Design Patterns for Recursive Expressions
When building expression parsers, you often need to handle recursive structures like
parenthesized expressions (expr) or grouped expressions that can contain the same expression
type they’re part of. This creates a circular type dependency that Rust’s type system doesn’t
allow directly.
These patterns solve the “infinitely sized type” problem by breaking the recursion cycle.
Without indirection, a type like PrimaryExpr::Grouped(Expr) would require Expr to contain
PrimaryExpr, which contains Expr, which contains PrimaryExpr… creating an infinite
chain.
Option 1: Box for Indirection (Recommended)
- Simplest approach
- Use Box<Expression> in grouped/parenthesized variants
- Works naturally with Rust's type system
- Small allocation overhead (usually negligible)
- Example: Grouped(ParenthesisGroupContaining<Box<Expr>>)
Option 2: Generic Parameter (Advanced)
- Make the primary expression generic over the root type: PrimaryExpr<E>
- Pass the expression type to itself: type Expression = AssignmentExpr<Expression>;
- More type-safe but more complex signatures
- Allows different recursion strategies per grammar
Option 3: Traditional Hierarchy
- Explicit wrapper types at each precedence level
- Matches traditional parser generator patterns
- More boilerplate but very clear structure
- Each level wraps the next explicitly
For more details, see the expressions module documentation.
§Complete Example with Parentheses
This example shows a complete expression parser with proper precedence hierarchy and
parenthesized expressions. Parentheses are not part of the expression building blocks -
you handle them in your primary expression type using Box for recursion.
unsynn!{
keyword Calc = "CALC";
operator Pow = "^";
operator Factorial = "!";
operator Mul = "*";
operator Div = "/";
operator Add = "+";
operator Sub = "-";
// Define primary expressions (THIS is where parentheses go)
enum PrimaryExpr {
Literal(LiteralInteger),
Grouped(ParenthesisGroupContaining<Box<Expr>>), // Box breaks recursion
}
// Build precedence hierarchy (highest to lowest binding)
// Power level: Mix infix (^) and postfix (!) at same precedence
// Parse postfix first, then allow x^y between postfix expressions
type MyPostfixExpr = PostfixExpr<PrimaryExpr, Factorial>;
type PowerExpr = RightAssocExpr<MyPostfixExpr, Pow>;
// Multiplication and Division (left-associative)
struct MultiplicativeOp(Either<Mul, Div>);
type MultiplicativeExpr = LeftAssocExpr<PowerExpr, MultiplicativeOp>;
// Addition and Subtraction (left-associative)
struct AdditiveOp(Either<Add, Sub>);
type AdditiveExpr = LeftAssocExpr<MultiplicativeExpr, AdditiveOp>;
// Step 3: Define root expression type
type Expr = AdditiveExpr;
// Top-level statement
struct Expression(Calc, Expr, Semicolon);
}
// Parentheses override precedence
let mut tokens = "CALC (2 + 3) * 4 ;".to_token_iter();
let ast: Expression = tokens.parse().unwrap();
assert_tokens_eq!(ast, "CALC (2 + 3) * 4 ;");
// Standard precedence: 10 - 5 + 2 * 3 = (10 - 5) + (2 * 3)
let mut tokens = "CALC 10 - 5 + 2 * 3 ;".to_token_iter();
let ast: Expression = tokens.parse().unwrap();
assert_tokens_eq!(ast, "CALC 10 - 5 + 2 * 3 ;");
// Power is right-associative: 2^3^2 = 2^(3^2) = 2^9 = 512
let mut tokens = "CALC 2 ^ 3 ^ 2 ;".to_token_iter();
let ast: Expression = tokens.parse().unwrap();
assert_tokens_eq!(ast, "CALC 2 ^ 3 ^ 2 ;");
// Mixing infix and postfix at same level: 2^3! = 2^(3!) = 2^6 = 64
// Postfix binds to its left operand first, then ^ operates on the result
let mut tokens = "CALC 2 ^ 3 ! ;".to_token_iter();
let ast: Expression = tokens.parse().unwrap();
assert_tokens_eq!(ast, "CALC 2 ^ 3 ! ;");
// The parsed structure shows 3! is grouped before ^ operates on it
let debug_str = format!("{:?}", ast);
assert!(debug_str.contains("PostfixExpr { operand: Literal(LiteralInteger { literal: Literal { lit: 3"));
assert!(debug_str.contains("operators: DelimitedVec"));
assert!(debug_str.contains("Operator<'!'>"));§Lazy Parsing and Lookahead
Grammars frequently need to parse tokens into lists or to look ahead to match only when some
conditions are met without consuming tokens. See also the container
module for container implementations and the fundamental module
for lookahead types.
§Greedy Container Types
Unsynn provides several containers that parse elements greedily (as many as possible):
- Vec<T> - Parses T repeatedly until it fails. Always succeeds (empty vec if no matches).
- LazyVec<T, S> - Parses T until terminator S, consuming and storing S in the .terminator field.
- LazyVecUntil<T, S> - Parses T until seeing terminator S (via lookahead), without consuming S.
- DelimitedVec<T, D, P, MIN, MAX> - Parses delimited lists with configurable trailing delimiter policy and length constraints.
All these containers implement the RangedRepeats trait, which allows constraining the
number of parsed elements.
§Length constrained Type Aliases
For convenience, several type aliases provide common repeat patterns:
- Any<T> - Parse 0 or more
- Many<T> - Parse 1 or more
- Exactly<T, N> - Parse exactly N
- AtLeast<T, N> - Parse N or more
- AtMost<T, N> - Parse N or less
§Examples
// Parse any number of attributes
type Attributes = Vec<Attribute>;
// Parse comma-separated parameters (trailing comma optional)
type Params = DelimitedVec<Param, Comma, TrailingDelimiter::Optional>;
// Parse 1 or more statements
type Statements = Many<Statement>;
§Lookahead and Composition
Lookahead types check what’s ahead without consuming tokens:
- Expect<T> - Positive lookahead: succeeds if T matches (doesn't consume)
- Except<T> - Negative lookahead: succeeds if T does NOT match (doesn't consume)
Both return zero-sized types and never advance the token position.
§Composing with Cons
Lookahead is most powerful when combined with Cons for sequential parsing:
Negative lookahead (ensure something is NOT next):
// Parse ButThis only if we're NOT at NotThis
type NegativeExample = Cons<Except<NotThis>, ButThis>;
Positive lookahead (check something follows):
// Parse This, then verify ThenThis follows (without consuming it)
type PositiveExample = Cons<This, Expect<ThenThis>>;
parse_with() transformations:
Another common pattern is to use lookahead together with parse_with() transformations. This is
especially useful in custom Parser implementations. unsynn itself makes frequent use of this.
// Parse an identifier and verify it's followed by '='
let mut tokens = "foo = 42".to_token_iter();
let ident: Ident = <Cons<Ident, Expect<Assign>>>::parse_with(
&mut tokens,
|t, _| Ok(t.first)
).unwrap();
assert_eq!(ident.to_string(), "foo");
assert_tokens_eq!(tokens, "= 42");§Using unsynn! - conjunctive structs
When defining structs in unsynn!{}, fields are conjunctive, so you can write lookahead patterns directly:
Negative lookahead:
unsynn! {
struct NegativeLookahead {
// Ensure we're not at NotThis before parsing This
_check: Except<NotThis>,
pub this: This,
}
}
Positive lookahead:
unsynn! {
struct PositiveLookahead {
pub first: First,
// Verify Second follows (without consuming it)
_check: Expect<Second>,
}
}
§NonEmptyOption<T> - Optional Non-Empty Content
A common pattern is parsing optional content that should NOT match on empty input:
// Parse expression only if tokens remain
type TrailingExpr = NonEmptyOption<Expression>;
NonEmptyOption<T> is like Option<T> but only succeeds when tokens are available to
parse. It prevents matching types that accept empty input (like Vec, Option, etc.).
§Quick Reference
| Pattern | Type | Use Case |
|---|---|---|
| Parse until failure | Vec<T> | Unknown terminator, parse as much as possible |
| Parse until terminator | LazyVec<T, S> | Known terminator, consume it |
| Parse before terminator | LazyVecUntil<T, S> | Known terminator, don’t consume it |
| Delimited list | DelimitedVec<T, D, P, MIN, MAX> | Comma-separated, sized constraints |
| Optional non-empty | NonEmptyOption<T> | Parse T only if tokens remain |
| Positive lookahead | Expect<T> | Check what’s next without consuming |
| Negative lookahead | Except<T> | Ensure what’s next is NOT T |
| Conditional parse | Either<Cons<Expect<A>, X>, Y> | Parse X if A ahead, else Y |
RangedRepeats type aliases: Any<T>, Many<T>, AtLeast<T,N>, AtMost<T,N>, Exactly<T,N>
§See Also
- Container Types - Implementation of lazy containers
- Combinator Types - Cons, Either, and other combinators
- Fundamental Types - Expect, Except, EndOfStream
§Errors
Unsynn parsers return the first error encountered. They do not try error recovery on their own
or try to be smart about what may have caused an error (typos, missing semicolons etc.). The only
exception to this is when parsing disjunct entities (Either or other enums) where errors are
expected to happen on the first branches. When any branch succeeds the error is dropped and
parsing goes on; when all branches fail, the error which made the most progress is
returned. Progress is tracked with the ShadowCountedIter. This is implemented for enums
created with the unsynn! macro as well as for the Either::parser() method. This covers all
normal cases.
When one needs to implement disjunct parsers manually this has to be taken into account.
This is then done by creating an Error with ErrorKind::NoError by
let mut err = Error::no_error() within the Parser::parse implementation. Then any parser
that is called subsequently tries to err.upgrade(Item::parser(..)) which handles storing the
error which made the most progress. Eventually an Ok(...) or the upgraded Err(err) is
returned. For details look at the source of Either::parser.
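A minimal sketch of that pattern, manually written (the exact signature of upgrade() is an assumption here; the source of Either::parser is authoritative):
enum NumberOrIdent {
    Number(LiteralInteger),
    Ident(Ident),
}
impl Parser for NumberOrIdent {
    fn parser(input: &mut TokenIter) -> Result<Self> {
        let mut err = Error::no_error();
        // upgrade() keeps whichever error made the most progress
        if let Ok(n) = err.upgrade(input.transaction(|t| LiteralInteger::parser(t))) {
            return Ok(Self::Number(n));
        }
        if let Ok(i) = err.upgrade(input.transaction(|t| Ident::parser(t))) {
            return Ok(Self::Ident(i));
        }
        Err(err)
    }
}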
Errors carry the failed token, the type name that was expected (possibly refined) and an iterator past the location where the error happened. This can be used for further inspection.
Some parser types in unsynn are ZSTs; this means they don't carry the token they parsed and
consequently have no Span, thus the location of an error will be unavailable for them.
If that turns out to be a problem this might be revised in future unsynn versions.
In other cases where spans are wrong this is considered a happy accident; please file a bug or send a PR fixing the issue.
§Error Recovery
Trying to recover from syntax errors is usually not required/advised for compiled languages. When there is a syntax error, report it to the user and let them fix the source code.
Recoverable parsing would need knowledge of the grammar (or even context) being parsed. This cannot be supported by unsynn itself as it does not define any grammars. When one needs recoverable parsers, this has to be implemented in the grammar definition. Future versions of unsynn may provide some tools to assist with this. The actual approach is still under discussion.
§Writing Tests
When writing tests one often has to compare a parsed entity against some expected source code.
The tokens_to_string() method provides a reliable way to get a canonical (not pretty printed)
string representation. This is used by the assert_tokens_eq! macro (requires proc_macro2
feature for the string parsing):
// default construct some AST
let expression = <Cons<ConstInteger<1>, Plus, ConstInteger<2>>>::default();
// whitespace doesn't matter here, comments are stripped
assert_tokens_eq!(expression, " 1 + 2 // comments are stripped");
Note: tokens_to_string() works with both proc_macro and proc_macro2, but
assert_tokens_eq! requires the proc_macro2 feature because it needs to parse the
expected string into tokens for comparison.
§Grammar Debugging
When developing complex parsers, it can be helpful to see what tokens are being parsed at
specific points in your grammar. The debug module provides StderrLog<T, N>
for this purpose. T is an informal type whose name will be printed, N is an integer limit
for how many tokens are printed (default: 5). Debug output is only generated when the
debug_grammar feature is enabled; otherwise this type's parser is a no-op.
§Usage
Wrap any parser type with StderrLog to print debug information to stderr:
use unsynn::*;
// Enable with: cargo build --features debug_grammar
let mut tokens = "fn foo(bar: i32) -> Result<(), Error> { }".to_token_iter();
// With debug_grammar enabled, this prints to stderr:
// Type: proc_macro2::Ident
// Source: fn foo(bar : i32) -> Result …
//
// Without debug_grammar, StderrLog is a complete no-op (doesn't parse or consume tokens)
let _ident: StderrLog<Ident, 10> = tokens.parse().unwrap();
§Debug Output Recursion
StderrLog automatically recurses into groups (parentheses, brackets, braces) and prints
tokens from within them, respecting the N limit at each level.
§Best Practices
- Add
StderrLogwrappers at key points in your grammar during development - Use different
Nvalues to see more or less context - Leave them in place - without the feature they’re completely erased (zero-cost)
- Enable the feature only during debugging:
cargo test --features debug_grammar
See the debug module documentation for more details.
§Generating and Transforming Code
After parsing one usually wants to emit modified or generated code. This is done with the
ToTokens trait. For convenience we also provide a quote!{} macro which allows one to
template code in place. Unlike its big cousin (the quote crate) ours is rather simple and has
fewer features (no #(...) repeats, but we have #{...} blocks). Eventually this may be
extended.
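A small sketch of the # interpolation (any value implementing ToTokens can be interpolated; the #{...} block form is documented on the quote! macro itself):
let name: Ident = "hello".to_token_iter().parse().unwrap();
let generated = quote! {
    fn #name() -> bool { true }
};
assert_tokens_eq!(generated, "fn hello() -> bool { true }");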
§Implementation/Performance Notes
Unsynn is (as of now) implemented as a recursive descent PEG with backtracking. This has worst-case exponential complexity; there is a plan to fix that in future releases. Currently, to avoid these cases, it is recommended to formulate disjunctive parsers so that they fail early and don't share long prefixes.
For example things like
Either<Cons<LongValidCode, OneThing>, Cons<LongValidCode, OtherThing>> should be
rewritten as Cons<LongValidCode, Either<OneThing, OtherThing>>.
§Stability Guarantees
Unsynn is in development; nevertheless we give some stability promises. These will only be broken when we discover technical reasons that make it infeasible to keep them. Eventually, things in the unstable category below will move up to stable.
§Stable
- Operator name definitions in operator::names
These follow common (Rust) naming; if not, this will be fixed. Otherwise you can rely on these being stable.
- Functionality
Existing types and parsers are there to stay. A few things, especially when freshly added, may be shaken out and refined/extended, but existing functionality should be preserved. In some cases changes may require minor adjustments in code using it. The unsynn! macro itself feels pretty good now; few new features may be added and minor things fixed (internals, trait bounds etc.). The overall syntax probably stays.
§Unstable
- Modules
The module organization is still in flux and may be refactored at any time. This shouldn't matter because unsynn reexports everything at the crate root. The only exception here is operator::names, which we try to keep stable.
- Internal representation
Don't rely on the internal representations; there are some plans and ideas to change these. Some types that are currently ZSTs may become stateful, collections may become wrapped as Rc<Vec<T>> etc.
- Traits
Currently there are the main traits Parse, Parser, IParse and ToTokens. In the future these may be refactored and split into smaller traits and some more may be added. This will then require some changes for the user. The existing functionality will be preserved nevertheless.
- Trait bounds
Trait bounds are still in flux. Expect that types eventually should be at least Clone + Hash + PartialEq; some types will require Default. Possibly some helper trait needs to be implemented too. Details will be worked out over time.
- TokenIter
Will be (again) completely rewritten in 0.4 for performance. No idea how well we can keep backward compatibility.
- Parser Predicates
They work, but so far I am not totally happy with them. The decision whether they stay as they are is postponed.
- Debug module/StderrLog
It's a hack and will certainly see some makeover, but maybe the StderrLog just stays as is.
- Lifetimes
Where and when we can support lifetimes in bounds (reverted back to 'static) needs to be shaken out. Probably it stays with 'static.
§AI Agent Reference Guide
For AI coding assistants working with unsynn, see ‘HOW-TO-USE-UNSYNN-BY-AI.md’ included in the distribution - a comprehensive, compressed reference designed for AI agents. This standalone guide can be placed directly into projects using unsynn.
Note for human readers: The AI guide is optimized for machine parsing and may be dense. This cookbook and the official documentation are better suited for human learning.
§Roadmap
With v0.1.0 we follow Rust's practice of semantic versioning; there will be breaking changes on 0.x releases but we try to keep these to a minimum. The planned ‘unsynn-rust’ and, along with that, an ‘unsynn-derive’ will be implemented. When these are ready and no major deficiencies in ‘unsynn’ are found, it is time for a 1.0.0 release.
§Planned/Ideas
- Can we add prettyprint for tokens_to_string? Should be done in an external crate.
- TODO: which types can implement Default? … keywords, bool, character; can we reverse a string from a char?
- Improve error handling
  - document how errors are reported and what the user can do to handle them
  - users can/should write forgiving grammars that are error tolerant
  - add tests for error/span handling
  - future versions may improve the Span handling considerably, probably behind an extra feature flag. We aim for ergonomic/automagical correct spans; the user shouldn't be burdened by making things correct. Details for this need to be laid out. Maybe a SpanOf<T: Parse>.
  - can we have some Explain<T> that explains what was expected and why it failed, to simplify complex errors?
- Transformer/feature case_convert https://crates.io/crates/heck
- Brainfart: Dynamic parser construction
  Instead of parse::<UnsynnType>(), create a parse function dynamically from a str parsed by unsynn itself: "Either<This, That>".to_parser().parse(). This will need a trait DynUnsynn implementing the common/dynamic parts of these and a registry where all entities supporting dynamic construction are registered. This will likely be factored out into a unsynn-dyn crate. Add some scanf-like DSL to generate these parsers; xmacro may use it like $(foo@Ident: values).
- Brainfart: Memoization
  - not before v0.3, maybe much later; this may be a good opportunity to sponsor unsynn development
  - in TokenIter: Rc< enum { Countdown(Cell(usize)), Memo(HashMap< (counter, typeid, Option<NonZeroU64 hash_over_extra_parameters>), (Result, TokenIter_after) >), }>
    A countdown counter activates memoization only after a certain number of tokens were parsed; parsing small things does not need the overhead of memoizing. Can we somehow (auto) trait which types become memoized? Small things don't need to be memoized.
  - Note to my future self: Result needs to be dyn Memo where trait Memo: Clone and clone is cheap: enum MaybeRc<T>{Direkt(T), Shared(Rc<T>)}
- Add rust types
  - f32: 32-bit floating point number
  - f64: 64-bit floating point number (default)
- Benchmarking
  - optimize the unsynn! macro (and keyword!, operator!)
    This can be done by refactoring the @aspect clauses into dedicated (#[doc(hidden)] unsynn_aspect!) macros and reordering some clauses. But to start this work we need a solid benchmark for the macros first.
  - benchmark unsynn-rust grammars
- Brainfart: a build-macro crate (benchmark!)
  - library that expands *_unsynn.rs to *_parser.rs from build.rs
- Add a Performance guide to the COOKBOOK: which opt-level/LTO for which use case
§Design Priorities
Unsynn's foremost goal is to make parsers easy and ergonomic to use. We deliberately provide some duplicated functionality and type aliases to prioritize expressiveness. Fast compile times with as few dependencies as necessary come second. We do not focus explicitly on Rust syntax; this will be addressed by other crates.
§Development
unsynn is meant to evolve opportunistically. When you spot a problem or need a new feature feel free to open an issue or (preferred!) send a PR.
Commits and other git operations are augmented and validated with
bar. For
contributors it is recommended to enable bar too by calling ./bar activate
within a checked out unsynn repository.
§Support/Feature Requests/Sponsoring
When you need professional support or want some feature implemented feel free to contact ct.unsynn@pipapo.org, I'm happy to help out, even more when it brings some food on the table.
§Contribution/Coding Guidelines
Chances to get contributions merged increase when you:
- Include documentation following the existing documentation practice. Write examples/doctests.
- Passing ./bar lints is an absolute requirement; I am not even looking at contributions that fail basic lint and formatting checks. Hint: try ./bar dwim to apply formatting and trivial fixes.
- Ideally passing ./bar without errors or warnings. Although if some problems and errors remain to be discussed, a WIP-PR failing tests is temporarily acceptable.
- Passing test-coverage with ./bar cargo_mutants.
- When you activated the githooks with ‘./bar activate’ then the policies for viable commits are enforced automatically.
- Implement reasonably complete things. Not everything needs to be included in a first version, but it must be usable.
§Git Branches
- main
Will be updated on new releases. When you plan to make a small contribution that should be merged soon then you can work on top of main. Will have linear history.
- release-*
Stable releases may get their own branch for fixes and backported features. Will have linear history.
- devel
Development branch which will eventually be merged into main. Non-trivial contributions that may take some time to develop should use devel as a starting point. But be prepared to rebase frequently on top of the ongoing devel. May itself become rebased on fixes and features.
- fix-*
Non-trivial bugfixes are prepared in fix-* branches.
- feature-*
More complex features and experiments are developed in feature branches. Any non-trivial contribution should be done in a feature-* branch as well. Once complete they become merged into devel. Some of these experiments may stall or be abandoned; do not base your contribution on an existing feature branch.
For a history how unsynn evolved, check the CHANGELOG.
Modules§
- CHANGELOG
- Changelog
- Trailing
Delimiter - Policy for the delimiter of the last element in a sequence. Note that delimiters are
after some element, for cases where you have leading delimiters you need to define
grammars that start with
Delimiter or Option<Delimiter>.
- combinator
- A unique feature of unsynn is that one can define a parser as a composition of other
parsers on the fly without the need to define custom structures. This is done by using the
Cons and Either types. The Cons type is used to define a parser that is a conjunction of two to four other parsers, while the Either type is used to define a parser that is a disjunction of two to four other parsers.
- container
- This module provides parsers for types that contain possibly multiple values. This
includes stdlib types like
Option, Vec, Box, Rc, RefCell and types for delimited and repeated values with numbered repeats.
- debug
- Debug utilities for token stream inspection
- delimited
- For easier composition we define the
Delimited type here, which is a T followed by an optional delimiting entity D. This is used by the DelimitedVec type to parse a list of entities separated by a delimiter.
- dynamic
- This module contains the types for dynamic transformations after parsing.
- expressions
- Expression parser building blocks for creating operator precedence parsers.
- fundamental
- This module contains the fundamental parsers. These are the basic tokens from
proc_macro2/proc_macro and a few other ones defined by unsynn. These are the terminal entities when parsing tokens. Being able to parse TokenTree and TokenStream allows one to parse opaque entities where internal details are left out. The Cached type is used to cache the string representation of the parsed entity. The Nothing type is used to match without consuming any tokens. The Except type is used to match when the next token does not match the given type. The EndOfStream type is used to match the end of the stream when no tokens are left. The HiddenState type is used to hold additional information that is not part of the parsed syntax.
- group
- Groups are a way to group tokens together. They are used to represent the contents between
(), {}, [] or no delimiters at all. This module provides parser implementations for opaque group types with defined delimiters and the GroupContaining types that parse the surrounding delimiters and content of a group type.
- literal
- This module provides a set of literal types that can be used to parse and tokenize
literals. The literals are parsed from the token stream and can be used to represent the
parsed value. unsynn defines only simplified literals, such as integers, characters and
strings. The literals here are not full rust syntax, which will be defined in the
unsynn-rust crate. There are Literal* for Integer, Character, String to parse simple literals and ConstInteger<V> and ConstCharacter<V> which must match an exact value. The latter two also implement Default, thus they can be used to create constant tokens. There is no ConstString; constant literal strings can be constructed with IntoLiteralString<T>.
- names
- Unsynn does not implement rust grammar, for common Operators we make an exception because
they are mostly universal and already partially lexed (Spacing::Alone/Joint); it would add
a lot of confusion if every user had to redefine common operator types. These operator names have their own module and are reexported at the crate root. This allows one to import only the named operators.
- operator
- Combined punctuation tokens are represented by
Operator. The crate::operator! macro can be used to define custom operators.
- predicates
- Parse predicates for compile-time parser control.
- punct
- This module contains types for punctuation tokens. These are used to represent single and
multi character punctuation tokens. For single character punctuation tokens, there are
PunctAny, PunctAlone and PunctJoint types.
- rust_
types - Parsers for Rust's types.
- transform
- This module contains the transforming parsers. These are the parsers that add, remove, replace or reorder tokens while parsing.
Macros§
- assert_
tokens_ eq - Helper macro that asserts that two entities implementing
ToTokens result in the same TokenStream. Used in tests to ensure that the output of parsing is as expected. This macro allows two forms:
- format_
cached_ ident - Generates a
CachedIdent from a format specification.
- format_
ident - Generates a
Ident from a format specification.
- format_
literal - Generates a
Literal from a format specification. Unlike format_literal_string!, this does not add quotes and can be used to create any kind of literal, such as integers or floats.
- format_
literal_ string - Generates a
LiteralString from a format specification. Quote characters around the string are automatically added.
- keyword
- Define types matching keywords.
- operator
- Define types matching operators (punctuation sequences).
- quote
- unsynn provides its own
quote!{} macro that translates tokens into a TokenStream while interpolating variables prefixed with a Pound sign (#). This is similar to what the quote macro from the quote crate does but not as powerful. There is no #(...) repetition (yet).
- unsynn
- This macro supports the definition of enums, tuple structs and normal structs and
generates
Parser and ToTokens implementations for them. It will derive Debug. Generics/Lifetimes are not supported on the primary type. Note: eventually a derive macro for Parser and ToTokens will become supported by an ‘unsynn-derive’ crate to give finer control over the expansion. #[derive(Copy, Clone)] has to be added manually. Keyword and operator definitions can also be included; they delegate to the keyword! and operator! macros described below. All entities can be prefixed by pub to make them public. Type aliases, function definitions, macros and use statements are passed through. This makes things easier to read when you define larger unsynn macro blocks.
Structs§
- AllOf
- Logical AND: 2-4 predicates must all succeed.
- AnyOf
- Logical OR: 2-4 predicates, at least one must succeed.
- Brace
Group - An opaque group of tokens within a Brace
- Brace
Group Containing - Parseable content within a Brace
- Bracket
Group - An opaque group of tokens within a Bracket
- Bracket
Group Containing - Parseable content within a Bracket
- Cached
- Getting the underlying string is expensive as it always allocates a new
String. This type caches the string representation of a given entity. Note that this is only reliable for fundamental entities that represent a single token. Spacing between composed tokens is not stable and should be considered informal only.
- Cons
- Conjunctive
A followed by B and optional C and D. When C and D are not used, they are set to Nothing.
- Const
Character - A constant
char of value V. Must match V and also has Default implemented to create a LiteralCharacter with value V.
- Const
Integer - A constant
u128 integer of value V. Must match V and also has Default implemented to create a LiteralInteger with value V.
- Delimited
- This is used when one wants to parse a list of entities separated by delimiters. The
delimiter is optional and can be
None, e.g. when the entity is the last in the list. Usually the delimiter will be some simple punctuation token, but it is not limited to that.
- Delimited
Vec - Since the delimiter in
Delimited<T,D> is optional, a Vec<Delimited<T,D>> would parse consecutive values even without delimiters. DelimitedVec<T, D, MIN, MAX, P> will stop parsing by MIN/MAX number of elements and depending on the policy defined by P, which can be one of TrailingDelimiter.
- Disable
- Always fails without consuming tokens.
- Discard
- Succeeds when the next token matches
T. The token will be removed from the stream but not stored. Consequently the ToTokens implementation will panic with a message that it can not be emitted. This can only be used when a token should be present but not stored and never emitted.
- DynNode
- Parses a
T(default:Nothing). Allows one to replace it at runtime, after parsing with anything else implementingToTokens. This is backed by aRc. One can replace any cloned occurrences or only the current one. - Enable
- Always succeeds without consuming tokens.
- EndOf
Stream - Matches the end of the stream when no tokens are left.
- Error
- Error type for parsing.
- Except - Succeeds when the next token does not match T. Will not consume any tokens. Usually this has to be followed by a conjunctive match such as Cons<Except<T>, U> or by another entry in a struct or tuple.
- Expect - Succeeds when the next token would match T. Will not consume any tokens. This is similar to peeking.
- Group - A delimited token stream.
- GroupContaining - Any kind of Group G with parseable content C. The content C must parse exhaustively; an EndOfStream is automatically implied.
- HiddenState - Sometimes one wants to compose types or create structures for unsynn that have members that are not part of the parsed syntax but add some additional information. This struct can be used to hold such members while still using the Parser and ToTokens trait implementations automatically generated by the unsynn!{} macro or composition syntax. HiddenState will not consume any tokens when parsing and will not emit any tokens when generating a TokenStream. On parsing it is initialized with a default value. It has Deref and DerefMut implemented to access the inner value.
- Ident - A word of Rust code, which may be a keyword or legal variable name.
- Insert - Injects tokens without parsing anything.
- IntoIdent - Parses T and concatenates all its elements to a single identifier by removing all characters that are not valid in identifiers. When T implements Default, such as single string (non group) keywords, operators and Const* literals, it can be used to create an IntoIdent on the fly. Note that construction may still fail when one tries to create an invalid identifier, such as one starting with digits.
- IntoLiteralString - Parses T and creates a LiteralString from it. When T implements Default, such as single string (non group) keywords, operators and Const* literals, it can be used to create an IntoLiteralString on the fly.
- IntoTokenStream - Parses T and keeps it as an opaque TokenStream. This is useful when one wants to parse a sequence of tokens and keep it as an opaque unit or re-parse it later as something else.
- Invalid - A unit that always fails to match. This is useful as a default for generics. See how Either<A, B, C, D> uses this for unused alternatives.
- LazyVec - A Vec<T> that is filled up to the first appearance of a terminating S. This S may be a subset of T, thus parsing becomes lazy. This is the same as Cons<Vec<Cons<Except<S>,T>>,S> but more convenient and efficient. A small sketch follows this list.
- LazyVecUntil - A Vec<T> that is filled up to the first appearance of a terminating S. This S may be a subset of T, thus parsing becomes lazy. Unlike LazyVec this variant does not consume the final terminator. This is the same as Vec<Cons<Except<S>,T>> but more convenient.
- LeftAssocExpr - Left-associative infix operator expression.
- Literal - A literal string ("hello"), byte string (b"hello"), character ('a'), byte character (b'a'), or an integer or floating point number with or without a suffix (1, 1u8, 2.3, 2.3f32).
- LiteralCharacter - A single quoted character literal ('x').
- LiteralInteger - A simple unsigned 128 bit integer. This is the most simple form to parse integers. Note that only decimal integers without any other characters, signs or suffixes are supported; this is not full rust syntax.
- LiteralString - A double quoted string literal ("hello"). The quotes are included in the value. Note that this is a simplified string literal; only double quoted strings are supported, this is not full rust syntax, e.g. byte and C string literals are not supported.
- NonEmptyOption - NonEmptyOption<T> prevents Option from matching when T can succeed with empty input. It ensures None is returned when no tokens remain, regardless of whether T could succeed on an empty stream. This is crucial when parsing optional trailing content that should only match if tokens are actually available to consume.
- NonEmptyTokenStream - Since parsing a TokenStream succeeds even when no tokens are left, this type is used to parse a TokenStream that is not empty.
- NonParseable - A unit that can not be parsed. This is useful as a diagnostic placeholder for parsers that are (yet) unimplemented. The nonparseable feature flag controls whether Parser and ToTokens will be implemented for it. This is useful in release builds that should not have any NonParseable left behind.
- NoneGroup - An opaque group of tokens within a None
- NoneGroupContaining - Parseable content within a None
- Not - Logical NOT: succeeds if the inner predicate fails.
- Nothing - A unit that always matches without consuming any tokens. This is required when one wants to parse Repeats without a delimiter. Note that using Nothing as the primary entity in a Vec, LazyVec, DelimitedVec or Repeats will result in an infinite loop.
- OneOf - Logical XOR: exactly one of 2-4 predicates must succeed.
- Operator - Operators made from up to four ASCII punctuation characters. Unused characters default to \0. Custom operators can be defined with the crate::operator! macro. All but the last character are Spacing::Joint. Attention must be paid when operators share the same prefix; the shorter ones need to be tried first.
- ParenthesisGroup - An opaque group of tokens within a Parenthesis
- ParenthesisGroupContaining - Parseable content within a Parenthesis
- PostfixExpr - Postfix unary operator expression.
- PredicateCmp - Predicate that compares type A with type B at runtime.
- PrefixExpr - Prefix unary operator expression.
- Punct - A Punct is a single punctuation character like +, - or #.
- PunctAlone - A single character punctuation token which is not followed by another punctuation character.
- PunctAny - A single character punctuation token with any kind of Spacing.
- PunctJoint - A single character punctuation token where the lexer joined it with the next Punct, or a single quote followed by an identifier (rust lifetime).
- RightAssocExpr - Right-associative infix operator expression.
- Skip - Skips over expected tokens. Will parse and consume the tokens but not store them. Consequently the ToTokens implementation will not output any tokens.
- Span - unsynn reexports the entities from proc_macro2 it implements Parse and ToTokens for. A region of source code, along with macro expansion information.
- StderrLog - A debug parser that prints the typename of T and the next N tokens to stderr.
- Swap - Swaps the order of two entities.
- TokenIter - Iterator type for parsing token streams.
- TokenStream - An abstract stream of tokens, or more concretely a sequence of token trees.
- TokensRemain - Succeeds only when tokens remain in the stream.
Enums§
- Delimiter - Describes how a sequence of token trees is delimited.
- Either - Disjunctive A or B or optional C or D, tried in that order. When C and D are not used, they are set to Invalid. A small sketch follows this list.
- ErrorKind - Actual kind of an error.
- Spacing - Whether a Punct is followed immediately by another Punct or followed by another token or whitespace.
- TokenTree - A single token or a delimited sequence of token trees (e.g. [1, (), ..]).
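For illustration, a minimal sketch of disjunctive parsing with Either, as described above. This is a hypothetical example; the unused C and D alternatives default to Invalid, and use unsynn::*; is assumed to be in scope:
use unsynn::*;
// the same Either type accepts either an identifier or an integer literal
let ident: Either<Ident, LiteralInteger> = "foo".to_token_iter().parse().unwrap();
let number: Either<Ident, LiteralInteger> = "42".to_token_iter().parse().unwrap();
// both parses succeed and round-trip back to their original tokens
assert_tokens_eq!(ident, "foo");
assert_tokens_eq!(number, "42");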
Traits§
- DynamicTokens - Trait alias for any type that can be used in dynamic ToTokens contexts.
- GroupDelimiter - Access to the surrounding Delimiter of a GroupContaining and its variants.
- IParse - Extension trait for TokenIter that calls Parse::parse().
- IntoTokenIter - Extension trait to convert iterators into TokenIter.
- Parse - This trait provides the user facing API to parse grammatical entities. It is implemented for anything that implements the Parser trait. The methods here encapsulate the iterator that is used for parsing into a transaction. This iterator is always Clone. Instead of using a peekable iterator or implementing deeper peeking, parse clones this iterator to make access transactional: when parsing succeeds the transaction is committed, otherwise it is rolled back. A small sketch of this behavior follows this list.
- Parser - The Parser trait that must be implemented by anything we want to parse. We are parsing over a TokenIter (TokenStream iterator).
- PredicateOp - Marker trait for compile-time parser predicates.
- RangedRepeats - A trait for parsing a repeating T with a minimum and maximum limit. Sometimes the number of elements to be parsed is determined at runtime, e.g. a number of header items needs a matching number of values.
- RefineErr - Helper trait for refining error type names. Every parser type in unsynn eventually tries to parse one of the fundamental types. When parsing fails, that fundamental type name is recorded as the expected type name in the error. Often this is not desired; a user wants to know the type of parser that actually failed. Since, for simplicity and performance reasons, we don’t keep a stack/vec of errors, we provide a way to register refined type names in errors. Note that this refinement should only be applied to leaves in the AST. Refining errors on composed types will lead to unexpected results.
- ToTokenIter - Extension trait to convert TokenStreams into TokenIter.
- ToTokens - unsynn defines its own ToTokens trait to be able to implement it for std container types. This is similar to the ToTokens from the quote crate but adds some extra methods and is implemented for more types. Moreover, the to_token_iter() method is the main entry point for creating an iterator that can be used for parsing.
- TokenCount - We track the position of the error by counting tokens. This trait is implemented for references to shadow counted TokenIter, and for usize. The latter allows passing in a position directly or using usize::MAX in case no position data is available (which will make this error be the final one when upgrading).
- Transaction - Helper trait to make TokenIter transactional.
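For illustration, a minimal sketch of the transactional behavior described for Parse above, using only the parsing entry points shown earlier (use unsynn::*; assumed to be in scope):
use unsynn::*;
let mut token_iter = "foo 42".to_token_iter();
// "foo" is not an integer literal; the failed attempt is rolled back
// and consumes no tokens ...
assert!(token_iter.parse::<LiteralInteger>().is_err());
// ... so the same position can still be parsed as an Ident,
let ident: Ident = token_iter.parse().unwrap();
assert_eq!(ident.to_string(), "foo");
// and the integer literal is still available afterwards.
let _number: LiteralInteger = token_iter.parse().unwrap();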
Type Aliases§
- And - &
- AndAnd - &&
- AndEq - &=
- Any - Any number of T delimited by D or Nothing
- AsDefault - Parse a T and replace it with its default value. This is a zero sized type. It can be used for no-allocation replacement elements in a Vec, since Vec has an optimization for zero-sized types where it won’t allocate any memory but just acts as a counter.
- Assign - =
- At - @
- AtLeast - At least N of T delimited by D or Nothing
- AtMost - At most N of T delimited by D or Nothing
- Backslash - \
- Bang - !
- CachedGroup - Group with cached string representation.
- CachedIdent - Ident with cached string representation.
- CachedLiteral - Literal with cached string representation.
- CachedLiteralInteger - LiteralInteger with cached string representation.
- CachedLiteralString - LiteralString with cached string representation.
- CachedPunct - Punct with cached string representation.
- CachedTokenTree - TokenTree (any token) with cached string representation.
- Caret - ^
- CaretEq - ^=
- Colon - :
- ColonDelimited - T followed by an optional :
- ColonDelimitedVec - DelimitedVec of T delimited by : with P as policy for the last delimiter.
- Comma - ,
- CommaDelimited - T followed by an optional ,
- CommaDelimitedVec - DelimitedVec of T delimited by , with P as policy for the last delimiter.
- Dollar - $
- Dot - .
- DotDelimited - T followed by an optional .
- DotDelimitedVec - DelimitedVec of T delimited by . with P as policy for the last delimiter.
- DotDot - ..
- DotDotEq - ..=
- Ellipsis - ...
- Equal - ==
- Exactly - Exactly N of T delimited by D or Nothing
- FatArrow - =>
- Ge - >=
- Gt - >
- InfixExpr - Generic infix operator expression.
- LArrow - <-
- Le - <=
- LifetimeTick - ' with Spacing::Joint
- Lt - <
- Many - One or more of T delimited by D or Nothing (see the sketch after this list)
- Minus - -
- MinusEq - -=
- NonAssocExpr - Type alias for non-associative binary operators.
- NotEqual - !=
- Optional - Zero or one of T delimited by D or Nothing
- Or - |
- OrDefault - Tries to parse a T or inserts a D when that fails.
- OrEq - |=
- OrOr - ||
- PathSep - ::
- PathSepDelimited - T followed by an optional ::
- PathSepDelimitedVec - DelimitedVec of T delimited by :: with P as policy for the last delimiter.
- Percent - %
- PercentEq - %=
- Plus - +
- PlusEq - +=
- Pound - #
- Question - ?
- RArrow - ->
- Repeats - DelimitedVec<T,D> with a minimum and maximum (inclusive) number of elements, at first without defaults. Parsing will succeed when at least the minimum number of elements is reached and stop at the maximum number. The delimiter D defaults to Nothing to parse sequences which don’t have delimiters.
- Replace - Parse-skip a T and insert a U: Default in its place. This is a zero sized type.
- Result - Result type for parsing.
- Semicolon - ;
- SemicolonDelimited - T followed by an optional ;
- SemicolonDelimitedVec - DelimitedVec of T delimited by ; with P as policy for the last delimiter.
- Shl - <<
- ShlEq - <<=
- Shr - >>
- ShrEq - >>=
- Slash - /
- SlashEq - /=
- Star - *
- StarEq - *=
- Tilde - ~
- TokenStreamUntil - Parses a TokenStream until, but excluding, T. The presence of T is mandatory.
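For illustration, a minimal sketch using the repetition aliases above. This is a hypothetical example, assuming the element type comes first and the delimiter second (as the descriptions suggest) and that use unsynn::*; is in scope:
use unsynn::*;
// one or more identifiers separated by commas
let idents: Many<Ident, Comma> = "a, b, c".to_token_iter().parse().unwrap();
assert_tokens_eq!(idents, "a, b, c");
// the same pattern with :: as the delimiter parses a simple path
let path: Many<Ident, PathSep> = "std::collections::HashMap".to_token_iter().parse().unwrap();
assert_tokens_eq!(path, "std::collections::HashMap");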