Expand description
unsynn (from German ‘Unsinn’ for nonsense) is a minimalist Rust parser library. It achieves this by leaving out the actual grammar implementations, which live in distinct crates. Still it comes with batteries included: there are parsers, combinators and transformers to solve most parsing tasks.
In exchange it offers simple, composable parsers and declarative parser construction. Grammars will be implemented in their own crates (see unsynn-rust).
Its primary intended use is creating proc macros for Rust that define their own grammar or need only sparse Rust parsing.
Other uses include building parsers for grammars outside a Rust/proc-macro context. Unsynn can
parse any &str data (the tokenizer step relies on proc_macro2).
§Examples
§Creating and Parsing Custom Types
The unsynn!{} macro generates the Parser and ToTokens implementations for your types.
Notice that unsynn implements Parser and ToTokens for many standard Rust types, such as
the u32 we use in this example.
let mut token_iter = "foo ( 1, 2, 3 )".to_token_iter();
unsynn!{
struct IdentThenParenthesisedNumbers {
ident: Ident,
numbers: ParenthesisGroupContaining::<CommaDelimitedVec<u32>>,
}
}
// iter.parse() is from the IParse trait
let ast: IdentThenParenthesisedNumbers = token_iter.parse().unwrap();
assert_tokens_eq!(ast, "foo(1,2,3)");In case the automatiically generated Parser and ToTokens implementations are not
sufficient the macro supports custom parsing and token emission through parse_with and
to_tokens clauses:
- from Type: Parse from a different type before transformation (requires parse_with)
- parse_with: Transform or validate values during parsing (used alone for validation, with from for transformation)
- to_tokens: Customize how types are emitted back to tokens (independent)
The parse_with and to_tokens clauses are independent and optional. The from clause must be used together with parse_with. When more control is needed, the implementations can also be written manually.
§Custom Parsing and ToTokens
Example of custom parsing and token emission - parse emoji as bools, emit as emoji:
unsynn! {
struct ThumbVote(bool) from LiteralCharacter:
parse_with |value, _tokens| {
Ok(Self(value.value() == '👍'))
}
to_tokens |s, tokens| {
Literal::character(if s.0 {'👍'} else {'👎'}).to_tokens(tokens);
};
}
See the COOKBOOK for more details on parse_with
and to_tokens clauses.
§Using Composition
Composition can be used without defining new datatypes. This is useful for simple parsers or
when one wants to parse things on the fly which are deconstructed immediately. See the
combinator module for more composition types.
// We parse this below
let mut token_iter = "foo ( 1, 2, 3 )".to_token_iter();
// Type::parse() is from the Parse trait
let ast =
Cons::<Ident, ParenthesisGroupContaining::<CommaDelimitedVec<u32>>>
::parse(&mut token_iter).unwrap();
assert_tokens_eq!(ast, "foo ( 1, 2, 3 )");§Custom Operators and Keywords
Keywords and operators can be defined within the unsynn!{} macro:
unsynn! {
keyword Calc = "CALC";
operator Add = "+";
operator Substract = "-";
operator Multiply = "*";
operator Divide = "/";
}
// Build expression parser with proper precedence
type Expression = Cons<Calc, AdditiveExpr, Semicolon>;
type AdditiveExpr = LeftAssocExpr<MultiplicativeExpr, Either<Add, Substract>>;
type MultiplicativeExpr = LeftAssocExpr<LiteralInteger, Either<Multiply, Divide>>;
let ast = "CALC 2*3+4*5 ;".to_token_iter()
.parse::<Expression>().expect("syntax error");
Keywords and operators can also be defined using the standalone keyword!{} and operator!{} macros.
See the operator names reference for predefined operators.
For more details on building expression parsers with proper precedence and associativity,
see the expressions module documentation.
§Feature Flags
- proc_macro2:
  Controls whether unsynn uses the proc_macro2 crate or the built-in proc_macro crate for token handling. This is enabled by default. When enabled, unsynn can parse from strings (via &str::to_token_iter()), convert tokens to strings (via tokens_to_string()), and be used in any context (tests, examples, etc.). When disabled, unsynn uses only the built-in proc_macro crate and can only be used from proc-macro crates (with proc-macro = true in Cargo.toml). This creates leaner proc macros without the proc_macro2 dependency.
  APIs disabled without proc_macro2:
  - String parsing: ToTokens for &str/String (parsing strings into tokens)
  - Format macros: format_ident!(), format_literal!(), format_literal_string!()
  - Transform types: IntoIdent<T> (requires string parsing for validation)
  - Test helper: assert_tokens_eq!() (requires string parsing)
  - String-based constructors: Cached::new(), Cached::from_string() (require string parsing)
  APIs that remain available:
  - Token to string conversion: tokens_to_string(), to_token_iter(), into_token_iter()
  - All parsing functionality (works with TokenStream from proc macro input)
  - All ToTokens implementations (except for &str/String)
  - Transform types: IntoLiteralString<T> (uses the Literal::string() constructor)
  - Type: Cached<T> (but not the string-based constructors)
- hash_keywords:
  Enables hash tables for larger keyword groups. This is enabled by default since it guarantees fast lookup in all use-cases and the extra dependency it introduces is very small. Nevertheless this feature can be disabled when keyword grouping is not or only rarely used, to remove the dependency on rust_hash. Keyword lookups then fall back to a binary search implementation. Note that the implementation already optimizes the cases where only one or only a few keywords are in a group.
- criterion:
  Enables the criterion benchmarking framework for performance benchmarks. This is disabled by default to keep the dependency tree light. Use cargo bench --features criterion to run the criterion benchmarks. Without this feature, only non-criterion benchmarks will run.
- docgen:
  The unsynn!{}, keyword!{} and operator!{} macros will automatically generate some additional docs. This is enabled by default.
- nonparsable:
  Enables the implementation of Parser and ToTokens for the NonParseable type. When not set, any use of it will result in a compile error. One may disable this for release builds to prevent any NonParsable left used in the code, thus checking for completeness (NonParseable is used for marking unimplemented types) and avoiding potential panics at runtime. This is enabled by default; consider disabling it in release builds.
- debug_grammar:
  Enables the StderrLog<T, N> debug type that prints type information and token sequences to stderr during parsing. This is useful for debugging complex grammars and understanding parser behavior. This is disabled by default; when disabled, StderrLog becomes a zero-cost no-op. Enable it during development with cargo test --features debug_grammar or cargo build --features debug_grammar. See the COOKBOOK for usage examples.
- trait_methods_track_caller:
  Adds #[track_caller] to Parse, Parser, IParse and ToTokens trait methods. The idea is to make unsynn more transparent in case of a panic and point closer to the user's code that caused the problem. This has a negligible performance impact and is an experimental feature. If it has bad side effects, please report them. This is enabled by default.
- extra_asserts:
  Enables expensive runtime sanity checks for unsynn internals. Enabled while developing unsynn. This adds diagnostics to data structures and makes unsynn slower and bigger. Should be disabled when unsynn is used by another crate. Currently enabled by default, which may (and eventually will) change for stable releases.
- extra_tests:
  Enables expensive tests that check semantics that should be taken ‘for granted’; this makes the test suite slower. Even without these tests enabled we aim for full (cargo-mutants) test coverage with extra_asserts enabled. This is disabled by default. Many of these tests are kept from development to assert correct semantics but are covered elsewhere.
§Cookbook
While unsynn is pretty lean, most code should have documentation and examples. Still, some
things need an explanation of how they are used efficiently, which is given here.
§Parsing
Parsing is done over a TokenIter which iterates over the token stream. TokenIter can be
created with to_token_iter() or into_token_iter(), which are implemented for TokenStream
and string types (string impls require the proc_macro2 feature).
The main trait for parsing a TokenIter is the Parse trait. This trait's methods are all
default implemented and can be used as is. Parse is implemented for all types that
implement Parser and ToTokens. Parser is the trait that has to be implemented for
each type that should be parsed.
The IParse trait is implemented for TokenIter, this calls Parse::parse() in a
convenient way when types can be inferred.
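A small sketch of the two entry points (assuming the proc_macro2 feature for string parsing):
let mut tokens = "foo bar".to_token_iter();
// IParse::parse() on the iterator, target type inferred from the binding
let first: Ident = tokens.parse().unwrap();
// Parse::parse() called on the type itself
let second = Ident::parse(&mut tokens).unwrap();
assert_eq!(first.to_string(), "foo");
assert_eq!(second.to_string(), "bar");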
§Parse::parse_with() transformations
The Parse::parse_with() method is used for parsing in more complex situations. In the
simplest case it can be used to validate the values of a parsed type. More complex usage will
fill in HiddenState and other non-parsed members or construct completely new types from
parsed entities. See also the transform module for transformation helpers.
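As a small, hedged sketch of ad-hoc validation at the call site (the closure arguments and the Error::other() call mirror the unsynn!{} examples further below; treat the exact signatures as assumptions and check the method documentation):
let mut tokens = "foo".to_token_iter();
// accept only identifiers starting with 'f', otherwise report an error
let ident = Ident::parse_with(&mut tokens, |ident, tokens| {
    if ident.to_string().starts_with('f') {
        Ok(ident)
    } else {
        Error::other(None, tokens, "identifier starting with 'f' expected".into())
    }
}).unwrap();
assert_eq!(ident.to_string(), "foo");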
§ToTokens
The ToTokens trait complements Parser by turning parsed entities back into a
TokenStream. Unlike the trait from the quote crate, we define ToTokens for many more types
and provide additional methods.
Notably it provides methods to create the entry points for parsing (when the proc_macro2
feature is enabled: to_token_iter() and into_token_iter()).
When textual representation of a parsed entity is required then tokens_to_string() can be
used (requires proc_macro2 feature). The standard Display trait is implemented on top of
that, as such every type that has ToTokens implemented can be printed as text.
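A small sketch (assuming the proc_macro2 feature):
let ast: Cons<Ident, Comma, Ident> = "foo , bar".to_token_iter().parse().unwrap();
// canonical textual form; whitespace may differ from the input
let text = ast.tokens_to_string();
// Display delegates to the token representation
assert_eq!(text, format!("{}", ast));
assert_tokens_eq!(ast, "foo , bar");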
§Composition and Type Aliases
For moderately complex types it is possible to use composition with Cons, Either and
other container types instead of defining new enums or structures.
It is recommended to alias such composed types to give them useful names. This can be used for creating grammars on the fly without any boilerplate code.
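A small sketch, assuming the predefined Colon operator from the names module:
// give composed types readable names instead of defining new structs
type KeyValue = Cons<Ident, Colon, LiteralString>;
type KeyValueList = CommaDelimitedVec<KeyValue>;
let config: KeyValueList = r#" name : "unsynn" , kind : "parser" "#
    .to_token_iter().parse().unwrap();
assert_tokens_eq!(config, r#" name : "unsynn" , kind : "parser" "#);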
§The unsynn!{} Macro
The recommended way to describe grammars is to use the unsynn!{} macro. This allows one to define
grammars by defining enums and structures. The macro will generate the necessary implementations
for Parser and ToTokens in a safe/optimized way. It is possible to add
HiddenState<T> members to add non-syntactic entries to such custom structs.
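A minimal sketch of a HiddenState member; it is default-initialized while parsing, never consumes or emits tokens, and derefs to the inner value (the field names here are made up for illustration):
unsynn!{
    struct CountedIdent {
        ident: Ident,
        // filled in by the program later, not part of the syntax
        uses: HiddenState<u32>,
    }
}
let counted: CountedIdent = "foo".to_token_iter().parse().unwrap();
assert_eq!(*counted.uses, 0); // Deref to the default value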
The macro supports generic types and static lifetime parameters with trait bounds, including
simple trait names (Clone), qualified paths (std::fmt::Debug), lifetime bounds
('static), and multiple bounds combined with + (T: Clone + std::fmt::Debug + 'static). For complete syntax details, see the unsynn! macro
documentation.
Custom parsing and token emission can be implemented through parse_with and to_tokens
clauses.
§Custom Parsing with parse_with
Use parse_with alone for validation, or with from Type for transformation:
Transformation (requires from Type):
unsynn! {
// Transform: parse integer as bool (from Type required)
struct BoolInt(bool) from LiteralInteger:
parse_with |value, _tokens| {
Ok(Self(value.value() != 0))
};
}
Validation (without from):
unsynn! {
// Validate: ensure positive integers only
struct PositiveInt(LiteralInteger);
parse_with |this, tokens| {
if this.0.value() > 0 {
Ok(this)
} else {
Error::other(None, tokens, "must be positive".into())
}
};
}
§Custom Token Emission with to_tokens
Use to_tokens to customize how types are emitted back to tokens:
unsynn! {
// Custom output: emit bools as YES/NO
struct YesNo(bool);
to_tokens |s, tokens| {
if s.0 {
Literal::string("YES").to_tokens(tokens);
} else {
Literal::string("NO").to_tokens(tokens);
}
};
}
§Combining parse_with and to_tokens
Both clauses are independent and can be used together:
unsynn! {
// Transformation + custom output
struct BoolInt(bool) from LiteralInteger:
parse_with |value, _tokens| { Ok(Self(value.value() != 0)) }
to_tokens |s, tokens| {
Literal::u64_unsuffixed(if s.0 {1} else {0}).to_tokens(tokens);
};
}
unsynn! {
// Validation + custom output
struct PositiveInt(LiteralInteger);
parse_with |this, tokens| {
if this.0.value() > 0 { Ok(this) }
else { Error::other(None, tokens, "must be positive".into()) }
}
to_tokens |s, tokens| {
Punct::new('+', Spacing::Alone).to_tokens(tokens);
s.0.to_tokens(tokens);
};
}
§Implementing Parsers
§Transactions
The Parse trait parses items within a transaction. This is done with the
Transaction::transaction() method. Internally this clones the iterator, calls the
Parser::parser() method and copies the cloned iterator back to the original on success.
This means that if a parser fails, the input is reset to the state before the parser was
called. For efficiency reasons the Parser::parser() methods are not transactional;
when they fail they leave the input in a partially consumed state.
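A small sketch of that difference:
let mut tokens = "foo 42".to_token_iter();
// The transactional Parse::parse() fails here (42 is not an Ident) …
assert!(<Cons<Ident, Ident>>::parse(&mut tokens).is_err());
// … but it rolled the iterator back, so the same input still parses as something else:
let ok: Cons<Ident, LiteralInteger> = tokens.parse().unwrap();
assert_tokens_eq!(ok, "foo 42");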
NOTE: For conjunctive parsers (sequential parsing without alternatives/backtracking),
calling Parser::parser() directly instead of Parse::parse() eliminates transaction overhead.
This is also safe for final elements in a disjunction, but dangerous for non-final
alternatives that need rollback on failure. Benchmarks show over 30% speedup depending on
complexity. When in doubt, calling the Parse::parse() method is always on the
safe side, but comes with some performance penalty.
When one wants to manually parse alternatives within a Parser (like in an enum) these
must be manually called within a transaction. This is only necessary when the parsed entity
is compound and not the final alternative.
enum MyEnum {
// Complex variant
Tuple(i32, i32),
// Simple variant
Simple(i32),
// Another simple variant
Another(Ident),
}
impl Parser for MyEnum {
fn parser(input: &mut TokenIter) -> Result<Self> {
// Use [`Transaction::transaction()`] to parse the tuple variant
if let Ok(tuple) = input.transaction(
|trans_input| -> Result<MyEnum> {
Ok(MyEnum::Tuple(
i32::parser(trans_input)?,
i32::parser(trans_input)?,
))
}
) {
Ok(tuple)
} else
// Try to parse the simple variant
// can use the `Parse::parse()` or `IParse::parse()` method directly since a
// single entity will be put in a transaction by those.
if let Ok(i) = input.parse() {
Ok(MyEnum::Simple(i))
} else {
// Try to parse the last variant
// this can use the `Parser::parser()` method since this is the final alternative
Ok(MyEnum::Another(Ident::parser(input)?))
}
}
}
§Different ways to design Parsers
There are different approaches to how one can implement parsers. Each has its own advantages and disadvantages. unsynn is agnostic; one can freely mix whatever makes most sense in a particular case.
unsynn uses a rather simple, first come, first served approach when parsing. Parsers may be a subset of or share a common prefix with other parsers. This needs some attention.
In the case where parsers are subsets to other parsers and one puts them into a disjunction in
an unsynn! { enum ..} or in an Either combinator, the more specific case must come first,
otherwise it will never match.
// Ident must come first since TokenTree matches any token.
type Example = Either<Ident, TokenTree>;
For the other case, where parsers share longer prefixes (this should rarely happen in practice), it may benefit performance to break these into a type with the shared prefix that dispatches on the distinct parts.
§Exact AST Representation
One approach is to define a structure that reflects the AST of the grammar exactly. This is
what the unsynn!{} macro and composition does. The program later works with the parsed
structure directly. The advantage is that Parser and ToTokens are simple and come for
free and that the source structure of the AST stays available.
unsynn!{
// define a list of Ident = "LiteralString",.. assignments
struct Assignment {
id: Ident,
_equal: Assign,
value: LiteralString,
}
struct AssignmentList {
list: DelimitedVec<Assignment, Comma>
}
}
When the implementation generated by the unsynn!{} macro is not sufficient one can
implement Parser and ToTokens for custom structs and enums manually.
§High level representation
Another approach is to represent the data more in the way further processing requires. This
simplifies working with the data but one has to implement the Parser and ToTokens
traits manually. Often the Parse::parse_with() method will become useful in such
cases. The transform module provides parsers that can transform the input already when parsing.
// We could go with `unsynn!{struct Assignment{...}}` as above. But let's use composition
// as an example here. This stays internal so its complexity isn't exposed.
type Assignment = Cons<Ident, Assign, LiteralString>;
// Here we'll parse the list of assignments into a structure that represents the
// data in a way that's easier to use from a Rust program
#[derive(Default)]
struct AssignmentList {
// each 'Ident = LiteralString'
list: Vec<(Ident, String)>,
// We want to have a fast lookup to the entries
lookup: HashMap<Ident, usize>,
}
impl Parser for AssignmentList {
fn parser(input: &mut TokenIter) -> Result<Self> {
let mut assignment_list = AssignmentList::default();
// We construct the `AssignmentList` by parsing the content, appending and processing it.
while let Ok(assignment) = Delimited::<Assignment, Comma>::parse(input) {
assignment_list.list.push((
assignment.value.first.clone(),
// Create a String without the enclosing double quotes
assignment.value.third.as_str().to_string()
));
// add it to the lookup
assignment_list.lookup.insert(
assignment.value.first.clone(),
assignment_list.list.len()-1
);
// No Comma, no more assignments
if assignment.delimiter.is_none() {
break;
}
}
Ok(assignment_list)
}
}
impl ToTokens for AssignmentList {
fn to_tokens(&self, output: &mut TokenStream) {
for a in &self.list {
a.0.to_tokens(output);
Assign::new().to_tokens(output);
LiteralString::from_str(&a.1).to_tokens(output);
Comma::new().to_tokens(output);
}
}
}
§Generic Types with Trait Bounds
The unsynn!{} macro supports comprehensive generic type syntax with lifetime parameters
and trait bounds. You can use:
- Lifetime parameters: <'a>, <'a, 'b>, or mixed <'a, T>
- Simple trait names: T: Clone
- Qualified paths: T: std::fmt::Debug (no imports needed)
- Lifetime bounds: T: 'static
- Multiple bounds: T: Clone + std::fmt::Debug + 'static
- Where clauses: where T: Clone + 'static
§Examples
unsynn! {
// Simple trait bounds
struct SimpleGeneric<T: Clone>(T);
// Qualified paths (no import needed!)
struct WithDebug<T: std::fmt::Debug>(T);
// Multiple bounds with lifetime bounds
struct Complex<T: Clone + std::fmt::Display + 'static>(T);
// Where clauses work too
struct WhereClause<T>(T) where T: Clone + std::fmt::Debug;
}
Lifetime parameters are also supported:
unsynn! {
// Single lifetime parameter
struct WithLifetime<'a> {
name: LiteralString,
_marker: std::marker::PhantomData<&'a ()>,
}
// Mixed lifetime and type parameters
struct MixedParams<'a, T: Clone> {
data: T,
_marker: std::marker::PhantomData<&'a ()>,
}
}
Limitations:
- Generic type arguments in bounds (e.g., T: Trait<U>) are not supported
- HRTB (Higher-Ranked Trait Bounds like for<'a>) are not yet supported
- Lifetime parameters on tuple structs and enums with inline impl blocks are not supported (use separate impl blocks outside the macro instead)
For implementation details, see the unsynn! macro documentation.
§Parse Predicates
Parse predicates provide compile-time control over parser behavior using zero-sized types
(ZSTs). They enable context-sensitive parsing by acting as compile-time guards on generic
types. The predicateflag keyword creates newtype wrappers that can implement custom traits
for type-safe context validation.
See the predicates module for the full API reference and the
unsynn! macro Parse Predicates section for syntax details.
§Type-Safe Context Predicates
The predicatetrait macro creates custom context marker traits for compile-time type safety.
It automatically generates trait definitions and implementations for universal predicates
(Enable, Disable, TokensRemain) and logical operators, eliminating boilerplate while
maintaining type safety.
unsynn! {
// Define context traits - automatically implements for universal predicates
predicatetrait ExpressionContext;
predicatetrait StatementContext;
// Create context-specific predicates
predicateflag InExpression for ExpressionContext;
predicateflag InStatement for StatementContext;
predicateflag InBothContexts for ExpressionContext, StatementContext;
keyword KwIf = "if";
keyword KwWhile = "while";
// Type parameters use custom context traits
pub struct IfExpr<P: ExpressionContext = InExpression> {
_guard: P,
kw_if: KwIf,
condition: ParenthesisGroup,
body: BraceGroup,
}
pub struct WhileStmt<P: StatementContext = InStatement> {
_guard: P,
kw_while: KwWhile,
condition: ParenthesisGroup,
body: BraceGroup,
}
// Conditional field using logical operators
pub struct ConditionalField<P: ExpressionContext = InExpression> {
_guard: P,
name: Ident,
}
}
// Context-specific predicates work as expected
let mut tokens = "if (x > 0) { return true; }".to_token_iter();
let result = IfExpr::<InExpression>::parse(&mut tokens);
assert!(result.is_ok());
// Universal predicates (Enable/Disable/TokensRemain) work in ANY context
let mut tokens = "if (true) { }".to_token_iter();
let result = IfExpr::<Enable>::parse(&mut tokens); // Enable implements ExpressionContext
assert!(result.is_ok());
// Logic operators automatically implement context traits when operands do
let mut tokens = "if (x) { y }".to_token_iter();
let result = IfExpr::<AllOf<InExpression, InBothContexts>>::parse(&mut tokens);
assert!(result.is_ok()); // AllOf implements ExpressionContext because both operands do
// Not inverts the logic while preserving trait implementations
let mut tokens = "if (x) { y }".to_token_iter();
let result = IfExpr::<Not<Disable>>::parse(&mut tokens);
assert!(result.is_ok()); // Not<Disable> implements ExpressionContext and succeeds
// OneOf: exactly one predicate must succeed
type OnlyInExpression = OneOf<InExpression, Disable>;
let mut tokens = "value".to_token_iter();
let result = ConditionalField::<OnlyInExpression>::parse(&mut tokens);
assert!(result.is_ok()); // Succeeds: exactly one (InExpression) succeeds
Base predicates: Enable (always succeeds), Disable (always fails), TokensRemain (succeeds only when tokens remain).
Logical operators: AllOf (AND), AnyOf (OR), OneOf (XOR), Not (negation) - each accepts 2-4 operands.
Each predicateflag creates a newtype tuple struct that automatically implements Parser,
ToTokens, Clone, Debug, PredicateOp, and Default (when the base is Enable or
TokensRemain), plus any custom traits specified after impl.
§Type Identity Checking with PredicateCmp
The PredicateCmp<A, B, Same = Enable, Different = Disable> predicate enables distinguishing between different
predicate flags at compile time. It behaves like Same when the types A and B are identical, and
like Different when they differ.
This is useful when you need to accept a specific predicate flag while rejecting others that implement the same trait:
unsynn! {
predicatetrait Context;
predicateflag InExpr = Enable for Context;
predicateflag InStmt = Enable for Context;
predicateflag InType = Enable for Context;
// Only accepts InExpr specifically
struct StructLiteral<C: Context> {
guard: PredicateCmp<C, InExpr>,
name: Ident,
fields: BraceGroup,
}
// Accepts InExpr but NOT InStmt
struct TernaryExpr<C: Context> {
guard: AllOf<PredicateCmp<C, InExpr>, Not<PredicateCmp<C, InStmt>>>,
condition: Ident,
if_true: Ident,
if_false: Ident,
}
// Accepts either InExpr or InType, but not InStmt
struct GenericPath<C: Context> {
guard: AnyOf<PredicateCmp<C, InExpr>, PredicateCmp<C, InType>>,
segments: PathSepDelimitedVec<Ident>,
}
}
// ✅ StructLiteral accepts InExpr
let mut tokens = "Point { x: 1, y: 2 }".to_token_iter();
assert!(StructLiteral::<InExpr>::parse(&mut tokens).is_ok());
// ❌ StructLiteral rejects InStmt (even though InStmt implements Context)
let mut tokens = "Point { x: 1, y: 2 }".to_token_iter();
assert!(StructLiteral::<InStmt>::parse(&mut tokens).is_err());
// ✅ TernaryExpr accepts InExpr and explicitly rejects InStmt
let mut tokens = "cond val1 val2".to_token_iter();
assert!(TernaryExpr::<InExpr>::parse(&mut tokens).is_ok());
// ❌ TernaryExpr rejects InStmt due to Not<PredicateCmp<C, InStmt>>
let mut tokens = "cond val1 val2".to_token_iter();
assert!(TernaryExpr::<InStmt>::parse(&mut tokens).is_err());
// ✅ GenericPath accepts InExpr
let mut tokens = "std::vec::Vec".to_token_iter();
assert!(GenericPath::<InExpr>::parse(&mut tokens).is_ok());
// ✅ GenericPath accepts InType
let mut tokens = "std::vec::Vec".to_token_iter();
assert!(GenericPath::<InType>::parse(&mut tokens).is_ok());
// ❌ GenericPath rejects InStmt
let mut tokens = "std::vec::Vec".to_token_iter();
assert!(GenericPath::<InStmt>::parse(&mut tokens).is_err());
Requirements:
- Type parameters must implement PredicateOp (which includes a 'static bound)
- Zero runtime overhead (usually optimized to a compile-time constant)
Combining with logical operators:
// Accept A or B, but not C
type AcceptAOrBNotC<T> = AllOf<
AnyOf<PredicateCmp<T, A>, PredicateCmp<T, B>>,
Not<PredicateCmp<T, C>>
>;
// Accept exactly A (not B, not C)
type OnlyA<T> = AllOf<
PredicateCmp<T, A>,
Not<AnyOf<PredicateCmp<T, B>, PredicateCmp<T, C>>>
>;
See PredicateCmp for more details.
§Expression Parsing
Unsynn provides expression building blocks for creating operator precedence parsers. These types handle the common patterns found in expression grammars:
- expressions::PrefixExpr - Unary prefix operators (e.g., -x, !flag, *ptr)
- expressions::PostfixExpr - Unary postfix operators (e.g., x?, x!)
- expressions::InfixExpr (or expressions::NonAssocExpr) - Non-associative binary operators (e.g., a == b)
- expressions::LeftAssocExpr - Left-associative operators (e.g., a + b + c)
- expressions::RightAssocExpr - Right-associative operators (e.g., a = b = c)
§Basic Usage
// Define precedence levels (highest to lowest binding)
type MultiplicativeExpr = LeftAssocExpr<LiteralInteger, Star>;
type AdditiveExpr = LeftAssocExpr<MultiplicativeExpr, Plus>;
// Parse with proper precedence: "1 + 2 * 3" → 1 + (2 * 3)
let mut tokens = "1 + 2 * 3".to_token_iter();
let expr: AdditiveExpr = tokens.parse().unwrap();
assert_eq!(expr.len(), 2); // Two operands at addition level: 1 and (2*3)
§No Nesting Required
All expression types use DelimitedVec internally, so you don’t need to nest them:
// ✅ Correct: Single PrefixExpr handles multiple operators (--x)
type MyPrefixExpr = PrefixExpr<Minus, Ident>;
// ❌ Wrong: Don't nest - it's redundant!
// type MyPrefixExpr = PrefixExpr<Minus, PrefixExpr<Minus, Ident>>;
§Design Patterns for Recursive Expressions
When building expression parsers, you often need to handle recursive structures like
parenthesized expressions (expr) or grouped expressions that can contain the same expression
type they’re part of. This creates a circular type dependency that Rust’s type system doesn’t
allow directly.
These patterns solve the “infinitely sized type” problem by breaking the recursion cycle.
Without indirection, a type like PrimaryExpr::Grouped(Expr) would require Expr to contain
PrimaryExpr, which contains Expr, which contains PrimaryExpr… creating an infinite
chain.
Option 1: Box for Indirection (Recommended)
- Simplest approach
- Use Box<Expression> in grouped/parenthesized variants
- Works naturally with Rust's type system
- Small allocation overhead (usually negligible)
- Example: Grouped(ParenthesisGroupContaining<Box<Expr>>)
Option 2: Generic Parameter (Advanced)
- Make the primary expression generic over the root type: PrimaryExpr<E>
- Pass the expression type to itself: type Expression = AssignmentExpr<Expression>;
- More type-safe but more complex signatures
- Allows different recursion strategies per grammar
Option 3: Traditional Hierarchy
- Explicit wrapper types at each precedence level
- Matches traditional parser generator patterns
- More boilerplate but very clear structure
- Each level wraps the next explicitly
For more details, see the expressions module documentation.
§Complete Example with Parentheses
This example shows a complete expression parser with proper precedence hierarchy and
parenthesized expressions. Parentheses are not part of the expression building blocks -
you handle them in your primary expression type using Box for recursion.
unsynn!{
keyword Calc = "CALC";
operator Pow = "^";
operator Factorial = "!";
operator Mul = "*";
operator Div = "/";
operator Add = "+";
operator Sub = "-";
// Define primary expressions (THIS is where parentheses go)
enum PrimaryExpr {
Literal(LiteralInteger),
Grouped(ParenthesisGroupContaining<Box<Expr>>), // Box breaks recursion
}
// Build precedence hierarchy (highest to lowest binding)
// Power level: Mix infix (^) and postfix (!) at same precedence
// Parse postfix first, then allow x^y between postfix expressions
type MyPostfixExpr = PostfixExpr<PrimaryExpr, Factorial>;
type PowerExpr = RightAssocExpr<MyPostfixExpr, Pow>;
// Multiplication and Division (left-associative)
struct MultiplicativeOp(Either<Mul, Div>);
type MultiplicativeExpr = LeftAssocExpr<PowerExpr, MultiplicativeOp>;
// Addition and Subtraction (left-associative)
struct AdditiveOp(Either<Add, Sub>);
type AdditiveExpr = LeftAssocExpr<MultiplicativeExpr, AdditiveOp>;
// Step 3: Define root expression type
type Expr = AdditiveExpr;
// Top-level statement
struct Expression(Calc, Expr, Semicolon);
}
// Parentheses override precedence
let mut tokens = "CALC (2 + 3) * 4 ;".to_token_iter();
let ast: Expression = tokens.parse().unwrap();
assert_tokens_eq!(ast, "CALC (2 + 3) * 4 ;");
// Standard precedence: 10 - 5 + 2 * 3 = (10 - 5) + (2 * 3)
let mut tokens = "CALC 10 - 5 + 2 * 3 ;".to_token_iter();
let ast: Expression = tokens.parse().unwrap();
assert_tokens_eq!(ast, "CALC 10 - 5 + 2 * 3 ;");
// Power is right-associative: 2^3^2 = 2^(3^2) = 2^9 = 512
let mut tokens = "CALC 2 ^ 3 ^ 2 ;".to_token_iter();
let ast: Expression = tokens.parse().unwrap();
assert_tokens_eq!(ast, "CALC 2 ^ 3 ^ 2 ;");
// Mixing infix and postfix at same level: 2^3! = 2^(3!) = 2^6 = 64
// Postfix binds to its left operand first, then ^ operates on the result
let mut tokens = "CALC 2 ^ 3 ! ;".to_token_iter();
let ast: Expression = tokens.parse().unwrap();
assert_tokens_eq!(ast, "CALC 2 ^ 3 ! ;");
// The parsed structure shows 3! is grouped before ^ operates on it
let debug_str = format!("{:?}", ast);
assert!(debug_str.contains("PostfixExpr { operand: Literal(LiteralInteger { literal: Literal { lit: 3"));
assert!(debug_str.contains("operators: DelimitedVec"));
assert!(debug_str.contains("Operator<'!'>"));§Lazy Parsing and Lookahead
Grammars frequently need to parse tokens into lists or to look ahead to match only when some
conditions are met without consuming tokens. See also the container
module for container implementations and the fundamental module
for lookahead types.
§Greedy Container Types
Unsynn provides several containers that parse elements greedily (as many as possible):
- Vec<T> - Parses T repeatedly until it fails. Always succeeds (empty vec if no matches).
- LazyVec<T, S> - Parses T until terminator S, consuming and storing S in the .terminator field.
- LazyVecUntil<T, S> - Parses T until seeing terminator S (via lookahead), without consuming S.
- DelimitedVec<T, D, P, MIN, MAX> - Parses delimited lists with configurable trailing delimiter policy and length constraints.
All these containers implement the RangedRepeats trait, which allows constraining the
number of parsed elements.
§Length constrained Type Aliases
For convenience, several type aliases provide common repeat patterns:
- Any<T> - Parse 0 or more
- Many<T> - Parse 1 or more
- Exactly<T, N> - Parse exactly N
- AtLeast<T, N> - Parse N or more
- AtMost<T, N> - Parse N or less
§Examples
// Parse any number of attributes
type Attributes = Vec<Attribute>;
// Parse comma-separated parameters (trailing comma optional)
type Params = DelimitedVec<Param, Comma, TrailingDelimiter::Optional>;
// Parse 1 or more statements
type Statements = Many<Statement>;
§Lookahead and Composition
Lookahead types check what’s ahead without consuming tokens:
- Expect<T> - Positive lookahead: succeeds if T matches (doesn't consume)
- Except<T> - Negative lookahead: succeeds if T does NOT match (doesn't consume)
Both return zero-sized types and never advance the token position.
§Composing with Cons
Lookahead is most powerful when combined with Cons for sequential parsing:
Negative lookahead (ensure something is NOT next):
// Parse ButThis only if we're NOT at NotThis
type NegativeExample = Cons<Except<NotThis>, ButThis>;
Positive lookahead (check something follows):
// Parse This, then verify ThenThis follows (without consuming it)
type PositiveExample = Cons<This, Expect<ThenThis>>;
parse_with() transformations:
Another common pattern is to use lookahead together with parse_with() transformations. This is
especially useful in custom Parser implementations. unsynn itself makes frequent use of this.
// Parse an identifier and verify it's followed by '='
let mut tokens = "foo = 42".to_token_iter();
let ident: Ident = <Cons<Ident, Expect<Assign>>>::parse_with(
&mut tokens,
|t, _| Ok(t.first)
).unwrap();
assert_eq!(ident.to_string(), "foo");
assert_tokens_eq!(tokens, "= 42");§Using unsynn! - conjunctive structs
When defining structs in unsynn!{}, fields are conjunctive, so you can write lookahead patterns directly:
Negative lookahead:
unsynn! {
struct NegativeLookahead {
// Ensure we're not at NotThis before parsing This
_check: Except<NotThis>,
pub this: This,
}
}
Positive lookahead:
unsynn! {
struct PositiveLookahead {
pub first: First,
// Verify Second follows (without consuming it)
_check: Expect<Second>,
}
}
§NonEmptyOption<T> - Optional Non-Empty Content
A common pattern is parsing optional content that should NOT match on empty input:
// Parse expression only if tokens remain
type TrailingExpr = NonEmptyOption<Expression>;
NonEmptyOption<T> is like Option<T> but only succeeds when tokens are available to
parse. It prevents matching types that accept empty input (like Vec, Option, etc.).
§Quick Reference
| Pattern | Type | Use Case |
|---|---|---|
| Parse until failure | Vec<T> | Unknown terminator, parse as much as possible |
| Parse until terminator | LazyVec<T, S> | Known terminator, consume it |
| Parse before terminator | LazyVecUntil<T, S> | Known terminator, don’t consume it |
| Delimited list | DelimitedVec<T, D, P, MIN, MAX> | Comma-separated, sized constraints |
| Optional non-empty | NonEmptyOption<T> | Parse T only if tokens remain |
| Positive lookahead | Expect<T> | Check what’s next without consuming |
| Negative lookahead | Except<T> | Ensure what’s next is NOT T |
| Conditional parse | Either<Cons<Expect<A>, X>, Y> | Parse X if A ahead, else Y |
RangedRepeats type aliases: Any<T>, Many<T>, AtLeast<T,N>, AtMost<T,N>, Exactly<T,N>
§See Also
- Container Types - Implementation of lazy containers
- Combinator Types - Cons, Either, and other combinators
- Fundamental Types - Expect, Except, EndOfStream
§Errors
Unsynn parsers return the first error encountered. They do not try error recovery on their own
or try to be smart about what may have caused an error (typos, missing semicolons etc.). The only
exception to this is when parsing disjunct entities (Either or other enums) where errors are
expected to happen on the first branches. When any branch succeeds the error is dropped and
parsing goes on; when all branches fail, the error which made the most progress is
returned. Progress is tracked with the ShadowCountedIter. This is implemented for enums
created with the unsynn! macro as well as for the Either::parser() method. This covers all
normal cases.
When one needs to implement disjunct parsers manually this has to be taken into account.
This is then done by creating an Error with ErrorKind::NoError by
let mut err = Error::no_error() within the Parser::parse implementation. Then any parser
that is called subsequently tries to err.upgrade(Item::parser(..)) which handles storing the
error which made the most progress. Eventually an Ok(...) or the upgraded Err(err) is
returned. For details look at the source of Either::parser.
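A minimal sketch of that pattern, manually written (the exact signature of upgrade() is an assumption here; the source of Either::parser is authoritative):
enum NumberOrIdent {
    Number(LiteralInteger),
    Ident(Ident),
}
impl Parser for NumberOrIdent {
    fn parser(input: &mut TokenIter) -> Result<Self> {
        let mut err = Error::no_error();
        // upgrade() keeps whichever error made the most progress
        if let Ok(n) = err.upgrade(input.transaction(|t| LiteralInteger::parser(t))) {
            return Ok(Self::Number(n));
        }
        if let Ok(i) = err.upgrade(input.transaction(|t| Ident::parser(t))) {
            return Ok(Self::Ident(i));
        }
        Err(err)
    }
}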
Errors carry the failed token, the type name that was expected (possibly refined) and an iterator past the location where the error happened. This can be used for further inspection.
Some parser types in unsynn are ZSTs; this means they don't carry the token they parsed and
consequently have no Span, thus the location of an error will be unavailable for them.
If that turns out to be a problem this might be revised in future unsynn versions.
In other cases where spans are wrong this is considered a happy accident; please file a bug or send a PR fixing the issue.
§Error Recovery
Trying to recover from syntax errors is usually not required/advised for compiled languages. When there is a syntax error, report it to the user and let them fix the source code.
Recoverable parsing would need knowledge of the grammar (or even context) being parsed. This cannot be supported by unsynn itself as it does not define any grammars. When one needs recoverable parsers, this has to be implemented in the grammar definition. Future versions of unsynn may provide some tools to assist with this. The actual approach is still under discussion.
§Writing Tests
When writing tests one often has to compare a parsed entity against some expected source code.
The tokens_to_string() method provides a reliable way to get a canonical (not pretty printed)
string representation. This is used by the assert_tokens_eq! macro (requires proc_macro2
feature for the string parsing):
// default construct some AST
let expression = <Cons<ConstInteger<1>, Plus, ConstInteger<2>>>::default();
// whitespace doesn't matter here, comments are stripped
assert_tokens_eq!(expression, " 1 + 2 // comments are stripped");
Note: tokens_to_string() works with both proc_macro and proc_macro2, but
assert_tokens_eq! requires the proc_macro2 feature because it needs to parse the
expected string into tokens for comparison.
§Grammar Debugging
When developing complex parsers, it can be helpful to see what tokens are being parsed at
specific points in your grammar. The debug module provides StderrLog<T, N>
for this purpose. T is an informal type whose name will be printed, N is an integer limit
for how many tokens are printed (default: 5). Debug output is only generated when the
debug_grammar feature is enabled; otherwise this type's parser is a no-op.
§Usage
Wrap any parser type with StderrLog to print debug information to stderr:
use unsynn::*;
// Enable with: cargo build --features debug_grammar
let mut tokens = "fn foo(bar: i32) -> Result<(), Error> { }".to_token_iter();
// With debug_grammar enabled, this prints to stderr:
// Type: proc_macro2::Ident
// Source: fn foo(bar : i32) -> Result …
//
// Without debug_grammar, StderrLog is a complete no-op (doesn't parse or consume tokens)
let _ident: StderrLog<Ident, 10> = tokens.parse().unwrap();
§Debug Output Recursion
StderrLog automatically recurses into groups (parentheses, brackets, braces) and prints
tokens from within them, respecting the N limit at each level.
§Best Practices
- Add
StderrLogwrappers at key points in your grammar during development - Use different
Nvalues to see more or less context - Leave them in place - without the feature they’re completely erased (zero-cost)
- Enable the feature only during debugging:
cargo test --features debug_grammar
See the debug module documentation for more details.
§Generating and Transforming Code
After parsing one usually wants to emit modified or generated code. This is done with the
ToTokens trait. For convenience we also provide a quote!{} macro which allows one to
template code in place. Unlike its big cousin (the quote crate) ours is rather simple and has
fewer features (no #(...) repeats, but we have #{...} blocks). Eventually this may be
extended.
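A small sketch of the # interpolation (any value implementing ToTokens can be interpolated; the #{...} block form is documented on the quote! macro itself):
let name: Ident = "hello".to_token_iter().parse().unwrap();
let generated = quote! {
    fn #name() -> bool { true }
};
assert_tokens_eq!(generated, "fn hello() -> bool { true }");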
§Implementation/Performance Notes
Unsynn is (as of now) implemented as a recursive descent PEG with backtracking. This has worst-case exponential complexity; there is a plan to fix that in future releases. Currently, to avoid these cases, it is recommended to formulate disjunctive parsers so that they fail early and don't share long prefixes.
For example things like
Either<Cons<LongValidCode, OneThing>, Cons<LongValidCode, OtherThing>> should be
rewritten as Cons<LongValidCode, Either<OneThing, OtherThing>>.
§Stability Guarantees
Unsynn is in development; nevertheless we give some stability promises. These will only be broken when we discover technical reasons that make it infeasible to keep them. Eventually, things in the unstable category below will move up to stable.
§Stable
- Operator name definitions in operator::names
These follow common (Rust) naming; if not, this will be fixed. Otherwise you can rely on these being stable.
- Functionality
Existing types and parsers are there to stay. A few things, especially when freshly added, may be shaken out and refined/extended, but existing functionality should be preserved. In some cases changes may require minor adjustments in code using it. The unsynn! macro itself feels pretty good now; few new features may be added and minor things fixed (internals, trait bounds etc.). The overall syntax probably stays.
§Unstable
- Modules
The module organization is still in flux and may be refactored at any time. This shouldn't matter because unsynn reexports everything at the crate root. The only exception here is operator::names, which we try to keep stable.
- Internal representation
Don't rely on the internal representations; there are some plans and ideas to change these. Some types that are currently ZSTs may become stateful, collections may become wrapped as Rc<Vec<T>> etc.
- Traits
Currently there are the main traits Parse, Parser, IParse and ToTokens. In the future these may be refactored and split into smaller traits and some more may be added. This will then require some changes for the user. The existing functionality will be preserved nevertheless.
- Trait bounds
Trait bounds are still in flux. Expect that types eventually should be at least Clone + Hash + PartialEq; some types will require Default. Possibly some helper trait needs to be implemented too. Details will be worked out over time.
- TokenIter
Will be (again) completely rewritten in 0.4 for performance. No idea how well we can keep backward compatibility.
- Parser Predicates
They work, but so far I am not totally happy with them. The decision whether they stay as they are is postponed.
- Debug module/StderrLog
It's a hack and will certainly see some makeover, but maybe the StderrLog just stays as is.
- Lifetimes
Where and when we can support lifetimes in bounds (reverted back to 'static) needs to be shaken out. Probably it stays with 'static.
§AI Agent Reference Guide
For AI coding assistants working with unsynn, see ‘HOW-TO-USE-UNSYNN-BY-AI.md’ included in the distribution - a comprehensive, compressed reference designed for AI agents. This standalone guide can be placed directly into projects using unsynn.
Note for human readers: The AI guide is optimized for machine parsing and may be dense. This cookbook and the official documentation are better suited for human learning.
§Roadmap
With v0.1.0 we follow Rust's practice of semantic versioning; there will be breaking changes on 0.x releases but we try to keep these to a minimum. The planned ‘unsynn-rust’ and, along with that, an ‘unsynn-derive’ will be implemented. When these are ready and no major deficiencies in ‘unsynn’ are found, it is time for a 1.0.0 release.
§Planned/Ideas
- Can we add prettyprint for tokens_to_string? Should be done in an external crate.
- TODO: which types can implement Default? … keywords, bool, character; can we reverse a string from a char?
- Improve error handling
  - document how errors are reported and what the user can do to handle them
  - users can/should write forgiving grammars that are error tolerant
  - add tests for error/span handling
  - future versions may improve the Span handling considerably, probably behind an extra feature flag. We aim for ergonomic/automagical correct spans; the user shouldn't be burdened by making things correct. Details for this need to be laid out. Maybe a SpanOf<T: Parse>.
  - can we have some Explain<T> that explains what was expected and why it failed, to simplify complex errors?
- Transformer/feature case_convert https://crates.io/crates/heck
- Brainfart: Dynamic parser construction
  Instead of parse::<UnsynnType>(), create a parse function dynamically from a str parsed by unsynn itself: "Either<This, That>".to_parser().parse(). This will need a trait DynUnsynn implementing the common/dynamic parts of these and a registry where all entities supporting dynamic construction are registered. This will likely be factored out into a unsynn-dyn crate. Add some scanf-like DSL to generate these parsers; xmacro may use it like $(foo@Ident: values).
- Brainfart: Memoization
  - not before v0.3, maybe much later; this may be a good opportunity to sponsor unsynn development
  - in TokenIter: Rc< enum { Countdown(Cell(usize)), Memo(HashMap< (counter, typeid, Option<NonZeroU64 hash_over_extra_parameters>), (Result, TokenIter_after) >), }>
    A countdown counter activates memoization only after a certain number of tokens were parsed; parsing small things does not need the overhead of memoizing. Can we somehow (auto) trait which types become memoized? Small things don't need to be memoized.
  - Note to my future self: Result needs to be dyn Memo where trait Memo: Clone and clone is cheap: enum MaybeRc<T>{Direkt(T), Shared(Rc<T>)}
- Add rust types
  - f32: 32-bit floating point number
  - f64: 64-bit floating point number (default)
- Benchmarking
  - optimize the unsynn! macro (and keyword!, operator!)
    This can be done by refactoring the @aspect clauses into dedicated (#[doc(hidden)] unsynn_aspect!) macros and reordering some clauses. But to start this work we need a solid benchmark for the macros first.
  - benchmark unsynn-rust grammars
- Brainfart: a build-macro crate (benchmark!)
  - library that expands *_unsynn.rs to *_parser.rs from build.rs
- Add a Performance guide to the COOKBOOK: which opt-level/LTO for which use case
§Design Priorities
Unsynn's foremost goal is to make parsers easy and ergonomic to use. We deliberately provide some duplicated functionality and type aliases to prioritize expressiveness. Fast compile times with as few dependencies as necessary come second. We do not focus explicitly on Rust syntax; this will be addressed by other crates.
§Development
unsynn is meant to evolve opportunistically. When you spot a problem or need a new feature feel free to open an issue or (preferred!) send a PR.
Commits and other git operations are augmented and validated with
bar. For
contributors it is recommended to enable bar too by calling ./bar activate
within a checked out unsynn repository.
§Support/Feature Requests/Sponsoring
When you need professional support or want some feature implemented feel free to contact ct.unsynn@pipapo.org, I'm happy to help out, even more when it brings some food on the table.
§Contribution/Coding Guidelines
Chances to get contributions merged increase when you:
- Include documentation following the existing documentation practice. Write examples/doctests.
- Passing ./bar lints is an absolute requirement; I am not even looking at contributions that fail basic lint and formatting checks. Hint: try ./bar dwim to apply formatting and trivial fixes.
- Ideally passing ./bar without errors or warnings. Although if some problems and errors remain to be discussed, a WIP-PR failing tests is temporarily acceptable.
- Passing test-coverage with ./bar cargo_mutants.
- When you activated the githooks with ‘./bar activate’ then the policies for viable commits are enforced automatically.
- Implement reasonably complete things. Not everything needs to be included in a first version, but it must be usable.
§Git Branches
- main
Will be updated on new releases. When you plan to make a small contribution that should be merged soon then you can work on top of main. Will have linear history.
- release-*
Stable releases may get their own branch for fixes and backported features. Will have linear history.
- devel
Development branch which will eventually be merged into main. Non-trivial contributions that may take some time to develop should use devel as a starting point. But be prepared to rebase frequently on top of the ongoing devel. May itself become rebased on fixes and features.
- fix-*
Non-trivial bugfixes are prepared in fix-* branches.
- feature-*
More complex features and experiments are developed in feature branches. Any non-trivial contribution should be done in a feature-* branch as well. Once complete they become merged into devel. Some of these experiments may stall or be abandoned; do not base your contribution on an existing feature branch.
For a history how unsynn evolved, check the CHANGELOG.
Modules§
- CHANGELOG
- Changelog
- Trailing
Delimiter - Policy for the delimiter of the last element in a sequence. Note that delimiters are
after some element, for cases where you have leading delimiters you need to define
grammars that start with
Delimiter or Option<Delimiter>.
- combinator
- A unique feature of unsynn is that one can define a parser as a composition of other
parsers on the fly without the need to define custom structures. This is done by using the
Cons and Either types. The Cons type is used to define a parser that is a conjunction of two to four other parsers, while the Either type is used to define a parser that is a disjunction of two to four other parsers.
- container
- This module provides parsers for types that contain possibly multiple values. This
includes stdlib types like
Option, Vec, Box, Rc, RefCell and types for delimited and repeated values with numbered repeats.
- debug
- Debug utilities for token stream inspection
- delimited
- For easier composition we define the
Delimited type here, which is a T followed by an optional delimiting entity D. This is used by the DelimitedVec type to parse a list of entities separated by a delimiter.
- dynamic
- This module contains the types for dynamic transformations after parsing.
- expressions
- Expression parser building blocks for creating operator precedence parsers.
- fundamental
- This module contains the fundamental parsers. These are the basic tokens from
proc_macro2/proc_macro and a few other ones defined by unsynn. These are the terminal entities when parsing tokens. Being able to parse TokenTree and TokenStream allows one to parse opaque entities where internal details are left out. The Cached type is used to cache the string representation of the parsed entity. The Nothing type is used to match without consuming any tokens. The Except type is used to match when the next token does not match the given type. The EndOfStream type is used to match the end of the stream when no tokens are left. The HiddenState type is used to hold additional information that is not part of the parsed syntax.
- group
- Groups are a way to group tokens together. They are used to represent the contents between
(), {}, [] or no delimiters at all. This module provides parser implementations for opaque group types with defined delimiters and the GroupContaining types that parse the surrounding delimiters and content of a group type.
- literal
- This module provides a set of literal types that can be used to parse and tokenize
literals. The literals are parsed from the token stream and can be used to represent the
parsed value. unsynn defines only simplified literals, such as integers, characters and
strings. The literals here are not full rust syntax, which will be defined in the
unsynn-rust crate. There are Literal* for Integer, Character, String to parse simple literals and ConstInteger<V> and ConstCharacter<V> which must match an exact value. The latter two also implement Default, thus they can be used to create constant tokens. There is no ConstString; constant literal strings can be constructed with IntoLiteralString<T>.
- names
- Unsynn does not implement rust grammar, for common Operators we make an exception because
they are mostly universal and already partially lexed (Spacing::Alone/Joint); it would add
a lot of confusion if every user had to redefine common operator types. These operator names have their own module and are reexported at the crate root. This allows one to import only the named operators.
- operator
- Combined punctuation tokens are represented by
Operator. The crate::operator! macro can be used to define custom operators.
- predicates
- Parse predicates for compile-time parser control.
- punct
- This module contains types for punctuation tokens. These are used to represent single and
multi character punctuation tokens. For single character punctuation tokens, there are
PunctAny, PunctAlone and PunctJoint types.
- rust_
types - Parsers for Rust's types.
- transform
- This module contains the transforming parsers. These are the parsers that add, remove, replace or reorder tokens while parsing.
Macros§
- assert_
tokens_ eq - Helper macro that asserts that two entities implementing
ToTokens result in the same TokenStream. Used in tests to ensure that the output of parsing is as expected. This macro allows two forms:
- format_
cached_ ident - Generates a
CachedIdent from a format specification.
- format_
ident - Generates a
Ident from a format specification.
- format_
literal - Generates a
Literal from a format specification. Unlike format_literal_string!, this does not add quotes and can be used to create any kind of literal, such as integers or floats.
- format_
literal_ string - Generates a
LiteralString from a format specification. Quote characters around the string are automatically added.
- keyword
- Define types matching keywords.
- operator
- Define types matching operators (punctuation sequences).
- quote
- unsynn provides its own
quote!{} macro that translates tokens into a TokenStream while interpolating variables prefixed with a Pound sign (#). This is similar to what the quote macro from the quote crate does but not as powerful. There is no #(...) repetition (yet).
- unsynn
- This macro supports the definition of enums, tuple structs and normal structs and
generates
Parser and ToTokens implementations for them. It will derive Debug. Generics/Lifetimes are not supported on the primary type. Note: eventually a derive macro for Parser and ToTokens will become supported by an ‘unsynn-derive’ crate to give finer control over the expansion. #[derive(Copy, Clone)] has to be added manually. Keyword and operator definitions can also be included; they delegate to the keyword! and operator! macros described below. All entities can be prefixed by pub to make them public. Type aliases, function definitions, macros and use statements are passed through. This makes things easier to read when you define larger unsynn macro blocks.
Structs§
- AllOf
- Logical AND: 2-4 predicates must all succeed.
- AnyOf
- Logical OR: 2-4 predicates, at least one must succeed.
- Brace
Group - An opaque group of tokens within a Brace
- Brace
Group Containing - Parseable content within a Brace
- Bracket
Group - An opaque group of tokens within a Bracket
- Bracket
Group Containing - Parseable content within a Bracket
- Cached
- Getting the underlying string is expensive as it always allocates a new
String. This type caches the string representation of a given entity. Note that this is only reliable for fundamental entities that represent a single token. Spacing between composed tokens is not stable and should be considered informal only.
- Cons
- Conjunctive
A followed by B and optional C and D. When C and D are not used, they are set to Nothing.
- Const
Character - A constant
char of value V. Must match V and also has Default implemented to create a LiteralCharacter with value V.
- Const
Integer - A constant
u128 integer of value V. Must match V and also has Default implemented to create a LiteralInteger with value V.
- Delimited
- This is used when one wants to parse a list of entities separated by delimiters. The
delimiter is optional and can be
None, e.g. when the entity is the last in the list. Usually the delimiter will be some simple punctuation token, but it is not limited to that.
- Delimited
Vec - Since the delimiter in
Delimited<T,D> is optional, a Vec<Delimited<T,D>> would parse consecutive values even without delimiters. DelimitedVec<T, D, MIN, MAX, P> will stop parsing by MIN/MAX number of elements and depending on the policy defined by P, which can be one of TrailingDelimiter.
- Disable
- Always fails without consuming tokens.
- Discard
- Succeeds when the next token matches
T. The token will be removed from the stream but not stored. Consequently the ToTokens implementation will panic with a message that it can not be emitted. This can only be used when a token should be present but not stored and never emitted.
- DynNode
- Parses a
T(default:Nothing). Allows one to replace it at runtime, after parsing with anything else implementingToTokens. This is backed by aRc. One can replace any cloned occurrences or only the current one. - Enable
- Always succeeds without consuming tokens.
- EndOf
Stream - Matches the end of the stream when no tokens are left.
- Error
- Error type for parsing.
- Except - Succeeds when the next token does not match T. Will not consume any tokens. Usually this has to be followed by a conjunctive match such as Cons<Except<T>, U> or by another entry in a struct or tuple.
- Expect - Succeeds when the next token would match T. Will not consume any tokens. This is similar to peeking.
- Group - A delimited token stream.
- GroupContaining - Any kind of Group G with parseable content C. The content C must parse exhaustively; an EndOfStream is automatically implied.
- HiddenState - Sometimes one wants to compose types or create structures for unsynn that have members that are not part of the parsed syntax but add some additional information. This struct can be used to hold such members while still using the Parser and ToTokens trait implementations automatically generated by the unsynn!{} macro or composition syntax. HiddenState will not consume any tokens when parsing and will not emit any tokens when generating a TokenStream. On parsing it is initialized with a default value. It has Deref and DerefMut implemented to access the inner value.
- Ident - A word of Rust code, which may be a keyword or legal variable name.
- Insert - Injects tokens without parsing anything.
- IntoIdent - Parses T and concatenates all its elements to a single identifier by removing all characters that are not valid in identifiers. When T implements Default, such as single string (non group) keywords, operators and Const* literals, it can be used to create an IntoIdent on the fly. Note that construction may still fail when one tries to create an invalid identifier, such as one starting with digits.
- IntoLiteralString - Parses T and creates a LiteralString from it. When T implements Default, such as single string (non group) keywords, operators and Const* literals, it can be used to create an IntoLiteralString on the fly.
- IntoTokenStream - Parses T and keeps it as an opaque TokenStream. This is useful when one wants to parse a sequence of tokens and keep it as an opaque unit or re-parse it later as something else.
- Invalid - A unit that always fails to match. This is useful as a default for generics. See how Either<A, B, C, D> uses this for unused alternatives.
- LazyVec - A Vec<T> that is filled up to the first appearance of a terminating S. This S may be a subset of T, thus parsing becomes lazy. This is the same as Cons<Vec<Cons<Except<S>,T>>,S> but more convenient and efficient. A small sketch follows this list.
- LazyVecUntil - A Vec<T> that is filled up to the first appearance of a terminating S. This S may be a subset of T, thus parsing becomes lazy. Unlike LazyVec this variant does not consume the final terminator. This is the same as Vec<Cons<Except<S>,T>> but more convenient.
- LeftAssocExpr - Left-associative infix operator expression.
- Literal - A literal string ("hello"), byte string (b"hello"), character ('a'), byte character (b'a'), or an integer or floating point number with or without a suffix (1, 1u8, 2.3, 2.3f32).
- LiteralCharacter - A single quoted character literal ('x').
- LiteralInteger - A simple unsigned 128 bit integer. This is the most simple form to parse integers. Note that only decimal integers without any other characters, signs or suffixes are supported; this is not full rust syntax.
- LiteralString - A double quoted string literal ("hello"). The quotes are included in the value. Note that this is a simplified string literal; only double quoted strings are supported, this is not full rust syntax, e.g. byte and C string literals are not supported.
- NonEmptyOption - NonEmptyOption<T> prevents Option from matching when T can succeed with empty input. It ensures None is returned when no tokens remain, regardless of whether T could succeed on an empty stream. This is crucial when parsing optional trailing content that should only match if tokens are actually available to consume.
- NonEmptyTokenStream - Since parsing a TokenStream succeeds even when no tokens are left, this type is used to parse a TokenStream that is not empty.
- NonParseable - A unit that can not be parsed. This is useful as a diagnostic placeholder for parsers that are (yet) unimplemented. The nonparseable feature flag controls whether Parser and ToTokens will be implemented for it. This is useful in release builds that should not have any NonParseable left behind.
- NoneGroup - An opaque group of tokens within a None
- NoneGroupContaining - Parseable content within a None
- Not - Logical NOT: succeeds if the inner predicate fails.
- Nothing - A unit that always matches without consuming any tokens. This is required when one wants to parse Repeats without a delimiter. Note that using Nothing as the primary entity in a Vec, LazyVec, DelimitedVec or Repeats will result in an infinite loop.
- OneOf - Logical XOR: exactly one of 2-4 predicates must succeed.
- Operator - Operators made from up to four ASCII punctuation characters. Unused characters default to \0. Custom operators can be defined with the crate::operator! macro. All but the last character are Spacing::Joint. Attention must be paid when operators share the same prefix; the shorter ones need to be tried first.
- ParenthesisGroup - An opaque group of tokens within a Parenthesis
- ParenthesisGroupContaining - Parseable content within a Parenthesis
- PostfixExpr - Postfix unary operator expression.
- PredicateCmp - Predicate that compares type A with type B at runtime.
- PrefixExpr - Prefix unary operator expression.
- Punct - A Punct is a single punctuation character like +, - or #.
- PunctAlone - A single character punctuation token which is not followed by another punctuation character.
- PunctAny - A single character punctuation token with any kind of Spacing.
- PunctJoint - A single character punctuation token where the lexer joined it with the next Punct, or a single quote followed by an identifier (rust lifetime).
- RightAssocExpr - Right-associative infix operator expression.
- Skip - Skips over expected tokens. Will parse and consume the tokens but not store them. Consequently the ToTokens implementation will not output any tokens.
- Span - unsynn reexports the entities from proc_macro2 it implements Parse and ToTokens for. A region of source code, along with macro expansion information.
- StderrLog - A debug parser that prints the typename of T and the next N tokens to stderr.
- Swap - Swaps the order of two entities.
- TokenIter - Iterator type for parsing token streams.
- TokenStream - An abstract stream of tokens, or more concretely a sequence of token trees.
- TokensRemain - Succeeds only when tokens remain in the stream.
Enums§
- Delimiter - Describes how a sequence of token trees is delimited.
- Either - Disjunctive A or B or optional C or D, tried in that order. When C and D are not used, they are set to Invalid. A small sketch follows this list.
- ErrorKind - Actual kind of an error.
- Spacing - Whether a Punct is followed immediately by another Punct or followed by another token or whitespace.
- TokenTree - A single token or a delimited sequence of token trees (e.g. [1, (), ..]).
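For illustration, a minimal sketch of disjunctive parsing with Either, as described above. This is a hypothetical example; the unused C and D alternatives default to Invalid, and use unsynn::*; is assumed to be in scope:
use unsynn::*;
// the same Either type accepts either an identifier or an integer literal
let ident: Either<Ident, LiteralInteger> = "foo".to_token_iter().parse().unwrap();
let number: Either<Ident, LiteralInteger> = "42".to_token_iter().parse().unwrap();
// both parses succeed and round-trip back to their original tokens
assert_tokens_eq!(ident, "foo");
assert_tokens_eq!(number, "42");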
Traits§
- DynamicTokens - Trait alias for any type that can be used in dynamic ToTokens contexts.
- GroupDelimiter - Access to the surrounding Delimiter of a GroupContaining and its variants.
- IParse - Extension trait for TokenIter that calls Parse::parse().
- IntoTokenIter - Extension trait to convert iterators into TokenIter.
- Parse - This trait provides the user facing API to parse grammatical entities. It is implemented for anything that implements the Parser trait. The methods here encapsulate the iterator that is used for parsing into a transaction. This iterator is always Clone. Instead of using a peekable iterator or implementing deeper peeking, parse clones this iterator to make access transactional: when parsing succeeds the transaction is committed, otherwise it is rolled back. A small sketch of this behavior follows this list.
- Parser - The Parser trait that must be implemented by anything we want to parse. We are parsing over a TokenIter (TokenStream iterator).
- PredicateOp - Marker trait for compile-time parser predicates.
- RangedRepeats - A trait for parsing a repeating T with a minimum and maximum limit. Sometimes the number of elements to be parsed is determined at runtime, e.g. a number of header items needs a matching number of values.
- RefineErr - Helper trait for refining error type names. Every parser type in unsynn eventually tries to parse one of the fundamental types. When parsing fails, that fundamental type name is recorded as the expected type name in the error. Often this is not desired; a user wants to know the type of parser that actually failed. Since, for simplicity and performance reasons, we don’t keep a stack/vec of errors, we provide a way to register refined type names in errors. Note that this refinement should only be applied to leaves in the AST. Refining errors on composed types will lead to unexpected results.
- ToTokenIter - Extension trait to convert TokenStreams into TokenIter.
- ToTokens - unsynn defines its own ToTokens trait to be able to implement it for std container types. This is similar to the ToTokens from the quote crate but adds some extra methods and is implemented for more types. Moreover, the to_token_iter() method is the main entry point for creating an iterator that can be used for parsing.
- TokenCount - We track the position of the error by counting tokens. This trait is implemented for references to shadow counted TokenIter, and for usize. The latter allows passing in a position directly or using usize::MAX in case no position data is available (which will make this error be the final one when upgrading).
- Transaction - Helper trait to make TokenIter transactional.
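For illustration, a minimal sketch of the transactional behavior described for Parse above, using only the parsing entry points shown earlier (use unsynn::*; assumed to be in scope):
use unsynn::*;
let mut token_iter = "foo 42".to_token_iter();
// "foo" is not an integer literal; the failed attempt is rolled back
// and consumes no tokens ...
assert!(token_iter.parse::<LiteralInteger>().is_err());
// ... so the same position can still be parsed as an Ident,
let ident: Ident = token_iter.parse().unwrap();
assert_eq!(ident.to_string(), "foo");
// and the integer literal is still available afterwards.
let _number: LiteralInteger = token_iter.parse().unwrap();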
Type Aliases§
- And - &
- AndAnd - &&
- AndEq - &=
- Any - Any number of T delimited by D or Nothing
- AsDefault - Parse a T and replace it with its default value. This is a zero sized type. It can be used for no-allocation replacement elements in a Vec, since Vec has an optimization for zero-sized types where it won’t allocate any memory but just acts as a counter.
- Assign - =
- At - @
- AtLeast - At least N of T delimited by D or Nothing
- AtMost - At most N of T delimited by D or Nothing
- Backslash - \
- Bang - !
- CachedGroup - Group with cached string representation.
- CachedIdent - Ident with cached string representation.
- CachedLiteral - Literal with cached string representation.
- CachedLiteralInteger - LiteralInteger with cached string representation.
- CachedLiteralString - LiteralString with cached string representation.
- CachedPunct - Punct with cached string representation.
- CachedTokenTree - TokenTree (any token) with cached string representation.
- Caret - ^
- CaretEq - ^=
- Colon - :
- ColonDelimited - T followed by an optional :
- ColonDelimitedVec - DelimitedVec of T delimited by : with P as policy for the last delimiter.
- Comma - ,
- CommaDelimited - T followed by an optional ,
- CommaDelimitedVec - DelimitedVec of T delimited by , with P as policy for the last delimiter.
- Dollar - $
- Dot - .
- DotDelimited - T followed by an optional .
- DotDelimitedVec - DelimitedVec of T delimited by . with P as policy for the last delimiter.
- DotDot - ..
- DotDotEq - ..=
- Ellipsis - ...
- Equal - ==
- Exactly - Exactly N of T delimited by D or Nothing
- FatArrow - =>
- Ge - >=
- Gt - >
- InfixExpr - Generic infix operator expression.
- LArrow - <-
- Le - <=
- LifetimeTick - ' with Spacing::Joint
- Lt - <
- Many - One or more of T delimited by D or Nothing (see the sketch after this list)
- Minus - -
- MinusEq - -=
- NonAssocExpr - Type alias for non-associative binary operators.
- NotEqual - !=
- Optional - Zero or one of T delimited by D or Nothing
- Or - |
- OrDefault - Tries to parse a T or inserts a D when that fails.
- OrEq - |=
- OrOr - ||
- PathSep - ::
- PathSepDelimited - T followed by an optional ::
- PathSepDelimitedVec - DelimitedVec of T delimited by :: with P as policy for the last delimiter.
- Percent - %
- PercentEq - %=
- Plus - +
- PlusEq - +=
- Pound - #
- Question - ?
- RArrow - ->
- Repeats - DelimitedVec<T,D> with a minimum and maximum (inclusive) number of elements, at first without defaults. Parsing will succeed when at least the minimum number of elements is reached and stop at the maximum number. The delimiter D defaults to Nothing to parse sequences which don’t have delimiters.
- Replace - Parse-skip a T and insert a U: Default in its place. This is a zero sized type.
- Result - Result type for parsing.
- Semicolon - ;
- SemicolonDelimited - T followed by an optional ;
- SemicolonDelimitedVec - DelimitedVec of T delimited by ; with P as policy for the last delimiter.
- Shl - <<
- ShlEq - <<=
- Shr - >>
- ShrEq - >>=
- Slash - /
- SlashEq - /=
- Star - *
- StarEq - *=
- Tilde - ~
- TokenStreamUntil - Parses a TokenStream until, but excluding, T. The presence of T is mandatory.
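For illustration, a minimal sketch using the repetition aliases above. This is a hypothetical example, assuming the element type comes first and the delimiter second (as the descriptions suggest) and that use unsynn::*; is in scope:
use unsynn::*;
// one or more identifiers separated by commas
let idents: Many<Ident, Comma> = "a, b, c".to_token_iter().parse().unwrap();
assert_tokens_eq!(idents, "a, b, c");
// the same pattern with :: as the delimiter parses a simple path
let path: Many<Ident, PathSep> = "std::collections::HashMap".to_token_iter().parse().unwrap();
assert_tokens_eq!(path, "std::collections::HashMap");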