Crate loess

§Loess

Loess is a parser library and parser generator for proc macros.

For a simple but representative example of using Loess, see the inline-json5 crate.

Here’s what to expect:

  • Fast builds. Loess’s core is compact, language agnostic, and useful without enabling a premade grammar.

    That said, even in cases where you do enable a grammar module, builds should still be fairly quick.

  • A simple, flexible API. Loess is relatively unopinionated about how or what you parse, and you can construct (and destructure) Input at any time.

  • Shallow parsing (by default). For tokens with groups, like Visibility, you can opt into deeper (or customised!) parsing via generics.

  • Public fields and one-time validation. The parser checks token specifics once when processing input, but trusts you otherwise.

  • A reasonably powerful parser-generator.

    grammar! can emit documentation (for enums) and PeekFrom, PopFrom and IntoTokens implementations on grammar types in general.

  • Really good error reporting from proc macros implemented with Loess, by default.

    This includes locating panics relative to the proc macro input, instead of squiggling the whole macro.

  • Lenient and partial parsing. The parsers can continue (after reporting an error) when a repeating parse fails in a delimited group.

    You can use this property to still emit as much output as possible, which avoids cascading errors.

  • Low-allocation workflow.

    Loess can (usually) move tokens from input to output without cloning them. (You can still clone all included grammar types explicitly.)

  • Some bugs. For example, none-delimited groups aren’t handled yet, which can cause issues when generating macro input with a macro_rules! macro.

Here’s what not to expect:

  • Complete coverage of Rust’s grammar. In fact, Loess really makes no attempt at all in this regard, since I only implement what I need.

    In particular, unstable grammar is generally out of scope of the included parsers. (Loess can help you supply it yourself!)

  • A Syn-replacement (at least not soon). While there’s no public interaction with Syn, some optional grammar tokens are for now opaque and do defer to Syn when enabled.

  • Debug-implementations on the included grammars. They aren’t that useful here in my experience, but they would increase compile-times.

  • Absence of major version bumps. Rust’s grammar is a moving target and Loess’s grammar tokens aren’t marked #[non_exhaustive] for ease of use.

    However, shallow parsing should make upgrades fairly painless and errors should alert you specifically to grammar changes that are relevant to you.
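
The "lenient and partial parsing" and "low-allocation" points above can be pictured with a small sketch that uses only the standard library. It is not Loess's API: `Token` and `parse_numbers_leniently` are hypothetical names invented for this illustration. The idea is that a failed item is recorded as an error, the parser skips to the next separator, and tokens are moved out of the input queue rather than cloned.

```rust
use std::collections::VecDeque;

/// Hypothetical token type for this sketch (not Loess's token type).
#[derive(Debug, PartialEq)]
enum Token {
    Number(i64),
    Word(String),
    Comma,
}

/// Collects the numbers from a comma-separated token list, recovering after bad items.
///
/// Tokens are moved out of `input` instead of being cloned. Each bad item is
/// recorded in `errors`, then parsing resumes after the next comma.
fn parse_numbers_leniently(input: &mut VecDeque<Token>, errors: &mut Vec<String>) -> Vec<i64> {
    let mut numbers = Vec::new();
    while let Some(token) = input.pop_front() {
        match token {
            Token::Number(n) => numbers.push(n),
            // Separators are simply skipped in this sketch.
            Token::Comma => continue,
            other => {
                errors.push(format!("Expected number, found {other:?}."));
                // Recover: discard tokens up to and including the next comma.
                while let Some(skipped) = input.pop_front() {
                    if skipped == Token::Comma {
                        break;
                    }
                }
            }
        }
    }
    numbers
}
```

Because the function still returns every number it could parse, a caller can emit partial output alongside the collected errors, which is the property that helps avoid cascading errors.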

§Examples

use loess::{
    grammar, parse_all, Input, Errors, PeekFrom, PopFrom, IntoTokens,
    rust_grammar::{ // With the `"rust_grammar"` feature.
        Await, CurlyBraces, Dot, Identifier, Parentheses, Semi, SquareBrackets,
    }
};
use proc_macro2::{Span, TokenStream};

// Generates parsers and pasters, according to the traits written after the type name.
//
// (This macro is hygienic, so you don't have to import the traits for this.)
grammar! {
    pub struct Child: PeekFrom, PopFrom, IntoTokens {
        pub identifier: ChildIdentifier,
        /// Groups are generic (and capture [`TokenStream`] by default.)
        pub new_args: Option<Parentheses>,
        pub dot_await: Option<DotAwait>,
        pub render_args: Option<SquareBrackets>,
        pub children: ChildChildren,
    }

    pub struct DotAwait: PeekFrom, PopFrom, IntoTokens {
        pub dot: Dot,
        pub r#await: Await,
    }

    // It's basic so far, but some documentation can be generated too.
    pub enum ChildIdentifier: doc, IntoTokens {
        Local(Identifier),
        Substrate(Identifier),
        Qualified(TokenStream),
    } else "Expected child identifier.";

    pub enum ChildChildren: PopFrom, IntoTokens {
        Void(Semi),
        Braces(CurlyBraces<Vec<Child>>),
    } else "Expected `;` or `{`.";
}

// Custom logic can be added through simple traits.
impl PeekFrom for ChildIdentifier {
    fn peek_from(input: &Input) -> bool {
        unimplemented!("Just an example.")
    }
}

impl PopFrom for ChildIdentifier {
    // Errors can be emitted even when the parser recovers.
    //
    // This allows multiple errors to be reported at once (subject to priority), and also
    // allows graceful degradation of macro output to avoid cascading errors elsewhere.
    fn pop_from(input: &mut Input, errors: &mut Errors) -> Result<Self, ()> {
        unimplemented!("Just an example.")
    }
}

// Loess has a flexible, unopinionated API:
fn macro_impl(input: TokenStream) -> TokenStream {
    let mut errors = Errors::new();

    // Turns panics into located errors and checks for exhaustiveness.
    // (Errors for unconsumed input have low priority to avoid distractions.)
    let children: Vec<Child> = parse_all(
            &mut Input {
                // This is a plain `VecDeque<TokenTree>`.
                tokens: input.into_iter().collect(),

                // Used to locate errors if the end of input was reached unexpectedly.
                // Nightly macros can use `Span::end` to get a better error location.
                end: Span::call_site(),
            },
            &mut errors,
        ).collect();

    // You can also step through `Input` via the `parse_once…` functions, but you should
    // always use a `parse_all…` function last to check for unconsumed input.

    let root = TokenStream::new();
    let mut output = TokenStream::new();

    errors.into_tokens(
        // Optional `root` module that reexports dependencies.
        // Mainly for wrapped macros that have access to `$crate`.
        //
        // Iff not empty, `Errors` assumes that `core` is reexported there.
        &root,
        &mut output,
    );

    // You can emit your output step-by-step, or exit early after emitting `errors`.
    children.into_tokens(&root, &mut output);

    output
}

// Alternatively:

fn macro_impl2(input: TokenStream) -> TokenStream {
    let mut errors = Errors::new();

    let root = TokenStream::new();

    grammar! {
        struct Grammar: PopFrom (
            Identifier,
            CurlyBraces<Vec<Child>>,
        );
    }

    let Some(Grammar(name, children)) = parse_all(
            &mut Input {
                // This is a plain `VecDeque<TokenTree>`.
                tokens: input.into_iter().collect(),

                // Used to locate errors if the end of input was reached unexpectedly.
                // Nightly macros can use `Span::end` to get a better error location.
                end: Span::call_site(),
            },
            &mut errors,
        ).next() else { return errors.collect_tokens(&root); };

    let mut output = errors.collect_tokens(&root);

    // Emit your output step-by-step.
    name.into_tokens(&root, &mut output);
    children.into_tokens(&root, &mut output);

    output
}

§Using $crate for full caller independence

loess::IntoTokens-methods take an (optionally empty) root: &TokenStream parameter, which all emitted fully qualified paths should be prefixed with.

In combination with a wrapper crate, this achieves full isolation from the caller's dependencies:

// wrapper crate

#[macro_export]
macro_rules! my_macro {
    ($($tt:tt)*) => ( $crate::__::my_macro!([$crate] $($tt)*) );
}

#[doc(hidden)]
pub mod __ {
    pub use core; // Expected by `Errors`.
    pub use my_macro_impl::my_macro;
}

// my_macro_impl (proc macro)

use loess::{
    grammar, parse_once, parse_all,
    Errors, Input, IntoTokens,
    rust_grammar::{SquareBrackets},
};
use proc_macro2::{Span, TokenStream, TokenTree};

// […]

fn macro_impl(input: TokenStream) -> TokenStream {
    let mut input = Input {
        tokens: input.into_iter().collect(),
        end: Span::call_site(),
    };
    let mut errors = Errors::new();

    // `root` is implicitly a `TokenStream`.
    let Ok(SquareBrackets { contents: root, .. }) = parse_once(
            &mut input,
            &mut errors,
        ) else { return errors.collect_tokens(&TokenStream::new()) };

    grammar! {
        /// This represents your complete input grammar.
        /// This here is a placeholder, so it's empty.
        struct Grammar: PopFrom {}
    }

    // Checks for exhaustiveness.
    let parsed = parse_all(&mut input, &mut errors).next();
    let mut output = errors.collect_tokens(&root);

    if let Some(Grammar {}) = parsed {
        // Emit your output here.
    }

    output
}

In most cases you’ll want to:

  1. generate custom grammar implementations with grammar! (you can also easily implement parts manually),
  2. create (mutable) instances of Input and Errors,
  3. step through the input with parse_once, parse_once_with and/or parse_once_with_infallible,
  4. consume the last of the input with parse_all, parse_all_with or parse_all_with_infallible,
  5. perform any fallible transforms you need, possibly pushing more Errors into your Errors,
  6. convert your Errors into the start of your output: if errors is your Errors and root is your root TokenStream, write let mut output: proc_macro2::TokenStream = errors.collect_tokens(&root);
  7. emit your regular output with quote_into_mixed_site! (recommended), quote_into_with_exact_span! or quote_into_call_site!, which accept interpolation and control flow directives.

In step 4, you can call either Iterator::collect (for repeats) or Iterator::next (for one value) on the returned parsing iterator. Either way, the iterator checks for unconsumed tokens remaining in the Input when it is dropped and reports them to the Errors accordingly.
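
The check-on-drop behaviour can be pictured with a standard-library-only sketch. This is not how Loess is implemented: `EvenNumbers` and `parse_all_sketch` are hypothetical names, and plain `u32` values stand in for tokens. The sketch only shows why the unconsumed-input check runs regardless of whether the caller uses `collect` or `next`.

```rust
use std::collections::VecDeque;

/// Hypothetical parsing iterator for this sketch: yields leading even numbers.
struct EvenNumbers<'a> {
    input: &'a mut VecDeque<u32>,
    errors: &'a mut Vec<String>,
}

impl Iterator for EvenNumbers<'_> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // Peek first, so that values that don't match stay in the input.
        if self.input.front()? % 2 == 0 {
            self.input.pop_front()
        } else {
            None
        }
    }
}

impl Drop for EvenNumbers<'_> {
    fn drop(&mut self) {
        // Runs whether the caller used `collect`, `next`, or nothing at all.
        if !self.input.is_empty() {
            self.errors
                .push(format!("{} unconsumed token(s).", self.input.len()));
        }
    }
}

/// Hypothetical stand-in for a `parse_all`-style function.
fn parse_all_sketch<'a>(
    input: &'a mut VecDeque<u32>,
    errors: &'a mut Vec<String>,
) -> EvenNumbers<'a> {
    EvenNumbers { input, errors }
}
```

Calling `.collect()` or `.next()` directly on the returned value drops the iterator at the end of that statement, so the leftover check has already run by the time the errors are inspected.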

You can combine step 3 into step 4 with a grammar!-generated top-level grammar, but for proc macros embedded in a runtime library, in most cases I recommend getting $crate from a wrapper macro_rules!-macro first. (See full example above.)

Some parsing errors are recoverable, but still translate to compile_error! calls being generated in step 6. Your macro should seamlessly continue to operate in such cases, which helps prevent noise from cascading errors due to e.g. missing items, making it much easier for your macro’s users to find problems with the input.

You can download a .code-snippets file for Loess’s macros and quote macro directives here: https://github.com/Tamschi/Asteracea/blob/develop/.vscode/Loess.code-snippets

§Features

None are default, as DSL macros might not need Rust’s grammar at all.

§"rust_grammar"

Enables rust_grammar.

§"opaque_rust_grammar" enables "rust_grammar", depends on syn and quote

Adds additional opaque Rust grammar tokens that let you consume, paste and clone, for example, Statements and Patterns.

These preliminary implementations are Syn-based and can’t be inspected.

Modules§

error_priorities
ConstErrorPriority types for use with Exhaustive and EndOfInput.
rust_grammar
With "rust_grammar": Tokens representing the stable Rust programming language, closely following The Rust Reference.

Macros§

grammar
Parser- and serialiser-generator macro.
quote_into_call_site
Like quote_into_mixed_site!, but resolved according to Span::call_site().
quote_into_mixed_site
Simple generic quotation (statement) macro that works well with Loess’s types.
quote_into_with_exact_span
Like quote_into_mixed_site!, but using $span directly for quoted tokens.
raw_quote_into_call_site
Like raw_quote_into_mixed_site!, but resolved according to Span::call_site().
raw_quote_into_mixed_site
Simple generic quotation (statement) macro that efficiently emits tokens verbatim.
raw_quote_into_with_exact_span
Like raw_quote_into_mixed_site!, but using $span directly for quoted tokens.

Structs§

Eager
Wraps a collection type to eagerly parse values that are PeekFrom, but to stop when PopFrom::peek_pop_from returns None.
EndOfInput
Fails to parse and emits an Error with the given ConstErrorPriority for any unconsumed tokens in Input.
Error
A Span-located proc macro error with ErrorPriority.
Usually submitted through Errors::push.
ErrorPriority
An opaque Error priority.
Errors
A collection of Errors submitted during e.g. parsing with PopFrom.
Exhaustive
Doesn’t fail to parse but emits an Error with the given ConstErrorPriority for any unconsumed tokens in Input after T.
HandledPanic
A substitute panic that isn’t reported as Error. (Read for panic handling info!)
Input
Input tokens with end-Span.
For use with PeekFrom and PopFrom.

Traits§

ConstErrorPriority
ErrorPriority as generic type argument.
IntoTokens
Spreads self into its contained or representative TokenTrees.
PeekFrom
Determines if Self may be parseable from an Input.
This is often a cursory check!
PopFrom
Consumes from Input to create Result<Self, ()> and emit to Errors.
SimpleSpanned
Has a single Span.

Functions§

parse_all
Conveniently parses remaining Input through PopFrom, catching and submitting panics to the given Errors.
parse_all_with
Parses remaining Input through FnMut, catching and submitting panics to the given Errors.
parse_all_with_infallible
Low-level function that parses remaining Input through FnMut without also catching Err(()), catching and submitting panics to the given Errors.
parse_once
Convenient non-repeating PopFrom::pop_from-unwind-catcher that reports panics to the given Errors.
parse_once_with
FnOnce-unwind-catcher that reports panics to the given Errors.
parse_once_with_infallible
Low-level FnOnce-unwind-catcher that reports panics to the given Errors without also catching Err(()).
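
The general idea behind an "unwind-catcher that reports panics to the given Errors" can be sketched with the standard library alone. This is an illustration under assumptions, not Loess's implementation: `parse_once_sketch` is a hypothetical name, and a `Vec<String>` stands in for Errors (which in Loess also carries a Span and a priority).

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Runs `parser` and turns a panic into a collected error message
/// instead of letting it unwind out of the macro.
fn parse_once_sketch<T>(
    parser: impl FnOnce() -> Result<T, ()>,
    errors: &mut Vec<String>,
) -> Result<T, ()> {
    match catch_unwind(AssertUnwindSafe(parser)) {
        // No panic: pass the parser's own result through unchanged.
        Ok(result) => result,
        Err(payload) => {
            // Panic payloads are usually `&'static str` or `String`.
            let message = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "Parser panicked.".to_string());
            errors.push(message);
            Err(())
        }
    }
}
```

Note that the default panic hook still prints the panic message to stderr in this sketch; only the unwinding is intercepted.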