Crate chumsky

Chumsky

A friendly parser combinator crate that makes writing LL(k) parsers with error recovery and partial parsing easy.

Example usage with my own language, Tao

Note: Error diagnostic rendering is performed by Ariadne

Features

  • Lots of combinators!
  • Generic across input, output, error, and span types
  • Powerful error recovery strategies
  • Inline mapping to your AST
  • Text-specific parsers for both u8s and chars
  • Recursive parsers
  • Automatic support for backtracking, allowing the parsing of LL(k) grammars
  • Parsing of nesting inputs

What is a parser combinator?

Parser combinators are a technique for implementing parsers by defining them in terms of other parsers. The resulting parsers use a recursive descent strategy for transforming an input into an output. Using parser combinators to define parsers is roughly analogous to using Rust’s Iterator trait to define iterative algorithms: the type-driven API of Iterator makes it more difficult to make mistakes and easier to encode complicated iteration logic than if one were to write the same code by hand. The same is true of parsers and parser combinators.
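To make the idea concrete, here is a minimal sketch of parser combinators written with only the standard library. It is not Chumsky's implementation or API: the `ParseResult` alias and the free functions `just`, `or`, `map`, `repeated` and `net_change` are hypothetical names chosen for illustration (Chumsky exposes similar ideas as methods on its `Parser` trait). Each function returns a closure that is itself a parser, so larger parsers are built by passing smaller ones around.

```rust
// Hypothetical result type for this sketch: on success, the parsed output
// plus the input that is still left to parse; `None` on failure.
type ParseResult<'a, O> = Option<(O, &'a str)>;

/// Primitive parser: accepts exactly one expected character.
fn just<'a>(expected: char) -> impl Fn(&'a str) -> ParseResult<'a, char> {
    move |input: &'a str| {
        let mut chars = input.chars();
        match chars.next() {
            Some(c) if c == expected => Some((c, chars.as_str())),
            _ => None,
        }
    }
}

/// Combinator: try `a`; if it fails, retry `b` from the same position.
/// Restarting from the original input is a simple form of backtracking.
fn or<'a, O>(
    a: impl Fn(&'a str) -> ParseResult<'a, O>,
    b: impl Fn(&'a str) -> ParseResult<'a, O>,
) -> impl Fn(&'a str) -> ParseResult<'a, O> {
    move |input: &'a str| a(input).or_else(|| b(input))
}

/// Combinator: transform the output of `p` with `f`.
fn map<'a, A, B>(
    p: impl Fn(&'a str) -> ParseResult<'a, A>,
    f: impl Fn(A) -> B,
) -> impl Fn(&'a str) -> ParseResult<'a, B> {
    move |input: &'a str| p(input).map(|(out, rest)| (f(out), rest))
}

/// Combinator: apply `p` zero or more times and collect the outputs.
/// (Assumes `p` consumes input whenever it succeeds.)
fn repeated<'a, O>(
    p: impl Fn(&'a str) -> ParseResult<'a, O>,
) -> impl Fn(&'a str) -> ParseResult<'a, Vec<O>> {
    move |mut input: &'a str| {
        let mut outputs = Vec::new();
        while let Some((out, rest)) = p(input) {
            outputs.push(out);
            input = rest;
        }
        Some((outputs, input))
    }
}

/// A parser built purely from the pieces above: sums a run of '+' and '-'.
fn net_change(input: &str) -> Option<(i32, &str)> {
    let plus = map(just('+'), |_| 1);
    let minus = map(just('-'), |_| -1);
    let total = map(repeated(or(plus, minus)), |steps: Vec<i32>| {
        steps.iter().sum::<i32>()
    });
    total(input)
}
```

Note that `net_change` contains no index arithmetic or manual loops of its own; as with iterator adaptors, the control flow lives inside the reusable combinators.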

Example Brainfuck Parser

See examples/brainfuck.rs for the full interpreter (cargo run --example brainfuck -- examples/sample.bf).

use chumsky::prelude::*;

#[derive(Clone)]
enum Instr {
    Left, Right,
    Incr, Decr,
    Read, Write,
    Loop(Vec<Self>),
}

fn parser() -> impl Parser<char, Vec<Instr>, Error = Simple<char>> {
    recursive(|bf| bf.delimited_by('[', ']').map(Instr::Loop)
        .or(just('<').to(Instr::Left))
        .or(just('>').to(Instr::Right))
        .or(just('+').to(Instr::Incr))
        .or(just('-').to(Instr::Decr))
        .or(just(',').to(Instr::Read))
        .or(just('.').to(Instr::Write))
        .repeated())
}

Other examples include:

Error Recovery

Chumsky supports error recovery: it can encounter a syntax error, report it, and then attempt to recover to a state from which it can continue parsing. This allows multiple errors to be reported at once, and a partial AST can still be generated from the input for later compilation stages to consume.

However, there is no silver bullet strategy for error recovery. By definition, if the input to a parser is invalid then the parser can only make educated guesses as to the meaning of the input. Different recovery strategies will work better for different languages, and for different patterns within those languages.

Chumsky provides a variety of recovery strategies (each implementing the Strategy trait), but it’s important to understand that which strategies you apply, where you apply them, and in what order will greatly affect the quality of the errors that Chumsky is able to produce, along with the extent to which it is able to recover a useful AST. Where possible, you should attempt more ‘specific’ recovery strategies first rather than those that mindlessly skip large swathes of the input.

It is recommended that you experiment with applying different strategies in different situations and at different levels of the parser to find a configuration that you are happy with. If none of the provided error recovery strategies cover the specific pattern you wish to catch, you can even create your own by digging into Chumsky’s internals and implementing your own strategies! If you come up with a useful strategy, feel free to open a PR against the main repo!
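As a rough illustration of what recovery means in practice, the sketch below parses a comma-separated list of integers by hand, using only the standard library. It does not use Chumsky or its Strategy trait; `SyntaxError` and `parse_list` are hypothetical names, and the strategy shown (record the error, skip to the next delimiter, leave a placeholder in the output) is just one simple approach of the "skip ahead" kind.

```rust
/// Hypothetical error type for this sketch: the byte offset of the bad item
/// and the text that was found there.
#[derive(Debug, PartialEq)]
struct SyntaxError {
    offset: usize,
    found: String,
}

/// Parses input such as "1,22,3" into integers.
///
/// Instead of stopping at the first malformed item, the parser records an
/// error, resumes at the next ',' and pushes `None` as a placeholder, so the
/// caller receives both a partial result and every error in one pass.
fn parse_list(input: &str) -> (Vec<Option<u32>>, Vec<SyntaxError>) {
    let mut items = Vec::new();
    let mut errors = Vec::new();
    let mut offset = 0; // byte offset of the current item within `input`

    // `split(',')` hands us the text between delimiters, which is exactly
    // the "skip to the next delimiter" step of this recovery strategy.
    for raw in input.split(',') {
        match raw.parse::<u32>() {
            Ok(n) => items.push(Some(n)),
            Err(_) => {
                errors.push(SyntaxError {
                    offset,
                    found: raw.to_string(),
                });
                items.push(None); // placeholder node in the partial output
            }
        }
        offset += raw.len() + 1; // +1 accounts for the ',' delimiter
    }

    (items, errors)
}
```

Skipping to a delimiter works well here because items cannot nest. In a grammar with nested brackets, the same strategy could skip past a closing delimiter and produce misleading follow-on errors, which is why the choice and ordering of strategies matters.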

Planned Features

  • Intrusive parsers (parsers that parse patterns within nested inputs, allowing you to move delimiter parsing to the lexing stage)
  • A debugging mode (using track_caller) that allows backtrace-style debugging of parser behaviour to help you eliminate ambiguities, solve problems, and understand the route that the parser took through your grammar when processing inputs
  • An optimised ‘happy path’ parser mode that skips error recovery & error generation
  • An even faster ‘validation’ parser mode, guaranteed to not allocate, that doesn’t generate outputs but just verifies the validity of an input

Philosophy

Chumsky should:

  • Be easy to use, even if the user doesn’t understand the complexity that underpins parsing
  • Be type-driven, pushing users away from anti-patterns at compile-time
  • Be a mature, ‘batteries-included’ solution for context-free parsing by default. If you need to implement either Parser or Strategy by hand, that’s a problem that needs fixing
  • Be ‘fast enough’, but no faster (i.e., when there is a tradeoff between error quality and performance, Chumsky will always favour error quality)
  • Be modular and extensible, allowing users to implement their own parsers, recovery strategies, error types, and spans, and to be generic over both input tokens and the output AST

Other Information

My apologies to Noam for choosing such an absurd name.

License

Chumsky is licensed under the MIT license (see LICENSE) in the main repository.

Re-exports

pub use crate::error::Error;
pub use crate::span::Span;
pub use crate::stream::Stream;

Modules

Traits that allow chaining parser outputs together.

Combinators that allow combining and extending existing parsers.

Utilities for debugging parsers.

Error types, traits and utilities.

Commonly used functions, traits and types.

Parser primitives that accept specific token patterns.

Types and traits that facilitate error recovery.

Recursive parsers (parsers that include themselves within their patterns).

Types and traits related to spans.

Token streams and behaviours.

Text-specific parsers and utilities.

Structs

An internal type used to facilitate error prioritisation. You shouldn’t need to interact with this type during normal use of the crate.

Traits

A trait implemented by parsers.