[][src]Crate manger

A performant, low-level, lightweight and intuitive combinatoric parser library.

Manger allows for translation of the intuition developed for Rust's primitive and standard library types into your intuition for using this library. Most of the behaviour is defined with the Consumable trait, which can be easily implemented using the consume_struct and consume_enum macros.

This library is suited for deterministic regular languages. It is optimally used in addition to a predefined syntax. For example, if you have a predefined EBNF, it is really easy to implement the syntax within this crate.

Getting Started

To get started with implementing Consumable on your own traits, I suggest taking a look at the consume_struct or consume_enum documentation. Then you can come back here and look at some common patterns.

Common patterns

Parsing and thus consuming has a lot of often used patterns. Ofcourse, these are very easily available here aswell.

Concatenation

Often we want to express that two patterns follow eachother in a source string. For example, you might want to express that every Line is followed by a ';'. In manger there are two ways to do this.

Macro's

The first way, and the preferred way, is with the consume_struct or consume_enum macros where you can present sequential consume instructions. You can see in the following example that we are first consuming a '(', followed by a i32, followed by a closing ')'.

use manger::{ Consumable, consume_struct };

struct EncasedInteger(i32);
consume_struct!(
    EncasedInteger => [
        > '(',
        value: i32,
        > ')';
        (value)
    ]
);

Tuples

Another way to represent the same concept is with the tuple type syntax. This can be done with up to 10 types. Here we are again parsing the same (i32) structure.

use manger::chars;

type EncasedInteger = (chars::OpenParenthese, i32, chars::CloseParenthese);

Repetition

Most of the time you want to represent some kind of repetition. There are a lot of different way to represent repetition. Here there are two easy ways.

Vec

The easiest way to do repetition is with the Vec<T>. This will consume 0 or more instances of type T. Ofcourse, the type T has have has Consumable implemented. Here you can see how what that looks like:

Since Vec<T> will consume instances of type T until it finds a error, it can never fail itself. You are therefore safe to unwrap the result.

use manger::{ Consumable, consume_struct };

struct EncasedInteger(i32);
consume_struct!(
    EncasedInteger => [
        > '[',
        value: i32,
        > ']';
        (value)
    ]
);

let source = "[3][-4][5]";

let (encased_integers, _) = <Vec<EncasedInteger>>::consume_from(source)?;

let sum: i32 = encased_integers
    .iter()
    .map(|EncasedInteger(value)| value)
    .sum();

assert_eq!(sum, 4);

OneOrMore

The other easy way to do repetition is with OneOrMore<T>. This allows for consuming 1 or more instances of type T. And again, type T has to have Consumable implemented. Here you can see what that looks like:

use manger::{ Consumable, consume_struct };
use manger::common::OneOrMore;

struct EncasedInteger(i32);
consume_struct!(
    EncasedInteger => [
        > '[',
        value: i32,
        > ']';
        (value)
    ]
);

let source = "[3][-4][5]";

let (encased_integers, _) = <OneOrMore<EncasedInteger>>::consume_from(source)?;

let product: i32 = encased_integers
    .into_iter()
    .map(|EncasedInteger(value)| value)
    .product();

assert_eq!(product, -60);

Optional value

To express optional values you can use the Option<T> standard rust type. This will consume either 0 or 1 of type T.

Since Option<T> will consume a instance of type T if it finds no error, it can never fail itself. You are therefore safe to unwrap the result.

use manger::consume_struct;
use manger::chars;

struct PossiblyEncasedInteger(i32);
consume_struct!(
    PossiblyEncasedInteger => [
        : Option<chars::OpenParenthese>,
        value: i32,
        : Option<chars::CloseParenthese>;
        (value)
    ]
);

Recursion

Another common pattern seen within combinatoric parsers is recursion. Since rust types need to have a predefined since, we cannot do direct type recursion and we need to do heap allocation with the [Box<T>][std::box::Box] type from the standard library. We can make a prefixed math expression parser as followed:

use manger::consume_enum;
use manger::common::{OneOrMore, Whitespace};

enum Expression {
    Times(Box<Expression>, Box<Expression>),
    Plus(Box<Expression>, Box<Expression>),
    Constant(u32),
}

consume_enum!(
    Expression {
        Times => [
            > '*',
            : OneOrMore<Whitespace>,
            left: Box<Expression>,
            : OneOrMore<Whitespace>,
            right: Box<Expression>;
            (left, right)
        ],
        Plus => [
            > '+',
            : OneOrMore<Whitespace>,
            left: Box<Expression>,
            : OneOrMore<Whitespace>,
            right: Box<Expression>;
            (left, right)
        ],
        Constant => [
            value: u32;
            (value)
        ]
    }
);

Whitespace

For whitespace we can use the [manger::common::Whitespace] struct. This will consume any utf-8 character that is identified as a whitespace character by the char::is_whitespace function.

Either

If two possibilities are present for consuming there are two options to choose from. Both are valid in certain scenarios.

Macro

Using the consume_enum you can create an struct which can be consuming in a number of options and you can see which option was selected. If you need to see which of the different options was selected, this should be your choice.

Either<L, R>

You can also use the [Either<L, R>][either::Either] type to represent the either relationship. This option is preferred if we do not care about which option is selected.

Modules

chars

Types for consuming individual utf-8 characters.

common

Types for common structures within consuming.

Macros

consume_enum

A macro used for defining the way a enum should be consumed. It will implement Consumable for this enum.

consume_struct

A macro used for defining the way a struct should be consumed. It will implement Consumable for this struct.

Structs

ConsumeError

A list of errors that occured while consuming from a source.

ConsumeIter

Iterator over a source for a Consumable type T.

Enums

ConsumeErrorType

One item in ConsumeError. these can occur while consuming from a `source.

Traits

Consumable

Trait that defines whether a trait can be interpretted for a source string or not. It is the trait that defines most behaviour for manger.

ConsumeSource

Trait that exposes some functions for easier consuming syntax on &str.

SelfConsumable

Trait which allows for consuming of instances and literals from a string.