aoc-parse 0.1.5

# aoc-parse

A parser library designed for Advent of Code.

This library mainly provides a macro, `parser!`, that lets you write
a custom parser for your [AoC] puzzle input in seconds.

For example, my puzzle input for [December 2, 2015][example] looked like this:

```
4x23x21
22x29x19
11x4x11
8x10x5
24x18x16
...
```

The parser for this format is a one-liner:
`parser!(lines(u64 "x" u64 "x" u64))`.

## How to use aoc-parse

**If you are NOT using [aoc-runner],** you can use aoc-parse like this:

```rust
use aoc_parse::{parser, prelude::*};

let p = parser!(lines(u64 "x" u64 "x" u64));
assert_eq!(
    p.parse("4x23x21\n22x29x19\n").unwrap(),
    vec![(4, 23, 21), (22, 29, 19)]
);
```

**If you ARE using aoc-runner,** do this instead:

```rust
use aoc_runner_derive::*;

#[aoc_generator(day2)]
fn parse_input(text: &str) -> anyhow::Result<Vec<(u64, u64, u64)>> {
    use aoc_parse::{parser, prelude::*};
    let p = parser!(lines(u64 "x" u64 "x" u64));
    aoc_parse(text, p)
}

assert_eq!(
    parse_input("4x23x21\n22x29x19").unwrap(),
    vec![(4, 23, 21), (22, 29, 19)]
);
```

## Patterns

The argument you need to pass to the `parser!` macro is a *pattern*;
all aoc-parse does is **match** strings against your chosen pattern
and **convert** them into Rust values.

Here are the pieces that you can use in a pattern:

*   `i8`, `i16`, `i32`, `i64`, `i128`, `isize` - These match an integer,
    written out using decimal digits, with an optional `+` or `-` sign
    at the start, like `0` or `-11474`.

    It's an error if the string contains a number too big to fit in the
    type you chose. For example, `parser!(i8).parse("1000")` is an error.
    (It matches the string, but fails during the "convert" phase.)

*   `u8`, `u16`, `u32`, `u64`, `u128`, `usize` - The same, but without
    the sign.

*   `i8_bin`, `i16_bin`, `i32_bin`, `i64_bin`, `i128_bin`, `isize_bin`,
    `u8_bin`, `u16_bin`, `u32_bin`, `u64_bin`, `u128_bin`, `usize_bin`,
    `i8_hex`, `i16_hex`, `i32_hex`, `i64_hex`, `i128_hex`, `isize_hex`,
    `u8_hex`, `u16_hex`, `u32_hex`, `u64_hex`, `u128_hex`, `usize_hex` -
    Match an integer in base 2 or base 16. The `_hex` parsers allow both
    uppercase and lowercase digits `A`-`F`.

*   `bool` - Matches either `true` or `false` and converts it to the
    corresponding `bool` value.

*   `alpha`, `alnum`, `upper`, `lower` - Match single characters of
    various categories. (These use the Unicode categories, even though
    Advent of Code historically sticks to ASCII.)

*   `digit`, `digit_bin`, `digit_hex` - Match a single ASCII character
    that's a digit in base 10, base 2, or base 16, respectively.
    The digit is converted to its numeric value, as a `usize`.

*   `any_char`: Match the next character, no matter what it is (like `.`
    in a regular expression, except that `any_char` matches newline
    characters).

*   `"x"` - A Rust string, in quotes, is a pattern that matches that exact
    string only.

    Exact patterns don't produce a value.

*   <code><var>pattern1 pattern2 pattern3</var>...</code> - Patterns can be
    concatenated to form larger patterns. This is how
    `parser!(u64 "x" u64 "x" u64)` matches the string `4x23x21`. It simply
    matches each subpattern in order. It converts the match to a tuple if
    there are two or more subpatterns that produce values.

*   <code><var>parser_var</var></code> - You can use previously defined
    parsers that you've stored in local variables.

    For example, the `amount` parser below makes use of the `fraction` parser
    defined on the previous line.

    ```
    let fraction = parser!(i64 "/" u64);
    let amount = parser!(fraction " tsp");

    assert_eq!(amount.parse("1/4 tsp").unwrap(), (1, 4));
    ```

*   <code>string(<var>pattern</var>)</code> - Matches the given *pattern*,
    but instead of converting it to some value, simply return the matched
    characters as a `String`.

    By default, `alpha+` returns a `Vec<char>`, and sometimes that is handy
    in AoC, but often it's better to have it return a `String`.

Repeating patterns:

*   <code><var>pattern</var>*</code> - Any pattern followed by an asterisk
    matches that pattern zero or more times. It converts the results to a
    `Vec`. For example, `parser!("A"*)` matches the strings `A`, `AA`,
    `AAAAAAAAAAAAAA`, and so on, as well as the empty string.

*   <code><var>pattern</var>+</code> - Matches the pattern one or more times, producing a `Vec`.
    `parser!("A"+)` matches `A`, `AA`, etc., but not the empty string.

*   <code><var>pattern</var>?</code> - Optional pattern, producing a Rust `Option`. For
    example, `parser!("x=" i32?)` matches `x=123`, producing `Some(123)`;
    it also matches `x=`, producing the value `None`.

    These behave just like the `*`, `+`, and `?` special characters in
    regular expressions.

*   <code>repeat_sep(<var>pattern</var>, <var>separator</var>)</code> -
    Match the given *pattern* any number of times, separated by the *separator*.
    This converts only the bits that match *pattern* to Rust values, producing
    a `Vec`. Any parts of the string matched by *separator* are not converted.

Custom conversion:

*   <code>... (<var>name1</var>: <var>pattern1</var>) ... => <var>expr</var></code> -
    On successfully matching the patterns to the left of `=>`, evaluate the Rust
    expression *expr* to convert the results to a single Rust value.

    Use this to convert input to structs or enums. For example, suppose we have
    input that looks like `(3,66)-(27,8)` and we want to produce these structs:

    ```
    #[derive(Debug, PartialEq)]
    struct Point(i64, i64);

    #[derive(Debug, PartialEq)]
    struct Line {
        p1: Point,
        p2: Point,
    }
    ```

    The patterns we need are:

    ```
    let point = parser!("(" (x: i64) "," (y: i64) ")" => Point(x, y));
    let line_parser = parser!((p1: point) "-" (p2: point) => Line { p1, p2 });

    assert_eq!(
        line_parser.parse("(3,66)-(27,8)").unwrap(),
        Line { p1: Point(3, 66), p2: Point(27, 8) },
    );
    ```

Patterns with two or more alternatives:

*   <code>{<var>pattern1</var>, <var>pattern2</var>, ...}</code> -
    First try matching *pattern1*; if it matches, stop. If not, try
    *pattern2*, and so on. All the patterns must produce the same type of
    Rust value.

    For example, `parser!({"<" => -1, ">" => 1})` either matches `<`,
    returning the value `-1`, or matches `>`, returing `1`.

Lines and sections:

*   <code>line(<var>pattern</var>)</code> - Matches a single line of text that
    matches *pattern*, and the newline at the end of the line.

    This is like <code>^<var>pattern</var>\n</code> in regular expressions,
    except <code>line(<var>pattern</var>)</code> will only ever match exactly
    one line of text, even if *pattern* could match more newlines.

    `line(string(any_char+))` matches a line of text, strips off the newline
    character, and returns the rest as a `String`.

    `line("")` matches a blank line.

*   <code>lines(<var>pattern</var>)</code> - Matches any number of lines of
    text matching *pattern*. Each line must be terminated by a newline, `'\n'`.

    Equivalent to <code>line(<var>pattern</var>)*</code>.

[AoC]: https://adventofcode.com/
[example]: https://adventofcode.com/2015/day/2
[aoc-runner]: https://lib.rs/crates/aoc-runner

License: MIT