# aoc-parse
A parser library designed for Advent of Code.
This library mainly provides a macro, `parser!`, that lets you write
a custom parser for your [AoC] puzzle input in seconds.
For example, my puzzle input for [December 2, 2015][example] looked like this:
```text
4x23x21
22x29x19
11x4x11
8x10x5
24x18x16
...
```
The parser for this format is a one-liner:
`parser!(lines(u64 "x" u64 "x" u64))`.
## How to use aoc-parse
It's pretty easy: build a parser with `parser!`, then call `.parse()` on your input.
```rust
use aoc_parse::{parser, prelude::*};
let p = parser!(lines(u64 "x" u64 "x" u64));
assert_eq!(
    p.parse("4x23x21\n22x29x19\n").unwrap(),
    vec![(4, 23, 21), (22, 29, 19)]
);
```
If you're using [aoc-runner], it might look like this:
```rust
use aoc_runner_derive::*;
use aoc_parse::{parser, prelude::*};
#[aoc_generator(day2)]
fn parse_input(text: &str) -> Vec<(u64, u64, u64)> {
    let p = parser!(lines(u64 "x" u64 "x" u64));
    p.parse(text).unwrap()
}
```
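
For a sense of what the macro saves you, here is a rough hand-written equivalent of `lines(u64 "x" u64 "x" u64)` using only the standard library. The function names `parse_line` and `parse_boxes` are made up for this sketch. It is not how aoc-parse works internally, and it reports failure as a bare `None` rather than a descriptive error.

```rust
// Hypothetical std-only helper: parse one line like "4x23x21".
fn parse_line(line: &str) -> Option<(u64, u64, u64)> {
    let mut parts = line.split('x');
    let a = parts.next()?.parse::<u64>().ok()?;
    let b = parts.next()?.parse::<u64>().ok()?;
    let c = parts.next()?.parse::<u64>().ok()?;
    // Reject lines with more than three fields.
    if parts.next().is_some() {
        return None;
    }
    Some((a, b, c))
}

// Hypothetical std-only helper: parse every line, failing if any line fails.
fn parse_boxes(text: &str) -> Option<Vec<(u64, u64, u64)>> {
    text.lines().map(parse_line).collect()
}
```

Calling `parse_boxes("4x23x21\n22x29x19\n")` yields `Some(vec![(4, 23, 21), (22, 29, 19)])`, the same values the `parser!` one-liner produces.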
## Patterns
The argument you need to pass to the `parser!` macro is a *pattern*;
all aoc-parse does is **match** strings against your chosen pattern
and **convert** them into Rust values.
Here are some examples of patterns:
```rust
lines(i32)      // matches a list of integers, one per line
                // converts them to a Vec<i32>

line(lower+)    // matches a single line of one or more lowercase letters
                // converts them to a Vec<char>

lines({         // matches lines made up of the characters < = >
    "<" => -1,  // converts them to a Vec<Vec<i32>> filled with -1, 0, and 1
    "=" => 0,
    ">" => 1
}+)
```
Here are the pieces that you can use in a pattern:
* `i8`, `i16`, `i32`, `i64`, `i128`, `isize` - These match an integer,
written out using decimal digits, with an optional `+` or `-` sign
at the start, like `0` or `-11474`.
It's an error if the string contains a number too big to fit in the
type you chose. For example, `parser!(i8).parse("1000")` is an error.
(It matches the string, but fails during the "convert" phase.)
* `u8`, `u16`, `u32`, `u64`, `u128`, `usize` - The same, but without
the sign.
* `i8_bin`, `i16_bin`, `i32_bin`, `i64_bin`, `i128_bin`, `isize_bin`,
`u8_bin`, `u16_bin`, `u32_bin`, `u64_bin`, `u128_bin`, `usize_bin`,
`i8_hex`, `i16_hex`, `i32_hex`, `i64_hex`, `i128_hex`, `isize_hex`,
`u8_hex`, `u16_hex`, `u32_hex`, `u64_hex`, `u128_hex`, `usize_hex` -
Match an integer in base 2 or base 16. The `_hex` parsers allow both
uppercase and lowercase digits `A`-`F`.
* `bool` - Matches either `true` or `false` and converts it to the
corresponding `bool` value.
* `'x'` or `"hello"` - A Rust character or string, in quotes, is a pattern
that matches that exact text only.
Exact patterns don't produce a value.
* <code><var>pattern1 pattern2 pattern3</var>...</code> - Patterns can be
concatenated to form larger patterns. This is how
`parser!(u64 "x" u64 "x" u64)` matches the string `4x23x21`. It simply
matches each subpattern in order. It converts the match to a tuple if
there are two or more subpatterns that produce values.
* <code><var>parser_var</var></code> - You can use previously defined
parsers that you've stored in local variables.
For example, the `amount` parser below makes use of the `fraction` parser
defined on the previous line.
```rust
use aoc_parse::{parser, prelude::*};

let fraction = parser!(i64 "/" u64);
let amount = parser!(fraction " tsp");
assert_eq!(amount.parse("1/4 tsp").unwrap(), (1, 4));
```
An identifier can also refer to a string or character constant.
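
The "match" and "convert" phases mentioned for the integer patterns can be pictured with a small std-only sketch. This only illustrates the idea: the `Failure` enum and the `parse_i8` function are hypothetical names, and aoc-parse's real error type and internals are different.

```rust
// Hypothetical error type distinguishing the two phases.
#[derive(Debug, PartialEq)]
enum Failure {
    NoMatch,    // the text does not look like an integer at all
    OutOfRange, // it looks like an integer, but does not fit in the type
}

// Hypothetical std-only model of what the `i8` pattern does.
fn parse_i8(s: &str) -> Result<i8, Failure> {
    // Match phase: an optional sign followed by one or more decimal digits.
    let digits = s.strip_prefix(|c: char| c == '+' || c == '-').unwrap_or(s);
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return Err(Failure::NoMatch);
    }
    // Convert phase: the text matched, so the only way to fail is overflow.
    s.parse::<i8>().map_err(|_| Failure::OutOfRange)
}
```

Here `parse_i8("1000")` fails with `Failure::OutOfRange` (it matched, but could not be converted), while `parse_i8("abc")` fails with `Failure::NoMatch`.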
Repeating patterns:
* <code><var>pattern</var>*</code> - Any pattern followed by an asterisk
matches that pattern zero or more times. It converts the results to a
`Vec`. For example, `parser!("A"*)` matches the strings `A`, `AA`,
`AAAAAAAAAAAAAA`, and so on, as well as the empty string.
* <code><var>pattern</var>+</code> - Matches the pattern one or more times, producing a `Vec`.
`parser!("A"+)` matches `A`, `AA`, etc., but not the empty string.
* <code><var>pattern</var>?</code> - Optional pattern, producing a Rust `Option`. For
example, `parser!("x=" i32?)` matches `x=123`, producing `Some(123)`;
it also matches `x=`, producing the value `None`.
These behave just like the `*`, `+`, and `?` special characters in
regular expressions.
* <code>repeat_sep(<var>pattern</var>, <var>separator</var>)</code> -
Match the given *pattern* any number of times, separated by the *separator*.
This converts only the bits that match *pattern* to Rust values, producing
a `Vec`. Any parts of the string matched by *separator* are not converted.
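
To illustrate how `repeat_sep` keeps the items and throws away the separators, here is a std-only sketch. The helper name `repeat_sep_u64` is invented for this example, and it handles only `u64` items with a literal, non-empty string separator, whereas the real `repeat_sep` accepts arbitrary patterns for both.

```rust
// Hypothetical std-only model of `repeat_sep(u64, separator)`.
fn repeat_sep_u64(text: &str, separator: &str) -> Option<Vec<u64>> {
    // "Any number of times" includes zero: empty input gives an empty Vec.
    if text.is_empty() {
        return Some(Vec::new());
    }
    // Only the pieces between separators are converted; the separators
    // themselves never show up in the output.
    text.split(separator)
        .map(|item| item.parse::<u64>().ok())
        .collect()
}
```

For instance, `repeat_sep_u64("1, 2, 3", ", ")` gives `Some(vec![1, 2, 3])`.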
Matching single characters:
* `alpha`, `alnum`, `upper`, `lower` - Match single characters of
various categories. (These use the Unicode categories, even though
Advent of Code historically sticks to ASCII.)
* `digit`, `digit_bin`, `digit_hex` - Match a single ASCII character
that's a digit in base 10, base 2, or base 16, respectively.
The digit is converted to its numeric value, as a `usize`.
* `any_char` - Match the next character, no matter what it is (like `.`
in a regular expression, except that `any_char` matches newline
characters too).
* <code>char_of(<var>str</var>)</code> - Match the next character if it's
one of the characters in *str*. For example, `char_of(">^<v")` matches
exactly one character, either `>`, `^`, `<`, or `v`. Returns the index
of the character within the list of options (in this case, `0`, `1`,
`2`, or `3`).
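
The index returned by `char_of` can be modeled in one line of plain Rust. This is only a sketch of the conversion described above; `char_of_index` is a hypothetical name, not part of aoc-parse.

```rust
// Hypothetical std-only model of the value `char_of(options)` produces
// for a given next character: its position in `options`, if present.
fn char_of_index(options: &str, next: char) -> Option<usize> {
    options.chars().position(|option| option == next)
}
```

With the options `">^<v"`, the character `<` maps to `Some(2)`, and a character that is not listed maps to `None` (no match).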
Matching multiple characters:
* <code>string(<var>pattern</var>)</code> - Matches the given *pattern*,
but instead of converting it to some value, simply returns the matched
characters as a `String`.
By default, `alpha+` returns a `Vec<char>`, and sometimes that is handy
in AoC, but often it's better to have it return a `String`.
Custom conversion:
* <code>... <var>name1</var>:<var>pattern1</var> ... => <var>expr</var></code> -
On successfully matching the patterns to the left of `=>`, evaluate the Rust
expression *expr* to convert the results to a single Rust value.
Use this to convert input to structs. For instance, suppose your puzzle input
contains each elf's name and height:
```text
Holly=33
Ivy=7
DouglasFir=1093
```
and you'd like to turn this into a vector of `struct Elf` values. The
code you need is:
```rust
use aoc_parse::{parser, prelude::*};

struct Elf {
    name: String,
    height: u32,
}

let p = parser!(lines(
    elf:string(alpha+) '=' ht:u32 => Elf { name: elf, height: ht }
));
```
The name `elf` applies to the pattern `string(alpha+)` and the name
`ht` applies to the pattern `u32`. The bit after the `=>` is
plain old Rust code.
The *name*s are in scope only for the following *expr* in the same
set of matching parentheses or braces.
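
To see what the names and the `=>` expression amount to, here is a hand-written std-only sketch that does roughly the same job for a single line. The function `parse_elf` is a made-up name, and the `Elf` struct is repeated (with derives added) so the sketch stands alone. It is an approximation, not a description of aoc-parse internals.

```rust
#[derive(Debug, PartialEq)]
struct Elf {
    name: String,
    height: u32,
}

// Hypothetical std-only counterpart of
// `elf:string(alpha+) '=' ht:u32 => Elf { name: elf, height: ht }`.
fn parse_elf(line: &str) -> Option<Elf> {
    // Match the two pieces on either side of '='.
    let (elf, ht) = line.split_once('=')?;
    // `alpha+` means one or more alphabetic characters.
    if elf.is_empty() || !elf.chars().all(char::is_alphabetic) {
        return None;
    }
    let ht = ht.parse::<u32>().ok()?;
    // This last expression plays the role of the code after `=>`.
    Some(Elf { name: elf.to_string(), height: ht })
}
```

Each named piece becomes a local variable, and the final expression builds the struct from them, which is what the `=>` form lets you write inline.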
Alternatives:
* <code>{<var>pattern1</var>, <var>pattern2</var>, ...}</code> -
Matches any one of the *patterns*. First try matching *pattern1*; if it
matches, stop. If not, try *pattern2*, and so on. All the patterns must
produce the same type of Rust value.
This is sort of like a Rust `match` expression.
For example, `parser!({"<" => -1, ">" => 1})` either matches `<`,
returning the value `-1`, or matches `>`, returning `1`.
Alternatives are handy when you want to convert the input into an enum.
For example, my puzzle input for December 23, 2015 was a list of instructions
that looked (in part) like this:
```text
jie a, +4
tpl a
inc a
jmp +2
hlf a
jmp -7
```
This can be easily parsed into a vector of beautiful enums, like so:
```rust
use aoc_parse::{parser, prelude::*};

enum Reg {
    A,
    B,
}

enum Insn {
    Hlf(Reg),
    Tpl(Reg),
    Inc(Reg),
    Jmp(isize),
    Jie(Reg, isize),
    Jio(Reg, isize),
}

use Reg::*;
use Insn::*;

let reg = parser!({"a" => A, "b" => B});
let p = parser!(lines({
    "hlf " r:reg => Hlf(r),
    "tpl " r:reg => Tpl(r),
    "inc " r:reg => Inc(r),
    "jmp " offset:isize => Jmp(offset),
    "jie " r:reg ", " offset:isize => Jie(r, offset),
    "jio " r:reg ", " offset:isize => Jio(r, offset),
}));
```
Lines and sections:
* <code>line(<var>pattern</var>)</code> - Matches a single line of text that
matches *pattern*, and the newline at the end of the line.
This is like <code>^<var>pattern</var>\n</code> in regular expressions,
with two minor differences:
- <code>line(<var>pattern</var>)</code> will only ever match exactly
one line of text, even if *pattern* could match more newlines.
- If your input does not end with a newline,
<code>line(<var>pattern</var>)</code> can still match the
non-newline-terminated "line" at the end.
`line(string(any_char+))` matches a line of text, strips off the newline
character, and returns the rest as a `String`.
`line("")` matches a blank line.
* <code>lines(<var>pattern</var>)</code> - Matches any number of lines of
text matching *pattern*. Each line must be terminated by a newline, `'\n'`.
Equivalent to <code>line(<var>pattern</var>)*</code>.
```rust
use aoc_parse::{parser, prelude::*};

let p = parser!(lines(repeat_sep(digit, " ")));
assert_eq!(
    p.parse("1 2 3\n4 5 6\n").unwrap(),
    vec![vec![1, 2, 3], vec![4, 5, 6]],
);
```
* <code>section(<var>pattern</var>)</code> - Matches zero or more nonblank lines,
followed by either a blank line or the end of input. The nonblank lines must match
*pattern*.
`section()` consumes the blank line. *pattern* should not expect to see it.
It's common for an AoC puzzle input to have several lines of data, then
a blank line, and then a different kind of data. You can parse this with
<code>section(<var>p1</var>) section(<var>p2</var>)</code>.
`section(lines(u64))` matches a section that's a list of numbers, one per line.
* <code>sections(<var>pattern</var>)</code> - Matches any number of sections
matching *pattern*. Equivalent to <code>section(<var>pattern</var>)*</code>.
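
The way sections are delimited by blank lines can be sketched without the library. The helper `split_sections` below is hypothetical and only groups raw lines; it does not apply a pattern to them, and its handling of edge cases (such as several blank lines in a row) may differ from aoc-parse.

```rust
// Hypothetical std-only model of how input is divided into sections:
// runs of nonblank lines, each ended by a blank line or the end of input.
fn split_sections(text: &str) -> Vec<Vec<&str>> {
    let mut sections: Vec<Vec<&str>> = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for line in text.lines() {
        if line.is_empty() {
            // The blank line ends the section and is consumed, not stored.
            sections.push(std::mem::take(&mut current));
        } else {
            current.push(line);
        }
    }
    // The last section may end at the end of input instead of a blank line.
    if !current.is_empty() {
        sections.push(current);
    }
    sections
}
```

For the input `"1\n2\n\n3\n"` this gives two groups, `["1", "2"]` and `["3"]`; the blank line itself appears in neither.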
Bringing it all together to parse a complex example:
```rust
use aoc_parse::{parser, prelude::*};

let example = "\
Wiring Diagram #1:
a->q->E->z->J
D->f->D

Wiring Diagram #2:
g->r->f
g->B
";

let p = parser!(sections(
    line("Wiring Diagram #" usize ":")
    lines(repeat_sep(alpha, "->"))
));
assert_eq!(
    p.parse(example).unwrap(),
    vec![
        (1, vec![vec!['a', 'q', 'E', 'z', 'J'], vec!['D', 'f', 'D']]),
        (2, vec![vec!['g', 'r', 'f'], vec!['g', 'B']]),
    ],
);
```
[AoC]: https://adventofcode.com/
[example]: https://adventofcode.com/2015/day/2
[aoc-runner]: https://lib.rs/crates/aoc-runner
License: MIT