//! A parser library designed for Advent of Code.
//!
//! This library mainly provides a macro, `parser!`, that lets you write
//! a custom parser for your [AoC] puzzle input in seconds.
//!
//! For example, my puzzle input for [December 2, 2015][example] looked like this:
//!
//! ```text
//! 4x23x21
//! 22x29x19
//! 11x4x11
//! 8x10x5
//! 24x18x16
//! ...
//! ```
//!
//! The parser for this format is a one-liner: `parser!(lines(u64 "x" u64 "x" u64))`.
//!
//! # How to use aoc-parse
//!
//! It's pretty easy.
//!
//! ```
//! use aoc_parse::{parser, prelude::*};
//!
//! let p = parser!(lines(u64 "x" u64 "x" u64));
//! assert_eq!(
//!     p.parse("4x23x21\n22x29x19\n").unwrap(),
//!     vec![(4, 23, 21), (22, 29, 19)]
//! );
//! ```
//!
//! If you're using [aoc-runner], it might look like this:
//!
//! ```
//! use aoc_runner_derive::*;
//! use aoc_parse::{parser, prelude::*};
//!
//! #[aoc_generator(day2)]
//! fn parse_input(text: &str) -> Vec<(u64, u64, u64)> {
//!     let p = parser!(lines(u64 "x" u64 "x" u64));
//!     p.parse(text).unwrap()
//! }
//! ```
//!
//! # Patterns
//!
//! The argument you need to pass to the `parser!` macro is a *pattern*; all aoc-parse does is
//! **match** strings against your chosen pattern and **convert** them into Rust values.
//!
//! Here are some examples of patterns:
//!
//! ```
//! # use aoc_parse::{parser, prelude::*};
//! # let _p1 = parser!(
//! lines(i32)      // matches a list of integers, one per line
//!                 // converts them to a Vec<i32>
//! # );
//!
//! # let _p2 = parser!(
//! line(lower+)    // matches a single line of one or more lowercase letters
//!                 // converts them to a Vec<char>
//! # );
//!
//! # let _p3 = parser!(
//! lines({         // matches lines made up of the characters < = >
//!     "<" => -1,  // converts them to a Vec<Vec<i32>> filled with -1, 0, and 1
//!     "=" => 0,
//!     ">" => 1
//! }+)
//! # );
//! ```
//!
//! Here are the pieces that you can use in a pattern:
//!
//! ## Basic patterns
//!
//! `i8`, `i16`, `i32`, `i64`, `i128`, `isize`, `big_int` - These match an integer, written out
//! using decimal digits, with an optional `+` or `-` sign at the start, like `0` or `-11474`.
//!
//! It's an error if the string contains a number too big to fit in the type you chose. For
//! example, `parser!(i8).parse("1000")` is an error. (It matches the string, but fails during the
//! "convert" phase.)
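//!
//! The two phases can be pictured with a small standalone sketch. This is not aoc-parse's actual
//! implementation: `match_then_convert` and `Failure` are hypothetical names, and the conversion
//! step simply assumes the standard library's `str::parse`.
//!
//! ```rust
//! // Which phase rejected the input (hypothetical, for illustration only).
//! #[derive(Debug, PartialEq)]
//! enum Failure {
//!     NoMatch,
//!     OutOfRange,
//! }
//!
//! fn match_then_convert(text: &str) -> Result<i8, Failure> {
//!     // Phase 1 ("match"): an optional sign followed by at least one decimal digit.
//!     let digits = text
//!         .strip_prefix(|c: char| c == '+' || c == '-')
//!         .unwrap_or(text);
//!     if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
//!         return Err(Failure::NoMatch);
//!     }
//!     // Phase 2 ("convert"): the text has the right shape, but may not fit in an `i8`.
//!     text.parse::<i8>().map_err(|_| Failure::OutOfRange)
//! }
//! ```
//!
//! Here `match_then_convert("1000")` gets past the first phase and fails in the second, whereas
//! `match_then_convert("abc")` never matches at all.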
//!
//! `big_int` parses a [`num_bigint::BigInt`].
//!
//! `u8`, `u16`, `u32`, `u64`, `u128`, `usize`, `big_uint` - The same, but without the sign.
//!
//! `i8_bin`, `i16_bin`, `i32_bin`, `i64_bin`, `i128_bin`, `isize_bin`, `big_int_bin`,
//! `u8_bin`, `u16_bin`, `u32_bin`, `u64_bin`, `u128_bin`, `usize_bin`, `big_uint_bin`,
//! `i8_hex`, `i16_hex`, `i32_hex`, `i64_hex`, `i128_hex`, `isize_hex`, `big_int_hex`,
//! `u8_hex`, `u16_hex`, `u32_hex`, `u64_hex`, `u128_hex`, `usize_hex`, `big_uint_hex` -
//! Match an integer in base 2 or base 16. The `_hex` parsers allow both uppercase and lowercase
//! digits `A`-`F`.
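//!
//! As a rough picture of what these conversions amount to, the standard library's
//! `from_str_radix` does the same job for a bare string of digits. Whether aoc-parse uses it
//! internally is an assumption; `hex_to_u32` and `bin_to_i64` are hypothetical helpers.
//!
//! ```rust
//! // Read base-16 digits, upper- or lowercase, as a `u32`.
//! fn hex_to_u32(digits: &str) -> Option<u32> {
//!     u32::from_str_radix(digits, 16).ok()
//! }
//!
//! // Read base-2 digits, with an optional sign, as an `i64`.
//! fn bin_to_i64(digits: &str) -> Option<i64> {
//!     i64::from_str_radix(digits, 2).ok()
//! }
//! ```
//!
//! With these, `hex_to_u32("FF")` and `hex_to_u32("ff")` both give `Some(255)`, and
//! `bin_to_i64("-101")` gives `Some(-5)`.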
//!
//! `f32`, `f64` - These match a floating-point number written out using decimal digits, in [this
//! format](https://doc.rust-lang.org/std/primitive.f64.html#impl-FromStr-for-f64). (No Advent of
//! Code puzzle has ever hinged on floating-point numbers, but it doesn't hurt to be prepared.)
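//!
//! To get a feel for that format, here is what the standard library's own `f64` parsing accepts.
//! This sketch assumes the `f64` pattern recognizes the same spellings, as the link above
//! suggests; `parse_float` is a hypothetical helper, not part of aoc-parse.
//!
//! ```rust
//! // Parse text with `f64::from_str`, returning `None` if it is not in the format.
//! fn parse_float(text: &str) -> Option<f64> {
//!     text.parse::<f64>().ok()
//! }
//! ```
//!
//! `parse_float("3.25")`, `parse_float("-2.5e3")`, and `parse_float(".5")` all succeed, while
//! `parse_float("1,5")` and `parse_float("")` return `None`.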
//!
//! `bool` - Matches either `true` or `false` and converts it to the corresponding `bool` value.
//!
//! `'x'` or `"hello"` - A Rust character or string, in quotes, is a pattern that matches that
//! exact text only.
//!
//! Exact patterns don't produce a value.
//!
//! <code><var>pattern1 pattern2 pattern3</var>...</code> - Patterns can be concatenated to form
//! larger patterns. This is how `parser!(u64 "x" u64 "x" u64)` matches the string `4x23x21`. It
//! simply matches each subpattern in order. It converts the match to a tuple if there are two or
//! more subpatterns that produce values.
//!
//! <code><var>parser_var</var></code> - You can use previously defined parsers that you've stored
//! in local variables.
//!
//! For example, the `amount` parser below makes use of the `fraction` parser defined on the
//! previous line.
//!
//! ```
//! # use aoc_parse::{parser, prelude::*};
//! let fraction = parser!(i64 "/" u64);
//! let amount = parser!(fraction " tsp");
//!
//! assert_eq!(amount.parse("1/4 tsp").unwrap(), (1, 4));
//! ```
//!
//! An identifier can also refer to a string or character constant.
//!
//! ## Repeating patterns
//!
//! <code><var>pattern</var>*</code> - Any pattern followed by an asterisk matches that pattern
//! zero or more times. It converts the results to a `Vec`. For example, `parser!("A"*)` matches
//! the strings `A`, `AA`, `AAAAAAAAAAAAAA`, and so on, as well as the empty string.
//!
//! <code><var>pattern</var>+</code> - Matches the pattern one or more times, producing a `Vec`.
//! `parser!("A"+)` matches `A`, `AA`, etc., but not the empty string.
//!
//! <code><var>pattern</var>?</code> - Optional pattern, producing a Rust `Option`. For example,
//! `parser!("x=" i32?)` matches `x=123`, producing `Some(123)`; it also matches `x=`, producing
//! the value `None`.
//!
//! These behave just like the `*`, `+`, and `?` special characters in regular expressions.
//!
//! <code>repeat_sep(<var>pattern</var>, <var>separator</var>)</code> - Match the given *pattern*
//! any number of times, separated by the *separator*. This converts only the bits that match
//! *pattern* to Rust values, producing a `Vec`. Any parts of the string matched by *separator* are
//! not converted.
//!
//! ## Matching single characters
//!
//! `alpha`, `alnum`, `upper`, `lower` - Match single characters of various categories. (These use
//! the Unicode categories, even though Advent of Code historically sticks to ASCII.)
//!
//! `digit`, `digit_bin`, `digit_hex` - Match a single ASCII character that's a digit in base 10,
//! base 2, or base 16, respectively. The digit is converted to its numeric value, as a `usize`.
//!
//! `any_char` - Match the next character, no matter what it is (like `.` in a regular expression,
//! except that `any_char` matches newline characters too).
//!
//! <code>char_of(<var>str</var>)</code> - Match the next character if it's one of the characters
//! in *str*. For example, `char_of(">^<v")` matches exactly one character, either `>`, `^`, `<`,
//! or `v`. Returns the index of the character within the list of options (in this case, `0`, `1`,
//! `2`, or `3`).
//!
//! ## Matching multiple characters
//!
//! <code>string(<var>pattern</var>)</code> - Matches the given *pattern*, but instead of
//! converting it to some value, simply return the matched characters as a `String`.
//!
//! By default, `alpha+` returns a `Vec<char>`, and sometimes that is handy in AoC, but often it's
//! better to have it return a `String`.
//!
//! ## Custom conversion
//!
//! <code>... <var>name1</var>:<var>pattern1</var> ... => <var>expr</var></code> - On successfully
//! matching the patterns to the left of `=>`, evaluate the Rust expression *expr* to convert the
//! results to a single Rust value.
//!
//! Use this to convert input to structs. For instance, suppose your puzzle input contains each
//! elf's name and height:
//!
//! ```text
//! Holly=33
//! Ivy=7
//! DouglasFir=1093
//! ```
//!
//! and you'd like to turn this into a vector of `struct Elf` values. The code you need is:
//!
//! ```
//! # use aoc_parse::{parser, prelude::*};
//! struct Elf {
//!     name: String,
//!     height: u32,
//! }
//!
//! let p = parser!(lines(
//!     elf:string(alpha+) '=' ht:u32 => Elf { name: elf, height: ht }
//! ));
//! ```
//!
//! The name `elf` applies to the pattern `string(alpha+)` and the name `ht` applies to the pattern
//! `u32`. The bit after the `=>` is plain old Rust code.
//!
//! The *name*s are in scope only for the following *expr* in the same set of matching parentheses
//! or braces.
//!
//! ## Alternatives
//!
//! <code>{<var>pattern1</var>, <var>pattern2</var>, ...}</code> - Matches any one of the
//! *patterns*. First try matching *pattern1*; if it matches, stop. If not, try *pattern2*, and so
//! on. All the patterns must produce the same type of Rust value.
//!
//! This is sort of like a Rust `match` expression.
//!
//! For example, `parser!({"<" => -1, ">" => 1})` either matches `<`, returning the value `-1`, or
//! matches `>`, returning `1`.
//!
//! Alternatives are handy when you want to convert the input into an enum. For example, my puzzle
//! input for December 23, 2015 was a list of instructions that looked (in part) like this:
//!
//! ```text
//! jie a, +4
//! tpl a
//! inc a
//! jmp +2
//! hlf a
//! jmp -7
//! ```
//!
//! This can be easily parsed into a vector of beautiful enums, like so:
//!
//! ```
//! # use aoc_parse::{parser, prelude::*};
//! enum Reg {
//!     A,
//!     B,
//! }
//!
//! enum Insn {
//!     Hlf(Reg),
//!     Tpl(Reg),
//!     Inc(Reg),
//!     Jmp(isize),
//!     Jie(Reg, isize),
//!     Jio(Reg, isize),
//! }
//!
//! use Reg::*;
//! use Insn::*;
//!
//! let reg = parser!({"a" => A, "b" => B});
//! let p = parser!(lines({
//!     "hlf " r:reg => Hlf(r),
//!     "tpl " r:reg => Tpl(r),
//!     "inc " r:reg => Inc(r),
//!     "jmp " offset:isize => Jmp(offset),
//!     "jie " r:reg ", " offset:isize => Jie(r, offset),
//!     "jio " r:reg ", " offset:isize => Jio(r, offset),
//! }));
//! ```
//!
//! ## Rule sets
//!
//! <code>rule <var>name1</var>: <var>type1</var> = <var>pattern1</var>;</code> - Introduce a
//! "rule", a named subparser.
//!
//! This supports parsing text with nesting parentheses or brackets.
//!
//! ```
//! # use aoc_parse::{parser, prelude::*};
//! enum Formation {
//!     Elf(char),
//!     Stack(Vec<Formation>),
//! }
//!
//! let p = parser!(
//!     // First rule: A "formation" has return type Formation and is either
//!     // a letter or a stack.
//!     rule formation: Formation = {
//!         s:alpha => Formation::Elf(s),
//!         v:stack => Formation::Stack(v),
//!     };
//!
//!     // Second rule: A "stack" is one or more formations, wrapped in
//!     // matching parentheses.
//!     rule stack: Vec<Formation> = '(' v:formation+ ')' => v;
//!
//!     // After all rules, the pattern that .parse() will actually match.
//!     lines(formation+)
//! );
//!
//! assert!(p.parse("px(fo(i)(RR(c)))j(Q)zww\n").is_ok());
//!
//! assert!(p.parse("x(fo))\n").is_err());  // parens not balanced
//! ```
//!
//! Ordinarily `let` suffices for parsers used by other parsers; but `rule` is needed for parsers
//! that refer to themselves or to each other, cyclically, like `formation` and `stack` above.
//! Rust's `let` doesn't support that.
//!
//! Note: Left-recursive grammars don't work, as usual for PEG parsers.
//!
//! ## Lines and sections
//!
//! <code>line(<var>pattern</var>)</code> - Matches a single line of text that matches *pattern*,
//! and the newline at the end of the line.
//!
//! This is like <code>^<var>pattern</var>\n</code> in regular expressions, with two minor
//! differences:
//!
//! -   <code>line(<var>pattern</var>)</code> will only ever match exactly one line of text, even
//!     if *pattern* could match more newlines.
//!
//! -   If your input does not end with a newline, <code>line(<var>pattern</var>)</code> can still
//!     match the non-newline-terminated "line" at the end.
//!
//! `line(string(any_char+))` matches a line of text, strips off the newline character, and returns
//! the rest as a `String`. `line("")` matches a blank line.
//!
//! <code>lines(<var>pattern</var>)</code> - Matches any number of lines of text matching
//! *pattern*. Equivalent to <code>line(<var>pattern</var>)*</code>.
//!
//! ```
//! # use aoc_parse::{parser, prelude::*};
//! let p = parser!(lines(repeat_sep(digit, " ")));
//! assert_eq!(
//!     p.parse("1 2 3\n4 5 6\n").unwrap(),
//!     vec![vec![1, 2, 3], vec![4, 5, 6]],
//! );
//! ```
//!
//! <code>section(<var>pattern</var>)</code> - Matches zero or more nonblank lines, followed by
//! either a blank line or the end of input. The nonblank lines must match *pattern*. For example,
//! `section(lines(u64))` matches a section that's a list of numbers, one per line.
//!
//! It's common for an AoC puzzle input to have several lines of data, then a blank line, and then
//! a different kind of data. You can parse this with
//! <code>section(<var>p1</var>) section(<var>p2</var>)</code>.
//!
//! <code>sections(<var>pattern</var>)</code> - Matches any number of sections matching *pattern*.
//! Equivalent to <code>section(<var>pattern</var>)*</code>.
//!
//! ## Collections
//!
//! <code>hash_set(<var>pattern</var>)</code>, <code>hash_map(<var>pattern</var>)</code>,
//! <code>btree_set(<var>pattern</var>)</code>, <code>btree_map(<var>pattern</var>)</code>,
//! <code>vec_deque(<var>pattern</var>)</code> - These match some text using *pattern*, then put
//! the resulting values in a `HashSet` or other collection.
//!
//! The *pattern* must produce an [iterable](std::iter::IntoIterator) type. These functions work by
//! calling `.into_iter()` on whatever *pattern* produces, then using `.collect()` to produce the
//! new collection.
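//!
//! In plain Rust, that last step looks like the following sketch. `to_hash_set` is a hypothetical
//! stand-in for what `hash_set` does with the values its *pattern* produced; it is not part of
//! aoc-parse.
//!
//! ```rust
//! use std::collections::HashSet;
//!
//! // Collect anything iterable, such as the `Vec` produced by `digit+`, into a `HashSet`.
//! fn to_hash_set<I>(values: I) -> HashSet<I::Item>
//! where
//!     I: IntoIterator,
//!     I::Item: std::hash::Hash + Eq,
//! {
//!     values.into_iter().collect()
//! }
//! ```
//!
//! Duplicates collapse as usual: `to_hash_set(vec![3, 1, 2, 7, 1])` has four elements.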
//!
//! The *pattern* itself needs a `*` or `+`, or something else that makes it match multiple values:
//!
//! ```
//! # use std::collections::HashSet;
//! # use aoc_parse::{parser, prelude::*};
//! let p = parser!(hash_set(digit+));  // <-- note the `+`
//! assert_eq!(p.parse("3127").unwrap(), HashSet::from([1, 2, 3, 7]));
//! ```
//!
//! A map is built from a sequence of pairs:
//!
//! ```
//! # use std::collections::HashMap;
//! # use aoc_parse::{parser, prelude::*};
//! let p = parser!(hash_map(
//!     lines(string(alpha+) ": " any_char)   // <-- this produces a vector of (String, char) pairs
//! ));
//!
//! assert_eq!(
//!     p.parse("Midge: @\nToyler: #\nKnitley: &\n").unwrap(),
//!     HashMap::from([
//!         ("Midge".to_string(), '@'),
//!         ("Toyler".to_string(), '#'),
//!         ("Knitley".to_string(), '&'),
//!     ]),
//! );
//! ```
//!
//! ----
//!
//! Bringing it all together to parse a complex example:
//!
//! ```
//! # use aoc_parse::{parser, prelude::*};
//! let example = "\
//! Wiring Diagram #1:
//! a->q->E->z->J
//! D->f->D
//!
//! Wiring Diagram #2:
//! g->r->f
//! g->B
//! ";
//!
//! let p = parser!(sections(
//!     line("Wiring Diagram #" usize ":")
//!     lines(repeat_sep(alpha, "->"))
//! ));
//! assert_eq!(
//!     p.parse(example).unwrap(),
//!     vec![
//!         (1, vec![vec!['a', 'q', 'E', 'z', 'J'], vec!['D', 'f', 'D']]),
//!         (2, vec![vec!['g', 'r', 'f'], vec!['g', 'B']]),
//!     ],
//! );
//! ```
//!
//! [AoC]: https://adventofcode.com/
//! [example]: https://adventofcode.com/2015/day/2
//! [aoc-runner]: https://lib.rs/crates/aoc-runner

#![deny(missing_docs)]

mod context;
mod error;
#[doc(hidden)]
pub mod macros;
mod parsers;
#[cfg(test)]
mod testing;
mod traits;
mod types;
mod util;

pub use context::{ParseContext, Reported};
pub use error::ParseError;
use error::Result;
pub use traits::{ParseIter, Parser};

/// A giant sack of toys and goodies to import along with `parser!`.
///
/// The `parser!()` macro will work fine without this, so you can explicitly
/// import the names you want to use instead of doing `use aoc_parse::{parser,
/// prelude::*};`.
///
/// This includes some constants that have the same name as a built-in Rust
/// type: `i32`, `usize`, `bool`, and so on. There's no conflict because Rust
/// types and constants live in separate namespaces.
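///
/// A standalone sketch of why that works. The constant below is hypothetical
/// and merely borrows the name `u8`; it is not how aoc-parse defines its
/// parsers.
///
/// ```rust
/// // A value named `u8`. It lives in the value namespace.
/// #[allow(non_upper_case_globals)]
/// const u8: &str = "a constant named u8";
///
/// // Uses `u8` as a type and as a value side by side.
/// fn type_and_value() -> (u8, &'static str) {
///     let byte: u8 = 200; // type position: the built-in integer type
///     (byte, u8) // expression position: the constant above
/// }
/// ```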
pub mod prelude {
    pub use crate::traits::Parser;

    pub use crate::util::aoc_parse;

    pub use crate::parsers::{
        alnum, alpha, any_char, big_int, big_int_bin, big_int_hex, big_uint, big_uint_bin,
        big_uint_hex, bool, btree_map, btree_set, char_of, digit, digit_bin, digit_hex, f32, f64,
        hash_map, hash_set, i128, i128_bin, i128_hex, i16, i16_bin, i16_hex, i32, i32_bin, i32_hex,
        i64, i64_bin, i64_hex, i8, i8_bin, i8_hex, isize, isize_bin, isize_hex, lower, u128,
        u128_bin, u128_hex, u16, u16_bin, u16_hex, u32, u32_bin, u32_hex, u64, u64_bin, u64_hex,
        u8, u8_bin, u8_hex, upper, usize, usize_bin, usize_hex, vec_deque,
    };

    pub use crate::parsers::{line, lines, repeat_sep, section, sections};

    /// Parse using `parser`, but instead of converting the matched text to a
    /// Rust value, simply return it as a `String`.
    ///
    /// By default, `parser!(alpha+)` returns a `Vec<char>`, and sometimes that
    /// is handy in AoC, but often it's better to have it return a `String`.
    /// That can be done with `parser!(string(alpha+))`.
    pub fn string<P: Parser>(parser: P) -> crate::parsers::StringParser<P> {
        crate::parsers::StringParser { parser }
    }
}
465        crate::parsers::StringParser { parser }
466    }
467}