aoc-parse 0.2.17

A little library for parsing your Advent of Code puzzle input
Documentation
The basic design tensions here are as follows:

-   "Macros reduce visual clutter" vs. "Macros are opaque and angrymaking".

    I decided very early on to use macros to the fullest. This might have been
    a mistake.

-   "Core parsers should be type-ignorant" vs. "We want `=>` to smoothly
    integrate the parser DSL with Rust code".

    It would be great if the parser-language didn't have to know about types,
    e.g. `lines(int)` parses anything matching `/(-?(0|[1-9]\d+)\n)*/` and
    then if you want an i64 out of that, it's easy to get. This would make
    the parser-language less tied to Rust.

    However `=>` ruins all that, and I think it's indispensible. So I chose to
    make the parser-language as strongly typed as Rust stuff usually is, which
    has been a difficult road.

-   "Parser types should follow the lead of Rust's `Iterator` and `Future` traits"
    vs. "these types are insane can we chill".
    
    So far I've followed the lead of `Iterator`. Lifetimes make it all a little
    weird.

-   Standard library: oodles of built-in parsers and combinators vs.
    learnability.

----

I designed a language first, and decided to figure out how to implement it
later.

The problem is that it's clearly not implementable as designed. It requires
some very clever type inference that I don't know how to do; you'd have to do
lots of magic at macro time and generate type-expressions that do the rest of
the work at typeck time.

Now I'm not totally sure what to do. I think I might be able to do somewhat
better but I'm not sure.

There are lots of puzzles.

- how to figure out the types of labeled expressions so that `=>` works right

- how to create structs at run time

Also I kept making decisions and then going back on them.

-   I decided to keep the types super dynamic, but then I couldn't figure out
    how to do `=>`, almost at all, and just abandoned that and implemented an
    incredibly static system that's hard to maintain.

Types must be known at macro time because many computations, including almost
anything to do with tuples, are impossible in the type system (without
specialization).

Now, to do it right I think the macro has to have a database of all the types
of everything. But it can't, because of `=>` (the type is not inferred until
run time). There is therefore literally no way to get that type and reason
about it.

I will try making a list of the problems.


## Types and special cases

The type parsed by a string-literal, like `parser!("foo")`, should be `()`.

Any time an optional phrase would parse as `Option<()>`, it should instead use
`bool`.

Any time a repeated phrase would parse as `Vec<char>`, it should instead use
`String`.

It's hard to implement rules like this statically in Rust due to the lack of
specialization. I plan to make most parsers return a tuple type; maybe others
can be made to return a special non-tuple type to indicate that they have
special behavior when modified with `?` or `*`.

(In the end I decided against this, in part because `Vec<char>` has ergonomic
advantages for the programmer over `String` -- random access to characters!
Crazy.)


## Macro-expand time vs type-check time

A tension emerged between what can be known at macro-expand time vs. what can
be known at typeck time.

**Macros can't know types,** because I let users define local variables and
functions outside of `parser!` and then use them inside `parser!`.

Suppose we have `const SP: char = ' ';`. Then `parser!(u32 ' ' u32)` and
`parser!(u32 SP u32)` expand to effectively the same parser. But have an Output
type of `(u32, u32)`.

But consider how `=>` has to be expanded. If we add labels and a mapper,

    let p1 = parser!((x: u32) ' ' (y: u32) => Point(x, y));
    let p2 = parser!((x: u32) SP (y: u32) => Point(x, y));

this currently expands to something like

    let p1 = parser!(u32 ' ' u32).map(|(x, y)| Point(x, y));
    let p2 = parser!(u32 SP u32).map(|(x, _, y)| Point(x, y));

This is a bug.

The problem is exactly the DWIM that we do in ignoring exact-match parts of the
pattern when building the output type.

To fix this bug, I need a dumber SequenceParser.

    let p1 = obtuse_sequence((u32, ' ', u32)).map(|(x, _, y)| Point(x, y));
    let p2 = obtuse_sequence((u32, SP, u32)).map(|(x, _, y)| Point(x, y));

Note that the ordinary behavior 




I would like `parser!(term1 ... termN)` to compile to

`map(seq(term1, seq(..., termN)), TupleMapperN)`

where `TupleMapperN` is a type that can map `(T1, (T2, (T3, ... TN)))` to `(T1,
T2, ..., TN)`. But there is a problem which is this doesn't take into account
exact-string and exact-char parsers which should generate nothing.



## The environment

A parser-expression occurs in an environment that might include

-   Built-in functions and parsers (`lines`, `uint`)

-   Labeled expressions (`(j: uint) => j + 1`)

-   User-defined parsers (`rule`)

This interacts with types and it gets tricky.

Questions:

-   Is the set of built-in parsers open (you can define your own parsers in
    Rust and simply import them into the module where your `parser!()` is) or
    closed (you get my built-ins and that's it). After all it's a toy language.

-   Suppose a `parser!()` is inside a function. Are local variables in scope
    for the `=>` mapping-expressions in that parser? It's possible to
    rule that out by wrapping those expressions, or all the generated code for
    the parser, in a `fn` or `mod`.

    If they are in scope, are they `move`? Lifetimes make it difficult to
    support capturing locals by reference.


## Timing of mappers

Are mapping-expressions (`=>`) evaluated during parsing or only after a
successful parse? A parser that can construct output on the fly can be faster;
we claim not to care about speed, though.

One factor here is backtracking. If we evaluate during parsing, that means
possibly evaluating many times, and certainly means evaluating in some cases
where parsing will ultimately fail. This exposes internal details of the parser
as user-visible behavior.


## Timing and types

There are three times types might be figured out:

1.  At macro-expand time
2.  At type-check time ("statically")
3.  At run time ("dynamically")

My original plan was to do it all dynamically. This sounds easy, and it
involves giving up things that feel OK to give up:

-   Compile-time type checking isn't possible if you never expose types to the
    compiler. But that's OK because I can generate better error messages at run
    time anyway.

-   It might be slow, but that's OK.

But...

The catch is `=>`. There's no ad hoc polymorphism in Rust, nor reflection, nor
any dynamic analogue to types (except `std::any::TypeId`). Thus, in a parser like

```
parser!((x: alpha+) " " (y: uint) => (y, x))
```

there is no way to put the expression `(y, x)` into Rust code, and feed it
values `x` and `y`, without the Rust code somehow knowing the types of `alpha`
and `uint`, or arranging things so that they can be inferred somehow.

This seems very difficult. Many expressions in rust, including maybe all method
calls, require the type to be known a priori. Getting the mapper to know the
type, if we don't already know it at macro-expand time, seems impossible.

I don't know how often I need to know the types of things.

Certainly it is necessary to know them...

1.  When passing a labeled value to a mapper, e.g.

    ```
    parser!((x: banana) " " (y: gorilla) => f(x, y))
    ```
    
    Here it is necessary to know the type parsed by `banana`
    in order to build the closure `|x: Food, y: Animal| f(x, y)`.
    
    BUT perhaps we can solve this by skipping the annotations when we don't
    know: `|x, y: Animal| f(x, y)`. This often works, as in this case where
    Rust can infer the type of `x` from the type of `f`. When it doesn't work,
    users can add an annotation, like:
    
    ```
    parser!((x: banana as Food) " " (y: gorilla) => f(x, y))
    ```

    or maybe:
    
    ```
    parser!(
        (x: banana) " " (y: gorilla) => {
            let x: Food = x;
            f(x, y)
        }
    )
    ```

2.  When we convert from static types to dynamic types, in order to attach type
    information (more than just a type-id) to the dynamic value.

    BUT perhaps it would be possible in this situation to convert to serde's
    data model instead.

A special case arises where the output of one `=>` is used as the input to another,
as in `parser!((x:uint => x<<q)

The solution may be simply to look them up on demand.

## Ways to implement `=>`

Take the example above:

```
parser!((x: banana) " " (y: gorilla) => f(x, y))
```

What could this possibly expand to? Keep in mind there's no `type_of` macro in
Rust, so we have to stick to constructs where Rust's type inference will work.

```
    banana
        .and_then(parser!(" "))
        .and_then(gorilla)
        .map(|x, _, y| f(x, y))
```

```
    SeqParser((banana, parser!(" "), gorilla))
        .map(|(x, _, y)| f(x, y))
```

```
    SequenceParser::new(banana, SequenceParser::new(parser!(" "), gorilla))
        .map(|(x, ((), y))| f(x, y))
```

The last seems most likely, although we will probably rearrange the types
later. This means we need a macro to expand a parser-expression to a pattern of
the right type, like `(x, ((), y))`.