Crate inpt

Inpt is a derive crate for dumb type-level text parsing.

§Introduction

Imagine you need to chop up an annoying string and convert all the bits to useful types. You could write that sort of code by hand using split and from_str, but the boilerplate of unwrapping and checking quickly loses all charm, especially since that sort of parsing shows up a lot in timed programming competitions like Advent of Code.

Inpt tries to write that sort of parsing code for you, automatically splitting input strings based on field types and an optional regex. Inpt is absolutely not performant, strict, or formal. Whenever possible, it does the obvious thing:

#[inpt::main]
fn main(x: f32, y: f32) {
    println!("{}", x * y);
}
$ echo '6,7' | cargo run
42
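For comparison, here is a rough sketch of the hand-written code that the macro example stands in for, using only the standard library. The helper name parse_pair is hypothetical and not part of inpt, and the error strings are made up for illustration.

```rust
/// Hand-written stand-in for the `x: f32, y: f32` example:
/// split on a comma, then parse both halves with `FromStr`.
/// `parse_pair` is a hypothetical helper, not an inpt API.
fn parse_pair(input: &str) -> Result<(f32, f32), String> {
    let (x, y) = input
        .trim()
        .split_once(',')
        .ok_or_else(|| format!("expected `x,y`, got {input:?}"))?;
    // Every field needs its own parse call and its own error conversion.
    let x = x.trim().parse::<f32>().map_err(|e| e.to_string())?;
    let y = y.trim().parse::<f32>().map_err(|e| e.to_string())?;
    Ok((x, y))
}
```

Calling parse_pair("6,7\n") yields the pair (6.0, 7.0); each additional field adds another split, another parse, and another error path, which is the boilerplate the derive macro is meant to remove.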

§Example

use inpt::{Inpt, inpt};

#[derive(Inpt)]
#[inpt(regex = r"(.)=([-\d]+)\.\.([-\d]+),?")]
struct Axis {
    name: char,
    start: i32,
    end: i32,
}

#[derive(Inpt)]
#[inpt(regex = "target area:")]
struct Target {
    #[inpt(after)]
    axes: Vec<Axis>,
}

impl Target {
    fn area(&self) -> i32 {
        self.axes.iter().map(|Axis { start, end, ..}| end - start).product()
    }
}


let target = inpt::<Target>("target area: x=119..176, y=-114..84").unwrap();
assert_eq!(target.area(), 11286);

§Struct Syntax

The Inpt derive macro can do a few neat tricks, listed here. In its default setting, the fields of the struct are parsed in order, with each field consuming as much of the input as possible before moving on:

#[derive(Inpt, Debug, PartialEq)]
struct OrderedFields<'s>(char, i32, &'s str);

assert_eq!(
    inpt::<OrderedFields>("A113 is a classroom").unwrap(),
    OrderedFields('A', 113, "is a classroom"),
)

This behavior is also implemented for arrays, tuples, and a number of collection types.
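To make "each field consumes as much as possible" concrete, the following sketch parses the same OrderedFields input by hand. It is only an illustration of the described behavior under simplifying assumptions (ASCII digits, no sign handling), not inpt's actual implementation; take_prefix and parse_ordered are hypothetical names.

```rust
/// Split `input` into the longest prefix whose characters satisfy `keep`
/// and the remainder. Hypothetical helper for this sketch.
fn take_prefix(input: &str, keep: impl Fn(char) -> bool) -> (&str, &str) {
    let end = input.find(|c: char| !keep(c)).unwrap_or(input.len());
    input.split_at(end)
}

/// Parse `(char, i32, &str)` in field order, skipping whitespace between fields.
fn parse_ordered(input: &str) -> Option<(char, i32, &str)> {
    // Field 1: a single character.
    let input = input.trim_start();
    let letter = input.chars().next()?;
    let rest = &input[letter.len_utf8()..];

    // Field 2: as many digits as possible (this sketch ignores signs).
    let (digits, rest) = take_prefix(rest.trim_start(), |c| c.is_ascii_digit());
    let number = digits.parse::<i32>().ok()?;

    // Field 3: a string slice takes everything that is left.
    Some((letter, number, rest.trim()))
}
```

With the input "A113 is a classroom" this returns ('A', 113, "is a classroom"), matching the assertion above.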

§regex

When the #[inpt(regex = r".*")] struct attribute is given, the fields are no longer parsed one after another. Instead, the regex is matched against the remaining input, and the fields are parsed from the regex’s numbered capture groups. I recommend giving regexes as raw strings to avoid double-escapes and quoting.

#[derive(Inpt, Debug, PartialEq)]
#[inpt(regex = r"(.*) number ([a-zA-Z])(\d+)")]
struct RegexFields<'s>(&'s str, char, i32);

assert_eq!(
    inpt::<RegexFields>("classroom number A113").unwrap(),
    RegexFields("classroom", 'A', 113),
)

Ungreedy/lazy repetitions can be very useful when splitting inputs. Like rewriting a while loop as an until loop, a regex ([^!]*)! can be rewritten as (.*?)!. This is particularly helpful when we want to stop after finding multiple characters, like the 3 quotes that end a multi-line string in Python or Julia: """(.*?)""".
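The difference between lazy and greedy capture can be approximated without a regex engine: a lazy group followed by a delimiter stops at the first occurrence of the delimiter, while a greedy one runs to the last. The sketch below uses str::find and str::rfind as an analogy only; until_first and until_last are hypothetical names, and the analogy ignores that `.` does not match newlines by default.

```rust
/// Lazy analogue of `(.*?)<delim>`: capture up to the FIRST occurrence.
fn until_first<'a>(input: &'a str, delim: &str) -> Option<&'a str> {
    input.find(delim).map(|i| &input[..i])
}

/// Greedy analogue of `(.*)<delim>`: capture up to the LAST occurrence.
fn until_last<'a>(input: &'a str, delim: &str) -> Option<&'a str> {
    input.rfind(delim).map(|i| &input[..i])
}
```

For the input "a! b! c!" and the delimiter "!", until_first gives "a" and until_last gives "a! b! c". Because the delimiter is a string rather than a single character, the same functions work for multi-character terminators such as three quotes.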

Be aware that when such a regex is used multiple times to parse a sequence of fields, the last regex match is forced to parse all remaining input, even if normally lazy:

#[derive(Inpt, Debug, PartialEq)]
#[inpt(regex = r"(.+?),")]
struct Part<'s>(&'s str);

assert_eq!(
    inpt::<[Part; 3]>("my,list,of,many,words,").unwrap(),
    [Part("my"), Part("list"), Part("of,many,words")],
)
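The observable result of the [Part; 3] example can be reproduced by hand as follows. This is a behavioral sketch only, assuming comma-terminated parts as in the example; split_parts is a hypothetical helper, and corner cases such as empty parts are not modeled.

```rust
/// Split `input` into exactly `count` comma-terminated parts.
/// All parts but the last stop at the first comma; the last part is
/// stretched over whatever remains, minus the closing comma.
fn split_parts(input: &str, count: usize) -> Option<Vec<&str>> {
    let mut parts = Vec::with_capacity(count);
    let mut rest = input;
    for i in 0..count {
        let part = if i + 1 < count {
            // Not the last element: stop lazily at the first comma.
            let (head, tail) = rest.split_once(',')?;
            rest = tail;
            head
        } else {
            // Last element: forced to cover all remaining input.
            rest.strip_suffix(',')?
        };
        parts.push(part);
    }
    Some(parts)
}
```

Asking for 3 parts of "my,list,of,many,words," gives ["my", "list", "of,many,words"], while asking for 5 parts gives one word per part.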

§from, try_from

When the #[inpt(from = "T")] or #[inpt(try_from = "T")] struct attributes are given, T is parsed instead of the struct itself, and the From or TryFrom traits are used to convert.

use std::{error::Error, fmt};

use inpt::split::{Group, Line};

#[derive(Inpt)]
#[inpt(try_from = "Group<Vec<Line<Vec<T>>>>")]
struct Grid<T> {
    width: usize,
    table: Vec<T>,
}

#[derive(Debug)]
struct UnevenGridError;
impl fmt::Display for UnevenGridError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("grid rows must have even length")
    }
}
impl Error for UnevenGridError {}

impl<'s, T> TryFrom<Group<Vec<Line<Vec<T>>>>> for Grid<T> {
    type Error = UnevenGridError;

    fn try_from(Group { inner: lines }: Group<Vec<Line<Vec<T>>>>)
            -> Result<Self, Self::Error>
    {
        let mut width = None;
        let mut table = Vec::new();
        for Line { inner: mut line } in lines {
            width = match width {
                Some(w) if w == line.len() => Some(w),
                Some(_) => return Err(UnevenGridError),
                None => Some(line.len()),
            };
            table.append(&mut line);
        }
        Ok(Grid {
            width: width.ok_or(UnevenGridError)?,
            table,
        })
    }
}

assert_eq!(inpt::<Grid<char>>("##\n##").unwrap().width, 2);

§skip

The #[inpt(skip)] field attribute can be used to ignore fields when parsing and instead insert their Default::default().

§option

If a capture group corresponds to a field with type Option, the field will be set to None when the group is not captured by the match, rather than producing an error.

#[derive(Inpt, Debug, PartialEq)]
#[inpt(regex = r"(.*) letter ([a-zA-Z])(\d+)?")]
struct RegexFields<'s>(&'s str, char, Option<i32>);

assert_eq!(
    inpt::<RegexFields>("classroom letter A").unwrap(),
    RegexFields("classroom", 'A', None),
)
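A hand-written counterpart shows where the None comes from: the trailing digits are simply allowed to be absent. This sketch assumes the input shape from the example and is not how inpt works internally; parse_room is a hypothetical name, and rsplit_once is used to imitate the greedy leading group.

```rust
/// Parse "<label> letter <L><digits?>" by hand.
/// Missing digits yield `None`; malformed digits make the whole parse fail.
fn parse_room(input: &str) -> Option<(&str, char, Option<i32>)> {
    // Greedy `(.*)`: split at the LAST occurrence of the literal text.
    let (label, rest) = input.rsplit_once(" letter ")?;
    let mut chars = rest.chars();
    let letter = chars.next().filter(|c| c.is_ascii_alphabetic())?;
    let digits = chars.as_str();
    let number = if digits.is_empty() {
        None // the optional group did not take part in the match
    } else {
        Some(digits.parse::<i32>().ok()?)
    };
    Some((label, letter, number))
}
```

Here "classroom letter A" produces ("classroom", 'A', None) and "classroom letter A113" produces ("classroom", 'A', Some(113)).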

§before, after

Any fields marked with the #[inpt(before)] attribute will be parsed sequentially, consuming input prior to matching the given regex. After the regex is matched, remaining input is consumed by any fields marked #[inpt(after)]. When such a field is present, the regex is no longer forced to consume all remaining input, so a lazy regex like the one in the regex section above behaves lazily again.

#[derive(Inpt, Debug, PartialEq)]
#[inpt(regex = r"is a")]
struct RegexFields<'s>(
    #[inpt(before)] char,
    #[inpt(before)] i32,
    #[inpt(after)] &'s str,
);

assert_eq!(
    inpt::<RegexFields>("A113 is a classroom").unwrap(),
    RegexFields('A', 113, "classroom"),
)
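The order of operations described here (fields before, then the regex, then fields after) can be sketched by hand. The code below assumes the literal separator follows the leading fields directly, which may be stricter than the real regex matching; parse_around is a hypothetical name.

```rust
/// Parse `<char><digits> is a <rest>` in three phases.
fn parse_around(input: &str) -> Option<(char, i32, &str)> {
    // Phase 1: the "before" fields, parsed in order.
    let input = input.trim_start();
    let letter = input.chars().next()?;
    let rest = &input[letter.len_utf8()..];
    let digits_end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    let number = rest[..digits_end].parse::<i32>().ok()?;

    // Phase 2: the separator itself (a literal here, a regex in inpt).
    let rest = rest[digits_end..].trim_start().strip_prefix("is a")?;

    // Phase 3: the "after" field receives whatever is left.
    Some((letter, number, rest.trim()))
}
```

For "A113 is a classroom" this yields ('A', 113, "classroom"), the same value as the derive-based example.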

§bounds

By default, the derive macro adds T: Inpt<'s> bounds to every parsed field of a struct, as well as a Self: 's bound. This greatly improves error messages and the ergonomics of generic structs. However, it is sometimes necessary to replace those automatic bounds entirely. If you ever get “error[E0275]: overflow evaluating the requirement `T: Inpt<'_>`”, try solving it with a #[inpt(bounds = "")] attribute.

use inpt::InptError;

#[derive(Inpt)]
#[inpt(regex = "(.)(.+)?")]
#[inpt(bounds = "")]
struct Recursive(char, Option<Box<Recursive>>);

let chars: Recursive = inpt("abc").unwrap();
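To make the shape of the parsed value concrete, here is a hand-written recursive parser for the same type. It assumes the regex behaves as it reads (first character, then an optional non-empty tail that is parsed recursively) and is not generated or used by inpt; parse_recursive is a hypothetical name.

```rust
#[derive(Debug, PartialEq)]
struct Recursive(char, Option<Box<Recursive>>);

/// Hand-written analogue of `(.)(.+)?` applied recursively.
fn parse_recursive(input: &str) -> Option<Recursive> {
    let mut chars = input.chars();
    let head = chars.next()?; // group 1: exactly one character
    let tail = chars.as_str(); // group 2: everything else, if any
    let rest = if tail.is_empty() {
        None
    } else {
        Some(Box::new(parse_recursive(tail)?))
    };
    Some(Recursive(head, rest))
}
```

Under that assumption, "abc" becomes a three-element chain: 'a' pointing to 'b' pointing to 'c', with None at the end.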

§from_str

Although Rust integers and strings all implement the Inpt trait, some types can only be parsed using FromStr. The derive macro can be told to use a type’s FromStr implementation with the #[inpt(from_str)] field attribute. Because the from_str function consumes an entire string instead of chopping off just the beginning, the attribute can only be placed on the last field of a struct, or on fields receiving regex capture groups.

use std::net::IpAddr;

#[derive(Inpt, Debug, PartialEq)]
#[inpt(regex = r"route from (\S+) to")]
struct Routing {
    #[inpt(from_str)]
    from: IpAddr,
    #[inpt(from_str, after)]
    to: IpAddr,
}

let route: Routing = inpt("route from 192.168.1.2 to 127.0.0.1").unwrap();
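The restriction on from_str follows from how FromStr behaves: it accepts or rejects a whole string and cannot stop early. The sketch below parses the same route by hand with the standard library only; parse_route is a hypothetical helper, and the fixed " to " separator is an assumption taken from the example input.

```rust
use std::net::IpAddr;

/// Parse "route from <ip> to <ip>" by isolating each address first,
/// because `FromStr` must be handed exactly the text of one address.
fn parse_route(input: &str) -> Option<(IpAddr, IpAddr)> {
    let rest = input.trim().strip_prefix("route from ")?;
    let (from, to) = rest.split_once(" to ")?;
    let from = from.trim().parse::<IpAddr>().ok()?;
    let to = to.trim().parse::<IpAddr>().ok()?;
    Some((from, to))
}
```

Note that "127.0.0.1 extra".parse::<IpAddr>() is an error: the standard library parser does not consume a prefix and leave the rest, which is why the text has to be cut up by a regex group or be the final field.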

§from_iter

It is quite easy to repeatedly parse a type, either by using Vec’s own Inpt implementation or by parsing and then collecting an InptIter. This can also be accessed inside the derive macro using the #[inpt(from_iter = "T")] field attribute, which calls into FromIterator<T>. The item type has to be specified because some collections can be built from multiple different item types (e.g. String can be collected from an iterator of char, an iterator of &str, or an iterator of String). Like the from_str attribute, the from_iter attribute consumes an entire string and so must appear at the end of the struct, or otherwise parse a regex capture group.

use std::collections::HashMap;

#[derive(Inpt, Debug, PartialEq)]
struct Rooms {
    #[inpt(from_iter = "(char, u32)")]
    letter_to_number: HashMap<char, u32>,
}

assert_eq!(
    inpt::<Rooms>("B5 A113 F111").unwrap().letter_to_number,
    [('A', 113), ('B', 5), ('F', 111)].into_iter().collect::<HashMap<_, _>>(),
)
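The ambiguity that forces the item type to be spelled out is visible in the standard library itself. In this small sketch (function names are hypothetical), the same target type String is collected from three different item types, and a HashMap is collected from key-value tuples as in the Rooms example.

```rust
use std::collections::HashMap;

// `String` implements `FromIterator` for several item types, so the
// collection type alone does not say what should be parsed.
fn from_chars(items: Vec<char>) -> String {
    items.into_iter().collect()
}

fn from_strs(items: Vec<&str>) -> String {
    items.into_iter().collect()
}

fn from_strings(items: Vec<String>) -> String {
    items.into_iter().collect()
}

// `HashMap<K, V>` is built from `(K, V)` tuples.
fn rooms(items: Vec<(char, u32)>) -> HashMap<char, u32> {
    items.into_iter().collect()
}
```

All three String functions produce "hi" when given 'h' and 'i', "h" and "i", or their owned equivalents, so only the annotation can tell a parser which item type to read.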

§trim

By default, inpt trims all whitespace between fields. However, some types implement more specific trimming rules. For example, all number types additionally trim adjacent commas and semicolons:

assert_eq!(
    inpt::<Vec<i32>>("1,2;3 4").unwrap(),
    vec![1, 2, 3, 4],
)

Users of this crate can specify characters to trim with the #[inpt(trim = r"\s")] struct attribute. The attribute syntax is the same as for regex character classes including ranges, negation, intersection, and unicode class names.

#[derive(Inpt)]
#[inpt(trim = r"\p{Punctuation}")]
struct Sentence<'s>(&'s str);

assert_eq!(
    inpt::<Sentence>("¡I love regexes 💕!").unwrap().0,
    "I love regexes 💕",
)

The trim attribute is also available on fields. In this case, the attribute will forcibly override the trimming behavior of the field’s immediate type. This works particularly well with the from_iter attribute.

#[derive(Inpt)]
struct PhoneNumber {
    #[inpt(from_iter = "u32", trim = r"+\-()\s")]
    digits: Vec<u32>,
}

assert_eq!(
    inpt::<PhoneNumber>("+(1)(425) 555-0100").unwrap().digits,
    vec![1, 425, 555, 0100],
)
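Both trimming examples can be approximated by hand by treating the trimmable characters as separators between numbers. This is a simplification of what trimming means in inpt, shown only to illustrate the effect; parse_numbers and its extra parameter are hypothetical.

```rust
/// Parse every number in `input`, skipping whitespace and any character
/// listed in `extra` (the stand-in for a custom trim class).
fn parse_numbers(input: &str, extra: &[char]) -> Option<Vec<u32>> {
    input
        .split(|c: char| c.is_whitespace() || extra.contains(&c))
        .filter(|piece| !piece.is_empty()) // adjacent separators leave gaps
        .map(|piece| piece.parse::<u32>().ok())
        .collect() // any failed piece turns the whole result into `None`
}
```

With extra set to comma and semicolon, "1,2;3 4" gives [1, 2, 3, 4]. With extra set to '+', '-', '(' and ')', the phone number "+(1)(425) 555-0100" gives [1, 425, 555, 100], since the text "0100" parses to the integer 100.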

Trimming can be broadly disabled by setting trim = "" on a wrapper struct (e.g. NoTrim), as the default trimmable character class is inherited by types deeper in the parse tree.

§split

Sometimes a whole regex is overkill to separate fields, and you only need some kind of delimiter. The wrapper types in inpt::split accomplish exactly this: they stop consuming input as soon as the corresponding delimiter is reached. The field attribute #[inpt(split = "T")] is used to parse a field as if it were wrapped in the given type.

#[derive(Inpt, Debug, PartialEq)]
struct Request<'s> {
    #[inpt(split = "Line")]
    method: &'s str,
    body: &'s str,
}

assert_eq!(
    inpt::<Request>("
         PUT
         crabs are perfect animals
    ").unwrap(),
    Request {
        method: "PUT",
        body: "crabs are perfect animals",
    },
)
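The effect of splitting on a line boundary can be written by hand as below. The sketch assumes that surrounding whitespace is trimmed from each piece, as in the example, and does not reflect how the inpt::split wrappers are implemented; parse_request is a hypothetical name.

```rust
/// Split the input at the first line break: the first line is the method,
/// everything after it is the body.
fn parse_request(input: &str) -> Option<(&str, &str)> {
    let (method, body) = input.trim().split_once('\n')?;
    Some((method.trim(), body.trim()))
}
```

For the indented two-line input used above, this returns ("PUT", "crabs are perfect animals").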

§Enum Syntax

Structs and enums support all the same attributes, listed above. But the process of parsing an enum is somewhat different. Inpt will attempt to parse each variant, returning the first that is successfully parsed.

#[derive(Inpt)]
enum Math {
    #[inpt(regex = r"(.*)\+(.*)")]
    Add(f64, f64),
    #[inpt(regex = r"(.*)\*(.*)")]
    Mul(f64, f64),
}

impl Math {
    fn solve(self) -> f64 {
        match self {
            Math::Add(a, b) => a + b,
            Math::Mul(a, b) => a * b,
        }
    }
}

assert_eq!(inpt::<Math>("2.6+5.0").unwrap().solve(), 7.6);
assert_eq!(inpt::<Math>("2.6*5.0").unwrap().solve(), 13.0);

§enum regex

Although a #[inpt(regex = r".*")] attribute is not required on every variant, it is strongly encouraged. Without a regex to pick the correct set of fields, inpt has to guess-and-check each variant individually. Not only can this cause parsing cost to explode exponentially, it also makes bugs and errors almost impossible to track down.

When a regex is specified:

  • if an error occurs before the regex match, the next variant may be tried
  • if the regex does not match, the next variant is always tried
  • if an error occurs inside a capture group or after the regex match, an error is immediately produced
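The last two rules can be modeled by hand with a result type that separates "this variant does not apply" from "this variant applies but its contents are invalid". The sketch below is an illustration of those rules for the Math example, not inpt's implementation; try_binary and parse_math are hypothetical names, the error strings are invented, and the first rule (errors before the regex match) is not modeled.

```rust
#[derive(Debug, PartialEq)]
enum Math {
    Add(f64, f64),
    Mul(f64, f64),
}

/// Try one variant.
/// `Ok(None)`: the operator is absent, so the next variant should be tried.
/// `Err(..)`:  the operator matched but an operand is invalid, so stop here.
fn try_binary(input: &str, op: char) -> Result<Option<(f64, f64)>, String> {
    // Greedy `(.*)` before the operator: split at its last occurrence.
    let (lhs, rhs) = match input.rsplit_once(op) {
        Some(halves) => halves,
        None => return Ok(None),
    };
    let parse = |s: &str| {
        s.trim()
            .parse::<f64>()
            .map_err(|e| format!("bad operand {s:?}: {e}"))
    };
    Ok(Some((parse(lhs)?, parse(rhs)?)))
}

/// Try the variants in declaration order.
fn parse_math(input: &str) -> Result<Math, String> {
    if let Some((a, b)) = try_binary(input, '+')? {
        return Ok(Math::Add(a, b));
    }
    if let Some((a, b)) = try_binary(input, '*')? {
        return Ok(Math::Mul(a, b));
    }
    Err(format!("no variant matched {input:?}"))
}
```

An input like "2.5+x" fails immediately inside the Add variant instead of falling through to Mul, which keeps the reported error close to the real problem.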

§Main

Although inpt can be used with any source of text, it is most common to parse stdin and report errors on stderr. The #[inpt::main] attribute macro is built to facilitate this. Applied to a function, it works exactly like #[derive(Inpt)] except arguments behave like fields, and the function as a whole behaves like a struct. The created function will have the same name, visibility, and return type, but will parse stdin instead of receiving arguments.

#[inpt::main(regex = r"(?:my name is|i am) (.+)")]
fn main(name: &'static str) {
    println!("hello {name}!");
}

If stdin cannot be parsed, the cause of the error is clearly reported by error::InptError::annotated_stderr:

$ echo 'call me sam' | cargo run --example hello
INPT ERROR in stdin:1:1
<hello::main::Arguments>< /(?:my name is|i am) (.+)/ >call me sam</regex></hello::main::Arguments>
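A hand-rolled version of the same flow (read stdin, parse, print the greeting or report the failure on stderr) might look like the sketch below. It uses only the standard library, anchors the two prefixes at the start of the input as a simplification of the regex, and prints a plain message rather than inpt's annotated output; parse_name and run are hypothetical names.

```rust
use std::io::{self, Read};

/// Extract the name from "my name is <name>" or "i am <name>".
fn parse_name(input: &str) -> Result<&str, String> {
    let input = input.trim();
    ["my name is ", "i am "]
        .iter()
        .find_map(|prefix| input.strip_prefix(*prefix))
        .filter(|name| !name.is_empty())
        .ok_or_else(|| format!("could not parse {input:?}"))
}

/// Read all of stdin, then greet or report the error.
/// Returns a process exit code.
fn run() -> i32 {
    let mut text = String::new();
    if let Err(err) = io::stdin().read_to_string(&mut text) {
        eprintln!("could not read stdin: {err}");
        return 1;
    }
    match parse_name(&text) {
        Ok(name) => {
            println!("hello {name}!");
            0
        }
        Err(err) => {
            eprintln!("{err}");
            1
        }
    }
}
```

A main function would then only need to call std::process::exit(run()).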

Note that lifetime elision does not currently work, so all borrows must use either 'static or a generic lifetime.

Modules§

  • split: Wrapper types used to split up input in common ways.

Macros§

Structs§

  • A class of characters, as defined by char_class! and used for trimming.
  • Provides information about parsing errors.
  • Lazily parse input as a sequence of type T.
  • The output of a single parsing step.
  • A const regex used internally by Inpt.
  • Broadly disables the default whitespace trimming on the inner type.
  • Prevents infinite parse trees.

Constants§

  • The “\s” character class, used by default to trim types during parsing.

Traits§

  • The core parsing trait.
  • An extension trait for InptResult, used to provide error context.

Functions§

  • The point of this crate. Parse T from the given string.
  • Parse T from stdin and print any errors on stderr.
  • Parse T from the beginning of the given string.

Type Aliases§

Attribute Macros§

  • main: Apply to a main function to parse stdin.

Derive Macros§

  • Inpt: Apply to a struct so that it can be parsed.