Inpt is a derive crate for dumb type-level text parsing.
§Introduction
Imagine you need to chop up an annoying string and convert all the bits to useful types. You could write that sort of code by hand using split and from_str, but the boilerplate of unwrapping and checking quickly loses all charm, especially since that sort of parsing shows up a lot in timed programming competitions like Advent of Code.
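For comparison, here is a rough sketch of what the hand-written version might look like for an input such as 6,7, using only the standard library. The name parse_pair is made up for this illustration and is not part of inpt.

```rust
/// Hand-written parsing of two comma-separated floats, e.g. "6,7".
/// `parse_pair` is a hypothetical helper, not an inpt API.
fn parse_pair(input: &str) -> Option<(f32, f32)> {
    let mut parts = input.trim().split(',');
    // Every step needs its own check: is the piece there, and does it parse?
    let x = parts.next()?.trim().parse::<f32>().ok()?;
    let y = parts.next()?.trim().parse::<f32>().ok()?;
    // Reject trailing pieces such as the "8" in "6,7,8".
    if parts.next().is_some() {
        return None;
    }
    Some((x, y))
}
```

Calling parse_pair("6,7") yields Some((6.0, 7.0)), and every additional field adds another line of the same checking.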
Inpt tries to write that sort of parsing code for you, automatically splitting input strings based on field types and an optional regex. Inpt is absolutely not performant, strict, or formal. Whenever possible, it does the obvious thing:
#[inpt::main]
fn main(x: f32, y: f32) {
    println!("{}", x * y);
}
$ echo '6,7' | cargo run
42
§Example
use inpt::{Inpt, inpt};
#[derive(Inpt)]
#[inpt(regex = r"(.)=([-\d]+)\.\.([-\d]+),?")]
struct Axis {
    name: char,
    start: i32,
    end: i32,
}
#[derive(Inpt)]
#[inpt(regex = "target area:")]
struct Target {
    #[inpt(after)]
    axes: Vec<Axis>,
}
impl Target {
    fn area(&self) -> i32 {
        self.axes.iter().map(|Axis { start, end, .. }| end - start).product()
    }
}
let target = inpt::<Target>("target area: x=119..176, y=-114..84").unwrap();
assert_eq!(target.area(), 11286);
§Struct Syntax
The Inpt derive macro can do a few neat tricks, listed here. In its default setting, the fields of the struct are parsed in order, with each field consuming as much of the input as possible before moving on:
#[derive(Inpt, Debug, PartialEq)]
struct OrderedFields<'s>(char, i32, &'s str);
assert_eq!(
    inpt::<OrderedFields>("A113 is a classroom").unwrap(),
    OrderedFields('A', 113, "is a classroom"),
)
This behavior is also implemented for arrays, tuples, and a number of collection types.
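To make the default ordering concrete, the following standard-library sketch approximates by hand what parsing (char, i32, &str) in order means for the input above. It is only an illustration: parse_ordered is a hypothetical name, and the code the macro actually generates is not shown in this documentation.

```rust
/// Approximates parsing a `char`, then an `i32`, then the remaining `&str`.
/// `parse_ordered` is a hypothetical helper, not an inpt API.
fn parse_ordered(input: &str) -> Option<(char, i32, &str)> {
    let input = input.trim_start();
    // First field: a single character.
    let letter = input.chars().next()?;
    let rest = input[letter.len_utf8()..].trim_start();
    // Second field: take as many digits as possible before moving on.
    let digits_len = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    let number = rest[..digits_len].parse::<i32>().ok()?;
    // Last field: whatever is left, with surrounding whitespace trimmed.
    Some((letter, number, rest[digits_len..].trim()))
}
```

For "A113 is a classroom" this returns Some(('A', 113, "is a classroom")), matching the derived example.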
§regex
When the #[inpt(regex = r".*")] struct attribute is given, the fields are no longer parsed one after another. Instead, the regex is matched against the remaining input, and the fields are parsed from the regex’s numbered capture groups. I recommend that regexes be given as raw strings to avoid double-escapes and quoting.
#[derive(Inpt, Debug, PartialEq)]
#[inpt(regex = r"(.*) number ([a-zA-Z])(\d+)")]
struct RegexFields<'s>(&'s str, char, i32);
assert_eq!(
    inpt::<RegexFields>("classroom number A113").unwrap(),
    RegexFields("classroom", 'A', 113),
)
Ungreedy/lazy repetitions can be very useful when splitting inputs. Like rewriting a while loop as an until loop, a regex ([^!]*)! can be rewritten as (.*?)!. This is particularly helpful when we want to stop after finding multiple characters, like the 3 quotes that end a multi-line string in Python or Julia: """(.*?)""".
Be aware that when such a regex is used multiple times to parse a sequence of fields, the last regex match is forced to parse all remaining input, even if normally lazy:
#[derive(Inpt, Debug, PartialEq)]
#[inpt(regex = r"(.+?),")]
struct Part<'s>(&'s str);
assert_eq!(
    inpt::<[Part; 3]>("my,list,of,many,words,").unwrap(),
    [Part("my"), Part("list"), Part("of,many,words")],
)
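The difference between a lazy and a greedy capture can be mimicked without any regex engine. The sketch below is only an analogy built on str::find and str::rfind; the helper names capture_lazy and capture_greedy are invented for this illustration and say nothing about how inpt matches internally.

```rust
/// Lazy: the capture ends at the FIRST closing delimiter,
/// like the regex `"""(.*?)"""`. Hypothetical helper.
fn capture_lazy<'a>(input: &'a str, delim: &str) -> Option<&'a str> {
    let start = input.find(delim)? + delim.len();
    let len = input[start..].find(delim)?;
    Some(&input[start..start + len])
}

/// Greedy: the capture runs to the LAST closing delimiter,
/// like the regex `"""(.*)"""`. Hypothetical helper.
fn capture_greedy<'a>(input: &'a str, delim: &str) -> Option<&'a str> {
    let start = input.find(delim)? + delim.len();
    let len = input[start..].rfind(delim)?;
    Some(&input[start..start + len])
}
```

Given two triple-quoted strings on one line, capture_lazy returns only the contents of the first one, while capture_greedy swallows everything up to the final closing quotes.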
§from, try_from
When the #[inpt(from = "T")] or #[inpt(try_from = "T")] struct attributes are given, T is parsed instead of the struct itself, and the From or TryFrom traits are used to convert.
use std::error::Error;
use std::fmt;
use inpt::split::{Group, Line};
#[derive(Inpt)]
#[inpt(try_from = "Group<Vec<Line<Vec<T>>>>")]
struct Grid<T> {
    width: usize,
    table: Vec<T>,
}
#[derive(Debug)]
struct UnevenGridError;
impl fmt::Display for UnevenGridError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("grid rows must all have the same length")
    }
}
impl Error for UnevenGridError {}
impl<'s, T> TryFrom<Group<Vec<Line<Vec<T>>>>> for Grid<T> {
    type Error = UnevenGridError;
    fn try_from(Group { inner: lines }: Group<Vec<Line<Vec<T>>>>)
        -> Result<Self, Self::Error>
    {
        // The first row fixes the width; every later row must match it.
        let mut width = None;
        let mut table = Vec::new();
        for Line { inner: mut line } in lines {
            width = match width {
                Some(w) if w == line.len() => Some(w),
                Some(_) => return Err(UnevenGridError),
                None => Some(line.len()),
            };
            table.append(&mut line);
        }
        Ok(Grid {
            // An input with no rows at all is also rejected.
            width: width.ok_or(UnevenGridError)?,
            table,
        })
    }
}
assert_eq!(inpt::<Grid<char>>("##\n##").unwrap().width, 2);
§skip
The #[inpt(skip)] field attribute can be used to ignore fields when parsing and instead insert their Default::default() value.
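As a hand-written analogy (not the macro's real output), skipping a field amounts to leaving it out of the parsing step and filling it in from Default. The types Room and Visits and the function parse_room are all hypothetical names chosen for this sketch.

```rust
/// A field type that is not read from the input, but has a sensible default.
#[derive(Debug, Default, PartialEq)]
struct Visits(u32);

#[derive(Debug, PartialEq)]
struct Room {
    number: u32,
    // Stands in for a field that would be marked #[inpt(skip)].
    visits: Visits,
}

/// Only `number` is read from the input; `visits` is never looked at.
fn parse_room(input: &str) -> Option<Room> {
    Some(Room {
        number: input.trim().parse::<u32>().ok()?,
        visits: Default::default(),
    })
}
```

Here parse_room("113") produces a Room whose visits field is Visits(0), regardless of the input text.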
§option
If a capture group corresponds to a field with type Option, the field will be set to None when the group is not captured by the match, rather than producing an error.
#[derive(Inpt, Debug, PartialEq)]
#[inpt(regex = r"(.*) letter ([a-zA-Z])(\d+)?")]
struct RegexFields<'s>(&'s str, char, Option<i32>);
assert_eq!(
    inpt::<RegexFields>("classroom letter A").unwrap(),
    RegexFields("classroom", 'A', None),
)
§before, after
Any fields marked with the #[inpt(before)] attribute will be parsed sequentially, consuming input prior to matching the given regex. After the regex is matched, remaining input is consumed by any fields marked #[inpt(after)]. Having such a field lets the regex behave lazily again, unlike the final match in the repeated Part example in the regex section, which was forced to consume all remaining input.
#[derive(Inpt, Debug, PartialEq)]
#[inpt(regex = r"is a")]
struct RegexFields<'s>(
    #[inpt(before)] char,
    #[inpt(before)] i32,
    #[inpt(after)] &'s str,
);
assert_eq!(
    inpt::<RegexFields>("A113 is a classroom").unwrap(),
    RegexFields('A', 113, "classroom"),
)
§bounds
By default the derive macro adds T: Inpt<'s> bounds to every parsed field of a struct, as well as a Self: 's bound. This greatly improves error messages and the ergonomics of generic structs. However, it is sometimes necessary to replace those automatic bounds entirely. If you ever get “error[E0275]: overflow evaluating the requirement `T: Inpt<'_>`”, try solving it with a #[inpt(bounds = "")] attribute.
use inpt::InptError;
#[derive(Inpt)]
#[inpt(regex = "(.)(.+)?")]
#[inpt(bounds = "")]
struct Recursive(char, Option<Box<Recursive>>);
let chars: Recursive = inpt("abc").unwrap();
§from_str
Although Rust integers and strings all implement the Inpt trait, some types can only be parsed using FromStr. The derive macro can be told to use a type’s FromStr implementation with the #[inpt(from_str)] field attribute.
Because the from_str function consumes an entire string instead of chopping off just the beginning, the attribute can only be placed on the last field of a struct, or on fields receiving regex capture groups.
use std::net::IpAddr;
#[derive(Inpt, Debug, PartialEq)]
#[inpt(regex = r"route from (\S+) to")]
struct Routing {
    #[inpt(from_str)]
    from: IpAddr,
    #[inpt(from_str, after)]
    to: IpAddr,
}
let route: Routing = inpt("route from 192.168.1.2 to 127.0.0.1").unwrap();
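The placement restriction follows from how FromStr itself behaves: it accepts or rejects the whole string and has no way to report "parsed this much, here is the rest". A small standard-library check, with whole_input_as_ip as a hypothetical helper name, makes that visible.

```rust
use std::net::IpAddr;

/// Succeeds only if the WHOLE input is an IP address.
/// `whole_input_as_ip` is a hypothetical helper, not an inpt API.
fn whole_input_as_ip(input: &str) -> Option<IpAddr> {
    // FromStr (via `str::parse`) does not skip or return trailing text.
    input.parse::<IpAddr>().ok()
}
```

The string "127.0.0.1" parses, but "192.168.1.2 to 127.0.0.1" does not, which is why the regex capture group (\S+) is needed to isolate the first address in the example above.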
§from_iter
It is quite easy to repeatedly parse a type, either by using Vec’s own inpt implementation, or parsing then collecting an InptIter. This can also be accessed inside the derive macro using the #[inpt(from_iter = "T")] field attribute, which calls into FromIterator<T>.
The item type has to be specified because some collections can be built from multiple different item types (e.g. String can be collected from an iterator of char, an iterator of &str, or an iterator of String).
Like the from_str attribute, the from_iter attribute consumes an entire string and so must appear at the end of the struct, or otherwise parse a regex capture group.
use std::collections::HashMap;
#[derive(Inpt, Debug, PartialEq)]
struct Rooms {
    #[inpt(from_iter = "(char, u32)")]
    letter_to_number: HashMap<char, u32>,
}
assert_eq!(
    inpt::<Rooms>("B5 A113 F111").unwrap().letter_to_number,
    [('A', 113), ('B', 5), ('F', 111)].into_iter().collect::<HashMap<_, _>>(),
)
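The reason the item type must be spelled out can be seen with the standard library alone. In this sketch (collect_three_ways is a made-up name) the same String target is collected from three different item types, so the target type by itself does not determine what to parse.

```rust
/// Builds the same `String` from three different item types.
/// `collect_three_ways` is a hypothetical helper for illustration.
fn collect_three_ways() -> (String, String, String) {
    // FromIterator<char>
    let from_chars: String = vec!['i', 'n', 'p', 't'].into_iter().collect();
    // FromIterator<&str>
    let from_strs: String = vec!["in", "pt"].into_iter().collect();
    // FromIterator<String>
    let from_strings: String = vec![String::from("in"), String::from("pt")]
        .into_iter()
        .collect();
    (from_chars, from_strs, from_strings)
}
```

All three results are equal, which is exactly the ambiguity that from_iter = "T" resolves.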
§trim
By default, inpt trims all whitespace between fields. However, some types implement more specific trimming rules. For example, all number types additionally trim adjacent commas and semicolons:
assert_eq!(
    inpt::<Vec<i32>>("1,2;3 4").unwrap(),
    vec![1, 2, 3, 4],
)
Users of this crate can specify characters to trim with the #[inpt(trim = r"\s")] struct attribute. The attribute syntax is the same as for regex character classes, including ranges, negation, intersection, and unicode class names.
#[derive(Inpt)]
#[inpt(trim = r"\p{Punctuation}")]
struct Sentence<'s>(&'s str);
assert_eq!(
    inpt::<Sentence>("¡I love regexes 💕!").unwrap().0,
    "I love regexes 💕",
)
The trim attribute is also available on fields. In this case, the attribute will forcibly override the trimming behavior of the field’s immediate type. This works particularly well with the from_iter attribute.
#[derive(Inpt)]
struct PhoneNumber {
    #[inpt(from_iter = "u32", trim = r"+\-()\s")]
    digits: Vec<u32>,
}
assert_eq!(
    inpt::<PhoneNumber>("+(1)(425) 555-0100").unwrap().digits,
    // The Rust literal 0100 is decimal 100, the same value "0100" parses to.
    vec![1, 425, 555, 0100],
)
Trimming can be broadly disabled by setting trim = "" on a wrapper struct (e.g. NoTrim), as the default trimmable character class is inherited by types deeper in the parse tree.
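The effect of a trim class on a sequence of numbers can be approximated by hand: treat every character in the class as a separator, drop the empty pieces, and parse what remains. This is a standard-library sketch under that assumption, and parse_separated, number_trim, and phone_trim are hypothetical names, not inpt items.

```rust
/// Split `input` on every character for which `is_trimmed` is true,
/// drop empty pieces, and parse the rest. Hypothetical helper.
fn parse_separated(input: &str, is_trimmed: fn(char) -> bool) -> Option<Vec<u32>> {
    input
        .split(is_trimmed)
        .filter(|piece| !piece.is_empty())
        .map(|piece| piece.parse::<u32>().ok())
        .collect()
}

/// Whitespace plus commas and semicolons, as described for number types.
fn number_trim(c: char) -> bool {
    c.is_whitespace() || c == ',' || c == ';'
}

/// Roughly the class r"+\-()\s" from the phone number example.
fn phone_trim(c: char) -> bool {
    c.is_whitespace() || "+-()".contains(c)
}
```

With these, parse_separated("1,2;3 4", number_trim) gives Some(vec![1, 2, 3, 4]), and the phone number input yields 1, 425, 555 and 100.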
§split
Sometimes a whole regex is overkill to separate fields, and you only need some kind of delimiter. The wrapper types in inpt::split accomplish exactly this: they stop consuming input as soon as the corresponding delimiter is reached. The field attribute #[inpt(split = "T")] is used to parse a field as if it were wrapped in the given type.
#[derive(Inpt, Debug, PartialEq)]
struct Request<'s> {
    #[inpt(split = "Line")]
    method: &'s str,
    body: &'s str,
}
assert_eq!(
    inpt::<Request>("
PUT
crabs are perfect animals
").unwrap(),
    Request {
        method: "PUT",
        body: "crabs are perfect animals",
    },
)
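For intuition, splitting on a line delimiter resembles the following hand-written code, which stops the first field at the first newline and hands the rest to the second field. It is a sketch using str::split_once from the standard library; parse_request is a hypothetical name and the exact trimming rules of inpt's Line wrapper are assumed, not confirmed.

```rust
/// The first field ends at the first newline; the second takes the rest.
/// `parse_request` is a hypothetical helper, not an inpt API.
fn parse_request(input: &str) -> Option<(&str, &str)> {
    // Leading whitespace (including the initial newline) is trimmed first.
    let (method, body) = input.trim_start().split_once('\n')?;
    Some((method.trim(), body.trim()))
}
```

Applied to the input above, this returns Some(("PUT", "crabs are perfect animals")).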
§Enum Syntax
Structs and enums support all the same attributes, listed above. But the process of parsing an enum is somewhat different. Inpt will attempt to parse each variant, returning the first that is successfully parsed.
#[derive(Inpt)]
enum Math {
    #[inpt(regex = r"(.*)\+(.*)")]
    Add(f64, f64),
    #[inpt(regex = r"(.*)\*(.*)")]
    Mul(f64, f64),
}
impl Math {
    fn solve(self) -> f64 {
        match self {
            Math::Add(a, b) => a + b,
            Math::Mul(a, b) => a * b,
        }
    }
}
assert_eq!(inpt::<Math>("2.6+5.0").unwrap().solve(), 7.6);
assert_eq!(inpt::<Math>("2.6*5.0").unwrap().solve(), 13.0);
§enum regex
Although a #[inpt(regex = r".*")] attribute is not required on every variant, it is strongly encouraged. Without a regex to pick the correct set of fields, inpt has to guess-and-check each individually. Not only can this cause parsing cost to explode exponentially, it makes bugs and errors almost impossible to track down.
When a regex is specified:
- if an error occurs before the regex match, the next variant may be tried
- if the regex does not match, the next variant is always tried
- if an error occurs inside a capture group or after the regex match, an error is immediately produced
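The second and third rules above can be modelled by hand with a two-level result: an outer Option for "did this variant's pattern match at all" and an inner Result for "did its fields parse". The sketch below uses a plain operator search in place of a regex, and try_operator, parse_operands, and parse_math are hypothetical names; it illustrates the control flow only, not inpt's implementation.

```rust
use std::num::ParseFloatError;

#[derive(Debug, PartialEq)]
enum Math {
    Add(f64, f64),
    Mul(f64, f64),
}

/// Parse both operands; any failure here is a hard error.
fn parse_operands(lhs: &str, rhs: &str) -> Result<(f64, f64), ParseFloatError> {
    Ok((lhs.trim().parse::<f64>()?, rhs.trim().parse::<f64>()?))
}

/// `None`: the operator is absent, so the next variant may be tried.
/// `Some(Err(_))`: the operator matched but an operand failed to parse.
fn try_operator(input: &str, op: char) -> Option<Result<(f64, f64), ParseFloatError>> {
    let (lhs, rhs) = input.split_once(op)?;
    Some(parse_operands(lhs, rhs))
}

/// Try each variant in declaration order, committing to the first match.
fn parse_math(input: &str) -> Option<Result<Math, ParseFloatError>> {
    if let Some(result) = try_operator(input, '+') {
        // Matched: return success or the error, without trying `Mul`.
        return Some(result.map(|(a, b)| Math::Add(a, b)));
    }
    if let Some(result) = try_operator(input, '*') {
        return Some(result.map(|(a, b)| Math::Mul(a, b)));
    }
    None
}
```

An input like "x+5.0" therefore reports the operand error from the Add variant immediately instead of silently falling through to Mul, which is what keeps errors traceable.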
§Main
Although inpt can be used with any source of text, it is most common to parse stdin and report errors on stderr. The #[inpt::main] attribute macro is built to facilitate this. Applied to a function, it works exactly like #[derive(Inpt)], except that arguments behave like fields, and the function as a whole behaves like a struct. The created function will have the same name, visibility, and return type, but will parse stdin instead of receiving arguments.
#[inpt::main(regex = r"(?:my name is|i am) (.+)")]
fn main(name: &'static str) {
    println!("hello {name}!");
}
If stdin cannot be parsed, the cause of the error is clearly reported by error::InptError::annotated_stderr:
$ echo 'call me sam' | cargo run --example hello
INPT ERROR in stdin:1:1
<hello::main::Arguments>< /(?:my name is|i am) (.+)/ >call me sam</regex></hello::main::Arguments>
Note that lifetime elision does not currently work, so all borrows must use either 'static or a generic lifetime.
Modules§
- Wrapper types used to split up input in common ways.
Macros§
- Parse a regex character class, and return an instance of CharClass.
Structs§
- A class of characters, as defined by char_class! and used for trimming.
- Provides information about parsing errors.
- Lazily parse input as a sequence of type T.
- The output of a single parsing step.
- A const regex used internally by Inpt.
- Broadly disables the default whitespace trimming on the inner type.
- Prevents infinite parse trees.
Constants§
- The “\s” character class, used by default to trim types during parsing.
Traits§
- The core parsing trait.
- An extension trait for InptResult, used to provide error context.
Functions§
- The point of this crate. Parse T from the given string.
- Parse T from stdin and print any errors on stderr.
- Parse T from the beginning of the given string.
Type Aliases§
- The result type of parsing T.
Attribute Macros§
- Apply to a main function to parse stdin.
Derive Macros§
- Apply to a struct so that it can be parsed.