This crate provides a library for generating efficient regular expressions that
represent a non-recursive grammar, along with a mechanism to build a parse tree
from the capturing groups in the expression. It uses the regex crate for its
parsing engine.
Usage
This crate is on crates.io and can be used by adding pidgin to your dependencies in your project’s Cargo.toml.
[dependencies]
pidgin = "0.2.0"
and this to your crate root:
#[macro_use]
extern crate pidgin;
Example: find a date
let date = grammar!{
    (?ibB)
    date -> <weekday> (",") <month> <monthday> (",") <year>
    date -> <month> <monthday> | <weekday> | <monthday> <month> <year>
    date -> <month> <monthday> (",") <year>
    date -> <numeric_date>
    numeric_date -> <year> ("/") <numeric_month> ("/") <numeric_day>
    numeric_date -> <year> ("-") <numeric_month> ("-") <numeric_day>
    numeric_date -> <numeric_month> ("/") <numeric_day> ("/") <year>
    numeric_date -> <numeric_month> ("-") <numeric_day> ("-") <year>
    numeric_date -> <numeric_day> ("/") <numeric_month> ("/") <year>
    numeric_date -> <numeric_day> ("-") <numeric_month> ("-") <year>
    year => r(r"\b[12][0-9]{3}|[0-9]{2}\b")
    weekday => [
        "Sunday Monday Tuesday Wednesday Thursday Friday Saturday"
            .split(" ")
            .into_iter()
            .flat_map(|s| vec![s, &s[0..2], &s[0..3]])
            .collect::<Vec<_>>()
    ]
    weekday => (?-i) [["M", "T", "W", "R", "F", "S", "U"]]
    monthday => [(1..=31).into_iter().collect::<Vec<_>>()]
    numeric_day => [
        (1..=31)
            .into_iter()
            .flat_map(|i| vec![i.to_string(), format!("{:02}", i)])
            .collect::<Vec<_>>()
    ]
    month => [
        vec![
            "January",
            "February",
            "March",
            "April",
            "May",
            "June",
            "July",
            "August",
            "September",
            "October",
            "November",
            "December",
        ].into_iter().flat_map(|s| vec![s, &s[0..3]]).collect::<Vec<_>>()
    ]
    numeric_month => [
        (1..=12)
            .into_iter()
            .flat_map(|i| vec![i.to_string(), format!("{:02}", i)])
            .collect::<Vec<_>>()
    ]
};
let matcher = date.matcher().unwrap();
// we let whitespace vary
assert!(matcher.is_match(" June 6, 1969 "));
// we made it case-insensitive
assert!(matcher.is_match("june 6, 1969"));
// but we want to respect word boundaries
assert!(!matcher.is_match("jejune 6, 1969"));
// we can inspect the parse tree
let m = matcher.parse("2018/10/6").unwrap();
assert!(m.name("numeric_date").is_some());
assert_eq!(m.name("year").unwrap().as_str(), "2018");
let m = matcher.parse("Friday").unwrap();
assert!(m.name("numeric_date").is_none());
assert!(m.name("weekday").is_some());
// still more crazy things we allow
assert!(matcher.is_match("F"));
assert!(matcher.is_match("friday"));
assert!(matcher.is_match("Fri"));
// but we said single-letter days had to be capitalized
assert!(!matcher.is_match("f"));
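The list-valued rules in the example above are built with ordinary iterator code before the grammar sees them. The sketch below uses only the standard library to show what two of those expressions evaluate to. The function names weekday_alternates and numeric_day_alternates are hypothetical helpers introduced for illustration and are not part of pidgin.

```rust
/// Expands each weekday name into the forms the `weekday` rule lists:
/// the full name followed by its two- and three-letter prefixes.
/// (Hypothetical helper, not part of pidgin.)
fn weekday_alternates() -> Vec<&'static str> {
    "Sunday Monday Tuesday Wednesday Thursday Friday Saturday"
        .split(' ')
        .flat_map(|s| vec![s, &s[0..2], &s[0..3]])
        .collect()
}

/// Expands 1..=31 into the plain form and the zero-padded two-digit form.
/// For 10 and above the two forms are identical, so the list holds duplicates.
/// (Hypothetical helper, not part of pidgin.)
fn numeric_day_alternates() -> Vec<String> {
    (1..=31)
        .flat_map(|i| vec![i.to_string(), format!("{:02}", i)])
        .collect()
}
```

Calling weekday_alternates() yields 21 strings beginning with "Sunday", "Su", "Sun", and numeric_day_alternates() begins with "1", "01", "2", "02". This is why both "Fri" and "Friday" match in the assertions above.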
The grammar! macro is the raison d’être of pidgin. It gives you a Grammar, which
can be used in three ways: it can be embedded in other Grammars via the
g(grammar) element, it can serve as a library of Grammars via the rule method,
or via its matcher method it can give you a Matcher object, which parses a
string to produce a Match parse tree.
Macros
Structs
Grammar
A compiled collection of rules ready for the building of a Matcher or for use in the definition of a new rule.
Match
This is a node in a parse tree. It is functionally similar to regex::Match, in fact providing much the same API, but unlike a regex::Match a pidgin::Match always corresponds to some rule, it knows what rule it corresponds to, and it records any sub-matches involved in its parsing.
Matcher
This is functionally equivalent to a Regex: you can use it repeatedly to search a string. It cannot itself be used directly to split strings, but its regular expression is public and may be so used. It improves on regular expressions in that the Match object it returns is the root node in a parse tree, so its matches preserve parse structure.
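To make the idea of a match that knows its rule and records its sub-matches concrete, here is a minimal standard-library sketch of such a tree. The Node type, its fields, and sample_tree are hypothetical stand-ins written for illustration. They are not pidgin's Match implementation, and the depth-first lookup order is an assumption.

```rust
/// Hypothetical stand-in for a parse-tree node; not pidgin's actual `Match`.
#[derive(Debug)]
struct Node<'t> {
    rule: &'static str,      // the rule this node corresponds to
    text: &'t str,           // the slice of input the rule matched
    children: Vec<Node<'t>>, // sub-matches recorded during parsing
}

impl<'t> Node<'t> {
    /// Depth-first search for the first node (including `self`)
    /// that was produced by the rule called `rule`.
    fn name(&self, rule: &str) -> Option<&Node<'t>> {
        if self.rule == rule {
            return Some(self);
        }
        self.children.iter().find_map(|child| child.name(rule))
    }

    /// The matched text, mirroring the `as_str` accessor used in the example.
    fn as_str(&self) -> &'t str {
        self.text
    }
}

/// Builds by hand the kind of tree one might expect for "2018/10/6".
fn sample_tree() -> Node<'static> {
    let input: &'static str = "2018/10/6";
    Node {
        rule: "date",
        text: input,
        children: vec![Node {
            rule: "numeric_date",
            text: input,
            children: vec![
                Node { rule: "year", text: &input[0..4], children: vec![] },
                Node { rule: "numeric_month", text: &input[5..7], children: vec![] },
                Node { rule: "numeric_day", text: &input[8..9], children: vec![] },
            ],
        }],
    }
}
```

With this tree, sample_tree().name("year") finds the nested year node even though it is a grandchild of the root, while name("weekday") returns None because no such rule took part in the parse.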