Crate pidgin

This crate provides a library for generating efficient regular expressions that represent a non-recursive grammar, along with a mechanism for building a parse tree from the capturing groups in the expression. It uses the regex crate as its parsing engine.
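
This page does not say how pidgin makes its expressions efficient. As an illustration only, here is a minimal sketch of one common technique: factoring shared prefixes out of a list of alternatives with a trie. The names Trie, escape and factored_alternation are hypothetical and are not part of pidgin's API; the sketch uses only the standard library.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper, not part of pidgin: a trie over the characters
/// of the alternatives we want to match.
#[derive(Default)]
struct Trie {
    children: BTreeMap<char, Trie>,
    /// True if one of the inserted words ends at this node.
    terminal: bool,
}

/// Backslash-escape the characters that are special in regex syntax.
fn escape(c: char) -> String {
    if "\\.+*?()|[]{}^$".contains(c) {
        format!("\\{}", c)
    } else {
        c.to_string()
    }
}

impl Trie {
    fn insert(&mut self, word: &str) {
        let mut node = self;
        for c in word.chars() {
            node = node.children.entry(c).or_default();
        }
        node.terminal = true;
    }

    /// Render this subtree as a regex fragment in which every shared
    /// prefix appears only once.
    fn to_regex(&self) -> String {
        let mut alts: Vec<String> = self
            .children
            .iter()
            .map(|(c, child)| format!("{}{}", escape(*c), child.to_regex()))
            .collect();
        match (alts.len(), self.terminal) {
            // A leaf: nothing left to match.
            (0, _) => String::new(),
            // A single continuation needs no grouping.
            (1, false) => alts.pop().unwrap(),
            // Several continuations, one of which must match.
            (_, false) => format!("(?:{})", alts.join("|")),
            // A word may end here, so whatever follows is optional.
            (_, true) => format!("(?:{})?", alts.join("|")),
        }
    }
}

/// Build a single factored pattern from a list of literal alternatives.
fn factored_alternation(words: &[&str]) -> String {
    let mut trie = Trie::default();
    for w in words {
        trie.insert(w);
    }
    trie.to_regex()
}
```

With this sketch, the alternatives "Fri" and "Friday" collapse to Fri(?:day)? instead of Fri|Friday, so the regex engine tries the shared prefix once rather than once per branch.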

Usage

This crate is on crates.io and can be used by adding pidgin to your dependencies in your project’s Cargo.toml.

[dependencies]
pidgin = "0.2.0"

and this to your crate root:

#[macro_use]
extern crate pidgin;

Example: find a date

let date = grammar!{
    (?ibB)

    date -> <weekday> (",") <month> <monthday> (",") <year>
    date -> <month> <monthday> | <weekday> | <monthday> <month> <year>
    date -> <month> <monthday> (",") <year>
    date -> <numeric_date>

    numeric_date -> <year> ("/") <numeric_month> ("/") <numeric_day>
    numeric_date -> <year> ("-") <numeric_month> ("-") <numeric_day>
    numeric_date -> <numeric_month> ("/") <numeric_day> ("/") <year>
    numeric_date -> <numeric_month> ("-") <numeric_day> ("-") <year>
    numeric_date -> <numeric_day> ("/") <numeric_month> ("/") <year>
    numeric_date -> <numeric_day> ("-") <numeric_month> ("-") <year>

    year    => r(r"\b[12][0-9]{3}|[0-9]{2}\b")
    weekday => [
            "Sunday Monday Tuesday Wednesday Thursday Friday Saturday"
                .split(" ")
                .into_iter()
                .flat_map(|s| vec![s, &s[0..2], &s[0..3]])
                .collect::<Vec<_>>()
        ]
    weekday     => (?-i) [["M", "T", "W", "R", "F", "S", "U"]]
    monthday    => [(1..=31).into_iter().collect::<Vec<_>>()]
    numeric_day => [
            (1..=31)
                .into_iter()
                .flat_map(|i| vec![i.to_string(), format!("{:02}", i)])
                .collect::<Vec<_>>()
        ]
    month => [
        vec![
            "January",
            "February",
            "March",
            "April",
            "May",
            "June",
            "July",
            "August",
            "September",
            "October",
            "November",
            "December",
        ].into_iter().flat_map(|s| vec![s, &s[0..3]]).collect::<Vec<_>>()
      ]
    numeric_month => [
            (1..=12)
                .into_iter()
                .flat_map(|i| vec![i.to_string(), format!("{:02}", i)])
                .collect::<Vec<_>>()
        ]
};
let matcher = date.matcher().unwrap();

// we let whitespace vary
assert!(matcher.is_match(" June   6,    1969 "));
// we made it case-insensitive
assert!(matcher.is_match("june 6, 1969"));
// but we want to respect word boundaries
assert!(!matcher.is_match("jejune 6, 1969"));
// we can inspect the parse tree
let m = matcher.parse("2018/10/6").unwrap();
assert!(m.name("numeric_date").is_some());
assert_eq!(m.name("year").unwrap().as_str(), "2018");
let m = matcher.parse("Friday").unwrap();
assert!(!m.name("numeric_date").is_some());
assert!(m.name("weekday").is_some());
// still more crazy things we allow
assert!(matcher.is_match("F"));
assert!(matcher.is_match("friday"));
assert!(matcher.is_match("Fri"));
// but we said single-letter days had to be capitalized
assert!(!matcher.is_match("f"));

The grammar! macro is the raison d'être of pidgin. It gives you a Grammar, which can be used in three ways: it can be embedded in other Grammars via the g(grammar) element, it can serve as a library of Grammars via the rule method, or, via its matcher method, it can give you a Matcher object that parses a string into a Match parse tree.
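
To make the idea of a parse tree with lookup by rule name concrete, here is a simplified, std-only model of what the example above inspects with m.name("year"). It is a sketch under stated assumptions, not pidgin's implementation: Node, leaf and sample_tree are hypothetical names, and the tree is built by hand for the input "2018/10/6" rather than produced by a Matcher.

```rust
/// Hypothetical, simplified stand-in for a parse-tree node. Each node
/// records the rule that produced it, the text it covers, and its
/// sub-matches.
#[derive(Debug)]
struct Node<'t> {
    rule: &'static str,
    text: &'t str,
    children: Vec<Node<'t>>,
}

impl<'t> Node<'t> {
    /// Depth-first search for the first node, starting with this one,
    /// that was produced by the rule called `rule`.
    fn name(&self, rule: &str) -> Option<&Node<'t>> {
        if self.rule == rule {
            return Some(self);
        }
        self.children.iter().find_map(|child| child.name(rule))
    }

    /// The matched text.
    fn as_str(&self) -> &'t str {
        self.text
    }
}

/// Build a node with no sub-matches.
fn leaf<'t>(rule: &'static str, text: &'t str) -> Node<'t> {
    Node { rule, text, children: Vec::new() }
}

/// Hand-built tree for "2018/10/6", shaped like the rule
/// numeric_date -> <year> ("/") <numeric_month> ("/") <numeric_day>.
fn sample_tree() -> Node<'static> {
    const INPUT: &str = "2018/10/6";
    let numeric_date = Node {
        rule: "numeric_date",
        text: INPUT,
        children: vec![
            leaf("year", &INPUT[0..4]),
            leaf("numeric_month", &INPUT[5..7]),
            leaf("numeric_day", &INPUT[8..]),
        ],
    };
    Node { rule: "date", text: INPUT, children: vec![numeric_date] }
}
```

In this model, looking up a rule that took no part in the parse, such as "weekday", returns None, which mirrors the is_some checks in the example above.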

Macros

grammar
Compiles a Grammar.

Structs

Grammar
A compiled collection of rules ready for the building of a Matcher or for use in the definition of a new rule.
Match
This is a node in a parse tree. It is functionally similar to regex::Match, in fact providing much the same API, but unlike a regex::Match, a pidgin::Match always corresponds to some rule, knows which rule that is, and records any sub-matches involved in its parsing.
Matcher
This is functionally equivalent to a Regex: you can use it repeatedly to search a string. It cannot itself be used directly to split strings, but its regular expression is public and may be used that way. It improves on regular expressions in that the Match object it returns is the root node of a parse tree, so its matches preserve parse structure.