Crate pidgin

This crate provides a library for generating efficient regular expressions programmatically that can represent a simple, non-recursive grammar. It uses the regex crate for its parsing engine.
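To give a feel for what "efficient" can mean when a regular expression is generated from a word list, here is a minimal sketch that factors shared prefixes through a character trie. It is an illustration only and is not taken from pidgin's source: the names `Trie`, `escape`, and `factor` are hypothetical, and the crate's actual algorithm may differ.

```rust
use std::collections::BTreeMap;

/// One node of a character trie; `end` marks that a word stops here.
#[derive(Default)]
struct Trie {
    end: bool,
    next: BTreeMap<char, Trie>,
}

impl Trie {
    /// Add one word, creating nodes as needed.
    fn insert(&mut self, word: &str) {
        let mut node = self;
        for c in word.chars() {
            node = node.next.entry(c).or_default();
        }
        node.end = true;
    }

    /// Emit a pattern in which shared prefixes appear only once.
    fn to_regex(&self) -> String {
        let alts: Vec<String> = self
            .next
            .iter()
            .map(|(c, child)| format!("{}{}", escape(*c), child.to_regex()))
            .collect();
        match (alts.len(), self.end) {
            (0, _) => String::new(),
            (1, false) => alts[0].clone(),
            (_, false) => format!("(?:{})", alts.join("|")),
            // a word ends here, so whatever follows is optional
            (_, true) => format!("(?:{})?", alts.join("|")),
        }
    }
}

/// Escape the characters that are special in regex syntax.
fn escape(c: char) -> String {
    if r"\.+*?()|[]{}^$".contains(c) {
        format!("\\{}", c)
    } else {
        c.to_string()
    }
}

/// Hypothetical helper: build a factored pattern from a word list.
fn factor(words: &[&str]) -> String {
    let mut trie = Trie::default();
    for w in words {
        trie.insert(w);
    }
    trie.to_regex()
}

fn main() {
    // "Mo", "Mon" and "Monday" share a prefix, which is emitted once
    assert_eq!(factor(&["Mo", "Mon", "Monday"]), "Mo(?:n(?:day)?)?");
    assert_eq!(factor(&["cat", "car"]), "ca(?:r|t)");
    println!("{}", factor(&["Mo", "Mon", "Monday"]));
}
```

A naive alternation such as `Mo|Mon|Monday` describes the same set of words, but the factored form avoids re-reading the common prefix for every alternative.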


This crate is on crates.io and can be used by adding pidgin to your dependencies in your project's Cargo.toml.

[dependencies]
pidgin = "0.1"

and this to your crate root:

extern crate pidgin;

Example: find a date

This resembles the date example in the regex crate's documentation, but the grammar is considerably more expressive. Once you have matched a date with a Pidgin matcher, it is also easier to determine how the match was made and so convert it into useful semantics.

use pidgin::Pidgin;

// set up the initial building for our pidgin grammar
let mut p = Pidgin::new()
    .word_bound()
    .case_insensitive(true);
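// you can build word lists and add them in
let weekdays = vec![
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
];
for s in weekdays {
    p.add_str(s);
    // various abbreviations
    p.add_str(&s[0..3]);
}
let g = p.compile();
p.rule("weekday", &g);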
// for these ones we care about case
p = p.case_insensitive(false);
// you can build and compile all in one go
let g = p.grammar(&vec!["M", "T", "W", "R", "F", "S", "U"]);
// add a case to an existing rule
p.rule("weekday", &g);
// back to case insensitivity
p = p.case_insensitive(true);
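// we can add words to a rule piecemeal
let months = vec![
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
];
for m in months {
    p.add_str(m);
}
let g = p.compile();
p.rule("month", &g);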
// the range is inclusive so that the 31st is covered
for i in 1..=31 {
    p.add_str(&i.to_string());
}
let g = p.compile();
p.rule("monthday", &g);
for i in 1..=31 {
    // allow both 1 and 01, etc.
    // adding a word such as "10" twice has no ill effect
    p.add_str(&i.to_string());
    p.add_str(&format!("{:02}", i));
}
let g = p.compile();
p.rule("numeric_days", &g);
// the range is inclusive so that December is covered
for i in 1..=12 {
    p.add_str(&format!("{:02}", i));
}
let g = p.compile();
p.rule("numeric_months", &g);
// sometimes you may need to add in a handwritten regex
// take care with named groups -- names cannot be repeated
p.foreign_rule("year", "[12][0-9]{3}|[0-9]{2}").unwrap();
// for the following patterns make whitespace optional
p = p.normalize_whitespace(false);
let g = p.grammar(&vec![
    "year / numeric_months / numeric_days",
    "numeric_months / numeric_days / year",
    "numeric_days / numeric_months / year",
    "year - numeric_months - numeric_days",
    "numeric_months - numeric_days - year",
    "numeric_days - numeric_months - year",
]);
p.rule("numeric_date", &g);
// for the remaining rules, whitespace is required if present
p = p.normalize_whitespace(true);
// and finally, the pattern we've been working towards
let date = p.grammar(&vec![
    "weekday",
    "numeric_date",
    "weekday, month monthday, year",
    "month monthday",
    "monthday month year",
    "month monthday, year",
]);

// now test it

let matcher = date.matcher().unwrap();

// we let whitespace vary
assert!(matcher.is_match(" June   6,    1969 "));
// we made it case-insensitive
assert!(matcher.is_match("june 6, 1969"));
// but we want to respect word boundaries
assert!(!matcher.is_match("jejune 6, 1969"));
// we can inspect the parse tree
let m = matcher.parse("2018/10/6").unwrap();
assert_eq!(m.name("year").unwrap().as_str(), "2018");
let m = matcher.parse("Friday").unwrap();
// still more crazy things we allow
assert!(matcher.is_match("F"));
// but we said single-letter days had to be capitalized
assert!(!matcher.is_match("f"));
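The numeric rules in the example rest on two small details of Rust itself: `1..31` stops at 30 whereas `1..=31` includes 31, and `{:02}` pads to two digits, so two-digit numbers such as "10" come out the same in both forms. The following standalone sketch checks both points with the standard library only. The helper name `day_words` is hypothetical, and a `BTreeSet` stands in for the builder's word list on the assumption that repeated words are simply ignored.

```rust
use std::collections::BTreeSet;

/// Collect the padded and unpadded spellings of the days 1 through 31.
fn day_words() -> BTreeSet<String> {
    let mut words = BTreeSet::new();
    for i in 1..=31 {
        words.insert(i.to_string());        // "1", "2", ... "31"
        words.insert(format!("{:02}", i));  // "01", "02", ... "31"
    }
    words
}

fn main() {
    // an exclusive range never reaches its upper bound
    assert_eq!((1..31).last(), Some(30));
    assert_eq!((1..=31).last(), Some(31));

    let words = day_words();
    // 9 unpadded single digits + 9 padded ones + 22 two-digit numbers
    assert_eq!(words.len(), 40);
    assert!(words.contains("6") && words.contains("06"));
    assert!(words.contains("31"));
}
```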



Structs

Grammar
A compiled collection of rules ready for the building of a pidgin::Matcher or for use in the definition of a new rule.


Match
This is a node in a parse tree. It is functionally similar to regex::Match and provides much the same API. Unlike a regex::Match, however, a pidgin::Match always corresponds to some rule, knows which rule that is, and records any sub-matches involved in its parsing.
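As a rough picture of what such a node carries, the sketch below models a parse-tree node with the standard library only. The `Node` type and its fields are hypothetical and are not pidgin's own definitions. The tree is built by hand for the input "2018/10/6" rather than produced by a matcher, and the root rule name "date" is an assumption.

```rust
/// A hypothetical parse-tree node: the rule it matched, the span of
/// text it covers, and the sub-matches found while parsing it.
#[derive(Debug)]
struct Node {
    rule: String,
    start: usize,
    end: usize,
    children: Vec<Node>,
}

impl Node {
    fn leaf(rule: &str, start: usize, end: usize) -> Node {
        Node { rule: rule.to_string(), start, end, children: Vec::new() }
    }

    /// The matched slice of the original text.
    fn as_str<'t>(&self, text: &'t str) -> &'t str {
        &text[self.start..self.end]
    }

    /// Depth-first search for the first node matched by the named rule.
    fn name(&self, rule: &str) -> Option<&Node> {
        if self.rule == rule {
            return Some(self);
        }
        self.children.iter().find_map(|c| c.name(rule))
    }
}

/// Hand-built tree for "2018/10/6" under a year / month / day reading.
fn sample_tree() -> Node {
    Node {
        rule: "date".to_string(),
        start: 0,
        end: 9,
        children: vec![Node {
            rule: "numeric_date".to_string(),
            start: 0,
            end: 9,
            children: vec![
                Node::leaf("year", 0, 4),
                Node::leaf("numeric_months", 5, 7),
                Node::leaf("numeric_days", 8, 9),
            ],
        }],
    }
}

fn main() {
    let text = "2018/10/6";
    let root = sample_tree();
    assert_eq!(root.name("year").unwrap().as_str(text), "2018");
    assert_eq!(root.name("numeric_days").unwrap().as_str(text), "6");
    // a rule that took no part in the parse is simply absent
    assert!(root.name("weekday").is_none());
}
```

Because every node records its rule, the caller can ask which alternative matched instead of re-inspecting the text.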


Matcher
This is functionally equivalent to a Regex: you can use it repeatedly to search a string. It cannot itself be used directly to split strings, but its regular expression is public and can be used for that purpose. It improves on regular expressions in that the Match object it returns is the root node of a parse tree, so its matches preserve parse structure.


Pidgin
This is a grammar builder. It keeps track of the rules defined so far, the alternates participating in the rule currently being defined, whether those alternates should be bounded left and right by word, string, or line boundaries, and the set of regex flags (case sensitivity, for instance) that will govern the rule it produces.
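The shape of such a builder can be sketched with the standard library alone. Everything in the block below is hypothetical: `SketchBuilder`, `Bound`, and their methods only mirror the state listed above and are not pidgin's implementation. Flag setters consume and return the builder, which is why the example earlier on this page reassigns `p`. The sketch also assumes that compiling drains the pending alternates and that words contain no regex metacharacters.

```rust
/// Which boundaries a compiled rule must respect.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Bound {
    None,
    Word,
}

/// A toy builder holding the state described above.
#[derive(Debug)]
struct SketchBuilder {
    rules: Vec<(String, String)>, // named rules defined so far
    pending: Vec<String>,         // alternates of the rule in progress
    bound: Bound,
    case_insensitive: bool,
}

impl SketchBuilder {
    fn new() -> Self {
        SketchBuilder {
            rules: Vec::new(),
            pending: Vec::new(),
            bound: Bound::None,
            case_insensitive: false,
        }
    }

    /// Flag setters take `self` by value and hand it back.
    fn case_insensitive(mut self, on: bool) -> Self {
        self.case_insensitive = on;
        self
    }

    fn word_bound(mut self) -> Self {
        self.bound = Bound::Word;
        self
    }

    /// Adding the same word twice has no effect.
    fn add_str(&mut self, s: &str) {
        if !self.pending.iter().any(|w| w == s) {
            self.pending.push(s.to_string());
        }
    }

    /// Turn the pending alternates into a pattern and start afresh.
    fn compile(&mut self) -> String {
        let mut alts: Vec<String> = self.pending.drain(..).collect();
        alts.sort();
        let open = if self.case_insensitive { "(?i:" } else { "(?:" };
        let edge = if self.bound == Bound::Word { r"\b" } else { "" };
        format!("{}{}{}){}", edge, open, alts.join("|"), edge)
    }

    /// Remember a compiled pattern under a rule name.
    fn rule(&mut self, name: &str, pattern: &str) {
        self.rules.push((name.to_string(), pattern.to_string()));
    }
}

fn main() {
    let mut b = SketchBuilder::new().word_bound().case_insensitive(true);
    b.add_str("Mon");
    b.add_str("Tue");
    b.add_str("Mon"); // duplicate, ignored
    let g = b.compile();
    assert_eq!(g, r"\b(?i:Mon|Tue)\b");
    b.rule("weekday", &g);

    // flags changed later govern only the rules compiled afterwards
    b = b.case_insensitive(false);
    b.add_str("M");
    assert_eq!(b.compile(), r"\b(?:M)\b");
    assert_eq!(b.rules.len(), 1);
}
```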