pub struct Pidgin { /* private fields */ }
This is a grammar builder. It keeps track of the rules defined so far, the alternates participating in the rule currently being defined, whether these alternates should be bounded left and right by word boundaries, string boundaries, or line boundaries, and the set of regex flags (case sensitivity, for instance) that will govern the rule it produces.
Defined rules will be used to process the new rule’s alternates. If there is a “foo” rule, the alternate "foo foo" will be understood to require that this “foo” rule match twice with a space between matches.
Because rule names can overlap, they are applied longest to shortest. If there is both a “foo” rule and a “f” rule, “f foo” will be understood to involve one match for each – the “f” rule only gets the single “f”.
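The longest-to-shortest ordering can be illustrated with a small standalone sketch. It does not use Pidgin and is not its implementation: `Piece` and `split_by_rules` are hypothetical names, and the scan simply prefers the longest rule name that matches at each position.

```rust
/// A piece of an alternate: a reference to a named rule or literal text.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub enum Piece {
    Rule(String),
    Literal(String),
}

/// Hypothetical helper (not part of Pidgin): splits `alternate` into rule
/// references and literal text, trying longer rule names before shorter ones.
pub fn split_by_rules(alternate: &str, rule_names: &[&str]) -> Vec<Piece> {
    // Sort names longest first so that "foo" is tried before "f".
    let mut names: Vec<&str> = rule_names
        .iter()
        .copied()
        .filter(|n| !n.is_empty())
        .collect();
    names.sort_by(|a, b| b.len().cmp(&a.len()).then(a.cmp(b)));

    let mut pieces = Vec::new();
    let mut literal = String::new();
    let mut rest = alternate;
    while !rest.is_empty() {
        if let Some(name) = names.iter().find(|n| rest.starts_with(**n)) {
            // Flush any pending literal text before recording the rule.
            if !literal.is_empty() {
                pieces.push(Piece::Literal(std::mem::take(&mut literal)));
            }
            pieces.push(Piece::Rule(name.to_string()));
            rest = &rest[name.len()..];
        } else {
            // No rule name starts here: consume one character as literal text.
            let ch = rest.chars().next().unwrap();
            literal.push(ch);
            rest = &rest[ch.len_utf8()..];
        }
    }
    if !literal.is_empty() {
        pieces.push(Piece::Literal(literal));
    }
    pieces
}
```

With the rule names "f" and "foo", the alternate "f foo" splits into a reference to "f", a literal space, and a reference to "foo", which is the reading described above.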
In addition to rules identified like this by a name, there are also regex rules. These are substituted into alternates wherever their definitional pattern matches. Regex rules are sought in alternates only in what is left over after ordinary rules are found. Regex rules are applied in inverse order by the length of their string representation and then in alphabetical order. They may optionally also have names.
Pidgin has numerous configuration methods which consume and return their invocant.
let mut p = Pidgin::new()
    .enclosed(true)
    .word_bound()
    .case_insensitive(true);
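The consume-and-return style can be sketched with a stand-in type. `Settings` below is hypothetical and only mirrors three of the method names; it says nothing about how Pidgin stores its state.

```rust
/// Hypothetical stand-in for a builder's settings; not Pidgin's real fields.
#[derive(Debug, Default, PartialEq)]
pub struct Settings {
    pub enclosed: bool,
    pub word_bound: bool,
    pub case_insensitive: bool,
}

impl Settings {
    pub fn new() -> Settings {
        Settings::default()
    }

    // Each method takes `self` by value, changes one setting, and hands the
    // value back, so calls can be chained directly off the constructor.
    pub fn enclosed(mut self, on: bool) -> Settings {
        self.enclosed = on;
        self
    }

    pub fn word_bound(mut self) -> Settings {
        self.word_bound = true;
        self
    }

    pub fn case_insensitive(mut self, on: bool) -> Settings {
        self.case_insensitive = on;
        self
    }
}
```

Because the value is moved on every call, a later adjustment has to be reassigned, for example `s = s.enclosed(false);`.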
Implementations
impl Pidgin
pub fn new() -> Pidgin
Constructs a new Pidgin with the default state: no rules, no alternates for the current rule, case-sensitive, not multiline, not dot-all (dot-all means . matches a newline), unicode-compliant, and not enclosed.
pub fn add(&mut self, phrases: &[&str]) -> &mut Pidgin
Adds the given list of alternates to the rule currently under construction.
This method is chainable.
pub fn add_str(&mut self, s: &str) -> &mut Pidgin
Adds the given alternate to the rule currently under construction.
This method is chainable.
pub fn compile(&mut self) -> Grammar
Compiles the current rule, clearing the alternate list in preparation for constructing the next rule.
pub fn compile_bounded(
    &mut self,
    left: bool,
    right: bool,
    apply_symbols: bool
) -> Grammar
pub fn grammar(&mut self, words: &[&str]) -> Grammar
A convenience method equivalent to add(&words).compile().
pub fn rule(&mut self, name: &str, g: &Grammar)
Define the rule name.
NOTE Multiple rules defined with the same name are treated as alternates. The order in which they are added defines the order in which they are tried. See remove_rule.
pub fn rx_rule(
    &mut self,
    rx: &str,
    g: &Grammar,
    name: Option<&str>
) -> Result<(), Error>
Defines a rule that replaces the portions of the rule’s alternates matched by a regex with the given grammar.
The rx argument finds matched portions of an alternate. The g argument defines the rule. The name argument provides the optional name for the rule.
For an example of its practical use, the normalize_whitespace method is implemented via foreign_rx_rule.
let mut p = Pidgin::new();
let g = p.grammar(&vec!["foo", "bar"]);
p.rx_rule(r"\s+", &g, Some("whitespace_is_weird"))?;
let m = p.grammar(&vec!["FUNKY CHICKEN"]).matcher()?;
let mtch = m.parse("FUNKYfooCHICKEN").unwrap();
assert!(mtch.has("whitespace_is_weird"));
Errors
rx_rule returns an error if rx fails to compile.
pub fn foreign_rule(&mut self, name: &str, pattern: &str) -> Result<(), Error>
Defines a rule based on an ad hoc regular expression.
Currently foreign_rule is the only way to define a rule with unbounded repetition.
pidgin.foreign_rule("us_local_phone", r"\b[0-9]{3}-?[0-9]{4}\b")?;
Errors
foreign_rule returns an error if the foreign regex fails to compile.
pub fn foreign_rx_rule(
    &mut self,
    rx: &str,
    pattern: &str,
    name: Option<&str>
) -> Result<(), Error>
Defines a rule, optionally named, replacing matched portions of the rule’s alternates with the given regex.
The rx argument finds matched portions of an alternate. The pattern argument defines the regular expression of the rule. The name argument provides the optional name for the rule.
pidgin.foreign_rx_rule(r"\s+", r"\t+", Some("whitespace_means_tabs"))?;
Errors
foreign_rx_rule returns an error if either rx or pattern fails to compile.
pub fn build_rule(&mut self, name: &str, components: Vec<RuleFragment>)
Defines a rule using a vector of RuleFragments. This facilitates the insertion of rules with defined repetition limits.
Examples
let mut p = Pidgin::new().word_bound().normalize_whitespace(true);
let animal = p.grammar(&vec!["cat", "cow", "camel", "mongoose"]);
p.rule("animal", &animal);
let animal_space = p.add_str("animal ").compile();
p.build_rule("animal_proof", vec![gf(animal_space.reps_min(1)?), sf("QED")]);
let m = p.add_str("animal_proof").matcher()?;
assert!(m.is_match("camel camel cat cow mongoose QED"));
pub fn remove_rule(&mut self, name: &str)
Removes a rule from the list known to the Pidgin.
pub fn remove_rx_rule(&mut self, name: &str) -> Result<(), Error>
Like remove_rule but the rule identifier is a regex rather than a rule name.
pub fn clear(&mut self)
Removes all alternates and rule definitions from the Pidgin. Flags controlling case sensitivity and such remain.
pub fn case_insensitive(self, case: bool) -> Pidgin
Toggles whether Pidgin creates case-insensitive rules.
By default this is false.
pub fn multi_line(self, case: bool) -> Pidgin
Toggles whether Pidgin creates multi-line rules. This governs the behavior of the ^ and $ anchors: whether they match only at string boundaries or also after and before newline characters.
By default this is false.
pub fn dot_all(self, case: bool) -> Pidgin
Toggles whether Pidgin creates rules wherein . can match newline characters. This is the so-called “single line” mode of Perl-compatible regular expressions.
By default this is false.
pub fn unicode(self, case: bool) -> Pidgin
Toggles whether Pidgin creates Unicode-compliant rules.
By default this is true.
pub fn enclosed(self, case: bool) -> Pidgin
Toggles whether Pidgin creates rules that can safely be modified by a repetition expression. (?:ab) is enclosed. ab is not.
This parameter is generally of interest only when using Pidgin to create elements of other regular expressions.
By default this is false.
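Why enclosure matters for repetition can be shown with plain string handling. The two helpers below are hypothetical and independent of Pidgin; they only demonstrate what happens when a repetition suffix is appended to an enclosed versus an unenclosed pattern.

```rust
/// Hypothetical helper: wraps a pattern in a non-capturing group.
pub fn enclose(pattern: &str) -> String {
    format!("(?:{})", pattern)
}

/// Hypothetical helper: appends a `+` repetition to a pattern as given.
pub fn one_or_more(pattern: &str) -> String {
    format!("{}+", pattern)
}
```

Applied to the bare pattern, `one_or_more("ab")` yields `ab+`, where the repetition binds only to `b`. Applied to the enclosed pattern, `one_or_more(&enclose("ab"))` yields `(?:ab)+`, where the whole sequence repeats.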
pub fn reverse_greed(self, case: bool) -> Pidgin
Toggles the U flag of Rust regexen. Per the documentation, U “swap[s] the meaning of x* and x*?”, thus turning a stingy match greedy and a greedy match stingy.
By default this is false.
pub fn normalize_whitespace(self, required: bool) -> Pidgin
Treat any white space found in an alternate as “some amount of white space”. If the required parameter is true, it means “at least some white space”. If it is false, it means “maybe some white space”.
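The effect of the required parameter can be approximated in a standalone sketch. It assumes that “at least some white space” corresponds to `\s+` and “maybe some white space” to `\s*`; that mapping and the function name `normalize_ws` are assumptions for illustration, not Pidgin's implementation.

```rust
/// Hypothetical helper: replaces each run of white space in `alternate`
/// with `\s+` (when `required` is true) or `\s*` (when it is false).
/// Other characters are copied through unescaped.
pub fn normalize_ws(alternate: &str, required: bool) -> String {
    let replacement = if required { r"\s+" } else { r"\s*" };
    let mut out = String::new();
    let mut in_space = false;
    for c in alternate.chars() {
        if c.is_whitespace() {
            // Emit the replacement once per run of white space.
            if !in_space {
                out.push_str(replacement);
                in_space = true;
            }
        } else {
            out.push(c);
            in_space = false;
        }
    }
    out
}
```

Under these assumptions, "FUNKY  CHICKEN" becomes `FUNKY\s+CHICKEN` when white space is required and `FUNKY\s*CHICKEN` when it is optional.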
pub fn word_bound(self) -> Pidgin
The left and right edges of all alternates, when applicable, should be word boundaries (\b). If the alternate has a non-word character at the boundary in question, such as “@” or “(”, then it is left alone, but if it is a word character, it should be bounded by a \b in the regular expression generated.
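The “when applicable” condition can be sketched without Pidgin. The helper below is hypothetical: it treats alphanumerics and the underscore as word characters and does not escape regex metacharacters in the alternate, both of which are simplifying assumptions.

```rust
/// Assumed definition of a word character for this sketch.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Hypothetical helper: adds `\b` on each edge of `alternate` only where
/// the edge character is a word character. The alternate itself is copied
/// through without regex escaping.
pub fn add_word_bounds(alternate: &str) -> String {
    let mut out = String::new();
    if alternate.chars().next().map_or(false, is_word_char) {
        out.push_str(r"\b");
    }
    out.push_str(alternate);
    if alternate.chars().last().map_or(false, is_word_char) {
        out.push_str(r"\b");
    }
    out
}
```

Here "cat" becomes `\bcat\b`, while "@home" gains a boundary only on the right because “@” is not a word character.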
pub fn left_word_bound(self) -> Pidgin
Alternates should have word boundaries, where applicable, on the left margin.
pub fn right_word_bound(self) -> Pidgin
Alternates should have word boundaries, where applicable, on the right margin.
pub fn line_bound(self) -> Pidgin
Alternates should match entire lines.
NOTE This turns multi-line matching on for the rule.
pub fn left_line_bound(self) -> Pidgin
Alternates should match at the beginning of the line on their left margin.
NOTE This turns multi-line matching on for the rule.
pub fn right_line_bound(self) -> Pidgin
Alternates should match at the end of the line on their right margin.
NOTE This turns multi-line matching on for the rule.
pub fn string_bound(self) -> Pidgin
The rule should match the entire string.
pub fn left_string_bound(self) -> Pidgin
The left margin of every alternate should be the beginning of the string.
pub fn right_string_bound(self) -> Pidgin
The right margin of every alternate should be the end of the string.