Skip to main content

Crate bnf

Crate bnf 

Source
Expand description

§bnf

.github/workflows/ci.yml coveralls Crates.io Version Crates.io Documentation LICENSE

A library for parsing Backus–Naur form context-free grammars.

§Installation

Add to your Cargo.toml:

[dependencies]
bnf = "0.6"

§What does a parsable BNF grammar look like?

The following grammar from the Wikipedia page on Backus-Naur form exemplifies a compatible grammar. (*Note: parser allows for an optional ‘;’ to indicate the end of a production)

 <postal-address> ::= <name-part> <street-address> <zip-part>

      <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL>
                    | <personal-part> <name-part>

  <personal-part> ::= <initial> "." | <first-name>

 <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>

       <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>

<opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
    <opt-apt-num> ::= <apt-num> | ""

§Extended syntax (groups and optionals)

When parsing grammar text (e.g. str::parse or Grammar::parse_from), the parser accepts two shortcuts:

§Parenthesized groups ( ... )

Group alternatives so they act as one unit in a sequence.

Without parentheses, | binds loosely. This rule:

<s> ::= <a> | <b> <c>

means “<a> or <b> <c>”. So "a" matches, and "b c" matches, but "a c" does not.

With parentheses, you get “(a or b) then c”:

<s> ::= (<a> | <b>) <c>

So only "a c" and "b c" match.

§Optionals [ ... ]

Zero or one of the grouped alternatives (like ? in regex).

<word> ::= <letter> [<digit>]

means: a letter, optionally followed by a digit. Both "x" and "x1" match.

Equivalent long form without extended syntax:

<word>      ::= <letter> <opt-digit>
<opt-digit> ::= <digit> | ""

§Normalization

Groups and optionals are normalized into a grammar that uses only plain nonterminals and terminals: each group or optional is turned into a fresh internal nonterminal (e.g. __anon0, __anon1). The public Term type has only Term::Terminal and Term::Nonterminal; parsing and generation use this normalized form.

Round-trip: Formatting a grammar (e.g. format!("{}", grammar)) does not preserve ( ) or [ ]; the result uses __anon* names. Re-parsing yields an equivalent grammar.

Empty groups or optionals — () or [] with nothing inside — are invalid; at least one alternative is required.

§Output

Take the following grammar for DNA sequences to be input to this library’s parse function.

<dna> ::= <base> | <base> <dna>
<base> ::= "A" | "C" | "G" | "T"

The output is a Grammar object representing a tree that looks like this:

Grammar
├── <dna> ::=
│   ├── <base>
│   └── <base> <dna>
└── <base> ::=
    ├── "A"
    ├── "C"
    ├── "G"
    └── "T"

Once the Grammar object is populated, to generate a random sentence from it call the object’s generate function. grammar.generate(). For the above grammar you could expect something like TGGC or AG.

If the generate function can’t find a production for a nonterminal it tries to evaluate it will print the identifer as a nonterminal, i.e. <identifier>.

The generate function will return an error if it detects an infinite loop caused by a production such as <PATTERN> ::= <PATTERN>.

§Parse Example

use bnf::Grammar;

let input =
    "<postal-address> ::= <name-part> <street-address> <zip-part>

        <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL>
                        | <personal-part> <name-part>

    <personal-part> ::= <initial> '.' | <first-name>

    <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>

        <zip-part> ::= <town-name> ',' <state-code> <ZIP-code> <EOL>

    <opt-suffix-part> ::= 'Sr.' | 'Jr.' | <roman-numeral> | ''
        <opt-apt-num> ::= <apt-num> | ''";

let grammar: Result<Grammar, _> = input.parse();
match grammar {
    Ok(g) => println!("{:#?}", g),
    Err(e) => println!("Failed to make grammar from String: {}", e),
}

§Generate Example

use bnf::Grammar;

let input =
    "<dna> ::= <base> | <base> <dna>
    <base> ::= 'A' | 'C' | 'G' | 'T'";
let grammar: Grammar = input.parse().unwrap();
let sentence = grammar.generate();
match sentence {
    Ok(s) => println!("random sentence: {}", s),
    Err(e) => println!("something went wrong: {}!", e)
}

§Parse Sentence via Grammar

use bnf::Grammar;

let input =
    "<dna> ::= <base> | <base> <dna>
    <base> ::= 'A' | 'C' | 'G' | 'T'";
let grammar: Grammar = input.parse().unwrap();

// Create a parser from the grammar (validates all nonterminals are defined)
let parser = grammar.build_parser().unwrap();

let sentence = "GATTACA";

let mut parse_trees = parser.parse_input(sentence);
match parse_trees.next() {
    Some(parse_tree) => println!("{}", parse_tree),
    _ => println!("Grammar could not parse sentence"),
}

By default, parse_input implicitly starts from the first rule. To match another rule, parse_input_starting_with can be used:

use bnf::{Grammar, Term};

let input =
    "<dna> ::= <base> | <base> <dna>
    <base> ::= 'A' | 'C' | 'G' | 'T'";
let grammar: Grammar = input.parse().unwrap();

// Create a parser from the grammar (validates all nonterminals are defined)
let parser = grammar.build_parser().unwrap();

let sentence = "G";
let target_production = Term::Nonterminal("base".to_string());

let mut parse_trees = parser.parse_input_starting_with(sentence, &target_production);
match parse_trees.next() {
    Some(parse_tree) => println!("{}", parse_tree),
    _ => println!("Grammar could not parse sentence"),
}

Re-exports§

pub use rand;

Macros§

expression
Create an expression from a series of terms
grammar
Construct a Grammar from a series of semicolon separated productions.
production
Macro to create a Production from a right-hand side definition
term
Creates a Terminal if the input is a string literal or a Nonterminal if the input is inside angle brackets

Structs§

ABNF
BNF
CoverageGuided
Prefers expressions that have not yet been exercised; tracks (nonterminal, index) coverage.
DepthBounded
At or above max_depth, only chooses expressions that are all terminals, guaranteeing termination. Below max_depth, behaves like RandomWalk.
Expression
An Expression is comprised of any number of Terms
Grammar
A Grammar is comprised of any number of Productions.
GrammarParser
A reusable parser built from a Grammar that validates all nonterminals are defined at construction time.
ParseTree
A tree derived by successfully parsing an input string via crate::GrammarParser::parse_input()
Production
A Production is comprised of any number of Expressions
RandomWalk
Uniform random choice over all expressions (current default behavior).
Weighted
Weights expressions by whether they contain a recursive reference to the current nonterminal (higher weight for recursive, for fuzzing / stress testing).

Enums§

Error
ParseTreeNode
A node of a ParseTree, either terminating or continuing the ParseTree
Term
A Term can represent a Terminal or Nonterminal node.

Traits§

Format
GenerationStrategy
Strategy for choosing which RHS alternative to use when expanding a nonterminal during grammar sentence generation.

Functions§

escape_mermaid_label
Escapes a label string for use inside Mermaid flowchart node ["label"] syntax. Uses Mermaid entity codes: each special character is replaced by #<codepoint>; (e.g. "#34;). Other characters are left unchanged.