Crate nbnf

§nbnf

A parser generator based on nom, with syntax inspired by EBNF and regex.

§Syntax overview

A grammar is a series of rules containing expressions. Whitespace is ignored, and each rule must end with a semicolon:

rule = ...;
rule2 =
    ...
    ...;
...

A rule generates a parser function as Rust code, and so its name must be a valid Rust identifier. The output type of the generated function can be specified, defaulting to &str if omitted:

rule<Output> = ...;

Any valid Rust code denoting a type is permitted between the chevrons.

Expressions can invoke any parser function defined in Rust, with other rules simply being resolved as symbols in the same enclosing module:

top = inner external_rule nbnf::nom::combinator::eof;
inner = ...;

Rules can match literal chars, strings, or regex-like character ranges, and support Rust-like escapes:

top = 'a' "bc" [de-g] '\x2A' "\"\0\r\n\t\x7F\u{FF}";

Expressions can be grouped with parentheses, and alternatives are separated with a slash:

top = ('a' 'b') / ('c' 'd');
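
To make the semantics of grouping and alternation concrete, here is a hand-written, std-only sketch of what the rule above matches. It is not the code nbnf generates. The helper name `pair`, the `Option` return type, and the `(char, char)` output are assumptions for illustration; the sketch also assumes alternatives are tried in order, as with nom's `alt`.

```rust
/// Hypothetical helper: match `first` then `second` in sequence,
/// returning (remaining input, matched chars).
fn pair(input: &str, first: char, second: char) -> Option<(&str, (char, char))> {
    let rest = input.strip_prefix(first)?.strip_prefix(second)?;
    Some((rest, (first, second)))
}

/// Sketch of `top = ('a' 'b') / ('c' 'd');`
fn top(input: &str) -> Option<(&str, (char, char))> {
    // Try the first group; only if it fails, try the second from the same position.
    pair(input, 'a', 'b').or_else(|| pair(input, 'c', 'd'))
}
```

With this sketch, `top("cde")` succeeds through the second alternative and leaves `"e"` unconsumed, while `top("ad")` fails because neither group matches in full.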

Expressions can be repeated with regex-like syntax:

r0 = 'a'?;      // zero or one
r1 = 'b'*;      // zero or more
r2 = 'c'+;      // one or more
r3 = 'd'{2};    // exactly two
r4 = 'e'{2,};   // at least two
r5 = 'f'{,2};   // at most two
r6 = 'g'{2,4};  // between two and four
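
The bounds can be read as a `(min, max)` pair. The following std-only sketch shows that reading for a single repeated char. The function `repeat_char` is hypothetical, it assumes greedy matching as in nom's repetition combinators, and for simplicity it returns the matched slice rather than whatever collection type nbnf produces for a repetition.

```rust
/// Hypothetical helper: match between `min` and `max` copies of `c` at the
/// start of `input`, returning (remaining input, matched slice).
/// `max = None` means "no upper bound".
fn repeat_char(input: &str, c: char, min: usize, max: Option<usize>) -> Option<(&str, &str)> {
    let mut count = 0;
    let mut end = 0;
    for ch in input.chars() {
        // Stop at the first non-matching char, or once the upper bound is hit.
        if ch != c || max.map_or(false, |m| count >= m) {
            break;
        }
        count += 1;
        end += ch.len_utf8();
    }
    // Too few matches is a failure; zero matches is fine when `min` is 0.
    if count < min {
        return None;
    }
    Some((&input[end..], &input[..end]))
}
```

Under these assumptions, `'g'{2,4}` corresponds to `repeat_char(input, 'g', 2, Some(4))`, `'e'{2,}` to `repeat_char(input, 'e', 2, None)`, and `'a'?` to `repeat_char(input, 'a', 0, Some(1))`.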

Expressions can be tagged with various modifiers, wrapping them in combinators:

  • !! (cut) prevents backtracking, e.g. when you know no other expressions can match
json_object_pair<(String, Json)> = string !!(-':' json_value);
  • ! (not) matches only when the expression does not match, consuming no input
ident = -![0-9] ~[a-zA-Z0-9_]+;
  • ~ (recognize) will discard the output and instead yield the portion of the input that was matched
r1<(i32, f64)> = ...;
r2<&str> = ~r1;
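
As an illustration of how not and recognize combine, here is a hand-written, std-only sketch of the `ident` rule above. It is not nbnf's generated code. It assumes `~` applies to the whole repetition `[a-zA-Z0-9_]+`, and the `Option` return type is chosen for brevity.

```rust
/// Sketch of `ident = -![0-9] ~[a-zA-Z0-9_]+;`
/// Returns (remaining input, matched identifier).
fn ident(input: &str) -> Option<(&str, &str)> {
    // `![0-9]`: succeed only if the next char is not a digit; consumes nothing.
    if input.chars().next().map_or(false, |c| c.is_ascii_digit()) {
        return None;
    }
    // `~[a-zA-Z0-9_]+`: yield the longest run of identifier chars as a slice
    // of the input instead of a collection of chars.
    let end = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    if end == 0 {
        return None; // `+` requires at least one match
    }
    Some((&input[end..], &input[..end]))
}
```

Here `ident("foo_1 bar")` yields `"foo_1"` and leaves `" bar"`, while `ident("1abc")` fails at the lookahead before any input is consumed.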

Expressions can be discarded from output by prefixing them with -:

string<&str> = -'"' ~(string_char+) -'"';

For this particular grammar, forgoing the discards would require a tuple as the return type because the quote chars are included:

string<(char, &str, char)> = ...;

The empty string can be matched with &, allowing various interesting grammar constructs:

parens = ~('(' parens ')') / ~&;

Types and output values can be massaged in a few ways by passing any valid Rust expression:

  • @<...> (value) discards output and instead returns the given literal
token<Token> =
    ... /
    '/'@<Token::Slash> /
    ...;
  • |<...> (map) runs a mapping function over the output
object<HashMap> =
    -'{' object_pair+ -'}'
    |<HashMap::from_iter>;
  • |?<...> (map_opt) runs a mapping function returning Option over the output
even_int<i32> =
    int
    |?<|v| (v & 1 == 0).then_some(v)>;
  • |!<...> (map_res) runs a mapping function returning Result over the output
number<i32> =
    ~([0-9]+)
    |!<i32::from_str>;
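
The difference between map_res and map_opt is easiest to see when the mapping function rejects a value. The sketch below hand-writes the `number` and `even_int` rules with std only. It assumes `int` behaves like `number`, and that a rejected mapping turns into an ordinary parse failure, which is how nom's `map_res` and `map_opt` behave. The `Option` return type is a simplification.

```rust
use std::str::FromStr;

/// Sketch of `number<i32> = ~([0-9]+) |!<i32::from_str>;`
fn number(input: &str) -> Option<(&str, i32)> {
    // `~([0-9]+)`: recognize a non-empty run of ASCII digits.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    // map_res: an `Err` from the mapping function (e.g. overflow) fails the parse.
    let value = i32::from_str(&input[..end]).ok()?;
    Some((&input[end..], value))
}

/// Sketch of `even_int<i32> = int |?<|v| (v & 1 == 0).then_some(v)>;`
fn even_int(input: &str) -> Option<(&str, i32)> {
    let (rest, v) = number(input)?;
    // map_opt: a `None` from the mapping closure fails the parse.
    let v = (v & 1 == 0).then_some(v)?;
    Some((rest, v))
}
```

In this sketch `number("99999999999")` fails because the digits overflow `i32`, and `even_int("7")` fails because the closure returns `None`.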

§Example Usage

The main entrypoint is nbnf::nbnf, a proc macro that expands to parsers generated from the given grammar. Note that the input must be passed as a string (preferably a raw string), because some expressions that are valid in a grammar are not valid Rust (e.g. the unbalanced quote in [^"]).

use nbnf::nbnf;

nbnf!(r#"
    top = ~('a' top 'b') / ~&;
"#);

fn main() {
    let input = "aabbc";
    let (rest, output) = top.parse(input).unwrap();
    assert_eq!(rest, "c");
    assert_eq!(output, "aabb");
}
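
To see why this example leaves "c" unconsumed and yields "aabb", it can help to trace the grammar by hand. The following std-only sketch mirrors the rule's two alternatives. It is not the code the macro expands to, and the helper `top_len` is hypothetical.

```rust
/// Hypothetical helper: number of bytes matched by `top` at the start of `input`.
fn top_len(input: &str) -> usize {
    // First alternative: 'a' top 'b'
    if let Some(after_a) = input.strip_prefix('a') {
        let inner = top_len(after_a);
        if after_a[inner..].starts_with('b') {
            return 1 + inner + 1;
        }
    }
    // Second alternative: `&`, the empty match, which always succeeds.
    0
}

/// Sketch of `top = ~('a' top 'b') / ~&;`
/// Returns (remaining input, recognized slice).
fn top(input: &str) -> (&str, &str) {
    let len = top_len(input);
    (&input[len..], &input[..len])
}
```

For "aabbc" the first alternative succeeds twice, nested, and the innermost call falls back to the empty match, so the recognized slice is "aabb". For an unbalanced input such as "aab", the outer 'b' is missing, so the whole rule falls back to the empty match and consumes nothing.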

§Example JSON parser

use std::collections::HashMap;
use std::str::FromStr;

use nbnf::nbnf;
use nbnf::nom::IResult;
use nbnf::nom::bytes::complete::take_while;
use nbnf::nom::bytes::tag;
use nbnf::nom::combinator::value;
use nbnf::nom::multi::separated_list0;

fn main() {
	let input = r#"[1, 2.3, "four", {"five": false, "six": 7}, null, "abc \"def\" ghi\n"]"#;
	_ = dbg!(json.parse(input));
}

#[derive(Clone, Copy, Debug)]
pub enum Number {
	Int(i128),
	Float(f64),
}

#[derive(Clone, Debug)]
pub enum Json {
	Null,
	Bool(bool),
	Number(Number),
	String(String),
	Array(Vec<Json>),
	Object(HashMap<String, Json>),
}

#[rustfmt::skip]
nbnf!(r#"
	json<Json> = -ws json_inner -ws;
	json_inner<Json> =
		null /
		boolean /
		number /
		string|<Json::String> /
		array /
		object /
		nom::combinator::eof@<Json::Null>;

	null<Json> = "null"@<Json::Null>;
	boolean<Json> =
		"true"@<Json::Bool(true)> /
		"false"@<Json::Bool(false)>;

	number<Json> =
		(number_float / number_hex / number_int)
		|<Json::Number>;
	number_float<Number> =
		~([0-9]+ '.' [0-9]*)
		|!<|str| f64::from_str(str).map(Number::Float)>;
	number_int<Number> =
		~([0-9]+)
		|!<|str| i128::from_str(str).map(Number::Int)>;
	number_hex<Number> =
		(-('0' [xX]) ~([0-9a-fA-F]+))
		|!<|str| i128::from_str_radix(str, 16).map(Number::Int)>;

	string<String> =
		(-'"' string_inner* -'"')
		|<String::from_iter>;
	string_inner<char> =
		"\\\""@<'"'> /
		"\\n"@<'\n'> /
		"\\r"@<'\r'> /
		"\\t"@<'\t'> /
		"\\0"@<'\0'> /
		"\\"@<'\\'> /
		[^"];

	array<Json> =
		(-'[' array_inner -']')
		|<Json::Array>;

	object<Json> =
		(-'{' object_inner -'}')
		|<HashMap::from_iter>
		|<Json::Object>;
	object_pair<(String, Json)> = -ws string -ws -':' -ws json -ws;
"#);

fn array_inner(input: &str) -> IResult<&str, Vec<Json>> {
	separated_list0(tag(","), json).parse(input)
}

fn object_inner(input: &str) -> IResult<&str, Vec<(String, Json)>> {
	separated_list0(tag(","), object_pair).parse(input)
}

fn ws(input: &str) -> IResult<&str, ()> {
	value((), take_while(char::is_whitespace)).parse(input)
}

Re-exports§

pub extern crate nom;

Modules§

generator
lexer
parser

Macros§

nbnf
Expands to Rust code implementing the given grammar passed as a string literal.

Structs§

Grammar
A parsed grammar.

Enums§

Expr
A grammar expression.
Literal
A literal character, string, or character range to be matched.
Token
A parsed token.

Functions§

generate_parser
Generate a String of Rust source implementing parsers for the given grammar.
generate_parser_tokens
Generate a TokenStream with implementations for the given grammar.
lex
Parse a grammar into a list of tokens.
parse
Parse a list of tokens into a grammar.
parse_grammar
Shortcut to parse a grammar directly from a string.