Language parsing tool (lang_pt) is a library for building recursive descent top-down parsers that parse languages or text into an Abstract Syntax Tree (AST).
Overview
Parsers for languages like JavaScript are often handwritten because of the complexity of those languages. However, hand-written parser code increases development and maintenance costs. This library was created to reduce that effort when building a parser for a high-level language (HLL). Its goal is to be flexible enough to support a wide range of grammars while keeping performance reasonably close to a custom-written parser.
Design
A language parser is usually developed either by writing custom code by hand or by using a parser generator tool. With a parser generator, the grammar is written in a Domain-Specific Language (DSL) specified by the tool, which then compiles the grammar into parser code in the target language. This library takes a different approach: the grammar is implemented directly in Rust using a set of production utilities. Instead of writing grammar in a generator-specific language, one composes utilities like Concat and Union to express concatenation and alternation of symbols.
The library also provides utilities such as Lookahead, Validator, and NonStructural to support custom validation and precedence-based parsing, so languages that need custom functionality injected into the grammar can still be handled. In addition, production utilities such as SeparatedList and Suffixes ease writing common grammar patterns.
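To make the approach concrete, here is a minimal sketch of a grammar for a comma-separated list of numbers and identifiers, built only from constructors that also appear in the full JSON example below. The ListToken and ListNode enums and all variable names are hypothetical, introduced purely for illustration, and the statements are assumed to run inside a function body, as in the JSON example.

use lang_pt::production::ProductionBuilder;
use lang_pt::{
    lexeme::{Pattern, Punctuations},
    production::{Concat, EOFProd, SeparatedList, TokenField, Union},
    DefaultParser, NodeImpl, TokenImpl, Tokenizer,
};
use std::rc::Rc;

// Hypothetical token and node kinds for a comma-separated list of numbers and identifiers.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone, Copy)]
pub enum ListToken {
    EOF,
    Number,
    Ident,
    Comma,
    Space,
}
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone, Copy)]
pub enum ListNode {
    Number,
    Ident,
    List,
    NULL,
}
impl TokenImpl for ListToken {
    fn eof() -> Self {
        ListToken::EOF
    }
    fn is_structural(&self) -> bool {
        match self {
            ListToken::Space => false,
            _ => true,
        }
    }
}
impl NodeImpl for ListNode {
    fn null() -> Self {
        ListNode::NULL
    }
}

// Tokenizer: whitespace, the comma punctuation, numbers, and identifiers.
let comma_punct = Rc::new(Punctuations::new(vec![(",", ListToken::Comma)]).unwrap());
let lex_space = Rc::new(Pattern::new(ListToken::Space, r"^\s+").unwrap());
let lex_number = Rc::new(Pattern::new(ListToken::Number, r"^[0-9]+").unwrap());
let lex_ident = Rc::new(Pattern::new(ListToken::Ident, r"^[A-Za-z_][A-Za-z0-9_]*").unwrap());
let tokenizer = Tokenizer::new(vec![lex_space, comma_punct, lex_number, lex_ident]);

// Productions: Union expresses the alternatives (number | identifier),
// SeparatedList handles the comma-separated repetition, and Concat
// expresses the sequence (list followed by end of input).
let eof = Rc::new(EOFProd::new(None));
let number = Rc::new(TokenField::new(ListToken::Number, Some(ListNode::Number)));
let ident = Rc::new(TokenField::new(ListToken::Ident, Some(ListNode::Ident)));
let hidden_comma = Rc::new(TokenField::new(ListToken::Comma, None));
let element = Rc::new(Union::init("element"));
element.set_symbols(vec![number, ident]).unwrap();
let element_list = Rc::new(SeparatedList::new(&element, &hidden_comma, true).into_nullable());
let main = Rc::new(Concat::new("main", vec![element_list, eof]).into_node(Some(ListNode::List)));
let parser = DefaultParser::new(Rc::new(tokenizer), main).unwrap();

The full JSON example that follows uses the same structure, just with a larger grammar.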
Example
The following is a JSON parser implementation using lang_pt.
// # Tokenization
use lang_pt::production::ProductionBuilder;
use lang_pt::{
    lexeme::{Pattern, Punctuations},
    production::{Concat, EOFProd, Node, SeparatedList, TokenField, TokenFieldSet, Union},
    DefaultParser, NodeImpl, TokenImpl, Tokenizer,
};
use std::rc::Rc;
// Token kinds produced by the JSON tokenizer.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone, Copy)]
pub enum JSONToken {
    EOF,
    String,
    Space,
    Colon,
    Comma,
    Number,
    Constant,
    OpenBrace,
    CloseBrace,
    OpenBracket,
    CloseBracket,
}
// Node kinds for the resulting AST.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone, Copy)]
pub enum JSONNode {
    Key,
    String,
    Number,
    Constant,
    Array,
    Object,
    Item,
    Main,
    NULL,
}
impl TokenImpl for JSONToken {
    fn eof() -> Self {
        JSONToken::EOF
    }
    // Space tokens are non-structural, so the parser skips them between tokens.
    fn is_structural(&self) -> bool {
        match self {
            JSONToken::Space => false,
            _ => true,
        }
    }
}
impl NodeImpl for JSONNode {
    fn null() -> Self {
        JSONNode::NULL
    }
}
let punctuations = Rc::new(
    Punctuations::new(vec![
        ("{", JSONToken::OpenBrace),
        ("}", JSONToken::CloseBrace),
        ("[", JSONToken::OpenBracket),
        ("]", JSONToken::CloseBracket),
        (",", JSONToken::Comma),
        (":", JSONToken::Colon),
    ])
    .unwrap(),
);
let dq_string = Rc::new(
    Pattern::new(
        JSONToken::String,
        r#"^"([^"\\\r\n]|(\\[^\S\r\n]*[\r\n][^\S\r\n]*)|\\.)*""#, //["\\bfnrtv]
    )
    .unwrap(),
);
let lex_space = Rc::new(Pattern::new(JSONToken::Space, r"^\s+").unwrap());
let number_literal = Rc::new(
    Pattern::new(JSONToken::Number, r"^([0-9]+)(\.[0-9]+)?([eE][+-]?[0-9]+)?").unwrap(),
);
let const_literal = Rc::new(Pattern::new(JSONToken::Constant, r"^(true|false|null)").unwrap());
// Assemble the tokenizer from the lexeme utilities defined above.
let tokenizer = Tokenizer::new(vec![
    lex_space,
    punctuations,
    dq_string,
    number_literal,
    const_literal,
]);
// # Parser
let eof = Rc::new(EOFProd::new(None));
let json_key = Rc::new(TokenField::new(JSONToken::String, Some(JSONNode::Key)));
let json_primitive_values = Rc::new(TokenFieldSet::new(vec![
    (JSONToken::String, Some(JSONNode::String)),
    (JSONToken::Constant, Some(JSONNode::Constant)),
    (JSONToken::Number, Some(JSONNode::Number)),
]));
// Punctuation fields are given `None` so they are matched without producing AST nodes.
let hidden_open_brace = Rc::new(TokenField::new(JSONToken::OpenBrace, None));
let hidden_close_brace = Rc::new(TokenField::new(JSONToken::CloseBrace, None));
let hidden_open_bracket = Rc::new(TokenField::new(JSONToken::OpenBracket, None));
let hidden_close_bracket = Rc::new(TokenField::new(JSONToken::CloseBracket, None));
let hidden_comma = Rc::new(TokenField::new(JSONToken::Comma, None));
let hidden_colon = Rc::new(TokenField::new(JSONToken::Colon, None));
// `json_object` and `json_value_union` are declared first and their symbols set
// later, which allows the grammar to be recursive (objects contain values, and
// values contain objects).
let json_object = Rc::new(Concat::init("json_object"));
let json_value_union = Rc::new(Union::init("json_value_union"));
let json_object_item = Rc::new(Concat::new(
"json_object_item",
vec![
json_key.clone(),
hidden_colon.clone(),
json_value_union.clone(),
],
));
let json_object_item_node = Rc::new(Node::new(&json_object_item, Some(JSONNode::Item)));
let json_object_item_list =
Rc::new(SeparatedList::new(&json_object_item_node, &hidden_comma, true).into_nullable());
let json_array_item_list =
Rc::new(SeparatedList::new(&json_value_union, &hidden_comma, true).into_nullable());
let json_array_node = Rc::new(
    Concat::new(
        "json_array",
        vec![
            hidden_open_bracket.clone(),
            json_array_item_list.clone(),
            hidden_close_bracket.clone(),
        ],
    )
    .into_node(Some(JSONNode::Array)),
);
let json_object_node = Rc::new(Node::new(&json_object, Some(JSONNode::Object)));
// A JSON value is a primitive, an object, or an array.
json_value_union
    .set_symbols(vec![
        json_primitive_values.clone(),
        json_object_node.clone(),
        json_array_node.clone(),
    ])
    .unwrap();
// An object is `{`, a nullable list of key:value items, then `}`.
json_object
    .set_symbols(vec![
        hidden_open_brace.clone(),
        json_object_item_list,
        hidden_close_brace.clone(),
    ])
    .unwrap();
let main = Rc::new(Concat::new("root", vec![json_value_union, eof]));
let main_node = Rc::new(Node::new(&main, Some(JSONNode::Main)));
let parser = DefaultParser::new(Rc::new(tokenizer), main_node).unwrap();
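Once constructed, the parser can be fed input text. The following usage sketch is an assumption about the DefaultParser API: the `parse` method name, its byte-slice argument, and the shape of the returned tree should be checked against the crate documentation.

// Hypothetical usage: the `parse` call below is an assumption about DefaultParser's API.
let json_text = r#"{"name":"lang_pt","keywords":["parser","ast"]}"#;
let parsed_tree = parser.parse(json_text.as_bytes()).unwrap();
// The returned tree should contain a Main node with nested Object, Item, Key, and Array children.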