1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241
//! # RustyLR
//! GLR, LR(1) and LALR(1) parser generator for Rust.
//!
//! RustyLR provides [procedural macros](#proc-macro) and [buildscript tools](#integrating-with-buildrs) to generate GLR, LR(1) and LALR(1) parser.
//! The generated parser will be a pure Rust code, and the calculation of building DFA will be done at compile time.
//! Reduce action can be written in Rust,
//! and the error messages are **readable and detailed**.
//! For huge and complex grammars, it is recommended to use the [buildscipt](#integrating-with-buildrs).
//!
//! #### `features` in `Cargo.toml`
//! - `build` : Enable buildscript tools.
//! - `fxhash` : In parser table, replace `std::collections::HashMap` with `FxHashMap` from [`rustc-hash`](https://github.com/rust-lang/rustc-hash).
//! - `tree` : Enable automatic Tree construction.
//! This feature should be used on debug purpose only, since it will consume much more memory and time.
//!
//! ## Features
//! - pure Rust implementation
//! - readable error messages, both for grammar building and parsing
//! - compile-time DFA construction from CFGs
//! - customizable reduce action
//! - resolving conflicts of ambiguous grammar
//! - regex patterns partially supported
//! - tools for integrating with `build.rs`
//!
//! ## proc-macro
//! Below procedural macros are provided:
//! - [`lr1!`] : LR(1) parser
//! - [`lalr1!`] : LALR(1) parser
//!
//! These macros will generate structs:
//! - `Parser` : contains DFA tables and production rules
//! - `ParseError` : type alias for `Error` returned from `feed()`
//! - `Context` : contains current state and data stack
//! - `enum NonTerminals` : a list of non-terminal symbols
//! - [`Rule`](`ProductionRule`) : type alias for production rules
//! - `State` : type alias for DFA states
//!
//! All structs above are prefixed by `<StartSymbol>`.
//! In most cases, what you want is the `Parser` and `ParseError` structs, and the others are used internally.
//!
//! ## Integrating with `build.rs`
//! This buildscripting tool will provide much more detailed, pretty-printed error messages than the procedural macros.
//! If you are writing a huge, complex grammar, it is recommended to use buildscript than the procedural macros.
//! Generated code will contain the same structs and functions as the procedural macros. In your actual source code, you can `include!` the generated file.
//!
//! The program searches for `%%` in the input file, not the `lr1!`, `lalr1!` macro.
//! The contents before `%%` will be copied into the output file as it is.
//! And the context-free grammar must be followed by `%%`.
//!
//! ```rust
//! // parser.rs
//! use some_crate::some_module::SomeStruct;
//!
//! enum SomeTypeDef {
//! A,
//! B,
//! C,
//! }
//!
//! %% // <-- input file splitted here
//!
//! %tokentype u8;
//! %start E;
//! %eof b'\0';
//!
//! %token a b'a';
//! %token lparen b'(';
//! %token rparen b')';
//!
//! E: lparen E rparen
//! | P
//! ;
//!
//! P: a;
//! ```
//!
//! You must enable the feature `build` to use in the build script.
//! ```toml
//! [build-dependencies]
//! rusty_lr = { version = "...", features = ["build"] }
//! ```
//!
//! ```rust
//! // build.rs
//! use rusty_lr::build;
//!
//! fn main() {
//! println!("cargo::rerun-if-changed=src/parser.rs");
//!
//! let output = format!("{}/parser.rs", std::env::var("OUT_DIR").unwrap());
//! build::Builder::new()
//! .file("src/parser.rs") // path to the input file
//! // .lalr() // to generate LALR(1) parser
//! .build(&output); // path to the output file
//! }
//! ```
//!
//! In your source code, include the generated file.
//! ```rust
//! include!(concat!(env!("OUT_DIR"), "/parser.rs"));
//! ```
//!
//! ## Start Parsing
//! The `Parser` struct has the following functions:
//! - `new()` : create new parser
//! - `begin(&self)` : create new context
//! - `feed(&self, &mut Context, TerminalType, &mut UserData) -> Result<(), ParseError>` : feed token to the parser
//!
//! Note that the parameter `&mut UserData` is omitted if `%userdata` is not defined.
//! All you need to do is to call `new()` to generate the parser, and `begin()` to create a context.
//! Then, you can feed the input sequence one by one with `feed()` function.
//! Once the input sequence is feeded (including `eof` token), without errors,
//! you can get the value of start symbol by calling `context.accept()`.
//!
//! ```rust
//! let parser = Parser::new();
//! let context = parser.begin();
//! for token in input_sequence {
//! match parser.feed(&context, token) {
//! Ok(_) => {}
//! Err(e) => { // e: ParseError
//! println!("{}", e);
//! return;
//! }
//! }
//! }
//! let start_symbol_value = context.accept();
//! ```
//!
//!
//! ## Syntax Tree
//! With the `tree` feature, `feed()` function will automatically construct the parse tree.
//! By calling `context.to_tree_list()`,
//! you can get current syntax tree. Simply print the tree list with `Display` or `Debug` will give you the pretty-printed tree.
//!
//! ```rust
//! let parser = Parser::new();
//! let mut context = parser.begin();
//! /// feed tokens...
//! println!( "{:?}", context.to_tree_list() ); // print tree list with `Debug` trait
//! println!( "{}", context.to_tree_list() ); // print tree list with `Display` trait
//! ```
//!
//! ```text
//! TreeList
//! ├─A
//! │ └─M
//! │ └─P
//! │ └─Number
//! │ ├─WS0
//! │ │ └─_space_Star1
//! │ │ └─_space_Plus0
//! │ │ ├─_space_Plus0
//! │ │ │ └─' '
//! │ │ └─' '
//! │ ├─_Digit_Plus3
//! │ │ └─Digit
//! │ │ └─_TerminalSet2
//! │ │ └─'1'
//! │ └─WS0
//! │ └─_space_Star1
//! │ └─_space_Plus0
//! │ └─' '
//! ├─'+'
//! ├─M
//! │ └─P
//! │ └─Number
//! │ ├─WS0
//! │ │ └─_space_Star1
//! │ │ └─_space_Plus0
//! │ │ ├─_space_Plus0
//! ... continue
//! ```
//!
//! Note that default `Display` and `Debug` trait will print the whole tree recursively.
//! If you want to limit the depth of the printed tree, you can use [`Tree::pretty_print()`] or [`TreeList::pretty_print()`] function with `max_level` parameter.
//!
//!
//! ## GLR Parser
//! The GLR (Generalized LR parser) can be generated by `%glr;` directive in the grammar.
//! ```
//! // generate GLR parser;
//! // from now on, shift/reduce, reduce/reduce conflicts will not be treated as errors
//! %glr;
//! ...
//! ```
//! GLR parser can handle ambiguous grammars that LR(1) or LALR(1) parser cannot.
//! When it encounters any kind of conflict during parsing,
//! the parser will diverge into multiple states, and will try every paths until it fails.
//! Of course, there must be single unique path left at the end of parsing (the point where you feed `eof` token).
//!
//! ### Resolving Ambiguities
//! You can resolve the ambiguties through the reduce action.
//! Simply, returning `Result::Err(Error)` from the reduce action will revoke current path.
//! The `Error` variant type can be defined by `%err` directive.
//!
//! ### Note on GLR Parser
//! - Still in development, not have been tested enough (patches are welcome!).
//! - Since there are multiple paths, the reduce action can be called multiple times, even if the result will be thrown away in the future.
//! - Every `RuleType` and `Term` must implement `Clone` trait.
//! - `clone()` will be called carefully, only when there are multiple paths.
//! - User must be aware of the point where shift/reduce or reduce/reduce conflicts occur.
//! Every time the parser diverges, the calculation cost will increase.
//!
//!
//!
//! ## Syntax
//! To start writing down a context-free grammar, you need to define necessary directives first.
//! This is the syntax of the procedural macros.
//!
//! ```rust
//! lr1! {
//! // %directives
//! // %directives
//! // ...
//! // %directives
//!
//! // NonTerminalSymbol(RuleType): ProductionRules
//! // NonTerminalSymbol(RuleType): ProductionRules
//! // ...
//! }
//! ```
//!
//! [`lr1!`] macro will generate a parser struct with LR(1) DFA tables.
//! If you want to generate LALR(1) parser, use [`lalr1!`] macro.
//! Every line in the macro must follow the syntax below.
//!
//! Syntax can be found in [repository](https://github.com/ehwan/RustyLR/tree/main?tab=readme-ov-file#syntax).
//!
//!
// re-exports
pub use rusty_lr_core::*;
pub use rusty_lr_derive::*;
/// tools for build.rs
#[cfg(feature = "build")]
pub mod build {
pub use rusty_lr_buildscript::*;
}