#![deny(missing_docs)]

//! # Piston-Meta
//! A DSL parsing library for human readable text documents
//!
//! ### Introduction
//!
//! Piston-Meta makes it easy to write parsers for human readable text documents.
//! It can be used for language design, custom formats and data driven development.
//!
//! Meta parsing is a development technique that goes back to the first modern computer.
//! The idea is to turn pieces of a computer program into a programmable pipeline,
//! thereby accelerating development.
//! An important, but surprisingly reusable part across projects, is the concept of generating
//! structured data from text, since text is easy to modify and reason about.
//!
//! Most programs that work with text use the following pipeline:
//!
//! ```ignore
//! f : text -> data
//! ```
//!
//! The problem with this approach is that `f` changes from project to project,
//! and the task of transforming text into a data structure can get very complex.
//! For example, to create a parser for the syntax of a programming language,
//! one might need several thousand lines of code.
//! This slows down development and increases the chance of making errors.
//!
//! Meta parsing is a technique where `f` gets split into two steps:
//!
//! ```ignore
//! f <=> f2 . f1
//! f1 : text -> meta data
//! f2 : meta data -> data
//! ```
//!
//! The first step `f1` takes text and converts it into meta data.
//! A DSL (Domain Specific Language) is used to describe how this transformation happens.
//! The second step `f2` converts meta data into data, and this is often written as code.
//!
//! ### Rules
//!
//! The meta language is used to describe how to read other documents.
//! First you define some strings to reuse, then some node rules.
//! The last node is used to read the entire document.
//!
//! `20 document = [.l(string:"string") .l(node:"node") .w?]`
//!
//! Strings start with an underscore and can be reused among the rules:
//!
//! `_opt: "optional"`
//!
//! Nodes start with a number that gets multiplied by 1000 and used as debug id.
//! If you get an error `#4003`, then it was caused by a rule in the node starting with 4.
//!
//! |Rule|Description|
//! |----|-----------|
//! |.l(rule)|Separates sub rule with lines.|
//! |.l+(rule)|Separates sub rule with lines, with indentation (whitespace sensitive)|
//! |.r?(rule)|Repeats sub rule until it fails, allows zero repetitions.|
//! |.r!(rule)|Repeats sub rule until it fails, requires at least one repetition.|
//! |...any_characters?:name|Reads a string until any characters, allows zero characters. Name is optional.|
//! |...any_characters!:name|Reads a string until any characters, requires at least one character. Name is optional.|
//! |..any_characters?:name|Reads a string until any characters or whitespace, allows zero characters. Name is optional.|
//! |..any_characters!:name|Reads a string until any characters or whitespace, requires at least one character. Name is optional.|
//! |.w?|Reads whitespace. The whitespace is optional.|
//! |.w!|Reads whitespace. The whitespace is required.|
//! |?rule|Makes the rule optional.|
//! |"token":name|Expects a token, sets name to `true`. Name is optional.|
//! |"token":!name|Expects a token, sets name to `false`. Name is required.|
//! |!"token":name|Fails if token is read, sets name to `true` if it is not read. Name is optional.|
//! |!"token":!name|Fails if token is read, sets name to `false` if it is not read. Name is required.|
//! |!rule|Fails if rule is read.|
//! |.s?(by_rule rule)|Separates rule by another rule, allows zero repetitions.|
//! |.s!(by_rule rule)|Separates rule by another rule, requires at least one repetition.|
//! |.s?.(by_rule rule)|Separates rule by another rule, allows trailing.|
//! |{rules}|Selects a rule. Tries the first rule, then the second, etc. Rules are separated by whitespace.|
//! |[rules]|A sequence of rules. Rules are separated by whitespace.|
//! |node|Uses a node without a name. The read data is put in the current node.|
//! |node:name|Uses a node with a name. The read data is put in a new node with the name.|
//! |.t?:name|Reads a JSON string with a name. The string can be empty. Name is optional.|
//! |.t!:name|Reads a JSON string with a name. The string cannot be empty. Name is optional.|
//! |.$:name|Reads a number with a name. The name is optional.|
//! |.$_:name|Reads a number with underscore as visible separator, for example `10_000`. The name is optional.|
//!
//! ### "Hello world" in Piston-Meta
//!
//! ```rust
//! extern crate piston_meta;
//!
//! use piston_meta::*;
//!
//! fn main() {
//!     let text = r#"hi James!"#;
//!     let rules = r#"
//!         1 say_hi = ["hi" .w? {"James":"james" "Peter":"peter"} "!"]
//!         2 document = say_hi
//!     "#;
//!     // Parse rules with meta language and convert to rules for parsing text.
//!     let rules = match syntax_errstr(rules) {
//!         Err(err) => {
//!             println!("{}", err);
//!             return;
//!         }
//!         Ok(rules) => rules
//!     };
//!     let mut data = vec![];
//!     match parse_errstr(&rules, text, &mut data) {
//!         Err(err) => {
//!             println!("{}", err);
//!             return;
//!         }
//!         Ok(()) => {}
//!     };
//!     json::print(&data);
//! }
//! ```
//!
//! ### Bootstrapping
//!
//! When the meta language changes, bootstrapping is used to hoist the old meta syntax into the new meta syntax. Here is how it works:
//!
//! 1. Piston-Meta contains composable rules that can parse many human readable text formats.
//! 2. Piston-Meta knows how to parse and convert to its own rules, known as "bootstrapping".
//! 3. Therefore, you can tell Piston-Meta how to parse other text formats using a meta language!
//! 4. Including the text format describing how to parse its own syntax, which generates equivalent rules to the ones hard coded in Rust.
//! 5. New versions of the meta language can describe older versions to keep backwards compatibility, by changing the self syntax slightly, so it can read an older version of itself.
//!

extern crate read_token;
extern crate range;
#[macro_use]
extern crate lazy_static;

pub use parse_error_handler::{
    stderr_unwrap,
    ParseErrorHandler
};
pub use parse_error::ParseError;
pub use meta_rules::{
    parse,
    parse_errstr,
    parse_errstr_with_indent,
    parse_with_indent,
    Rule
};
pub use bootstrap::Convert;

/// The type of debug id used to track down errors in rules.
pub type DebugId = usize;

use std::sync::Arc;
use std::fs::File;
use std::path::Path;

pub use range::Range;

pub mod bootstrap;
pub mod json;
pub mod meta_rules;
pub mod tokenizer;

mod parse_error;
mod parse_error_handler;
pub mod optimize;

mod all {
    pub use super::*;
}

/// Represents meta data.
#[derive(PartialEq, Clone, Debug)]
pub enum MetaData {
    /// Starts node.
    StartNode(Arc<String>),
    /// Ends node.
    EndNode(Arc<String>),
    /// Sets bool property.
    Bool(Arc<String>, bool),
    /// Sets f64 property.
    F64(Arc<String>, f64),
    /// Sets string property.
    String(Arc<String>, Arc<String>),
}

/// Stores syntax.
#[derive(Clone, Debug, PartialEq)]
pub struct Syntax {
    /// Rule data.
    pub rules: Vec<Rule>,
    /// Name of rules.
    pub names: Vec<Arc<String>>,
}

impl Syntax {
    /// Creates a new syntax.
    pub fn new() -> Syntax {
        Syntax {
            rules: vec![],
            names: vec![]
        }
    }

    /// Adds a new rule.
    pub fn push(&mut self, name: Arc<String>, rule: Rule) {
        self.rules.push(rule);
        self.names.push(name);
    }

    /// Optimizes syntax.
    pub fn optimize(self) -> Syntax {
        let new_rules = self.rules.iter()
            .map(|r| optimize::optimize_rule(&r, &self.rules))
            .collect();
        Syntax {rules: new_rules, names: self.names}
    }
}

/// Reads syntax from text.
pub fn syntax(rules: &str) -> Result<Syntax, Range<ParseError>> {
    lazy_static! {
        static ref BOOTSTRAP_RULES: Syntax = bootstrap::rules().optimize();
    }

    let mut tokens = vec![];
    parse(&BOOTSTRAP_RULES, rules, &mut tokens)?;
    let mut ignored_meta_data = vec![];
    match bootstrap::convert(&tokens, &mut ignored_meta_data) {
        Ok(res) => Ok(res.optimize()),
        Err(()) => Err(Range::empty(0).wrap(ParseError::Conversion(
            format!("Bootstrapping rules are incorrect"))))
    }
}

/// Reads syntax from text, formatting the error as `String`.
pub fn syntax_errstr(rules: &str) -> Result<Syntax, String> {
    match syntax(rules) {
        Ok(syntax) => Ok(syntax),
        Err(range_err) => {
            let mut w: Vec<u8> = vec![];
            ParseErrorHandler::new(&rules).write(&mut w, range_err).unwrap();
            Err(String::from_utf8(w).unwrap())
        }
    }
}

/// Convenience method for loading data, using the meta language.
/// Panics if there is an error, and writes error message to
/// standard error output.
pub fn load_syntax_data<A, B>(
    syntax_path: A,
    data_path: B
) -> Vec<Range<MetaData>>
    where A: AsRef<Path>, B: AsRef<Path>
{
    use std::io::Read;

    let mut syntax_file = File::open(syntax_path).unwrap();
    let mut s = String::new();
    syntax_file.read_to_string(&mut s).unwrap();
    let rules = stderr_unwrap(&s, syntax(&s));

    let mut data_file = File::open(data_path).unwrap();
    let mut d = String::new();
    data_file.read_to_string(&mut d).unwrap();
    let mut tokens = vec![];
    stderr_unwrap(&d, parse(&rules, &d, &mut tokens));
    tokens
}

#[cfg(test)]
mod tests {
    use super::*;

    fn is_thread_safe<T: Send + Sync>() {}

    #[test]
    fn meta_data_thread_safe() {
        is_thread_safe::<MetaData>();
    }

    #[test]
    fn parse_error_thread_safe() {
        is_thread_safe::<ParseError>();
    }

    #[test]
    fn rule_thread_safe() {
        is_thread_safe::<Rule>();
    }

    #[test]
    fn syntax_thread_safe() {
        is_thread_safe::<Syntax>();
    }
}