#![allow(clippy::cognitive_complexity)]
#![allow(clippy::many_single_char_names)]
#![allow(clippy::new_without_default)]

//! A library for manipulating Context Free Grammars (CFG). It is impractical to fully homogenise
//! all the types of grammars out there, so the aim is for different grammar types
//! to have completely separate implementations. Code that wants to be generic over more than one
//! grammar type can then use an "adapter" to homogenise the particular grammar types of interest.
//! Currently this is a little academic, since only Yacc-style grammars are supported (albeit
//! several variants of Yacc grammars).
//!
//! Unfortunately, CFG terminology is something of a mess. Some people use different terms for the
//! same concept interchangeably; some use different terms to convey subtle differences of meaning
//! (but without complete uniformity). "Token", "terminal", and "lexeme" are examples of this: they
//! are synonyms in some tools and papers, but not in others.
//!
//! In order to make this library somewhat coherent, we therefore use some basic terminology
//! guidelines for major concepts (acknowledging that this will cause clashes with some grammar
//! types):
//!
//! * A *grammar* is an ordered sequence of *productions*.
//! * A *production* is an ordered sequence of *symbols*.
//! * A *rule* maps a name to one or more productions.
//! * A *token* is the name of a syntactic element.
//!
//! For example, in the following Yacc grammar:
//!
//! ```text
//! R1: "a" "b" | R2;
//! R2: "c";
//! ```
//!
//! the following statements are true:
//!
//! * There are three productions: 1: `["a", "b"]` 2: `["R2"]` 3: `["c"]`.
//! * There are two rules: R1 and R2. The mapping to productions is `{R1: {1, 2}, R2: {3}}`.
//! * There are three tokens: a, b, and c.
//!
//! cfgrammar makes the following guarantees about grammars:
//!
//! * Productions are numbered from `0` to `prods_len() - 1` (inclusive).
//! * Rules are numbered from `0` to `rules_len() - 1` (inclusive).
//! * Tokens are numbered from `0` to `toks_len() - 1` (inclusive).
//! * The `StorageT` type used to store production, rule, and token indices can be infallibly
//!   converted into `usize` (see [`TIdx`](struct.TIdx.html) and friends for more details).
//!
//! For most current uses, the main function to investigate is
//! [`YaccGrammar::new()`](yacc/grammar/struct.YaccGrammar.html#method.new) and/or
//! [`YaccGrammar::new_with_storaget()`](yacc/grammar/struct.YaccGrammar.html#method.new_with_storaget),
//! both of which take a Yacc grammar as input.

#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};

mod idxnewtype;
pub mod yacc;

/// Newtype wrappers for production, rule, symbol, and token indices.
pub use crate::idxnewtype::{PIdx, RIdx, SIdx, TIdx};

#[derive(Clone, Copy, Debug, Hash, Eq, PartialEq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum Symbol<StorageT> {
    Rule(RIdx<StorageT>),
    Token(TIdx<StorageT>)
}
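The `Symbol` enum is the point where code generic over a grammar distinguishes references to rules from tokens within a production. As a minimal, self-contained sketch of that dispatch (using hypothetical, non-generic stand-ins for `RIdx`/`TIdx` rather than cfgrammar's actual `StorageT`-parameterised newtypes):

```rust
// Simplified stand-ins for illustration only: cfgrammar's real index types are
// generic over a StorageT and live in the idxnewtype module.
#[derive(Clone, Copy, Debug, PartialEq)]
struct RIdx(u32);
#[derive(Clone, Copy, Debug, PartialEq)]
struct TIdx(u32);

#[derive(Clone, Copy, Debug, PartialEq)]
enum Symbol {
    Rule(RIdx),
    Token(TIdx),
}

// Describe each symbol in a production (an ordered sequence of symbols) by
// matching on whether it references a rule or a token.
fn describe(prod: &[Symbol]) -> Vec<String> {
    prod.iter()
        .map(|sym| match sym {
            Symbol::Rule(RIdx(r)) => format!("rule {}", r),
            Symbol::Token(TIdx(t)) => format!("token {}", t),
        })
        .collect()
}

fn main() {
    // Mirrors the doc comment's example: R1's first production is the tokens
    // "a" and "b"; its second production references rule R2.
    let prod = vec![Symbol::Token(TIdx(0)), Symbol::Token(TIdx(1)), Symbol::Rule(RIdx(1))];
    println!("{:?}", describe(&prod)); // prints ["token 0", "token 1", "rule 1"]
}
```

The sketch also illustrates why the numbering guarantees above matter: an index wrapped in a newtype can always be unwrapped and used to look up the corresponding production, rule, or token.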