//! Allows programmatically invoking parol from a `build.rs` script
//!
//! The process of processing a grammar starts with a [`struct@Builder`] and one of two output modes:
//! 1. Cargo build script output mode, via [Builder::with_cargo_script_output] (easiest)
//! 2. Explicitly specifying an output directory via [Builder::with_explicit_output_dir]
//!
//! ## Cargo integration
//! If this API detects it is running inside a
//! [Cargo `build.rs` script](https://doc.rust-lang.org/stable/cargo/reference/build-scripts.html),
//! then it implicitly enables cargo integration.
//!
//! This has Cargo *automatically* regenerate the parser sources whenever the grammar changes. This
//! is done by implicitly outputting the appropriate
//! [`rerun-if-changed=<grammar>`](https://doc.rust-lang.org/stable/cargo/reference/build-scripts.html#change-detection)
//! instructions to Cargo.
//!
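/*!
As a rough illustration of the two steps described above, the sketch below checks for the
two environment variables that Cargo sets for build scripts and formats the change-detection
instruction. The helper names `looks_like_build_script` and `rerun_if_changed` are
hypothetical and not part of this crate's API; the environment lookup is passed in as a
closure only so that the logic can be exercised without touching the real environment.

```rust
use std::ffi::OsString;
use std::path::Path;

/// Hypothetical helper: true when both variables that Cargo sets for build scripts are present.
fn looks_like_build_script(lookup: impl Fn(&str) -> Option<OsString>) -> bool {
    lookup("OUT_DIR").is_some() && lookup("CARGO_MANIFEST_DIR").is_some()
}

/// Hypothetical helper: the instruction line a build script prints so that Cargo
/// re-runs it whenever the grammar file changes.
fn rerun_if_changed(grammar: &Path) -> String {
    format!("cargo:rerun-if-changed={}", grammar.display())
}
```

In a real build script the lookup would typically be `|key| std::env::var_os(key)`, and the
returned line would be written to standard output with `println!`.
*/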
//! ### Defaults
//! When using [`Builder::with_cargo_script_output`], a number of reasonable defaults are set:
//!
//! By default, the output directory is set to the `OUT_DIR` environment variable.
//! By default, the generated parser name is `parser.rs` and the generated grammar action file is `
//!
//! You can
//! ```ignore
//! mod parser {
//!     include!(concat!(env!("OUT_DIR"), "/parser.rs"));
//! }
//! ```
//!
//! ### Tradeoffs
//! The disadvantage of using this mode (or using Cargo build scripts in general)
//! is that it adds the `parol` crate as an explicit build dependency.
//!
//! Although this doesn't increase the runtime binary size, it does increase the initial compile
//! times.
//! If someone just wants to `cargo install <your crate>`, Cargo will have to download and execute
//! `parol` to generate your parser code.
//!
//! Contributors to your project (who modify your grammar) will have to download and invoke parol
//! anyway, so this cost primarily affects initial compile times. Also, Cargo is very intelligent
//! about caching build script outputs.
//!
//! Despite the impact on initial compiles, this is somewhat traditional in the Rust community.
//! It's [the recommended way to use `bindgen`](https://rust-lang.github.io/rust-bindgen/library-usage.html)
//! and it's the only way to use [`pest`](https://pest.rs/).
//!
//! If you are really concerned about compile times, you can use explicit output (below).
//!
//! ## Explicitly controlling Output Locations
//! If you want more control over the location of generated grammar files,
//! you can invoke [`Builder::with_explicit_output_dir`] to explicitly set an output directory.
//!
//! In addition, you must explicitly name your output parser and action files,
//! or the configuration will give an error.
//!
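/*!
The requirement described above can be pictured with the following minimal sketch.
`ConfigError` and `check_outputs` are hypothetical names chosen for illustration (they are
not this crate's API), and the sketch models only the rule that the parser and action
outputs must both be named while the sanity checks are enabled.

```rust
use std::path::Path;

/// Hypothetical error type for a misconfigured build.
#[derive(Debug, PartialEq)]
enum ConfigError {
    MissingParserOutputFile,
    MissingActionOutputFile,
}

/// Hypothetical check: with sanity checks enabled, both outputs must be configured.
fn check_outputs(
    parser: Option<&Path>,
    actions: Option<&Path>,
    sanity_checks: bool,
) -> Result<(), ConfigError> {
    if sanity_checks {
        if parser.is_none() {
            return Err(ConfigError::MissingParserOutputFile);
        }
        if actions.is_none() {
            return Err(ConfigError::MissingActionOutputFile);
        }
    }
    Ok(())
}
```

Reporting a missing output as an error at configuration time, rather than silently
generating nothing, is what lets a build script fail early with a clear message.
*/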
//! This is used to power the command line `parol` tool, and is useful for additional control.
//!
//! Any configured *output* paths (including generated parsers, expanded grammars, etc.)
//! are resolved relative to this base output directory using [Path::join]. This means that
//! specifying absolute paths overrides this explicit base directory.
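/*!
The effect of [Path::join] on relative and absolute output names can be sketched with the
standard library alone. The function below mirrors the one-line resolution step; the
directory and file names are made up for illustration, and the absolute-path case assumes a
Unix-like system.

```rust
use std::path::{Path, PathBuf};

/// Resolve a configured output name against the base output directory.
fn resolve_output_path(output_dir: &Path, p: impl AsRef<Path>) -> PathBuf {
    // `join` appends relative paths, but an absolute argument replaces the base entirely.
    output_dir.join(p)
}

/// Returns the resolved locations for one relative and one absolute output name.
fn demo() -> (PathBuf, PathBuf) {
    let base = Path::new("generated");
    (
        resolve_output_path(base, "parser.rs"),
        resolve_output_path(base, "/tmp/parser.rs"),
    )
}
```

On a Unix-like system the first result is `generated/parser.rs`, while the second ignores
`generated` and stays `/tmp/parser.rs`.
*/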
//!
//! The grammar input file is resolved in the regular manner.
//! It does not use the "output" directory.
//!
//! ### Interaction with version control
//! When using [`Builder::with_cargo_script_output`], the output is put in a subdirectory of the
//! `target` directory and excluded from version control.
//!
//! This is useful if you want to ignore changes in generated code.
//!
//! However, when specifying an explicit output directory (with [`Builder::with_explicit_output_dir`]),
//! you may have to include the generated sources explicitly into the build process. One way is
//! indicated above where the `include!` macro is used.
//!
//! Otherwise, you would probably set the output to a sub-directory of `src`.
//! This means that files are version controlled and you would have to commit them whenever changes
//! are made.
//!
//! ## Using the CLI directly
//! Note that explicitly specifying the output directory doesn't avoid running parol on `cargo
//! install`.
//!
//! It does not increase the initial build speed, and still requires compiling and invoking `parol`.
//!
//! If you really want to avoid adding `parol` as a build dependency,
//! you need to invoke the CLI manually to generate the parser sources ahead of time.
//!
//! Using a build script requires adding a build dependency, and cargo will unconditionally execute
//! build scripts on first install.
//! While Cargo's build script caching is excellent, it only activates on recompiles.
//!
//! As such, using the CLI manually is really the only way to improve (initial) compile times.
//!
//! It is (often) not worth it, because it is inconvenient, and the impact only happens on *initial* compiles.
//!
//! ## API Completeness
//! Anything you can do with the main `parol` executable, you should also be able to do with this API.
//!
//! That is because the main executable is just a wrapper around the API.
//!
//! However, a couple more advanced features use unstable/internal APIs (see below).
//!
//! As a side note, the CLI does not require you to specify an output location.
//! You can run `parol -f grammar.parol` just fine and it will generate no output.
//!
//! In build scripts, this is typically a mistake (so it errors by default).
//! If you want to disable this sanity check, use [`Builder::disable_output_sanity_checks`].
//!
//! ### Internal APIs
//! The main `parol` command needs a couple of features that do not fit nicely into this API
//! (or interact closely with the crate's internals).
//!
//! Because of that, there are a number of APIs explicitly marked as unstable or internal.
//! Some of these are public and some are private.
//!
//! Expect breaking changes both before and after 1.0 (but especially before).
#![deny(missing_docs)]

use std::collections::BTreeMap;
use std::convert::TryFrom;
use std::path::{Path, PathBuf};
use std::{env, fs};

use crate::config::{CommonGeneratorConfig, ParserGeneratorConfig, UserTraitGeneratorConfig};
use crate::generators::export_node_types::{NodeTypesExporter, NodeTypesInfo};
use crate::generators::node_kind_enum_generator::NodeKindTypesGenerator;
use crate::parser::GrammarType;
use crate::{
    GrammarConfig, GrammarTypeInfo, LRParseTable, LookaheadDFA, MAX_K, ParolGrammar,
    UserTraitGenerator,
};
use clap::{Parser, ValueEnum};
use parol_macros::parol;
use parol_runtime::{ParseTree, Result};

/// Contains all attributes that should be inserted optionally on top of the generated trait source.
/// * Used in the Builder API. Therefore it must be public.
#[derive(Clone, Debug, Parser, ValueEnum)]
pub enum InnerAttributes {
    /// Suppresses clippy warnings like these: `warning: this function has too many arguments (9/7)`
    AllowTooManyArguments,
}

impl std::fmt::Display for InnerAttributes {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            InnerAttributes::AllowTooManyArguments => {
                write!(f, "#![allow(clippy::too_many_arguments)]")
            }
        }
    }
}

/// The default maximum lookahead
///
/// This is used both for the CLI and for the builder.
pub const DEFAULT_MAX_LOOKAHEAD: usize = 5;
/// The default name of the generated grammar module.
pub const DEFAULT_MODULE_NAME: &str = "grammar";
/// The default name of the user type that implements grammar parsing.
pub const DEFAULT_USER_TYPE_NAME: &str = "Grammar";

fn is_build_script() -> bool {
    // Although only `OUT_DIR` is necessary for our purposes, it's possible someone else set it.
    // Check for a second one to make sure we're actually running under cargo
    // See full list of environment variables here: https://is.gd/K6LyzQ
    env::var_os("OUT_DIR").is_some() && env::var_os("CARGO_MANIFEST_DIR").is_some()
}

/// Builds the configuration for generating and analyzing `parol` grammars.
///
/// A grammar file is required for almost all possible operations (set with [Builder::grammar_file])
///
/// Does not actually generate anything until finished.
#[derive(Clone)]
pub struct Builder {
    /// The base output directory
    output_dir: PathBuf,
    grammar_file: Option<PathBuf>,
    /// Output file for expanded grammar
    expanded_grammar_output_file: Option<PathBuf>,
    /// Output file for the generated parser source
    parser_output_file: Option<PathBuf>,
    /// Output file for the generated actions files.
    actions_output_file: Option<PathBuf>,
    /// The output file for the generated syntree node wrappers
    node_kind_enum_output_file: Option<PathBuf>,
    pub(crate) user_type_name: String,
    pub(crate) module_name: String,
    cargo_integration: bool,
    max_lookahead: usize,
    /// By default, we want to require that the parser output file is specified.
    /// Otherwise we're just wasting time outputting to /dev/null.
    ///
    /// The CLI needs to be able to override this (mostly for debugging), hence the option.
    output_sanity_checks: bool,
    /// Activate the minimization of boxed types in the generated parser
    pub(crate) minimize_boxed_types: bool,
    /// Internal debugging for CLI.
    debug_verbose: bool,
    /// Generate range information for AST types
    range: bool,
    /// Generate typed syntree node wrappers
    enum_kind: bool,
    /// Inner attributes to insert at the top of the generated trait source.
    inner_attributes: Vec<InnerAttributes>,
    /// Enables trimming of the parse tree during parsing.
    /// Generates the call to trim_parse_tree on the parser object before the call of parse.
    pub(crate) trim_parse_tree: bool,
    /// Disables the error recovery mechanism in the generated parser
    pub(crate) disable_recovery: bool,
    /// The language to generate code for
    pub(crate) language: crate::config::Language,
}

impl Builder {
    /// Create a new builder for use in a Cargo build script (`build.rs`).
    ///
    /// This is the recommended default way to get started.
    ///
    /// All the outputs are set relative to the `OUT_DIR` environment variable,
    /// as is standard for [Cargo build script outputs](https://doc.rust-lang.org/stable/cargo/reference/build-scripts.html#outputs-of-the-build-script).
    ///
    /// This sets sensible defaults for every output file name.
    ///
    /// | Method name                    | CLI Option           | Default (relative) name |
    /// | -------------------------------|----------------------|-------------------------|
    /// | `parser_output_file`           | `--parser` or `-p`   | "parser.rs"             |
    /// | `actions_output_file`          | `--actions` or `-a`  | "grammar_trait.rs"      |
    /// | `expanded_grammar_output_file` | `--expanded` or `-e` | "grammar-exp.par"       |
    ///
    /// See the module documentation for how to include these files into your project.
    ///
    /// Panics if used outside of a cargo build script.
    pub fn with_cargo_script_output() -> Self {
        assert!(is_build_script(), "Cannot use outside of a cargo script");
        // Don't worry! $OUT_DIR is unique for every package
        let out_dir = env::var_os("OUT_DIR").unwrap();
        let mut builder = Self::with_explicit_output_dir(out_dir);
        // Set those reasonable defaults we promised
        builder
            .parser_output_file("parser.rs")
            .actions_output_file("grammar_trait.rs")
            .node_kind_enums_output_file("node_kind.rs")
            .expanded_grammar_output_file("grammar-exp.par");
        // Cargo integration should already be enabled (because we are a build script)
        assert!(builder.cargo_integration);
        builder
    }
    /// Internal utility to resolve a path relative to the output directory
    fn resolve_output_path(&self, p: impl AsRef<Path>) -> PathBuf {
        self.output_dir.join(p)
    }
    /// Create a new builder with an explicitly specified output directory.
    ///
    /// This requires that output files be specified explicitly,
    /// unless this check is disabled with [`Builder::disable_output_sanity_checks`]
    ///
    /// If this detects running inside a build script,
    /// it will automatically enable cargo integration.
    ///
    /// If output files are specified using absolute paths,
    /// it overrides this explicit output dir.
    ///
    /// See module docs on "explicit output mode" for more details.
    pub fn with_explicit_output_dir(output: impl AsRef<Path>) -> Self {
        /*
         * Most of these correspond to CLI options.
         */
        Builder {
            output_dir: PathBuf::from(output.as_ref()),
            grammar_file: None,
            cargo_integration: is_build_script(),
            debug_verbose: false,
            range: false,
            enum_kind: false,
            max_lookahead: DEFAULT_MAX_LOOKAHEAD,
            module_name: String::from(DEFAULT_MODULE_NAME),
            user_type_name: String::from(DEFAULT_USER_TYPE_NAME),
            // In this mode, the user must specify explicit outputs.
            // The default is /dev/null (`None`)
            parser_output_file: None,
            actions_output_file: None,
            node_kind_enum_output_file: None,
            expanded_grammar_output_file: None,
            minimize_boxed_types: false,
            inner_attributes: Vec::new(),
            // By default, we require that output files != /dev/null
            output_sanity_checks: true,
            trim_parse_tree: false,
            disable_recovery: false,
            language: crate::config::Language::Rust,
        }
    }
    /// By default, we require that the generated parser and action files are not discarded.
    ///
    /// This disables that check (used for the CLI).
    ///
    /// NOTE: When using [`Builder::with_cargo_script_output`], these are automatically inferred.
    pub fn disable_output_sanity_checks(&mut self) -> &mut Self {
        self.output_sanity_checks = false;
        self
    }
    /// Set the output location for the generated parser.
    ///
    /// If you are using [Builder::with_cargo_script_output],
    /// the default output is "$OUT_DIR/parser.rs".
    ///
    /// If you are using an explicitly specified output directory, then this option is *required*.
    pub fn parser_output_file(&mut self, p: impl AsRef<Path>) -> &mut Self {
        self.parser_output_file = Some(self.resolve_output_path(p));
        self
    }
    /// Set the actions output location for the generated parser.
    ///
    /// If you are using [Builder::with_cargo_script_output],
    /// the default output is "$OUT_DIR/grammar_trait.rs".
    ///
    /// If you are using an explicitly specified output directory, then this option is *required*.
    pub fn actions_output_file(&mut self, p: impl AsRef<Path>) -> &mut Self {
        self.actions_output_file = Some(self.resolve_output_path(p));
        self
    }
    /// Set the output location for the expanded grammar.
    ///
    /// If you are using [Builder::with_cargo_script_output],
    /// the default output is "$OUT_DIR/grammar-exp.par".
    ///
    /// Otherwise, no expanded grammar is written unless this is set.
    pub fn expanded_grammar_output_file(&mut self, p: impl AsRef<Path>) -> &mut Self {
        self.expanded_grammar_output_file = Some(self.resolve_output_path(p));
        self
    }
    /// Set the output location for the generated node kind enum.
    ///
    /// The output does not contain any `parol_runtime` dependencies, so you can specify
    /// "../other_crate/src/node_kind.rs" as the output file while the other crate does not have
    /// `parol_runtime` as a dependency.
    ///
    /// If you are using [Builder::with_cargo_script_output],
    /// the default location is "$OUT_DIR/node_kind.rs".
    pub fn node_kind_enums_output_file(&mut self, p: impl AsRef<Path>) -> &mut Self {
        self.node_kind_enum_output_file = Some(self.resolve_output_path(p));
        self
    }
    /// Explicitly enable/disable cargo integration.
    ///
    /// This is automatically set to true if you are running a build script,
    /// and is `false` otherwise.
    pub fn set_cargo_integration(&mut self, enabled: bool) -> &mut Self {
        self.cargo_integration = enabled;
        self
    }
    /// Set the grammar file used as input for parol.
    ///
    /// This is required for most operations.
    ///
    /// Does not check that the file exists.
    pub fn grammar_file(&mut self, grammar: impl AsRef<Path>) -> &mut Self {
        self.grammar_file = Some(PathBuf::from(grammar.as_ref()));
        self
    }
    /// Set the name of the user type that implements the language processing
    pub fn user_type_name(&mut self, name: &str) -> &mut Self {
        self.user_type_name = name.into();
        self
    }
    /// Set the name of the user module that implements the language processing
    ///
    /// This is the module that contains the [Self::user_type_name]
    pub fn user_trait_module_name(&mut self, name: &str) -> &mut Self {
        self.module_name = name.into();
        self
    }
    /// Set the maximum lookahead for the generated parser.
    ///
    /// If nothing is specified, the default lookahead is [DEFAULT_MAX_LOOKAHEAD].
    ///
    /// Returns a [BuilderError] if the lookahead is greater than [crate::MAX_K].
    pub fn max_lookahead(&mut self, k: usize) -> std::result::Result<&mut Self, BuilderError> {
        if k > MAX_K {
            return Err(BuilderError::LookaheadTooLarge);
        }
        self.max_lookahead = k;
        Ok(self)
    }
    /// Print verbose debug information to the standard output
    ///
    /// This is an internal method, and is only intended for the CLI.
    #[doc(hidden)]
    pub fn debug_verbose(&mut self) -> &mut Self {
        self.debug_verbose = true;
        self
    }
    /// Generate range information for AST types
    pub fn range(&mut self) -> &mut Self {
        self.range = true;
        self
    }
    /// Generate node kind enums `TerminalKind` and `NonTerminalKind`
    pub fn node_kind_enums(&mut self) -> &mut Self {
        self.enum_kind = true;
        self
    }
    /// Inserts the given inner attributes at the top of the generated trait source.
    pub fn inner_attributes(&mut self, inner_attributes: Vec<InnerAttributes>) -> &mut Self {
        self.inner_attributes = inner_attributes;
        self
    }
    /// Activate the minimization of boxed types in the generated parser
    pub fn minimize_boxed_types(&mut self) -> &mut Self {
        self.minimize_boxed_types = true;
        self
    }
    /// Enables trimming of the parse tree during parsing.
    /// Generates the call to trim_parse_tree on the parser object before the call of parse.
    pub fn trim_parse_tree(&mut self) -> &mut Self {
        self.trim_parse_tree = true;
        self
    }

    /// Disables the error recovery mechanism in the generated parser
    pub fn disable_recovery(&mut self) -> &mut Self {
        self.disable_recovery = true;
        self
    }

    /// Set the language to generate code for
    pub fn language(&mut self, language: crate::config::Language) -> &mut Self {
        self.language = language;
        self
    }

    /// Begin the process of generating the grammar
    /// using the specified listener (or None if no listener is desired).
    ///
    /// Returns an error if the build is *configured* incorrectly.
    /// In a build script, this is typically a programmer error.
    pub fn begin_generation_with<'l>(
        &mut self,
        listener: Option<&'l mut dyn BuildListener>,
    ) -> std::result::Result<GrammarGenerator<'l>, BuilderError> {
        /*
         * For those concerned about performance:
         *
         * The overhead of all these copies and dyn dispatch is marginal
         * in comparison to the actual grammar generation.
         */
        let grammar_file = self
            .grammar_file
            .as_ref()
            .ok_or(BuilderError::MissingGrammarFile)?
            .clone();
        if self.output_sanity_checks {
            // Check that we have outputs
            if self.parser_output_file.is_none() {
                return Err(BuilderError::MissingParserOutputFile);
            } else if self.actions_output_file.is_none() {
                return Err(BuilderError::MissingActionOutputFile);
            }
            // Missing expanded grammar file is fine. They might not want that.
        }
        Ok(GrammarGenerator {
            listener: MaybeBuildListener(listener),
            grammar_file,
            builder: self.clone(),
            state: None,
            grammar_config: None,
            lookahead_dfa_s: None,
            parse_table: None,
            type_info: None,
        })
    }
    /// Generate the parser, writing it to the pre-configured output files.
    pub fn generate_parser(&mut self) -> Result<()> {
        self.begin_generation_with(None)
            .map_err(|e| parol!("Misconfigured parol generation: {}", e))?
            .generate_parser()
    }
    /// Generate the parser, writing it to the pre-configured output files, and export the node info.
    pub fn generate_parser_and_export_node_infos(&mut self) -> Result<NodeTypesInfo> {
        self.begin_generation_with(None)
            .map_err(|e| parol!("Misconfigured parol generation: {}", e))?
            .generate_parser_and_export_node_infos()
    }
}

impl CommonGeneratorConfig for Builder {
    fn user_type_name(&self) -> &str {
        &self.user_type_name
    }

    fn module_name(&self) -> &str {
        &self.module_name
    }

    fn minimize_boxed_types(&self) -> bool {
        self.minimize_boxed_types
    }

    fn range(&self) -> bool {
        self.range
    }

    fn node_kind_enums(&self) -> bool {
        self.enum_kind
    }

    fn language(&self) -> crate::config::Language {
        self.language
    }
}

impl ParserGeneratorConfig for Builder {
    fn trim_parse_tree(&self) -> bool {
        self.trim_parse_tree
    }

    fn recovery_disabled(&self) -> bool {
        self.disable_recovery
    }
}

impl UserTraitGeneratorConfig for Builder {
    fn inner_attributes(&self) -> &[InnerAttributes] {
        &self.inner_attributes
    }
}

/// Represents in-process grammar generation.
///
/// Most of the time you will want to use [Builder::generate_parser] to bypass this completely.
///
/// This is an advanced API, and unless stated otherwise, all its methods are unstable (see module docs).
///
/// The lifetime parameter `'l` refers to the lifetime of the optional listener.
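/**
The staged nature of generation can be pictured with a small, self-contained sketch.
Everything in it (`Stage`, `Pipeline`, `advance`, `run_all`) is hypothetical and only
illustrates the idea of running fixed steps in a fixed order; it is not this type's API.
One deliberate difference: the sketch reports an out-of-order step as an `Err`, whereas
the internal methods in this file guard their order with assertions.

```rust
/// Hypothetical stages, loosely modelled on the generator's internal progress tracking.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Stage {
    Parsed,
    Expanded,
    PostProcessed,
    Written,
}

/// Hypothetical pipeline that only records which stage was completed last.
struct Pipeline {
    stage: Option<Stage>,
}

impl Pipeline {
    fn new() -> Self {
        Pipeline { stage: None }
    }

    /// Move to `next`, but only if the last completed stage is the expected one.
    fn advance(&mut self, expected: Option<Stage>, next: Stage) -> Result<(), String> {
        if self.stage != expected {
            return Err(format!("expected {:?}, found {:?}", expected, self.stage));
        }
        self.stage = Some(next);
        Ok(())
    }

    /// Run every stage in order, as a one-shot "generate" call would.
    fn run_all(&mut self) -> Result<(), String> {
        self.advance(None, Stage::Parsed)?;
        self.advance(Some(Stage::Parsed), Stage::Expanded)?;
        self.advance(Some(Stage::Expanded), Stage::PostProcessed)?;
        self.advance(Some(Stage::PostProcessed), Stage::Written)
    }
}
```

Keeping the order inside a single entry point is what makes the one-shot call the
convenient default, while the individual steps remain available for tooling.
*/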
pub struct GrammarGenerator<'l> {
    /// The build listener
    ///
    /// This is a fairly advanced feature
    listener: MaybeBuildListener<'l>,
    pub(crate) grammar_file: PathBuf,
    builder: Builder,
    state: Option<State>,
    pub(crate) grammar_config: Option<GrammarConfig>,
    lookahead_dfa_s: Option<BTreeMap<String, LookaheadDFA>>,
    parse_table: Option<LRParseTable>,
    type_info: Option<GrammarTypeInfo>,
}
impl GrammarGenerator<'_> {
    /// Generate the parser, writing it to the pre-configured output files.
    pub fn generate_parser(&mut self) -> Result<()> {
        self.parse()?;
        self.expand()?;
        self.post_process()?;
        self.write_output()?;
        Ok(())
    }

    /// Generate the parser, writing it to the pre-configured output files, and export the node info.
    pub fn generate_parser_and_export_node_infos(&mut self) -> Result<NodeTypesInfo> {
        self.parse()?;
        self.expand()?;
        self.post_process()?;
        self.write_output()?;
        self.export_node_infos()
    }

    //
    // Internal APIs
    //

    #[doc(hidden)]
    pub fn parse(&mut self) -> Result<()> {
        assert_eq!(self.state, None);
        let input = fs::read_to_string(&self.grammar_file).map_err(|e| {
            parol!(
                "Can't read grammar file {}: {}",
                self.grammar_file.display(),
                e
            )
        })?;
        if self.builder.cargo_integration {
            println!("cargo:rerun-if-changed={}", self.grammar_file.display());
        }
        let mut parol_grammar = ParolGrammar::new();
        let syntax_tree = crate::parser::parse(&input, &self.grammar_file, &mut parol_grammar)?;
        self.listener
            .on_initial_grammar_parse(&syntax_tree, &input, &parol_grammar)?;
        self.grammar_config = Some(GrammarConfig::try_from(parol_grammar)?);

        let _grammar_config = self.grammar_config.as_ref().unwrap();

        self.state = Some(State::Parsed);
        Ok(())
    }
    #[doc(hidden)]
    pub fn expand(&mut self) -> Result<()> {
        assert_eq!(self.state, Some(State::Parsed));
        let grammar_config = self.grammar_config.as_mut().unwrap();
        // NOTE: it's up to the listener to add appropriate error context
        self.listener
            .on_intermediate_grammar(IntermediateGrammar::Untransformed, &*grammar_config)?;
        let cfg =
            crate::check_and_transform_grammar(&grammar_config.cfg, grammar_config.grammar_type)?;

        // To have at least a preliminary version of the expanded grammar,
        // even when the next checks fail, we write out the expanded grammar here.
        // In most cases it will be overwritten further on.
        if let Some(ref expanded_file) = self.builder.expanded_grammar_output_file {
            fs::write(
                expanded_file,
                crate::render_par_string(grammar_config, /* add_index_comment */ true)?,
            )
            .map_err(|e| parol!("Error writing left-factored grammar! {}", e))?;
        }

        // Exchange original grammar with transformed one
        grammar_config.update_cfg(cfg);

        self.listener
            .on_intermediate_grammar(IntermediateGrammar::Transformed, &*grammar_config)?;
        if let Some(ref expanded_file) = self.builder.expanded_grammar_output_file {
            fs::write(
                expanded_file,
                crate::render_par_string(grammar_config, /* add_index_comment */ true)?,
            )
            .map_err(|e| parol!("Error writing left-factored grammar!: {}", e))?;
        }
        self.state = Some(State::Expanded);
        Ok(())
    }
    #[doc(hidden)]
    pub fn post_process(&mut self) -> Result<()> {
        assert_eq!(self.state, Some(State::Expanded));
        let grammar_config = self.grammar_config.as_mut().unwrap();
        match grammar_config.grammar_type {
            GrammarType::LLK => {
                self.lookahead_dfa_s = Some(
                    crate::calculate_lookahead_dfas(grammar_config, self.builder.max_lookahead)
                        .map_err(|e| {
                            parol!("Lookahead calculation for the given grammar failed!: {}", e)
                        })?,
                );

                if self.builder.debug_verbose {
                    print!(
                        "Lookahead DFAs:\n{:?}",
                        self.lookahead_dfa_s.as_ref().unwrap()
                    );
                }

                // Update maximum lookahead size for scanner generation
                grammar_config.update_lookahead_size(
                    self.lookahead_dfa_s
                        .as_ref()
                        .unwrap()
                        .iter()
                        .max_by_key(|(_, dfa)| dfa.k)
                        .unwrap()
                        .1
                        .k,
                );
            }
            GrammarType::LALR1 => {
                self.parse_table = Some(crate::calculate_lalr1_parse_table(grammar_config)?.0);
                grammar_config.update_lookahead_size(1);
            }
        }

        if self.builder.debug_verbose {
            print!("\nGrammar config:\n{grammar_config:?}");
        }
        self.state = Some(State::PostProcessed);
        Ok(())
    }
677    #[doc(hidden)]
678    pub fn write_output(&mut self) -> Result<()> {
679        assert_eq!(self.state, Some(State::PostProcessed));
680        let grammar_config = self.grammar_config.as_mut().unwrap();
681
682        let language = self.builder.language();
683
684        let lexer_source = match language {
            crate::config::Language::Rust => {
                crate::generate_lexer_source(grammar_config, &self.builder)
                    .map_err(|e| parol!("Failed to generate lexer source!: {}", e))?
            }
            crate::config::Language::CSharp => {
                crate::generators::cs_lexer_generator::generate_lexer_source(
                    grammar_config,
                    &self.builder,
                )
                .map_err(|e| parol!("Failed to generate C# lexer source!: {}", e))?
            }
        };

        let mut type_info: GrammarTypeInfo =
            GrammarTypeInfo::try_new(&self.builder.user_type_name)?;

        let user_trait_source = match language {
            crate::config::Language::Rust => {
                let user_trait_generator = UserTraitGenerator::new(grammar_config);
                user_trait_generator.generate_user_trait_source(
                    &self.builder,
                    grammar_config.grammar_type,
                    &mut type_info,
                )?
            }
            crate::config::Language::CSharp => {
                let user_trait_generator =
                    crate::generators::cs_user_trait_generator::CSUserTraitGenerator::new(
                        grammar_config,
                    );
                user_trait_generator.generate_user_trait_source(
                    &self.builder,
                    grammar_config.grammar_type,
                    &mut type_info,
                )?
            }
        };

        if let Some(ref user_trait_file_out) = self.builder.actions_output_file {
            fs::write(user_trait_file_out, user_trait_source)
                .map_err(|e| parol!("Error writing generated user trait source!: {}", e))?;
            if language == crate::config::Language::Rust {
                crate::try_format(user_trait_file_out)?;
            }
        } else if self.builder.debug_verbose {
            println!("\nSource for semantic actions:\n{user_trait_source}");
        }

        let ast_type_has_lifetime = type_info.symbol_table.has_lifetime(type_info.ast_enum_type);

        let parser_source = match language {
            crate::config::Language::Rust => match grammar_config.grammar_type {
                GrammarType::LLK => crate::generate_parser_source(
                    grammar_config,
                    &lexer_source,
                    &self.builder,
                    self.lookahead_dfa_s.as_ref().unwrap(),
                    ast_type_has_lifetime,
                )?,
                GrammarType::LALR1 => crate::generate_lalr1_parser_source(
                    grammar_config,
                    &lexer_source,
                    &self.builder,
                    self.parse_table.as_ref().unwrap(),
                    ast_type_has_lifetime,
                )?,
            },
            crate::config::Language::CSharp => match grammar_config.grammar_type {
                GrammarType::LLK => crate::generators::cs_parser_generator::generate_parser_source(
                    grammar_config,
                    &lexer_source,
                    &self.builder,
                    self.lookahead_dfa_s.as_ref().unwrap(),
                    ast_type_has_lifetime,
                )?,
                GrammarType::LALR1 => {
                    crate::generators::cs_parser_generator::generate_lalr1_parser_source(
                        grammar_config,
                        &lexer_source,
                        &self.builder,
                        self.parse_table.as_ref().unwrap(),
                        ast_type_has_lifetime,
                    )?
                }
            },
        };

        if let Some(ref parser_file_out) = self.builder.parser_output_file {
            fs::write(parser_file_out, parser_source)
                .map_err(|e| parol!("Error writing generated parser source!: {}", e))?;
            if language == crate::config::Language::Rust {
                crate::try_format(parser_file_out)?;
            }
        } else if self.builder.debug_verbose {
            println!("\nParser source:\n{parser_source}");
        }

        if let Some(ref syntree_node_wrappers_output_file) = self.builder.node_kind_enum_output_file
        {
            let mut f = fs::OpenOptions::new()
                .write(true)
                .create(true)
                .truncate(true)
                .open(syntree_node_wrappers_output_file)
                .map_err(|e| parol!("Error opening generated syntree node wrappers!: {}", e))?;
            let syntree_node_types_generator =
                NodeKindTypesGenerator::new(grammar_config, &type_info);
            syntree_node_types_generator
                .generate(&mut f)
                .map_err(|e| parol!("Error generating syntree node wrappers!: {}", e))?;
            crate::try_format(syntree_node_wrappers_output_file)?;
        }

        self.state = Some(State::Finished);
        self.type_info = Some(type_info);

        Ok(())
    }

    fn export_node_infos(&self) -> Result<NodeTypesInfo> {
        let node_types_exporter = NodeTypesExporter::new(
            self.grammar_config.as_ref().unwrap(),
            self.type_info.as_ref().unwrap(),
        );
        Ok(node_types_exporter.generate())
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Parsed,
    Expanded,
    PostProcessed,
    Finished,
}

/// A build listener, for advanced customization of the parser generation.
///
/// This is used by the CLI to implement some of its more advanced options (without cluttering up the main interface).
///
/// The details of this trait are considered unstable.
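///
/// A minimal sketch of a listener that reacts only to the final grammar stage. The name
/// `StageLogger` is hypothetical, and the snippet assumes that `BuildListener`,
/// `IntermediateGrammar`, `GrammarConfig` and `Result` are in scope as they are in this module:
///
/// ```ignore
/// struct StageLogger;
///
/// impl BuildListener for StageLogger {
///     // `on_initial_grammar_parse` keeps its default no-op implementation.
///     fn on_intermediate_grammar(
///         &mut self,
///         stage: IntermediateGrammar,
///         _config: &GrammarConfig,
///     ) -> Result<()> {
///         if stage == IntermediateGrammar::LAST {
///             println!("reached the last grammar transformation: {stage:?}");
///         }
///         Ok(())
///     }
/// }
/// ```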
#[allow(
    unused_variables, // All these variables are unused because the default impls are no-ops.
    missing_docs, // This is fine because this is internal.
)]
pub trait BuildListener {
    fn on_initial_grammar_parse(
        &mut self,
        syntax_tree: &ParseTree,
        input: &str,
        grammar: &ParolGrammar,
    ) -> Result<()> {
        Ok(())
    }
    fn on_intermediate_grammar(
        &mut self,
        stage: IntermediateGrammar,
        config: &GrammarConfig,
    ) -> Result<()> {
        Ok(())
    }
}
#[derive(Default)]
struct MaybeBuildListener<'l>(Option<&'l mut dyn BuildListener>);
impl BuildListener for MaybeBuildListener<'_> {
    fn on_initial_grammar_parse(
        &mut self,
        syntax_tree: &ParseTree,
        input: &str,
        grammar: &ParolGrammar,
    ) -> Result<()> {
        if let Some(ref mut inner) = self.0 {
            inner.on_initial_grammar_parse(syntax_tree, input, grammar)
        } else {
            Ok(())
        }
    }

    fn on_intermediate_grammar(
        &mut self,
        stage: IntermediateGrammar,
        config: &GrammarConfig,
    ) -> Result<()> {
        if let Some(ref mut inner) = self.0 {
            inner.on_intermediate_grammar(stage, config)
        } else {
            Ok(())
        }
    }
}

/// Marks an intermediate stage of the grammar, in between the various transformations that parol does.
///
/// The last transformation is available as [IntermediateGrammar::LAST].
///
/// This enum gives some degree of access to the individual transformations that parol does.
/// As such, the specific variants are considered unstable.
#[non_exhaustive]
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum IntermediateGrammar {
    /// Writes the untransformed parsed grammar
    ///
    /// NOTE: This is different than the initially parsed syntax tree
    Untransformed,
    /// Writes the transformed parsed grammar
    Transformed,
}
impl IntermediateGrammar {
    /// The last transformation.
    pub const LAST: IntermediateGrammar = IntermediateGrammar::Transformed;
}

/// An error that occurs configuring the [struct@Builder].
#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum BuilderError {
    /// Indicates that the operation needs a grammar file as input,
    /// but that one has not been specified.
    #[error("Missing an input grammar file")]
    MissingGrammarFile,
    /// Indicates that no parser output file has been specified.
    ///
    /// This would discard the generated parser, which is typically a mistake.
    #[error("No parser output file specified")]
    MissingParserOutputFile,
    /// Indicates that no action output file has been specified.
    ///
    /// This would discard the generated semantic actions, which is typically a mistake.
    #[error("No action output file specified")]
    MissingActionOutputFile,
    /// Indicates that the specified lookahead is too large.
    #[error("Maximum lookahead is {}", MAX_K)]
    LookaheadTooLarge,
}