//! Lossless concrete syntax tree (CST) for Beancount.
//!
//! Phase 1 of the parser-CST migration tracked in #1262. This module
//! lives inside `rustledger-parser` (no new crate). Phases 2-5 will move
//! the existing AST-style parser internals over to delegate to it and
//! eventually delete the old code paths.
//!
//! # Phase 1 surface
//!
//! - [`SyntaxKind`]: every token and node kind that can appear in the
//!   tree. Uses `num_enum::TryFromPrimitive` for the `u16` → enum
//!   conversion.
//! - [`BeancountLanguage`]: the rowan `Language` impl, plus the type
//!   aliases [`SyntaxNode`], [`SyntaxToken`], and [`SyntaxElement`].
//! - [`lossless_kind_tokens`]: drives the lossless lexer
//!   (`tokenize_lossless`) and recovers the leading BOM byte by byte.
//! - [`parse_flat`]: produces a flat `SOURCE_FILE` tree that round-trips
//!   byte-identically against the source.
//!
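/*!

The following is a minimal, self-contained sketch of two Phase 1
properties: a checked `u16` → kind conversion (rowan stores node and token
kinds as a raw `u16` in `rowan::SyntaxKind`, so converting back must reject
unknown values) and a lossless flat tokenization whose token texts
concatenate back to the exact source, leading BOM included. It is
illustrative only: `Kind`, `lex`, and `round_trips` are hypothetical names,
the kind set is a toy subset, and the hand-written `TryFrom` impl stands in
for the `num_enum` derive.

```rust
use std::convert::TryFrom;

/// Toy subset of token kinds (hypothetical; the real `SyntaxKind` has many more).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u16)]
enum Kind {
    Bom = 0,
    Whitespace = 1,
    Newline = 2,
    Comment = 3,
    Word = 4,
}

// Hand-written stand-in for a `num_enum::TryFromPrimitive` derive:
// unknown raw values are rejected instead of being turned into a bogus kind.
impl TryFrom<u16> for Kind {
    type Error = u16;

    fn try_from(raw: u16) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Kind::Bom),
            1 => Ok(Kind::Whitespace),
            2 => Ok(Kind::Newline),
            3 => Ok(Kind::Comment),
            4 => Ok(Kind::Word),
            other => Err(other),
        }
    }
}

/// Toy lossless lexer: every byte of `src` ends up in exactly one token.
fn lex(src: &str) -> Vec<(Kind, &str)> {
    let mut tokens = Vec::new();
    let mut rest = src;

    // Keep a leading UTF-8 BOM (U+FEFF, 3 bytes) as its own token.
    if let Some(after) = rest.strip_prefix('\u{feff}') {
        tokens.push((Kind::Bom, &rest[..rest.len() - after.len()]));
        rest = after;
    }

    while let Some(c) = rest.chars().next() {
        // Every arm consumes at least one character, so the loop terminates.
        let (kind, len) = match c {
            '\n' => (Kind::Newline, 1),
            ' ' | '\t' => (
                Kind::Whitespace,
                rest.find(|ch: char| ch != ' ' && ch != '\t').unwrap_or(rest.len()),
            ),
            ';' => (Kind::Comment, rest.find('\n').unwrap_or(rest.len())),
            _ => (
                Kind::Word,
                rest.find(|ch: char| matches!(ch, ' ' | '\t' | '\n' | ';'))
                    .unwrap_or(rest.len()),
            ),
        };
        let (text, tail) = rest.split_at(len);
        tokens.push((kind, text));
        rest = tail;
    }
    tokens
}

/// The Phase 1 contract: concatenating the token texts reproduces the source.
fn round_trips(src: &str) -> bool {
    lex(src).iter().map(|&(_, text)| text).collect::<String>() == src
}
```

Because the tokens are contiguous slices of the input, `round_trips` holds
for any string, and a BOM-prefixed input yields a `Bom` token first instead
of silently losing those three bytes.
*/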
//! # Trivia attachment policy (phase 2.0)
//!
//! Phase 1 emits a flat tree, so there is no question of where trivia
//! attaches. Phase 2.1+ introduces structural nodes (`DIRECTIVE`, then
//! `POSTING` / `AMOUNT` / `COST_SPEC` / `META_ENTRY` / ...) that wrap
//! token runs, so every trivia token needs an owner. Phase 2.0 pins
//! **the Directive-Terminator Rule**: every directive owns its content
//! tokens PLUS its terminating `NEWLINE`.
//!
//! Short version:
//!
//! - **Same-line trailing** trivia (whitespace + EOL comment
//!   before the terminator) lives INSIDE the directive.
//! - **Inter-directive leading** trivia (blank lines, mid-file
//!   comment blocks) lives INSIDE the NEXT directive.
//! - **File-leading** trivia (before the first content token) is
//!   a direct child of `SOURCE_FILE`.
//! - **File-trailing** trivia (after the file-final directive's
//!   terminator) is also a direct child of `SOURCE_FILE`.
//!
//! The rule is fully symmetric: every directive has the same child
//! shape (optional leading trivia + content + optional same-line
//! trailing trivia + terminating `NEWLINE`), so there is no EOF special
//! case.
//!
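/*!

A minimal sketch of how a single streaming pass could apply the rule to a
flat token stream. It rests on toy assumptions that are not part of the
real parser: every line containing a `Word` token is one directive (real
Beancount transactions span several lines), comments and whitespace are the
only trivia, and a directive with no final `NEWLINE` simply ends at EOF.
`Tok`, `Child`, and `attach` are hypothetical names. The key decision is
that a `NEWLINE` is a terminator only while a directive is open; otherwise
it is a blank line and is held as leading trivia for the next directive
(or, before the first directive and after the last one, left on
`SOURCE_FILE`).

```rust
/// Toy token kinds (hypothetical subset of the real `SyntaxKind`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Whitespace,
    Newline,
    Comment,
    Word,
}

/// A direct child of `SOURCE_FILE`: a bare trivia token or a `DIRECTIVE`.
#[derive(Debug, PartialEq)]
enum Child<'a> {
    Token(Tok, &'a str),
    Directive(Vec<(Tok, &'a str)>),
}

/// Groups a flat token stream under the Directive-Terminator Rule.
fn attach<'a>(tokens: &[(Tok, &'a str)]) -> Vec<Child<'a>> {
    let mut file: Vec<Child<'a>> = Vec::new(); // direct children of SOURCE_FILE
    let mut pending: Vec<(Tok, &'a str)> = Vec::new(); // trivia with no owner yet
    let mut open: Option<Vec<(Tok, &'a str)>> = None; // directive being built
    let mut seen_content = false;

    for &tok in tokens {
        if let Some(mut dir) = open.take() {
            // Same-line trailing trivia and the terminator stay inside.
            dir.push(tok);
            if tok.0 == Tok::Newline {
                file.push(Child::Directive(dir));
            } else {
                open = Some(dir);
            }
        } else if tok.0 == Tok::Word {
            // First content token of a new directive.
            let mut dir = Vec::new();
            if seen_content {
                // Inter-directive leading trivia moves into this directive.
                dir.append(&mut pending);
            } else {
                // File-leading trivia stays a direct child of SOURCE_FILE.
                file.extend(pending.drain(..).map(|(k, t)| Child::Token(k, t)));
            }
            dir.push(tok);
            open = Some(dir);
            seen_content = true;
        } else {
            // Trivia outside any directive waits for the next one.
            pending.push(tok);
        }
    }

    if let Some(dir) = open {
        // Toy assumption: a directive missing its NEWLINE ends at EOF.
        file.push(Child::Directive(dir));
    }
    // File-trailing trivia stays a direct child of SOURCE_FILE.
    file.extend(pending.into_iter().map(|(k, t)| Child::Token(k, t)));
    file
}
```
*/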
//! Phase 2.0 ships NO production helper. The policy is enforced by
//! tree-shape regression tests in `cst::trivia` (a private submodule).
//! Phase 2.1's structured parser implements its own streaming,
//! state-aware predicate that produces trees matching those shapes; if
//! the parser drifts, the regression tests fail. See the `trivia` module
//! rustdoc for the full spec, the rationale, and notes on applying the
//! rule recursively in phase 2.1's grammar.
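/*!

A tree-shape regression test of that kind can pin a tree as an indented
text snapshot and, separately, check that the tree is still lossless. The
sketch below is hypothetical: `Tree`, `shape`, and `text` are illustrative
names over a miniature tree type, not the actual `cst::trivia` test helpers
or rowan's own debug output.

```rust
/// Hypothetical miniature tree; the real tree is a rowan `SyntaxNode`.
#[derive(Debug)]
enum Tree {
    Token(&'static str, &'static str), // (kind, source text)
    Node(&'static str, Vec<Tree>),     // (kind, children)
}

/// Renders one line per element, indented two spaces per depth level,
/// with token text in `{:?}` form so newlines stay visible.
fn shape(tree: &Tree) -> String {
    fn walk(t: &Tree, depth: usize, out: &mut String) {
        let pad = "  ".repeat(depth);
        match t {
            Tree::Token(kind, s) => out.push_str(&format!("{}{} {:?}\n", pad, kind, s)),
            Tree::Node(kind, kids) => {
                out.push_str(&format!("{}{}\n", pad, kind));
                for k in kids {
                    walk(k, depth + 1, out);
                }
            }
        }
    }
    let mut out = String::new();
    walk(tree, 0, &mut out);
    out
}

/// Concatenated token text; must equal the source for a lossless tree.
fn text(tree: &Tree) -> String {
    match tree {
        Tree::Token(_, s) => s.to_string(),
        Tree::Node(_, kids) => kids.iter().map(text).collect(),
    }
}
```

Comparing `shape` against a stored snapshot catches a parser that, for
example, moves a file-trailing blank line into the last `DIRECTIVE`: the
token text is unchanged, so `text` still round-trips, but the snapshot's
indentation no longer matches.
*/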

pub mod ast;
mod convert;
pub(crate) mod format;
mod lossless_tokens;
mod parser;
mod syntax_kind;
mod trivia;

pub use convert::parse_via_cst;
// The formatter is deliberately NOT re-exported through `cst`. From
// outside the crate, its only import path is `rustledger_parser::format`,
// the crate-root submodule that re-exports the six formatter symbols
// directly from `crate::cst::format`. Round 6 sealed the `cst::format`
// path so that each formatter symbol has exactly one import path from
// outside the crate.
pub use lossless_tokens::lossless_kind_tokens;
pub use parser::{parse_flat, parse_structured};
pub use syntax_kind::{BeancountLanguage, SyntaxElement, SyntaxKind, SyntaxNode, SyntaxToken};