Skip to main content

lex_babel/
lib.rs

1//! Multi-format interoperability for Lex documents
2//!
3//!     This crate provides a uniform interface for converting between Lex AST and various document
4//!     formats (Markdown, HTML, Pandoc JSON, etc.).
5//!
6//!     TLDR: For format authors:
7//!         - Babel never parses or serializes any format, but instead relies on the format's libraries
8//!         - The convertion should be by converting to the IR, running the common code in common if releveant (it usually is), then to the ast of the target format.
9//!         - We should use the testing harness see (lex-parser/src/lex/testing.rs) to load documents and process them into asts.
10//!         - Each element should use the harness above and the available file for isolated elements testings with unit tests (load with the lib, assert with ast / ir)
11//!         - Each format should have trifecta unit tested in from and to formats to lex.
12//!         - Each format should have a kitchensink unit tested in from and to formats to lex
13//!         - Read the README.lex the full details)
14//!
15//! Architecture
16//!
17//!     The goal here is to, as much as possible, split what is the common logic for multiple formats
18//!     conversions into a format agnoistic layer. This is done by the using the IR representation (./ir/mod.rs),
19//!     and having the common code in ./common/mod.rs. This allows for the format specific code to be focused on the data format transformations, while having a strong, focused core that can be well tested in isolation.
20//!
21//!     This is a pure lib, that is , it powers the lex-cli but is shell agnostic, that is no code
22//!     should be written that supposes a shell environment, be it to std print, env vars etc.
23
24//!
25//!     The file structure :
26//!     .
27//!     ├── error.rs
28//!     ├── format.rs               # Format trait definition
29//!     ├── registry.rs             # FormatTregistry for discovery and selection
30//!     ├── formats
31//!     │   ├── <format>
32//!     │   │   ├── parser.rs       # Parser implementation
33//!     │   │   ├── serializer.rs   # Serializer implementation
34//!     │   │   └── mod.rs
35//!     |   ├─  interop             # Shared conversion utilities
36//!     ├── lib.rs
37//!     ├── ir                      # Intermediate Representation
38//!     ├── common                # Common mapping code
39//!
40//! Testing   
41//!     tests
42//!     └── <format>
43//!         ├── <testname>.rs
44//!         └── fixtures
45//!         ├── <docname>.<format>
46//!         ├── kitchensink.html
47//!         ├── kitchensink.lex
48//!         └── kitchensink.md
49//!
50//!     Note that rust does not by default discover tests in subdirectories, so we need to include these
51//!     in the mod.
52//!
53//!
54//! Core Algorithms
55//!
56//!     The most complex part of the work is reconstructing a nested representation from a flat document, followed by the reverse operations.  For this reason we have a common IR (./ir/mod.rs) that is used for all formats.
57//!     Over this representation we implement both algorithms (see ./common/flat_to_nested.rs and ./common/nested_to_flat.rs).
58//!     This means that all the heavy lifting is done by a core, well tested and maintained module,
59//! freeing format adaptations to be focused on the simpler data format transformations.
60//!
61//!
62//! Formats
63//!
64//!     Format specific capabilities are implemented with the Format trait. formats should have a
65//!     parse() and serialize() method, a name and file extensions. See the trait def [./format.rs ]
66//!     - Format trait: Uniform interface for all formats (parsing and/or serialization)
67//!     - FormatRegistry: Centralized discovery and selection of formats
68//!     - Format implementations: Concrete implementations for each supported format
69//!
70//!
71//! The Lex Format
72//!
73//!     The Lex format itself is implemented as a format, see ./formats/lex/mod.rs, which allows for
74//!     a homogeneous API where all formats have identical interfaces:
75//!
76//!     Note that Lex is a more expressive format than most, which means that converting from is
77//!     simple , but always lossy. In particular converting from requires some cosnideartion on how
78//!     to best represent the author's intent.
79//!
80//!     This means that full format interop round tripping is not possible.
81//!
82//! Format Selection
83//!
84//!     The choice for the formats is pretty sensible:
85//!
86//!     - HTML Output: should be self arguing, as it's the most common format for publishing and viewing.
87//!     - Markdown: both in and to, as Mardown is the universal format for plain text editing.
88//!     - XML: serializing Lex's is trivial and can be useful as a structured format for storage.
89//!
90//!     These are table stakes, that is a format that can't export to HTML, convert to markdown or
91//! lack a good semantic pure xml output is a non starter.
92//!
93//!
94//!     For everything else, there is good arguments for a variety of formats. The one that has the strongest fit
95//!  and use case is Latex, as Lex can be very useful for scientific writing. But latex is
96//!  complicated, and having pandoc in the pipeline allows us to serve reasonably well pretty much
97//!  any other format.
98//!
99//! Library Choices
100//!
101//!     This, not being lex's core means that we will offload as much as possible to better, specialized creates
102//!  for each format. the escope here is mainly to adapt the ast's from lex to the format or vice
103//!  versa. For example we never write the serializer for , say markdown, but pass the AST to the
104//!     mardown library. To support a format inbound, we write the format ast -> lex ast adapter.
105//!  likewise, for outbound formats we will do the reverse, converting from the lex ast to the
106//!  format's.
107//!
108//!     As much as possible, we will use rust crates, and avoid shelling out and having outside dependencies.
109//!
110pub mod error;
111pub mod format;
112pub mod formats;
113pub mod publish;
114pub mod registry;
115pub mod templates;
116pub mod transforms;
117
118pub mod common;
119pub mod ir;
120
121pub use error::FormatError;
122pub use format::{Format, SerializedDocument};
123pub use registry::FormatRegistry;
124
125/// Converts a lex document to the Intermediate Representation (IR).
126///
127/// # Information Loss
128///
129/// The IR is a simplified, semantic representation. The following
130/// Lex information is lost during conversion:
131/// - Blank line grouping (BlankLineGroup nodes)
132/// - Source positions and token information
133/// - Comment annotations at document level
134///
135/// For lossless Lex representation, use the AST directly.
136pub fn to_ir(doc: &lex_core::lex::ast::elements::Document) -> ir::nodes::Document {
137    ir::from_lex::from_lex_document(doc)
138}
139
140/// Converts an IR document back to Lex AST.
141///
142/// This is useful for round-trip conversions: Format → IR → Lex.
143pub fn from_ir(doc: &ir::nodes::Document) -> lex_core::lex::ast::elements::Document {
144    ir::to_lex::to_lex_document(doc)
145}