lex_babel/lib.rs
//! Multi-format interoperability for Lex documents
//!
//! This crate provides a uniform interface for converting between the Lex AST and various document
//! formats (Markdown, HTML, Pandoc JSON, etc.).
//!
//! TL;DR for format authors:
//! - Babel never parses or serializes any format itself; it relies on each format's own libraries.
//! - A conversion goes through the IR: convert to the IR, run the shared code in common if relevant (it usually is), then convert to the AST of the target format.
//! - Use the testing harness (see lex-parser/src/lex/testing.rs) to load documents and process them into ASTs.
//! - Each element should be unit tested in isolation, using the harness above and the available isolated-element files (load with the lib, assert on the AST / IR).
//! - Each format should have the trifecta unit tested in both directions (from and to Lex).
//! - Each format should have a kitchensink unit tested in both directions (from and to Lex).
//! - Read README.lex for the full details.
//!
//! Architecture
//!
//! The goal here is to split, as much as possible, the logic common to multiple format
//! conversions into a format-agnostic layer. This is done by using the IR representation (./ir/mod.rs)
//! and keeping the common code in ./common/mod.rs. This lets the format-specific code focus on the data format transformations, while having a strong, focused core that can be well tested in isolation.
//!
//! This is a pure lib: it powers the lex-cli but is shell agnostic. No code
//! should be written that assumes a shell environment, be it printing to stdout, reading env vars, etc.
//!
//! The file structure:
//! .
//! ├── error.rs
//! ├── format.rs             # Format trait definition
//! ├── registry.rs           # FormatRegistry for discovery and selection
//! ├── formats
//! │   ├── <format>
//! │   │   ├── parser.rs     # Parser implementation
//! │   │   ├── serializer.rs # Serializer implementation
//! │   │   └── mod.rs
//! │   └── interop           # Shared conversion utilities
//! ├── lib.rs
//! ├── ir                    # Intermediate Representation
//! └── common                # Common mapping code
//!
//! Testing
//!
//! tests
//! └── <format>
//!     ├── <testname>.rs
//!     └── fixtures
//!         ├── <docname>.<format>
//!         ├── kitchensink.html
//!         ├── kitchensink.lex
//!         └── kitchensink.md
//!
//! Note that Rust does not discover tests in subdirectories by default, so we need to include these
//! in the mod.
//!
//!
//! Core Algorithms
//!
//! The most complex part of the work is reconstructing a nested representation from a flat document, and the reverse operation. For this reason we have a common IR (./ir/mod.rs) that is used for all formats.
//! Both algorithms are implemented over this representation (see ./common/flat_to_nested.rs and ./common/nested_to_flat.rs).
//! This means that all the heavy lifting is done by a core, well-tested and maintained module,
//! freeing format adaptations to focus on the simpler data format transformations.
//!
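//! As a minimal sketch of the idea, not the code in ./common/flat_to_nested.rs or
//! ./common/nested_to_flat.rs, the example below rebuilds sections from heading levels with a
//! stack and then flattens them again. FlatEvent, Nested, nest_sections and flatten_sections
//! are hypothetical names made up for illustration; the real IR types live in ./ir/mod.rs
//! and will differ.
//!
//! ```rust
//! /// Hypothetical flat event stream, as a Markdown-like parser might emit it.
//! #[derive(Debug, Clone, PartialEq)]
//! enum FlatEvent {
//!     Heading { level: usize, title: String },
//!     Paragraph(String),
//! }
//!
//! /// Hypothetical nested node: a section owns everything up to the next
//! /// heading of the same or a shallower level.
//! #[derive(Debug, Clone, PartialEq)]
//! enum Nested {
//!     Section { level: usize, title: String, children: Vec<Nested> },
//!     Paragraph(String),
//! }
//!
//! /// An open section on the stack: (level, title, children collected so far).
//! type Open = (usize, String, Vec<Nested>);
//!
//! /// Closes the innermost open section and attaches it to its parent,
//! /// or to the document roots when no parent is open.
//! fn close_section(stack: &mut Vec<Open>, roots: &mut Vec<Nested>) {
//!     if let Some((level, title, children)) = stack.pop() {
//!         let node = Nested::Section { level, title, children };
//!         match stack.last_mut() {
//!             Some(parent) => parent.2.push(node),
//!             None => roots.push(node),
//!         }
//!     }
//! }
//!
//! /// Flat to nested: rebuilds nesting from heading levels using a stack.
//! fn nest_sections(events: &[FlatEvent]) -> Vec<Nested> {
//!     let mut stack: Vec<Open> = Vec::new();
//!     let mut roots: Vec<Nested> = Vec::new();
//!     for event in events {
//!         match event {
//!             FlatEvent::Heading { level, title } => {
//!                 // A heading closes every open section at the same or a deeper level.
//!                 while stack.last().map_or(false, |open| open.0 >= *level) {
//!                     close_section(&mut stack, &mut roots);
//!                 }
//!                 stack.push((*level, title.clone(), Vec::new()));
//!             }
//!             FlatEvent::Paragraph(text) => {
//!                 let node = Nested::Paragraph(text.clone());
//!                 match stack.last_mut() {
//!                     Some(open) => open.2.push(node),
//!                     None => roots.push(node),
//!                 }
//!             }
//!         }
//!     }
//!     // Close whatever is still open at the end of the document.
//!     while !stack.is_empty() {
//!         close_section(&mut stack, &mut roots);
//!     }
//!     roots
//! }
//!
//! /// Nested to flat: a pre-order walk that emits a heading before its children.
//! fn flatten_sections(nodes: &[Nested], out: &mut Vec<FlatEvent>) {
//!     for node in nodes {
//!         match node {
//!             Nested::Section { level, title, children } => {
//!                 out.push(FlatEvent::Heading { level: *level, title: title.clone() });
//!                 flatten_sections(children, out);
//!             }
//!             Nested::Paragraph(text) => out.push(FlatEvent::Paragraph(text.clone())),
//!         }
//!     }
//! }
//! ```
//!
//! In this sketch, flattening the output of nest_sections yields the original event sequence, which makes the pair easy to test as a round trip.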
//!
//! Formats
//!
//! Format-specific capabilities are implemented with the Format trait. Formats should have
//! parse() and serialize() methods, a name, and file extensions. See the trait definition in ./format.rs.
//! - Format trait: Uniform interface for all formats (parsing and/or serialization)
//! - FormatRegistry: Centralized discovery and selection of formats
//! - Format implementations: Concrete implementations for each supported format
//!
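//! To make that shape concrete, here is a minimal sketch, assuming a much simpler document
//! type than the Lex AST. SketchFormat, SketchRegistry, Doc and Plain are hypothetical
//! stand-ins invented for this example; they are not the crate's Format, FormatRegistry or
//! FormatError, whose actual signatures are defined in ./format.rs and ./registry.rs.
//!
//! ```rust
//! /// Hypothetical stand-in for a parsed document.
//! #[derive(Debug, Clone, PartialEq)]
//! struct Doc(String);
//!
//! /// Hypothetical trait with the four capabilities described above.
//! trait SketchFormat {
//!     fn name(&self) -> &str;
//!     fn file_extensions(&self) -> &[&str];
//!     fn parse(&self, source: &str) -> Result<Doc, String>;
//!     fn serialize(&self, doc: &Doc) -> Result<String, String>;
//! }
//!
//! /// Toy format: the document is just the source text.
//! struct Plain;
//!
//! impl SketchFormat for Plain {
//!     fn name(&self) -> &str {
//!         "plain"
//!     }
//!     fn file_extensions(&self) -> &[&str] {
//!         &["txt"]
//!     }
//!     fn parse(&self, source: &str) -> Result<Doc, String> {
//!         Ok(Doc(source.to_string()))
//!     }
//!     fn serialize(&self, doc: &Doc) -> Result<String, String> {
//!         Ok(doc.0.clone())
//!     }
//! }
//!
//! /// Hypothetical registry: lookup by format name or by file extension.
//! struct SketchRegistry {
//!     formats: Vec<Box<dyn SketchFormat>>,
//! }
//!
//! impl SketchRegistry {
//!     fn new() -> Self {
//!         SketchRegistry { formats: Vec::new() }
//!     }
//!     fn register(&mut self, format: Box<dyn SketchFormat>) {
//!         self.formats.push(format);
//!     }
//!     fn by_name(&self, name: &str) -> Option<&dyn SketchFormat> {
//!         self.formats.iter().find(|f| f.name() == name).map(|f| f.as_ref())
//!     }
//!     fn by_extension(&self, ext: &str) -> Option<&dyn SketchFormat> {
//!         self.formats
//!             .iter()
//!             .find(|f| f.file_extensions().contains(&ext))
//!             .map(|f| f.as_ref())
//!     }
//! }
//! ```
//!
//! Because every format sits behind the same trait object, callers in this sketch can select a format by name or extension without knowing which concrete type they get back.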
//!
//! The Lex Format
//!
//! The Lex format itself is implemented as a format, see ./formats/lex/mod.rs, which allows for
//! a homogeneous API where all formats have identical interfaces.
//!
//! Note that Lex is a more expressive format than most, which means that converting from Lex is
//! simple, but always lossy. In particular, converting from Lex requires some consideration of how
//! best to represent the author's intent.
//!
//! This means that full round-tripping between formats is not possible.
//!
//! Format Selection
//!
//! The choice of formats is straightforward:
//!
//! - HTML output: self-evident, as it's the most common format for publishing and viewing.
//! - Markdown: both in and out, as Markdown is the universal format for plain text editing.
//! - XML: serializing Lex to XML is trivial, and it can be useful as a structured format for storage.
//!
//! These are table stakes: a format that can't export to HTML, convert to Markdown, or
//! produce a good, semantic, pure XML output is a non-starter.
//!
//!
//! For everything else, there are good arguments for a variety of formats. The one with the strongest fit
//! and use case is LaTeX, as Lex can be very useful for scientific writing. But LaTeX is
//! complicated, and having Pandoc in the pipeline allows us to serve pretty much
//! any other format reasonably well.
//!
//! Library Choices
//!
//! Since this is not Lex's core, we offload as much as possible to better, specialized crates
//! for each format. The scope here is mainly to adapt the ASTs from Lex to the format or vice
//! versa. For example, we never write the serializer for, say, Markdown, but pass the AST to the
//! Markdown library. To support an inbound format, we write the format AST -> Lex AST adapter.
//! Likewise, for outbound formats we do the reverse, converting from the Lex AST to the
//! format's.
//!
//! As much as possible, we use Rust crates and avoid shelling out or taking on outside dependencies.
//!
pub mod error;
pub mod format;
pub mod formats;
pub mod publish;
pub mod registry;
pub mod templates;
pub mod transforms;

pub mod common;
pub mod ir;

pub use error::FormatError;
pub use format::{Format, SerializedDocument};
pub use registry::FormatRegistry;

/// Converts a Lex document to the Intermediate Representation (IR).
///
/// # Information Loss
///
/// The IR is a simplified, semantic representation. The following
/// Lex information is lost during conversion:
/// - Blank line grouping (BlankLineGroup nodes)
/// - Source positions and token information
/// - Comment annotations at document level
///
/// For a lossless Lex representation, use the AST directly.
pub fn to_ir(doc: &lex_core::lex::ast::elements::Document) -> ir::nodes::Document {
    ir::from_lex::from_lex_document(doc)
}

/// Converts an IR document back to Lex AST.
///
/// This is useful for round-trip conversions: Format → IR → Lex.
pub fn from_ir(doc: &ir::nodes::Document) -> lex_core::lex::ast::elements::Document {
    ir::to_lex::to_lex_document(doc)
}