Multi-format interoperability for Lex documents
This crate provides a uniform interface for converting between Lex AST and various document
formats .
TLDR: For format authors:
- Babel never parses or serializes any format, but instead relies on the format's libraries
- The convertion should be by converting to the IR, running the common code in common if releveant , then to the ast of the target format.
- We should use the testing harness see to load documents and process them into asts.
- Each element should use the harness above and the available file for isolated elements testings with unit tests
- Each format should have trifecta unit tested in from and to formats to lex.
- Each format should have a kitchensink unit tested in from and to formats to lex
- Read the README.lex the full details)
Architecture
The goal here is to, as much as possible, split what is the common logic for multiple formats
conversions into a format agnoistic layer. This is done by the using the IR representation ,
and having the common code in ./common/mod.rs. This allows for the format specific code to be focused on the data format transformations, while having a strong, focused core that can be well tested in isolation.
This is a pure lib, that is , it powers the lex-cli but is shell agnostic, that is no code
should be written that supposes a shell environment, be it to std print, env vars etc.
The file structure :
.
├── error.rs
├── format.rs # Format
Testing
tests
└──
├── .rs
└── fixtures
├── .
├── kitchensink.html
├── kitchensink.lex
└── kitchensink.md
Note that rust does not by default discover tests in subdirectories, so we need to include these
in the mod.
Core Algorithms
The most complex part of the work is reconstructing a nested representation from a flat document, followed by the reverse operations. For this reason we have a common IR that is used for all formats.
Over this representation we implement both algorithms .
This means that all the heavy lifting is done by a core, well tested and maintained module,
freeing format adaptations to be focused on the simpler data format transformations.
Formats
Format specific capabilities are implemented with the Format trait. formats should have a
parse and serialize method, a name and file extensions. See the
The Lex Format
The Lex format itself is implemented as a format, see ./formats/lex/mod.rs, which allows for
a homogeneous API where all formats have identical interfaces:
Note that Lex is a more expressive format than most, which means that converting from is
simple , but always lossy. In particular converting from requires some cosnideartion on how
to best represent the author's intent.
This means that full format interop round tripping is not possible.
Format Selection
The choice for the formats is pretty sensible:
- HTML Output: should be self arguing, as it's the most common format for publishing and viewing.
- Markdown: both in and to, as Mardown is the universal format for plain text editing.
- XML: serializing Lex's is trivial and can be useful as a structured format for storage.
These are table stakes, that is a format that can't export to HTML, convert to markdown or
lack a good semantic pure xml output is a non starter.
For everything else, there is good arguments for a variety of formats. The one that has the strongest fit
and use case is Latex, as Lex can be very useful for scientific writing. But latex is complicated, and having pandoc in the pipeline allows us to serve reasonably well pretty much any other format.
Library Choices
This, not being lex's core means that we will offload as much as possible to better, specialized creates
for each format. the escope here is mainly to adapt the ast's from lex to the format or vice versa. For example we never write the serializer for , say markdown, but pass the AST to the mardown library. To support a format inbound, we write the format ast -> lex ast adapter. likewise, for outbound formats we will do the reverse, converting from the lex ast to the format's.
As much as possible, we will use rust crates, and avoid shelling out and having outside dependencies.