lex_babel/formats/markdown/mod.rs
//! Markdown format implementation
//!
//! This module implements bidirectional conversion between Lex and CommonMark Markdown.
//!
//! # Library Choice
//!
//! We use the `comrak` crate for Markdown parsing and serialization. This choice is based on:
//! - Single crate for both parsing and serialization
//! - Feature-rich with CommonMark compliance
//! - Robust and well-maintained
//! - Supports extensions (tables, strikethrough, etc.)
//!
//! # Element Mapping Table
//!
//! Complete Lex ↔ Markdown Mapping:
//!
//! | Lex Element      | Markdown Equivalent     | Export Notes                           | Import Notes                          |
//! |------------------|-------------------------|----------------------------------------|---------------------------------------|
//! | Session          | Heading (# ## ###)      | Session level → heading level (1-6)    | Heading level → session nesting       |
//! | Paragraph        | Paragraph               | Direct mapping                         | Direct mapping                        |
//! | List             | List (- or 1. 2. 3.)    | Ordered/unordered preserved            | Detect type from first item marker    |
//! | ListItem         | List item (- item)      | Direct mapping with nesting            | Direct mapping with nesting           |
//! | Definition       | **Term**: Description   | Bold term + colon + content            | Parse bold + colon pattern            |
//! | Verbatim         | Code block (```)        | Language → info string                 | Info string → language                |
//! | Annotation       | HTML comment            | `<!-- lex:label key=val -->` format    | Not implemented (annotations lost)    |
//! | InlineContent:   |                         |                                        |                                       |
//! | Text             | Plain text              | Direct                                 | Direct                                |
//! | Bold             | **bold** or __bold__    | Use **                                 | Parse both                            |
//! | Italic           | *italic* or _italic_    | Use *                                  | Parse both                            |
//! | Code             | `code`                  | Direct                                 | Direct                                |
//! | Math             | $math$ or $$math$$      | Use $...$                              | Parse if extension enabled            |
//! | Reference        | \[text\]                | Plain text (Lex refs are citations)    | Parse link/reference syntax           |
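//!
//! As an illustration of the Annotation row, the sketch below builds a comment in the
//! `<!-- lex:label key=val -->` shape. The helper name `annotation_comment` and its
//! signature are assumptions made for this example, not the serializer's actual API,
//! and it does not attempt any quoting or escaping of values.
//!
//! ```rust
//! /// Hypothetical helper (not part of this crate): renders a label and its
//! /// parameters in the `<!-- lex:label key=val -->` shape.
//! fn annotation_comment(label: &str, params: &[(&str, &str)]) -> String {
//!     let mut out = format!("<!-- lex:{label}");
//!     for (key, value) in params {
//!         // Each parameter is appended as ` key=value`, unescaped.
//!         out.push(' ');
//!         out.push_str(key);
//!         out.push('=');
//!         out.push_str(value);
//!     }
//!     out.push_str(" -->");
//!     out
//! }
//!
//! fn main() {
//!     let comment = annotation_comment("note", &[("author", "alice")]);
//!     assert_eq!(comment, "<!-- lex:note author=alice -->");
//! }
//! ```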
//!
//! # Lossy Conversions
//!
//! The following conversions lose information on round-trip:
//! - Lex sessions beyond level 6 → h6 with nested content (Markdown max is h6)
//! - Lex annotations → HTML comments (exported but not parsed on import)
//! - Lex definition structure → bold text pattern (not native Markdown)
//! - Lex references → plain text (citations, not URLs)
//! - Multiple blank lines → single blank line (Markdown normalization)
//! - Verbatim post-wall indentation → lost (see issue #276)
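//!
//! The first item can be pictured with a small sketch. `heading_level` is a hypothetical
//! helper, and treating session depth as a 1-based count is an assumption; the real
//! serializer may compute the level differently.
//!
//! ```rust
//! /// Hypothetical helper (not part of this crate): maps a 1-based session
//! /// nesting depth to a Markdown heading level, clamped to h1..h6.
//! fn heading_level(session_depth: usize) -> u8 {
//!     session_depth.clamp(1, 6) as u8
//! }
//!
//! fn main() {
//!     assert_eq!(heading_level(1), 1);
//!     assert_eq!(heading_level(6), 6);
//!     // Depths 7 and 9 both collapse to h6, so the original depth
//!     // cannot be recovered from the heading level alone on import.
//!     assert_eq!(heading_level(7), 6);
//!     assert_eq!(heading_level(9), 6);
//! }
//! ```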
//!
//! # Architecture Notes
//!
//! There is a fundamental mismatch between Markdown's flat model and Lex's hierarchical structure.
//! We leverage the IR event system (lex-babel/src/common/) to handle the nested-to-flat and
//! flat-to-nested conversions. This keeps format-specific code focused on Markdown AST transformations.
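//!
//! To make the flat-to-nested direction concrete, here is a minimal stack-based sketch.
//! It is not the IR event system: `Session` and `nest` are hypothetical stand-ins, and
//! the input is reduced to `(heading level, title)` pairs.
//!
//! ```rust
//! /// Hypothetical stand-in for a Lex session; not the real AST type.
//! #[derive(Debug, PartialEq)]
//! struct Session {
//!     title: String,
//!     children: Vec<Session>,
//! }
//!
//! /// Nests a flat list of `(heading level, title)` pairs: each heading becomes
//! /// a child of the nearest preceding heading with a lower level.
//! fn nest(headings: &[(u8, &str)]) -> Vec<Session> {
//!     let mut roots: Vec<Session> = Vec::new();
//!     // Stack of currently open sessions, paired with their heading levels.
//!     let mut open: Vec<(u8, Session)> = Vec::new();
//!
//!     // Pops the innermost open session and attaches it to its parent,
//!     // or to the root list when no parent is open.
//!     fn close(open: &mut Vec<(u8, Session)>, roots: &mut Vec<Session>) {
//!         if let Some((_, done)) = open.pop() {
//!             match open.last_mut() {
//!                 Some((_, parent)) => parent.children.push(done),
//!                 None => roots.push(done),
//!             }
//!         }
//!     }
//!
//!     for &(level, title) in headings {
//!         // Close every open session at the same or a deeper level.
//!         while open.last().map_or(false, |(l, _)| *l >= level) {
//!             close(&mut open, &mut roots);
//!         }
//!         open.push((level, Session { title: title.to_string(), children: Vec::new() }));
//!     }
//!     while !open.is_empty() {
//!         close(&mut open, &mut roots);
//!     }
//!     roots
//! }
//!
//! fn main() {
//!     let tree = nest(&[(1, "Intro"), (2, "Setup"), (2, "Usage"), (1, "Notes")]);
//!     assert_eq!(tree.len(), 2);
//!     assert_eq!(tree[0].title, "Intro");
//!     assert_eq!(tree[0].children.len(), 2);
//!     assert_eq!(tree[0].children[1].title, "Usage");
//!     assert!(tree[1].children.is_empty());
//! }
//! ```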
//!
//! Lists are the only Markdown element that is truly nested, making them straightforward to map.
//!
//! # Testing
//!
//! Export tests use Lex spec files from specs/v1/elements/ for isolated element testing.
//! Integration tests use the kitchensink benchmark and a CommonMark reference document.
//! See the testing guide in docs/local/tasks/86-babel-markdown.lex for details.
//!
//! # Implementation Status
//!
//! - [x] Export (Lex → Markdown)
//!   - [x] Paragraph
//!   - [x] Heading (Session) - nested sessions → flat heading hierarchy
//!   - [x] Bold, Italic, Code inlines
//!   - [x] Lists - ordered/unordered detection, tight formatting
//!   - [x] Code blocks (Verbatim)
//!   - [x] Definitions - term paragraph + description siblings
//!   - [x] Annotations - as HTML comments with content
//!   - [x] Math - rendered as $...$ text
//!   - [x] References - rendered as plain text citations
//! - [x] Import (Markdown → Lex)
//!   - [x] Paragraph
//!   - [x] Heading → Session (flat heading hierarchy → nested sessions)
//!   - [x] Bold, Italic, Code inlines
//!   - [x] Lists
//!   - [x] Code blocks → Verbatim
//!   - [x] Annotations (HTML comment parsing)
//!   - [x] Definitions (pattern matching)

pub mod parser;
pub mod serializer;

use crate::error::FormatError;
use crate::format::Format;
use lex_core::lex::ast::Document;

/// Format implementation for Markdown
pub struct MarkdownFormat;

impl Format for MarkdownFormat {
    fn name(&self) -> &str {
        "markdown"
    }

    fn description(&self) -> &str {
        "CommonMark Markdown format"
    }

    fn file_extensions(&self) -> &[&str] {
        &["md", "markdown"]
    }

    fn supports_parsing(&self) -> bool {
        true
    }

    fn supports_serialization(&self) -> bool {
        true
    }

    fn parse(&self, source: &str) -> Result<Document, FormatError> {
        parser::parse_from_markdown(source)
    }

    fn serialize(&self, doc: &Document) -> Result<String, FormatError> {
        serializer::serialize_to_markdown(doc)
    }
}