//! Markdown format implementation
//!
//! This module implements bidirectional conversion between Lex and CommonMark Markdown.
//!
//! # Library Choice
//!
//! We use the `comrak` crate for Markdown parsing and serialization. This choice is based on:
//! - Single crate for both parsing and serialization
//! - Feature-rich with CommonMark compliance
//! - Robust and well-maintained
//! - Supports extensions (tables, strikethrough, etc.)
//!
//! # Element Mapping Table
//!
//! Complete Lex ↔ Markdown Mapping:
//!
//! | Lex Element      | Markdown Equivalent     | Export Notes                           | Import Notes                          |
//! |------------------|-------------------------|----------------------------------------|---------------------------------------|
//! | Session          | Heading (# ## ###)      | Session level → heading level (1-6)    | Heading level → session nesting       |
//! | Paragraph        | Paragraph               | Direct mapping                         | Direct mapping                        |
//! | List             | List (- or 1. 2. 3.)    | Ordered/unordered preserved            | Detect type from first item marker    |
//! | ListItem         | List item (- item)      | Direct mapping with nesting            | Direct mapping with nesting           |
//! | Definition       | **Term**: Description   | Bold term + colon + content            | Parse bold + colon pattern            |
//! | Verbatim         | Code block (```)        | Language → info string                 | Info string → language                |
//! | Annotation       | HTML comment            | `<!-- lex:label key=val -->` format    | Not implemented (annotations lost)    |
//! | InlineContent:   |                         |                                        |                                       |
//! |   Text           | Plain text              | Direct                                 | Direct                                |
//! |   Bold           | **bold** or __bold__    | Use **                                 | Parse both                            |
//! |   Italic         | *italic* or _italic_    | Use *                                  | Parse both                            |
//! |   Code           | `code`                  | Direct                                 | Direct                                |
//! |   Math           | $math$ or $$math$$      | Use $...$                              | Parse if extension enabled            |
//! |   Reference      | \[text\]                | Plain text (Lex refs are citations)    | Parse link/reference syntax           |
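//!
//! The Definition row's export and import notes can be illustrated with a minimal sketch of the
//! `**Term**: Description` pattern. It assumes a definition fits on a single Markdown line, and
//! both helpers (`export_definition`, `import_definition`) are hypothetical illustrations, not
//! this module's actual serializer or parser code.
//!
//! ```rust
//! // Hypothetical helper: render a Lex definition as `**Term**: Description`.
//! fn export_definition(term: &str, description: &str) -> String {
//!     format!("**{}**: {}", term, description)
//! }
//!
//! // Hypothetical helper: match `**Term**: Description` on one line.
//! // Returns `None` when the line does not follow the pattern.
//! fn import_definition(line: &str) -> Option<(&str, &str)> {
//!     let rest = line.strip_prefix("**")?;
//!     let (term, description) = rest.split_once("**:")?;
//!     if term.is_empty() {
//!         return None;
//!     }
//!     Some((term, description.trim_start()))
//! }
//!
//! fn main() {
//!     let md = export_definition("Session", "A titled container");
//!     assert_eq!(md, "**Session**: A titled container");
//!     assert_eq!(import_definition(&md), Some(("Session", "A titled container")));
//!     // Bold text without the trailing colon is not treated as a definition.
//!     assert_eq!(import_definition("**Note** plain bold"), None);
//! }
//! ```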
//!
//! # Lossy Conversions
//!
//! The following conversions lose information on round-trip:
//! - Lex sessions beyond level 6 → h6 with nested content (Markdown max is h6)
//! - Lex annotations → HTML comments (exported but not parsed on import)
//! - Lex definition structure → bold text pattern (not native Markdown)
//! - Lex references → plain text (citations, not URLs)
//! - Multiple blank lines → single blank line (Markdown normalization)
//! - Verbatim post-wall indentation → lost (see issue #276)
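//!
//! The first lossy case follows from clamping. A minimal sketch, assuming session depth 1 maps
//! to `#` and using a hypothetical `session_heading` helper, shows why depths beyond 6 cannot
//! survive a round trip: several depths produce the same heading.
//!
//! ```rust
//! // Hypothetical helper: map a Lex session depth (1 = top level) to an ATX
//! // heading, clamping to h6 because Markdown defines no deeper level.
//! fn session_heading(depth: usize, title: &str) -> String {
//!     let level = depth.clamp(1, 6);
//!     format!("{} {}", "#".repeat(level), title)
//! }
//!
//! fn main() {
//!     assert_eq!(session_heading(2, "Intro"), "## Intro");
//!     // Depths 6, 7 and 8 all produce the same heading, so the original
//!     // depth cannot be recovered on import.
//!     assert_eq!(session_heading(7, "Deep"), "###### Deep");
//!     assert_eq!(session_heading(8, "Deep"), session_heading(6, "Deep"));
//! }
//! ```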
//!
//! # Architecture Notes
//!
//! There is a fundamental mismatch between Markdown's flat model and Lex's hierarchical structure.
//! We leverage the IR event system (lex-babel/src/common/) to handle the nested-to-flat and
//! flat-to-nested conversions. This keeps format-specific code focused on Markdown AST transformations.
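//!
//! The two conversions can be sketched with a simplified, hypothetical `Session` tree. This is
//! neither the `lex_core` AST nor the IR event API in `common/`. Flattening walks the tree
//! depth-first and emits `(level, title)` pairs. Nesting rebuilds the tree with a stack of open
//! sessions: each heading closes every open session at its own level or deeper. The names
//! `flatten`, `close_top` and `nest` are illustrative only.
//!
//! ```rust
//! // Hypothetical, simplified session tree; not the lex_core AST.
//! #[derive(Debug, PartialEq)]
//! struct Session {
//!     title: String,
//!     children: Vec<Session>,
//! }
//!
//! // Nested -> flat: emit (heading level, title) pairs in document order.
//! fn flatten(sessions: &[Session], level: usize, out: &mut Vec<(usize, String)>) {
//!     for s in sessions {
//!         out.push((level, s.title.clone()));
//!         flatten(&s.children, level + 1, out);
//!     }
//! }
//!
//! // Pop the innermost open session and attach it to its parent (or the roots).
//! fn close_top(stack: &mut Vec<(usize, Session)>, roots: &mut Vec<Session>) {
//!     if let Some((_, done)) = stack.pop() {
//!         match stack.last_mut() {
//!             Some((_, parent)) => parent.children.push(done),
//!             None => roots.push(done),
//!         }
//!     }
//! }
//!
//! // Flat -> nested: a heading closes every open session at its level or deeper.
//! fn nest(headings: &[(usize, String)]) -> Vec<Session> {
//!     let mut roots = Vec::new();
//!     let mut stack: Vec<(usize, Session)> = Vec::new();
//!     for (level, title) in headings {
//!         while stack.last().map_or(false, |(open, _)| open >= level) {
//!             close_top(&mut stack, &mut roots);
//!         }
//!         stack.push((*level, Session { title: title.clone(), children: Vec::new() }));
//!     }
//!     while !stack.is_empty() {
//!         close_top(&mut stack, &mut roots);
//!     }
//!     roots
//! }
//!
//! fn main() {
//!     let leaf = |t: &str| Session { title: t.to_string(), children: Vec::new() };
//!     let tree = vec![
//!         Session { title: "Intro".to_string(), children: vec![leaf("Background")] },
//!         leaf("Usage"),
//!     ];
//!     let mut flat = Vec::new();
//!     flatten(&tree, 1, &mut flat);
//!     assert_eq!(
//!         flat,
//!         vec![(1, "Intro".to_string()), (2, "Background".to_string()), (1, "Usage".to_string())]
//!     );
//!     // Round trip: the stack restores the original nesting.
//!     assert_eq!(nest(&flat), tree);
//! }
//! ```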
//!
//! Among the elements mapped here, lists are the only ones Markdown nests natively, which makes
//! them straightforward to map.
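//!
//! List type detection ("Detect type from first item marker" in the table) can be sketched with
//! a hypothetical `detect_list_kind` helper. It follows CommonMark's marker rules (`-`, `+`, `*`
//! for bullets; one to nine digits followed by `.` or `)` for ordered items) but ignores edge
//! cases such as thematic breaks and indentation limits. comrak performs this detection itself
//! when parsing; the sketch only illustrates the rule.
//!
//! ```rust
//! #[derive(Debug, PartialEq)]
//! enum ListKind {
//!     Unordered,
//!     Ordered,
//! }
//!
//! // A list marker must be followed by a space or tab, or end the line.
//! fn ends_marker(rest: &str) -> bool {
//!     rest.is_empty() || rest.starts_with(' ') || rest.starts_with('\t')
//! }
//!
//! // Hypothetical helper: classify a list from its first item's marker.
//! fn detect_list_kind(first_item: &str) -> Option<ListKind> {
//!     let line = first_item.trim_start();
//!     if let Some(rest) = line.strip_prefix(|c: char| c == '-' || c == '+' || c == '*') {
//!         return if ends_marker(rest) { Some(ListKind::Unordered) } else { None };
//!     }
//!     // Digits are ASCII, so the count is also a valid byte offset.
//!     let digits = line.chars().take_while(|c| c.is_ascii_digit()).count();
//!     if (1..=9).contains(&digits) {
//!         if let Some(rest) = line[digits..].strip_prefix(|c: char| c == '.' || c == ')') {
//!             if ends_marker(rest) {
//!                 return Some(ListKind::Ordered);
//!             }
//!         }
//!     }
//!     None
//! }
//!
//! fn main() {
//!     assert_eq!(detect_list_kind("- apples"), Some(ListKind::Unordered));
//!     assert_eq!(detect_list_kind("1. first"), Some(ListKind::Ordered));
//!     assert_eq!(detect_list_kind("3) third"), Some(ListKind::Ordered));
//!     // No space after the marker, or no `.`/`)` after the digits: not a list.
//!     assert_eq!(detect_list_kind("-dash"), None);
//!     assert_eq!(detect_list_kind("2024 was a year"), None);
//! }
//! ```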
//!
//! # Testing
//!
//! Export tests use Lex spec files from specs/v1/elements/ for isolated element testing.
//! Integration tests use the kitchensink benchmark and a CommonMark reference document.
//! See the testing guide in docs/local/tasks/86-babel-markdown.lex for details.
//!
//! # Implementation Status
//!
//! - [x] Export (Lex → Markdown)
//!   - [x] Paragraph
//!   - [x] Heading (Session) - nested sessions → flat heading hierarchy
//!   - [x] Bold, Italic, Code inlines
//!   - [x] Lists - ordered/unordered detection, tight formatting
//!   - [x] Code blocks (Verbatim)
//!   - [x] Definitions - term paragraph + description siblings
//!   - [x] Annotations - as HTML comments with content
//!   - [x] Math - rendered as $...$ text
//!   - [x] References - rendered as plain text citations
//! - [x] Import (Markdown → Lex)
//!   - [x] Paragraph
//!   - [x] Heading → Session (flat heading hierarchy → nested sessions)
//!   - [x] Bold, Italic, Code inlines
//!   - [x] Lists
//!   - [x] Code blocks → Verbatim
//!   - [x] Annotations (HTML comment parsing)
//!   - [x] Definitions (pattern matching)

pub mod parser;
pub mod serializer;

use crate::error::FormatError;
use crate::format::Format;
use lex_core::lex::ast::Document;

/// Format implementation for Markdown
pub struct MarkdownFormat;

impl Format for MarkdownFormat {
    fn name(&self) -> &str {
        "markdown"
    }

    fn description(&self) -> &str {
        "CommonMark Markdown format"
    }

    fn file_extensions(&self) -> &[&str] {
        &["md", "markdown"]
    }

    fn supports_parsing(&self) -> bool {
        true
    }

    fn supports_serialization(&self) -> bool {
        true
    }

    fn parse(&self, source: &str) -> Result<Document, FormatError> {
        parser::parse_from_markdown(source)
    }

    fn serialize(&self, doc: &Document) -> Result<String, FormatError> {
        serializer::serialize_to_markdown(doc)
    }
}