// lex_babel/formats/html/mod.rs
//! HTML format implementation
//!
//! This module converts between Lex and HTML5. Export (Lex → HTML) is
//! implemented; import (HTML → Lex) is not yet available.
//!
//! # Library Choice
//!
//! We use the `html5ever` + `markup5ever_rcdom` + `markup5ever` ecosystem for HTML parsing and serialization:
//! - `html5ever`: Browser-grade HTML5 parser from the Servo project
//! - `markup5ever_rcdom`: Reference-counted DOM tree implementation
//! - `markup5ever`: Serialization infrastructure
//!
//! This choice is based on:
//! - Complete solution for both parsing and serialization
//! - Battle-tested with 12M+ downloads
//! - WHATWG HTML5 specification compliance
//! - Active maintenance by the Servo project
//! - Graceful handling of malformed HTML
//!
//! # Element Mapping Table
//!
//! Complete Lex ↔ HTML Mapping:
//!
//! | Lex Element | HTML Equivalent | Export Notes | Import Notes |
//! |------------------|----------------------------------------------------|-------------------------------------------|---------------------------------------|
//! | Document | `<div class="lex-document">` | Root container with document class | Parse body content |
//! | Session | `<section class="lex-session lex-session-N">` + `<hN>` | Session → section + heading | section + heading → Session |
//! | Paragraph | `<p class="lex-paragraph">` | Direct mapping with class | Direct mapping |
//! | List | `<ul>`/`<ol>` with `class="lex-list"` | Ordered/unordered preserved with class | Detect ul/ol type |
//! | ListItem | `<li class="lex-list-item">` | Direct mapping with class | Direct mapping |
//! | Definition | `<dl class="lex-definition">` `<dt>` `<dd>` | Term in dt, description in dd | Parse dl/dt/dd structure |
//! | Verbatim | `<pre class="lex-verbatim">` `<code>` | Language → `data-language` attribute | Extract language from attribute |
//! | Annotation | `<!-- lex:label key=val -->` | HTML comment format | Parse HTML comment pattern |
//! | InlineContent: | | | |
//! | Text | Plain text | Direct | Direct |
//! | Bold | `<strong>` | Semantic strong tag | Parse both strong and b |
//! | Italic | `<em>` | Semantic emphasis tag | Parse both em and i |
//! | Code | `<code>` | Inline code tag | Direct |
//! | Math | `<span class="lex-math">` | Preserve $ delimiters in span | Parse math span |
//! | Reference | `<a href="url">text</a>` | Convert to anchor with previous word as text | Parse anchor back to reference |
//!
//! # CSS Classes
//!
//! Block-level Lex elements and math spans receive CSS classes named after their AST nodes:
//! - `.lex-document`: Root document container
//! - `.lex-session`, `.lex-session-1`, `.lex-session-2`, etc.: Sessions with depth
//! - `.lex-paragraph`: Paragraphs
//! - `.lex-list`: Lists (combined with ul/ol)
//! - `.lex-list-item`: List items
//! - `.lex-definition`: Definition lists
//! - `.lex-verbatim`: Verbatim/code blocks
//! - `.lex-math`: Math expressions
//!
//! This enables:
//! - Precise CSS targeting for presentation
//! - Round-trip conversion that preserves structure (HTML → Lex → HTML), apart
//!   from the cases listed under Lossy Conversions
//! - Custom theming without modifying structure
//!
//! # CSS and Theming
//!
//! HTML export includes embedded CSS from:
//! - `css/baseline.css`: Browser reset + default modern presentation (always included)
//! - `css/themes/theme-*.css`: Optional overrides layered on top of the baseline
//!
//! The default theme (`HtmlTheme::Modern`) injects an empty stylesheet so the
//! baseline alone controls rendering. Other themes, such as
//! `HtmlTheme::FancySerif`, only add targeted overrides.
//!
//! Themes use Google Fonts and are mobile-responsive.
//!
//! # Output Format
//!
//! Export produces a single, self-contained HTML file:
//! - Complete HTML5 document structure
//! - Embedded CSS in a `<style>` tag
//! - No external dependencies (except optionally linked fonts)
//! - Mobile-responsive viewport meta tag
//!
//! # Lossy Conversions
//!
//! The following conversions may lose information on round-trip:
//! - Lex sessions beyond level 6 → h6 with nested sections (HTML heading limit)
//! - Lex annotations → HTML comments (exported, but parsing them back is lossy)
//! - Whitespace, which may be normalized
//!
//! # Architecture Notes
//!
//! Like the Markdown implementation, we handle the nested-to-flat conversion using the IR
//! event system (`lex-babel/src/common/`). HTML is more naturally hierarchical than Markdown,
//! but sessions still require special handling because they don't map directly to HTML's
//! heading structure.
//!
//! We use semantic HTML elements with CSS classes for styling rather than presentational
//! elements.
//!
//! # Implementation Status
//!
//! - [x] Export (Lex → HTML)
//!   - [ ] Document structure with CSS embedding
//!   - [ ] Paragraph
//!   - [ ] Heading (Session) → section + heading
//!   - [ ] Bold, Italic, Code inlines
//!   - [ ] Lists - ordered/unordered
//!   - [ ] Code blocks (Verbatim) with language attribute
//!   - [ ] Definitions → dl/dt/dd
//!   - [ ] Annotations → HTML comments
//!   - [ ] Math → span with class
//!   - [ ] References → anchors with link conversion
//! - [ ] Import (HTML → Lex)
//!   - [ ] All elements (to be implemented after export)

mod serializer;

use crate::error::FormatError;
use crate::format::Format;
use lex_core::lex::ast::Document;

pub use serializer::HtmlOptions;

/// Returns the default baseline CSS used for HTML export.
///
/// This is the same CSS embedded in all HTML exports when no custom CSS is provided.
/// Use this to get a starting point for custom styling.
pub fn get_default_css() -> &'static str {
    include_str!("../../../css/baseline.css")
}
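
// A minimal sketch, not part of the original module: one way to use the
// baseline as a starting point for custom styling is to append overrides to it
// and pass the result as the `custom_css` option. `css_with_overrides` is a
// hypothetical helper, and it assumes that custom CSS replaces the embedded
// baseline rather than being layered on top of it.
#[allow(dead_code)]
fn css_with_overrides(baseline: &str, overrides: &str) -> String {
    // At equal specificity, later rules win in the cascade, so the overrides
    // are placed after the baseline, separated by one blank line.
    format!("{}\n\n{}\n", baseline.trim_end(), overrides.trim())
}
// Possible usage: css_with_overrides(get_default_css(), ".lex-paragraph { line-height: 1.8; }")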

/// Format implementation for HTML
pub struct HtmlFormat {
    /// CSS theme to use for export
    theme: HtmlTheme,
}

/// Available CSS themes for HTML export
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum HtmlTheme {
    /// Serif typography override (fonts only, layout comes from baseline)
    FancySerif,
    /// Baseline modern theme (no-op; relies on baseline.css)
    #[default]
    Modern,
}

impl Default for HtmlFormat {
    fn default() -> Self {
        Self::new(HtmlTheme::Modern)
    }
}

impl HtmlFormat {
    /// Create a new HTML format with the specified theme
    pub fn new(theme: HtmlTheme) -> Self {
        Self { theme }
    }

    /// Create HTML format with fancy serif theme
    pub fn with_fancy_serif() -> Self {
        Self::new(HtmlTheme::FancySerif)
    }

    /// Create HTML format with modern theme
    pub fn with_modern() -> Self {
        Self::new(HtmlTheme::Modern)
    }
}

impl Format for HtmlFormat {
    fn name(&self) -> &str {
        "html"
    }

    fn description(&self) -> &str {
        "HTML5 format with embedded CSS"
    }

    fn file_extensions(&self) -> &[&str] {
        &["html", "htm"]
    }

    fn supports_parsing(&self) -> bool {
        false // HTML import is not implemented yet
    }

    fn supports_serialization(&self) -> bool {
        true
    }

    fn parse(&self, _source: &str) -> Result<Document, FormatError> {
        Err(FormatError::NotSupported(
            "HTML import not yet implemented".to_string(),
        ))
    }

    fn serialize(&self, doc: &Document) -> Result<String, FormatError> {
        serializer::serialize_to_html(doc, self.theme)
    }

    fn serialize_with_options(
        &self,
        doc: &Document,
        options: &std::collections::HashMap<String, String>,
    ) -> Result<crate::format::SerializedDocument, FormatError> {
        // An explicit "theme" option overrides the theme set at construction.
        let mut theme = self.theme;
        if let Some(theme_str) = options.get("theme") {
            theme = match theme_str.as_str() {
                "fancy-serif" => HtmlTheme::FancySerif,
                "modern" | "default" => HtmlTheme::Modern,
                _ => {
                    // Unknown theme names fall back to Modern for now;
                    // returning an error is the alternative.
                    HtmlTheme::Modern
                }
            };
        }

        let mut html_options = HtmlOptions::new(theme);

        // Handle custom CSS option (expects CSS content, not a path)
        if let Some(css_content) = options.get("custom_css") {
            html_options = html_options.with_custom_css(css_content.clone());
        }

        serializer::serialize_to_html_with_options(doc, html_options)
            .map(crate::format::SerializedDocument::Text)
    }
}
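
// A minimal sketch, not part of the original module: the stricter alternative
// to the theme fallback in `serialize_with_options`, where an unknown theme
// name is reported instead of being silently mapped to Modern.
// `canonical_theme_name` is a hypothetical helper; it returns canonical option
// strings rather than `HtmlTheme` values so that it stands on its own.
#[allow(dead_code)]
fn canonical_theme_name(raw: &str) -> Result<&'static str, String> {
    match raw {
        "fancy-serif" => Ok("fancy-serif"),
        // "default" is accepted as an alias for the modern theme.
        "modern" | "default" => Ok("modern"),
        other => Err(format!("unknown HTML theme: {other:?}")),
    }
}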

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_get_default_css_returns_baseline() {
        let css = get_default_css();
        // Should contain key selectors from baseline.css
        assert!(css.contains(".lex-document"));
        assert!(css.contains(".lex-paragraph"));
        assert!(css.contains(".lex-session"));
        // Should be non-trivial content
        assert!(css.len() > 1000);
    }

    #[test]
    fn test_get_default_css_is_same_as_embedded() {
        // The CSS returned should be the exact same as what's embedded in HTML output
        let css = get_default_css();
        // Verify it's the actual include_str content by checking for CSS custom properties
        assert!(css.contains("--lex-"));
    }
}
249}