Markdown to DocSpec event stream reader.
This crate provides a MarkdownReader that implements EventSource to convert
Markdown documents into the DocSpec event stream format. It uses pulldown-cmark
to parse CommonMark-compliant Markdown and emits typed events representing document
structure.
§Quick Start
use docspec_markdown_reader::{MarkdownReader, EventSource};
let markdown = "# Hello\n\nWorld";
let mut reader = MarkdownReader::from_str(markdown);
while let Some(event) = reader.next_event()? {
    println!("{event:?}");
}
§Supported Elements
- Headings (h1–h6) → StartHeading/EndHeading
- Paragraphs → StartParagraph/EndParagraph
- Block quotes → StartBlockQuote/EndBlockQuote
- Code blocks → StartPreformatted/EndPreformatted
- Bold text → StartTextStyle { kind: Bold }/EndTextStyle
- Italic text → StartTextStyle { kind: Italic }/EndTextStyle
- Inline code → StartTextStyle { kind: Code }/EndTextStyle
- Strikethrough → StartTextStyle { kind: Strikethrough }/EndTextStyle
- Images → Image { source: Uri, alt, title, decorative }
- Hard line breaks → LineBreak
- Soft line breaks → SoftBreak
- Thematic breaks → ThematicBreak
- Tables → StartTable/EndTable, StartTableRow/EndTableRow, StartTableHeader/EndTableHeader, StartTableCell/EndTableCell (GFM column alignment syntax is parsed, but alignment data is discarded)
- Bullet lists → StartUnorderedListItem/EndUnorderedListItem
- Numbered lists → StartOrderedListItem/EndOrderedListItem (start: Option<u64> is Some(n) on the first item of each list, None on subsequent items; child items may nest inside their parent’s Start*/End* pair with level indicating indent depth; task list markers (- [ ]/- [x]) are parsed as literal text)
- Links → StartLink { href, title }/EndLink (inline, reference, collapsed, shortcut, autolink, and email autolink variants; all are resolved to inline form by pulldown-cmark. An image inside a link closes the link before the image is emitted as a sibling block: content preceding the image stays inside the link, content following the image is outside the link, and the link is empty only when the image is the sole link label, e.g. [](url))
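The start field semantics for numbered lists (Some(n) on the first item, None afterwards) mean a consumer has to count items itself. A minimal sketch of that bookkeeping follows. The ListEvent enum is a hypothetical stand-in defined locally for illustration, not the crate's real event type, and the u32 type for level is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the ordered-list events described above.
/// The crate's real event type has more variants; `level: u32` is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum ListEvent {
    StartOrderedListItem { start: Option<u64>, level: u32 },
    EndOrderedListItem,
}

/// Recover the displayed number of every ordered-list item as (level, number).
/// `Some(n)` begins a new list at `n`; `None` continues the list that is
/// currently open at the same nesting level.
pub fn item_numbers(events: &[ListEvent]) -> Vec<(u32, u64)> {
    // Next number to hand out, tracked separately for each nesting level.
    let mut next: HashMap<u32, u64> = HashMap::new();
    let mut out = Vec::new();
    for ev in events {
        if let ListEvent::StartOrderedListItem { start, level } = ev {
            let n = match start {
                Some(n) => *n,
                // Fall back to 1 if a list never announced its start value.
                None => *next.get(level).unwrap_or(&1),
            };
            next.insert(*level, n + 1);
            out.push((*level, n));
        }
    }
    out
}
```

Keeping one counter per level lets a nested list restart at its own Some(n) without disturbing the numbering of the parent list.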
§Supported Raw HTML Tags
The following raw HTML tags embedded in markdown source are translated into
DocSpec events. All attributes on these tags are silently ignored. All other
HTML tags continue to be silently dropped.
§Inline formatting (translated to StartTextStyle / EndTextStyle)
- <b>, <strong> → TextStyleKind::Bold
- <i>, <em> → TextStyleKind::Italic
- <u> → TextStyleKind::Underline
- <s>, <strike>, <del> → TextStyleKind::Strikethrough
- <code> → TextStyleKind::Code
- <sub> → TextStyleKind::Subscript
- <sup> → TextStyleKind::Superscript
- <mark> → TextStyleKind::Mark with constant yellow #FFFF00
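The tag-to-style table above can be pictured as a single lookup. The sketch below is illustrative only: StyleKind is a local stand-in for TextStyleKind, style_for_tag is a hypothetical helper rather than part of the crate's API, and case-insensitive matching is an assumption not stated in these docs.

```rust
/// Local stand-in for the crate's `TextStyleKind`; illustration only.
/// The fixed #FFFF00 colour attached to `Mark` is omitted here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StyleKind {
    Bold,
    Italic,
    Underline,
    Strikethrough,
    Code,
    Subscript,
    Superscript,
    Mark,
}

/// Hypothetical helper: map a raw HTML tag name to the style it is documented
/// to produce, or `None` for tags that are dropped.
/// Assumption: tag names are compared case-insensitively.
pub fn style_for_tag(tag: &str) -> Option<StyleKind> {
    match tag.to_ascii_lowercase().as_str() {
        "b" | "strong" => Some(StyleKind::Bold),
        "i" | "em" => Some(StyleKind::Italic),
        "u" => Some(StyleKind::Underline),
        "s" | "strike" | "del" => Some(StyleKind::Strikethrough),
        "code" => Some(StyleKind::Code),
        "sub" => Some(StyleKind::Subscript),
        "sup" => Some(StyleKind::Superscript),
        "mark" => Some(StyleKind::Mark),
        _ => None,
    }
}
```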
§Self-closing / void
- <br>, <br/>, <br /> → Event::LineBreak
- <hr> → Event::ThematicBreak (block context only; ignored in paragraph context)
§Block (only inside an HtmlBlock)
- <h1>…<h6> → Event::StartHeading { level: N } + content + Event::EndHeading
§Known limitations
- Raw HTML <pre><code>...</code></pre> is NOT treated as a code block; the <pre> is dropped (out of scope) and the <code> becomes an inline style. Use markdown fenced code blocks instead.
- HTML attributes (id, class, style, href, src, etc.) are NOT extracted.
- Unclosed tags are auto-closed at the end of the containing block.
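The auto-close rule for unclosed tags is commonly implemented with a stack of open tags that is drained when the block ends. The following is a sketch of that general technique under stated assumptions, not the crate's actual implementation: Tok, Out, and close_at_block_end are hypothetical names, and ignoring a close tag that does not match the innermost open tag is an assumption made for this example.

```rust
/// Hypothetical input tokens for one block of raw HTML.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok {
    Open(&'static str),
    Close(&'static str),
    Text(&'static str),
}

/// Hypothetical output events for the same block.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Out {
    Start(&'static str),
    End(&'static str),
    Text(&'static str),
}

/// Emit balanced Start/End pairs for one block, closing anything still
/// open when the block's tokens run out.
pub fn close_at_block_end(tokens: &[Tok]) -> Vec<Out> {
    let mut open: Vec<&'static str> = Vec::new();
    let mut out = Vec::new();
    for t in tokens {
        match t {
            Tok::Open(name) => {
                open.push(*name);
                out.push(Out::Start(*name));
            }
            Tok::Close(name) => {
                // Assumption: a close tag is honoured only when it matches
                // the innermost open tag; otherwise it is ignored.
                if open.last().copied() == Some(*name) {
                    open.pop();
                    out.push(Out::End(*name));
                }
            }
            Tok::Text(s) => out.push(Out::Text(*s)),
        }
    }
    // End of block: close whatever is still open, innermost first.
    while let Some(name) = open.pop() {
        out.push(Out::End(name));
    }
    out
}
```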
§Unsupported Elements
The following elements are not emitted as structured events. Text content is recursively extracted where applicable; structure is silently dropped:
- Definition lists and footnotes
- Math blocks and inline math
- Subscript and superscript formatting (use <sub>/<sup> raw HTML instead)
§Memory Model
MarkdownReader owns its source text for the parser’s lifetime. While events
are emitted one at a time via EventSource::next_event (the stream-event
guarantee is preserved), the source String is held in memory until the reader
is dropped. This is a constraint of pulldown-cmark, which is permanently
borrow-based by design (see pulldown-cmark issue #463).
For contrast, HtmlReader (from docspec-html-reader) streams its source via a
16 KB sliding-window buffer and does not hold the full document in memory.
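The trade-off described above, incremental emission over a fully resident source, can be illustrated with a toy reader. This sketch does not use pulldown-cmark or the crate's types: OwningLineReader, next_line, and held_bytes are hypothetical names, and lines stand in for events.

```rust
/// Toy reader that owns its whole source but hands out one item per call.
/// Hypothetical type for illustration; lines stand in for DocSpec events.
pub struct OwningLineReader {
    source: String, // kept alive until the reader is dropped
    pos: usize,     // byte offset of the next unread line
}

impl OwningLineReader {
    pub fn new(source: String) -> Self {
        Self { source, pos: 0 }
    }

    /// Return the next line as an owned value, or `None` at end of input.
    pub fn next_line(&mut self) -> Option<String> {
        if self.pos >= self.source.len() {
            return None;
        }
        let rest = &self.source[self.pos..];
        let (line, advance) = match rest.find('\n') {
            Some(i) => (&rest[..i], i + 1),
            None => (rest, rest.len()),
        };
        let owned = line.to_string();
        self.pos += advance;
        Some(owned)
    }

    /// Bytes of source text still held, regardless of how much was consumed.
    pub fn held_bytes(&self) -> usize {
        self.source.len()
    }
}
```

Each call yields a single item, yet held_bytes never shrinks: consumption is incremental while the source stays fully in memory, which is the behaviour the section describes for MarkdownReader.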
Structs§
- MarkdownReader: A streaming Markdown reader that implements EventSource.
Traits§
- EventSource: Produces a stream of crate::Events from a document source.