HTML to DocSpec event stream reader.
This crate provides an HtmlReader that implements EventSource to convert
HTML documents into the DocSpec event stream format. It uses html5gum
to parse HTML5-compliant markup and emits typed events representing document
structure.
§Quick Start
use docspec_html_reader::{HtmlReader, EventSource};
let html = "<p>Hello world</p>";
let mut reader = HtmlReader::from_str(html);
while let Some(event) = reader.next_event()? {
    println!("{event:?}");
}
§Supported Elements
- Paragraphs → StartParagraph/EndParagraph
§Unsupported Elements
All other HTML elements are silently ignored. Text content inside inline
elements (e.g., <strong>, <em>) is preserved as Text events, but
the formatting structure is dropped.
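To make the behavior described above concrete, here is a minimal, self-contained sketch of the idea: paragraph tags become start/end events, every other tag is skipped, and the text between tags is kept. This is not the crate's implementation (which parses with html5gum). The `Event` enum, `toy_events`, and `flush_text` are hypothetical names used only for illustration. The toy scanner also ignores attributes, comments, and entities, and it merges adjacent text into a single `Text` event, which the real reader may or may not do.

```rust
/// Hypothetical event type for illustration; the crate's real `Event` type may differ.
#[derive(Debug, PartialEq)]
enum Event {
    StartParagraph,
    EndParagraph,
    Text(String),
}

/// Pushes any pending text as a `Text` event and clears the buffer.
fn flush_text(text: &mut String, events: &mut Vec<Event>) {
    if !text.is_empty() {
        events.push(Event::Text(std::mem::take(text)));
    }
}

/// Toy scanner (an assumption, not the crate's parser): emits paragraph events
/// for `<p>` and `</p>`, skips every other tag, and keeps the text in between.
fn toy_events(html: &str) -> Vec<Event> {
    let mut events = Vec::new();
    let mut text = String::new();
    let mut rest = html;

    loop {
        // No more tags: whatever is left is plain text.
        let Some(open) = rest.find('<') else {
            text.push_str(rest);
            break;
        };
        text.push_str(&rest[..open]);

        // Unterminated tag: drop the remainder in this toy version.
        let Some(len) = rest[open..].find('>') else {
            break;
        };
        let tag = rest[open + 1..open + len].trim().to_ascii_lowercase();
        rest = &rest[open + len + 1..];

        // Only paragraph tags are structural here; all other tags are ignored,
        // so their inner text simply continues the current text run.
        let boundary = match tag.as_str() {
            "p" => Some(Event::StartParagraph),
            "/p" => Some(Event::EndParagraph),
            _ => None,
        };
        if let Some(event) = boundary {
            flush_text(&mut text, &mut events);
            events.push(event);
        }
    }

    flush_text(&mut text, &mut events);
    events
}
```

Under these assumptions, `toy_events("<p>Hello <strong>bold</strong> world</p>")` yields a paragraph start, one text event containing `Hello bold world`, and a paragraph end: the words inside `<strong>` survive, but nothing records that they were bold.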
§Streaming
HtmlReader streams its source via html5gum::IoReader’s 16 KB sliding-window
buffer. Memory usage is constant regardless of document size, so the document need
not fit in memory. Both HtmlReader::from_str and HtmlReader::from_reader
use this streaming path internally.
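The constant-memory claim rests on a general pattern: pull bytes from the source through a fixed-size buffer and process each chunk before reading the next. The sketch below shows that pattern with the standard library only. It is not html5gum's `IoReader` and not this crate's code. `WINDOW` and `count_tag_openers` are hypothetical names, and counting `<` bytes is a stand-in for real tokenization.

```rust
use std::io::{self, Read};

/// Buffer size chosen to mirror the 16 KB figure quoted above (an assumption
/// for this sketch, not a constant exported by html5gum or this crate).
const WINDOW: usize = 16 * 1024;

/// Counts `<` bytes in `source` while holding at most `WINDOW` bytes at a time.
/// Memory use does not grow with the size of the input.
fn count_tag_openers<R: Read>(mut source: R) -> io::Result<u64> {
    let mut buf = [0u8; WINDOW];
    let mut count = 0u64;

    loop {
        let n = match source.read(&mut buf) {
            Ok(0) => return Ok(count), // end of input
            Ok(n) => n,
            // A read interrupted by a signal is safe to retry.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Process only the bytes that were actually read in this chunk.
        count += buf[..n].iter().filter(|&&b| b == b'<').count() as u64;
    }
}
```

Any `Read` implementor works as the source, for example a `std::fs::File`, a network stream, or a byte slice such as `"<p>x</p>".as_bytes()`. A real tokenizer additionally has to carry parser state across chunk boundaries, which this sketch avoids by counting single bytes.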
Structs§
- HtmlReader - A streaming HTML reader that implements EventSource.
Traits§
- EventSource - Produces a stream of crate::Events from a document source.