docspec-html-reader 1.8.0

HTML to DocSpec event stream reader

Streaming HTML to DocSpec event stream reader.

See the main DocSpec repository for documentation, architecture, and the event protocol.

Supported Elements

  • Paragraphs (<p>)
  • Emits exactly: StartDocument, StartParagraph, Text, EndParagraph, EndDocument

Out of Scope (silently dropped)

All other HTML elements are silently ignored. Text content inside inline elements (e.g., <strong>, <em>) is preserved as Text events, but the formatting structure is dropped.
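The drop-the-tag, keep-the-text rule can be illustrated with a toy model. The sketch below is not the crate's implementation: the `Event` enum and `toy_events` function are hypothetical names invented here, the tokenizer is deliberately naive, and two behaviours are assumptions the text above does not settle (text outside a paragraph is discarded, and text on either side of a dropped tag arrives as separate Text events).

```rust
// Illustrative toy model only. `Event` and `toy_events` are hypothetical
// and are NOT part of docspec-html-reader.
#[derive(Debug, PartialEq)]
enum Event {
    StartDocument,
    StartParagraph,
    Text(String),
    EndParagraph,
    EndDocument,
}

/// Scans `html` naively: <p> and </p> become paragraph events, every other
/// tag is discarded, and text between tags is kept.
fn toy_events(html: &str) -> Vec<Event> {
    let mut events = vec![Event::StartDocument];
    let mut in_paragraph = false;
    let mut rest = html;
    while !rest.is_empty() {
        if let Some(after_lt) = rest.strip_prefix('<') {
            // A tag: take everything up to '>' and look only at its name.
            let end = after_lt.find('>').unwrap_or(after_lt.len());
            let name = after_lt[..end]
                .split_whitespace()
                .next()
                .unwrap_or("")
                .to_ascii_lowercase();
            match name.as_str() {
                "p" => {
                    events.push(Event::StartParagraph);
                    in_paragraph = true;
                }
                "/p" => {
                    events.push(Event::EndParagraph);
                    in_paragraph = false;
                }
                _ => {} // any other element: the tag itself emits nothing
            }
            rest = after_lt.get(end + 1..).unwrap_or("");
        } else {
            // A text run: everything up to the next '<'.
            let end = rest.find('<').unwrap_or(rest.len());
            // Assumption: text outside a paragraph is discarded.
            if in_paragraph {
                events.push(Event::Text(rest[..end].to_string()));
            }
            rest = &rest[end..];
        }
    }
    events.push(Event::EndDocument);
    events
}
```

Under these assumptions, calling `toy_events("<p>Hello <strong>world</strong></p>")` yields a paragraph containing two Text events, "Hello " and "world", with no trace of the <strong> element.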

Streaming Guarantee

HtmlReader streams its source via html5gum::IoReader's 16 KB sliding-window buffer. Memory usage is constant regardless of document size, so the document need not fit in memory. Both from_str and from_reader use this streaming path internally.
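To see why a fixed-size buffer gives constant memory use, consider a generic sketch built only on the standard library. It is not how html5gum or HtmlReader is implemented. The `WINDOW` constant and `count_byte_streaming` function are hypothetical names, and the sketch simply refills one buffer rather than sliding a window.

```rust
use std::io::{self, Read};

// Hypothetical constant that mirrors the 16 KB figure quoted above.
const WINDOW: usize = 16 * 1024;

/// Counts how often `needle` occurs in `source` while holding only one
/// WINDOW-sized buffer, however large the source is.
fn count_byte_streaming<R: Read>(mut source: R, needle: u8) -> io::Result<u64> {
    let mut buf = [0u8; WINDOW];
    let mut count = 0u64;
    loop {
        let n = match source.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // A read interrupted by a signal can simply be retried.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Only the bytes just read are inspected; the buffer is then reused.
        count += buf[..n].iter().filter(|&&b| b == needle).count() as u64;
    }
    Ok(count)
}
```

Because the buffer is reused on every iteration, peak memory depends on `WINDOW`, not on the length of the input. A streaming tokenizer additionally has to carry state across buffer boundaries, which this sketch avoids by counting single bytes.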

Quick Start

use docspec_html_reader::{HtmlReader, EventSource};

fn main() -> Result<(), docspec_core::Error> {
    // Parse an in-memory string and print each event as it is produced.
    let mut reader = HtmlReader::from_str("<p>Hello world</p>");
    while let Some(event) = reader.next_event()? {
        println!("{event:?}");
    }
    Ok(())
}

From a file or any Read + Seek source:

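use std::fs::File;
use docspec_html_reader::{HtmlReader, EventSource};

fn main() -> Result<(), docspec_core::Error> {
    // As in the original snippet, `?` on File::open assumes that
    // docspec_core::Error can be built from std::io::Error.
    let file = File::open("document.html")?;
    // from_reader is fallible, unlike from_str, so it needs `?` too.
    let mut reader = HtmlReader::from_reader(file)?;
    while let Some(event) = reader.next_event()? {
        println!("{event:?}");
    }
    Ok(())
}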

See Also