Crate docspec_html_reader

HTML to DocSpec event stream reader.

This crate provides an HtmlReader that implements EventSource to convert HTML documents into the DocSpec event stream format. It uses html5gum to parse HTML5-compliant markup and emits typed events representing document structure.

§Quick Start

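use docspec_html_reader::{HtmlReader, EventSource};

// Build a reader over an in-memory HTML string.
let html = "<p>Hello world</p>";
let mut reader = HtmlReader::from_str(html);

// Pull events until the source is exhausted. The `?` operator requires
// an enclosing function that returns a compatible `Result`.
while let Some(event) = reader.next_event()? {
    println!("{event:?}");
}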

§Supported Elements

  • Paragraphs → StartParagraph / EndParagraph

§Unsupported Elements

All other HTML elements are silently ignored. Text content inside inline elements (e.g., <strong>, <em>) is preserved as Text events, but the formatting structure is dropped.
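
The mapping rules above can be illustrated with a small, self-contained model. This is only a sketch and not the crate's implementation: the `Token` type, the `to_events` function, and the `String` payload of `Event::Text` are hypothetical stand-ins; only the names StartParagraph, EndParagraph, and Text come from this page.

```rust
/// Hypothetical, simplified tokenizer output (not html5gum's token type).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

/// Hypothetical event type. The variant names follow this page; the
/// `String` payload of `Text` is an assumption made for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    StartParagraph,
    EndParagraph,
    Text(String),
}

/// Maps tokens to events: `<p>` tags become paragraph events, text is
/// always kept, and every other tag is dropped without an error.
fn to_events(tokens: &[Token]) -> Vec<Event> {
    tokens
        .iter()
        .filter_map(|token| match token {
            Token::StartTag(name) if name.as_str() == "p" => Some(Event::StartParagraph),
            Token::EndTag(name) if name.as_str() == "p" => Some(Event::EndParagraph),
            Token::Text(text) => Some(Event::Text(text.clone())),
            // Unsupported elements such as <strong> or <em> are ignored.
            _ => None,
        })
        .collect()
}
```

In this model, the tokens for `<p>Hello <strong>world</strong></p>` yield StartParagraph, Text("Hello "), Text("world"), EndParagraph: the text inside `<strong>` survives, but the tag itself leaves no trace.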

§Streaming

HtmlReader streams its source via html5gum::IoReader’s 16 KB sliding-window buffer. Memory usage stays constant regardless of document size, so the document need not fit in memory. Both HtmlReader::from_str and HtmlReader::from_reader use this streaming path internally.
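
To make the constant-memory claim concrete, here is a minimal sketch of fixed-buffer reading that uses only the standard library. It does not reproduce html5gum's internals: `count_tag_openers` and `BUF_SIZE` are hypothetical names, and counting `<` bytes stands in for real tokenization, which would also have to handle tokens that straddle two chunks.

```rust
use std::io::{self, Read};

/// Fixed buffer size; 16 KB mirrors the figure quoted above.
const BUF_SIZE: usize = 16 * 1024;

/// Counts `<` bytes in `source` while holding at most `BUF_SIZE` bytes
/// of the input in memory at any time.
fn count_tag_openers<R: Read>(mut source: R) -> io::Result<usize> {
    let mut buf = [0u8; BUF_SIZE];
    let mut count = 0;
    loop {
        let n = match source.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // A read interrupted by a signal can simply be retried.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Only the bytes filled by this read are inspected; the same
        // buffer is reused for the next chunk.
        count += buf[..n].iter().filter(|&&b| b == b'<').count();
    }
    Ok(count)
}
```

Because the buffer is reused for every chunk, peak memory depends on `BUF_SIZE` and not on the input length. Any `Read` implementation works as the source, for example a file, a socket, or a byte slice.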

Structs§

HtmlReader
A streaming HTML reader that implements EventSource.

Traits§

EventSource
Produces a stream of crate::Event values from a document source.