html-cat 0.1.0

HTML5 parser: tokenizer + tree builder producing a Document tree of Element/Text/Comment nodes. No mut, no Rc/Arc, no interior mutability, no panics, exhaustive matches. First sub-crate of a Servo-replacement webview runtime targeting Tauri.

html-cat

HTML5 parser: tokenizer + tree builder producing a Document tree of Element/Text/Comment nodes.

html-cat is the first sub-crate of a comp-cat-rs Servo-replacement webview runtime targeting Tauri integration. It follows the same framework constraints as the rest of the stack: no mut, no Rc/Arc, no interior mutability, no panics, exhaustive matches, static dispatch.
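To make the constraints concrete, here is a minimal sketch of a tree walk written in that style: owned children instead of Rc/Arc, no mut bindings, and a match that names every variant. The Node enum and the text_content function are hypothetical stand-ins chosen for illustration; html-cat's actual types and method names may differ.

```rust
/// Hypothetical node type for illustration only; not html-cat's real definition.
/// Children are owned directly by their parent, so no Rc/Arc is needed.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Element { name: String, children: Vec<Node> },
    Text(String),
    Comment(String),
}

/// Concatenate all text beneath a node, in document order.
/// No `mut`, no panicking calls, and every variant is matched explicitly.
fn text_content(node: &Node) -> String {
    match node {
        // Recurse into each child and join the resulting strings.
        Node::Element { children, .. } => children.iter().map(text_content).collect(),
        Node::Text(text) => text.clone(),
        // Comments contribute no text.
        Node::Comment(_) => String::new(),
    }
}
```

Because the match has no wildcard arm, adding a new variant to the enum becomes a compile error at every walk like this one until the new case is handled.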

Example

use html_cat::{parse, Error};

fn main() -> Result<(), Error> {
    let doc = parse("<!DOCTYPE html><html><body><p>hi</p></body></html>")?;
    assert_eq!(doc.root().children().len(), 1);
    Ok(())
}

v0 scope

  • Standard HTML5 tokenizer state machine (Data, Tag, AttrName, AttrValue, Comment, Doctype, raw-text contexts).
  • Tree-builder insertion modes: Initial, BeforeHtml, BeforeHead, InHead, InBody, AfterBody.
  • Void elements (br, img, input, meta, link, ...).
  • Named character references for the common set (amp, lt, gt, quot, apos, nbsp), plus numeric references.
  • Doctype recognition (HTML5).
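The character-reference item above can be sketched as a small lookup plus a numeric fallback. This is an illustrative sketch, not html-cat's implementation: decode_char_ref and parse_digits are hypothetical names, the input is assumed to be the text between & and ; and the HTML specification's replacement rules (U+FFFD for null and out-of-range values, the Windows-1252 remapping of 0x80 to 0x9F) are left out.

```rust
/// Parse `digits` in the given radix, accepting digits only.
/// The explicit check rejects a leading `+`, which `from_str_radix` would allow.
/// Empty input and overflow both fall through to `None`.
fn parse_digits(digits: &str, radix: u32) -> Option<u32> {
    if digits.chars().all(|c| c.is_digit(radix)) {
        u32::from_str_radix(digits, radix).ok()
    } else {
        None
    }
}

/// Resolve the body of a character reference (the text between `&` and `;`).
/// Hypothetical helper: covers only the six named references listed in the
/// v0 scope, plus decimal (`#65`) and hexadecimal (`#x41`) numeric forms.
fn decode_char_ref(body: &str) -> Option<char> {
    match body {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        "quot" => Some('"'),
        "apos" => Some('\''),
        "nbsp" => Some('\u{00A0}'),
        other => {
            let digits = other.strip_prefix('#')?;
            let code = match digits
                .strip_prefix('x')
                .or_else(|| digits.strip_prefix('X'))
            {
                Some(hex) => parse_digits(hex, 16)?,
                None => parse_digits(digits, 10)?,
            };
            // Surrogates and values above U+10FFFF are not valid chars.
            char::from_u32(code)
        }
    }
}
```

Returning Option rather than panicking on bad input keeps the sketch within the no-panics constraint; a caller would typically emit the original text unchanged when the result is None.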

Deferred to v0.2+

  • Full named-entity table.
  • Foreign content (SVG, MathML).
  • Template element semantics.
  • Adoption agency algorithm for misnested formatting elements.
  • document.write integration.
  • Streaming parser.

License

MIT OR Apache-2.0