Crate html5tokenizer
html5tokenizer
This library is a fork of the tokenizer from html5ever with the following changes:
- The dependencies on markup5ever, tendril, mac and log were removed. This spares you about 40 build dependencies and the unsafe code from Tendril.
- The dependency on phf was made optional: if you don’t need to resolve named entities like &amp;, you can disable the named-entities feature, in which case this library does not have any dependencies (other than the standard library).
- This library takes care of appropriately switching tokenizer states based on tag names (e.g. for script and style); with the html5ever tokenizer you had to do this yourself.
- An optional spans feature has been added to make the tokenizer report the source code spans for parser errors, tag names and attributes. The feature is disabled by default.
- The API has been polished, e.g. the internal tokenizer state enums are no longer public and errors are no longer stringly typed.
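To illustrate the state switching mentioned in the list above, here is a minimal sketch of how a start tag name can map to the tokenizer state used for the element's content. The mapping follows the HTML Standard's rules for elements with non-default content. ContentState and state_after_start_tag are hypothetical names invented for this sketch; they are not part of this crate's API, whose state enums are private, and the crate's actual logic may differ.

```rust
/// Tokenizer states that matter after a start tag.
/// Hypothetical enum for illustration only; not this crate's (private) state type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentState {
    /// Ordinary markup: tags, comments and character references are recognized.
    Data,
    /// Character references are recognized, but tags are not (except the matching end tag).
    RcData,
    /// Neither tags nor character references are recognized (except the matching end tag).
    RawText,
    /// Like raw text, with additional rules for script content.
    ScriptData,
    /// Everything up to the end of the input is text.
    Plaintext,
}

/// Pick the state to enter after a start tag with the given name.
/// Hypothetical helper; the name comparison is ASCII case-insensitive.
pub fn state_after_start_tag(tag_name: &str) -> ContentState {
    match tag_name.to_ascii_lowercase().as_str() {
        "title" | "textarea" => ContentState::RcData,
        "style" | "xmp" | "iframe" | "noembed" | "noframes" => ContentState::RawText,
        "script" => ContentState::ScriptData,
        "plaintext" => ContentState::Plaintext,
        _ => ContentState::Data,
    }
}
```

With a tokenizer that does not do this for you, the caller would have to perform such a lookup after every start tag and set the tokenizer state accordingly, for example treating everything inside script as script data rather than markup.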
If you want to parse HTML into a tree (DOM), you should by all means use html5ever; this crate is merely for those who only want an HTML5 tokenizer and seek to minimize their build dependencies (html5ever pulls in 56).
Credits
Thanks to the developers of html5ever for their awesome parser!
Modules
Types to represent the parser errors that can occur.
Structs
A tag attribute, e.g. class="test" in <div class="test" ...>.
A queue of owned string buffers, which supports incrementally consuming characters.
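As a rough illustration of the idea behind such a queue, the following sketch stores owned String buffers in a VecDeque and hands out one character at a time, so input can be appended in chunks while earlier chunks are still being consumed. CharQueue, push_back, peek and pop_char are hypothetical names chosen for this sketch; it is not the crate's actual type or API.

```rust
use std::collections::VecDeque;

/// A queue of owned string buffers that is consumed one character at a time.
/// Hypothetical stand-in for illustration; not the crate's actual type.
#[derive(Debug, Default)]
pub struct CharQueue {
    /// Pending buffers; empty buffers are never stored.
    buffers: VecDeque<String>,
    /// Byte offset of the next unread character in the front buffer.
    offset: usize,
}

impl CharQueue {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append a buffer to the end of the queue, skipping empty ones.
    pub fn push_back(&mut self, buf: String) {
        if !buf.is_empty() {
            self.buffers.push_back(buf);
        }
    }

    /// Look at the next character without consuming it.
    pub fn peek(&self) -> Option<char> {
        self.buffers
            .front()
            .and_then(|buf| buf[self.offset..].chars().next())
    }

    /// Consume and return the next character, dropping exhausted buffers.
    pub fn pop_char(&mut self) -> Option<char> {
        let front = self.buffers.front()?;
        let c = front[self.offset..].chars().next()?;
        self.offset += c.len_utf8();
        let exhausted = self.offset >= front.len();
        if exhausted {
            self.buffers.pop_front();
            self.offset = 0;
        }
        Some(c)
    }
}
```

Tracking a byte offset instead of removing characters from the front of each String keeps consumption cheap, and advancing by len_utf8 keeps the offset on a character boundary for non-ASCII input.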
A DOCTYPE token.
A tag token.
The HTML tokenizer.
Tokenizer options, with an impl for Default.
Enums
Traits
Types which can receive tokens from the tokenizer.
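The general shape of such a token receiver can be sketched as follows. All names here (SketchToken, SketchSink, StartTagCounter, feed) are hypothetical and exist only for this illustration; the crate's real trait, its method signatures and its token types are not shown on this page and may look different.

```rust
/// Hypothetical, heavily simplified token type for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SketchToken {
    StartTag(String),
    EndTag(String),
    Text(String),
}

/// Hypothetical sink trait: the tokenizer pushes each token to the sink as it is produced.
pub trait SketchSink {
    fn process_token(&mut self, token: SketchToken);
}

/// Example sink that counts start tags and ignores everything else.
#[derive(Debug, Default)]
pub struct StartTagCounter {
    pub count: usize,
}

impl SketchSink for StartTagCounter {
    fn process_token(&mut self, token: SketchToken) {
        if let SketchToken::StartTag(_) = token {
            self.count += 1;
        }
    }
}

/// Stand-in for a tokenizer driving a sink: forwards already-built tokens in order.
pub fn feed<S: SketchSink>(sink: &mut S, tokens: Vec<SketchToken>) {
    for token in tokens {
        sink.process_token(token);
    }
}
```

A push-based design like this lets the consumer decide what to keep, so a caller interested only in, say, start tags never has to allocate a full token list.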