pub trait Emitter {
type Token;
fn set_last_start_tag(&mut self, last_start_tag: Option<&[u8]>);
fn emit_eof(&mut self);
fn emit_error(&mut self, error: Error);
fn pop_token(&mut self) -> Option<Self::Token>;
fn emit_string(&mut self, c: &[u8]);
fn init_start_tag(&mut self);
fn init_end_tag(&mut self);
fn init_comment(&mut self);
fn emit_current_tag(&mut self) -> Option<State>;
fn emit_current_comment(&mut self);
fn emit_current_doctype(&mut self);
fn set_self_closing(&mut self);
fn set_force_quirks(&mut self);
fn push_tag_name(&mut self, s: &[u8]);
fn push_comment(&mut self, s: &[u8]);
fn push_doctype_name(&mut self, s: &[u8]);
fn init_doctype(&mut self);
fn init_attribute(&mut self);
fn push_attribute_name(&mut self, s: &[u8]);
fn push_attribute_value(&mut self, s: &[u8]);
fn set_doctype_public_identifier(&mut self, value: &[u8]);
fn set_doctype_system_identifier(&mut self, value: &[u8]);
fn push_doctype_public_identifier(&mut self, s: &[u8]);
fn push_doctype_system_identifier(&mut self, s: &[u8]);
fn current_is_appropriate_end_tag_token(&mut self) -> bool;
fn adjusted_current_node_present_but_not_in_html_namespace(
&mut self
) -> bool { ... }
}
An emitter is an object providing methods to the tokenizer to produce tokens.
Domain-specific applications of the HTML tokenizer can manually implement this trait to customize per-token allocations, or avoid them altogether.
An emitter is assumed to have these internal states:
- last start tag: The most recently emitted start tag’s name
- current token: Can be a tag, doctype or comment token. There’s only one current token.
- current attribute: The currently processed HTML attribute, consisting of two strings for name and value.
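As a rough illustration of the three internal states listed above, a custom emitter might keep them in a struct like the following. All type and field names here (`CurrentToken`, `EmitterState`, and so on) are hypothetical and are not part of this crate's API.

```rust
/// Hypothetical model of the "current token". Only names are stored here to
/// keep the sketch short; a real emitter would likely store more per variant.
#[derive(Debug, PartialEq)]
pub enum CurrentToken {
    StartTag(Vec<u8>),
    EndTag(Vec<u8>),
    Comment(Vec<u8>),
    Doctype(Vec<u8>),
}

/// Hypothetical container for the internal states an emitter is assumed to have.
#[derive(Debug, Default)]
pub struct EmitterState {
    /// Name of the most recently emitted start tag, if any.
    pub last_start_tag: Option<Vec<u8>>,
    /// There is at most one current token at a time.
    pub current_token: Option<CurrentToken>,
    /// The attribute being processed, as a (name, value) pair.
    pub current_attribute: Option<(Vec<u8>, Vec<u8>)>,
}
```

Using `Option` for each field is one possible design choice: it makes "no current token" and "no current attribute" explicit, so that the methods documented as "may panic" have an obvious condition to check.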
The methods below describe the behavior the WHATWG spec expects, but you are not required to follow it. For example:
- If your usage of the tokenizer ignores all errors, none of the error handling and validation requirements apply to you. You can implement emit_error as a noop and omit all checks that would emit errors.
- If you don’t care about attributes at all, you can make all related methods a noop.
The state machine needs a functional implementation of current_is_appropriate_end_tag_token to do correct transitions, however.
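The trade-off described above can be sketched as follows. To keep the example self-contained, it uses a hypothetical `MiniEmitter` trait that mirrors only a handful of the methods on this page (with a plain string standing in for the error type); it is not the real trait, and `TagNamesOnly` is an invented name.

```rust
/// Hypothetical stand-in for a small subset of the methods documented here.
pub trait MiniEmitter {
    fn emit_error(&mut self, error: &'static str);
    fn init_attribute(&mut self);
    fn push_attribute_name(&mut self, s: &[u8]);
    fn push_attribute_value(&mut self, s: &[u8]);
    fn current_is_appropriate_end_tag_token(&mut self) -> bool;
}

/// An emitter that ignores errors and attributes entirely.
#[derive(Default)]
pub struct TagNamesOnly {
    pub last_start_tag: Option<Vec<u8>>,
    pub current_end_tag: Option<Vec<u8>>,
}

impl MiniEmitter for TagNamesOnly {
    // Errors are ignored, so no validation is needed anywhere else either.
    fn emit_error(&mut self, _error: &'static str) {}

    // Attributes are ignored, so all related methods are noops.
    fn init_attribute(&mut self) {}
    fn push_attribute_name(&mut self, _s: &[u8]) {}
    fn push_attribute_value(&mut self, _s: &[u8]) {}

    // This check still has to be functional for correct state transitions.
    fn current_is_appropriate_end_tag_token(&mut self) -> bool {
        match (&self.last_start_tag, &self.current_end_tag) {
            (Some(start), Some(end)) => start == end,
            _ => false,
        }
    }
}
```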
Required Associated Types
type Token
The token type emitted by this emitter. This controls what type of values the crate::Tokenizer yields when used as an iterator.
Required Methods
fn set_last_start_tag(&mut self, last_start_tag: Option<&[u8]>)
Set the name of the last start tag.
This is primarily for testing purposes. This is not supposed to override the tag name of the current tag.
fn emit_eof(&mut self)
The state machine has reached the end of the file. It will soon call pop_token for the last time.
fn emit_error(&mut self, error: Error)
A (probably recoverable) parsing error has occurred.
fn pop_token(&mut self) -> Option<Self::Token>
After every state change, the tokenizer calls this method to retrieve a new token that can be returned via the tokenizer’s iterator interface.
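One straightforward way to support this polling pattern is a FIFO queue that the emit methods push onto and pop_token drains. The following is a minimal sketch under that assumption; `QueueEmitter` and its `Token` enum are hypothetical names, and the methods are plain inherent methods rather than an implementation of the real trait.

```rust
use std::collections::VecDeque;

/// Hypothetical token type; a real emitter would have more variants.
#[derive(Debug, PartialEq)]
pub enum Token {
    Text(Vec<u8>),
    Error(&'static str),
}

/// Hypothetical emitter that buffers finished tokens in a queue.
#[derive(Debug, Default)]
pub struct QueueEmitter {
    queue: VecDeque<Token>,
}

impl QueueEmitter {
    /// Queue a run of plain characters as one token.
    pub fn emit_string(&mut self, c: &[u8]) {
        self.queue.push_back(Token::Text(c.to_vec()));
    }

    /// Queue an error so it is yielded in order with other tokens.
    pub fn emit_error(&mut self, error: &'static str) {
        self.queue.push_back(Token::Error(error));
    }

    /// Hand out the oldest pending token, or None if nothing is ready yet.
    pub fn pop_token(&mut self) -> Option<Token> {
        self.queue.pop_front()
    }
}
```

A queue is used rather than a single slot because one state change may produce more than one token (for example an error followed by text).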
fn emit_string(&mut self, c: &[u8])
Emit a bunch of plain characters as character tokens.
fn init_start_tag(&mut self)
Set the current token to a start tag.
fn init_end_tag(&mut self)
Set the current token to an end tag.
fn init_comment(&mut self)
Set the current token to a comment.
fn emit_current_tag(&mut self) -> Option<State>
Emit the current token, assuming it is a tag.
Also get the current attribute and append it to the to-be-emitted tag. See docstring for
Emitter::init_attribute for how duplicates should be handled.
If a start tag is emitted, update the last start tag.
If the current token is not a start/end tag, this method may panic.
The return value is used to switch the tokenizer to a new state. Used in tree building.
If this method always returns None, states are never switched, which leads to artifacts
like contents of <script> tags being incorrectly interpreted as HTML.
It’s not possible to implement this method correctly in line with the spec without implementing a full-blown tree builder as per tree construction, which this crate does not offer.
You can approximate correct behavior using naive_next_state, but the caveats of doing
so are not well-understood.
See the tokenize_with_state_switches cargo example for a practical example where this
matters.
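To make the state-switching idea concrete, here is a rough sketch of a tag-name-based guess at the next tokenizer state. It is an illustration only and not this crate's naive_next_state: the `State` enum and `guess_next_state` function below are hypothetical local definitions. The mapping follows the element categories in WHATWG's tree construction rules, and it ignores namespaces and the scripting flag (so `noscript` is left out), which is part of why such an approximation is imperfect.

```rust
/// Hypothetical, reduced set of tokenizer states for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    RcData,
    RawText,
    ScriptData,
    PlainText,
}

/// Guess which state the tokenizer should switch to after a start tag with
/// the given (lowercased) name was emitted. `None` means "do not switch".
pub fn guess_next_state(start_tag_name: &[u8]) -> Option<State> {
    match start_tag_name {
        b"title" | b"textarea" => Some(State::RcData),
        b"style" | b"xmp" | b"iframe" | b"noembed" | b"noframes" => Some(State::RawText),
        b"script" => Some(State::ScriptData),
        b"plaintext" => Some(State::PlainText),
        _ => None,
    }
}
```

An emit_current_tag implementation could return the result of such a function whenever the emitted token is a start tag, and None for end tags.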
fn emit_current_comment(&mut self)
Emit the current token, assuming it is a comment.
If the current token is not a comment, this method may panic.
fn emit_current_doctype(&mut self)
Emit the current token, assuming it is a doctype.
If the current token is not a doctype, this method may panic.
fn set_self_closing(&mut self)
Assuming the current token is a start tag, set the self-closing flag.
If the current token is not a start or end tag, this method may panic.
If the current token is an end tag, the emitter should emit the
crate::Error::EndTagWithTrailingSolidus error.
fn set_force_quirks(&mut self)
Assuming the current token is a doctype, set its “force quirks” flag to true.
If the current token is not a doctype, this method may panic.
fn push_tag_name(&mut self, s: &[u8])
Assuming the current token is a start/end tag, append a string to the current tag’s name.
If the current token is not a start or end tag, this method may panic.
fn push_comment(&mut self, s: &[u8])
Assuming the current token is a comment, append a string to the comment’s contents.
If the current token is not a comment, this method may panic.
fn push_doctype_name(&mut self, s: &[u8])
Assuming the current token is a doctype, append a string to the doctype’s name.
If the current token is not a doctype, this method may panic.
fn init_doctype(&mut self)
Set the current token to a new doctype token:
- the name should be empty
- the “public identifier” should be null (different from empty)
- the “system identifier” should be null (different from empty)
- the “force quirks” flag should be false
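The distinction between a null and an empty identifier maps naturally onto `Option` in Rust. The sketch below shows one possible representation; the `Doctype` struct and its methods are hypothetical and only illustrate the initial values listed above.

```rust
/// Hypothetical doctype token. `None` models "null", which is distinct from
/// `Some(vec![])`, an identifier that is present but empty.
#[derive(Debug, Default, PartialEq)]
pub struct Doctype {
    pub name: Vec<u8>,
    pub public_identifier: Option<Vec<u8>>,
    pub system_identifier: Option<Vec<u8>>,
    pub force_quirks: bool,
}

impl Doctype {
    /// Replace the public identifier with the given string.
    pub fn set_public_identifier(&mut self, value: &[u8]) {
        self.public_identifier = Some(value.to_vec());
    }

    /// Append to the public identifier. Assumption of this sketch: if the
    /// identifier is still null, it is created empty first instead of panicking.
    pub fn push_public_identifier(&mut self, s: &[u8]) {
        self.public_identifier
            .get_or_insert_with(Vec::new)
            .extend_from_slice(s);
    }
}
```

With this layout, `Doctype::default()` yields exactly the initial state described in the list: empty name, both identifiers null, and the flag set to false.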
fn init_attribute(&mut self)
Set the current attribute to a new one, starting with empty name and value strings.
The old attribute, if any, should be put on the current token. If an attribute with that
name already exists, WHATWG says the new one should be ignored and a
crate::Error::DuplicateAttribute error should be emitted.
If the current token is an end tag token, a crate::Error::EndTagWithAttributes error should be
emitted.
If the current token is not a tag at all, this method may panic.
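The duplicate-handling rule can be sketched with a map and its entry API. This is a minimal illustration under assumed names: `AttrState` is hypothetical, errors are recorded as plain strings instead of crate::Error values, and the end-tag check is left out.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical slice of emitter state that deals with attributes only.
#[derive(Debug, Default)]
pub struct AttrState {
    pub attributes: BTreeMap<Vec<u8>, Vec<u8>>,
    pub current_attribute: Option<(Vec<u8>, Vec<u8>)>,
    pub errors: Vec<&'static str>,
}

impl AttrState {
    /// Move the old attribute (if any) onto the token. The first occurrence
    /// of a name wins; later duplicates are dropped and reported.
    pub fn flush_current_attribute(&mut self) {
        if let Some((name, value)) = self.current_attribute.take() {
            match self.attributes.entry(name) {
                Entry::Vacant(slot) => {
                    slot.insert(value);
                }
                Entry::Occupied(_) => self.errors.push("duplicate-attribute"),
            }
        }
    }

    /// Start a new attribute with empty name and value.
    pub fn init_attribute(&mut self) {
        self.flush_current_attribute();
        self.current_attribute = Some((Vec::new(), Vec::new()));
    }

    pub fn push_attribute_name(&mut self, s: &[u8]) {
        if let Some((name, _)) = self.current_attribute.as_mut() {
            name.extend_from_slice(s);
        }
    }

    pub fn push_attribute_value(&mut self, s: &[u8]) {
        if let Some((_, value)) = self.current_attribute.as_mut() {
            value.extend_from_slice(s);
        }
    }
}
```

The duplicate check has to wait until the attribute is flushed, because the name is built up incrementally through push_attribute_name and is not known when init_attribute is called. The same flush step would also be needed when the tag itself is emitted.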
fn push_attribute_name(&mut self, s: &[u8])
Append a string to the current attribute’s name.
If there is no current attribute, this method may panic.
fn push_attribute_value(&mut self, s: &[u8])
Append a string to the current attribute’s value.
If there is no current attribute, this method may panic.
fn set_doctype_public_identifier(&mut self, value: &[u8])
Assuming the current token is a doctype, set its “public identifier” to the given string.
If the current token is not a doctype, this method may panic.
fn set_doctype_system_identifier(&mut self, value: &[u8])
Assuming the current token is a doctype, set its “system identifier” to the given string.
If the current token is not a doctype, this method may panic.
fn push_doctype_public_identifier(&mut self, s: &[u8])
Assuming the current token is a doctype, append the given string to its “public identifier”.
If the current token is not a doctype, this method may panic.
fn push_doctype_system_identifier(&mut self, s: &[u8])
Assuming the current token is a doctype, append the given string to its “system identifier”.
If the current token is not a doctype, this method may panic.
fn current_is_appropriate_end_tag_token(&mut self) -> bool
Return true if all of these hold. Return false otherwise.
- the current token is an end tag
- the last start tag exists
- the current end tag token’s name equals the last start tag’s name.
See also WHATWG’s definition of “appropriate end tag token”.
Provided Methods
fn adjusted_current_node_present_but_not_in_html_namespace(&mut self) -> bool
By default, this always returns false and thus all CDATA sections are tokenized as bogus comments.