High-performance, zero-copy, streaming XML syntax reader.
This crate tokenizes well-formed XML into fine-grained events (start tags,
attributes, text, comments, etc.) delivered through a Visitor trait.
It does not check that element or attribute names are legal XML names, build a tree, resolve namespaces,
or expand entity references.
§Quick start
Implement Visitor to receive events, then feed input to a Reader:
use xml_syntax_reader::{Reader, Visitor, Span};

struct Print;

impl Visitor for Print {
    type Error = std::convert::Infallible;

    fn start_tag_open(&mut self, name: &[u8], _: Span) -> Result<(), Self::Error> {
        println!("element: {}", String::from_utf8_lossy(name));
        Ok(())
    }
}

let mut reader = Reader::new();
reader.parse_slice(b"<hello/>", &mut Print).unwrap();

For streaming use, call Reader::parse in a loop: it returns the
number of bytes consumed so the caller can shift the buffer and append
more data. parse_read wraps this loop for std::io::Read sources.
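The shift-and-append pattern can be sketched without the crate itself. In the sketch below, `consume_records` and `drive` are hypothetical names: `consume_records` is only a stand-in for a parse call that reports how many bytes it consumed, and it splits on newlines rather than parsing XML. The actual signature of Reader::parse is not shown on this page, so check its own documentation before adapting this.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for an incremental parser: consumes every complete
/// `\n`-terminated record in `buf` and returns the number of bytes consumed.
fn consume_records(buf: &[u8], out: &mut Vec<String>) -> usize {
    let mut consumed = 0;
    while let Some(pos) = buf[consumed..].iter().position(|&b| b == b'\n') {
        let record = &buf[consumed..consumed + pos];
        out.push(String::from_utf8_lossy(record).into_owned());
        consumed += pos + 1; // skip the terminator as well
    }
    consumed
}

/// Drives the stand-in parser over any `Read` source, reading `chunk` bytes
/// at a time and keeping only the unconsumed tail between reads.
fn drive<R: Read>(mut src: R, chunk: usize) -> io::Result<Vec<String>> {
    let mut out = Vec::new();
    let mut buf: Vec<u8> = Vec::new();
    let mut scratch = vec![0u8; chunk];
    loop {
        let n = src.read(&mut scratch)?;
        if n == 0 {
            break; // end of input
        }
        buf.extend_from_slice(&scratch[..n]); // append new data
        let consumed = consume_records(&buf, &mut out);
        buf.drain(..consumed); // shift the unconsumed tail to the front
    }
    if !buf.is_empty() {
        // Trailing bytes without a terminator form the final record.
        out.push(String::from_utf8_lossy(&buf).into_owned());
    }
    Ok(out)
}
```

The key design point is that the parse step never needs to see the whole input: anything it could not finish stays at the front of the buffer and is retried once more data has been appended.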
§Encoding
The parser operates on bytes and assumes UTF-8 input. Use
probe_encoding to detect the transport encoding (BOM / XML
declaration) and transcode if necessary before parsing.
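As an illustration of the "detect, then transcode" step, here is a minimal standalone sketch that recognises a byte-order mark and converts UTF-16 input to UTF-8. `Bom`, `sniff_bom`, and `to_utf8` are hypothetical names, not part of this crate; the sketch does not call probe_encoding, ignores the XML declaration, and does not handle UTF-32 or legacy encodings.

```rust
/// Byte-order marks this sketch can recognise.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Bom {
    Utf8,
    Utf16Le,
    Utf16Be,
    Absent,
}

/// Returns the detected BOM and the number of bytes it occupies.
fn sniff_bom(input: &[u8]) -> (Bom, usize) {
    if input.starts_with(&[0xEF, 0xBB, 0xBF]) {
        (Bom::Utf8, 3)
    } else if input.starts_with(&[0xFF, 0xFE]) {
        (Bom::Utf16Le, 2)
    } else if input.starts_with(&[0xFE, 0xFF]) {
        (Bom::Utf16Be, 2)
    } else {
        (Bom::Absent, 0)
    }
}

/// Strips the BOM and transcodes to UTF-8. Input without a BOM is assumed
/// to be UTF-8 already. Returns `None` if the bytes are not valid.
fn to_utf8(input: &[u8]) -> Option<String> {
    let (bom, skip) = sniff_bom(input);
    let body = &input[skip..];
    match bom {
        Bom::Utf8 | Bom::Absent => String::from_utf8(body.to_vec()).ok(),
        Bom::Utf16Le | Bom::Utf16Be => {
            if body.len() % 2 != 0 {
                return None; // truncated code unit
            }
            let little = bom == Bom::Utf16Le;
            let units: Vec<u16> = body
                .chunks_exact(2)
                .map(|p| {
                    let pair = [p[0], p[1]];
                    if little {
                        u16::from_le_bytes(pair)
                    } else {
                        u16::from_be_bytes(pair)
                    }
                })
                .collect();
            String::from_utf16(&units).ok()
        }
    }
}
```

The resulting UTF-8 bytes are what would then be handed to the parser.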
§Input Limits
The parser enforces hardcoded limits to prevent resource exhaustion:
- Names (element, attribute, PI target, DOCTYPE, entity references): maximum 1,000 bytes. Exceeding this produces ErrorKind::NameTooLong.
- Character references: maximum 7 bytes for the value between &# and ; (the longest valid reference is &#1114111; or &#x10FFFF;). Exceeding this produces ErrorKind::CharRefTooLong.
- Text content, attribute values, and content bodies (comments, CDATA sections, processing instructions, and DOCTYPE declarations) are all streamed in chunks at buffer boundaries. The visitor receives zero or more content calls with contiguous spans: zero for empty constructs (e.g. <!---->, <?target?>), and more than one when the body spans buffer boundaries. Text content (characters) is additionally interleaved with entity_ref / char_ref callbacks at reference boundaries. Attribute values are chunked at both buffer boundaries and entity/character reference boundaries, which produce separate attribute_entity_ref and attribute_char_ref callbacks. There is no size limit on any of these. See the Visitor trait documentation for the full callback sequences.
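To see why 7 bytes is enough for a character reference, the following standalone sketch decodes the value between &# and ; under the same length cap. `decode_char_ref`, `CharRefError`, and `MAX_CHAR_REF_LEN` are hypothetical names chosen for illustration; they are not this crate's API, and the sketch only checks that the result is a Unicode scalar value, not that it is a legal XML character.

```rust
/// Assumed cap, mirroring the 7-byte limit described above.
const MAX_CHAR_REF_LEN: usize = 7;

#[derive(Debug, PartialEq, Eq)]
enum CharRefError {
    TooLong,
    Invalid,
}

/// Decodes the value between `&#` and `;`, e.g. `b"65"` or `b"x41"`.
fn decode_char_ref(value: &[u8]) -> Result<char, CharRefError> {
    if value.len() > MAX_CHAR_REF_LEN {
        return Err(CharRefError::TooLong);
    }
    let text = std::str::from_utf8(value).map_err(|_| CharRefError::Invalid)?;
    // A leading `x` selects hexadecimal; otherwise the value is decimal.
    let (digits, radix) = match text.strip_prefix('x') {
        Some(hex) => (hex, 16),
        None => (text, 10),
    };
    // Reject empty values and signs, which `from_str_radix` would accept.
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return Err(CharRefError::Invalid);
    }
    let code = u32::from_str_radix(digits, radix).map_err(|_| CharRefError::Invalid)?;
    // Surrogates and values above U+10FFFF are not valid scalar values.
    char::from_u32(code).ok_or(CharRefError::Invalid)
}
```

Both spellings of the highest code point, `1114111` and `x10FFFF`, are exactly 7 bytes, so they pass the length check, while any longer value is rejected before it is parsed.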
Structs§
- DeclaredEncoding: Encoding name extracted from the XML declaration, stored inline to avoid allocation.
- Error: Error from the XML syntax reader.
- ProbeResult: Result of probing the encoding of an XML document.
- Reader: Streaming XML syntax reader.
- Span: Absolute byte range in the input stream. start is inclusive, end is exclusive: [start, end).
Enums§
- Encoding: Encoding detected by probe_encoding().
- ErrorKind
- ParseError: Result of Reader::parse().
- ReadError: Error type for parse_read and parse_read_with_capacity.
Traits§
- Visitor: Trait for receiving fine-grained XML parsing events.
Functions§
- parse_read: Parse XML from a std::io::Read source.
- parse_read_with_capacity: Like parse_read, but with a caller-specified buffer capacity.
- probe_encoding: Probe the encoding of an XML document from its initial bytes.