Crate xml_syntax_reader

High-performance, zero-copy, streaming XML syntax reader.

This crate tokenizes well-formed XML into fine-grained events (start tags, attributes, text, comments, etc.) delivered through a Visitor trait. It does not validate that element or attribute names are legal XML names, build a tree, resolve namespaces, or expand entity references.

§Quick start

Implement Visitor to receive events, then feed input to a Reader:

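use xml_syntax_reader::{Reader, Visitor, Span};

// A visitor that prints the name of every element as its start tag opens.
// Only start_tag_open is overridden here.
struct Print;
impl Visitor for Print {
    // This visitor never fails, so its error type is uninhabited.
    type Error = std::convert::Infallible;
    fn start_tag_open(&mut self, name: &[u8], _: Span) -> Result<(), Self::Error> {
        // Names arrive as raw bytes; convert lossily for display.
        println!("element: {}", String::from_utf8_lossy(name));
        Ok(())
    }
}

// Parse a complete in-memory document in one call.
let mut reader = Reader::new();
reader.parse_slice(b"<hello/>", &mut Print).unwrap();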

For streaming use, call Reader::parse in a loop: it returns the number of bytes consumed, so the caller can shift the unconsumed tail to the front of the buffer and append more data. parse_read wraps this loop for std::io::Read sources.
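The shift-and-append pattern can be sketched without the crate. In the sketch below, consume_records is a hypothetical stand-in for a parser step that, like Reader::parse as described above, reports how many bytes it consumed; it splits newline-terminated records rather than parsing XML. Neither function is part of xml_syntax_reader, and the actual signature of Reader::parse should be taken from its own documentation.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a parser step (NOT the crate's API):
/// consumes every complete newline-terminated record in `buf` and
/// returns the number of bytes consumed. An incomplete trailing
/// record is left in place for the next call.
fn consume_records(buf: &[u8], records: &mut Vec<Vec<u8>>) -> usize {
    let mut consumed = 0;
    while let Some(pos) = buf[consumed..].iter().position(|&b| b == b'\n') {
        records.push(buf[consumed..consumed + pos].to_vec());
        consumed += pos + 1; // skip the newline itself
    }
    consumed
}

/// Drives the stand-in over any `Read` source with a shift-and-append
/// buffer. `chunk` is the read size and must be greater than zero.
fn drive<R: Read>(mut src: R, chunk: usize) -> io::Result<Vec<Vec<u8>>> {
    let mut buf: Vec<u8> = Vec::new();
    let mut scratch = vec![0u8; chunk];
    let mut records = Vec::new();
    loop {
        let n = src.read(&mut scratch)?;
        if n == 0 {
            break; // end of input
        }
        // Append the new data after whatever was left unconsumed.
        buf.extend_from_slice(&scratch[..n]);
        // Let the consumer take what it can, then shift the remainder
        // to the front of the buffer.
        let consumed = consume_records(&buf, &mut records);
        buf.drain(..consumed);
    }
    // Whatever remains at end of input is an unterminated final record.
    if !buf.is_empty() {
        records.push(buf);
    }
    Ok(records)
}
```

Calling drive(&b"ab\ncd\nef"[..], 3) reads the input three bytes at a time and carries the unterminated "ef" across the final read. A real parser would treat leftover bytes at end of input according to its own rules.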

§Encoding

The parser operates on bytes and assumes UTF-8 input. Use probe_encoding to detect the transport encoding (from the BOM or the XML declaration) and transcode if necessary before parsing.
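To illustrate the BOM half of that detection step, here is a stdlib-only sketch. Bom and sniff_bom are hypothetical names introduced for illustration; they are not the crate's probe_encoding, which also inspects the XML declaration and whose signature is not shown on this page.

```rust
/// Byte-order marks recognised by this sketch (hypothetical type,
/// not the crate's `Encoding` enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Bom {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Returns the BOM found at the start of `input`, together with its
/// length in bytes, or `None` if the input does not start with one.
/// UTF-32 BOMs are deliberately not handled in this sketch, so a
/// UTF-32 LE document (FF FE 00 00) would be reported as UTF-16 LE.
fn sniff_bom(input: &[u8]) -> Option<(Bom, usize)> {
    if input.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some((Bom::Utf8, 3))
    } else if input.starts_with(&[0xFF, 0xFE]) {
        Some((Bom::Utf16Le, 2))
    } else if input.starts_with(&[0xFE, 0xFF]) {
        Some((Bom::Utf16Be, 2))
    } else {
        None
    }
}
```

A caller would typically skip the reported number of bytes and, for anything other than UTF-8, transcode the rest to UTF-8 before handing it to the parser.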

§Input Limits

The parser enforces hardcoded limits to prevent resource exhaustion:

  • Names (element, attribute, PI target, DOCTYPE, entity references): maximum 1,000 bytes. Exceeding this produces ErrorKind::NameTooLong.

  • Character references: maximum 7 bytes for the value between &# and ; (the longest valid reference is &#x10FFFF; or &#1114111;). Exceeding this produces ErrorKind::CharRefTooLong.

  • Text content, attribute values, and content bodies (comments, CDATA sections, processing instructions, and DOCTYPE declarations): no size limit. All of these are streamed in chunks at buffer boundaries. The visitor receives zero or more content calls with contiguous spans: zero for empty constructs (e.g. <!---->, <?target?>), and more than one when the body spans buffer boundaries. Text content (characters) is additionally interleaved with entity_ref / char_ref callbacks at reference boundaries. Attribute values are chunked at both buffer boundaries and entity/character reference boundaries; the references produce separate attribute_entity_ref and attribute_char_ref callbacks. See the Visitor trait documentation for the full callback sequences.
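Because a body may arrive as zero, one, or several chunks, a visitor that needs the whole body has to reassemble it. The following is a minimal sketch of that bookkeeping, independent of the crate: Span here is a local stand-in for the crate's Span (the u64 field types are an assumption), and Body is a hypothetical accumulator that a visitor's content callbacks might feed.

```rust
/// Local stand-in for the crate's `Span`: a half-open byte range
/// [start, end). The `u64` field types are an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: u64,
    end: u64,
}

/// Hypothetical accumulator for the chunks of one body, e.g. a comment.
/// If no chunk is ever pushed (an empty construct), `span` stays `None`.
#[derive(Debug, Default)]
struct Body {
    bytes: Vec<u8>,
    span: Option<Span>,
}

impl Body {
    /// Appends one chunk. Returns `false`, leaving the accumulator
    /// unchanged, if the chunk does not start where the previous one ended.
    fn push(&mut self, chunk: &[u8], span: Span) -> bool {
        let current = self.span;
        match current {
            // First chunk: its span becomes the running total.
            None => self.span = Some(span),
            // Contiguous chunk: extend the running total to the new end.
            Some(total) if total.end == span.start => {
                self.span = Some(Span { start: total.start, end: span.end });
            }
            // Gap or overlap: reject.
            Some(_) => return false,
        }
        self.bytes.extend_from_slice(chunk);
        true
    }
}
```

Pushing a chunk covering [4, 6) followed by one covering [6, 9) yields a single body covering [4, 9). Copying bytes is only needed when the whole body must be held at once; a visitor that can process chunks as they arrive keeps the zero-copy property.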

Structs§

DeclaredEncoding
Encoding name extracted from the XML declaration, stored inline to avoid allocation.
Error
Error from the XML syntax reader.
ProbeResult
Result of probing the encoding of an XML document.
Reader
Streaming XML syntax reader.
Span
Absolute byte range in the input stream. start is inclusive, end is exclusive: [start, end).

Enums§

Encoding
Encoding detected by probe_encoding().
ErrorKind
ParseError
Result of Reader::parse().
ReadError
Error type for parse_read and parse_read_with_capacity.

Traits§

Visitor
Trait for receiving fine-grained XML parsing events.

Functions§

parse_read
Parse XML from a std::io::Read source.
parse_read_with_capacity
Like parse_read, but with a caller-specified buffer capacity.
probe_encoding
Probe the encoding of an XML document from its initial bytes.