anyxml 0.2.0

A fully spec-conformant XML library

anyxml

anyxml is a fully spec-conformant XML library.

Features

The current implementation supports the following features:

  • parse XML 1.0 documents
    • DTD parsing
    • Entity reference substitution (both general and parameter entities are supported)
    • Character reference substitution
    • Attribute value normalization
    • Default attribute value handling
  • validate XML 1.0 documents against a DTD
  • handle namespaces conforming to Namespaces in XML 1.0
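
As a rough illustration of what "character reference substitution" means, the sketch below replaces numeric references such as `&#72;` and `&#x69;` with the characters they denote. It uses only the standard library; the function name `substitute_char_refs` is hypothetical and this is not how anyxml implements the feature. It also skips the check against the XML `Char` production that a conforming parser performs.

```rust
/// Replace numeric character references (`&#NNN;` and `&#xHHH;`) with the
/// characters they denote. Returns `None` for a malformed reference.
/// Named entity references such as `&amp;` are copied through unchanged.
/// Hypothetical helper for illustration; not part of anyxml.
fn substitute_char_refs(input: &str) -> Option<String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("&#") {
        // Copy everything before the reference verbatim.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        // A reference must be terminated by ';'.
        let end = after.find(';')?;
        let body = &after[..end];
        // A leading 'x' selects hexadecimal, otherwise decimal.
        let (digits, radix) = match body.strip_prefix('x') {
            Some(hex) => (hex, 16),
            None => (body, 10),
        };
        if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
            return None;
        }
        let code = u32::from_str_radix(digits, radix).ok()?;
        // Rejects surrogates and out-of-range code points.
        out.push(char::from_u32(code)?);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Some(out)
}
```

With this sketch, `substitute_char_refs("&#72;&#x69;")` yields `Some("Hi")`, while an unterminated or non-numeric reference yields `None`.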

Parser

The parser exposes a SAX-like API modeled on the Java SAX API.

The key difference from the Java API is that SAX handlers are provided through just two traits: SAXHandler and EntityResolver.
This approach reduces the need for Rc/Arc or interior mutability.
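
To make the point about Rc/Arc concrete, here is a minimal standard-library sketch of the general pattern: one handler value receives every callback through `&mut self`, so it can accumulate state directly and be read back afterwards. The names `Handler`, `Driver`, and `Counter` are hypothetical stand-ins and do not reflect anyxml's actual trait definitions.

```rust
/// Hypothetical callback trait: every method has a no-op default, so an
/// implementor overrides only the events it cares about.
trait Handler {
    fn start_element(&mut self, _name: &str) {}
    fn end_element(&mut self, _name: &str) {}
    fn characters(&mut self, _data: &str) {}
}

/// Hypothetical driver that owns its handler, so the handler can be
/// inspected once the driver has finished.
struct Driver<H: Handler> {
    handler: H,
}

impl<H: Handler> Driver<H> {
    /// Emits a fixed event sequence in place of real parsing.
    fn run(&mut self) {
        self.handler.start_element("greeting");
        self.handler.characters("Hello!!");
        self.handler.end_element("greeting");
    }
}

/// Application state lives in plain fields; no Rc<RefCell<..>> is needed
/// because all callbacks share the same `&mut self`.
#[derive(Default)]
struct Counter {
    elements: usize,
    text: String,
}

impl Handler for Counter {
    fn start_element(&mut self, _name: &str) {
        self.elements += 1;
    }
    fn characters(&mut self, data: &str) {
        self.text.push_str(data);
    }
}
```

If element and text callbacks lived on separate handler objects, as they can in Java SAX, sharing `elements` and `text` between them would typically require shared ownership; a single trait object sidesteps that.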

Example

use std::fmt::Write as _;

use anyxml::sax::{
    attributes::Attributes,
    handler::{EntityResolver, SAXHandler},
    parser::XMLReaderBuilder,
};

#[derive(Default)]
struct ExampleHandler {
    buffer: String,
}
impl EntityResolver for ExampleHandler {}
impl SAXHandler for ExampleHandler {
    fn start_document(&mut self) {
        writeln!(self.buffer, "start document").ok();
    }
    fn end_document(&mut self) {
        writeln!(self.buffer, "end document").ok();
    }

    fn start_element(
        &mut self,
        _uri: Option<&str>,
        _local_name: Option<&str>,
        qname: &str,
        _atts: &Attributes,
    ) {
        writeln!(self.buffer, "start element {qname}").ok();
    }
    fn end_element(
        &mut self,
        _uri: Option<&str>,
        _local_name: Option<&str>,
        qname: &str
    ) {
        writeln!(self.buffer, "end element {qname}").ok();
    }

    fn characters(&mut self, data: &str) {
        writeln!(self.buffer, "characters '{data}'").ok();
    }
}

let mut reader = XMLReaderBuilder::new()
    .set_handler(ExampleHandler::default())
    .build();
reader.parse_str(r#"<?xml version="1.0"?><greeting>Hello!!</greeting>"#, None).ok();

let handler = reader.handler;
assert_eq!(r#"start document
start element greeting
characters 'Hello!!'
end element greeting
end document
"#, handler.buffer);

Parser (Progressive)

SAX-like parsers usually appear to read all data from a given source at once, but in some cases applications may want to supply data incrementally.
This crate also supports such functionality, which libxml2 calls a "push type parser" or "progressive type parser".

In this crate, this feature is called the "Progressive Parser".
This is because "push" and "pull" are generally used as terms to classify how the parser delivers the parsing results to the application, rather than how the source is provided to the parser.
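
The idea of supplying the source incrementally can be sketched with a toy tokenizer that buffers incomplete input between calls. This is only an illustration under simplifying assumptions (it splits on `<` and `>` and knows nothing else about XML); `ChunkTokenizer` and `feed` are hypothetical names, not anyxml's API or implementation.

```rust
/// Hypothetical chunk-fed tokenizer; not anyxml's implementation.
#[derive(Default)]
struct ChunkTokenizer {
    /// Bytes received so far that do not yet form a complete token.
    pending: Vec<u8>,
}

impl ChunkTokenizer {
    /// Accepts the next chunk and returns every token completed by it.
    /// `finish` marks the final call; an unterminated tag is then an error.
    fn feed(&mut self, chunk: &[u8], finish: bool) -> Result<Vec<String>, String> {
        self.pending.extend_from_slice(chunk);
        let mut tokens = Vec::new();
        loop {
            if self.pending.is_empty() {
                break;
            }
            let (len, kind) = if self.pending[0] == b'<' {
                // A tag is complete only once its closing '>' has arrived.
                match self.pending.iter().position(|&b| b == b'>') {
                    Some(end) => (end + 1, "tag"),
                    None if finish => return Err("unterminated tag".into()),
                    None => break,
                }
            } else {
                // Text is complete once the next '<' arrives or input ends.
                match self.pending.iter().position(|&b| b == b'<') {
                    Some(end) => (end, "text"),
                    None if finish => (self.pending.len(), "text"),
                    None => break,
                }
            };
            let bytes: Vec<u8> = self.pending.drain(..len).collect();
            let text = String::from_utf8(bytes).map_err(|e| e.to_string())?;
            tokens.push(format!("{} {}", kind, text));
        }
        Ok(tokens)
    }
}
```

Because incomplete pieces stay in `pending`, the tokens produced do not depend on where the chunk boundaries fall, which is the property an incrementally fed parser needs.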

When parsing XML documents that reference external resources, note that the application must set an appropriate base URI on the parser before parsing starts.
By default, the base URI is the current directory.

Example

use anyxml::sax::{
    handler::DebugHandler,
    parser::XMLReaderBuilder,
};

let mut reader = XMLReaderBuilder::new()
    .set_handler(DebugHandler::default())
    .progressive_parser()
    .build();
let source = br#"<greeting>Hello!!</greeting>"#;

for chunk in source.chunks(5) {
    reader.parse_chunk(chunk, false).ok();
}
// The final call must set `finish` to `true`.
// An empty chunk is fine for that call, as shown below.
reader.parse_chunk([], true).ok();

let handler = reader.handler;
assert_eq!(r#"setDocumentLocator()
startDocument()
startElement(None, greeting, greeting)
characters(Hello!!)
endElement(None, greeting, greeting)
endDocument()
"#, handler.buffer);

Conformance

This crate conforms to the following specifications:

Tests

This crate passes the following tests: