anyxml
anyxml is a fully spec-conformant XML library.
Features
The current implementation supports the following features:
- parse XML 1.0 documents
  - DTD parsing
  - Entity reference substitution (both general and parameter entities are supported)
  - Character reference substitution
  - Attribute value normalization
  - Default attribute value handling
- validate XML 1.0 documents against a DTD
- handle namespaces conforming to Namespaces in XML 1.0
- build, modify, and serialize XML document trees
- evaluate XPath expressions and look up specific nodes in the document tree
  - only XPath 1.0 is supported in the current implementation
- resolve external identifiers or URIs to alternative URIs using XML Catalogs
Parser
You can use a SAX-like API modeled on the Java SAX API.
The key difference from the Java API is that SAX handlers are provided as just three traits: SAXHandler, EntityResolver, and ErrorHandler.
This approach reduces the need for Rc/Arc or interior mutability.
Example
use Write as _;
use ;
let mut reader = new
.set_handler
.build;
reader.parse_str.ok;
let handler = reader.handler;
assert_eq!;
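The ownership point made above can be illustrated without this crate. The following is a minimal sketch, not anyxml's actual API: `ToyHandler`, `ToyReader`, `Recorder`, and `parse_fixed_greeting` are hypothetical names invented for illustration, and the "parser" only replays a fixed event sequence. It assumes a design in which the reader owns the handler by value and every callback has a default body, so the state collected during parsing can be read back through an ordinary borrow.

```rust
use std::fmt::Write as _;

/// Hypothetical handler trait (not anyxml's). Every callback has a
/// default no-op body, so an implementor overrides only what it needs.
pub trait ToyHandler {
    fn start_element(&mut self, _name: &str) {}
    fn characters(&mut self, _data: &str) {}
    fn end_element(&mut self, _name: &str) {}
}

/// Hypothetical reader that owns its handler by value.
pub struct ToyReader<H: ToyHandler> {
    handler: H,
}

impl<H: ToyHandler> ToyReader<H> {
    pub fn new(handler: H) -> Self {
        Self { handler }
    }

    /// Stand-in for real parsing: replays the events that
    /// `<greeting>Hello!!</greeting>` would produce.
    pub fn parse_fixed_greeting(&mut self) {
        self.handler.start_element("greeting");
        self.handler.characters("Hello!!");
        self.handler.end_element("greeting");
    }

    /// Handler state is read back through a plain shared borrow.
    pub fn handler(&self) -> &H {
        &self.handler
    }
}

/// Example handler that records each event as one line of text.
#[derive(Default)]
pub struct Recorder {
    pub buffer: String,
}

impl ToyHandler for Recorder {
    fn start_element(&mut self, name: &str) {
        writeln!(self.buffer, "startElement({name})").unwrap();
    }
    fn characters(&mut self, data: &str) {
        writeln!(self.buffer, "characters({data})").unwrap();
    }
    fn end_element(&mut self, name: &str) {
        writeln!(self.buffer, "endElement({name})").unwrap();
    }
}
```

Because the handler lives inside the reader and is mutated through `&mut self`, the sketch needs no `Rc<RefCell<...>>` to share the buffer between the callbacks and the caller.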
Parser (Progressive)
SAX-like parsers appear to read all data from a source at once, but some applications need to supply data incrementally.
This crate also supports that mode of operation, which libxml2 calls a "Push type parser" or "Progressive type parser".
In this crate, this feature is called the "Progressive Parser".
This is because "push" and "pull" usually classify how a parser delivers its results to the application, not how the source is supplied to the parser.
When a document refers to external resources, the application must set an appropriate base URI on the parser before parsing starts.
By default, the current directory is set as the base URI.
Example
use ;
let mut reader = new
.set_handler
.progressive_parser
.build;
let source = br#"<greeting>Hello!!</greeting>"#;
for chunk in source.chunks
// Note that the last chunk must set `finish` to `true`.
// As shown below, the last chunk may be empty.
reader.parse_chunk.ok;
let handler = reader.handler;
assert_eq!;
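The idea of feeding input in chunks and marking the last one can be sketched on its own. The following is a toy under stated assumptions, not anyxml's implementation: `ToyProgressiveParser` and its fields are hypothetical, it only recognizes `<...>` tags, discards character data, and does not handle `>` inside attribute values. It shows why the parser must buffer a partial tag across chunk boundaries and why it needs an explicit `finish` signal to detect truncated input.

```rust
/// Hypothetical chunk-fed scanner that records start-tag names.
#[derive(Default)]
pub struct ToyProgressiveParser {
    /// Bytes received so far that do not yet form a complete tag.
    pending: Vec<u8>,
    /// Names of the start tags seen so far.
    pub start_tags: Vec<String>,
    finished: bool,
}

impl ToyProgressiveParser {
    /// Feeds one chunk. `finish` must be `true` on the last call;
    /// that last chunk may be empty.
    pub fn parse_chunk(&mut self, chunk: &[u8], finish: bool) -> Result<(), String> {
        if self.finished {
            return Err("input already finished".to_string());
        }
        self.pending.extend_from_slice(chunk);

        // Process every complete `<...>` tag currently buffered.
        loop {
            let open = self.pending.iter().position(|&b| b == b'<');
            let Some(open) = open else {
                // No markup left; this toy discards character data.
                self.pending.clear();
                break;
            };
            let len = self.pending[open..].iter().position(|&b| b == b'>');
            let Some(len) = len else {
                // Partial tag: drop the text before it and wait for more input.
                self.pending.drain(..open);
                break;
            };
            let close = open + len;
            let tag = String::from_utf8_lossy(&self.pending[open + 1..close]).into_owned();
            let is_start = !tag.starts_with(|c| c == '/' || c == '?' || c == '!');
            if is_start {
                // The name ends at the first whitespace or '/'.
                let name = tag
                    .split(|c: char| c.is_whitespace() || c == '/')
                    .next()
                    .unwrap_or("");
                self.start_tags.push(name.to_string());
            }
            self.pending.drain(..=close);
        }

        if finish {
            self.finished = true;
            if !self.pending.is_empty() {
                return Err("unterminated tag at end of input".to_string());
            }
        }
        Ok(())
    }
}
```

Splitting `<greeting>Hello!!</greeting>` into 5-byte chunks cuts both tags in the middle; the buffered bytes let the toy reassemble them, and a final empty chunk with `finish` set to `true` confirms that nothing was left incomplete.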
Parser (StAX)
This crate also supports a StAX (Streaming API for XML) style parser.
Unlike a SAX parser, where the application cannot control when events are reported, a StAX parser lets the application retrieve events whenever it chooses.
A StAX parser does not require event handlers, but applications can configure a user-defined EntityResolver and ErrorHandler.
To capture every error other than an unrecoverable fatal error, an ErrorHandler must be configured. Without one, only the last error can be retrieved.
Example
use ;
let mut reader = default;
reader
.parse_str
.unwrap;
assert!;
assert!;
assert!;
assert!;
assert!;
assert!;
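The difference in control flow can be shown with a small pull-style reader. This is a minimal sketch, not anyxml's StAX API: `ToyEvent`, `ToyEventReader`, and `next_event` are hypothetical names, and the reader understands only `<name>`, `</name>`, and plain text (no attributes, entities, comments, or well-formedness checks). The point is that nothing happens until the application asks for the next event.

```rust
/// Hypothetical event type for the toy pull reader.
#[derive(Debug, PartialEq)]
pub enum ToyEvent<'a> {
    StartElement(&'a str),
    EndElement(&'a str),
    Characters(&'a str),
    EndDocument,
}

/// Hypothetical pull reader over an in-memory string.
pub struct ToyEventReader<'a> {
    /// The part of the input that has not been consumed yet.
    rest: &'a str,
}

impl<'a> ToyEventReader<'a> {
    pub fn new(source: &'a str) -> Self {
        Self { rest: source }
    }

    /// Consumes just enough input to produce one event.
    /// Once the input is exhausted, every call returns `EndDocument`.
    pub fn next_event(&mut self) -> Result<ToyEvent<'a>, String> {
        let rest: &'a str = self.rest;
        if rest.is_empty() {
            return Ok(ToyEvent::EndDocument);
        }
        if let Some(after_lt) = rest.strip_prefix('<') {
            // A tag runs up to the next '>'.
            let end = after_lt
                .find('>')
                .ok_or_else(|| "unterminated tag".to_string())?;
            let tag = &after_lt[..end];
            self.rest = &after_lt[end + 1..];
            Ok(match tag.strip_prefix('/') {
                Some(name) => ToyEvent::EndElement(name),
                None => ToyEvent::StartElement(tag),
            })
        } else {
            // Character data runs up to the next '<' or the end of input.
            let end = rest.find('<').unwrap_or(rest.len());
            let (text, remaining) = rest.split_at(end);
            self.rest = remaining;
            Ok(ToyEvent::Characters(text))
        }
    }
}
```

An application can call `next_event` in a loop, stop early, or interleave calls with other work, which is the kind of timing control a callback-driven SAX parser does not offer.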
Tree Manipulation
This API represents the entire XML document as a tree and provides methods for manipulating it.
To construct the tree from an XML document, you can use a handler for the SAX parser.
Even without a document, you can build the tree by creating and editing nodes, starting from the Document node.
This API assumes namespace support, so it may reject prefixed names for which no namespace name is specified.
Example
use ;
let mut reader = new
.set_handler
.build;
reader
.parse_str
.unwrap;
// If a fatal error occurs, the constructed tree is meaningless.
assert!;
let document = reader.handler.document;
let mut root = document.first_child.unwrap.as_element.unwrap;
assert_eq!;
let text = root.first_child.unwrap.as_text.unwrap;
assert_eq!;
// modify the document tree
root.append_child.unwrap;
// serialize the document tree
assert_eq!;
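The build, modify, and serialize cycle described in this section can be sketched with a much simpler data structure. The following is a toy under stated assumptions, not anyxml's tree API: `ToyNode` and its methods are hypothetical, the tree is a plain owned enum with no parent links, and namespaces and attributes are left out entirely.

```rust
/// Hypothetical owned tree node: an element with children, or text.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyNode {
    Element { name: String, children: Vec<ToyNode> },
    Text(String),
}

impl ToyNode {
    pub fn element(name: &str) -> Self {
        ToyNode::Element { name: name.to_string(), children: Vec::new() }
    }

    pub fn text(data: &str) -> Self {
        ToyNode::Text(data.to_string())
    }

    /// Appends a child; only elements may have children.
    pub fn append_child(&mut self, child: ToyNode) -> Result<(), String> {
        match self {
            ToyNode::Element { children, .. } => {
                children.push(child);
                Ok(())
            }
            ToyNode::Text(_) => Err("a text node cannot have children".to_string()),
        }
    }

    /// Serializes the subtree rooted at this node.
    pub fn serialize(&self) -> String {
        let mut out = String::new();
        self.write_to(&mut out);
        out
    }

    fn write_to(&self, out: &mut String) {
        match self {
            // Childless elements use the empty-element form.
            ToyNode::Element { name, children } if children.is_empty() => {
                out.push('<');
                out.push_str(name);
                out.push_str("/>");
            }
            ToyNode::Element { name, children } => {
                out.push('<');
                out.push_str(name);
                out.push('>');
                for child in children {
                    child.write_to(out);
                }
                out.push_str("</");
                out.push_str(name);
                out.push('>');
            }
            // Escape the characters that cannot appear literally in text.
            ToyNode::Text(data) => {
                for c in data.chars() {
                    match c {
                        '&' => out.push_str("&amp;"),
                        '<' => out.push_str("&lt;"),
                        '>' => out.push_str("&gt;"),
                        _ => out.push(c),
                    }
                }
            }
        }
    }
}
```

The error returned by `append_child` on a text node mirrors, in miniature, why tree-editing methods return a `Result`: some edits would produce a structure that is not a valid document tree.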
XPath execution
This crate supports XPath, enabling the search for specific nodes within the document tree.
In the current implementation, only XPath 1.0 is available; features not explicitly defined in the XPath 1.0 specification (such as functions defined in the XSLT or XPointer specifications) cannot be used.
In the following example, the evaluate_str function compiles the XPath, parses the document, and evaluates the XPath all at once.
If you use the same XPath repeatedly, you can use the compile function to obtain a precompiled XPath expression.
Example
use evaluate_str;
const DOCUMENT: &str = r#"<root>
<greeting xml:lang='en'>Hello</greeting>
<greeting xml:lang='ja'>こんにちは</greeting>
<greeting xml:lang='zh'>你好</greeting>
</root>
"#;
const XPATH: &str = "//greeting[lang('ja')]/text()";
let text = evaluate_str
.unwrap
.as_string
.unwrap;
assert_eq!;
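The `lang('ja')` predicate in the example relies on the matching rule from the XPath 1.0 specification: the language of the context node, taken from the nearest `xml:lang` attribute on the node or its ancestors, matches if it equals the argument ignoring case, or if it equals the argument followed by a suffix that starts with `-`. The following is a minimal sketch of that comparison only. `lang_matches` is a hypothetical helper, not a function of this crate, and it assumes ASCII case folding is sufficient for language tags.

```rust
/// Hypothetical helper: does the `xml:lang` value `xml_lang` satisfy
/// `lang(wanted)` under the XPath 1.0 matching rule?
pub fn lang_matches(xml_lang: &str, wanted: &str) -> bool {
    // Comparison ignores case; language tags are assumed to be ASCII.
    let value = xml_lang.to_ascii_lowercase();
    let wanted = wanted.to_ascii_lowercase();
    match value.strip_prefix(wanted.as_str()) {
        // Either an exact match or a match followed by a "-..." suffix.
        Some(suffix) => suffix.is_empty() || suffix.starts_with('-'),
        None => false,
    }
}
```

Under this rule, `lang('ja')` selects an element marked `ja` or `ja-JP`, but not one marked `en`, and not one whose tag merely begins with the same letters.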
Conformance
This crate conforms to the following specifications:
- Extensible Markup Language (XML) 1.0 (Fifth Edition)
- Namespaces in XML 1.0 (Third Edition)
- XML Base (Second Edition)
- xml:id Version 1.0
- XML Path Language (XPath) Version 1.0
- XML Catalogs (OASIS Standard V1.1, 7 October 2005)
Tests
This crate passes the following tests:
- XML Conformance Test Suites
- xml:id Conformance Test Suites
- OASIS XSLT Test Suites (for XPath)
  - The original link to this suite is broken; it can only be reached through the Wayback Machine.
- additional tests written for this crate