Crate elementtree

A simple library for parsing an XML file into an in-memory tree structure.

Not recommended for large XML files, as it will load the entire file into memory.

Installation

[dependencies]
elementtree = "0"

Reading

To read XML, use the Element::from_reader method, which parses from a given reader. The result is an element tree that can be queried in several ways, for example with find and find_all.

You can use ("ns", "tag") or {ns}tag to refer to fully qualified elements.

use elementtree::Element;

let root = Element::from_reader(r#"<?xml version="1.0"?>
<root xmlns="tag:myns" xmlns:foo="tag:otherns">
    <list a="1" b="2" c="3">
        <item foo:attr="foo1"/>
        <item foo:attr="foo2"/>
        <item foo:attr="foo3"/>
    </list>
</root>
"#.as_bytes()).unwrap();
let list = root.find("{tag:myns}list").unwrap();
for child in list.find_all("{tag:myns}item") {
    println!("attribute: {}", child.get_attr("{tag:otherns}attr").unwrap());
}
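The {ns}tag notation used above can be illustrated with a small, self-contained sketch. It is not the crate's implementation: split_qname and same_name are hypothetical helpers, written only to show how a braced name breaks down into a namespace and a local name, and why it names the same element as the ("ns", "tag") pair form.

```rust
/// Splits a name written in `{ns}tag` notation into namespace and local part.
/// Hypothetical helper for illustration; not part of the elementtree API.
fn split_qname(name: &str) -> (Option<&str>, &str) {
    if let Some(rest) = name.strip_prefix('{') {
        if let Some(end) = rest.find('}') {
            // Everything before the closing brace is the namespace,
            // everything after it is the local name.
            return (Some(&rest[..end]), &rest[end + 1..]);
        }
    }
    // No braces: the name has no namespace.
    (None, name)
}

/// Returns true when the braced spelling and the pair spelling
/// refer to the same qualified name.
fn same_name(braced: &str, pair: (&str, &str)) -> bool {
    split_qname(braced) == (Some(pair.0), pair.1)
}
```

Under these assumptions, "{tag:myns}list" and ("tag:myns", "list") compare equal, while a bare "list" carries no namespace at all.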

Writing

Writing is just as easy, but if you work with namespaces you need to register them with the root. A namespace that has not been registered by the time it is first used is otherwise registered on the element itself, first with an empty prefix and, once the empty prefix is taken, with a random one, which inflates the size of the XML.

Most modification methods support chaining in one form or another, which makes modifications slightly more ergonomic.

use elementtree::Element;

let ns = "http://example.invalid/#myns";
let other_ns = "http://example.invalid/#otherns";

let mut root = Element::new((ns, "mydoc"));
root.set_namespace_prefix(other_ns, "other");

{
    let list = root.append_new_child((ns, "list"));
    for x in 0..3 {
        list.append_new_child((ns, "item"))
            .set_text(format!("Item {}", x))
            .set_attr((other_ns, "id"), x.to_string());
    }
}

Design Notes

This library largely follows the ideas of Python's ElementTree, but it makes some changes that simplify the model for Rust. In particular, nodes do not know about their parents or siblings. This rules out operations that need upward or sideways navigation, but it significantly simplifies memory management and the external API.
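To make the trade-off concrete, here is a minimal sketch of an owned tree without parent pointers. The Node type and the node and find_parent helpers are hypothetical and do not mirror the crate's internals. Because each node owns its children and holds no back-reference, finding a parent means searching down from the root.

```rust
/// A toy tree node that owns its children and has no parent pointer.
/// Hypothetical type for illustration only.
struct Node {
    tag: String,
    children: Vec<Node>,
}

/// Convenience constructor for the sketch.
fn node(tag: &str, children: Vec<Node>) -> Node {
    Node { tag: tag.to_string(), children }
}

/// Finds the parent of `target` by walking down from `root`.
/// Nodes are compared by address, not by content.
fn find_parent<'a>(root: &'a Node, target: &Node) -> Option<&'a Node> {
    for child in &root.children {
        if std::ptr::eq(child, target) {
            return Some(root);
        }
        if let Some(parent) = find_parent(child, target) {
            return Some(parent);
        }
    }
    None
}
```

The search is linear in the size of the tree, which is the price paid for plain ownership with no reference counting or weak pointers.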

If you are coming from a DOM environment the following differences are the most striking:

  • There are no text nodes. Instead, text is stored either in the text attribute of a node or in the tail of a child node. In most situations working with the text alone is what you want, and you can ignore the existence of the tail.
  • Tags and attributes are implemented through a QName abstraction that simplifies working with namespaces. Most APIs just accept strings and will create QNames automatically.
  • Namespace prefixes never play a role and are in fact not really exposed. Instead, all namespaces are managed through their canonical identifier.
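The text and tail model from the first point can be sketched with a toy type. This Node struct and the functions around it are hypothetical and independent of the crate. The sketch shows where the character data of <p>Hello <b>big</b> world</p> ends up: "Hello " is the text of p, "big" is the text of b, and " world" is the tail of b.

```rust
/// Toy node following the text/tail model. Hypothetical type, not the crate's Element.
struct Node {
    tag: String,
    text: String, // character data before the first child
    tail: String, // character data after this node's closing tag
    children: Vec<Node>,
}

impl Node {
    fn new(tag: &str, text: &str, tail: &str) -> Node {
        Node {
            tag: tag.to_string(),
            text: text.to_string(),
            tail: tail.to_string(),
            children: Vec::new(),
        }
    }
}

/// Serializes a node (without escaping, to keep the sketch short).
fn serialize(node: &Node) -> String {
    let mut out = format!("<{}>{}", node.tag, node.text);
    for child in &node.children {
        out.push_str(&serialize(child));
        // The tail belongs to the child but is emitted inside the parent.
        out.push_str(&child.tail);
    }
    out.push_str(&format!("</{}>", node.tag));
    out
}

/// Collects all character data below a node in document order.
fn text_content(node: &Node) -> String {
    let mut out = node.text.clone();
    for child in &node.children {
        out.push_str(&text_content(child));
        out.push_str(&child.tail);
    }
    out
}

/// Builds the tree for <p>Hello <b>big</b> world</p>.
fn demo_paragraph() -> Node {
    let mut p = Node::new("p", "Hello ", "");
    p.children.push(Node::new("b", "big", " world"));
    p
}
```

For data-oriented XML, where elements contain either text or children but not both, every tail is empty or whitespace, which is why it can usually be ignored.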

Notes on Namespaces

Namespaces are tracked internally in a shared map attached to elements. The map is not exposed, but when an element is created another element can be passed in, and its namespace map is copied over. Internally a copy-on-write mechanism is used: the map is only actually copied when an element changes its namespaces, and the writer emits the resulting declarations accordingly.
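A copy-on-write map of this kind can be sketched with the standard library alone. The NsMap type below is hypothetical and makes no claim about how the crate stores its data. It assumes a reference-counted BTreeMap from namespace URI to prefix and relies on Rc::make_mut to clone the map only when it is shared.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

/// Hypothetical copy-on-write namespace map (URI -> prefix).
#[derive(Clone)]
struct NsMap {
    prefixes: Rc<BTreeMap<String, String>>,
}

impl NsMap {
    fn new() -> NsMap {
        NsMap { prefixes: Rc::new(BTreeMap::new()) }
    }

    /// Registers a prefix. Rc::make_mut clones the underlying map
    /// first if another NsMap still shares it.
    fn register(&mut self, uri: &str, prefix: &str) {
        Rc::make_mut(&mut self.prefixes).insert(uri.to_string(), prefix.to_string());
    }

    /// Looks up the prefix registered for a namespace URI.
    fn prefix_for(&self, uri: &str) -> Option<&str> {
        self.prefixes.get(uri).map(|p| p.as_str())
    }

    /// True while both handles point at the same underlying map.
    fn shares_storage_with(&self, other: &NsMap) -> bool {
        Rc::ptr_eq(&self.prefixes, &other.prefixes)
    }
}
```

Cloning an NsMap is cheap because it only bumps a reference count. The first register call on a shared clone pays for the copy, and the original map is left untouched.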

Namespaces need to be registered or the XML generated will be malformed.

Structs

Attrs

An iterator over attributes of an element.

Children

An iterator over children of an element.

Element

Represents an XML element.

FindChildren

An iterator over matching children.

Position

Represents a position in the source.

QName

A QName represents a qualified name.

Enums

Error

Errors that can occur while parsing XML.

Traits

AsQName

Convenience trait to get a QName from an object.