uppsala 0.4.0

A pure Rust XML parser, DOM, namespace, XPath, and XSD validation library

Uppsala

A zero-dependency pure Rust XML library.

Uppsala implements the core XML stack from parsing through schema validation, with no external crates -- not even in dev-dependencies. Everything is built from scratch: the parser, the DOM, the XPath engine, the XSD validator, and even the regex engine used for XSD pattern facets.

Features

  • XML 1.0 (Fifth Edition) parsing and well-formedness checking
  • Namespaces in XML 1.0 (Third Edition) with prefix resolution and scoping
  • Arena-based DOM with tree mutation (insert, remove, replace)
  • XPath 1.0 evaluation (all axes, functions, predicates, operators)
  • XSD 1.1 validation (structures + datatypes, 40+ built-in types)
  • XSD regex engine (custom NFA matcher for pattern facets)
  • SIMD-accelerated parsing (SSE2 on x86_64, scalar fallback elsewhere)
  • Serialization with round-trip fidelity, pretty-printing, and streaming output
  • XmlWriter for imperative XML construction without a DOM
  • UTF-16 auto-detection (LE/BE with or without BOM)
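The UTF-16 auto-detection item can be illustrated with a small sketch. This is not Uppsala's code: `Encoding`, `sniff_encoding`, and `decode_utf16` are hypothetical names, and the byte patterns follow the detection heuristics from Appendix F of the XML 1.0 specification (a BOM if present, otherwise the shape of a leading `<?`). UTF-32 and other encodings are ignored here.

```rust
/// Encoding guessed from the first bytes of an XML document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Hypothetical helper (not Uppsala's API): guess the encoding from a BOM or,
/// failing that, from the byte pattern of a leading `<?`.
pub fn sniff_encoding(bytes: &[u8]) -> Encoding {
    match bytes {
        // Byte order marks.
        [0xFE, 0xFF, ..] => Encoding::Utf16Be,
        [0xFF, 0xFE, ..] => Encoding::Utf16Le,
        // No BOM: "<?" in UTF-16 has a zero byte next to each ASCII byte.
        [0x00, 0x3C, 0x00, 0x3F, ..] => Encoding::Utf16Be,
        [0x3C, 0x00, 0x3F, 0x00, ..] => Encoding::Utf16Le,
        // Everything else is treated as UTF-8 in this sketch.
        _ => Encoding::Utf8,
    }
}

/// Decode UTF-16 bytes (with any BOM already skipped) into a `String`.
/// Returns `None` on an odd byte count or an unpaired surrogate.
pub fn decode_utf16(bytes: &[u8], little_endian: bool) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    let units = bytes.chunks_exact(2).map(|pair| {
        if little_endian {
            u16::from_le_bytes([pair[0], pair[1]])
        } else {
            u16::from_be_bytes([pair[0], pair[1]])
        }
    });
    char::decode_utf16(units).collect::<Result<String, _>>().ok()
}
```

A caller would sniff the first bytes, skip the two-byte BOM when one was found, and hand the decoded string to the parser.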

Conformance

Uppsala is tested against the W3C conformance suites:

Suite                           Pass Rate   Tests
W3C XML Conformance (not-wf)    100%        631/631
W3C XML Conformance (valid)     100%        531/531
W3C XML Conformance (invalid)   100%        46/46
W3C XSD -- NIST Datatypes       100%        19,217/19,217
W3C XSD -- Sun Combined         100%        199/199
W3C XSD -- MS DataTypes         100%        1,212/1,212

In addition, there are 274 hand-crafted tests covering XML parsing, namespaces, XPath evaluation, XSD validation, serialization round-trips, and source ranges.

# Run all tests
cargo test

# Run W3C XML Conformance Suite (~1208 tests)
cargo test --test w3c_xmlconf

# Run W3C XML Schema Test Suite (~20156 tests)
cargo test --test w3c_xsts -- --nocapture

Performance

We still need someone to run a full benchmark in a proper environment. The numbers below were measured in an Ubuntu 24.04 VM, so treat them as indicative.

Uppsala uses SSE2 SIMD intrinsics on x86_64 to scan text content and attribute values 16 bytes at a time, with a scalar fallback for other architectures. Combined with lookup-table optimizations and zero-copy parsing, this made it faster than roxmltree on every file tested:

File             Size      vs roxmltree
gigantic.svg     1.3 MB    5.3x faster
text.xml         126 KB    9.3x faster
attributes.xml   265 KB    2.0x faster
medium.svg       152 KB    1.4x faster
huge.xml         815 KB    1.2x faster
SAML files       3-11 KB   1.5-1.8x faster

Text-heavy documents benefit most from SIMD -- long runs of plain text between markup are scanned with minimal per-byte overhead.
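To make the 16-bytes-at-a-time idea concrete, here is a minimal sketch of an SSE2 delimiter scan with a scalar fallback. It is an illustration, not the code in simd.rs: the function names are hypothetical, and the choice of `<` and `&` as the delimiters that end a run of text content is an assumption.

```rust
/// Scalar scan: index of the first `<` or `&` in `bytes`, if any.
pub fn find_delim_scalar(bytes: &[u8]) -> Option<usize> {
    bytes.iter().position(|&b| b == b'<' || b == b'&')
}

/// SSE2 scan: compares 16 bytes per iteration, then finishes the
/// remaining tail (fewer than 16 bytes) with the scalar loop.
#[cfg(target_arch = "x86_64")]
pub fn find_delim(bytes: &[u8]) -> Option<usize> {
    use std::arch::x86_64::{
        __m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8, _mm_or_si128,
        _mm_set1_epi8,
    };

    let mut offset = 0;
    // SAFETY: SSE2 is part of the x86_64 baseline, and the loop condition
    // guarantees that each unaligned 16-byte load stays inside `bytes`.
    unsafe {
        let lt = _mm_set1_epi8(b'<' as i8);
        let amp = _mm_set1_epi8(b'&' as i8);
        while offset + 16 <= bytes.len() {
            let chunk = _mm_loadu_si128(bytes.as_ptr().add(offset) as *const __m128i);
            // Each lane becomes 0xFF where the byte equals `<` or `&`.
            let hits = _mm_or_si128(_mm_cmpeq_epi8(chunk, lt), _mm_cmpeq_epi8(chunk, amp));
            // One bit per lane; the lowest set bit is the first hit.
            let mask = _mm_movemask_epi8(hits);
            if mask != 0 {
                return Some(offset + mask.trailing_zeros() as usize);
            }
            offset += 16;
        }
    }
    find_delim_scalar(&bytes[offset..]).map(|i| offset + i)
}

/// Fallback for every other architecture.
#[cfg(not(target_arch = "x86_64"))]
pub fn find_delim(bytes: &[u8]) -> Option<usize> {
    find_delim_scalar(bytes)
}
```

On a long run of plain text the loop does two comparisons and one mask test per 16 bytes, which is where the advantage on text-heavy files would come from.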

Is this really fast? Maybe, maybe not. But it is good enough for my use cases right now.

Usage

Add to your Cargo.toml:

[dependencies]
uppsala = "0.4"

Parse and query

use uppsala::{parse, XPathEvaluator};
use uppsala::xpath::XPathValue;

let xml = r#"
<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price>10.99</price>
  </book>
  <book category="non-fiction">
    <title>Sapiens</title>
    <author>Yuval Noah Harari</author>
    <price>14.99</price>
  </book>
</bookstore>
"#;

let mut doc = parse(xml).unwrap();

// DOM traversal
let titles = doc.get_elements_by_tag_name("title");
for id in &titles {
    println!("{}", doc.text_content_deep(*id));
}

// XPath queries
doc.prepare_xpath();
let eval = XPathEvaluator::new();
let root = doc.root();
if let Ok(XPathValue::NodeSet(nodes)) =
    eval.evaluate(&doc, root, "//book[@category='fiction']/title")
{
    for id in &nodes {
        println!("Fiction: {}", doc.text_content_deep(*id));
    }
}

Validate against an XSD schema

use uppsala::{parse, XsdValidator};

let schema_xml = r#"
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="temperature" type="xs:decimal"/>
</xs:schema>
"#;

let instance_xml = "<temperature>36.6</temperature>";

let schema_doc = parse(schema_xml).unwrap();
let instance_doc = parse(instance_xml).unwrap();
let validator = XsdValidator::from_schema(&schema_doc).unwrap();
let errors = validator.validate(&instance_doc);

if errors.is_empty() {
    println!("Valid!");
} else {
    for e in &errors {
        println!("Validation error: {}", e);
    }
}

Build XML with XmlWriter

use uppsala::XmlWriter;

let mut w = XmlWriter::new();
w.write_declaration();
w.start_element("catalog", &[("xmlns", "urn:example:catalog")]);
w.start_element("item", &[("id", "1")]);
w.text("Widget");
w.end_element("item");
w.empty_element("item", &[("id", "2"), ("name", "Gadget")]);
w.end_element("catalog");

println!("{}", w.into_string());

Pretty-print a document

use uppsala::{parse, XmlWriteOptions};

let xml = "<root><a><b>text</b></a></root>";
let doc = parse(xml).unwrap();
let opts = XmlWriteOptions::pretty("  ");
println!("{}", doc.to_xml_with_options(&opts));

Architecture

Uppsala uses an arena-based DOM where all nodes live in a flat Vec<NodeData> indexed by NodeId(usize). Tree relationships are maintained through parent/first_child/last_child/next_sibling/prev_sibling indices. This avoids Rc/RefCell overhead and makes tree mutation straightforward.
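The layout described above can be sketched in a few lines. This is a simplified illustration, not the contents of dom.rs: the link fields follow the names in the paragraph, while `Arena`, its methods, and the `name` payload are hypothetical.

```rust
/// Index of a node in the arena.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(pub usize);

/// One node: a payload plus index-based links to its relatives.
#[derive(Debug)]
pub struct NodeData {
    pub name: String, // stand-in payload for this sketch
    pub parent: Option<NodeId>,
    pub first_child: Option<NodeId>,
    pub last_child: Option<NodeId>,
    pub next_sibling: Option<NodeId>,
    pub prev_sibling: Option<NodeId>,
}

/// All nodes live in one flat vector; a `NodeId` is just a position in it.
#[derive(Debug, Default)]
pub struct Arena {
    nodes: Vec<NodeData>,
}

impl Arena {
    /// Allocate a detached node and return its id.
    pub fn new_node(&mut self, name: &str) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(NodeData {
            name: name.to_string(),
            parent: None,
            first_child: None,
            last_child: None,
            next_sibling: None,
            prev_sibling: None,
        });
        id
    }

    /// Append `child` (assumed detached) as the last child of `parent`.
    pub fn append_child(&mut self, parent: NodeId, child: NodeId) {
        let old_last = self.nodes[parent.0].last_child;
        self.nodes[child.0].parent = Some(parent);
        self.nodes[child.0].prev_sibling = old_last;
        match old_last {
            Some(last) => self.nodes[last.0].next_sibling = Some(child),
            None => self.nodes[parent.0].first_child = Some(child),
        }
        self.nodes[parent.0].last_child = Some(child);
    }

    /// Unlink `id` from its parent and siblings by rewriting indices.
    pub fn detach(&mut self, id: NodeId) {
        let (parent, prev, next) = {
            let n = &self.nodes[id.0];
            (n.parent, n.prev_sibling, n.next_sibling)
        };
        match (prev, parent) {
            (Some(p), _) => self.nodes[p.0].next_sibling = next,
            (None, Some(par)) => self.nodes[par.0].first_child = next,
            (None, None) => {}
        }
        match (next, parent) {
            (Some(n), _) => self.nodes[n.0].prev_sibling = prev,
            (None, Some(par)) => self.nodes[par.0].last_child = prev,
            (None, None) => {}
        }
        let n = &mut self.nodes[id.0];
        n.parent = None;
        n.prev_sibling = None;
        n.next_sibling = None;
    }

    /// Collect the children of `parent` in document order.
    pub fn children(&self, parent: NodeId) -> Vec<NodeId> {
        let mut out = Vec::new();
        let mut cur = self.nodes[parent.0].first_child;
        while let Some(id) = cur {
            out.push(id);
            cur = self.nodes[id.0].next_sibling;
        }
        out
    }
}
```

Because links are plain indices, mutation only needs `&mut Arena` and a handful of assignments; there is no shared ownership to untangle.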

src/
  lib.rs            Public API, parse(), parse_bytes(), encoding detection
  error.rs          XmlError enum, XmlResult type alias
  dom.rs            Arena-based DOM: Document, NodeId, QName, serialization
  parser.rs         XML 1.0 recursive-descent parser with full DTD internal subset
  simd.rs           SSE2-accelerated byte scanning (content + attribute delimiters)
  namespace.rs      Namespace prefix resolution with scope stack
  writer.rs         XmlWriter imperative builder
  xpath.rs          XPath 1.0 lexer, parser, and evaluator
  xsd/              XSD validator (split into submodules)
    mod.rs          Module declarations, re-exports
    types.rs        Core data structures (XsdValidator, ElementDecl, TypeDef, etc.)
    builder.rs      Multi-pass schema builder
    parser.rs       Schema element/type/attribute/group parsing
    validation.rs   Instance document validation
    builtins.rs     Built-in type validation, facet enforcement
    composition.rs  xs:include, xs:redefine, xs:import
    identity.rs     xs:key, xs:unique, xs:keyref
    datetime.rs     Date/time/duration validation
    decimal.rs      Arbitrary-precision decimal comparison
  xsd_regex.rs      XSD regex pattern engine (custom NFA matcher)
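
For readers unfamiliar with NFA matching, the technique behind an engine like xsd_regex.rs can be sketched as a state-set simulation. This is a generic illustration, not Uppsala's implementation: `State`, `is_match`, and `example_nfa` are hypothetical, and the automaton is assembled by hand instead of being compiled from a pattern string. The match is whole-string because XSD patterns are implicitly anchored at both ends.

```rust
/// One NFA state. Indices refer to positions in the state slice.
#[derive(Debug, Clone, Copy)]
pub enum State {
    /// Consume exactly this character, then move to the target state.
    Char(char, usize),
    /// Epsilon fork: continue in both targets without consuming input.
    Split(usize, usize),
    /// Accepting state.
    Match,
}

/// Add `s`, following epsilon edges, to `set`. `seen` prevents revisiting.
fn add_state(nfa: &[State], s: usize, set: &mut Vec<usize>, seen: &mut [bool]) {
    if seen[s] {
        return;
    }
    seen[s] = true;
    if let State::Split(a, b) = nfa[s] {
        add_state(nfa, a, set, seen);
        add_state(nfa, b, set, seen);
    } else {
        set.push(s);
    }
}

/// Whole-string match: track every state the NFA could be in after each
/// character, so no backtracking is needed.
pub fn is_match(nfa: &[State], start: usize, input: &str) -> bool {
    let mut current = Vec::new();
    add_state(nfa, start, &mut current, &mut vec![false; nfa.len()]);
    for ch in input.chars() {
        let mut next = Vec::new();
        let mut seen = vec![false; nfa.len()];
        for &s in &current {
            if let State::Char(c, target) = nfa[s] {
                if c == ch {
                    add_state(nfa, target, &mut next, &mut seen);
                }
            }
        }
        current = next;
    }
    current.iter().any(|&s| matches!(nfa[s], State::Match))
}

/// Hand-assembled NFA for the pattern `a(b|c)*d`; state 0 is the start.
pub fn example_nfa() -> Vec<State> {
    vec![
        State::Char('a', 1), // 0: leading 'a'
        State::Split(2, 5),  // 1: loop entry: another (b|c), or leave the loop
        State::Split(3, 4),  // 2: choose 'b' or 'c'
        State::Char('b', 1), // 3: 'b', back to the loop entry
        State::Char('c', 1), // 4: 'c', back to the loop entry
        State::Char('d', 6), // 5: trailing 'd'
        State::Match,        // 6: accept
    ]
}
```

Running time is bounded by input length times state count, which avoids the exponential blow-up that backtracking matchers can hit on nested repetition.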

Examples

The examples/ directory contains runnable programs:

# Parse XML, traverse the DOM, and run XPath queries
cargo run --example parse_and_query

# Validate documents against XSD schemas
cargo run --example validate_schema

# Build XML programmatically with XmlWriter and DOM
cargo run --example build_xml

Test Data Licensing

The test-data/ directory contains third-party conformance test suites. These files are not covered by Uppsala's BSD-2-Clause license; they retain their original licenses as described below.

W3C XML Conformance Test Suite

W3C XML Schema Test Suite (XSTS)

License

Uppsala itself is licensed under the BSD-2-Clause license. See LICENSE for details.