xmloxide
A pure Rust reimplementation of libxml2 — the de facto standard XML/HTML parsing library in the open-source world.
libxml2 became officially unmaintained in December 2025 with known security issues. xmloxide aims to be a memory-safe, high-performance replacement that passes the same conformance test suites.
Features
- Memory-safe: arena-based tree with zero `unsafe` in the public API
- Conformant: 100% pass rate on the W3C XML Conformance Test Suite (1727/1727 applicable tests)
- Error recovery: parse malformed XML and still produce a usable tree, just like libxml2
- Multiple parsing APIs: DOM tree, SAX2 streaming, XmlReader pull, push/incremental
- HTML parser: error-tolerant HTML 4.01 parsing with auto-closing and void elements
- WHATWG HTML5 parser: full HTML Living Standard tokenizer and tree builder (8810/8810 html5lib-tests passing)
- HTML5 streaming: SAX-like callback API for HTML5 (`html5::sax`) that wraps the tokenizer without building a DOM tree
- CSS selectors: query elements with familiar CSS syntax (`css::select`), including combinators, pseudo-classes, and fast `#id` lookup
- XPath 1.0+: full expression parser and evaluator with all XPath 1.0 core functions plus key XPath 2.0 functions (`matches()`, `replace()`, `tokenize()`, `upper-case()`, `lower-case()`, `abs()`, `min()`, `max()`, and more)
- Validation: DTD, RelaxNG, XML Schema (XSD), and ISO Schematron (ISO/IEC 19757-3) validation
- Serde integration: optional `serde` feature for XML (de)serialization to/from Rust types
- Async parsing: optional `async` feature for parsing from `tokio::io::AsyncRead` sources
- Canonical XML: C14N 1.0 and Exclusive C14N serialization
- XInclude: document inclusion processing
- XML Catalogs: OASIS XML Catalogs for URI resolution
- `xmllint` CLI: command-line tool for parsing, validating, and querying XML
- Zero-copy where possible: string interning for fast comparisons
- No global state: each `Document` is self-contained and `Send + Sync`
- C/C++ FFI: full C API with header file (`include/xmloxide.h`) for embedding in C/C++ projects
- Minimal dependencies: only `encoding_rs` (the library has no other dependencies; `clap` is used only by the CLI)
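The arena-based tree and string interning listed above can be illustrated with a small, self-contained sketch. It is not xmloxide's code: `Arena`, `Interner`, `Sym`, and this `NodeId` are hypothetical stand-ins, assuming the general pattern of storing every node in one `Vec`, linking nodes by integer index, and interning element names so that name comparisons become integer comparisons.

```rust
use std::collections::HashMap;

/// Index of a node inside the arena (hypothetical stand-in for a `NodeId`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(pub u32);

/// Interned name: comparing two `Sym`s is a single integer comparison.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Sym(pub u32);

/// Maps each distinct name to one `Sym`, storing the string only once.
#[derive(Default)]
pub struct Interner {
    lookup: HashMap<String, Sym>,
    names: Vec<String>,
}

impl Interner {
    pub fn intern(&mut self, s: &str) -> Sym {
        if let Some(&sym) = self.lookup.get(s) {
            return sym;
        }
        let sym = Sym(self.names.len() as u32);
        self.names.push(s.to_owned());
        self.lookup.insert(s.to_owned(), sym);
        sym
    }

    pub fn resolve(&self, sym: Sym) -> &str {
        &self.names[sym.0 as usize]
    }
}

pub enum Kind {
    Element(Sym),
    Text(String),
}

pub struct Node {
    pub kind: Kind,
    pub parent: Option<NodeId>,
    pub children: Vec<NodeId>,
}

/// All nodes live in one `Vec`; links between nodes are plain indices.
#[derive(Default)]
pub struct Arena {
    pub nodes: Vec<Node>,
    pub names: Interner,
}

impl Arena {
    fn push(&mut self, kind: Kind, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(Node { kind, parent, children: Vec::new() });
        if let Some(p) = parent {
            self.nodes[p.0 as usize].children.push(id);
        }
        id
    }

    pub fn element(&mut self, name: &str, parent: Option<NodeId>) -> NodeId {
        let sym = self.names.intern(name);
        self.push(Kind::Element(sym), parent)
    }

    pub fn text(&mut self, content: &str, parent: NodeId) -> NodeId {
        self.push(Kind::Text(content.to_owned()), Some(parent))
    }

    pub fn name(&self, id: NodeId) -> Option<&str> {
        match &self.nodes[id.0 as usize].kind {
            Kind::Element(sym) => Some(self.names.resolve(*sym)),
            Kind::Text(_) => None,
        }
    }

    /// Concatenates all descendant text in document order.
    pub fn text_content(&self, id: NodeId) -> String {
        let node = &self.nodes[id.0 as usize];
        let mut out = match &node.kind {
            Kind::Text(t) => t.clone(),
            Kind::Element(_) => String::new(),
        };
        for &child in &node.children {
            out.push_str(&self.text_content(child));
        }
        out
    }
}
```

Because nodes refer to each other by index rather than by reference, parent links need no `Rc`/`Weak` or `unsafe`, and dropping the arena frees the whole tree at once.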
Quick Start
```rust
use xmloxide::Document;

let doc = Document::parse_str(/* … */).unwrap();
let root = doc.root_element().unwrap();
assert_eq!(/* … */);
assert_eq!(/* … */);
```
Serialization
```rust
use xmloxide::Document;
use xmloxide::serial::serialize;

let doc = Document::parse_str(/* … */).unwrap();
let xml = serialize(&doc);
assert_eq!(/* … */);
```
XPath Queries
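```rust
use xmloxide::Document;
use xmloxide::xpath::{evaluate, /* … */};

let doc = Document::parse_str(/* … */).unwrap();
let root = doc.root_element().unwrap();
let result = evaluate(/* … */).unwrap();
assert_eq!(/* … */);
```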
SAX2 Streaming
```rust
use xmloxide::sax::{/* … */};
use /* … */ ParseOptions;

/* … */

parse_sax(/* … */).unwrap();
```
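The event-driven model behind SAX2 streaming can be sketched without the crate. The `Handler` trait and `parse_toy` driver below are hypothetical (the crate's real trait is `sax::SaxHandler`, whose exact methods are not reproduced here), and the toy driver understands only bare `<name>`, `</name>`, and text, with no attributes, comments, or entity handling. The point is that the handler sees each event once, and nothing is retained unless the handler stores it.

```rust
/// Hypothetical SAX-style callbacks; default methods let a handler
/// implement only the events it cares about.
pub trait Handler {
    fn start_element(&mut self, _name: &str) {}
    fn end_element(&mut self, _name: &str) {}
    fn characters(&mut self, _text: &str) {}
}

/// Toy driver: understands only `<name>`, `</name>`, and text.
/// No attributes, comments, CDATA, or entity expansion.
pub fn parse_toy<H: Handler>(input: &str, handler: &mut H) {
    let mut rest = input;
    while !rest.is_empty() {
        if let Some(after_lt) = rest.strip_prefix('<') {
            // The tag runs up to the next '>' (or to the end if it is missing).
            let end = after_lt.find('>').unwrap_or(after_lt.len());
            let tag = &after_lt[..end];
            match tag.strip_prefix('/') {
                Some(name) => handler.end_element(name),
                None => handler.start_element(tag),
            }
            rest = &after_lt[(end + 1).min(after_lt.len())..];
        } else {
            // Text runs up to the next '<'.
            let end = rest.find('<').unwrap_or(rest.len());
            handler.characters(&rest[..end]);
            rest = &rest[end..];
        }
    }
}

/// Example handler: counts elements and collects text; it never builds a tree.
#[derive(Default)]
pub struct Counter {
    pub elements: usize,
    pub text: String,
}

impl Handler for Counter {
    fn start_element(&mut self, _name: &str) {
        self.elements += 1;
    }
    fn characters(&mut self, text: &str) {
        self.text.push_str(text);
    }
}
```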
HTML Parsing
```rust
use xmloxide::html::parse_html;

let doc = parse_html(/* … */).unwrap();
let root = doc.root_element().unwrap();
assert_eq!(/* … */);
```
CSS Selectors
```rust
use xmloxide::css::select;
use xmloxide::Document;

let doc = Document::parse_str(/* … */).unwrap();
let root = doc.root_element().unwrap();
let intros = select(/* … */).unwrap();
assert_eq!(/* … */);
assert_eq!(/* … */);
```
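One common way to make `#id` selectors fast is to answer them from a precomputed id-to-element map instead of walking the tree. Whether xmloxide does exactly this is not stated here; the sketch assumes a hypothetical flattened `Element` list in document order and hypothetical `build_id_index` / `select_by_id` helpers. `select_by_id` returns `None` both when the id is absent and when the selector is more than a bare `#id`; a real engine would fall back to full selector matching in the latter case.

```rust
use std::collections::HashMap;

/// Hypothetical flattened element record, stored in document order.
pub struct Element {
    pub tag: String,
    pub id: Option<String>,
}

/// One pass over the document builds an `id -> position` map. The first element
/// with a given id wins, matching getElementById-style first-match semantics.
pub fn build_id_index(elements: &[Element]) -> HashMap<&str, usize> {
    let mut index = HashMap::new();
    for (pos, el) in elements.iter().enumerate() {
        if let Some(id) = el.id.as_deref() {
            index.entry(id).or_insert(pos);
        }
    }
    index
}

/// Answers a bare `#id` selector with a single hash lookup.
pub fn select_by_id(selector: &str, index: &HashMap<&str, usize>) -> Option<usize> {
    let id = selector.strip_prefix('#')?;
    // Reject compound selectors such as `#a.b`, `#a > p`, or `#a:hover`.
    let is_bare = !id.is_empty()
        && !id.contains(|c: char| c.is_whitespace() || ".#[:>+~,".contains(c));
    if !is_bare {
        return None;
    }
    index.get(id).copied()
}
```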
HTML5 Parsing (WHATWG)
```rust
use xmloxide::html5::parse_html5;

let doc = parse_html5(/* … */).unwrap();
let root = doc.root_element().unwrap();
assert_eq!(/* … */);
```
Fragment parsing (the algorithm behind innerHTML) is also supported:
```rust
use xmloxide::html5::{/* … */};

let opts = Html5ParseOptions { /* … */ };
let doc = parse_html5_with_options(/* … */).unwrap();
```
HTML5 Streaming (SAX-like)
```rust
use xmloxide::html5::sax::{/* … */};

let mut handler = LinkExtractor { /* … */ };
parse_html5_sax(/* … */);
assert_eq!(/* … */);
```
Error Recovery
```rust
use xmloxide::{/* … */};

let opts = ParseOptions::default().recover(/* … */);
let doc = Document::parse_str_with_options(/* … */).unwrap();
for diag in &doc.diagnostics {
    /* … */
}
```
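The diagnostics loop above reflects the recovery model: problems are recorded and parsing continues. A toy version of one recovery rule, repairing mis-nested or unclosed tags in an already tokenized stream, is sketched below. `Tok` and `balance` are hypothetical, and the rules (implicitly close intervening elements on a mismatched end tag, drop stray end tags, auto-close at end of input) are a simplification, not xmloxide's or libxml2's exact behavior.

```rust
/// Hypothetical pre-tokenized tag events.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Tok<'a> {
    Start(&'a str),
    End(&'a str),
}

/// Repairs tag nesting instead of failing, returning a well-nested event
/// stream plus one diagnostic per repair.
pub fn balance<'a>(tokens: &[Tok<'a>]) -> (Vec<Tok<'a>>, Vec<String>) {
    let mut open: Vec<&'a str> = Vec::new();
    let mut out = Vec::new();
    let mut diags = Vec::new();
    for &tok in tokens {
        match tok {
            Tok::Start(name) => {
                open.push(name);
                out.push(tok);
            }
            Tok::End(name) => {
                // Find the nearest matching open element, if any.
                if let Some(pos) = open.iter().rposition(|&o| o == name) {
                    // Implicitly close anything opened inside it.
                    while open.len() > pos + 1 {
                        let inner = open.pop().unwrap();
                        diags.push(format!("<{inner}> implicitly closed by </{name}>"));
                        out.push(Tok::End(inner));
                    }
                    open.pop();
                    out.push(tok);
                } else {
                    diags.push(format!("stray </{name}> ignored"));
                }
            }
        }
    }
    // Auto-close everything still open at end of input, innermost first.
    while let Some(inner) = open.pop() {
        diags.push(format!("<{inner}> closed at end of input"));
        out.push(Tok::End(inner));
    }
    (out, diags)
}
```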
CLI Tool
```sh
# Parse and pretty-print
# Validate against a schema
# XPath query
# Canonical XML
# Parse HTML
```
Module Overview
| Module | Description |
|---|---|
| `tree` | Arena-based DOM tree (`Document`, `NodeId`, `NodeKind`) |
| `parser` | XML 1.0 recursive-descent parser with error recovery |
| `parser::push` | Push/incremental parser for chunked input |
| `html` | Error-tolerant HTML 4.01 parser |
| `html5` | WHATWG HTML Living Standard parser (tokenizer + tree builder) |
| `html5::sax` | Streaming SAX-like API for HTML5 (no DOM tree built) |
| `css` | CSS selector engine for querying document trees |
| `sax` | SAX2 streaming event-driven parser |
| `reader` | XmlReader pull-based parsing API |
| `serial` | XML, HTML, and HTML5 serializers, plus Canonical XML (C14N) |
| `xpath` | XPath 1.0+ expression parser and evaluator |
| `validation::dtd` | DTD parsing and validation |
| `validation::relaxng` | RelaxNG schema validation |
| `validation::xsd` | XML Schema (XSD) validation |
| `validation::schematron` | ISO Schematron rule-based validation |
| `serde_xml` | Serde XML (de)serialization (optional `serde` feature) |
| `async_xml` | Async parsing via `tokio::io::AsyncRead` (optional `async` feature) |
| `xinclude` | XInclude 1.0 document inclusion |
| `catalog` | OASIS XML Catalogs for URI resolution |
| `encoding` | Character encoding detection and transcoding |
| `ffi` | C/C++ FFI bindings (`include/xmloxide.h`) |
Performance
Parsing throughput is competitive with libxml2: within 3-4% on most documents, 12% faster on SVG, and about 13% slower on the XHTML page. Serialization is 1.5-2.4x faster thanks to the arena-based tree design. XPath is 1.1-2.7x faster across all benchmarks.
Parsing:
| Document | Size | xmloxide | libxml2 | Result |
|---|---|---|---|---|
| Atom feed | 4.9 KB | 26.7 µs (176 MiB/s) | 25.5 µs (184 MiB/s) | ~4% slower |
| SVG drawing | 6.3 KB | 58.5 µs (103 MiB/s) | 65.6 µs (92 MiB/s) | 12% faster |
| Maven POM | 11.5 KB | 76.9 µs (142 MiB/s) | 74.2 µs (148 MiB/s) | ~4% slower |
| XHTML page | 10.2 KB | 69.5 µs (139 MiB/s) | 61.5 µs (157 MiB/s) | ~13% slower |
| Large (374 KB) | 374 KB | 2.15 ms (169 MiB/s) | 2.08 ms (175 MiB/s) | ~3% slower |
Serialization:
| Document | Size | xmloxide | libxml2 | Result |
|---|---|---|---|---|
| Atom feed | 4.9 KB | 11.3 µs | 17.5 µs | 1.5x faster |
| Maven POM | 11.5 KB | 20.1 µs | 47.5 µs | 2.4x faster |
| Large (374 KB) | 374 KB | 614 µs | 1397 µs | 2.3x faster |
XPath:
| Expression | xmloxide | libxml2 | Result |
|---|---|---|---|
| Simple path (`//entry/title`) | 1.51 µs | 1.63 µs | 8% faster |
| Attribute predicate (`//book[@id]`) | 5.91 µs | 15.99 µs | 2.7x faster |
| `count()` function | 1.09 µs | 1.67 µs | 1.5x faster |
| `string()` function | 1.32 µs | 1.77 µs | 1.3x faster |
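The Result columns are ratios of the two timings (libxml2 time divided by xmloxide time), and the MiB/s figures are size divided by time. The small helpers below, whose names are illustrative only, make that arithmetic explicit. Treating the large document as 374 KiB, 2.15 ms works out to about 169.9 MiB/s, consistent with the table's 169 MiB/s given that the listed sizes are rounded.

```rust
/// Throughput in MiB/s for `bytes` processed in `micros` microseconds.
pub fn mib_per_sec(bytes: f64, micros: f64) -> f64 {
    let bytes_per_sec = bytes / (micros * 1e-6);
    bytes_per_sec / (1024.0 * 1024.0)
}

/// The "N x faster" figure: how many times longer the baseline took.
pub fn speedup(baseline_micros: f64, candidate_micros: f64) -> f64 {
    baseline_micros / candidate_micros
}

/// Relative slowdown in percent when the candidate is slower than the baseline.
pub fn percent_slower(baseline_micros: f64, candidate_micros: f64) -> f64 {
    (candidate_micros / baseline_micros - 1.0) * 100.0
}
```

For example, `speedup(1397.0, 614.0)` is about 2.28 (the "2.3x faster" large-document serialization row), and `percent_slower(61.5, 69.5)` is about 13.0 (the XHTML parsing row).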
Key optimizations: arena-based tree for fast serialization, byte-level pre-checks for character validation, bulk text scanning, ASCII fast paths for name parsing, zero-copy element name splitting, inline entity resolution, XPath // step fusion with fused axis expansion, inlined tree accessors, and name-test fast paths for child/descendant axes.
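To make the `//` step fusion idea concrete: in XPath 1.0, `//entry` abbreviates `/descendant-or-self::node()/child::entry`, which a naive evaluator runs as two steps, first materializing every node and then scanning each one's children. Fusing the steps yields a single pre-order walk over descendants with the name test applied inline. The index-based `Tree` below is a hypothetical stand-in for the arena (node 0 is the document node, indices are assigned in document order); it is not xmloxide's evaluator.

```rust
/// Hypothetical index-based tree: node 0 is the document node, and indices
/// are assigned in document (pre-)order.
pub struct Tree {
    pub names: Vec<&'static str>,
    pub children: Vec<Vec<usize>>,
}

impl Tree {
    fn descendant_or_self(&self, node: usize, out: &mut Vec<usize>) {
        out.push(node);
        for &c in &self.children[node] {
            self.descendant_or_self(c, out);
        }
    }

    /// Unfused `//name`: step 1 materializes descendant-or-self::node(),
    /// step 2 applies child::name to every node in that set.
    pub fn unfused(&self, name: &str) -> Vec<usize> {
        let mut context = Vec::new();
        self.descendant_or_self(0, &mut context);
        let mut result = Vec::new();
        for n in context {
            for &c in &self.children[n] {
                if self.names[c] == name {
                    result.push(c);
                }
            }
        }
        // Node-sets are returned in document order without duplicates.
        result.sort_unstable();
        result.dedup();
        result
    }

    /// Fused `//name`: one pre-order walk over the document node's proper
    /// descendants, testing the name inline, with no intermediate node-set.
    pub fn fused(&self, name: &str) -> Vec<usize> {
        let mut result = Vec::new();
        let mut stack: Vec<usize> = self.children[0].iter().rev().copied().collect();
        while let Some(n) = stack.pop() {
            if self.names[n] == name {
                result.push(n);
            }
            // Push children reversed so the leftmost child is visited next.
            stack.extend(self.children[n].iter().rev().copied());
        }
        result
    }
}
```

The rewrite is valid for a plain name test. A positional predicate such as `//entry[1]` is evaluated relative to each parent's children, so it cannot be fused this naively.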
```sh
# Run benchmarks (requires libxml2 system library)
```
Testing
- 1078 unit tests across all modules
- 138 FFI tests covering the full C API surface (including SAX, Schematron, and CSS)
- libxml2 compatibility suite — 119/119 tests passing (100%) covering XML parsing, namespaces, error detection, and HTML parsing
- W3C XML Conformance Test Suite — 1727/1727 applicable tests passing (100%)
- html5lib-tests — 7032/7032 tokenizer tests + 1778/1778 tree construction tests (100%)
- Integration tests covering real-world XML/HTML documents, edge cases, and error recovery
C/C++ FFI
xmloxide provides a C-compatible API for embedding in C/C++ projects (like Chromium, game engines, or any codebase that currently uses libxml2).
```sh
# Build shared + static libraries (uses the included Makefile)
# Or build individually:
# Build and run the C example
```
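```c
xmloxide_document *doc = xmloxide_parse_str(/* … */);
uint32_t root = xmloxide_doc_root_element(/* … */);
char *name = /* … */; // "root"
char *text = /* … */; // "Hello"
/* … */;
/* … */;
/* … */;
```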
The full API — including tree navigation and mutation, XPath evaluation, serialization (plain and pretty-printed), HTML/HTML5 parsing, DTD/RelaxNG/XSD/Schematron validation, C14N, SAX streaming, XmlReader, push parser, and XML Catalogs — is declared in include/xmloxide.h.
Migrating from libxml2
| libxml2 | xmloxide (Rust) | xmloxide (C FFI) |
|---|---|---|
| `xmlReadMemory` | `Document::parse_str` | `xmloxide_parse_str` |
| `xmlReadFile` | `Document::parse_file` | `xmloxide_parse_file` |
| `xmlParseDoc` | `Document::parse_bytes` | `xmloxide_parse_bytes` |
| `htmlReadMemory` | `html::parse_html` | `xmloxide_parse_html` |
| (HTML5 parsing) | `html5::parse_html5` | n/a |
| (HTML5 fragment / innerHTML) | `html5::parse_html5_with_options` | n/a |
| (HTML5 streaming) | `html5::sax::parse_html5_sax` | n/a |
| (CSS selectors / querySelector) | `css::select` | n/a |
| `xmlFreeDoc` | (drop `Document`) | `xmloxide_free_doc` |
| `xmlDocGetRootElement` | `doc.root_element()` | `xmloxide_doc_root_element` |
| `xmlNodeGetContent` | `doc.text_content(id)` | `xmloxide_node_text_content` |
| `xmlNodeSetContent` | `doc.set_text_content(id, s)` | `xmloxide_set_text_content` |
| `xmlGetProp` | `doc.attribute(id, name)` | `xmloxide_node_attribute` |
| `xmlSetProp` | `doc.set_attribute(...)` | `xmloxide_set_attribute` |
| `xmlNewNode` | `doc.create_node(...)` | `xmloxide_create_element` |
| `xmlNewText` | `doc.create_node(Text{..})` | `xmloxide_create_text` |
| `xmlAddChild` | `doc.append_child(p, c)` | `xmloxide_append_child` |
| `xmlAddPrevSibling` | `doc.insert_before(ref, c)` | `xmloxide_insert_before` |
| `xmlUnlinkNode` | `doc.remove_node(id)` | `xmloxide_remove_node` |
| `xmlCopyNode` | `doc.clone_node(id, deep)` | `xmloxide_clone_node` |
| `xmlGetID` | `doc.element_by_id(s)` | `xmloxide_element_by_id` |
| `xmlDocDumpMemory` | `serial::serialize(&doc)` | `xmloxide_serialize` |
| `xmlDocDumpFormatMemory` | `serial::serialize_with_options` | `xmloxide_serialize_pretty` |
| `htmlDocDumpMemory` | `serial::html::serialize_html` | `xmloxide_serialize_html` |
| `xmlC14NDocDumpMemory` | `serial::c14n::canonicalize` | `xmloxide_canonicalize` |
| `xmlXPathEvalExpression` | `xpath::evaluate` | `xmloxide_xpath_eval` |
| `xmlValidateDtd` | `validation::dtd::validate` | `xmloxide_validate_dtd` |
| `xmlRelaxNGValidateDoc` | `validation::relaxng::validate` | `xmloxide_validate_relaxng` |
| `xmlSchemaValidateDoc` | `validation::xsd::validate_xsd` | `xmloxide_validate_xsd` |
| (Schematron validation) | `validation::schematron::validate_schematron` | `xmloxide_validate_schematron` |
| `xmlXIncludeProcess` | `xinclude::process_xincludes` | `xmloxide_process_xincludes` |
| `xmlLoadCatalog` | `Catalog::parse` | `xmloxide_parse_catalog` |
| `xmlSAX2...` callbacks | `sax::SaxHandler` trait | `xmloxide_sax_parse` |
| `xmlTextReaderRead` | `reader::XmlReader` | `xmloxide_reader_read` |
| `xmlCreatePushParserCtxt` | `parser::PushParser` | `xmloxide_push_parser_new` |
| `xmlParseChunk` | `PushParser::push` | `xmloxide_push_parser_push` |
Thread safety: Unlike libxml2, xmloxide has no global state. Each Document is self-contained and Send + Sync. The FFI layer uses thread-local storage for the last error message — each thread has its own error state. No initialization or cleanup functions are needed.
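A minimal sketch of the per-thread error slot described above, assuming the common pattern of a `thread_local!` holding the last message as a `CString`. The helper names `set_last_error` and `example_last_error` are hypothetical; xmloxide's actual error accessor is declared in `include/xmloxide.h` and may differ.

```rust
use std::cell::RefCell;
use std::ffi::{c_char, CString};
use std::ptr;

thread_local! {
    // One slot per OS thread: an error on one thread never overwrites another's.
    static LAST_ERROR: RefCell<Option<CString>> = RefCell::new(None);
}

/// Hypothetical internal helper: records an error message for the calling thread.
pub fn set_last_error(msg: &str) {
    // C strings cannot contain interior NUL bytes, so replace any first.
    let c = CString::new(msg.replace('\0', "?")).expect("interior NULs were replaced");
    LAST_ERROR.with(|slot| *slot.borrow_mut() = Some(c));
}

/// Hypothetical C-facing getter. Returns NULL if this thread has no error;
/// otherwise a pointer that stays valid until the next error on this thread.
/// (A real export would also carry `#[no_mangle]`, spelled
/// `#[unsafe(no_mangle)]` in the 2024 edition.)
pub extern "C" fn example_last_error() -> *const c_char {
    LAST_ERROR.with(|slot| {
        let guard = slot.borrow();
        guard.as_ref().map_or(ptr::null(), |c| c.as_ptr())
    })
}
```

Because the slot is per thread, no locking is needed, and a caller on one thread can never read a message produced by a failure on another.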
Fuzzing
xmloxide includes fuzz targets for security testing:
```sh
# Install cargo-fuzz (requires nightly)
# Run a fuzz target
```
Building
Minimum supported Rust version: 1.81
Limitations
- No XML 1.1: xmloxide implements XML 1.0 (Fifth Edition) only. XML 1.1 is rarely used and not planned.
- No XSLT: XSLT is a separate specification (libxslt) and is out of scope.
- HTML parsers: both an HTML 4.01 parser (matching libxml2's behavior) and a full WHATWG HTML5 parser are provided. The HTML5 parser passes 100% of html5lib-tests.
- Push parser buffers internally: the push/incremental parser API (`PushParser`) currently buffers all pushed data and performs the full parse on `finish()`, rather than truly streaming like libxml2's `xmlParseChunk`. For memory-constrained processing of large documents, SAX streaming (`parse_sax` for XML, `html5::sax::parse_html5_sax` for HTML5) is available as an alternative.
- XPath `namespace::` axis: the `namespace::` axis returns the element node when in-scope namespaces match (rather than materializing separate namespace nodes), following the same pattern as the attribute axis.
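The buffering behavior described for `PushParser` amounts to the following pattern, shown here as a hypothetical `BufferingPushParser` rather than the crate's own type: `push` only appends bytes, and all decoding and parsing happens in `finish`. Memory use therefore grows with the whole document, but chunk boundaries (even ones that split a multi-byte UTF-8 character) cannot cause errors.

```rust
use std::string::FromUtf8Error;

/// Hypothetical buffering push parser (not xmloxide's `PushParser`).
#[derive(Default)]
pub struct BufferingPushParser {
    buf: Vec<u8>,
}

impl BufferingPushParser {
    /// Accepts a chunk of any size; nothing is decoded or parsed yet.
    pub fn push(&mut self, chunk: &[u8]) {
        self.buf.extend_from_slice(chunk);
    }

    /// Bytes held in memory so far; this grows with the whole document.
    pub fn buffered_len(&self) -> usize {
        self.buf.len()
    }

    /// All work happens here. This sketch only decodes UTF-8; a real
    /// implementation would run the full parser over the complete input.
    pub fn finish(self) -> Result<String, FromUtf8Error> {
        String::from_utf8(self.buf)
    }
}
```

A truly streaming parser would instead keep only a small carry-over buffer for an incomplete token or character at each chunk boundary, which is the behavior libxml2's `xmlParseChunk` provides.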
Contributing
See CONTRIBUTING.md for development setup and guidelines.
Changelog
See CHANGELOG.md for version history.
License
MIT