fastxml
A fast, memory-efficient XML library for Rust with XPath and schema validation support. Designed for processing large XML documents like CityGML files used in PLATEAU.
Features
- 🦀 Pure Rust — No C dependencies, no unsafe code
- 🔄 libxml Compatible — Consistent parsing/XPath results
- 💾 Memory Efficient — Stream-parse gigabyte-scale XML in ~1 MB of memory, or stream-validate it in ~25 MB
- 🔍 Full XPath 1.0 — Complete XPath 1.0 support with namespace handling
- 📋 XSD Support — Schema parsing with import resolution, built-in GML types
- ⚡ Async Support — Async schema fetching and resolution with tokio
⚠️ Early Development (v0.x): API may change. Limited production experience. Not recommended for business-critical systems. Use at your own risk.
Performance
Benchmark results as of v0.8.0 on PLATEAU DEM GML (907 MB, 31M nodes):
Parse only:
| Mode | Time | Throughput | Memory |
|---|---|---|---|
| libxml DOM | 7.11s | 128 MB/s | 4.19 GB |
| fastxml DOM | 8.0s | 114 MB/s | 805 MB |
| fastxml Streaming | 4.75s | 191 MB/s | ~1 MB |
Parse + Schema Validation:
| Mode | Time | Throughput | Memory |
|---|---|---|---|
| libxml DOM + validate | 11.10s | 82 MB/s | 3.64 GB |
| fastxml DOM + validate | 38.2s | 24 MB/s | 1.96 GB |
| fastxml Streaming + validate | 15.9s | 57 MB/s | ~25 MB |
- DOM parsing: 5.2x less memory than libxml (805 MB vs 4.19 GB)
- Streaming parse + validate: 57 MB/s throughput with ~25 MB memory regardless of file size
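The ratios quoted above follow directly from the tables (taking 1 GB = 1000 MB, an assumption about the units used):

$$
\text{throughput} = \frac{\text{file size}}{\text{time}}:\quad
\frac{907\ \text{MB}}{4.75\ \text{s}} \approx 191\ \text{MB/s},\qquad
\frac{907\ \text{MB}}{15.9\ \text{s}} \approx 57\ \text{MB/s};\qquad
\frac{4.19\ \text{GB}}{805\ \text{MB}} = \frac{4190}{805} \approx 5.2
$$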
Installation
[dependencies]
fastxml = "0.9"
Cargo Features
| Feature | Description |
|---|---|
| ureq | Sync HTTP client for schema fetching (recommended) |
| tokio | Async HTTP client for schema fetching (reqwest + tokio) |
| async-trait | Async trait support for custom implementations |
| compare-libxml | Enable libxml2 comparison tests |
# Recommended: sync schema fetching
fastxml = { version = "0.9", features = ["ureq"] }
# Async schema fetching
fastxml = { version = "0.9", features = ["tokio"] }
Schema Fetchers
| Fetcher | Description |
|---|---|
| FileFetcher | Local filesystem |
| UreqFetcher | Sync HTTP (requires ureq) |
| ReqwestFetcher | Async HTTP (requires tokio) |
| DefaultFetcher | File + sync HTTP combined with built-in caching (requires ureq for HTTP) |
| AsyncDefaultFetcher | File + async HTTP combined with built-in caching (requires tokio) |
| CachingFetcher | Wraps any sync fetcher with in-memory caching |
| AsyncCachingFetcher | Wraps any async fetcher with in-memory caching (requires tokio) |
| FileCachingFetcher | Wraps any sync fetcher with file-based caching (temp directory) |
| AsyncFileCachingFetcher | Wraps any async fetcher with file-based caching (requires tokio) |
Traits:
| Trait | Description |
|---|---|
| SchemaFetcher | Sync fetcher trait |
| AsyncSchemaFetcher | Async fetcher trait (requires tokio) |
use ;
let fetcher = with_base_dir;
let result = fetcher.fetch?;
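The caching wrappers in the table above follow a decorator pattern: they expose the same fetcher interface as the fetcher they wrap and consult a cache before delegating. The std-only sketch below illustrates that idea; the `Fetch` trait, the `Caching` wrapper, and the `Counting` test double are hypothetical and are not fastxml's actual `SchemaFetcher` trait or `CachingFetcher` implementation.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical minimal fetcher trait (NOT fastxml's `SchemaFetcher` signature).
trait Fetch {
    fn fetch(&self, uri: &str) -> Result<Vec<u8>, String>;
}

/// Decorator: wraps any fetcher and memoizes successful results by URI.
/// `RefCell` lets the cache be updated through `&self`.
struct Caching<F> {
    inner: F,
    cache: RefCell<HashMap<String, Vec<u8>>>,
}

impl<F: Fetch> Caching<F> {
    fn new(inner: F) -> Self {
        Caching { inner, cache: RefCell::new(HashMap::new()) }
    }
}

impl<F: Fetch> Fetch for Caching<F> {
    fn fetch(&self, uri: &str) -> Result<Vec<u8>, String> {
        if let Some(bytes) = self.cache.borrow().get(uri) {
            return Ok(bytes.clone()); // cache hit: the inner fetcher is not called
        }
        let bytes = self.inner.fetch(uri)?; // errors are not cached
        self.cache.borrow_mut().insert(uri.to_string(), bytes.clone());
        Ok(bytes)
    }
}

/// Test double that counts how often it is actually called.
struct Counting {
    calls: Cell<usize>,
}

impl Fetch for Counting {
    fn fetch(&self, uri: &str) -> Result<Vec<u8>, String> {
        self.calls.set(self.calls.get() + 1);
        Ok(uri.as_bytes().to_vec())
    }
}

fn main() {
    let fetcher = Caching::new(Counting { calls: Cell::new(0) });
    let first = fetcher.fetch("http://example.com/a.xsd").unwrap();
    let second = fetcher.fetch("http://example.com/a.xsd").unwrap();
    assert_eq!(first, second);
    assert_eq!(fetcher.inner.calls.get(), 1); // second call served from the cache
}
```

Because `RefCell` is not `Sync`, a wrapper meant to be shared across threads would need a `Mutex` or `RwLock` around the map instead.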
Quick Start
DOM Parsing
use ;
let xml = r#"<root><item id="1">Hello</item><item id="2">World</item></root>"#;
let doc = from.parse?;
for node in doc.query_nodes?
Parser::from accepts &str or &[u8]; use Parser::from_reader(reader) to parse from any BufRead, and .options(ParserOptions { .. }) to configure parsing.
Reusable XPath Queries
evaluate(&doc, "…") re-parses the expression on every call. To run the same
expression against many documents, compile it once with Query:
use ;
let query = compile?;
let a = from.parse?;
let b = from.parse?;
assert_eq!;
assert_eq!;
Namespaces declared on each document's root are registered automatically; add
extra bindings with .namespace(prefix, uri). Use .eval(&doc) for a typed
XPathResult, or .eval_from(&doc, &node) to start from a context node. A
compiled Query (and StreamableQuery) renders back to an equivalent XPath
string via to_string().
The QueryExt trait adds method-call ergonomics on the document itself. Its
argument is anything that is AsQuery, so a string and a pre-compiled Query
are interchangeable:
use ;
let doc = from.parse?;
// String: compiled on the fly.
assert_eq!;
let n = doc.query?.to_number;
// Pre-compiled query: reused without re-parsing.
let q = compile?;
assert_eq!;
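Accepting "anything that is AsQuery" is a common Rust pattern: one trait, implemented for both the raw string and the compiled form, returning a `Cow` so a compiled query is borrowed while a string is compiled on demand. The std-only sketch below shows the pattern with hypothetical `Query`/`AsQuery` stand-ins whose "compilation" merely splits a path on `/`; it is not fastxml's implementation.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a compiled query: the path split into steps.
#[derive(Debug, Clone, PartialEq)]
struct Query {
    steps: Vec<String>,
}

impl Query {
    /// "Compiles" a slash-separated path; real XPath parsing is far richer.
    fn compile(expr: &str) -> Query {
        Query {
            steps: expr.split('/').filter(|s| !s.is_empty()).map(String::from).collect(),
        }
    }
}

/// Anything that can be viewed as a compiled query.
trait AsQuery {
    fn as_query(&self) -> Cow<'_, Query>;
}

impl AsQuery for str {
    fn as_query(&self) -> Cow<'_, Query> {
        Cow::Owned(Query::compile(self)) // compiled on every call
    }
}

impl AsQuery for Query {
    fn as_query(&self) -> Cow<'_, Query> {
        Cow::Borrowed(self) // reused as-is, no re-parsing
    }
}

/// A single entry point that accepts either a string or a compiled query.
fn step_count<Q: AsQuery + ?Sized>(q: &Q) -> usize {
    q.as_query().steps.len()
}

fn main() {
    let compiled = Query::compile("/root/item");
    assert_eq!(step_count("/root/item"), 2);
    assert_eq!(step_count(&compiled), 2);
}
```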
Serializing to XML
Printer turns a parsed document or node back into XML:
use ;
let doc = from.parse?;
let xml = from.to_string?; // whole document, with <?xml ?>
let pretty = from.pretty.to_string?; // indented
// Stream straight to any writer, no intermediate String:
from.write_to?;
Printer::from accepts &XmlDocument, &XmlNode, or &XmlRoNode (a document
emits an XML declaration by default, a single node does not). Builders:
.pretty() / .indent(s) / .declaration(bool) / .encoding(s). Terminals:
.to_string() / .into_bytes() / .write_to(w).
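Writing to any `std::io::Write` instead of building a `String` is what lets a serializer emit output of arbitrary size. The std-only sketch below illustrates that idea; `write_escaped` and `write_element` are hypothetical helpers, not Printer's internals. It escapes the five predefined XML entities and copies unescaped runs straight through.

```rust
use std::io::{self, Write};

/// Writes `text` with the five predefined XML entities escaped, copying
/// unescaped runs straight through to the writer.
fn write_escaped<W: Write>(w: &mut W, text: &str) -> io::Result<()> {
    let mut last = 0;
    for (i, c) in text.char_indices() {
        let entity = match c {
            '&' => "&amp;",
            '<' => "&lt;",
            '>' => "&gt;",
            '"' => "&quot;",
            '\'' => "&apos;",
            _ => continue,
        };
        w.write_all(text[last..i].as_bytes())?;
        w.write_all(entity.as_bytes())?;
        last = i + c.len_utf8();
    }
    w.write_all(text[last..].as_bytes())
}

/// Emits `<name>text</name>` directly to `w`; no intermediate String is built.
/// Assumes `name` is already a valid XML name.
fn write_element<W: Write>(w: &mut W, name: &str, text: &str) -> io::Result<()> {
    write!(w, "<{name}>")?;
    write_escaped(w, text)?;
    write!(w, "</{name}>")
}

fn main() -> io::Result<()> {
    let mut out: Vec<u8> = Vec::new(); // any io::Write works: File, Stdout, TcpStream, ...
    write_element(&mut out, "item", "a < b & \"c\"")?;
    assert_eq!(String::from_utf8(out).unwrap(), "<item>a &lt; b &amp; &quot;c&quot;</item>");
    Ok(())
}
```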
Streaming Parser
For a quick, buffered list of events:
use Parser;
for event in from.events?
To process large files with constant memory, use for_each_event — the callback is invoked as each event is read, nothing is buffered, and it may capture and mutate local state:
use Parser;
use XmlEvent;
use BufReader;
use File;
let file = open?;
let mut elements = 0;
from_reader.for_each_event?;
println!;
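The constant-memory property comes from handing each event to a callback as soon as it is read. The deliberately simplified std-only sketch below shows that shape with a hypothetical `for_each_event` function and `Event` enum: attributes are not reported, and comments containing `>`, CDATA, entities, and `>` inside attribute values are not handled. It is not fastxml's tokenizer. Memory use is bounded by the largest single tag or text run, not by the file size.

```rust
use std::io::{BufRead, BufReader};

/// Hypothetical, simplified event type.
#[derive(Debug, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Text(String),
}

/// Reads one text run or `<...>` tag at a time and passes each event to `f`
/// immediately; nothing beyond the current tag or text run is buffered.
fn for_each_event<R: BufRead>(mut reader: R, mut f: impl FnMut(Event)) -> std::io::Result<()> {
    let mut buf = Vec::new();
    loop {
        // 1. Character data up to the next '<'.
        buf.clear();
        if reader.read_until(b'<', &mut buf)? == 0 {
            return Ok(()); // end of input
        }
        let at_tag = buf.last() == Some(&b'<');
        if at_tag {
            buf.pop();
        }
        let text = String::from_utf8_lossy(&buf).trim().to_string();
        if !text.is_empty() {
            f(Event::Text(text));
        }
        if !at_tag {
            return Ok(()); // trailing text, no further markup
        }
        // 2. Tag body up to the next '>'.
        buf.clear();
        reader.read_until(b'>', &mut buf)?;
        if buf.last() == Some(&b'>') {
            buf.pop();
        }
        let tag = String::from_utf8_lossy(&buf).into_owned();
        if let Some(name) = tag.strip_prefix('/') {
            f(Event::End(name.trim().to_string()));
        } else if !tag.starts_with('?') && !tag.starts_with('!') {
            // Start tag; `<name .../>` also produces the matching End event.
            let self_closing = tag.ends_with('/');
            let name = tag.trim_end_matches('/').split_whitespace().next().unwrap_or("").to_string();
            f(Event::Start(name.clone()));
            if self_closing {
                f(Event::End(name));
            }
        }
        // `<?...?>` and `<!...>` are skipped in this sketch.
    }
}

fn main() -> std::io::Result<()> {
    let xml = "<root><item id=\"1\">A</item><item id=\"2\"/></root>";
    let mut elements = 0; // local state mutated by the FnMut callback
    for_each_event(BufReader::new(xml.as_bytes()), |ev| {
        if let Event::Start(_) = ev {
            elements += 1;
        }
    })?;
    assert_eq!(elements, 3);
    Ok(())
}
```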
Stream Transform
Transform XML with XPath-based element selection:
use Transformer;
let xml = r#"<root><item id="1">A</item><item id="2">B</item></root>"#;
// Modify elements (supports multiple handlers), render the result as a String
let result = from
.on
.to_string?;
// Iterate for side effects (no output transformation)
let mut ids = Vecnew;
from
.on
.for_each?;
Terminals: to_string(), into_bytes(), write_to(&mut writer), and for_each().
on / on_with_context / collect accept either a string (analyzed when the
transform runs) or a pre-compiled StreamableQuery. Compiling validates
streamability up front, so a non-streamable pattern is rejected immediately
rather than failing mid-run:
use ;
let q = compile?; // Ok: streamable
assert!; // rejected up front
let result = from
.on
.to_string?;
(Query is the analogue for evaluation; StreamableQuery is for transforms.)
A StreamableQuery is a subset of a full Query, so it converts freely to one
(Query::from(&sq), or doc.query(&sq)); the reverse is fallible
(StreamableQuery::try_from(&query), which rejects non-streamable expressions).
Reader-based Transform (Large Files)
For large XML files, use Transformer::from_reader to avoid loading the entire file into memory. It reads from any BufRead source and writes results incrementally:
use Transformer;
use ;
use File;
let reader = new;
let mut output = new;
// Transform and write to output (returns the number of matched elements)
let count = from_reader
.on
.write_to?;
println!;
// Or iterate for side effects only (no output)
let reader = new;
let mut ids = Vecnew;
from_reader
.on
.for_each?;
Advanced transforms
These richer operations are available for in-memory input (Transformer::from): single-pass data extraction, multi-XPath collection, parent-context access, root-namespace auto-detection, and fallback for non-streamable XPath. (On Transformer::from_reader they return an error, since they need random access.)
use Transformer;
let xml = r#"<root><item id="1">A</item><item id="2">B</item></root>"#;
// Extract data (single XPath)
let ids: = from
.collect?;
// Extract from multiple XPaths in a single pass
let : = from
.collect_multi?;
Auto-detect Namespaces
Extract namespace declarations from the root element without DOM parsing:
let xml = r#"<root xmlns:gml="http://www.opengis.net/gml"><gml:point/></root>"#;
from
.with_root_namespaces? // Auto-registers namespaces from root element
.on
.to_string?;
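Reading namespaces from the root element only requires scanning the first start tag, which is why no DOM is needed. A rough std-only sketch of such a scan follows; `root_namespaces` is a hypothetical helper with stated simplifications (double-quoted values, no `>` inside them, no comment before the root), not fastxml's parser.

```rust
/// Collects namespace declarations (prefix, URI) from the root start tag of
/// `xml` without building a DOM. Simplified sketch: assumes double-quoted
/// attribute values without `>`; the default namespace gets an empty prefix.
fn root_namespaces(xml: &str) -> Vec<(String, String)> {
    let mut out = Vec::new();
    // Root start tag = first '<' followed by a letter (skips `<?xml ...?>`).
    let Some(start) = xml
        .match_indices('<')
        .map(|(i, _)| i)
        .find(|&i| xml[i + 1..].starts_with(char::is_alphabetic))
    else {
        return out;
    };
    let Some(len) = xml[start..].find('>') else {
        return out;
    };
    let mut rest = &xml[start..start + len];
    while let Some(pos) = rest.find("xmlns") {
        let after = &rest[pos + "xmlns".len()..];
        // `xmlns:prefix="…"` or the default `xmlns="…"`.
        let (prefix, after_name) = match after.strip_prefix(':') {
            Some(p) => match p.find('=') {
                Some(eq) => (p[..eq].trim(), &p[eq..]),
                None => break,
            },
            None => ("", after),
        };
        // The value is the text between the next pair of double quotes.
        let Some(open) = after_name.find('"') else { break };
        let value = &after_name[open + 1..];
        let Some(close) = value.find('"') else { break };
        out.push((prefix.to_string(), value[..close].to_string()));
        rest = &value[close + 1..];
    }
    out
}

fn main() {
    let xml = r#"<?xml version="1.0"?><root xmlns:gml="http://www.opengis.net/gml"><gml:point/></root>"#;
    assert_eq!(
        root_namespaces(xml),
        vec![("gml".to_string(), "http://www.opengis.net/gml".to_string())]
    );
}
```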
Namespace URI Matching
Match elements by namespace URI instead of prefix (useful when different prefixes map to the same URI):
// Matches both gml:feature and g:feature if they have the same namespace URI
from
.namespace
.on
.to_string?;
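Matching by namespace URI means comparing expanded names (namespace URI, local name) instead of the literal prefix. The sketch below resolves prefixes through a per-document binding map; `expand` and the maps are hypothetical illustrations, and default namespaces are ignored for brevity.

```rust
use std::collections::HashMap;

/// Expands a `prefix:local` name to (namespace URI, local name) using the
/// document's prefix bindings. Unprefixed names map to the empty URI here
/// (default namespaces are ignored), and an unbound prefix yields `None`.
fn expand<'a>(qname: &'a str, bindings: &HashMap<&str, &'a str>) -> Option<(&'a str, &'a str)> {
    match qname.split_once(':') {
        Some((prefix, local)) => bindings.get(prefix).map(|uri| (*uri, local)),
        None => Some(("", qname)),
    }
}

fn main() {
    const GML: &str = "http://www.opengis.net/gml";
    // Two documents bind different prefixes to the same namespace URI.
    let doc1: HashMap<&str, &str> = HashMap::from([("gml", GML)]);
    let doc2: HashMap<&str, &str> = HashMap::from([("g", GML)]);
    let wanted = Some((GML, "feature"));
    assert_eq!(expand("gml:feature", &doc1), wanted);
    assert_eq!(expand("g:feature", &doc2), wanted);
    assert_eq!(expand("gml:feature", &doc2), None); // `gml` is not bound in doc2
}
```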
Parent Context Access
Access ancestor elements' information during streaming transformation:
from
.on_with_context
.to_string?;
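Ancestor information can be kept during a single streaming pass with a stack of currently open elements, so memory grows with nesting depth rather than document size. The sketch below shows that technique over a hypothetical pre-built event list; `Event` and `with_context` are illustrative names, not fastxml's `on_with_context` API.

```rust
/// Hypothetical simplified events, as a streaming parser might emit them.
enum Event<'a> {
    Start(&'a str),
    End,
}

/// Calls `on_match(name, ancestors)` for every start tag named `target`;
/// `ancestors` lists the currently open elements, root first, parent last.
fn with_context<'a>(events: &[Event<'a>], target: &str, mut on_match: impl FnMut(&str, &[&'a str])) {
    let mut stack: Vec<&'a str> = Vec::new();
    for ev in events {
        match *ev {
            Event::Start(name) => {
                if name == target {
                    on_match(name, &stack);
                }
                stack.push(name);
            }
            Event::End => {
                stack.pop();
            }
        }
    }
}

fn main() {
    let events = [
        Event::Start("CityModel"),
        Event::Start("Building"),
        Event::Start("measuredHeight"),
        Event::End,
        Event::End,
        Event::Start("Bridge"),
        Event::Start("measuredHeight"),
        Event::End,
        Event::End,
        Event::End,
    ];
    let mut parents = Vec::new();
    with_context(&events, "measuredHeight", |_, ancestors| {
        parents.push(ancestors.last().copied().unwrap_or_default().to_string());
    });
    assert_eq!(parents, ["Building", "Bridge"]);
}
```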
XPath Streamability Check
Check if an XPath can be processed in a single streaming pass:
use ;
// Quick check
if is_streamable
// Detailed analysis
match analyze_xpath_str?
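Intuitively, a pattern is streamable when it can be decided from the current element and its ancestors alone, without looking back at earlier siblings or ahead at later nodes. The crude keyword heuristic below only illustrates that intuition; `looks_streamable` is hypothetical, its rule list is an assumption, and the library's `is_streamable` / `analyze_xpath_str` analysis remains the authority.

```rust
/// Rough, illustrative heuristic (NOT fastxml's `is_streamable` rules):
/// flags axes and functions that refer to nodes the parser has already
/// passed or has not reached yet, which a single forward pass cannot see.
fn looks_streamable(xpath: &str) -> bool {
    const NEEDS_BUFFERING: [&str; 5] = [
        "preceding::",
        "preceding-sibling::",
        "following::",
        "following-sibling::",
        "last()",
    ];
    !NEEDS_BUFFERING.iter().any(|&pat| xpath.contains(pat))
}

fn main() {
    assert!(looks_streamable("//item[@id='1']"));
    assert!(!looks_streamable("//item[last()]")); // must know how many siblings follow
    assert!(!looks_streamable("//a/following-sibling::b"));
}
```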
Fallback Control
By default, non-streamable XPath expressions return an error. Enable fallback for two-pass processing:
// Default: error on non-streamable XPath
let result = from
.on
.to_string;
// => Err(NotStreamable { ... })
// Enable fallback (loads entire document into memory)
let result = from
.allow_fallback
.on
.to_string?;
Async Schema Resolution
Parse XSD schemas with async import/include resolution (requires the tokio feature):
use ;
async
Schema::builder() takes one or more .add(uri, bytes) sources; finish with .resolve() (no network), .resolve_with(&fetcher), or .resolve_with_async(&fetcher).
The async resolver:
- Fetches imported schemas asynchronously via HTTP
- Resolves nested imports (A → B → C)
- Detects circular dependencies
See examples/async_schema_resolution.rs for more examples.
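Detecting circular imports is typically a depth-first search that tracks which schemas are still on the current import chain. The std-only sketch below models the import graph as a hypothetical `HashMap` from schema URI to the URIs it imports; `resolve_order` and `visit` are illustrative names, not fastxml's resolver, and no fetching happens.

```rust
use std::collections::HashMap;

/// Visit state for the depth-first search.
#[derive(Clone, Copy)]
enum Mark {
    InProgress,
    Done,
}

fn visit<'a>(
    uri: &'a str,
    imports: &HashMap<&'a str, Vec<&'a str>>,
    marks: &mut HashMap<&'a str, Mark>,
    stack: &mut Vec<&'a str>,
    order: &mut Vec<&'a str>,
) -> Result<(), Vec<&'a str>> {
    match marks.get(uri) {
        Some(Mark::Done) => return Ok(()),
        Some(Mark::InProgress) => {
            // `uri` is already on the current import chain: report the cycle.
            let start = stack.iter().position(|u| *u == uri).unwrap();
            let mut cycle = stack[start..].to_vec();
            cycle.push(uri);
            return Err(cycle);
        }
        None => {}
    }
    marks.insert(uri, Mark::InProgress);
    stack.push(uri);
    for &dep in imports.get(uri).map(|v| v.as_slice()).unwrap_or(&[]) {
        visit(dep, imports, marks, stack, order)?;
    }
    stack.pop();
    marks.insert(uri, Mark::Done);
    order.push(uri); // dependencies are pushed before their importers
    Ok(())
}

/// Returns schemas in load order (dependencies first), or the import chain
/// that closes a cycle.
fn resolve_order<'a>(
    imports: &HashMap<&'a str, Vec<&'a str>>,
    root: &'a str,
) -> Result<Vec<&'a str>, Vec<&'a str>> {
    let (mut marks, mut stack, mut order) = (HashMap::new(), Vec::new(), Vec::new());
    visit(root, imports, &mut marks, &mut stack, &mut order)?;
    Ok(order)
}

fn main() {
    // A imports B, B imports C.
    let mut imports: HashMap<&str, Vec<&str>> = HashMap::new();
    imports.insert("a.xsd", vec!["b.xsd"]);
    imports.insert("b.xsd", vec!["c.xsd"]);
    imports.insert("c.xsd", vec![]);
    assert_eq!(resolve_order(&imports, "a.xsd"), Ok(vec!["c.xsd", "b.xsd", "a.xsd"]));

    // C now imports A again: A → B → C → A.
    imports.insert("c.xsd", vec!["a.xsd"]);
    assert_eq!(
        resolve_order(&imports, "a.xsd"),
        Err(vec!["a.xsd", "b.xsd", "c.xsd", "a.xsd"])
    );
}
```

Marking schemas as `Done` also means a schema imported from several places (a diamond) is resolved only once.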
Schema Validation
All validation goes through one Validator front door: the input type selects the engine (&XmlDocument → DOM, &str/&[u8]/reader → streaming), .schema(..) supplies an explicit schema (or it is resolved from xsi:schemaLocation), and run() returns a Report.
A Schema is built with Schema::from_xsd(bytes), Schema::builtin(), or Schema::builder().add(uri, bytes).resolve()?.
DOM Validation
use Parser;
use ;
let doc = from.parse?;
let schema = from_xsd?;
let report = from.schema.run?;
if report.is_valid
Streaming Validation
Validate during parsing with minimal memory:
use ;
use Arc;
let schema = new;
let reader = new;
let report = from_reader
.schema // share one schema across many validations
.max_errors
.run?;
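Sharing one compiled schema through an `Arc` avoids re-parsing it per validation: cloning an `Arc` only increments a reference count, and the schema itself stays read-only. The std-only sketch below illustrates that pattern across threads; `Schema` and `validate_all` here are hypothetical stand-ins, not fastxml's `Schema` or `Validator`.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a compiled schema (NOT fastxml's `Schema`):
/// here it only knows which root element names are allowed.
struct Schema {
    allowed_roots: Vec<String>,
}

impl Schema {
    fn accepts_root(&self, name: &str) -> bool {
        self.allowed_roots.iter().any(|r| r == name)
    }
}

/// Checks each root name on its own thread, all sharing one schema.
fn validate_all(schema: &Arc<Schema>, roots: &[&'static str]) -> Vec<bool> {
    let handles: Vec<_> = roots
        .iter()
        .map(|&root| {
            let schema = Arc::clone(schema); // refcount bump, no copy of the schema
            thread::spawn(move || schema.accepts_root(root))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let schema = Arc::new(Schema { allowed_roots: vec!["CityModel".to_string()] });
    let results = validate_all(&schema, &["CityModel", "Other", "CityModel"]);
    assert_eq!(results, [true, false, true]);
}
```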
Auto-detect Schema
Omit .schema(..) and the schema is resolved from the document's xsi:schemaLocation, using the default fetcher (requires the ureq feature):
use ;
let doc = from.parse?;
let report = from.run?;
For streaming, the schema is fetched lazily on the first element:
use Validator;
let report = from_reader.run?;
To supply a custom fetcher, use .run_with(fetcher) instead of .run().
Async Validation
Validate with async schema fetching (requires the tokio feature) via run_async() (default fetcher) or run_async_with(&fetcher):
use ;
async
Validation Errors
use ErrorLevel;
// `report` is the value returned by `Validator::…::run()`
for error in report.errors
XPath
Basic Usage
use ;
let doc = from.parse?;
let result = doc.query?;
With Namespaces
let xml = r#"
<core:CityModel xmlns:core="http://www.opengis.net/citygml/2.0"
xmlns:bldg="http://www.opengis.net/citygml/building/2.0"
xmlns:gml="http://www.opengis.net/gml">
<bldg:Building gml:id="bldg_001">
<bldg:measuredHeight>25.5</bldg:measuredHeight>
</bldg:Building>
</core:CityModel>"#;
let doc = from.parse?;
let buildings = doc.query_nodes?;
libxml Compatibility
For migrating from libxml, the fastxml::compat module provides free functions
that mirror libxml's shape (evaluate, create_context, get_root_node,
node_to_xml_string, find_nodes_by_xpath, …). They are thin wrappers over the
modern front doors — prefer Parser / Query / QueryExt / Printer for new
code.
use Parser;
use ;
let doc = from.parse?;
let root = get_root_node?; // modern: doc.get_root_element()
let items = evaluate?; // modern: doc.query("//item")
See examples/ (query, printer, compat, dom_parsing, …) for runnable
demonstrations of both the modern and compatibility APIs.
Supported Specifications
XPath 1.0
| Feature | Examples |
|---|---|
| Paths | /root/child, //element, //* |
| Predicates | [@id='1'], [position()=1], [name()='foo'] |
| Axes | ancestor::, following-sibling::, namespace:: |
| Operators | and, or, not(), =, !=, <, >, +, -, *, div, mod |
| Functions | count(), contains(), string(), number(), sum(), etc. |
| Namespaces | //ns:element, namespace::* |
| Variables | $var |
| Union | `//a \| //b` |
XSD Schema
| Feature | Support |
|---|---|
| Element/attribute definitions | ✅ |
| Complex types (sequence/choice/all) | ✅ |
| Simple types (restriction/list/union) | ✅ |
| Type inheritance | ✅ |
| Facets | ✅ |
| Attribute/model groups | ✅ |
| import/include/redefine | ✅ |
| Built-in XSD and GML types | ✅ |
| Identity constraints (unique/key/keyref) | ✅ |
| Substitution groups | ✅ |
Not Supported
- XQuery, XSLT, XInclude
- DTD validation
- XML Signature/Encryption
- Catalog support
- Full entity expansion
Conformance
Conformance test results as of v0.8.2. See conformance/ for details.
| Test Suite | Category | Pass Rate |
|---|---|---|
| W3C XML | valid documents | 89.9% |
| W3C XML | invalid documents | 91.2% |
| W3C XSD | schema compilation | 96.8% |
| W3C XSD | instance validation | 70.3% |
# Run conformance tests (requires test data download)
Development
# Validate XML files against XSD schema
# Benchmarks with an external xml file
License
MIT OR Apache-2.0