fastxml 0.6.2

A fast, memory-efficient XML library with XPath and XSD validation support

fastxml

A fast, memory-efficient XML library for Rust with XPath and schema validation support. Designed for processing large XML documents like CityGML files used in PLATEAU.

Features

  • 🦀 Pure Rust — No C dependencies, no unsafe code
  • 🔄 libxml Compatible — Consistent parsing/XPath results
  • 💾 Memory Efficient — Parse and validate gigabyte-scale XML with ~1 MB memory footprint
  • 🔍 Full XPath 1.0 — Complete XPath 1.0 support with namespace handling
  • 📋 XSD Support — Schema parsing with import resolution, built-in GML types
  • Async Support — Async schema fetching and resolution with tokio

⚠️ Early Development (v0.x): API may change. Limited production experience. Not recommended for business-critical systems. Use at your own risk.

Performance

Benchmark on PLATEAU DEM GML (907 MB, 31M nodes) — benchmark code:

Parse only:

Mode               Time    Throughput  Memory
libxml DOM         7.11s   128 MB/s    4.19 GB
fastxml DOM        11.50s  79 MB/s     951 MB
fastxml Streaming  9.86s   92 MB/s     ~1 MB

Parse + Schema Validation:

Mode                    Time    Throughput  Memory
libxml DOM + validate   11.10s  82 MB/s     3.64 GB
fastxml DOM + validate  57.20s  16 MB/s     1.96 GB
fastxml Streaming       22.33s  41 MB/s     ~1 MB

  • DOM: 4.4x less memory than libxml when parsing only (951 MB vs 4.19 GB), about 1.9x less with validation (1.96 GB vs 3.64 GB)
  • Streaming: ~41 MB/s with validation (92 MB/s parse only), using ~1 MB of memory regardless of file size
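
The Throughput column is file size divided by wall time, and the memory comparison is a ratio of the Memory column. A minimal sketch of that arithmetic, using only figures from the tables above (the function names are illustrative and not part of fastxml, and GB is assumed to mean 1000 MB):

```rust
/// Throughput in MB/s for a file of `size_mb` megabytes processed in `secs` seconds.
fn throughput_mb_per_s(size_mb: f64, secs: f64) -> f64 {
    size_mb / secs
}

/// How many times more memory the baseline uses than the alternative.
/// Both arguments must be in the same unit (MB here).
fn memory_ratio(baseline_mb: f64, ours_mb: f64) -> f64 {
    baseline_mb / ours_mb
}
```

For example, `throughput_mb_per_s(907.0, 22.33)` reproduces the streaming validation row, and `memory_ratio(4190.0, 951.0)` reproduces the parse-only DOM comparison.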

Installation

[dependencies]
fastxml = "0.6"

Cargo Features

Feature         Description
ureq            Sync HTTP client for schema fetching (recommended)
tokio           Async HTTP client for schema fetching (reqwest + tokio)
async-trait     Async trait support for custom implementations
compare-libxml  Enable libxml2 comparison tests

# Recommended: sync schema fetching
fastxml = { version = "0.6", features = ["ureq"] }

# Async schema fetching
fastxml = { version = "0.6", features = ["tokio"] }

Schema Fetchers

Fetcher              Description
FileFetcher          Local filesystem
UreqFetcher          Sync HTTP (requires ureq)
ReqwestFetcher       Async HTTP (requires tokio)
DefaultFetcher       File + sync HTTP combined (requires ureq for HTTP)
AsyncDefaultFetcher  File + async HTTP combined (requires tokio)

Traits:

Trait               Description
SchemaFetcher       Sync fetcher trait
AsyncSchemaFetcher  Async fetcher trait (requires tokio)

use fastxml::schema::{DefaultFetcher, SchemaFetcher};

let fetcher = DefaultFetcher::with_base_dir("/path/to/schemas");
let result = fetcher.fetch("schema.xsd")?;

Quick Start

DOM Parsing

use fastxml::{parse, evaluate};

let xml = r#"<root><item id="1">Hello</item><item id="2">World</item></root>"#;

let doc = parse(xml.as_bytes())?;
let result = evaluate(&doc, "//item")?;
for node in result.into_nodes() {
    println!("{}: {}", node.get_attribute("id").unwrap(), node.get_content().unwrap());
}

Streaming Parser

Process large files with minimal memory:

use fastxml::event::{StreamingParser, XmlEvent, XmlEventHandler};
use std::io::BufReader;
use std::fs::File;

struct Counter { count: usize }

impl XmlEventHandler for Counter {
    fn handle(&mut self, event: &XmlEvent) -> fastxml::error::Result<()> {
        if let XmlEvent::StartElement { .. } = event {
            self.count += 1;
        }
        Ok(())
    }
}

let file = File::open("large_file.xml")?;
let mut parser = StreamingParser::new(BufReader::new(file));
parser.add_handler(Box::new(Counter { count: 0 }));
parser.parse()?;
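
To see why an event-driven pass can stay at a fixed memory footprint, here is a toy sketch that uses only the standard library. It is not how fastxml's StreamingParser is implemented, and `count_start_tags` is a hypothetical name; it only shows the shape of the idea: inspect one buffer at a time, keep a small amount of state, and never hold the whole document.

```rust
use std::io::{self, BufRead};

/// Counts element start tags by scanning the input one buffer at a time.
/// Toy logic: it does not understand comments, CDATA sections, or
/// processing-instruction bodies, so it is not a real XML parser.
fn count_start_tags<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut count = 0;
    // Carried across buffer boundaries so a '<' at the end of one buffer
    // is still paired with the first byte of the next.
    let mut prev_was_lt = false;
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            return Ok(count); // end of input
        }
        for &b in buf {
            // '<' followed by '/', '?' or '!' is an end tag, a declaration,
            // or a comment/doctype, not a start tag.
            if prev_was_lt && b != b'/' && b != b'?' && b != b'!' {
                count += 1;
            }
            prev_was_lt = b == b'<';
        }
        let len = buf.len();
        reader.consume(len); // release the buffer before reading more
    }
}
```

The only state that survives between buffers is a counter and one flag, which is why memory use does not grow with input size.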

Stream Transform

Transform XML with XPath-based element selection:

use fastxml::transform::StreamTransformer;

let xml = r#"<root><item id="1">A</item><item id="2">B</item></root>"#;

// Modify elements (supports multiple handlers)
let result = StreamTransformer::new(xml)
    .on("//item[@id='2']", |node| node.set_attribute("modified", "true"))
    .run()?
    .to_string()?;

// Extract data (single XPath)
let ids: Vec<String> = StreamTransformer::new(xml)
    .collect("//item", |node| node.get_attribute("id").unwrap_or_default())?;

// Extract data from multiple XPaths in a single pass
let (ids, contents): (Vec<String>, Vec<String>) = StreamTransformer::new(xml)
    .collect_multi((
        ("//item", |node| node.get_attribute("id").unwrap_or_default()),
        ("//item", |node| node.get_content().unwrap_or_default()),
    ))?;

// Iterate for side effects (no output transformation)
let mut ids = Vec::new();
StreamTransformer::new(xml)
    .on("//item", |node| {
        ids.push(node.get_attribute("id").unwrap_or_default());
    })
    .for_each()?;

Auto-detect Namespaces

Extract namespace declarations from the root element without DOM parsing:

let xml = r#"<root xmlns:gml="http://www.opengis.net/gml"><gml:point/></root>"#;

StreamTransformer::new(xml)
    .with_root_namespaces()?  // Auto-registers namespaces from root element
    .on("//gml:point", |node| node.set_attribute("found", "true"))
    .run()?;

Namespace URI Matching

Match elements by namespace URI instead of prefix (useful when different prefixes map to the same URI):

// Matches both gml:feature and g:feature if they have the same namespace URI
StreamTransformer::new(xml)
    .namespace("gml", "http://www.opengis.net/gml")
    .on("//*[namespace-uri()='http://www.opengis.net/gml'][local-name()='feature']", |node| {
        // Matches any prefix that maps to this URI
    })
    .run()?;
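
The reason URI matching works across prefixes is that a qualified name is compared by its expanded form, (namespace URI, local name), not by its prefix. A minimal standard-library sketch of that expansion follows; `expand` is a hypothetical helper for illustration and not a fastxml function.

```rust
use std::collections::HashMap;

/// Expands a qualified name such as "gml:feature" into
/// (namespace URI, local name) using the prefix bindings in scope.
/// Returns None when the prefix is not bound.
fn expand(qname: &str, bindings: &HashMap<&str, &str>) -> Option<(String, String)> {
    let (prefix, local) = match qname.split_once(':') {
        Some((p, l)) => (p, l),
        None => ("", qname), // unprefixed name
    };
    let uri = if prefix.is_empty() {
        // Unprefixed elements take the default namespace ("" key), if declared.
        bindings.get("").copied().unwrap_or("")
    } else {
        bindings.get(prefix).copied()?
    };
    Some((uri.to_string(), local.to_string()))
}
```

With `gml` bound to `http://www.opengis.net/gml` in one document and `g` bound to the same URI in another, `expand("gml:feature", ..)` and `expand("g:feature", ..)` produce equal values, which is the comparison the `namespace-uri()` and `local-name()` predicates express.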

Parent Context Access

Access ancestor elements' information during streaming transformation:

StreamTransformer::new(xml)
    .on_with_context("//item", |node, ctx| {
        // Get parent element info
        if let Some(parent) = ctx.parent() {
            node.set_attribute("parent_name", &parent.name);
        }

        // Get path-based ID (e.g., "root/items/item[2]")
        let path = ctx.path_id();
        node.set_attribute("path", &format!("{}/item[{}]", path, ctx.position()));
    })
    .run()?;
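
Ancestor information is available during a single pass because a streaming processor can keep a stack of the currently open elements. The sketch below shows one way such a stack could yield a path like "root/items/item[2]". `PathTracker` and its path format are assumptions made for illustration; they are not fastxml's context type.

```rust
use std::collections::HashMap;

/// One open element: its name, its 1-based position among same-named
/// siblings, and counters for the children seen so far.
struct Frame {
    name: String,
    position: usize,
    child_counts: HashMap<String, usize>,
}

/// Tracks the open-element stack during a streaming pass.
struct PathTracker {
    // stack[0] is a nameless frame standing for the document itself.
    stack: Vec<Frame>,
}

impl PathTracker {
    fn new() -> Self {
        let document = Frame { name: String::new(), position: 1, child_counts: HashMap::new() };
        PathTracker { stack: vec![document] }
    }

    /// Call on every start tag.
    fn start_element(&mut self, name: &str) {
        let parent = self.stack.last_mut().expect("document frame is always present");
        let n = parent.child_counts.entry(name.to_string()).or_insert(0);
        *n += 1;
        let position = *n;
        self.stack.push(Frame { name: name.to_string(), position, child_counts: HashMap::new() });
    }

    /// Call on every end tag.
    fn end_element(&mut self) {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }

    /// Path of the current element; an index is written only from the
    /// second same-named sibling onwards.
    fn path(&self) -> String {
        self.stack[1..]
            .iter()
            .map(|f| if f.position > 1 { format!("{}[{}]", f.name, f.position) } else { f.name.clone() })
            .collect::<Vec<_>>()
            .join("/")
    }
}
```

Memory here grows with nesting depth and the number of distinct child names per open element, not with document size.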

XPath Streamability Check

Check if an XPath can be processed in a single streaming pass:

use fastxml::transform::{is_streamable, analyze_xpath_str, XPathAnalysis};

// Quick check
if is_streamable("//item[@id='1']") {
    println!("Single-pass streaming OK");
}

// Detailed analysis
match analyze_xpath_str("//item[last()]")? {
    XPathAnalysis::Streamable(_) => println!("Streamable"),
    XPathAnalysis::NotStreamable(reason) => {
        println!("Not streamable: {}", reason);
        // Output: "Not streamable: uses last() function which requires knowing total count"
    }
}
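
The difference between a streamable and a non-streamable predicate can be illustrated without any XML. In the sketch below (hypothetical helpers, not fastxml's analysis), a positional predicate is decided as soon as the matching item arrives, while `last()` cannot be decided until the input is exhausted. Each function also returns how many items had to be read before the decision.

```rust
/// Analogue of `//item[position()=n]`: decided the moment the n-th item arrives.
fn select_nth<T>(items: impl Iterator<Item = T>, n: usize) -> Option<(T, usize)> {
    let mut read = 0;
    for item in items {
        read += 1;
        if read == n {
            return Some((item, read)); // no need to look further
        }
    }
    None
}

/// Analogue of `//item[last()]`: every item is only a candidate until the
/// stream ends, so the whole input must be read first.
fn select_last<T>(items: impl Iterator<Item = T>) -> Option<(T, usize)> {
    let mut read = 0;
    let mut candidate = None;
    for item in items {
        read += 1;
        candidate = Some(item); // may still be superseded by a later item
    }
    candidate.map(|item| (item, read))
}
```

A transformer that must emit output as it goes cannot hold back every candidate this way, which is the kind of constraint the "requires knowing total count" message refers to.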

Fallback Control

By default, non-streamable XPath expressions return an error. Enable fallback for two-pass processing:

// Default: error on non-streamable XPath
let result = StreamTransformer::new(xml)
    .on("//item[last()]", |_| {})
    .run();
// => Err(NotStreamable { ... })

// Enable fallback (loads entire document into memory)
let result = StreamTransformer::new(xml)
    .allow_fallback()
    .on("//item[last()]", |_| {})
    .run()?;

Async Schema Resolution

Parse XSD schemas with async import/include resolution (requires tokio feature):

use fastxml::schema::{
    AsyncDefaultFetcher, InMemoryStore,
    parse_xsd_with_imports_async,
};

#[tokio::main]
async fn main() -> fastxml::error::Result<()> {
    let xsd_content = std::fs::read("schema.xsd")?;

    // Create async fetcher and cache store
    let fetcher = AsyncDefaultFetcher::new()?;
    let store = InMemoryStore::new();

    // Parse schema with async import resolution
    let schema = parse_xsd_with_imports_async(
        &xsd_content,
        "http://example.com/schema.xsd",
        &fetcher,
        &store,
    ).await?;

    println!("Parsed {} types", schema.types.len());
    Ok(())
}

The async resolver:

  • Fetches imported schemas asynchronously via HTTP
  • Caches fetched schemas in the provided store
  • Resolves nested imports (A → B → C)
  • Detects circular dependencies
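
The behaviours listed above (caching, nested imports, cycle detection) can be sketched synchronously with the standard library. This is an illustration under stated assumptions, not fastxml's resolver: `Resolver` is a hypothetical type, `sources` is an in-memory map standing in for HTTP fetches, and cycles are recorded rather than treated as errors.

```rust
use std::collections::{HashMap, HashSet};

/// Depth-first import resolver over an in-memory "network".
struct Resolver<'a> {
    sources: &'a HashMap<&'a str, Vec<&'a str>>, // URL -> URLs it imports
    store: HashSet<String>,                      // schemas already fetched (the cache)
    chain: Vec<String>,                          // imports currently being resolved
    fetch_count: usize,                          // number of simulated fetches
    cycles: Vec<String>,                         // circular import chains found
}

impl<'a> Resolver<'a> {
    fn new(sources: &'a HashMap<&'a str, Vec<&'a str>>) -> Self {
        Resolver { sources, store: HashSet::new(), chain: Vec::new(), fetch_count: 0, cycles: Vec::new() }
    }

    fn resolve(&mut self, url: &str) -> Result<(), String> {
        // Circular import: the URL is already on the current import chain.
        if self.chain.iter().any(|u| u == url) {
            self.cycles.push(format!("{} -> {}", self.chain.join(" -> "), url));
            return Ok(());
        }
        // Cache hit: fetched earlier through another import path.
        if self.store.contains(url) {
            return Ok(());
        }
        let sources = self.sources;
        let imports = sources.get(url).ok_or_else(|| format!("cannot fetch {}", url))?;
        self.fetch_count += 1;
        self.store.insert(url.to_string());
        // Descend into nested imports (A -> B -> C).
        self.chain.push(url.to_string());
        for import in imports {
            self.resolve(import)?;
        }
        self.chain.pop();
        Ok(())
    }
}
```

Checking the import chain before the cache is what distinguishes a true cycle from a schema that is merely shared by two importers.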

See examples/async_schema_resolution.rs for more examples.

Schema Validation

DOM Validation

use fastxml::{parse, validate_document_by_schema};

let doc = parse(std::fs::read("document.xml")?.as_slice())?;
let errors = validate_document_by_schema(&doc, "schema.xsd".to_string())?;

if errors.is_empty() {
    println!("Valid!");
}

Streaming Validation

Validate during parsing with minimal memory:

use fastxml::schema::StreamValidator;
use std::sync::Arc;

let schema = Arc::new(fastxml::schema::parse_xsd(&std::fs::read("schema.xsd")?)?);
let file = std::fs::File::open("document.xml")?;
let reader = std::io::BufReader::new(file);

let errors = StreamValidator::new(schema)
    .with_max_errors(100)
    .validate(reader)?;

Auto-detect Schema

Fetch schemas from xsi:schemaLocation automatically (requires ureq feature):

use fastxml::{parse, validate_with_schema_location};

let doc = parse(xml_bytes)?;
let errors = validate_with_schema_location(&doc)?;

For streaming:

use fastxml::streaming_validate_with_schema_location;

let errors = streaming_validate_with_schema_location(reader)?;
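
For background, an xsi:schemaLocation value is a whitespace-separated list of pairs, each a namespace URI followed by the location of a schema for it. The sketch below splits such a value into pairs using only the standard library. `parse_schema_location` is a hypothetical helper, not the function fastxml uses internally.

```rust
/// Splits an `xsi:schemaLocation` value into (namespace URI, schema location)
/// pairs. Returns None when the tokens do not pair up.
fn parse_schema_location(value: &str) -> Option<Vec<(&str, &str)>> {
    // Any run of spaces, tabs, or newlines separates tokens.
    let tokens: Vec<&str> = value.split_whitespace().collect();
    if tokens.len() % 2 != 0 {
        return None; // a namespace without a location, or the reverse
    }
    Some(tokens.chunks(2).map(|pair| (pair[0], pair[1])).collect())
}
```

Each location in the result is what a schema fetcher would then be asked to retrieve.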

Async Validation

Validate with async schema fetching (requires tokio feature):

use fastxml::{parse, validate_with_schema_location_async};

#[tokio::main]
async fn main() -> fastxml::error::Result<()> {
    let xml_bytes = std::fs::read("document.xml")?;
    let doc = parse(xml_bytes.as_slice())?;
    let errors = validate_with_schema_location_async(&doc).await?;
    Ok(())
}

Or get the compiled schema for reuse:

use fastxml::get_schema_from_schema_location_async;

let schema = get_schema_from_schema_location_async(&xml_bytes).await?;

Validation Errors

use fastxml::ErrorLevel;

for error in &errors {
    match error.level {
        ErrorLevel::Warning => print!("[WARN] "),
        ErrorLevel::Error => print!("[ERROR] "),
        ErrorLevel::Fatal => print!("[FATAL] "),
    }
    if let Some(line) = error.line {
        print!("line {}: ", line);
    }
    println!("{}", error.message);
}

XPath

Basic Usage

use fastxml::{parse, evaluate};

let doc = parse(xml)?;
let result = evaluate(&doc, "//item[@id='1']/text()")?;

With Namespaces

let xml = r#"
<core:CityModel xmlns:core="http://www.opengis.net/citygml/2.0"
                xmlns:bldg="http://www.opengis.net/citygml/building/2.0"
                xmlns:gml="http://www.opengis.net/gml">
    <bldg:Building gml:id="bldg_001">
        <bldg:measuredHeight>25.5</bldg:measuredHeight>
    </bldg:Building>
</core:CityModel>"#;

let doc = parse(xml.as_bytes())?;
let buildings = evaluate(&doc, "//bldg:Building")?;

Supported Specifications

XPath 1.0

Feature     Examples
Paths       /root/child, //element, //*
Predicates  [@id='1'], [position()=1], [name()='foo']
Axes        ancestor::, following-sibling::, namespace::
Operators   and, or, not(), =, !=, <, >, +, -, *, div, mod
Functions   count(), contains(), string(), number(), sum(), etc.
Namespaces  //ns:element, namespace::*
Variables   $var
Union       //a | //b

XSD Schema

Feature Support
Element/attribute definitions
Complex types (sequence/choice/all)
Simple types (restriction/list/union)
Type inheritance
Facets
Attribute/model groups
import/include/redefine
Built-in XSD and GML types
Identity constraints (unique/key/keyref)
Substitution groups

Not Supported

  • XQuery, XSLT, XInclude
  • DTD validation
  • XML Signature/Encryption
  • Catalog support
  • Full entity expansion

Development

cargo test                              # Run tests
cargo test --features tokio             # With async tests
cargo test --features compare-libxml    # With libxml comparison
cargo bench                             # Benchmarks

Examples

# Async schema resolution
cargo run --example async_schema_resolution --features tokio

# Schema validation
cargo run --example schema_validation --features ureq

# Benchmark CLI
cargo run --release --example bench -- ./file.xml
cargo run --release --features ureq --example bench -- ./file.xml --validate

License

MIT OR Apache-2.0