Obj2XML-rs
High-performance, memory-efficient XML serializer and parser for Python, written in Rust.
A fast, deterministic, streaming-capable JSON↔XML tool with Python ergonomics. obj2xml-rs is a drop-in replacement for
libraries like xmltodict but designed for speed, scalability, and correctness.
It leverages Rust's zero-copy optimizations and streaming capabilities to handle massive datasets without exhausting system memory.
Features
- Blazing Fast: Built on `quick-xml` with zero-copy (`Cow<str>`) optimizations. 5-15x faster than pure Python.
- True Streaming: Supports Python generators and iterators. Writes huge XML files item-by-item directly to disk.
- Robust Error Context: Exceptions include the full XML path (e.g., `Error at root/users/[3]/@id`).
- Safe: Includes cycle detection to prevent infinite recursion crashes.
- Professional Spec: Supports Namespaces, CDATA, Comments, Processing Instructions, and deterministic attribute sorting.
- Pythonic: Supports `default` handlers for custom types (like `datetime`), similar to `json.dump`.
Installation
Quick Start
1. Unparse (Dict → XML)
Output:
Rust
Fast
Safe
2. Parse (XML → Dict)
# {'root': {'@id': '1', 'item': ['A', 'B']}}
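The result shape in the comment above can be reproduced with the standard library. The following is a minimal sketch of the mapping rules only (attributes become `@`-prefixed keys, repeated child tags collapse into a list). It uses `xml.etree.ElementTree`, not obj2xml-rs; `parse_sketch`, `element_to_value`, and the sample input document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Convert one element into a dict, or a plain string for a leaf."""
    # Attributes become keys with the "@" prefix.
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_value(child)
        if child.tag in node:
            existing = node[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                # Second occurrence of a tag: promote to a list.
                node[child.tag] = [existing, value]
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element without attributes or children
    if text:
        node["#text"] = text
    return node


def parse_sketch(xml_text):
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_value(root)}


# Assumed sample input; the original example input is not shown here.
result = parse_sketch('<root id="1"><item>A</item><item>B</item></root>')
print(result)
```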
3. Streaming (Low Memory Write)
Generate XML from a generator. Writes to file incrementally.
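To make the idea of incremental writing concrete, here is a minimal pure-Python sketch: each item pulled from a generator is written out immediately, so the full dataset is never held in memory. It illustrates the pattern only and is not the obj2xml-rs API; `stream_items` and `numbers` are hypothetical names.

```python
import io
from typing import Iterable, Iterator, TextIO
from xml.sax.saxutils import escape


def stream_items(out: TextIO, root: str, item_name: str, items: Iterable[object]) -> int:
    """Write <root><item>...</item>...</root>, one item at a time."""
    count = 0
    out.write(f"<{root}>")
    for value in items:
        # Only the current item is in memory; earlier ones are already written.
        out.write(f"<{item_name}>{escape(str(value))}</{item_name}>")
        count += 1
    out.write(f"</{root}>")
    return count


def numbers(n: int) -> Iterator[int]:
    """A generator standing in for a large data source."""
    for i in range(n):
        yield i


# An in-memory buffer is used here; a real file object works the same way.
buf = io.StringIO()
written = stream_items(buf, "root", "item", numbers(3))
print(buf.getvalue())
```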
Specification & Behavior
This section defines how Python structures map to XML.
1. Reserved Keys
The following keys have special meaning in a dictionary:
| Key | Description | Example |
|---|---|---|
| `@key` | XML Attribute (prefix configurable) | `{"@id": 1}` → `id="1"` on the enclosing element |
| `#text` | Element text content | `{"tag": {"#text": "Hello"}}` → `<tag>Hello</tag>` |
| `#comment` | XML Comment | `{"#comment": "Note"}` → `<!--Note-->` |
| `?key` | Processing Instruction | `{"?xml-stylesheet": "href..."}` → `<?xml-stylesheet href...?>` |
| `#tail` | Text content appearing immediately after the element's closing tag | `{"b": {"#text": "Bold", "#tail": " text"}}` → `<b>Bold</b> text` |
| `__cdata__` | CDATA Wrapper | `{"#text": {"__cdata__": "x<y"}}` → `<![CDATA[x<y]]>` |
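As a rough illustration of how several reserved keys could be interpreted, the sketch below builds an element tree with the standard library's `xml.etree.ElementTree`. It covers only attributes, `#text`, `#tail`, and `#comment`, and it is not how obj2xml-rs is implemented; the `build` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def build(tag, value, attr_prefix="@"):
    """Build an Element from a value, honoring a few reserved keys."""
    elem = ET.Element(tag)
    if not isinstance(value, dict):
        elem.text = str(value)
        return elem
    for key, item in value.items():
        if key.startswith(attr_prefix):
            # "@id" -> attribute "id"
            elem.set(key[len(attr_prefix):], str(item))
        elif key == "#text":
            elem.text = str(item)
        elif key == "#tail":
            # Text that follows this element's closing tag.
            elem.tail = str(item)
        elif key == "#comment":
            elem.append(ET.Comment(str(item)))
        else:
            elem.append(build(key, item, attr_prefix))
    return elem


doc = {"p": {"@id": 1, "#comment": "Note", "b": {"#text": "Bold", "#tail": " text"}}}
root = build("p", doc["p"])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```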
2. Element Mapping & Lists
- Dict Keys: Map directly to XML Element names.
- Lists: Keys containing a list generate repeated elements with the same name, e.g. `<items><item>1</item><item>2</item></items>`.
- Root Primitives: If the input is a list of primitives, they are wrapped in `item_name`, e.g. `<n>1</n><n>2</n>`.
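The list rule can be sketched in a few lines of plain Python: a list does not create an element of its own, it repeats the element named by its key. This is an illustration of the mapping, not the library's code; `to_xml` and both sample inputs are assumptions.

```python
from xml.sax.saxutils import escape


def to_xml(name, value):
    """Serialize a value under the element name `name`."""
    if isinstance(value, list):
        # A list repeats the same element name once per item.
        return "".join(to_xml(name, item) for item in value)
    if isinstance(value, dict):
        inner = "".join(to_xml(key, item) for key, item in value.items())
        return f"<{name}>{inner}</{name}>"
    return f"<{name}>{escape(str(value))}</{name}>"


nested = to_xml("items", {"item": [1, 2]})
print(nested)

# A bare list of primitives, wrapped in an assumed item name "n".
flat = to_xml("n", [1, 2])
print(flat)
```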
3. Attributes & Sorting
- Keys starting with `attr_prefix` (default `"@"`) become attributes.
- Values: Any serializable value is accepted. Dicts/Lists in attributes are stringified.
- Sorting: Attributes follow Python insertion order by default. Use `sort_attributes=True` for deterministic output (attributes sorted lexicographically).
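The difference between insertion order and sorted order is easy to see in a small sketch. The `render_attrs` helper below is hypothetical, and it assumes that non-string values are stringified with `str()`; the exact string form the library produces for dicts and lists is not specified here.

```python
from xml.sax.saxutils import quoteattr


def render_attrs(node, attr_prefix="@", sort_attributes=False):
    """Render the attribute keys of `node` as an XML attribute string."""
    attrs = [
        (key[len(attr_prefix):], value)
        for key, value in node.items()
        if key.startswith(attr_prefix)
    ]
    if sort_attributes:
        # Lexicographic order makes output independent of insertion order.
        attrs.sort(key=lambda pair: pair[0])
    # Assumption: values are stringified with str() before quoting.
    return " ".join(f"{name}={quoteattr(str(value))}" for name, value in attrs)


node = {"@z": 1, "@a": [1, 2], "child": "x"}
insertion_order = render_attrs(node)
sorted_order = render_attrs(node, sort_attributes=True)
print(insertion_order)
print(sorted_order)
```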
4. Namespaces
Namespaces can be declared in three ways:
- Static (Root Scope): Best practice for clean XML. Declarations are emitted on the root element, e.g. `<root xmlns:soap="http://example.com/soap"> ...`.
- Inline Declarations
- Dynamic Assignment: Automatically generates prefixes (`ns0`, `ns1`, ...).
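For readers unfamiliar with generated prefixes, Python's standard `xml.etree.ElementTree` behaves in a comparable way and can serve as a reference point. The sketch below uses only the standard library, not obj2xml-rs, and the element names and the second namespace URI are made up for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://example.com/soap"
DATA_NS = "http://example.com/data"  # assumed second namespace

root = ET.Element(f"{{{SOAP_NS}}}Envelope")
ET.SubElement(root, f"{{{DATA_NS}}}item").text = "1"

# With no registered prefix, ElementTree generates ns0, ns1, ...
auto = ET.tostring(root, encoding="unicode")
print(auto)

# Registering a prefix up front yields a readable, stable declaration.
# Note: register_namespace changes process-wide state.
ET.register_namespace("soap", SOAP_NS)
static = ET.tostring(root, encoding="unicode")
print(static)
```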
5. Advanced Nodes
- CDATA: Use the `__cdata__` key inside a text node.
- Comments: Use `#comment`.
- Processing Instructions: Keys starting with `?`.
6. Constraints & Validation Policies
- XML Names: No validation of XML name syntax is performed. If you pass `{"<invalid>": 1}`, invalid XML will be generated.
- Mixed Content: Mixed `#text` and child elements are allowed, e.g. `<p>Hello<b>World</b></p>` is valid.
- Root Rules:
  - `full_document=True` (default): Requires exactly one root element.
  - `full_document=False`: Allows multiple roots (XML Fragment).
Error Handling
Errors are actionable and include the full path to the problematic node.
Output:
Custom serialization failed: Bad data (at users/[0]/meta/@date)
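One way such a path-aware message can come about is by tracking the path during traversal and wrapping any failure of the `default` handler. The sketch below is pure Python and only mirrors the message format shown above; `SerializationError`, `walk`, and `strict_default` are hypothetical names, not part of the library.

```python
from datetime import datetime


class SerializationError(Exception):
    """Hypothetical error type used for this illustration."""


def walk(value, default, path=()):
    """Copy `value`, passing unknown types to `default` and tracking the path."""
    if isinstance(value, dict):
        return {k: walk(v, default, path + (k,)) for k, v in value.items()}
    if isinstance(value, list):
        return [walk(v, default, path + (f"[{i}]",)) for i, v in enumerate(value)]
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    try:
        return default(value)
    except Exception as exc:
        location = "/".join(path)
        raise SerializationError(
            f"Custom serialization failed: {exc} (at {location})"
        ) from exc


def strict_default(obj):
    raise ValueError("Bad data")


data = {"users": [{"meta": {"@date": datetime(2024, 1, 1)}}]}

# A handler that succeeds, in the spirit of json.dump's `default`.
converted = walk(data, lambda obj: obj.isoformat())

try:
    walk(data, strict_default)
except SerializationError as err:
    message = str(err)
    print(message)
```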
- Circular References: A `RecursionError` is raised if an object references itself.
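Cycle detection of this kind is commonly done by remembering which containers are on the current traversal path. The following is a minimal sketch of that technique in pure Python, not the library's Rust implementation; `check_cycles` and the message wording are assumptions.

```python
def check_cycles(value, path=(), active=None):
    """Raise RecursionError if a dict or list contains itself."""
    active = set() if active is None else active
    if not isinstance(value, (dict, list)):
        return
    if id(value) in active:
        location = "/".join(path)
        raise RecursionError(f"Circular reference detected (at {location})")
    # Mark this container as being on the current path.
    active.add(id(value))
    try:
        if isinstance(value, dict):
            items = value.items()
        else:
            items = ((f"[{i}]", item) for i, item in enumerate(value))
        for key, child in items:
            check_cycles(child, path + (str(key),), active)
    finally:
        # Leaving the container: shared but acyclic references stay legal.
        active.discard(id(value))


node = {"name": "a"}
node["self"] = node  # the dict now references itself

try:
    check_cycles(node)
    cycle_message = None
except RecursionError as err:
    cycle_message = str(err)
    print(cycle_message)

# The same list used twice is not a cycle and passes the check.
shared = [1, 2]
check_cycles({"a": shared, "b": shared})
```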
API Reference
Unparse (Write)
Parse (Read)
CLI Usage
JSON to XML (Unparse)
# Basic
# Streaming from Pipe
XML to JSON (Parse)
# Convert XML file to JSON
# Force specific tags to be lists
Python XML Library Comparison Matrix
| Feature | obj2xml-rs | xmltodict | xmltodict-rs | dicttoxml | quick-xmltodict |
|---|---|---|---|---|---|
| Language | Rust (PyO3) | Python | Rust (PyO3) | Python | Rust (PyO3) |
| Capabilities | Read & Write | Read & Write | Read & Write | Write Only | Read Only |
| Write Speed | High | Low | High | Low | N/A |
| Write Memory Model | Streaming / Zero-Copy | In-Memory Object Graph | In-Memory String | In-Memory String | N/A |
| Stream Writing | Yes (Generators) | No | No | No | N/A |
| Async Support | Yes (asyncio) | No | No | No | N/A |
| Cycle Detection | Yes, detects cycles early and raises path-aware Python exceptions | No, fails with RecursionError | No, causes interpreter crash (SIGSEGV) on cyclic input | No, fails with RecursionError | N/A |
| Error Context | Path-Aware | Generic | Generic | Generic | N/A |
| Attributes | Deterministic | Insertion Order | Insertion Order | Non-deterministic unless pre-sorted | N/A |
| Namespaces | Yes | Yes | Yes | Limited | N/A |
📄 License
This project is licensed under the Apache License 2.0.