obj2xml-rs 0.2.0

High-performance, memory-efficient XML to Dict, Dict to XML for Python, written in Rust.

Obj2XML-rs

High-performance, memory-efficient XML serializer and parser for Python, written in Rust.

A fast, deterministic, streaming-capable JSON↔XML tool with Python ergonomics. obj2xml-rs is a drop-in replacement for libraries like xmltodict, designed for speed, scalability, and correctness. It uses Rust's zero-copy optimizations and streaming to handle massive datasets without exhausting system memory.

Features

  • Blazing Fast: Built on quick-xml with zero-copy (Cow<str>) optimizations. 5-15x faster than pure-Python libraries.
  • True Streaming: Supports Python Generators and Iterators. Writes huge XML files item-by-item directly to disk.
  • Robust Error Context: Exceptions include the full XML path (e.g., Error at root/users/[3]/@id).
  • Safe: Includes cycle detection to prevent infinite recursion crashes.
  • Professional Spec: Supports Namespaces, CDATA, Comments, Processing Instructions, and deterministic attribute sorting.
  • Pythonic: Supports default handlers for custom types (like datetime), similar to json.dump.

Installation

pip install obj2xml-rs

Quick Start

1. Unparse (Dict → XML)

import obj2xml_rs

data = {
    "root": {
        "@id": "123",
        "name": "Rust",
        "features": ["Fast", "Safe"]
    }
}
print(obj2xml_rs.unparse(data, pretty=True))

Output:

<?xml version="1.0" encoding="utf-8"?>
<root id="123">
  <name>Rust</name>
  <features>Fast</features>
  <features>Safe</features>
</root>

2. Parse (XML → Dict)

xml = '<root id="1"><item>A</item><item>B</item></root>'
data = obj2xml_rs.parse(xml)
print(data)
# {'root': {'@id': '1', 'item': ['A', 'B']}}

3. Streaming (Low Memory Write)

Generate XML from a generator. Items are written to the output file incrementally instead of being collected in memory first.

def huge_data():
    for i in range(1_000_000):
        yield {"row": {"id": i, "val": f"data_{i}"}}

obj2xml_rs.unparse(
    huge_data(), 
    output="large.xml", 
    streaming=True, 
    item_name="row"
)
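
For intuition about what incremental writing buys, here is a minimal pure-Python sketch that uses only the standard library. It is not how obj2xml-rs is implemented; write_rows, rows, and the "rows" wrapper element are hypothetical names chosen for illustration. Each item is serialized and written as soon as the generator yields it, so only one row is held in memory at a time.

```python
import io
from xml.sax.saxutils import escape


def write_rows(items, out, item_name="row", root_name="rows"):
    """Write one element per item to `out` without materializing the input."""
    out.write(f"<{root_name}>")
    count = 0
    for item in items:  # pulls one item at a time from the generator
        out.write(f"<{item_name}>")
        for key, value in item.items():
            out.write(f"<{key}>{escape(str(value))}</{key}>")
        out.write(f"</{item_name}>")
        count += 1
    out.write(f"</{root_name}>")
    return count


def rows(n):
    """Hypothetical data source: yields flat dicts lazily."""
    for i in range(n):
        yield {"id": i, "val": f"data_{i}"}


buffer = io.StringIO()  # stands in for an open file object
written = write_rows(rows(3), buffer)
xml_text = buffer.getvalue()
```

Swapping the StringIO for a file opened in text mode gives the same behavior against disk.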

Specification & Behavior

This section defines how Python structures map to XML.

1. Reserved Keys

The following keys have special meaning in a dictionary:

Key | Description | Example
@key | XML attribute (prefix configurable) | {"@id": 1} → id="1"
#text | Element text content | {"tag": {"#text": "Hello"}} → <tag>Hello</tag>
#comment | XML comment | {"#comment": "Note"} → <!--Note-->
?key | Processing instruction | {"?xml-stylesheet": "href..."} → <?xml-stylesheet href...?>
#tail | Text immediately after the element's closing tag | {"b": {"#text": "Bold", "#tail": " text"}} → <b>Bold</b> text
__cdata__ | CDATA wrapper | {"#text": {"__cdata__": "x<y"}} → <![CDATA[x<y]]>
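
The reserved-key conventions can be modeled with the standard library's ElementTree. The sketch below is only an illustration of the mapping rules in the table, not the library's Rust implementation; build is a hypothetical helper, and it covers attributes, #text, #tail, and #comment only.

```python
import xml.etree.ElementTree as ET


def build(tag, value, attr_prefix="@"):
    """Build an Element from a dict using the reserved-key conventions."""
    elem = ET.Element(tag)
    if not isinstance(value, dict):
        elem.text = str(value)
        return elem
    for key, child in value.items():
        if key.startswith(attr_prefix):
            # "@id" becomes the attribute id="..."
            elem.set(key[len(attr_prefix):], str(child))
        elif key == "#text":
            elem.text = str(child)
        elif key == "#tail":
            elem.tail = str(child)  # text placed after the closing tag
        elif key == "#comment":
            elem.append(ET.Comment(str(child)))
        else:
            elem.append(build(key, child, attr_prefix))
    return elem


doc = {"@id": 1, "#text": "Hello ", "b": {"#text": "Bold", "#tail": " text"}}
result = ET.tostring(build("p", doc), encoding="unicode")
```

Here result is the string <p id="1">Hello <b>Bold</b> text</p>, which shows how #tail lands after the closing tag of the element that carries it.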

2. Element Mapping & Lists

  • Dict Keys: Map directly to XML Element names.
  • Lists: Keys containing a list generate repeated elements with the same name.
    {"items": {"item": [1, 2]}} 
    # <items><item>1</item><item>2</item></items>
    
  • Root Primitives: If the input is a list of primitives, they are wrapped in item_name.
    unparse([1, 2], item_name="n", full_document=False)
    # <n>1</n><n>2</n>
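
The list rule is the one that most often surprises newcomers: the list's key names each repeated element, and the list itself produces no wrapper. A small stand-alone sketch of that rule follows; to_xml is a hypothetical helper, not part of obj2xml-rs, and it ignores attributes and the other reserved keys.

```python
from xml.sax.saxutils import escape


def to_xml(name, value):
    """Serialize a value; a list becomes one element per entry, all named `name`."""
    if isinstance(value, list):
        return "".join(to_xml(name, entry) for entry in value)
    if isinstance(value, dict):
        inner = "".join(to_xml(key, child) for key, child in value.items())
        return f"<{name}>{inner}</{name}>"
    return f"<{name}>{escape(str(value))}</{name}>"


nested = to_xml("items", {"item": [1, 2]})  # list under a dict key
fragment = to_xml("n", [1, 2])              # root-level list of primitives
```

Both results match the two examples above: nested has a single items wrapper, while fragment is a bare sequence of n elements.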
    

3. Attributes & Sorting

  • Keys starting with attr_prefix (default "@") become attributes.
  • Values: Any serializable value is accepted. Dicts/Lists in attributes are stringified.
  • Sorting: Attributes follow Python insertion order by default. Use sort_attributes=True for deterministic output (attributes sorted lexicographically).
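
The difference between insertion order and sorted order can be sketched in a few lines. render_attributes is a hypothetical helper, and using str() to stringify non-string values is an assumption; the exact string form obj2xml-rs produces for dicts and lists is not specified here.

```python
from xml.sax.saxutils import quoteattr


def render_attributes(attrs, sort_attributes=False):
    """Render attributes in insertion order, or sorted by name when requested."""
    names = sorted(attrs) if sort_attributes else list(attrs)
    # quoteattr escapes the value and wraps it in quotes
    return " ".join(f"{name}={quoteattr(str(attrs[name]))}" for name in names)


attrs = {"z": 1, "a": [1, 2], "m": "x"}
insertion = render_attributes(attrs)
ordered = render_attributes(attrs, sort_attributes=True)
```

Sorted output is useful when the XML is diffed, hashed, or compared in tests, because it no longer depends on how the input dict was built.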

4. Namespaces

Namespaces can be declared in three ways:

  1. Static (Root Scope): Best practice for clean XML.
    unparse(data, namespaces={"soap": "http://example.com/soap"})
    # <root xmlns:soap="http://example.com/soap"> ...
    
  2. Inline Declarations:
    {"root": {"@xmlns:x": "urn:x", "x:child": 1}}
    
  3. Dynamic Assignment:
    {"tag": {"@ns": "urn:auto"}}
    # Automatically generates prefixes (ns0, ns1...)
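
One plausible way to generate prefixes such as ns0, ns1 is a small allocator keyed by namespace URI. The class below is a sketch under assumptions: PrefixAllocator is a hypothetical name, and reusing one prefix per URI and skipping names that collide with declared prefixes are design guesses, not documented obj2xml-rs behavior.

```python
class PrefixAllocator:
    """Hand out ns0, ns1, ... for namespace URIs, reusing known prefixes."""

    def __init__(self, declared=None):
        # URI -> prefix, seeded from a {prefix: uri} mapping like `namespaces=`
        self._by_uri = {uri: prefix for prefix, uri in (declared or {}).items()}
        self._counter = 0

    def prefix_for(self, uri):
        if uri not in self._by_uri:
            taken = set(self._by_uri.values())
            # skip generated names that collide with an existing prefix
            while f"ns{self._counter}" in taken:
                self._counter += 1
            self._by_uri[uri] = f"ns{self._counter}"
            self._counter += 1
        return self._by_uri[uri]


alloc = PrefixAllocator({"soap": "http://example.com/soap"})
first = alloc.prefix_for("urn:auto")
second = alloc.prefix_for("urn:other")
again = alloc.prefix_for("urn:auto")
known = alloc.prefix_for("http://example.com/soap")
```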
    

5. Advanced Nodes

  • CDATA: Use the __cdata__ key inside a text node.
  • Comments: Use #comment.
  • Processing Instructions: Keys starting with ?.
    {"root": {"?xml-stylesheet": 'type="text/xsl" href="style.xsl"'}}
    

6. Constraints & Validation Policies

  • XML Names: No validation of XML name syntax is performed. If you pass {"<invalid>": 1}, invalid XML will be generated.
  • Mixed Content: Mixed #text and child elements are allowed.
    {"p": {"#text": "Hello", "b": "World"}} 
    # Valid: <p>Hello<b>World</b></p>
    
  • Root Rules:
      • full_document=True (default): Requires exactly one root element.
      • full_document=False: Allows multiple roots (XML fragment).
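
Because names are not validated, callers who handle untrusted keys may want to check them before serializing. The following is a sketch of such a pre-check; invalid_names is a hypothetical helper, and the regular expression is a simplified ASCII-only approximation of the XML 1.0 Name production, so it rejects some legal non-ASCII names.

```python
import re

# Simplified, ASCII-only approximation of the XML 1.0 Name production.
_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:\-]*\Z")
_RESERVED = ("#text", "#comment", "#tail")


def _bare_name(key, attr_prefix):
    """Strip the attribute or processing-instruction marker from a key."""
    if key.startswith(attr_prefix):
        return key[len(attr_prefix):]
    if key.startswith("?"):
        return key[1:]
    return key


def invalid_names(data, attr_prefix="@", path=""):
    """Yield the path of every dict key that would not be a valid XML name."""
    if isinstance(data, dict):
        for key, value in data.items():
            here = f"{path}/{key}" if path else key
            if key not in _RESERVED and not _NAME.match(_bare_name(key, attr_prefix)):
                yield here
            yield from invalid_names(value, attr_prefix, here)
    elif isinstance(data, list):
        for index, entry in enumerate(data):
            yield from invalid_names(entry, attr_prefix, f"{path}/[{index}]")


bad = list(invalid_names({"root": {"@id": 1, "<invalid>": 1, "ok": [{"1st": 2}]}}))
```

An empty result means every key passed the simplified check; anything else lists the offending paths in the same style the library uses for its error messages.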

Error Handling

Errors are actionable and include the full path to the problematic node.

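import obj2xml_rs

def fail_serializer(obj):
    # Called for values the serializer cannot handle natively
    raise ValueError("Bad data")

# object() is not serializable, so the default handler is invoked
data = {"users": [{"name": "Alice", "meta": {"@date": object()}}]}

try:
    obj2xml_rs.unparse(data, default=fail_serializer)
except ValueError as e:
    print(e)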

Output:

Custom serialization failed: Bad data (at users/[0]/meta/@date)

  • Circular References: A RecursionError is raised if an object references itself.
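
Cycle detection of this kind is commonly done by tracking the containers on the current traversal stack. The sketch below shows that technique in pure Python; find_cycle is a hypothetical helper and is not claimed to mirror the library's internal algorithm. It returns the path where a container re-enters itself, which is the sort of context a path-aware error can report.

```python
def find_cycle(obj, path="", _active=None):
    """Return the path at which `obj` re-enters itself, or None if acyclic."""
    if not isinstance(obj, (dict, list)):
        return None
    active = _active if _active is not None else set()
    if id(obj) in active:  # container is already on the current traversal stack
        return path
    active.add(id(obj))
    try:
        if isinstance(obj, dict):
            children = list(obj.items())
        else:
            children = [(f"[{i}]", v) for i, v in enumerate(obj)]
        for key, child in children:
            child_path = f"{path}/{key}" if path else str(key)
            found = find_cycle(child, child_path, active)
            if found is not None:
                return found
    finally:
        active.discard(id(obj))  # leaving the node: shared references stay legal
    return None


shared = {"name": "Alice"}
acyclic = {"users": [shared, shared]}  # same object twice, but no cycle
cyclic = {"users": [{"name": "Bob"}]}
cyclic["users"][0]["self"] = cyclic    # points back at the root
```

Removing a container's id when the traversal leaves it is what distinguishes a true cycle from an object that is merely referenced in two places.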

API Reference

Unparse (Write)

def unparse(
    input: Union[Dict, Iterable, Any],
    *,
    output: Optional[Union[str, IO]] = None,
    encoding: str = "utf-8",
    full_document: bool = True,
    attr_prefix: str = "@",
    cdata_key: str = "#text",
    pretty: bool = False,
    indent: str = "  ",
    compat: str = "native",
    streaming: bool = False,
    default: Optional[Callable[[Any], str]] = None,
    item_name: str = "item",
    sort_attributes: bool = False,
    namespaces: Optional[Dict[str, str]] = None
) -> str:
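
The default parameter takes a callable from any object to str. A handler in the spirit of json.dump's default might look like the sketch below; xml_default is a hypothetical name, and raising TypeError for unknown types is an assumption borrowed from the json module rather than a documented requirement of obj2xml-rs.

```python
from datetime import date, datetime
from decimal import Decimal


def xml_default(obj):
    """Convert otherwise unsupported objects to strings."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)
    # Unknown types are rejected so that bad data is not silently stringified
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


stamp = xml_default(datetime(2024, 1, 2, 3, 4, 5))
amount = xml_default(Decimal("1.50"))
```

Such a handler would be passed as default=xml_default in the unparse call.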

Parse (Read)

def parse(
    xml_input: Union[str, bytes, IO],
    *,
    encoding: Optional[str] = None,
    attr_prefix: str = "@",
    cdata_key: str = "#text",
    force_cdata: bool = False,
    process_namespaces: bool = False,
    namespace_separator: str = ":",
    strip_whitespace: bool = True,
    force_list: Optional[Iterable[str]] = None,
    process_comments: bool = False
) -> Dict[str, Any]:
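
The force_list option exists because a tag's parsed shape depends on how often it occurs. When force_list is not used, consuming code can normalize the value itself. The helper below is a sketch that assumes, as in xmltodict, that a tag occurring once parses to a single value rather than a one-element list; as_list is a hypothetical name and the sample dicts are hand-written, not actual parser output.

```python
def as_list(value):
    """Normalize a parsed child that may be missing, a single value, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Hand-written stand-ins for parsed documents (assumed shapes)
one = {"root": {"item": "A"}}
many = {"root": {"@id": "1", "item": ["A", "B"]}}

single = as_list(one["root"]["item"])
several = as_list(many["root"]["item"])
missing = as_list(one["root"].get("other"))
```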

CLI Usage

JSON to XML (Unparse)

# Basic
python -m obj2xml_rs unparse input.json -o output.xml --pretty

# Streaming from Pipe
cat huge.json | python -m obj2xml_rs unparse --stream --item-name "record" > out.xml

XML to JSON (Parse)

# Convert XML file to JSON
python -m obj2xml_rs parse data.xml -o data.json --pretty

# Force specific tags to be lists
python -m obj2xml_rs parse data.xml --force-list item user

Python XML Library Comparison Matrix

Feature | obj2xml-rs | xmltodict | xmltodict-rs | dicttoxml | quick-xmltodict
Language | Rust (PyO3) | Python | Rust (PyO3) | Python | Rust (PyO3)
Capabilities | Read & Write | Read & Write | Read & Write | Write Only | Read Only
Write Speed | High | Low | High | Low | N/A
Write Memory Model | Streaming / Zero-Copy | In-Memory Object Graph | In-Memory String | In-Memory String | N/A
Stream Writing | Yes (Generators) | No | No | No | N/A
Async Support | Yes (asyncio) | No | No | No | N/A
Cycle Detection | Yes (detects cycles early and raises path-aware Python exceptions) | No (fails with RecursionError) | No (interpreter crash, SIGSEGV, on cyclic input) | No (fails with RecursionError) | N/A
Error Context | Path-Aware | Generic | Generic | Generic | N/A
Attributes | Deterministic | Insertion Order | Insertion Order | Non-deterministic unless pre-sorted | N/A
Namespaces | Yes | Yes | Yes | Limited | N/A

📄 License

This project is licensed under the Apache License 2.0.