xml-disassembler
Disassemble large XML files into smaller files and reassemble the original XML. Preserves the XML declaration, root namespace, and element order so that a full round-trip (disassemble → reassemble) reproduces the original file contents.
Note: This is a Rust implementation of the original TypeScript xml-disassembler.
Table of contents
- Quick start
- Features
- Installation
- Usage
- Disassembly strategies
- Ignore file
- Logging
- XML parser
- Testing
- License
- Contribution
Quick start
# Disassemble: one XML → many small files
# Reassemble: many small files → one XML
Features
- Disassemble – Split a single XML file (or directory of XML files) into many smaller files, grouped by structure.
- Reassemble – Merge disassembled files back into the original XML. Uses the XML declaration and root attributes from the disassembled files, with sensible defaults when missing.
- Multiple formats – Output (and reassemble from) XML, INI, JSON, JSON5, TOML, or YAML.
- Strategies – unique-id (one file per nested element) or grouped-by-tag (one file per tag).
- Ignore rules – Exclude paths via a .xmldisassemblerignore file (same style as .gitignore).
- Round-trip safe – Disassembled output includes the original XML declaration and xmlns on the root; reassembly preserves order and content so the result matches the source.
- Library API – Use DisassembleXmlFileHandler, ReassembleXmlFileHandler, parse_xml, and build_xml_string from your own Rust code.
Installation
From crates.io
- Install the Rust toolchain (rust-lang.org/tools/install).
- Run:
From source
The binary will be at target/release/xml-disassembler (or xml-disassembler.exe on Windows).
Usage
CLI
# Disassemble an XML file or directory (output written alongside the source)
# Reassemble a disassembled directory (writes one XML file next to the directory)
# Parse and rebuild a single XML file (useful for testing the parser)
Disassemble options
| Option | Description | Default |
|---|---|---|
| --unique-id-elements <list> | Comma-separated element names used to derive filenames for nested elements | (none) |
| --prepurge | Remove existing disassembly output before running | false |
| --postpurge | Delete original file/directory after disassembling | false |
| --ignore-path <path> | Path to the ignore file | .xmldisassemblerignore |
| --format <fmt> | Output format: xml, ini, json, json5, toml, yaml | xml |
| --strategy <name> | unique-id or grouped-by-tag | unique-id |
Reassemble options
| Option | Description | Default |
|---|---|---|
| <extension> | File extension/suffix for the rebuilt XML (e.g. permissionset-meta.xml) | xml |
| --postpurge | Delete disassembled directory after successful reassembly | false |
Examples:
# Creates fixtures/general/HR_Admin/ with disassembled files
# Creates fixtures/general/HR_Admin.xml
As a library
use ;
async
Disassembly strategies
unique-id (default)
Each nested element is written to its own file, named by a unique identifier (or an 8-character SHA-256 hash if no UID is available). Leaf content stays in a file named after the original XML.
Best for fine-grained diffs and version control.
- UID-based layout – When you provide --unique-id-elements (e.g. name,id,apexClass), nested elements are named by the first matching field value. For Salesforce flows, a typical list might be: apexClass,name,object,field,layout,actionName,targetReference,assignToReference,choiceText,promptText. Using unique ID elements also ensures predictable sorting in the reassembled output.
- Hash-based layout – When no unique ID is found, elements are named with an 8-character hash of their content (e.g. 419e0199.botMlDomain-meta.xml).
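The naming rule above can be sketched in a few lines. This is a minimal illustration, not the crate's implementation: the Element type and file_stem function are hypothetical, candidates are assumed to be tried in the order they appear in the list, and the standard library's DefaultHasher stands in for SHA-256 so the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical simplified model of a nested element:
/// its child (field name, text value) pairs in document order.
pub struct Element {
    pub fields: Vec<(String, String)>,
}

/// Returns a file-name stem for `element`.
/// Assumption: unique-id candidates are tried in the order given.
pub fn file_stem(element: &Element, unique_id_elements: &[&str]) -> String {
    for candidate in unique_id_elements {
        let hit = element
            .fields
            .iter()
            .find(|(name, value)| name.as_str() == *candidate && !value.is_empty());
        if let Some((_, value)) = hit {
            // UID-based layout: use the first matching field value.
            return value.clone();
        }
    }
    // Hash-based layout: 8 hex characters derived from the content.
    // NOTE: DefaultHasher is a stand-in for illustration; the tool is
    // described as using SHA-256.
    let mut hasher = DefaultHasher::new();
    element.fields.hash(&mut hasher);
    let hex = format!("{:016x}", hasher.finish());
    hex[..8].to_string()
}
```

With a candidate list of apexClass,name, an element whose name field is HR_Admin would be written under the stem HR_Admin; an element with no matching field falls back to the 8-character content hash, so identical content always maps to the same stem.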
grouped-by-tag
All nested elements with the same tag go into one file per tag. Leaf content stays in the base file named after the original XML.
Best for fewer files and quick inspection.
Reassembly preserves element content and structure; element order may differ (especially with TOML).
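As a rough sketch of the grouped-by-tag idea, the function below collects serialized nested elements into one bucket per tag. The group_by_tag name and the (tag, body) input shape are assumptions made for illustration; the crate's real data structures may differ.

```rust
use std::collections::BTreeMap;

/// Groups (tag, serialized element) pairs by tag.
/// Source order is kept inside each bucket; buckets are sorted by tag name
/// because a BTreeMap is used.
pub fn group_by_tag(elements: &[(&str, &str)]) -> BTreeMap<String, Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (tag, body) in elements {
        groups
            .entry((*tag).to_owned())
            .or_default()
            .push((*body).to_owned());
    }
    groups
}
```

Each bucket would then be written to a single file named after its tag, which is why this strategy yields fewer files than unique-id.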
Ignore file
Exclude files or directories from disassembly using an ignore file (default: .xmldisassemblerignore). The Rust implementation uses the ignore crate with .gitignore-style syntax.
Place the file in the directory you run disassembly from (or specify a path with --ignore-path).
Example .xmldisassemblerignore:
# Skip these paths
**/secret.xml
**/generated/
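To show how the two example patterns behave, here is a deliberately simplified, std-only matcher. It is not the ignore crate and covers only the **/name and **/name/ shapes used above; parse_patterns and is_ignored are hypothetical helper names, and paths are assumed to be file paths with / separators.

```rust
/// Parses ignore-file text, dropping blank lines and `#` comments.
pub fn parse_patterns(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Returns true when the file at `path` matches one of `patterns`.
/// Only `**/name` and `**/name/` are understood; anything else is skipped.
pub fn is_ignored(path: &str, patterns: &[String]) -> bool {
    let components: Vec<&str> = path.split('/').filter(|c| !c.is_empty()).collect();
    patterns.iter().any(|pattern| {
        let rest = match pattern.strip_prefix("**/") {
            Some(rest) => rest,
            None => return false, // unsupported pattern shape in this sketch
        };
        match rest.strip_suffix('/') {
            // `**/dir/`: some parent directory of the file is named `dir`.
            Some(dir) => components.iter().rev().skip(1).any(|c| *c == dir),
            // `**/name`: the file itself is named `name`.
            None => components.last().map_or(false, |c| *c == rest),
        }
    })
}
```

Under these assumptions, config/secret.xml and src/generated/types.xml are skipped, while src/generated.xml is still disassembled. Real .gitignore semantics are richer (negation, anchoring, character classes), so rely on the tool's own matching for anything beyond these shapes.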
Logging
Logging uses the log crate with env_logger. Control verbosity via the RUST_LOG environment variable.
# Default: only errors
# Verbose logging (debug level)
RUST_LOG=debug
# Log only xml_disassembler crate
RUST_LOG=xml_disassembler=debug
When using the library, call env_logger::init() early in your binary (as in the CLI) and set RUST_LOG as needed.
XML parser
Parsing is done with quick-xml, with support for:
- CDATA – Preserved and output as #cdata in the parsed structure.
- Comments – Preserved in the XML output.
- Attributes – Stored with @ prefix (e.g. @version, @encoding).
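If you walk the parsed structure yourself, the key conventions above can be told apart with a small helper. This sketch assumes the parsed structure is keyed by strings; KeyKind and classify_key are hypothetical names and not part of the crate's API.

```rust
/// What a key in the parsed structure refers to, under the conventions above.
#[derive(Debug, PartialEq)]
pub enum KeyKind<'a> {
    /// An XML attribute, stored with an `@` prefix (prefix removed here).
    Attribute(&'a str),
    /// CDATA content, stored under the `#cdata` key.
    CData,
    /// Any other key, treated as a child element name.
    Child(&'a str),
}

/// Classifies a key by its prefix.
pub fn classify_key(key: &str) -> KeyKind<'_> {
    if key == "#cdata" {
        KeyKind::CData
    } else if let Some(name) = key.strip_prefix('@') {
        KeyKind::Attribute(name)
    } else {
        KeyKind::Child(key)
    }
}
```

For example, @version and @encoding classify as attributes named version and encoding, while a key such as label is treated as a child element.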
Testing
Run all tests:
- Unit tests – In-module tests for parsers, builders, and merge logic (e.g. strip_whitespace, merge_xml_elements, extract_root_attributes, parse_xml).
- Integration test – tests/disassemble_reassemble.rs runs a full round-trip: disassemble a fixture XML, reassemble it, and assert the reassembled content equals the original file.
License
Licensed under MIT.
Contribution
See CONTRIBUTING.md.