xml-3dm
A Rust implementation of the 3DM (3-way Diff/Merge) algorithm for structure-aware XML differencing and merging.
Overview
xml-3dm is a library for performing 3-way merges on XML documents. The library operates on XML tree structure and handles the following scenarios:
- Merging changes when one branch moves content and another edits it
- Detecting and preserving copy operations across the tree
- Resolving conflicts at the element and attribute level
- Generating compact diff representations with copy detection
This is a Rust port of the original 3DM tool created by Tancred Lindholm as part of his Master's thesis at Helsinki University of Technology.
Features
- Structure-aware merging: Understands XML tree structure, not just text lines
- Heuristic matching: Uses Q-gram distance and tree structure to match corresponding nodes
- Copy detection: Identifies copied subtrees and encodes them efficiently
- Conflict resolution: Provides detailed conflict and warning logs
- Full XML support: Handles elements, attributes, text nodes, and mixed content
Installation
Add this to your Cargo.toml:
[]
= "0.1"
Usage
3-Way Merge
use ;
use io;
// Parse the three XML files
let base_parser = new;
let branch_parser = new;
let base = base_parser.parse_file?;
let left = branch_parser.parse_file?;
let right = branch_parser.parse_file?;
// Build matching between the three trees
let matching = new;
// Perform the merge
let mut merger = new;
merger.merge?;
// Check for conflicts
if merger.conflict_log.has_conflicts
Diff Generation
use ;
use io;
// Parse base and modified versions
let base_parser = new;
let branch_parser = new;
let base = base_parser.parse_file?;
let branch = branch_parser.parse_file?;
// Build matching optimized for diff
let matching = new;
// Generate diff
if let Some = new
Patch Application
use ;
use io;
// Parse base and diff
let parser = new;
let base = parser.parse_file?;
let diff = parser.parse_file?;
// Apply patch
let patcher = new;
patcher.patch?;
Algorithm
The 3DM algorithm works in several phases:
- Parsing: Builds tree representations of the XML documents
- Matching: Uses heuristic matching to find corresponding nodes across trees
- Exact matching by content hash
- Fuzzy matching using Q-gram distance
- Structural matching of children
- Merge: Combines changes from both branches relative to the base
- Detects inserts, deletes, moves, and updates
- Resolves operation conflicts using a conflict table
- Merges element content and attributes
- Output: Serializes the merged tree back to XML
Configuration
Copy Detection Threshold
The minimum size (in information bytes) for copy detection can be configured:
use TriMatching;
// Disable copy detection
let matching = with_copy_threshold;
// Increase threshold (default is 128)
let matching = with_copy_threshold;
Custom Matching Algorithms
You can provide custom matching implementations:
use ;
let left_matching = new;
let right_matching = new;
let matching = with_matching;
Limitations
- No namespace awareness (attributes treated as simple name-value pairs)
- Processing instructions are not preserved (comments ARE preserved)
- Limited support for mixed content (element + text siblings)
- Requires well-formed XML input
License
Licensed under the GNU Lesser General Public License v2.1 or later (LGPL-2.1-or-later), matching the original 3DM implementation.
Credits
Original 3DM implementation by Tancred Lindholm (Helsinki University of Technology, 2001).