natural-xml-diff
The natural-xml-diff crate implements a diffing algorithm that attempts to
produce correct and human-readable differences between two XML documents.
Algorithm
The algorithm implemented by this library is based on the paper "Bridging the gap between tracking and detecting changes on XML". It is also implemented by the Java-based jndiff library.
Work in progress
This is still a work in progress!
Credits
Structural diffing
Let's consider the following XML document, taken from the "Bridging the Gap" paper:
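    <book>
      <chapter>
        <title>Text 1</title>
        <para>Text 2</para>
      </chapter>
      <chapter>
        <title>Text 4</title>
        <para>Text 5</para>
      </chapter>
      <chapter>
        <title>Text 6</title>
        <para>Text 7<img/>Text 8</para>
      </chapter>
      <chapter>
        <title>Text 9</title>
        <para>Text 10</para>
      </chapter>
      <chapter>
        <para>Text 11</para>
        <para>Text 12</para>
      </chapter>
    </book>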
We'll call that "document A", the "before" of the diffing. Here's the "after", "document B":
Text 2
Text 4
Text 25
Text 11
Text 6
Text 7 Text 8
Text 9
Text 10
Text 12
Let's present both as trees with numbered nodes (the root node, 0, is not shown). Here's document A:
graph TD;
1[1 book]-->2
2[2 chapter]-->3
2-->5
3[3 title]-->4
4[4 Text 1]
5[5 para]-->6
6[6 Text 2]
1-->7
7[7 chapter] --> 8
8[8 title] --> 9
9[9 Text 4]
7 --> 10
10[10 para] --> 11
11[11 Text 5]
1 --> 12
12[12 chapter] --> 13
13[13 title] --> 14
14[14 Text 6]
12-->15
15[15 para] --> 16
15 --> 17
15 --> 18
16[16 Text 7]
17[17 img]
18[18 Text 8]
1 --> 19
19[19 chapter]
19 --> 20
20[20 title] --> 21
21[21 Text 9]
19 --> 22
22[22 para] --> 23
23[23 Text 10]
1 --> 24
24[24 chapter] --> 25
25[25 para] --> 26
26[26 Text 11]
24 --> 27
27[27 para] --> 28
28[28 Text 12]
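The node numbers in the tree above follow document order (a pre-order traversal), with 0 reserved for the document root. As a minimal sketch of how such numbering could be produced, here is a small Rust example. The `Node` type and both functions are hypothetical and exist only for illustration; they are not this crate's API.

```rust
/// A deliberately tiny stand-in for an XML tree.
/// Hypothetical: these are not the crate's own types.
enum Node {
    Element { name: &'static str, children: Vec<Node> },
    Text(&'static str),
}

/// Walks the tree in document (pre-order) order and records `(number, label)`
/// for every node. `next` holds the number to hand out to the next node.
fn number_nodes(node: &Node, next: &mut usize, out: &mut Vec<(usize, String)>) {
    let id = *next;
    *next += 1;
    match node {
        Node::Element { name, children } => {
            out.push((id, name.to_string()));
            for child in children {
                number_nodes(child, next, out);
            }
        }
        Node::Text(text) => out.push((id, text.to_string())),
    }
}

/// Builds only the first chapter of document A and numbers it, starting at 1
/// because 0 is reserved for the document root that the graph does not show.
fn number_first_chapter() -> Vec<(usize, String)> {
    let book = Node::Element {
        name: "book",
        children: vec![Node::Element {
            name: "chapter",
            children: vec![
                Node::Element { name: "title", children: vec![Node::Text("Text 1")] },
                Node::Element { name: "para", children: vec![Node::Text("Text 2")] },
            ],
        }],
    };
    let mut next = 1;
    let mut out = Vec::new();
    number_nodes(&book, &mut next, &mut out);
    out
}
```

Calling `number_first_chapter()` yields the pairs 1 book, 2 chapter, 3 title, 4 Text 1, 5 para, 6 Text 2, which matches nodes 1 through 6 of the graph.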
Maintaining the tests
Some tests use test_generator to generate tests from the testdata directory.
New tests in that directory aren't picked up automatically, however; you have
to force a recompile of the .rs files that run the tests. You can do this by
making an insignificant whitespace edit in each .rs file that uses
test_generator and saving it. I hope there's a better solution.
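One possible alternative, untested against this repository, is a Cargo build script that declares the testdata directory as a rebuild trigger. The sketch below assumes a new `build.rs` file at the crate root, which this document does not say exists, and it relies on Cargo's standard `rerun-if-changed` directive. When the path is a directory, Cargo scans its contents for modifications, so adding or editing a test file should re-run the script and may in turn cause the test files to be recompiled.

```rust
// build.rs (hypothetical file at the crate root; an assumption, not part of
// the repository as described above)
fn main() {
    // Ask Cargo to re-run this build script, and so rebuild the crate,
    // whenever anything inside the `testdata` directory changes.
    println!("cargo:rerun-if-changed=testdata");
}
```

If this works for the setup here, it would replace the manual whitespace edits; if not, the manual approach above remains the fallback.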