natural-xml-diff 0.2.0

# natural-xml-diff

[![Crates.io](https://img.shields.io/crates/v/natural-xml-diff.svg)](https://crates.io/crates/natural-xml-diff)
[![Documentation](https://docs.rs/natural-xml-diff/badge.svg)](https://docs.rs/natural-xml-diff)

The `natural-xml-diff` crate implements a diffing algorithm that attempts to
produce correct and human-readable differences between two XML documents.

[API Documentation](https://docs.rs/natural-xml-diff)

## Algorithm

The algorithm implemented by this library is based on the paper ["Bridging the
gap between tracking and detecting changes on
XML"](https://www.researchgate.net/publication/3943331_Detecting_changes_in_XML_documents).
It is also implemented by the Java-based [jndiff
library](https://jndiff.sourceforge.net/).

## Work in progress

This is still a work in progress!

## Credits

[Paligo](https://paligo.net)

## Structural diffing

Let's consider the following XML document, taken from the "Bridging the Gap" paper:

```xml
<?xml version="1.0"?>
<book>
  <chapter>
    <title>Text 1</title>
    <para>Text 2</para>
  </chapter>
  <chapter>
    <title>Text 4</title>
    <para>Text 5</para>
  </chapter>
  <chapter>
    <title>Text 6</title>
    <para>Text 7<img/>Text 8</para>
  </chapter>
  <chapter>
    <title>Text 9</title>
    <para>Text 10</para>
  </chapter>
  <chapter>
    <para>Text 11</para>
    <para>Text 12</para>
  </chapter>
</book>
```

We'll call that "document A", the "before" of the diffing. Here's the "after", "document B":

```xml
<?xml version="1.0"?>
<book>
  <chapter>
    <para>Text 2</para>
  </chapter>
  <chapter>
    <title>Text 4</title>
    <para>Text 25</para>
    <para>Text 11</para>
  </chapter>
  <chapter>
    <title>Text 6</title>
    <para>Text 7<img/>Text 8</para>
  </chapter>
  <chapter>
    <title>Text 9</title>
    <para>Text 10</para>
  </chapter>
  <chapter>
    <para>Text 12</para>
  </chapter>
</book>
```

Let's present both as trees with numbered nodes (the root node, 0, is not shown).
Here's document A:

```mermaid
graph TD;
    1[1 book]-->2
	  2[2 chapter]-->3
	  2-->5
	  3[3 title]-->4
	  4[4 Text 1]
	  5[5 para]-->6
	  6[6 Text 2]
	  1-->7
	  7[7 chapter] --> 8
	  8[8 title] --> 9
	  9[9 Text 4]
	  7 --> 10
	  10[10 para] --> 11
	  11[11 Text 5]
	  1 --> 12
	  12[12 chapter] --> 13
	  13[13 title] --> 14
	  14[14 Text 6]
	  12-->15
	  15[15 para] --> 16
	  15 --> 17
	  15 --> 18
	  16[16 Text 7]
	  17[17 img]
	  18[18 Text 8]
	  1 --> 19
	  19[19 chapter]
	  19 --> 20
	  20[20 title] --> 21
	  21[21 Text 9]
	  19 --> 22
	  22[22 para] --> 23
	  23[23 Text 10]
	  1 --> 24
	  24[24 chapter] --> 25
	  25[25 para] --> 26
	  26[26 Text 11]
	  24 --> 27
	  27[27 para] --> 28
	  28[28 Text 12]
```
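The numbering in the tree follows document order: each node gets the next number the first time it is visited in a depth-first walk. Here is a minimal sketch of that numbering, assuming a hypothetical `Node` type that is not part of this crate's API and assuming that whitespace-only text between elements is ignored, as it is in the diagram:

```rust
// Hypothetical minimal tree type for illustration; not this crate's API.
enum Node {
    Element(&'static str, Vec<Node>),
    Text(&'static str),
}

use Node::{Element, Text};

/// Walk the tree in document (pre-)order, appending `(number, label)` pairs.
fn number_nodes(node: &Node, next: &mut usize, out: &mut Vec<(usize, String)>) {
    let id = *next;
    *next += 1;
    match node {
        Element(name, children) => {
            out.push((id, name.to_string()));
            for child in children {
                number_nodes(child, next, out);
            }
        }
        Text(text) => out.push((id, text.to_string())),
    }
}

// Small helpers to keep the document literal readable.
fn chapter(children: Vec<Node>) -> Node {
    Element("chapter", children)
}

fn leaf(name: &'static str, text: &'static str) -> Node {
    Element(name, vec![Text(text)])
}

/// Document A from above, without indentation whitespace.
fn document_a() -> Node {
    Element(
        "book",
        vec![
            chapter(vec![leaf("title", "Text 1"), leaf("para", "Text 2")]),
            chapter(vec![leaf("title", "Text 4"), leaf("para", "Text 5")]),
            chapter(vec![
                leaf("title", "Text 6"),
                Element(
                    "para",
                    vec![Text("Text 7"), Element("img", vec![]), Text("Text 8")],
                ),
            ]),
            chapter(vec![leaf("title", "Text 9"), leaf("para", "Text 10")]),
            chapter(vec![leaf("para", "Text 11"), leaf("para", "Text 12")]),
        ],
    )
}

/// Number document A starting at 1, since node 0 is the unshown root.
fn numbered_document_a() -> Vec<(usize, String)> {
    let mut out = Vec::new();
    let mut next = 1;
    number_nodes(&document_a(), &mut next, &mut out);
    out
}

fn main() {
    for (id, label) in numbered_document_a() {
        println!("{} {}", id, label);
    }
}
```

Running this lists 28 nodes, starting with `book` as node 1 and ending with `Text 12` as node 28.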

## Maintaining the tests

Some tests use `test_generator` to generate tests from the `testdata` directory.
New tests in that directory aren't picked up automatically, however: you have
to force a recompile of the `.rs` files that run the tests. You can do this by
making an insignificant whitespace edit in each `.rs` file that uses
`test_generator` and saving it. I hope there's a better solution.
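
One possible alternative is a Cargo build script that tells Cargo to watch the `testdata` directory. The sketch below uses the standard `cargo:rerun-if-changed` directive; the `build.rs` file and the `rerun_directive` helper are assumptions for illustration and are not part of this repository. Whether this reliably makes `test_generator` pick up new files here has not been verified.

```rust
// build.rs (hypothetical; it would sit next to Cargo.toml)

/// Format the build-script directive that asks Cargo to watch `path`.
fn rerun_directive(path: &str) -> String {
    format!("cargo:rerun-if-changed={}", path)
}

fn main() {
    // When the path is a directory, Cargo scans it for modifications
    // and reruns this script, rebuilding the package, if anything changed.
    println!("{}", rerun_directive("testdata"));
}
```

If it works as hoped, adding a file under `testdata` would trigger a rebuild on the next `cargo test` without touching the `.rs` files by hand.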