Efficient port of Google's diff-match-patch implemented in Rust
Fastest, accurate and wasm ready implementation of Diff Match Patch in Rust based on Myers' diff algorithm.
Highlights of this crate
- Exposes two modes of operating with
diff-match-patch, aEfficientmode andCompatmode. While theEfficientmode squeezes out the max performance theCompatmode ensures compatibility across other libraries or implementations (rust or otherwise). According to Benchmarks, our slowerCompatmode is still faster than other implementations in rust.Efficientmode works on&[u8]and the generated diffs break compatibility with other implementation. Use theEfficientmode ONLY if you are using this crate at the source of diff generation and the destination.Compatmode on the other hand works on&[char]and the generateddiffsandpatchesare compatible across other implementations ofdiff-match-patch. Checkouttests/compat.rsfor some test cases around this.
wasmready, you can check out a demo here- Accurate, while working on this crate I've realized that there are a bunch of implementations that have major issues (wrong diffs, inaccurate flows, silent errors etc.).
- Helper method pretty_html provided by this crate allows some configurations to control the generated visuals elements.
- Well tested
- Added a
fuzzerfor sanity - Exposes the same APIs as Diff Match Patch with minor changes to make it more idiomatic in Rust.
Benchmarks
Benchmarks are maintained diff-match-patch-bench repository
| Lang. | Library | Diff Avg. | Patch Avg. | Bencher | Mode | Correct |
|---|---|---|---|---|---|---|
rust |
diff_match_patch v0.1.1[^1] | 56.618 ms | 9.00 ms | Criterion | - | ✅ |
rust |
dmp v0.2.0 | 56.609 ms | 12.25 ms | Criterion | - | ✅ |
rust |
diff-match-patch-rsour | 32.699 ms | 533.17 µs | Criterion | Efficient |
✅ |
rust |
diff-match-patch-rsour | 33.171 ms | 993.90 µs | Criterion | Compat |
✅ |
go |
go-diff[^2] | 30.31 ms | 118.2 ms | go test | - | ❗ |
node |
diff-match-patch[^3] | 246.90 ms | 1.07 ms | tinybench | - | ❌ |
python |
diff-match-patch | 1.01 s | 0.25 ms | timeit | - | ✅ |
[^1]: diff_match_patch Adds an extra clone to the iterator because the patch_apply method takes mutable refc. to diffs.
[^2]: go-diff is technically a correct implementation but it generates far more diffs than any other implementation I've tested. E.g. In our test data here the go lib ends up generating ~3000 diffs while other implementations are generating ~2300 diffs. My guess is one of the cleanup passes are skipped by go-diff.
[^3]: diff-match-patch generated patch text and delta breaks on unicode surrogates.
Usage Examples
[]
= "0.5.1"
Effitient mode
use ;
// This is the source text
const TXT_OLD: &str = "I am the very model of a modern Major-General, I've information on vegetable, animal, and mineral, 🚀👏👀";
// Let's assume this to be the text that was editted from the source text
const TXT_NEW: &str = "I am the very model of a cartoon individual, My animation's comical, unusual, and whimsical.😊👀";
// An example of a function that creates a diff and returns a set of patches serialized
Compat mode
use ;
// This is the source text
const TXT_OLD: &str = "I am the very model of a modern Major-General, I've information on vegetable, animal, and mineral, 🚀👏👀";
// Let's assume this to be the text that was editted from the source text
const TXT_NEW: &str = "I am the very model of a cartoon individual, My animation's comical, unusual, and whimsical.😊👀";
// An example of a function that creates a diff and returns a set of patches serialized
Match - fuzzy match of pattern in Text
use ;
// This is the source text
const TXT: &str = "I am the very model of a modern Major-General, I've information on vegetable, animal, and mineral, 🚀👏👀";
// The patter we are trying to fing
const PATTERN: &str = " that berry ";
// Returns `location` of match if found, `None` if not found
Note
The Efficient and Compat mode APIs are identical with the only chage being the generic parameter declared during the calls.
E.g. we initiated a diff in the Efficient mode with dmp.diff_main::<Efficient>( ... ) while for Compat mode we did dmp.diff_main::<Compat>( ... ).
Please checkout the examples directory of the source repo for a few common use-cases.
Gotchas
Diff incompatibility with JavaScript libs:
There are 2 kinds of implementations - one which use a postprocessing function for merging unicode surrogates which break compatibility with every other popular diff-match-patch implementations and the other kind (packages based on the original implementation) break while urlEncode() of unicode surrogates.
As of now, this crate brakes compatibility while working with JS generated diffs with the surrogate patch.
If you are interfacing with JavaScript in browser, using this crate through wasm would be ideal.
Related projects
Diff Match Patch was originally built in 2006 to power Google Docs.
- Diff Match Patch (and it's fork)
- Rust: Distil.io diff_match_patch
- Rust: dmp
- Rust: Dissimilar by the awesome David Tolnay
- Rust: diff_match_patch