pub enum Algorithm {
Histogram,
Myers,
MyersMinimal,
}blob only.Expand description
imara-diff supports multiple different algorithms
for computing an edit sequence.
These algorithms have different performance and all produce different output.
Variants§
Histogram
A variation of the patience diff algorithm described by Bram Cohen’s blog post
that uses a histogram to find the least common LCS.
Just like the patience diff algorithm, this algorithm usually produces
more human-readable output than Myers algorithm.
However, compared to the patience diff algorithm (which is slower than Myers algorithm),
the Histogram algorithm performs much better.
The implementation here was originally ported from git but has been significantly
modified to improve performance.
As a result, it consistently performs better than Myers algorithm (5%-100%) over
a wide variety of test data.
For pathological subsequences that only contain highly repeating tokens (64+ occurrences) the algorithm falls back on Myers algorithm (with heuristics) to avoid quadratic behavior.
Compared to Myers algorithm, the Histogram diff algorithm is more focused on providing human-readable diffs instead of minimal diffs. In practice, this means that the edit sequences produced by the histogram diff are often longer than those produced by Myers algorithm.
The heuristic used by the histogram diff does not work well for inputs with small (often repeated)
tokens. For example, character diffs do not work well as most (English) text is made up of
a fairly small set of characters. The Histogram algorithm will automatically detect these cases and
fall back to Myers algorithm. However, this detection has a nontrivial overhead, so
if it’s known upfront that the sort of tokens is very small, Myers algorithm should
be used instead.
Myers
An implementation of the linear space variant of
Myers O((N+M)D) algorithm.
The algorithm is enhanced with preprocessing that removes
tokens that don’t occur in the other file at all.
Furthermore, two heuristics for the middle snake search are implemented
that ensure reasonable runtime (mostly linear time complexity) even for large files.
Due to the divide-and-conquer nature of the algorithm,
the edit sequences produced are still fairly small even when the middle snake
search is aborted by a heuristic.
However, the produced edit sequences are not guaranteed to be fully minimal.
If that property is vital to you, use the MyersMinimal algorithm instead.
The implementation (including the preprocessing) is mostly
ported from git and gnu-diff, where Myers algorithm is used
as the default diff algorithm.
Therefore, the used heuristics have been heavily battle-tested and
are known to behave well over a large variety of inputs.
MyersMinimal
Same as Myers but the early abort heuristics are disabled to guarantee
a minimal edit sequence.
This can mean significant slowdown in pathological cases.
Trait Implementations§
impl Copy for Algorithm
impl Eq for Algorithm
impl StructuralPartialEq for Algorithm
Auto Trait Implementations§
impl Freeze for Algorithm
impl RefUnwindSafe for Algorithm
impl Send for Algorithm
impl Sync for Algorithm
impl Unpin for Algorithm
impl UnsafeUnpin for Algorithm
impl UnwindSafe for Algorithm
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> CloneToUninit for Twhere
T: Clone,
impl<T> CloneToUninit for Twhere
T: Clone,
Source§impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
Source§impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
Source§fn equivalent(&self, key: &K) -> bool
fn equivalent(&self, key: &K) -> bool
key and return true if they are equal.