sesdiff 0.1.1

Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (levenshtein).
[![Crate](https://img.shields.io/crates/v/sesdiff.svg)](https://crates.io/crates/sesdiff)

# sesdiff: Shortest Edit Script Diff

## Description

This is a small and fast command line tool that reads a two-column tab separated input from standard input and computes the shortest edit script (Myers' diff algorithm) to go from the string in column A to the string in column B. It also computed the edit distance (aka levenshtein distance).

It was written to build lemmatisers.

## Installation

Install it using Rust's package manager:

```
cargo install sesdiff
```

No cargo/rust on your system yet? Do ``sudo apt install cargo`` on Debian/ubuntu based systems, ``brew install rust`` on mac, or use [rustup](https://rustup.rs/).

This tool builds upon [Dissimilar](https://crates.io/crates/dissimilar) that provides the actual diff algorithm (will be
downloaded and compiled in automatically).

## Usage

```
$ sesdiff < input.tsv
```

Example input and output (reformatted for legibility, the first two columns correspond to the input). Output is in a four-column tab separated format:

```
hablaron        hablar     =[hablar]-[on]                  2
contaron        contar     =[contar]-[on]                  2
pidieron        pedir      =[p]-[i]+[e]=[di]-[eron]+[r]    6
говорим         говорить   =[говори]-[м]+[ть]              3
```

By default the full edit script will be provided in a simple language:

* ``=[]`` - The text between brackets is identical in strings A and B
* ``-[]`` - The text between brackets is removed to get to string B
* ``+[]`` - The text between brackets is added to get to string B

For lemmatisation purposes, it makes sense for many languages to look at
suffixes (from right to left) and strip common prefixes. Pass the ``--suffix``
option for that behaviour and output is now:

```
$ sesdiff --suffix < input.tsv
hablaron        hablar          -[on]                      2
contaron        contar          -[on]                      2
pidieron        pedir           -[eron]+[r]=[di]-[i]+[e]   6
говорим         говорить        -[м]+[ть]                  3
```

There is also a ``--prefix`` option that strips common suffixes.


# License

GNU General Public Licence v3