Crate tree_edit_distance

Source
Expand description

§Overview

This crate provides an implementation of a recursive generalized version of the Levenshtein distance for arbitrary sequences that finds the smallest possible diff between two trees, according to a user-defined measure for the cost of inserting and removing nodes. The smallest possible diff is defined by the the lowest cost sequence of edits that transforms one tree into the other.

§Example

use tree_edit_distance::*;
use std::mem::{discriminant, Discriminant};
use std::iter::empty;

enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Map(Vec<(String, Json)>),
}

impl Node for Json {
    type Kind = Discriminant<Json>;
    fn kind(&self) -> Self::Kind {
        discriminant(self)
    }

    type Weight = u64;
    fn weight(&self) -> Self::Weight {
        1
    }
}

impl Tree for Json {
    type Children<'c> = Box<dyn Iterator<Item = &'c Self> + 'c>
    where
        Self: 'c;

    fn children(&self) -> Self::Children<'_> {
        match self {
            Json::Array(a) => Box::new(a.iter()),
            Json::Map(m) => Box::new(m.iter().map(|(_, v)| v)),
            _ => Box::new(empty()),
        }
    }
}

macro_rules! json {
    ($( $tokens:tt )*) => {
        // ...
    };
}

let john = json! {
    "name": "John Doe",
    "age": 43,
    "phones": [
        "+44 1234567",
        "+44 2345678"
    ]
};

let jane = json! {
    "name": "Jane Doe",
    "maiden name": "Smith",
    "age": 40,
    "phones": [
        "+44 7654321",
    ]
};

let (edits, cost) = diff(&john, &jane);

assert_eq!(cost, 2);

assert_eq!(&*edits, &[
    Edit::Replace(Box::new([
        Edit::Replace(Box::default()),      // "name"
        Edit::Insert,                       // "maiden name"
        Edit::Replace(Box::default()),      // "age"
        Edit::Replace(Box::new([            // "phones"
            Edit::Remove,
            Edit::Replace(Box::default()),
        ])),
    ]))
]);

Enums§

Edit
A single operation between two Nodes.

Traits§

Node
An abstraction for a generic tree node.
Tree
An abstraction for a recursive tree.

Functions§

diff
Finds the lowest cost sequence of Edits that transforms one Tree into the other.