Crate bcmp [−] [src]

bcmp is a simple crate which offers data comparison mechanisms which go beyond the simple equality. It only operates on byte slices, hence its name, and relies on efficiently finding common substrings between two blob of data. The implementation relies on two different linear time algorithms: a HashMap based algorithm called HashMatch and a suffix tree built using Ukkonen algorithm called TreeMatch.

Examples

Iterate over the matches between two strings using HashMatch with a minimum match length of 2 bytes:

use bcmp::{AlgoSpec, MatchIterator};

let a = "abcdefg";
let b = "012abc34cdef56efg78abcdefg";
let match_iter = MatchIterator::new(a.as_bytes(), b.as_bytes(), AlgoSpec::HashMatch(2));
for m in match_iter {
    println!("Match: {:}", &a[m.first_pos..m.first_end()]);
}

Construct a patch set to build the file b from the file a using TreeMatch with a minimum match length of 4 bytes:

use std::fs::File;
use std::io::Read;
 
use bcmp::{AlgoSpec, patch_set};
 
let mut a = Vec::<u8>::new();
let mut b = Vec::<u8>::new();
File::open("a").unwrap().read_to_end(&mut a);
File::open("b").unwrap().read_to_end(&mut b);

let ps = patch_set(&a, &b, AlgoSpec::TreeMatch(4));
for patch in ps {
    println!("b[0x{:x}..0x{:x}] == a[0x{:x}..0x{:x}]", patch.second_pos, patch.second_end(), patch.first_pos, patch.first_end());
}

Modules

hashmatch	HashMatch is a binary matching algorithm based on a `HashMap` to retrieve the begining of matching strings.
treematch	TreeMatch is a binary matching algorithm based on a suffix tree to retrieve matching strings.

Structs

Match	A structure representing a matching substring between two pieces of data.
MatchIterator	A generic wrapper for `HashMatchIterator` and `TreeMatchIterator`.

Enums

AlgoSpec

An enumeration describing the algorithm specification: either HashMatch or TreeMatch with the minimal matching length parameter.

Functions

longest_common_substring	Return the longest common substring between two byte slices.
longest_common_substrings	Return the `N` longest common substrings between two byte slices. The vector is sorted in decreasing order of `Match` length.
patch_set	Identify the smallest set of patches needed the build the second byte slice from the first.
unique_strings	Find the list of unique strings from the second byte slice which can't be found in the first.