Expand description
bytecmp
is a simple crate which offers data comparison mechanisms which go beyond the simple
equality. It only operates on byte slices, hence its name, and relies on efficiently finding
common substrings between two blob of data. The implementation relies on two different linear
time algorithms: a HashMap
based algorithm called HashMatch
and
a suffix tree built using Ukkonen algorithm called TreeMatch
.
§Examples
Iterate over the matches between two strings using HashMatch
with a
minimum match length of 2 bytes:
use bytecmp::{AlgoSpec, MatchIterator};
let a = "abcdefg";
let b = "012abc34cdef56efg78abcdefg";
let match_iter = MatchIterator::new(a.as_bytes(), b.as_bytes(), AlgoSpec::HashMatch(2)).unwrap();
for m in match_iter {
println!("Match: {:}", &a[m.first_pos..m.first_end()]);
}
Construct a patch set to build the file b
from the file a
using TreeMatch
with a minimum match length of 4 bytes:
use std::fs::File;
use std::io::Read;
use bytecmp::{AlgoSpec, patch_set};
let mut a = Vec::<u8>::new();
let mut b = Vec::<u8>::new();
File::open("a").unwrap().read_to_end(&mut a);
File::open("b").unwrap().read_to_end(&mut b);
let ps = patch_set(&a, &b, AlgoSpec::TreeMatch(4)).unwrap();
for patch in ps {
println!("b[0x{:x}..0x{:x}] == a[0x{:x}..0x{:x}]", patch.second_pos, patch.second_end(), patch.first_pos, patch.first_end());
}
Modules§
- hashmatch
- HashMatch is a binary matching algorithm based on a
HashMap
to retrieve the begining of matching strings. - treematch
- TreeMatch is a binary matching algorithm based on a suffix tree to retrieve matching strings.
Structs§
- Match
- A structure representing a matching substring between two pieces of data.
- Match
Iterator - A generic wrapper for
HashMatchIterator
andTreeMatchIterator
.
Enums§
- Algo
Spec - An enumeration describing the algorithm specification: either
HashMatch
orTreeMatch
with the minimal matching length parameter.
Functions§
- longest_
common_ substring - Return the longest common substring between two byte slices.
- longest_
common_ substrings - Return the
N
longest common substrings between two byte slices. The vector is sorted in decreasing order ofMatch
length. - patch_
set - Identify the smallest set of patches needed the build the second byte slice from the first.
- unique_
strings - Find the list of unique strings from the second byte slice which can’t be found in the first.