pub trait BlockHashPositionArrayImpl: BlockHashPositionArrayData {
    // Required methods
    fn is_equiv(&self, other: &[u8]) -> bool;
    fn has_common_substring(&self, other: &[u8]) -> bool;
    fn edit_distance(&self, other: &[u8]) -> u32;
    fn score_strings_raw(&self, other: &[u8]) -> u32;
    fn score_strings(&self, other: &[u8], log_block_size: u8) -> u32;
}
Expand description

The implementation of the block hash position array (safe; immutable).

Required Methods§

source

fn is_equiv(&self, other: &[u8]) -> bool

Compare whether two block hashes are equivalent.

Usage Constraints
source

fn has_common_substring(&self, other: &[u8]) -> bool

Checks whether two given strings have common substrings with a length of block_hash::MIN_LCS_FOR_COMPARISON.

Algorithm Implemented (By Default)

This function implements a Boyer–Moore-like bit-parallel algorithm to find a fixed-length common substring. The original algorithm is the Backward Shift-Add algorithm for the k-LCF problem [Hirvola, 2016] (which searches the longest common substring with up to k errors under the Hamming distance).

This algorithm is modified:

  • to search only perfect matches (up to 0 errors; k = 0),
  • to return as soon as possible if it finds a common substring and
  • to share the position array representation with edit_distance() by reversing a pattern from the original paper (the original algorithm used reverse “incidence matrix”).
Usage Constraints
source

fn edit_distance(&self, other: &[u8]) -> u32

Computes the edit distance between two given strings.

Specifically, it computes the Longest Common Subsequence (LCS) distance, allowing character insertion and deletion as two primitive operations (in cost 1).

Algorithm Implemented (By Default)

[Hyyrö et al., 2005] (doi:10.1007/11427186_33) presented a way to compute so called Indel-Distance using a bit-parallel approach and this method is based on it.

This algorithm is needed to be modified for our purpose because the purpose of the original algorithm is to find a “substring” similar to a pattern string.

Usage Constraints
source

fn score_strings_raw(&self, other: &[u8]) -> u32

Compare two block hashes and computes the similarity score without capping.

This method does not “cap” the score to prevent exaggregating the matches that are not meaningful enough, making this function block size independent.

Usage Constraints
source

fn score_strings(&self, other: &[u8], log_block_size: u8) -> u32

Compare two block hashes and computes the similarity score.

This method “caps” the score to prevent exaggregating the matches that are not meaningful enough. This behavior depends on the block size (score cap gets higher when block size gets higher).

Usage Constraints

Implementors§

source§

impl<T> BlockHashPositionArrayImpl for Twhere T: BlockHashPositionArrayImplInternal,