Trait BlockHashPositionArrayImpl

Source

pub trait BlockHashPositionArrayImpl: BlockHashPositionArrayData {
    // Required methods
    fn is_equiv(&self, other: &[u8]) -> bool;
    fn has_common_substring(&self, other: &[u8]) -> bool;
    fn edit_distance(&self, other: &[u8]) -> u32;
    fn score_strings_raw(&self, other: &[u8]) -> u32;
    fn score_strings(&self, other: &[u8], log_block_size: u8) -> u32;
}

Expand description

The implementation of the block hash position array (safe; immutable).

This trait is automatically implemented for all types implementing the BlockHashPositionArrayData trait.

§Compatibility Notice

This trait is going to be completely private on the next major release. If you need to experiment with internal hashing functions, just vendor the source code for your needs.

Required Methods§

Source

fn is_equiv(&self, other: &[u8]) -> bool

Compare whether two block hashes are equivalent.

§Usage Constraints

The length of other must not exceed 64.
All elements in other must be less than block_hash::ALPHABET_SIZE.

Source

fn has_common_substring(&self, other: &[u8]) -> bool

Checks whether two given strings have common substrings with a length of block_hash::MIN_LCS_FOR_COMPARISON.

§Algorithm Implemented (By Default)

This function implements an extension to the algorithm described in [Baeza-Yates and Gonnet, 1992] (doi:10.1145/135239.135243) to find a fixed-length common substring. The original algorithm is the Backward Shift-Add algorithm for the k-LCF problem as described in [Hirvola, 2016] (which searches the longest common substring with up to k errors under the Hamming distance).

This algorithm is modified:

to search only perfect matches (up to 0 errors; k = 0),
to return as soon as possible if it finds a common substring and
to share the position array representation with edit_distance() by reversing a pattern from the original paper (the original algorithm used reverse “incidence matrix”).

§Usage Constraints

All elements in other must be less than block_hash::ALPHABET_SIZE.

Source

fn edit_distance(&self, other: &[u8]) -> u32

Computes the edit distance between two given strings.

Specifically, it computes the Longest Common Subsequence (LCS) distance, allowing character addition and deletion as two primitive operations (in cost 1).

§Algorithm Implemented (By Default)

This method implements the longest common subsequence length (LCS length or LLCS) algorithm as in [Hyyrö, 2004] and then converts the LCS length to the LCS distance using the basic relation between them.

§Usage Constraints

The length of other must be short enough (up to block_hash::FULL_SIZE is guaranteed to be safe).
All elements in other must be less than block_hash::ALPHABET_SIZE.

Source

fn score_strings_raw(&self, other: &[u8]) -> u32

Compare two block hashes and computes the similarity score without capping.

This method does not “cap” the score to prevent exaggerating the matches that are not meaningful enough, making this function block size independent.

§Usage Constraints

The lengths of both self and other must not exceed block_hash::FULL_SIZE.
All elements in other must be less than block_hash::ALPHABET_SIZE.

Source

fn score_strings(&self, other: &[u8], log_block_size: u8) -> u32

Compare two block hashes and computes the similarity score.

This method “caps” the score to prevent exaggerating the matches that are not meaningful enough. This behavior depends on the block size (score cap gets higher when block size gets higher).

§Usage Constraints

The lengths of both self and other must not exceed block_hash::FULL_SIZE.
All elements in other must be less than block_hash::ALPHABET_SIZE.
log_block_size must be valid or must be equal to block_size::NUM_VALID (this value itself is not valid as a block size for a fuzzy hash object but valid on this method).

Implementors§

Source §

BlockHashPositionArrayImpl

Trait BlockHashPositionArrayImpl Copy item path

§Compatibility Notice

Required Methods§

fn is_equiv(&self, other: &[u8]) -> bool

§Usage Constraints

fn has_common_substring(&self, other: &[u8]) -> bool

§Algorithm Implemented (By Default)

§Usage Constraints

fn edit_distance(&self, other: &[u8]) -> u32

§Algorithm Implemented (By Default)

§Usage Constraints

fn score_strings_raw(&self, other: &[u8]) -> u32

§Usage Constraints

fn score_strings(&self, other: &[u8], log_block_size: u8) -> u32

§Usage Constraints

Implementors§

impl<T> BlockHashPositionArrayImpl for Twhere T: BlockHashPositionArrayImplInternal,

Trait BlockHashPositionArrayImpl

impl<T> BlockHashPositionArrayImpl for T
where T: BlockHashPositionArrayImplInternal,