pub unsafe trait BlockHashPositionArrayImplUnchecked: BlockHashPositionArrayData {
// Required methods
unsafe fn is_equiv_unchecked(&self, other: &[u8]) -> bool;
unsafe fn has_common_substring_unchecked(&self, other: &[u8]) -> bool;
unsafe fn edit_distance_unchecked(&self, other: &[u8]) -> u32;
unsafe fn score_strings_raw_unchecked(&self, other: &[u8]) -> u32;
unsafe fn score_strings_unchecked(
&self,
other: &[u8],
log_block_size: u8
) -> u32;
}unchecked only.Expand description
The implementation of the block hash position array (unchecked; immutable).
Safety
This trait contains unsafe methods and need to comply with constraints
described in each method.
Incompatibility Notice (1)
This trait is renamed from BlockHashPositionArrayImplUnsafe to
BlockHashPositionArrayImplUnchecked in the version 0.2.3. The old name
BlockHashPositionArrayImplUnsafe will be removed on the version 0.3.0.
Incompatibility Notice (2)
Since version 0.3.0, all types implementing BlockHashPositionArrayData
will satisfy automatic implementation of methods in this trait.
It enables remote custom types to utilize our implementation of
position array representation.
Not being able to use auto implementation from any type implementing
BlockHashPositionArrayData was a bug but we need a breaking change to
fix this issue.
Required Methods§
sourceunsafe fn is_equiv_unchecked(&self, other: &[u8]) -> bool
unsafe fn is_equiv_unchecked(&self, other: &[u8]) -> bool
Compare whether two block hashes are equivalent.
Safety
- The length of
othermust not exceed 64. - All elements in
othermust be less thanblock_hash::ALPHABET_SIZE.
If they are not satisfied, it will return a meaningless value.
sourceunsafe fn has_common_substring_unchecked(&self, other: &[u8]) -> bool
unsafe fn has_common_substring_unchecked(&self, other: &[u8]) -> bool
Checks whether two given strings have common substrings with a length
of block_hash::MIN_LCS_FOR_COMPARISON.
Algorithm Implemented (By Default)
This function implements an extension to the algorithm described in [Baeza-Yates and Gonnet, 1992] (doi:10.1145/135239.135243) to find a fixed-length common substring. The original algorithm is the Backward Shift-Add algorithm for the k-LCF problem as described in [Hirvola, 2016] (which searches the longest common substring with up to k errors under the Hamming distance).
This algorithm is modified:
- to search only perfect matches (up to 0 errors; k = 0),
- to return as soon as possible if it finds a common substring and
- to share the position array representation with
BlockHashPositionArrayImpl::edit_distance()by reversing a pattern from the original paper (the original algorithm used reverse “incidence matrix”).
Safety
- All elements in
othermust be less thanblock_hash::ALPHABET_SIZE.
If the condition above is not satisfied, it will return a meaningless value.
sourceunsafe fn edit_distance_unchecked(&self, other: &[u8]) -> u32
unsafe fn edit_distance_unchecked(&self, other: &[u8]) -> u32
Computes the edit distance between two given strings.
Specifically, it computes the Longest Common Subsequence (LCS) distance, allowing character addition and deletion as two primitive operations (in cost 1).
Algorithm Implemented (By Default)
This method implements the longest common subsequence length (LCS length or LLCS) algorithm as in [Hyyrö, 2004] and then converts the LCS length to the LCS distance using the basic relation between them.
Safety
- The length of
othermust be short enough (up toblock_hash::FULL_SIZEis guaranteed to be safe). - All elements in
othermust be less thanblock_hash::ALPHABET_SIZE.
If they are not satisfied, it will return a meaningless distance.
sourceunsafe fn score_strings_raw_unchecked(&self, other: &[u8]) -> u32
unsafe fn score_strings_raw_unchecked(&self, other: &[u8]) -> u32
Compare two block hashes and computes the similarity score without capping.
This method does not “cap” the score to prevent exaggerating the matches that are not meaningful enough, making this function block size independent.
Safety
- The lengths of both
selfandothermust not exceedblock_hash::FULL_SIZE. - All elements in
othermust be less thanblock_hash::ALPHABET_SIZE.
If they are not satisfied, it will return a meaningless score.
sourceunsafe fn score_strings_unchecked(
&self,
other: &[u8],
log_block_size: u8
) -> u32
unsafe fn score_strings_unchecked( &self, other: &[u8], log_block_size: u8 ) -> u32
Compare two block hashes and computes the similarity score.
This method “caps” the score to prevent exaggerating the matches that are not meaningful enough. This behavior depends on the block size (score cap gets higher when block size gets higher).
Safety
- The lengths of both
selfandothermust not exceedblock_hash::FULL_SIZE. - All elements in
othermust be less thanblock_hash::ALPHABET_SIZE. log_block_sizemust be valid or must be equal toblock_size::NUM_VALID(this value itself is not valid as a block size for a fuzzy hash object but valid on this method).
If they are not satisfied, it will return a meaningless score.