pub struct FuzzyHashCompareTarget { /* private fields */ }Expand description
An efficient position array-based fuzzy hash comparison target.
It can be built from a normalized FuzzyHashData object and represents
the normalized contents of two block hashes as two position arrays.
Although that this structure is large, it is particularly useful if
you compare many of fuzzy hashes and you can fix one of the operands
(this is usually over 10 times faster than batched fuzzy_compare calls
in ssdeep 2.13). Even if we generate this object each time we compare
two fuzzy hashes, it’s usually faster than fuzzy_compare in ssdeep 2.13.
In fact, if you just compare two fuzzy hashes in this crate, a temporary
FuzzyHashCompareTarget object containing two block hashes or
corresponding half-baked object containing
one block hash, will be created from either side of the comparison.
See also:
- “Fuzzy Hash Comparison” section of
FuzzyHashData - “Position Array Representation” section of
BlockHashPositionArrayDatafor the details of internal representation
§Examples
// Requires the global allocator to use `Vec` (default on std).
use ssdeep::{FuzzyHash, FuzzyHashCompareTarget};
// Brute force comparison
let hashes: Vec<FuzzyHash> = Vec::new();
/* ... add fuzzy hashes to `hashes` ... */
let mut target: FuzzyHashCompareTarget = FuzzyHashCompareTarget::new();
for hash1 in &hashes {
target.init_from(hash1);
for hash2 in &hashes {
let score = target.compare(hash2);
/* ... */
}
}Implementations§
Source§impl FuzzyHashCompareTarget
impl FuzzyHashCompareTarget
Sourcepub const LOG_BLOCK_SIZE_CAPPING_BORDER: u8 = 4u8
pub const LOG_BLOCK_SIZE_CAPPING_BORDER: u8 = 4u8
The lower bound (inclusive) of the base-2 logarithm form of the block size in which the score capping is no longer required.
If log_block_size is equal to or larger than this value and len1 and
len2 are at least block_hash::MIN_LCS_FOR_COMPARISON in size,
Self::score_cap_on_block_hash_comparison(log_block_size, len1, len2)
is guaranteed to be 100 or greater.
The score “cap” is computed as
(1 << log_block_size) * min(len1, len2).
If this always guaranteed to be 100 or greater,
capping the score is not longer required.
See also: “Fuzzy Hash Comparison” section of FuzzyHashData
§Backgrounds
§Theorem
For all positive integers a, b and c, a <= b * c iff
(a + b - 1) / b <= c (where ceil(a/b) == (a + b - 1) / b).
This is proven by Z3 and Coq in the source code:
- Z3 + Python:
dev/prover/compare/blocksize_capping_theorem.py - Coq:
dev/prover/compare/blocksize_capping_theorem.v
§The Minimum Score Cap
This is expressed as (1 << log_block_size) * MIN_LCS_FOR_COMPARISON
because both block hashes must at least as long as
block_hash::MIN_LCS_FOR_COMPARISON to perform edit distance-based
scoring.
§Computing the Constant
Applying the theorem above,
100 <= (1 << log_block_size) * MIN_LCS_FOR_COMPARISON
is equivalent to
(100 + MIN_LCS_FOR_COMPARISON - 1) / MIN_LCS_FOR_COMPARISON <= (1 << log_block_size).
This leads to the expression to define this constant.
Sourcepub fn new() -> Self
pub fn new() -> Self
Creates a new FuzzyHashCompareTarget object with empty contents.
This is equivalent to the fuzzy hash string 3::.
Sourcepub fn log_block_size(&self) -> u8
pub fn log_block_size(&self) -> u8
The base-2 logarithm form of the comparison target’s block size.
See also: “Block Size” section of FuzzyHashData
Sourcepub fn block_size(&self) -> u32
pub fn block_size(&self) -> u32
The block size of the comparison target.
Sourcepub fn block_hash_1(
&self,
) -> impl '_ + BlockHashPositionArrayImpl + BlockHashPositionArrayImplUnchecked
pub fn block_hash_1( &self, ) -> impl '_ + BlockHashPositionArrayImpl + BlockHashPositionArrayImplUnchecked
Position array-based representation of the block hash 1.
This method provides raw access to the internal efficient block hash representation and fast bit-parallel string functions.
You are not recommended to use this unless you know the internal details deeply.
The result has the same lifetime as this object and implements following traits:
BlockHashPositionArrayDataBlockHashPositionArrayImplBlockHashPositionArrayImplUnchecked(only if theuncheckedfeature is enabled)
Sourcepub fn block_hash_2(
&self,
) -> impl '_ + BlockHashPositionArrayImpl + BlockHashPositionArrayImplUnchecked
pub fn block_hash_2( &self, ) -> impl '_ + BlockHashPositionArrayImpl + BlockHashPositionArrayImplUnchecked
Position array-based representation of the block hash 2.
See also: block_hash_1()
Sourcepub fn full_eq(&self, other: &Self) -> bool
pub fn full_eq(&self, other: &Self) -> bool
Performs full equality checking of the internal structure.
This type intentionally lacks the implementation of PartialEq
because of its large size. However, there’s a case where we need to
compare two comparison targets.
The primary purpose of this is debugging and it compares all internal
members inside the structure (just like auto-generated
PartialEq::eq()).
Note that, despite that it is only relevant to users when the
unchecked feature is enabled but made public without any features
because this method is not unsafe or unchecked in any way.
Sourcepub fn init_from<const S1: usize, const S2: usize>(
&mut self,
hash: impl AsRef<FuzzyHashData<S1, S2, true>>,
)where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn init_from<const S1: usize, const S2: usize>(
&mut self,
hash: impl AsRef<FuzzyHashData<S1, S2, true>>,
)where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Initialize the object from a given fuzzy hash.
Sourcepub fn is_equiv<const S1: usize, const S2: usize>(
&self,
hash: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn is_equiv<const S1: usize, const S2: usize>(
&self,
hash: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compare whether two fuzzy hashes are equivalent.
Sourcepub unsafe fn raw_score_by_edit_distance_unchecked(
len_block_hash_lhs: u8,
len_block_hash_rhs: u8,
edit_distance: u32,
) -> u32
Available on crate feature unchecked only.
pub unsafe fn raw_score_by_edit_distance_unchecked( len_block_hash_lhs: u8, len_block_hash_rhs: u8, edit_distance: u32, ) -> u32
unchecked only.Returns the raw score (without capping) based on the lengths of block hashes and the edit distance between them.
This method assumes that following constraints are satisfied.
§Compatibility Note
The types of len_block_hash_lhs and len_block_hash_rhs were
changed from u32 to u8 on the version 0.3.
§Safety
-
Both
len_block_hash_lhsandlen_block_hash_rhsmust satisfy:- Equal to or greater than
MIN_LCS_FOR_COMPARISON, - Equal to or less than
FULL_SIZE.
- Equal to or greater than
-
edit_distancemust be equal to or less thanlen_block_hash_lhs + len_block_hash_rhs - 2 * MIN_LCS_FOR_COMPARISON.This constraint comes from the constraints of the lengths and the fact that we shall have a common substring of the length
MIN_LCS_FOR_COMPARISONto perform an edit distance-based comparison (reducing maximum possible edit distance by2 * MIN_LCS_FOR_COMPARISON).
If they are not satisfied, it will at least return a meaningless score and will cause a division by zero on the worst case.
Sourcepub fn raw_score_by_edit_distance(
len_block_hash_lhs: u8,
len_block_hash_rhs: u8,
edit_distance: u32,
) -> u32
pub fn raw_score_by_edit_distance( len_block_hash_lhs: u8, len_block_hash_rhs: u8, edit_distance: u32, ) -> u32
Returns the raw score (without capping) based on the lengths of block hashes and the edit distance between them.
This method scales the edit distance to the 0..=100 score familiar to
humans (100 means a perfect match, smaller the score, lower
the similarity).
Note that it doesn’t perform any score capping (that should be performed on smaller block sizes).
See also: “Fuzzy Hash Comparison” section of FuzzyHashData
§Compatibility Note
The types of len_block_hash_lhs and len_block_hash_rhs were
changed from u32 to u8 on the version 0.3.
§Usage Constraints
-
Both
len_block_hash_lhsandlen_block_hash_rhsmust satisfy:- Equal to or greater than
MIN_LCS_FOR_COMPARISON, - Equal to or less than
FULL_SIZE.
- Equal to or greater than
-
edit_distancemust be equal to or less thanlen_block_hash_lhs + len_block_hash_rhs - 2 * MIN_LCS_FOR_COMPARISON.This constraint comes from the constraints of the lengths and the fact that we shall have a common substring of the length
MIN_LCS_FOR_COMPARISONto perform an edit distance-based comparison (reducing maximum possible edit distance by2 * MIN_LCS_FOR_COMPARISON).
§Useful Property
If all arguments are valid, the return value (the raw score) is guaranteed to be greater than zero. Along with the property of the score capping, it means that we should have a non-zero score if we can perform an edit distance-based comparison.
Sourcepub unsafe fn score_cap_on_block_hash_comparison_unchecked(
log_block_size: u8,
len_block_hash_lhs: u8,
len_block_hash_rhs: u8,
) -> u32
Available on crate feature unchecked only.
pub unsafe fn score_cap_on_block_hash_comparison_unchecked( log_block_size: u8, len_block_hash_lhs: u8, len_block_hash_rhs: u8, ) -> u32
unchecked only.Returns the “score cap” for a given block size and two block hash lengths, assuming that block size and block hash lengths are small enough so that no arithmetic overflow will occur.
§Safety
log_block_sizemust be less thanFuzzyHashCompareTarget::LOG_BLOCK_SIZE_CAPPING_BORDER.- Both
len_block_hash_lhsandlen_block_hash_rhsmust not exceedblock_hash::FULL_SIZE.
Otherwise, it may cause an arithmetic overflow and return an useless value.
Sourcepub fn score_cap_on_block_hash_comparison(
log_block_size: u8,
len_block_hash_lhs: u8,
len_block_hash_rhs: u8,
) -> u32
pub fn score_cap_on_block_hash_comparison( log_block_size: u8, len_block_hash_lhs: u8, len_block_hash_rhs: u8, ) -> u32
Returns the “score cap” for a given block size and two block hash lengths.
The internal block hash comparison method “caps” the score to prevent exaggerating the matches that are not meaningful enough. This behavior depends on the block size (the “cap” gets higher as the block size gets higher) and the minimum of block hash lengths.
The result is not always guaranteed to be in 0..=100 but 100 or
higher means that we don’t need any score capping.
If at least one of the arguments len_block_hash_lhs and
len_block_hash_rhs are less than
block_hash::MIN_LCS_FOR_COMPARISON, the result is
implementation-defined.
If all arguments are valid and log_block_size is
large enough,
100 or greater will be returned, meaning that the score capping is
no longer required.
See also: “Fuzzy Hash Comparison” section of FuzzyHashData
§Compatibility Note
While this method is completely safe even if semantically-invalid parameters are specified (due to arithmetic properties of internal computation and a safety measure in this method), following semantic constraints may be added on the future versions:
log_block_sizemust be valid or must be equal toblock_size::NUM_VALID(this value itself is not valid as a block size for a fuzzy hash object but will be valid on this method).- Both
len_block_hash_lhsandlen_block_hash_rhsmust not exceedblock_hash::FULL_SIZE.
If at least one of the arguments len_block_hash_lhs and
len_block_hash_rhs are less than
block_hash::MIN_LCS_FOR_COMPARISON, this is semantically-invalid.
We haven’t determined whether we need to reject those cases but at
least implementation-defined (may cause a panic in the future).
§Useful Property
If all arguments are valid and both len_block_hash_lhs and
len_block_hash_rhs are non-zero, the return value (the score cap) is
guaranteed to be greater than zero. Along with the property of the
raw scoring, it means that we
should have a non-zero score if we can perform an edit distance-based
comparison.
Source§impl FuzzyHashCompareTarget
impl FuzzyHashCompareTarget
Sourcepub fn is_valid(&self) -> bool
pub fn is_valid(&self) -> bool
Performs full validity checking of the internal structure.
The primary purpose of this is debugging and it should always
return true unless…
- There is a bug in this crate, corrupting this structure,
- A memory corruption is occurred somewhere else or
- An
unsafefunction to construct this object is misused.
Because of its purpose, this method is not designed to be fast.
Note that, despite that it is only relevant to users when the
unchecked feature is enabled but made public without any features
because this method is not unsafe or unchecked in any way.
Sourcepub unsafe fn compare_unequal_near_eq_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
pub unsafe fn compare_unequal_near_eq_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unchecked only.Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearEq.
§Safety
- Both fuzzy hashes must be different.
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearEq.
If they are not satisfied, it will return a meaningless score.
Sourcepub fn compare_unequal_near_eq<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn compare_unequal_near_eq<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Slow: Compare two fuzzy hashes assuming both are different and
their block sizes have a relation of BlockSizeRelation::NearEq.
§Usage Constraints
- Both fuzzy hashes must be different.
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearEq.
§Performance Consideration
This method’s performance is not good enough (because of constraint checking).
Use those instead:
compare_near_eq()(checked)compare_unequal_near_eq_unchecked()(unchecked)
Sourcepub unsafe fn compare_near_eq_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
pub unsafe fn compare_near_eq_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unchecked only.Compare two fuzzy hashes assuming their block sizes have
a relation of BlockSizeRelation::NearEq.
§Safety
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearEq.
If the condition above is not satisfied, it will return a meaningless score.
Sourcepub fn compare_near_eq<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn compare_near_eq<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compare two fuzzy hashes assuming their block sizes have
a relation of BlockSizeRelation::NearEq.
§Usage Constraints
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearEq.
Sourcepub unsafe fn compare_unequal_near_lt_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
pub unsafe fn compare_unequal_near_lt_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unchecked only.Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearLt.
§Safety
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearLt.
If they are not satisfied, it will return a meaningless score.
Sourcepub fn compare_unequal_near_lt<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn compare_unequal_near_lt<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearLt.
§Usage Constraints
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearLt.
Sourcepub unsafe fn compare_unequal_near_gt_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
pub unsafe fn compare_unequal_near_gt_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unchecked only.Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearGt.
§Safety
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearGt.
If they are not satisfied, it will return a meaningless score.
Sourcepub fn compare_unequal_near_gt<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn compare_unequal_near_gt<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearGt.
§Usage Constraints
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearGt.
Sourcepub unsafe fn compare_unequal_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
pub unsafe fn compare_unequal_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unchecked only.Compare two normalized fuzzy hashes assuming both are different.
§Safety
- Both fuzzy hashes (
selfandother) must be different.
If the condition above is not satisfied, it will return a meaningless score.
Sourcepub fn compare_unequal<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn compare_unequal<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Slow: Compare two normalized fuzzy hashes assuming both are different.
§Usage Constraints
- Both fuzzy hashes (
selfandother) must be different.
§Performance Consideration
This method’s performance is not good enough (because of the constraint checking).
Use those instead:
compare()(checked)compare_unequal_unchecked()(unchecked)
Sourcepub fn compare<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn compare<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compares two normalized fuzzy hashes.
Sourcepub unsafe fn is_comparison_candidate_near_eq_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
pub unsafe fn is_comparison_candidate_near_eq_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unchecked only.Tests whether other is a candidate for edit distance-based comparison
assuming that their block sizes have a relation of
BlockSizeRelation::NearEq.
See also: is_comparison_candidate()
§Safety
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearEq.
If the condition above is not satisfied, it will return a meaningless value.
Sourcepub fn is_comparison_candidate_near_eq<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn is_comparison_candidate_near_eq<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Tests whether other is a candidate for edit distance-based comparison
assuming that their block sizes have a relation of
BlockSizeRelation::NearEq.
See also: is_comparison_candidate()
§Usage Constraints
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearEq.
Sourcepub unsafe fn is_comparison_candidate_near_lt_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
pub unsafe fn is_comparison_candidate_near_lt_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unchecked only.Tests whether other is a candidate for edit distance-based comparison
assuming that their block sizes have a relation of
BlockSizeRelation::NearLt.
See also: is_comparison_candidate()
§Safety
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearLt.
If the condition above is not satisfied, it will return a meaningless value.
Sourcepub fn is_comparison_candidate_near_lt<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn is_comparison_candidate_near_lt<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Tests whether other is a candidate for edit distance-based comparison
assuming that their block sizes have a relation of
BlockSizeRelation::NearLt.
See also: is_comparison_candidate()
§Usage Constraints
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearLt.
Sourcepub unsafe fn is_comparison_candidate_near_gt_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
pub unsafe fn is_comparison_candidate_near_gt_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unchecked only.Tests whether other is a candidate for edit distance-based comparison
assuming that their block sizes have a relation of
BlockSizeRelation::NearGt.
See also: is_comparison_candidate()
§Safety
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearGt.
If the condition above is not satisfied, it will return a meaningless value.
Sourcepub fn is_comparison_candidate_near_gt<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn is_comparison_candidate_near_gt<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Tests whether other is a candidate for edit distance-based comparison
assuming that their block sizes have a relation of
BlockSizeRelation::NearGt.
See also: is_comparison_candidate()
§Usage Constraints
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearGt.
Sourcepub fn is_comparison_candidate<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn is_comparison_candidate<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>,
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Tests whether other is a candidate for edit distance-based comparison.
If this function returns false and self and other are not
equivalent, their similarity will be
calculated to 0.
§Use Case (Example)
This operation is useful to divide a set of unique (normalized) fuzzy hashes into smaller distinct sets. The similarity score can be non-zero if and only if two fuzzy hashes belong to the same set.
§Safety (Warning)
This function (and its variants) can return false if self and
other are equivalent (the base fuzzy hash object of self and other
are the same). In this case, their similarity score is 100.
Because of this, we have to use a set of unique fuzzy hash values on the use case above to prevent false-negative matches.
See “Fuzzy Hash Comparison” section of FuzzyHashData
for the reason why we need to care about those cases.
§Useful Property
If two fuzzy hashes are correctly provided and this method (or its
family) returns true, the similarity score is guaranteed to be
greater than zero.
This property can be used to simplify clustering since we are able to
prove that the similarity score of two different fuzzy hashes is
non-zero if this method (or its family) returns true (i.e. no actual
comparison is required to split clusters on single-linkage clustering).
§Advanced Topic: Implementing Equivalents
While this method family can be important to preprocessing on single-linkage clustering, it can be inefficient as the number of fuzzy hash increases. On such cases, precomputing useful information to compute “comparison candidate” relations will help.
FuzzyHashData::block_hash_1_windows() and family methods provide
window access to block hash windows to enable implementing functionality
equivalent to this method family.
Trait Implementations§
Source§impl Clone for FuzzyHashCompareTarget
impl Clone for FuzzyHashCompareTarget
Source§fn clone(&self) -> FuzzyHashCompareTarget
fn clone(&self) -> FuzzyHashCompareTarget
1.0.0 · Source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source. Read more