Struct ssdeep::FuzzyHashCompareTarget
pub struct FuzzyHashCompareTarget { /* private fields */ }
An efficient position array-based fuzzy hash comparison target.
It can be built from a normalized FuzzyHashData object and represents
the normalized contents of two block hashes as two position arrays.
Although this structure is large, it is particularly useful if
you compare many fuzzy hashes and can fix one of the operands
(this is usually over 10 times faster than batched fuzzy_compare calls
in ssdeep 2.13). Even if this object is generated each time two
fuzzy hashes are compared, it’s usually faster than fuzzy_compare in ssdeep 2.13.
In fact, if you just compare two fuzzy hashes in this crate, a temporary
FuzzyHashCompareTarget object is created from either side
of the comparison.
See also: “Fuzzy Hash Comparison” section of FuzzyHashData
Examples
// Requires the global allocator to use `Vec` (default on std).
use ssdeep::{FuzzyHash, FuzzyHashCompareTarget};

// Brute force comparison
let hashes: Vec<FuzzyHash> = Vec::new();
/* ... add fuzzy hashes to `hashes` ... */

let mut target: FuzzyHashCompareTarget = FuzzyHashCompareTarget::new();
for hash1 in &hashes {
    target.init_from(hash1);
    for hash2 in &hashes {
        let score = target.compare(hash2);
        /* ... */
    }
}
Implementations
impl FuzzyHashCompareTarget
pub const MIN_LCS_FOR_BLOCKHASH: usize = 7usize
Deprecated
The minimum length of the common substring to compute edit distance between two block hashes.
Use block_hash::MIN_LCS_FOR_COMPARISON instead.
Incompatibility Notice
This constant will be removed in version 0.3.0.
pub const LOG_BLOCK_SIZE_CAPPING_BORDER: u8 = 4u8
The lower bound (inclusive) of the base-2 logarithm form of the block size in which the score capping is no longer required.
If log_block_size is equal to or larger than this value and len1 and
len2 are at least block_hash::MIN_LCS_FOR_COMPARISON in size,
Self::score_cap_on_block_hash_comparison(log_block_size, len1, len2)
is guaranteed to be 100 or greater.
The score “cap” is computed as
(1 << log_block_size) * min(len1, len2).
If this is always guaranteed to be 100 or greater,
capping the score is no longer required.
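The cap formula above can be sketched in plain Rust. This is a standalone illustration, not the crate's implementation: the function name score_cap and the local constant MIN_LCS are hypothetical, and MIN_LCS assumes that block_hash::MIN_LCS_FOR_COMPARISON has the same value (7) as the deprecated constant it replaces.

```rust
/// Assumed value of block_hash::MIN_LCS_FOR_COMPARISON (hypothetical local copy).
const MIN_LCS: u8 = 7;

/// Computes (1 << log_block_size) * min(len1, len2) in u64 so that
/// realistic inputs cannot overflow (assumes log_block_size < 64).
fn score_cap(log_block_size: u8, len1: u8, len2: u8) -> u64 {
    (1u64 << log_block_size) * u64::from(len1.min(len2))
}

fn main() {
    // At the border (log_block_size = 4) with the shortest comparable
    // block hashes, the cap is 16 * 7 = 112, so capping can be skipped.
    assert_eq!(score_cap(4, MIN_LCS, MIN_LCS), 112);
    // One step below the border, the cap is 8 * 7 = 56 and still matters.
    assert_eq!(score_cap(3, MIN_LCS, 64), 56);
    println!("cap at border: {}", score_cap(4, MIN_LCS, MIN_LCS));
}
```

Only the shorter of the two block hashes matters, which is why the second call is limited by the length 7 rather than 64.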
Backgrounds
Theorem
For all positive integers a, b and c, a <= b * c iff
(a + b - 1) / b <= c (where ceil(a/b) == (a + b - 1) / b).
This is proven by Z3 and (partially) Coq in the source code:
- Z3 + Python: dev/prover/compare/blocksize_capping_theorem.py
- Coq (uses existing ceiling function instead of (a + b - 1) / b): dev/prover/compare/blocksize_capping_theorem.v
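For intuition, the theorem can also be spot-checked by brute force over a small range. The sketch below is an illustration only and is no substitute for the proofs listed above; the function name theorem_holds and the chosen ranges are arbitrary.

```rust
/// Returns true if both sides of the equivalence agree for (a, b, c).
/// The division is integer (truncating) division, as in the theorem.
fn theorem_holds(a: u32, b: u32, c: u32) -> bool {
    (a <= b * c) == ((a + b - 1) / b <= c)
}

fn main() {
    // Exhaustively check a small range of positive integers.
    for a in 1..=200 {
        for b in 1..=50 {
            for c in 1..=50 {
                assert!(theorem_holds(a, b, c), "failed at a={a}, b={b}, c={c}");
            }
        }
    }
    println!("theorem holds on the tested range");
}
```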
The Minimum Score Cap
This is expressed as (1 << log_block_size) * MIN_LCS_FOR_COMPARISON
because both block hashes must be at least as long as
block_hash::MIN_LCS_FOR_COMPARISON to perform edit distance-based
scoring.
Computing the Constant
Applying the theorem above,
100 <= (1 << log_block_size) * MIN_LCS_FOR_COMPARISON
is equivalent to
(100 + MIN_LCS_FOR_COMPARISON - 1) / MIN_LCS_FOR_COMPARISON <= (1 << log_block_size).
This leads to the expression that defines this constant.
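One way to turn the inequality above into a constant is sketched here. This is an assumption about how such a value could be derived, not the crate's actual expression; capping_border is a hypothetical name, and the value 7 for MIN_LCS_FOR_COMPARISON is assumed from the deprecated constant above.

```rust
/// Assumed value of block_hash::MIN_LCS_FOR_COMPARISON (hypothetical local copy).
const MIN_LCS_FOR_COMPARISON: u32 = 7;

/// Smallest k such that 100 <= (1 << k) * min_lcs (min_lcs must be positive).
const fn capping_border(min_lcs: u32) -> u8 {
    // By the theorem: 100 <= m * min_lcs  iff  ceil(100 / min_lcs) <= m.
    let min_multiplier = (100 + min_lcs - 1) / min_lcs;
    // Round up to a power of two and take its base-2 logarithm.
    min_multiplier.next_power_of_two().trailing_zeros() as u8
}

/// Evaluated at compile time; ceil(100 / 7) = 15, rounded up to 16 = 1 << 4.
const BORDER: u8 = capping_border(MIN_LCS_FOR_COMPARISON);

fn main() {
    assert_eq!(BORDER, 4);
    println!("border = {BORDER}");
}
```

With these assumptions the result agrees with the documented value of LOG_BLOCK_SIZE_CAPPING_BORDER (4).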
pub fn new() -> Self
Creates a new FuzzyHashCompareTarget object with empty contents.
This is equivalent to the fuzzy hash string 3::.
pub fn log_block_size(&self) -> u8
The base-2 logarithm form of the comparison target’s block size.
See also: “Block Size” section of FuzzyHashData
pub fn block_size(&self) -> u32
The block size of the comparison target.
pub fn block_hash_1(&self) -> impl '_ + BlockHashPositionArrayImpl + BlockHashPositionArrayImplUnchecked
Position array-based representation of the block hash 1.
This method provides raw access to the internal efficient block hash representation and fast bit-parallel string functions.
Using it is not recommended unless you know the internal details well.
The result has the same lifetime as this object and implements the following traits:
- BlockHashPositionArrayData
- BlockHashPositionArrayImpl
- BlockHashPositionArrayImplUnchecked (only if the unchecked feature is enabled)
pub fn block_hash_2(&self) -> impl '_ + BlockHashPositionArrayImpl + BlockHashPositionArrayImplUnchecked
Position array-based representation of the block hash 2.
See also: block_hash_1()
pub fn full_eq(&self, other: &Self) -> bool
Performs full equality checking of the internal structure.
This type intentionally lacks the implementation of PartialEq
because of its large size. However, there’s a case where we need to
compare two comparison targets.
The primary purpose of this is debugging and it compares all internal
members inside the structure (just like auto-generated
PartialEq::eq()).
Note that, although this method is only relevant to users when the
unchecked feature is enabled, it is public regardless of enabled features
because it is not unsafe.
pub fn init_from<const S1: usize, const S2: usize>(
    &mut self,
    hash: impl AsRef<FuzzyHashData<S1, S2, true>>
)
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Initialize the object from a given fuzzy hash.
pub fn is_equiv<const S1: usize, const S2: usize>(
    &self,
    hash: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> bool
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compare whether two fuzzy hashes are equivalent.
pub unsafe fn score_cap_on_block_hash_comparison_unchecked(
    log_block_size: u8,
    len_block_hash_lhs: u8,
    len_block_hash_rhs: u8
) -> u32
Available on crate feature unchecked only.
Returns the “score cap” for a given block size and two block hash lengths, assuming that the block size is small enough that an arithmetic overflow will not occur.
Safety
If log_block_size is equal to or larger than
FuzzyHashCompareTarget::LOG_BLOCK_SIZE_CAPPING_BORDER
and/or both lengths are too large, it may cause an
arithmetic overflow and return a useless value.
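The overflow concern can be illustrated with plain u32 arithmetic. This sketch assumes the cap is computed as (1 << log_block_size) * min(len1, len2) in u32, as the formula for the score cap suggests; both function names are hypothetical and neither is the crate's code.

```rust
/// Mimics an unchecked computation: on overflow the product silently wraps.
/// Assumes log_block_size < 32 (a larger shift would panic in debug builds).
fn score_cap_wrapping(log_block_size: u8, len1: u8, len2: u8) -> u32 {
    (1u32 << log_block_size).wrapping_mul(u32::from(len1.min(len2)))
}

/// A checked variant: returns None instead of a useless value on overflow.
fn score_cap_checked(log_block_size: u8, len1: u8, len2: u8) -> Option<u32> {
    1u32.checked_shl(u32::from(log_block_size))?
        .checked_mul(u32::from(len1.min(len2)))
}

fn main() {
    // 2^30 * 64 = 2^36 does not fit in u32, so the wrapped result is 0.
    assert_eq!(score_cap_wrapping(30, 64, 64), 0);
    assert_eq!(score_cap_checked(30, 64, 64), None);
    // Small block sizes are unaffected: 8 * 7 = 56.
    assert_eq!(score_cap_checked(3, 7, 64), Some(56));
    println!("overflow demonstrated");
}
```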
pub fn score_cap_on_block_hash_comparison(
    log_block_size: u8,
    len_block_hash_lhs: u8,
    len_block_hash_rhs: u8
) -> u32
Returns the “score cap” for a given block size and two block hash lengths.
The internal block hash comparison method “caps” the score to prevent exaggerating matches that are not meaningful enough. This behavior depends on the block size (the “cap” gets higher as the block size gets higher) and the minimum of the two block hash lengths.
The result is not always guaranteed to be in 0..=100 but 100 or
higher means that we don’t need any score capping.
impl FuzzyHashCompareTarget
pub fn is_valid(&self) -> bool
Performs full validity checking of the internal structure.
The primary purpose of this is debugging and it should always
return true unless…
- There is a bug in this crate that corrupts this structure, or
- Memory corruption has occurred somewhere else.
Because of its purpose, this method is not designed to be fast.
pub unsafe fn compare_unequal_near_eq_unchecked<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearEq.
Safety
- Both fuzzy hashes must be different.
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearEq.
If these conditions are not satisfied, it will return a meaningless score.
pub fn compare_unequal_near_eq<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Slow: Compare two fuzzy hashes assuming both are different and
their block sizes have a relation of BlockSizeRelation::NearEq.
Usage Constraints
- Both fuzzy hashes must be different.
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearEq.
Performance Consideration
This method is slow because of constraint checking.
Use one of these instead:
- compare_near_eq() (safe Rust)
- compare_unequal_near_eq_unchecked() (unsafe Rust)
pub unsafe fn compare_near_eq_unchecked<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
Compare two fuzzy hashes assuming their block sizes have
a relation of BlockSizeRelation::NearEq.
Safety
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearEq.
If the condition above is not satisfied, it will return a meaningless score.
pub fn compare_near_eq<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compare two fuzzy hashes assuming their block sizes have
a relation of BlockSizeRelation::NearEq.
Usage Constraints
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearEq.
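The block size relations named in these constraints can be pictured with a small standalone classifier. This is a sketch under an assumption not stated on this page: that NearEq means equal block sizes, NearLt means the left side's block size is half the right side's, NearGt the reverse, and anything else is too far apart to compare. The enum Relation and the function relation are hypothetical local stand-ins, not the crate's BlockSizeRelation API.

```rust
/// Hypothetical local stand-in for BlockSizeRelation (semantics assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Relation {
    Far,
    NearLt,
    NearEq,
    NearGt,
}

/// Classifies two block sizes given in base-2 logarithm form.
fn relation(lhs_log: u8, rhs_log: u8) -> Relation {
    // Widen to u16 so that `+ 1` cannot overflow.
    let (lhs, rhs) = (u16::from(lhs_log), u16::from(rhs_log));
    if lhs == rhs {
        Relation::NearEq
    } else if lhs + 1 == rhs {
        Relation::NearLt
    } else if rhs + 1 == lhs {
        Relation::NearGt
    } else {
        Relation::Far
    }
}

fn main() {
    assert_eq!(relation(3, 3), Relation::NearEq);
    assert_eq!(relation(3, 4), Relation::NearLt);
    assert_eq!(relation(4, 3), Relation::NearGt);
    assert_eq!(relation(3, 5), Relation::Far);
    println!("{:?}", relation(3, 4));
}
```

Under this reading, a caller would pick the NearEq, NearLt, or NearGt variant of a method only after establishing the corresponding relation between self and other.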
pub unsafe fn compare_unequal_near_lt_unchecked<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearLt.
Safety
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearLt.
If they are not satisfied, it will return a meaningless score.
pub fn compare_unequal_near_lt<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearLt.
Usage Constraints
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearLt.
pub unsafe fn compare_unequal_near_gt_unchecked<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearGt.
Safety
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearGt.
If they are not satisfied, it will return a meaningless score.
pub fn compare_unequal_near_gt<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearGt.
Usage Constraints
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearGt.
pub unsafe fn compare_unequal_unchecked<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
Compare two normalized fuzzy hashes assuming both are different.
Safety
- Both fuzzy hashes (self and other) must be different.
If the condition above is not satisfied, it will return a meaningless score.
pub fn compare_unequal<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Slow: Compare two normalized fuzzy hashes assuming both are different.
Usage Constraints
- Both fuzzy hashes (self and other) must be different.
Performance Consideration
This method is slow because of constraint checking.
Use one of these instead:
- compare() (safe Rust)
- compare_unequal_unchecked() (unsafe Rust)
pub fn compare<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compares two normalized fuzzy hashes.
pub unsafe fn is_comparison_candidate_near_eq_unchecked<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> bool
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
Tests whether other is a candidate for edit distance-based comparison
assuming that their block sizes have a relation of
BlockSizeRelation::NearEq.
See also: is_comparison_candidate()
Safety
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearEq.
If the condition above is not satisfied, it will return a meaningless value.
pub fn is_comparison_candidate_near_eq<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> bool
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Tests whether other is a candidate for edit distance-based comparison
assuming that their block sizes have a relation of
BlockSizeRelation::NearEq.
See also: is_comparison_candidate()
Usage Constraints
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearEq.
pub unsafe fn is_comparison_candidate_near_lt_unchecked<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> bool
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
Tests whether other is a candidate for edit distance-based comparison
assuming that their block sizes have a relation of
BlockSizeRelation::NearLt.
See also: is_comparison_candidate()
Safety
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearLt.
If the condition above is not satisfied, it will return a meaningless value.
pub fn is_comparison_candidate_near_lt<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> bool
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Tests whether other is a candidate for edit distance-based comparison
assuming that their block sizes have a relation of
BlockSizeRelation::NearLt.
See also: is_comparison_candidate()
Usage Constraints
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearLt.
pub unsafe fn is_comparison_candidate_near_gt_unchecked<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> bool
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unchecked only.
Tests whether other is a candidate for edit distance-based comparison
assuming that their block sizes have a relation of
BlockSizeRelation::NearGt.
See also: is_comparison_candidate()
Safety
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearGt.
If the condition above is not satisfied, it will return a meaningless value.
pub fn is_comparison_candidate_near_gt<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> bool
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Tests whether other is a candidate for edit distance-based comparison
assuming that their block sizes have a relation of
BlockSizeRelation::NearGt.
See also: is_comparison_candidate()
Usage Constraints
- Both fuzzy hashes (self and other) must have a block size relation of BlockSizeRelation::NearGt.
pub fn is_comparison_candidate<const S1: usize, const S2: usize>(
    &self,
    other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> bool
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Tests whether other is a candidate for edit distance-based comparison.
If this function returns false and self and other are not
equivalent, their similarity score is calculated as 0.
Use Case (Example)
This operation is useful to divide a set of unique (normalized) fuzzy hashes into smaller distinct sets. The similarity score can be non-zero if and only if two fuzzy hashes belong to the same set.
Safety (Warning)
This function (and its variants) can return false if self and
other are equivalent (the base fuzzy hash object of self and other
are the same and their similarity score is 100).
Because of this, the use case above requires a set of unique fuzzy hash values to prevent false-negative matches.
Trait Implementations
impl Clone for FuzzyHashCompareTarget
fn clone(&self) -> FuzzyHashCompareTarget
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.