Struct ssdeep::FuzzyHashCompareTarget
source · pub struct FuzzyHashCompareTarget { /* private fields */ }Expand description
An efficient position array-based fuzzy hash comparison target.
It can be built from a normalized FuzzyHashData object and represents
the normalized contents of two block hashes as two position arrays.
Although that this structure is large, it is particularly useful if
you compare many of fuzzy hashes and you can fix one of the operands
(this is usually over 10 times faster than batched fuzzy_compare calls
in ssdeep 2.13). Even if we generate this object each time we compare
two fuzzy hashes, it’s usually faster than fuzzy_compare in ssdeep 2.13.
See also: “Fuzzy Hash Comparison” section of FuzzyHashData
Examples (requires the alloc feature)
use ssdeep::{FuzzyHash, FuzzyHashCompareTarget};
// Brute force comparison
let hashes: Vec<FuzzyHash> = Vec::new();
/* ... add fuzzy hashes to `hashes` ... */
let mut target: FuzzyHashCompareTarget = FuzzyHashCompareTarget::new();
for hash1 in &hashes {
target.init_from(hash1);
for hash2 in &hashes {
let score = target.compare(hash2);
/* ... */
}
}Implementations§
source§impl FuzzyHashCompareTarget
impl FuzzyHashCompareTarget
sourcepub const MIN_LCS_FOR_BLOCKHASH: usize = 7usize
pub const MIN_LCS_FOR_BLOCKHASH: usize = 7usize
The minimum length of the common substring to compute edit distance between two block hashes.
To score similarity between two block hashes, ssdeep expects that
two block hashes are similar enough. In other words, ssdeep expects
that they have a common substring of a length
MIN_LCS_FOR_BLOCKHASH or longer
to reduce the possibility of false matches by chance.
Finding such common substrings is a special case of finding a longest common substring (LCS).
For instance, those two strings:
+r/kcOpEYXB+0ZJ7ocOpEYXB+0ZF29
have a common substring cOpEYXB+0Z (length 10), long enough
(≧ MIN_LCS_FOR_BLOCKHASH)
to compute the edit distance to compute the similarity score.
Specifically, ssdeep requires a common substring of a length 7 to compute a similarity score. Otherwise, the block hash comparison method returns zero (meaning, not similar).
Incompatibility Notice
This constant is deprecated on v0.2.0 and going to be removed on v0.3.0 (or v1.0.0 if stabilized).
sourcepub const LOG_BLOCK_SIZE_CAPPING_BORDER: u8 = 4u8
pub const LOG_BLOCK_SIZE_CAPPING_BORDER: u8 = 4u8
The lower bound (inclusive) of the base-2 logarithm form of the block size in which the score capping is no longer required.
If log_block_size is equal to or larger than this value and
len1 and len2 are at least
MIN_LCS_FOR_BLOCKHASH in size,
Self::score_cap_on_block_hash_comparison(log_block_size, len1, len2)
is guaranteed to be 100 or greater.
The score “cap” is computed as
(1 << log_block_size) * min(len1, len2).
If this always guaranteed to be 100 or greater,
capping the score is not longer required.
Backgrounds
Theorem
For all positive integers a, b and c, a <= b * c iff
(a + b - 1) / b <= c (where ceil(a/b) == (a + b - 1) / b).
This is proven by Z3 and (partially) Coq in the source code:
- Z3 + Python:
dev/prover/compare/blocksize_capping_theorem.py - Coq (uses existing ceiling function instead of
(a + b - 1) / b):
dev/prover/compare/blocksize_capping_theorem.v
The Minimum Score Cap
This is expressed as (1 << log_block_size) * MIN_LCS_FOR_BLOCKHASH
because both block hashes must at least as long as
MIN_LCS_FOR_BLOCKHASH to perform
edit distance-based scoring.
Computing the Constant
Applying the theorem above,
100 <= (1 << log_block_size) * MIN_LCS_FOR_BLOCKHASH
is equivalent to
(100 + MIN_LCS_FOR_BLOCKHASH - 1) / MIN_LCS_FOR_BLOCKHASH <= (1 << log_block_size).
This leads to the expression to define this constant.
sourcepub fn new() -> Self
pub fn new() -> Self
Creates a new FuzzyHashCompareTarget object with empty contents.
This is equivalent to the fuzzy hash string 3::.
sourcepub fn log_block_size(&self) -> u8
pub fn log_block_size(&self) -> u8
The base-2 logarithm form of the comparison target’s block size.
See also: “Block Size” section of FuzzyHashData
sourcepub fn block_size(&self) -> u32
pub fn block_size(&self) -> u32
The block size of the comparison target.
sourcepub fn init_from<const S1: usize, const S2: usize>(
&mut self,
hash: impl AsRef<FuzzyHashData<S1, S2, true>>
)where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn init_from<const S1: usize, const S2: usize>( &mut self, hash: impl AsRef<FuzzyHashData<S1, S2, true>> )where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Initialize the object from a given fuzzy hash.
sourcepub fn is_equiv<const S1: usize, const S2: usize>(
&self,
hash: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> boolwhere
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn is_equiv<const S1: usize, const S2: usize>( &self, hash: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> boolwhere BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compare whether two fuzzy hashes are equivalent.
sourcepub unsafe fn score_cap_on_block_hash_comparison_unchecked(
log_block_size: u8,
len_block_hash_lhs: u8,
len_block_hash_rhs: u8
) -> u32
Available on crate feature unsafe only.
pub unsafe fn score_cap_on_block_hash_comparison_unchecked( log_block_size: u8, len_block_hash_lhs: u8, len_block_hash_rhs: u8 ) -> u32
unsafe only.Returns the “score cap” for a given block size and two block hash lengths, assuming that block size is small enough so that an arithmetic overflow will not occur.
Safety
If log_block_size is equal to or larger than
FuzzyHashCompareTarget::LOG_BLOCK_SIZE_CAPPING_BORDER,
and/or both lengths are too large, it may cause an
arithmetic overflow and return an useless value.
sourcepub fn score_cap_on_block_hash_comparison(
log_block_size: u8,
len_block_hash_lhs: u8,
len_block_hash_rhs: u8
) -> u32
pub fn score_cap_on_block_hash_comparison( log_block_size: u8, len_block_hash_lhs: u8, len_block_hash_rhs: u8 ) -> u32
Returns the “score cap” for a given block size and two block hash lengths.
The internal block hash comparison method “caps” the score to prevent exaggregating the matches that are not meaningful enough. This behavior depends on the block size (the “cap” gets higher as the block size gets higher) and the minimum of block hash lengths.
The result is not always guaranteed to be in 0..=100 but 100 or
higher means that we don’t need any score capping.
source§impl FuzzyHashCompareTarget
impl FuzzyHashCompareTarget
sourcepub fn is_valid(&self) -> bool
pub fn is_valid(&self) -> bool
Performs full validity checking of the internal structure.
The primary purpose of this is debugging and it should always
return true unless…
- There is a bug in this crate, corrupting this structure or
- A memory corruption is occurred somewhere else.
Because of its purpose, this method is not designed to be fast.
sourcepub unsafe fn compare_unequal_near_eq_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unsafe only.
pub unsafe fn compare_unequal_near_eq_unchecked<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unsafe only.Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearEq.
Safety
- Both fuzzy hashes must be different.
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearEq.
If they are not satisfied, it will return a meaningless score.
sourcepub fn compare_unequal_near_eq<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn compare_unequal_near_eq<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
SLOW: Compare two fuzzy hashes assuming both are different and
their block sizes have a relation of BlockSizeRelation::NearEq.
Usage Constraints
- Both fuzzy hashes must be different.
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearEq.
Performance Consideration
This method’s performance is not good enough (because of constraint checking).
Use those instead:
compare_near_eq(safe Rust)compare_unequal_near_eq_unchecked(unsafe Rust)
sourcepub unsafe fn compare_near_eq_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unsafe only.
pub unsafe fn compare_near_eq_unchecked<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unsafe only.Compare two fuzzy hashes assuming their block sizes have
a relation of BlockSizeRelation::NearEq.
Safety
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearEq.
If the condition above is not satisfied, it will return a meaningless score.
sourcepub fn compare_near_eq<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn compare_near_eq<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compare two fuzzy hashes assuming their block sizes have
a relation of BlockSizeRelation::NearEq.
Usage Constraints
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearEq.
sourcepub unsafe fn compare_unequal_near_lt_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unsafe only.
pub unsafe fn compare_unequal_near_lt_unchecked<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unsafe only.Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearLt.
Safety
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearLt.
If they are not satisfied, it will return a meaningless score.
sourcepub fn compare_unequal_near_lt<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn compare_unequal_near_lt<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearLt.
Usage Constraints
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearLt.
sourcepub unsafe fn compare_unequal_near_gt_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unsafe only.
pub unsafe fn compare_unequal_near_gt_unchecked<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unsafe only.Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearGt.
Safety
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearGt.
If they are not satisfied, it will return a meaningless score.
sourcepub fn compare_unequal_near_gt<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn compare_unequal_near_gt<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compare two fuzzy hashes assuming both are different and their
block sizes have a relation of BlockSizeRelation::NearGt.
Usage Constraints
- Both fuzzy hashes (
selfandother) must have block size relation ofBlockSizeRelation::NearGt.
sourcepub unsafe fn compare_unequal_unchecked<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Available on crate feature unsafe only.
pub unsafe fn compare_unequal_unchecked<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
unsafe only.Compare two normalized fuzzy hashes assuming both are different.
Safety
- Both fuzzy hashes (
selfandother) must be different.
If the condition above is not satisfied, it will return a meaningless score.
sourcepub fn compare_unequal<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn compare_unequal<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
SLOW: Compare two normalized fuzzy hashes assuming both are different.
Usage Constraints
- Both fuzzy hashes (
selfandother) must be different.
Performance Consideration
This method’s performance is not good enough (because of the constraint checking).
Use those instead:
compare(safe Rust)compare_unequal_unchecked(unsafe Rust)
sourcepub fn compare<const S1: usize, const S2: usize>(
&self,
other: impl AsRef<FuzzyHashData<S1, S2, true>>
) -> u32where
BlockHashSize<S1>: ConstrainedBlockHashSize,
BlockHashSize<S2>: ConstrainedBlockHashSize,
BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
pub fn compare<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Compares two normalized fuzzy hashes.