pub struct FuzzyHashCompareTarget { /* private fields */ }
Expand description

An efficient position array-based fuzzy hash comparison target.

It can be built from a normalized FuzzyHashData object and represents the normalized contents of two block hashes as two position arrays.

Although that this structure is large, it is particularly useful if you compare many of fuzzy hashes and you can fix one of the operands (this is usually over 10 times faster than batched fuzzy_compare calls in ssdeep 2.13). Even if we generate this object each time we compare two fuzzy hashes, it’s usually faster than fuzzy_compare in ssdeep 2.13.

See also: “Fuzzy Hash Comparison” section of FuzzyHashData

Examples (requires the alloc feature)

use ssdeep::{FuzzyHash, FuzzyHashCompareTarget};

// Brute force comparison
let hashes: Vec<FuzzyHash> = Vec::new();
/* ... add fuzzy hashes to `hashes` ... */

let mut target: FuzzyHashCompareTarget = FuzzyHashCompareTarget::new();
for hash1 in &hashes {
    target.init_from(hash1);
    for hash2 in &hashes {
        let score = target.compare(hash2);
        /* ... */
    }
}

Implementations§

source§

impl FuzzyHashCompareTarget

source

pub const MIN_LCS_FOR_BLOCKHASH: usize = 7usize

The minimum length of the common substring to compute edit distance between two block hashes.

To score similarity between two block hashes, ssdeep expects that two block hashes are similar enough. In other words, ssdeep expects that they have a common substring of a length MIN_LCS_FOR_BLOCKHASH or longer to reduce the possibility of false matches by chance.

Finding such common substrings is a special case of finding a longest common substring (LCS).

For instance, those two strings:

  • +r/kcOpEYXB+0ZJ
  • 7ocOpEYXB+0ZF29

have a common substring cOpEYXB+0Z (length 10), long enough (≧ MIN_LCS_FOR_BLOCKHASH) to compute the edit distance to compute the similarity score.

Specifically, ssdeep requires a common substring of a length 7 to compute a similarity score. Otherwise, the block hash comparison method returns zero (meaning, not similar).

Incompatibility Notice

This constant is deprecated on v0.2.0 and going to be removed on v0.3.0 (or v1.0.0 if stabilized).

source

pub const LOG_BLOCK_SIZE_CAPPING_BORDER: u8 = 4u8

The lower bound (inclusive) of the base-2 logarithm form of the block size in which the score capping is no longer required.

If log_block_size is equal to or larger than this value and len1 and len2 are at least MIN_LCS_FOR_BLOCKHASH in size, Self::score_cap_on_block_hash_comparison(log_block_size, len1, len2) is guaranteed to be 100 or greater.

The score “cap” is computed as (1 << log_block_size) * min(len1, len2). If this always guaranteed to be 100 or greater, capping the score is not longer required.

Backgrounds
Theorem

For all positive integers a, b and c, a <= b * c iff (a + b - 1) / b <= c (where ceil(a/b) == (a + b - 1) / b).

This is proven by Z3 and (partially) Coq in the source code:

  • Z3 + Python:
    dev/prover/compare/blocksize_capping_theorem.py
  • Coq (uses existing ceiling function instead of (a + b - 1) / b):
    dev/prover/compare/blocksize_capping_theorem.v
The Minimum Score Cap

This is expressed as (1 << log_block_size) * MIN_LCS_FOR_BLOCKHASH because both block hashes must at least as long as MIN_LCS_FOR_BLOCKHASH to perform edit distance-based scoring.

Computing the Constant

Applying the theorem above, 100 <= (1 << log_block_size) * MIN_LCS_FOR_BLOCKHASH is equivalent to (100 + MIN_LCS_FOR_BLOCKHASH - 1) / MIN_LCS_FOR_BLOCKHASH <= (1 << log_block_size).

This leads to the expression to define this constant.

source

pub fn new() -> Self

Creates a new FuzzyHashCompareTarget object with empty contents.

This is equivalent to the fuzzy hash string 3::.

source

pub fn log_block_size(&self) -> u8

The base-2 logarithm form of the comparison target’s block size.

See also: “Block Size” section of FuzzyHashData

source

pub fn block_size(&self) -> u32

The block size of the comparison target.

source

pub fn init_from<const S1: usize, const S2: usize>( &mut self, hash: impl AsRef<FuzzyHashData<S1, S2, true>> )where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

Initialize the object from a given fuzzy hash.

source

pub fn is_equiv<const S1: usize, const S2: usize>( &self, hash: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> boolwhere BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

Compare whether two fuzzy hashes are equivalent.

source

pub unsafe fn score_cap_on_block_hash_comparison_unchecked( log_block_size: u8, len_block_hash_lhs: u8, len_block_hash_rhs: u8 ) -> u32

Available on crate feature unsafe only.

Returns the “score cap” for a given block size and two block hash lengths, assuming that block size is small enough so that an arithmetic overflow will not occur.

Safety

If log_block_size is equal to or larger than FuzzyHashCompareTarget::LOG_BLOCK_SIZE_CAPPING_BORDER, and/or both lengths are too large, it may cause an arithmetic overflow and return an useless value.

source

pub fn score_cap_on_block_hash_comparison( log_block_size: u8, len_block_hash_lhs: u8, len_block_hash_rhs: u8 ) -> u32

Returns the “score cap” for a given block size and two block hash lengths.

The internal block hash comparison method “caps” the score to prevent exaggregating the matches that are not meaningful enough. This behavior depends on the block size (the “cap” gets higher as the block size gets higher) and the minimum of block hash lengths.

The result is not always guaranteed to be in 0..=100 but 100 or higher means that we don’t need any score capping.

source§

impl FuzzyHashCompareTarget

source

pub fn is_valid(&self) -> bool

Performs full validity checking of the internal structure.

The primary purpose of this is debugging and it should always return true unless…

  1. There is a bug in this crate, corrupting this structure or
  2. A memory corruption is occurred somewhere else.

Because of its purpose, this method is not designed to be fast.

source

pub unsafe fn compare_unequal_near_eq_unchecked<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

Available on crate feature unsafe only.

Compare two fuzzy hashes assuming both are different and their block sizes have a relation of BlockSizeRelation::NearEq.

Safety
  • Both fuzzy hashes must be different.
  • Both fuzzy hashes (self and other) must have block size relation of BlockSizeRelation::NearEq.

If they are not satisfied, it will return a meaningless score.

source

pub fn compare_unequal_near_eq<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

SLOW: Compare two fuzzy hashes assuming both are different and their block sizes have a relation of BlockSizeRelation::NearEq.

Usage Constraints
  • Both fuzzy hashes must be different.
  • Both fuzzy hashes (self and other) must have block size relation of BlockSizeRelation::NearEq.
Performance Consideration

This method’s performance is not good enough (because of constraint checking).

Use those instead:

source

pub unsafe fn compare_near_eq_unchecked<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

Available on crate feature unsafe only.

Compare two fuzzy hashes assuming their block sizes have a relation of BlockSizeRelation::NearEq.

Safety

If the condition above is not satisfied, it will return a meaningless score.

source

pub fn compare_near_eq<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

Compare two fuzzy hashes assuming their block sizes have a relation of BlockSizeRelation::NearEq.

Usage Constraints
source

pub unsafe fn compare_unequal_near_lt_unchecked<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

Available on crate feature unsafe only.

Compare two fuzzy hashes assuming both are different and their block sizes have a relation of BlockSizeRelation::NearLt.

Safety

If they are not satisfied, it will return a meaningless score.

source

pub fn compare_unequal_near_lt<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

Compare two fuzzy hashes assuming both are different and their block sizes have a relation of BlockSizeRelation::NearLt.

Usage Constraints
source

pub unsafe fn compare_unequal_near_gt_unchecked<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

Available on crate feature unsafe only.

Compare two fuzzy hashes assuming both are different and their block sizes have a relation of BlockSizeRelation::NearGt.

Safety

If they are not satisfied, it will return a meaningless score.

source

pub fn compare_unequal_near_gt<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

Compare two fuzzy hashes assuming both are different and their block sizes have a relation of BlockSizeRelation::NearGt.

Usage Constraints
source

pub unsafe fn compare_unequal_unchecked<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

Available on crate feature unsafe only.

Compare two normalized fuzzy hashes assuming both are different.

Safety
  • Both fuzzy hashes (self and other) must be different.

If the condition above is not satisfied, it will return a meaningless score.

source

pub fn compare_unequal<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

SLOW: Compare two normalized fuzzy hashes assuming both are different.

Usage Constraints
  • Both fuzzy hashes (self and other) must be different.
Performance Consideration

This method’s performance is not good enough (because of the constraint checking).

Use those instead:

source

pub fn compare<const S1: usize, const S2: usize>( &self, other: impl AsRef<FuzzyHashData<S1, S2, true>> ) -> u32where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

Compares two normalized fuzzy hashes.

Trait Implementations§

source§

impl Debug for FuzzyHashCompareTarget

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl Default for FuzzyHashCompareTarget

source§

fn default() -> Self

Returns the “default value” for a type. Read more
source§

impl<const S1: usize, const S2: usize> From<&FuzzyHashData<S1, S2, true>> for FuzzyHashCompareTargetwhere BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

source§

fn from(value: &FuzzyHashData<S1, S2, true>) -> Self

Converts to this type from the input type.
source§

impl<const S1: usize, const S2: usize> From<FuzzyHashData<S1, S2, true>> for FuzzyHashCompareTargetwhere BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,

source§

fn from(value: FuzzyHashData<S1, S2, true>) -> Self

Converts to this type from the input type.

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for Twhere T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for Twhere T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for Twhere T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for Twhere U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T, U> TryFrom<U> for Twhere U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for Twhere U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.