FuzzyHashData

Struct FuzzyHashData 

#[repr(align(8))]
pub struct FuzzyHashData<const S1: usize, const S2: usize, const NORM: bool>{ /* private fields */ }

An efficient fixed-size fuzzy hash representation.

§Fuzzy Hash Internals

A fuzzy hash consists of four parts:

  1. Block size (reciprocal of average piece-splitting probability per byte on the block hash 1)

  2. Block hash 1. 6-bit hash (a block hash alphabet) per “piece”, variable-length up to block_hash::FULL_SIZE.

    The average piece-splitting probability is given as 1/block_size.

  3. Block hash 2. 6-bit hash (a block hash alphabet) per “piece”, variable-length up to either block_hash::HALF_SIZE or block_hash::FULL_SIZE.

    The average piece-splitting probability is given as 1/(block_size*2).

  4. (optional) The input file name, which is ignored by the parser on this type.

This struct stores the first three parts of a fuzzy hash.

You can see the following figure for an example:

196608:DfiQF5UWAC2qctjBemsqz7yHlHr4bMCE2J8Y:jBp/Fqz7mlHZCE2J8Y,"/usr/local/bin/rustc"
\____/|\__________________________________/|\________________/|\____________________/
 |    |            Block hash 1            |   Block hash 2   | File name (optional)
 |    |                                    |                  |
 |    +-- (sep:colon)                      +-- (sep:colon)    +-- (sep:comma (optional))
 |
 +-- Block size
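
As a rough illustration of the layout in the figure, the sketch below splits a string representation into its four parts using only the standard library. `Parts` and `split_parts` are hypothetical names that are not part of this crate, and the sketch does not validate the block size value or the block hash alphabet the way the real parser does.

```rust
/// The four parts of a fuzzy hash string, borrowed from the input.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
struct Parts<'a> {
    block_size: u32,
    block_hash_1: &'a str,
    block_hash_2: &'a str,
    file_name: Option<&'a str>,
}

/// Splits `s` at the first two colons and, if present, at the first comma
/// that follows them. Returns `None` when a colon is missing or the block
/// size is not a decimal number.
fn split_parts(s: &str) -> Option<Parts<'_>> {
    let (block_size, rest) = s.split_once(':')?;
    let (block_hash_1, rest) = rest.split_once(':')?;
    // Everything after the first comma is the optional file name.
    let (block_hash_2, file_name) = match rest.split_once(',') {
        Some((bh2, name)) => (bh2, Some(name)),
        None => (rest, None),
    };
    Some(Parts {
        block_size: block_size.parse().ok()?,
        block_hash_1,
        block_hash_2,
        file_name,
    })
}
```

Calling `split_parts` on the example string from the figure yields the block size 196608, both block hashes and the quoted file name; a string without a comma yields `file_name: None`.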

§Block Size

In the example above, 1 / 196 608 is the average probability for piece-splitting per byte on the block hash 1. On the block hash 2, the probability is 1 / 393 216 per byte, half of the probability on the block hash 1.

Since ssdeep uses a 32-bit hash function to decide whether to perform a piece-splitting, this probability becomes less accurate as the block size gets larger.

There is an important property of the block size: all valid block sizes can be represented as block_size::MIN * 2^n (where 2^n is a power of two with n ≧ 0).

In this crate, the block size is stored as n (the base-2 logarithm form of the block size) for higher efficiency. The log_block_size() method returns this raw representation. If you need the actual block size as used in the string representation, use block_size() instead.
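
The relation between the two forms can be sketched as follows. This is a minimal illustration, not the crate's implementation: `MIN_BLOCK_SIZE`, `block_size_from_log` and `log_from_block_size` are hypothetical names, and the value 3 for block_size::MIN is an assumption taken from the empty fuzzy hash string 3:: and the block sizes used in the examples. The crate may restrict the valid range of n further than this sketch does.

```rust
/// Assumed value of block_size::MIN (the empty fuzzy hash is written as "3::").
const MIN_BLOCK_SIZE: u32 = 3;

/// Converts the base-2 logarithm form `n` into an actual block size (MIN * 2^n).
/// Returns `None` if the result does not fit in a `u32`.
fn block_size_from_log(n: u8) -> Option<u32> {
    let power = 1u32.checked_shl(u32::from(n))?;
    MIN_BLOCK_SIZE.checked_mul(power)
}

/// Converts an actual block size back into its base-2 logarithm form.
/// Returns `None` if `block_size` is not MIN times a power of two.
fn log_from_block_size(block_size: u32) -> Option<u8> {
    if block_size % MIN_BLOCK_SIZE != 0 {
        return None;
    }
    let quotient = block_size / MIN_BLOCK_SIZE;
    if quotient.is_power_of_two() {
        Some(quotient.trailing_zeros() as u8)
    } else {
        None
    }
}
```

Under these assumptions, the block size 196608 from the first example corresponds to n = 16, and 6144 corresponds to n = 11.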

§Block Hashes

A fuzzy hash has two block hashes (1 and 2).

They are variable-length fields that store an array of 6-bit “piece” hash values (represented as Base64 characters in the string representation and internally stored as Base64 indices).

§Relations with Block Size

A fuzzy hash has two block hashes so that fuzzy hashes with similar (but not too distant) block sizes can be compared.

In principle, we can only compare block hashes with the same effective block size directly. Consider the following fuzzy hash, for example:

6144:SIsMYod+X3oI+YnsMYod+X3oI+YZsMYod+X3oI+YLsMYod+X3oI+YQ:Z5d+X395d+X3X5d+X315d+X3+
     \____________________________________________________/ \_______________________/
      Block hash 1                                                      Block hash 2
      (effective block size: 6144)                      (effective block size: 12288)
                                                               [*] 12288 == 6144 * 2

You can easily compare it with another fuzzy hash with the same block size (although the actual block hash similarity scoring only occurs after checking for a common substring).

Unaligned:
[A] 6144:SIsMYod+X3oI+YnsMYod+X3oI+YZsMYod+X3oI+YLsMYod+X3oI+YQ:Z5d+X395d+X3X5d+X315d+X3+
[B] 6144:SAsMYod+X3oI+YEWnnsMYod+X3oI+Y5sMYod+X3oI+YLsMYod+X3oI+YQ:H5d+X36WnL5d+X3v5d+X315d+X3+

Aligned:
[A] 6144:SIsMYod+X3oI+YnsMYod+X3oI+YZsMYod+X3oI+YLsMYod+X3oI+YQ   :Z5d+X395d+X3X5d+X315d+X3+
[B] 6144:SAsMYod+X3oI+YEWnnsMYod+X3oI+Y5sMYod+X3oI+YLsMYod+X3oI+YQ:H5d+X36WnL5d+X3v5d+X315d+X3+
         \_______________________________________________________/ \__________________________/
                               Comparison 1                                Comparison 2
                      (score([A1], [B1], 6144) = 94)            (score([A2], [B2], 12288) = 85)

score_final([A], [B], 6144) = max(94, 85) = 94

The final similarity score is the maximum of the two block hash comparison scores (note that the score is capped for small effective block sizes to prevent exaggeration of matches).

If two fuzzy hashes have different block sizes that are near enough, we can still perform a block hash comparison.

Unaligned:
[A] 3072:S+IiyfkMY+BES09JXAnyrZalI+YuyfkMY+BES09JXAnyrZalI+YQ:S+InsMYod+X3oI+YLsMYod+X3oI+YQ
[B] 6144:SIsMYod+X3oI+YnsMYod+X3oI+YZsMYod+X3oI+YLsMYod+X3oI+YQ:Z5d+X395d+X3X5d+X315d+X3+
[C] 12288:Z5d+X3pz5d+X3985d+X3X5d+X315d+X3+:1+Jr+d++H+5+e

Aligned:
[A] 3072 :S+IiyfkMY+BES09JXAnyrZalI+YuyfkMY+BES09JXAnyrZalI+YQ:S+InsMYod+X3oI+YLsMYod+X3oI+YQ
[B] 6144 :                                                     SIsMYod+X3oI+YnsMYod+X3oI+YZsMYod+X3oI+YLsMYod+X3oI+YQ:Z5d+X395d+X3X5d+X315d+X3+
[C] 12288:                                                                                                            Z5d+X3pz5d+X3985d+X3X5d+X315d+X3+:1+Jr+d++H+5+e
          \__________________________________________________/ \____________________________________________________/ \_______________________________/ \___________/
           Eff.B.S.=3072                                        Eff.B.S.=6144                                          Eff.B.S.=12288                    Eff.B.S.=24576
                                                                Comparison between [A2] and [B1]                       Comparison between [B2] and [C1]
                                                                (score([A2], [B1], 6144) = 72)                         (score([B2], [C1], 12288) = 88)

score_final([A], [B], 3072) = score([A2], [B1],  6144) = 72
score_final([B], [C], 6144) = score([B2], [C1], 12288) = 88
score_final([A], [C], 3072) = 0 (since there are no block hashes to compare)

Such cases are handled with BlockSizeRelation and block_size utility functions. We can outline the relation in the table below. Note that each (effective) block size is denoted as “Actual raw block size (block size in base-2 logarithm)”.

Left (lhs)  | Right (rhs) | Relation
------------|-------------|---------
3072 (10)   | 6144 (11)   | NearLt
6144 (11)   | 3072 (10)   | NearGt
6144 (11)   | 6144 (11)   | NearEq
6144 (11)   | 12288 (12)  | NearLt
12288 (12)  | 6144 (11)   | NearGt
3072 (10)   | 12288 (12)  | Far
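
The table can be reproduced with a small classification function over the base-2 logarithm forms. This is a sketch only: the `Relation` enum merely mirrors the variant names shown in the table, and `relation` and `pairs_to_compare` are hypothetical helpers rather than functions of this crate.

```rust
/// Mirrors the relation names used in the table (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Relation {
    Far,
    NearLt,
    NearEq,
    NearGt,
}

/// Classifies two block sizes given in base-2 logarithm form.
fn relation(lhs_log: u8, rhs_log: u8) -> Relation {
    match i16::from(lhs_log) - i16::from(rhs_log) {
        0 => Relation::NearEq,
        -1 => Relation::NearLt,
        1 => Relation::NearGt,
        _ => Relation::Far,
    }
}

/// Lists which block hashes share an effective block size, as
/// (block hash number of lhs, block hash number of rhs) pairs.
fn pairs_to_compare(rel: Relation) -> &'static [(u8, u8)] {
    match rel {
        Relation::NearEq => &[(1, 1), (2, 2)],
        Relation::NearLt => &[(2, 1)],
        Relation::NearGt => &[(1, 2)],
        Relation::Far => &[],
    }
}
```

For hashes [A] (3072) and [B] (6144) in the example above, the relation is NearLt, so only block hash 2 of [A] and block hash 1 of [B] are compared.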

In highly optimized clustering applications, being aware of the block size relation is crucial.

See also: BlockSizeRelation

§Normalization

To prevent exaggerating the comparison score from repeating patterns, ssdeep processes each block hash before comparison so that a sequence consisting of the same character longer than block_hash::MAX_SEQUENCE_SIZE cannot exist.

For instance, after processing a block hash 122333444455555 before comparison, it is converted to 122333444555 (four 4s and five 5s are shortened into three 4s and three 5s because block_hash::MAX_SEQUENCE_SIZE is defined to be three (3)).

In this crate, this process is called normalization.
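
The rule described above can be sketched in a few lines. This is an illustration of the idea rather than the crate's implementation; `normalize_block_hash` is a hypothetical name, and the constant copies the value three (3) stated above for block_hash::MAX_SEQUENCE_SIZE.

```rust
/// Value of block_hash::MAX_SEQUENCE_SIZE as stated in the text above.
const MAX_SEQUENCE_SIZE: usize = 3;

/// Returns a copy of `block_hash` in which every run of identical values
/// is shortened to at most `MAX_SEQUENCE_SIZE` elements.
fn normalize_block_hash(block_hash: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(block_hash.len());
    let mut run = 0usize; // length of the current run in the input
    for &value in block_hash {
        if out.last() == Some(&value) {
            run += 1;
        } else {
            run = 1;
        }
        // Keep only the first MAX_SEQUENCE_SIZE elements of each run.
        if run <= MAX_SEQUENCE_SIZE {
            out.push(value);
        }
    }
    out
}
```

Applied to the bytes of 122333444455555, this sketch produces the bytes of 122333444555, matching the example above.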

ssdeep (as well as Generator) normally generates raw, non-normalized fuzzy hashes, so distinguishing between normalized and raw forms is important.

§The Strict Parser

If the strict-parser feature is enabled, parsers for fuzzy hash types reject strings that would cause an error on the raw variant but not on the normalized variant (with the default parser, i.e. when this feature is disabled).

Enabling this feature costs some performance, but it makes the parser less confusing (if either variant accepts a string, the other does too).

§Truncation

ssdeep (as well as Generator) normally generates truncated fuzzy hashes. In a truncated fuzzy hash, the length of block hash 2 is limited to block_hash::HALF_SIZE, half of the maximum length of block hash 1 (block_hash::FULL_SIZE).

While libfuzzy allows generating non-truncated, long fuzzy hashes, they are typically useless. So, most operations are performed on short, truncated fuzzy hashes by default. Short variants of FuzzyHashData are smaller than long variants, so they can be used to reduce the memory footprint.

§Warning: Truncation is not just “Truncation”

Truncated (regular) fuzzy hashes are not literally “truncated” from the long, non-truncated fuzzy hashes; they are generated separately.

For instance (/usr/libexec/geoclue on Ubuntu 23.10):

                                                                                                                                  v
Non-truncated (long):        6144:M5/qVhAWFfzlpxdJ/YQINNbZ2cQpn77+Ptn+7ADOeb8Gj+OK8o4u1TzxwBf71C3O:M5/qzAWFfzlpxdJ/YQINNbZ2cQpn77+Ptn+7ADOeb8Gj+OK8o4u1TzxwBf71ETfJ
Truncated (short / regular): 6144:M5/qVhAWFfzlpxdJ/YQINNbZ2cQpn77+Ptn+7ADOeb8Gj+OK8o4u1TzxwBf71C3O:M5/qzAWFfzlpxdJ/YQINNbZ2cQpn77+i

Note that the 32nd characters of block hash 2 differ (P and i). This is because the last character of a block hash may contain information about everything that follows the other, individually stored pieces.

§Fuzzy Hash Comparison

For the basic concept of the comparison, see the “Relations with Block Size” section.

In this section, we describe the full comparison algorithm.

  1. If two normalized hashes A and B are completely the same, the similarity score is 100 (a perfect match) no matter what.

    This case is not subject to the edit distance-based scoring. For instance, FuzzyHashCompareTarget::is_comparison_candidate() may return false on such cases.

    So, this case must be handled separately.

  2. For each block hash pair (whose effective block sizes match), compute the sub-similarity score (between bhA and bhB) as follows:

    1. Search for a common substring of the length of block_hash::MIN_LCS_FOR_COMPARISON or longer.

      If we could not find one, the sub-similarity score is 0 and no edit distance-based scoring is performed.

      Note: if we could find one (i.e. can perform edit distance-based comparison), the sub-similarity score (and the final score) is guaranteed to be greater than zero. That means we won’t need to split a cluster (on single-linkage clustering) if all unique elements in the cluster are directly or indirectly connected by “candidate of edit distance-based comparison” relations.

    2. Compute the edit distance between two block hashes and scale it

      • from 0..=(bhA.len()+bhB.len()) (0 is the perfect match)
      • to 0..=100 (100 is the perfect match).

      Note: for historical reasons, this scaling takes multiple steps; see the source code for the exact behavior (including rounding).

    3. For small effective block sizes, cap the score to prevent exaggerating the matches.

  3. Take the maximum of the sub-similarity scores (0 if there are no sub-similarity scores, i.e. the block sizes are far).
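
Two of the steps above, the common substring check (2.1) and the final maximum (3), can be sketched with the standard library alone. The edit distance scaling and the score cap are deliberately left out because their exact behavior is defined by the source code. `has_common_substring` and `final_score` are hypothetical names, and the value 7 for block_hash::MIN_LCS_FOR_COMPARISON is an assumption that is not stated in this section.

```rust
use std::collections::HashSet;

/// Assumed value of block_hash::MIN_LCS_FOR_COMPARISON.
const MIN_LCS_FOR_COMPARISON: usize = 7;

/// Step 2.1: returns true if `a` and `b` share a substring of at least
/// `MIN_LCS_FOR_COMPARISON` elements. Checking windows of exactly that
/// length is enough, since any longer common substring contains one.
fn has_common_substring(a: &[u8], b: &[u8]) -> bool {
    let seen: HashSet<&[u8]> = a.windows(MIN_LCS_FOR_COMPARISON).collect();
    b.windows(MIN_LCS_FOR_COMPARISON).any(|w| seen.contains(w))
}

/// Step 3: the final score is the maximum sub-similarity score,
/// or 0 when there is nothing to compare (block sizes are far).
fn final_score(sub_scores: &[u32]) -> u32 {
    sub_scores.iter().copied().max().unwrap_or(0)
}
```

With the sub-scores 94 and 85 from the “Relations with Block Size” example, `final_score` gives 94; with an empty slice it gives 0.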

For actual comparison, a FuzzyHashCompareTarget object or corresponding half-baked object is used.

See FuzzyHashCompareTarget for details.

Implementations§

impl<const S1: usize, const S2: usize> FuzzyHashData<S1, S2, true>

Additional implementation for normalized fuzzy hashes, enabling comparison between two fuzzy hashes directly.

pub fn compare(&self, other: impl AsRef<Self>) -> u32

Compares two fuzzy hashes and returns the similarity score.

pub unsafe fn compare_unequal_unchecked(&self, other: impl AsRef<Self>) -> u32

Available on crate feature unchecked only.

Compares two fuzzy hashes, assuming they are different.

§Safety
  • self and other must be different.

If the condition above is not satisfied, it will return a meaningless score.

pub fn compare_unequal(&self, other: impl AsRef<Self>) -> u32

Slow: Compares two fuzzy hashes, assuming they are different.

§Usage Constraints
  • self and other must be different.

§Performance Consideration

This method is slow because it checks the constraint above.

Use compare() or, when the unchecked feature is enabled, compare_unequal_unchecked() instead.

impl<const S1: usize, const S2: usize, const NORM: bool> FuzzyHashData<S1, S2, NORM>

Implementation for all variants of fuzzy hashes.

Constants and methods below are available on all variants of fuzzy hashes.

pub const MAX_BLOCK_HASH_SIZE_1: usize = S1

The maximum size of the block hash 1.

This value is always block_hash::FULL_SIZE.

pub const MAX_BLOCK_HASH_SIZE_2: usize = S2

The maximum size of the block hash 2.

This value is either block_hash::HALF_SIZE or block_hash::FULL_SIZE.

pub const IS_NORMALIZED_FORM: bool = NORM

Denotes whether the fuzzy type only contains a normalized form.

pub const IS_LONG_FORM: bool

Denotes whether the fuzzy type can contain a non-truncated fuzzy hash.

It directly corresponds to MAX_BLOCK_HASH_SIZE_2.

pub const MAX_LEN_IN_STR: usize

The maximum length in the string representation.

This is the maximum possible value of the len_in_str() method.

Note that this value does not count the file name part of the fuzzy hash (not even the optional “comma” character separating the file name part) because len_in_str() does not.

pub fn new() -> Self

Creates a new fuzzy hash object with empty contents.

This is equivalent to the fuzzy hash string 3::.

pub unsafe fn init_from_internals_raw_unchecked( &mut self, log_block_size: u8, block_hash_1: &[u8; S1], block_hash_2: &[u8; S2], block_hash_1_len: u8, block_hash_2_len: u8, )

Available on crate feature unchecked only.

Initialize the fuzzy hash object with internal contents (raw).

§Safety
  • Valid range of block_hash_1 and block_hash_2 must consist of valid Base64 indices.
  • Invalid ranges of block_hash_1 and block_hash_2 must be filled with zeroes.
  • block_hash_1_len and block_hash_2_len must be valid.
  • log_block_size must hold a valid base-2 logarithm form of a block size.
  • On the normalized variant, contents of block_hash_1 and block_hash_2 must be normalized.

If they are not satisfied, the resulting object is corrupted.

pub fn init_from_internals_raw( &mut self, log_block_size: u8, block_hash_1: &[u8; S1], block_hash_2: &[u8; S2], block_hash_1_len: u8, block_hash_2_len: u8, )

Initialize the fuzzy hash object with internal contents (raw).

Because this function assumes that you know the fuzzy hash internals, it panics when you fail to satisfy fuzzy hash constraints.

§Usage Constraints
  • Valid range of block_hash_1 and block_hash_2 must consist of valid Base64 indices.
  • Invalid ranges of block_hash_1 and block_hash_2 must be filled with zeroes.
  • block_hash_1_len and block_hash_2_len must be valid.
  • log_block_size must hold a valid base-2 logarithm form of a block size.
  • On the normalized variant, contents of block_hash_1 and block_hash_2 must be normalized.
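
The constraints on a single block hash can be expressed as a small predicate. The sketch below is only an approximation of the checks listed above: `is_valid_block_hash` is a hypothetical helper, the bound 64 follows from the 6-bit Base64 indices, and the value 3 for block_hash::MAX_SEQUENCE_SIZE is taken from the “Normalization” section. It does not check log_block_size.

```rust
/// Value of block_hash::MAX_SEQUENCE_SIZE as stated in the "Normalization" section.
const MAX_SEQUENCE_SIZE: usize = 3;

/// Checks one fixed-size block hash buffer against the listed constraints.
/// `normalized` selects the additional check for the normalized variant.
fn is_valid_block_hash<const N: usize>(block_hash: &[u8; N], len: u8, normalized: bool) -> bool {
    let len = usize::from(len);
    if len > N {
        return false; // the length itself must be valid
    }
    let (used, unused) = block_hash.split_at(len);
    // The valid range must consist of Base64 indices (6-bit values).
    let indices_ok = used.iter().all(|&v| v < 64);
    // The invalid range must be filled with zeroes.
    let padding_ok = unused.iter().all(|&v| v == 0);
    // A normalized block hash has no run longer than MAX_SEQUENCE_SIZE.
    let runs_ok = !normalized
        || used
            .windows(MAX_SEQUENCE_SIZE + 1)
            .all(|w| w.iter().any(|&v| v != w[0]));
    indices_ok && padding_ok && runs_ok
}
```

A caller that builds the arrays by hand could run such a predicate on both block hashes before passing them to a constructor, so that a violated constraint is reported in its own code rather than as a panic.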

pub unsafe fn new_from_internals_raw_unchecked( log_block_size: u8, block_hash_1: &[u8; S1], block_hash_2: &[u8; S2], block_hash_1_len: u8, block_hash_2_len: u8, ) -> Self

Available on crate feature unchecked only.

Creates a new fuzzy hash object with internal contents (raw).

§Safety
  • Valid range of block_hash_1 and block_hash_2 must consist of valid Base64 indices.
  • Invalid ranges of block_hash_1 and block_hash_2 must be filled with zeroes.
  • block_hash_1_len and block_hash_2_len must be valid.
  • log_block_size must hold a valid base-2 logarithm form of a block size.
  • On the normalized variant, contents of block_hash_1 and block_hash_2 must be normalized.

If they are not satisfied, the resulting object is corrupted.

pub fn new_from_internals_raw( log_block_size: u8, block_hash_1: &[u8; S1], block_hash_2: &[u8; S2], block_hash_1_len: u8, block_hash_2_len: u8, ) -> Self

Creates a new fuzzy hash object with internal contents (raw).

Because this function assumes that you know the fuzzy hash internals, it panics when you fail to satisfy fuzzy hash constraints.

§Usage Constraints
  • Valid range of block_hash_1 and block_hash_2 must consist of valid Base64 indices.
  • Invalid ranges of block_hash_1 and block_hash_2 must be filled with zeroes.
  • block_hash_1_len and block_hash_2_len must be valid.
  • log_block_size must hold a valid base-2 logarithm form of a block size.
  • On the normalized variant, contents of block_hash_1 and block_hash_2 must be normalized.

pub unsafe fn new_from_internals_near_raw_unchecked( log_block_size: u8, block_hash_1: &[u8], block_hash_2: &[u8], ) -> Self

Available on crate feature unchecked only.

Creates a new fuzzy hash object with internal contents (with raw block size).

§Safety
  • block_hash_1 and block_hash_2 must have valid lengths.
  • Elements of block_hash_1 and block_hash_2 must consist of valid Base64 indices.
  • log_block_size must hold a valid base-2 logarithm form of a block size.
  • On the normalized variant, contents of block_hash_1 and block_hash_2 must be normalized.

If they are not satisfied, the resulting object will be corrupted.

pub fn new_from_internals_near_raw( log_block_size: u8, block_hash_1: &[u8], block_hash_2: &[u8], ) -> Self

Creates a new fuzzy hash object with internal contents (with raw block size).

Because this function assumes that you know the fuzzy hash internals, it panics when you fail to satisfy fuzzy hash constraints.

§Usage Constraints
  • block_hash_1 and block_hash_2 must have valid lengths.
  • Elements of block_hash_1 and block_hash_2 must consist of valid Base64 indices.
  • log_block_size must hold a valid base-2 logarithm form of a block size.
  • On the normalized variant, contents of block_hash_1 and block_hash_2 must be normalized.

pub unsafe fn new_from_internals_unchecked( block_size: u32, block_hash_1: &[u8], block_hash_2: &[u8], ) -> Self

Available on crate feature unchecked only.

Creates a new fuzzy hash object with internal contents.

§Safety
  • block_hash_1 and block_hash_2 must have valid lengths.
  • Elements of block_hash_1 and block_hash_2 must consist of valid Base64 indices.
  • block_size must hold a valid block size.
  • On the normalized variant, contents of block_hash_1 and block_hash_2 must be normalized.

If they are not satisfied, the resulting object will be corrupted.

pub fn new_from_internals( block_size: u32, block_hash_1: &[u8], block_hash_2: &[u8], ) -> Self

Creates a new fuzzy hash object with internal contents.

Because this function assumes that you know the fuzzy hash internals, it panics when you fail to satisfy fuzzy hash constraints.

§Usage Constraints
  • block_hash_1 and block_hash_2 must have valid lengths.
  • Elements of block_hash_1 and block_hash_2 must consist of valid Base64 indices.
  • block_size must hold a valid block size.
  • On the normalized variant, contents of block_hash_1 and block_hash_2 must be normalized.

pub fn log_block_size(&self) -> u8

The base-2 logarithm form of the block size.

See also: “Block Size” section of FuzzyHashData

pub fn block_size(&self) -> u32

The block size of the fuzzy hash.

pub fn block_hash_1(&self) -> &[u8]

A reference to the block hash 1.

§Safety

You cannot modify a fuzzy hash while block hashes are borrowed through block_hash_1() or block_hash_2().

let mut hash: ssdeep::RawFuzzyHash = str::parse("3:aaaa:bbbb").unwrap();
let bh1 = hash.block_hash_1();
hash.normalize_in_place(); // <- ERROR: because the block hash 1 is borrowed.
// If normalize_in_place succeeds, bh1 will hold an invalid slice
// because the block hash 1 is going to be length 3 after the normalization.
assert_eq!(bh1.len(), 4);

pub fn block_hash_1_as_array(&self) -> &[u8; S1]

A reference to the block hash 1 (in fixed-size array).

Elements that are not a part of the block hash are filled with zeroes.

See also: block_hash_1()

pub fn block_hash_1_len(&self) -> usize

The length of the block hash 1.

See also: block_hash_1()

pub fn block_hash_2(&self) -> &[u8]

A reference to the block hash 2.

§Safety

You cannot modify a fuzzy hash while block hashes are borrowed through block_hash_1() or block_hash_2().

let mut hash: ssdeep::RawFuzzyHash = str::parse("3:aaaa:bbbb").unwrap();
let bh2 = hash.block_hash_2();
hash.normalize_in_place(); // <- ERROR: because the block hash 2 is borrowed.
// If normalize_in_place succeeds, bh2 will hold an invalid slice
// because the block hash 2 is going to be length 3 after the normalization.
assert_eq!(bh2.len(), 4);

pub fn block_hash_2_as_array(&self) -> &[u8; S2]

A reference to the block hash 2 (in fixed-size array).

Elements that are not a part of the block hash are filled with zeroes.

See also: block_hash_2()

pub fn block_hash_2_len(&self) -> usize

The length of the block hash 2.

See also: block_hash_2()

pub fn len_in_str(&self) -> usize

The length of this fuzzy hash in the string representation.

This is the exact size (in both bytes and characters) required to store the string representation corresponding to this fuzzy hash object.

pub fn to_string(&self) -> String

Available on crate feature alloc only.

Converts the fuzzy hash to the corresponding string representation.

pub fn store_into_bytes( &self, buffer: &mut [u8], ) -> Result<usize, FuzzyHashOperationError>

Stores the string representation of the fuzzy hash into a byte buffer.

If this method succeeds, it returns Ok(n) where n is the number of bytes written to buffer.

This function fails (returns an Err) only when buffer is too small to store the string representation of the fuzzy hash. In this case, buffer is not overwritten.

The required buffer size is exactly len_in_str() bytes (buffer may be larger than that but never smaller).
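
The contract described above (check the size first, write nothing on failure, return the number of bytes written) can be sketched as a free function. This is not the crate's implementation: `store_fuzzy_hash` and `BufferTooSmall` are hypothetical names, the use of the standard Base64 alphabet for the indices is an assumption, and the sketch allocates a temporary string for the block size for brevity.

```rust
/// Assumed mapping from Base64 indices to characters (standard Base64 alphabet).
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical error type standing in for FuzzyHashOperationError.
#[derive(Debug, PartialEq)]
struct BufferTooSmall;

/// Writes "block_size:block_hash_1:block_hash_2" into `buffer`.
/// `bh1` and `bh2` hold Base64 indices (each must be below 64, or this panics).
fn store_fuzzy_hash(
    block_size: u32,
    bh1: &[u8],
    bh2: &[u8],
    buffer: &mut [u8],
) -> Result<usize, BufferTooSmall> {
    let digits = block_size.to_string();
    // Exact required size: digits, two colons and both block hashes.
    let needed = digits.len() + 1 + bh1.len() + 1 + bh2.len();
    if buffer.len() < needed {
        return Err(BufferTooSmall); // nothing has been written yet
    }
    let encoded = digits
        .bytes()
        .chain(std::iter::once(b':'))
        .chain(bh1.iter().map(|&i| ALPHABET[usize::from(i)]))
        .chain(std::iter::once(b':'))
        .chain(bh2.iter().map(|&i| ALPHABET[usize::from(i)]));
    for (dst, src) in buffer.iter_mut().zip(encoded) {
        *dst = src;
    }
    Ok(needed)
}
```

Because the returned value is the number of bytes written, a caller can slice the buffer with `&buffer[..n]` without calling a separate length function.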

§Compatibility Note

Before version 0.3.0, the result type was Result<(), FuzzyHashOperationError>.

The usize added in version 0.3.0 simplifies handling the result, and the semantics are now similar to e.g. std::io::Read::read().

pub fn from_bytes_with_last_index( str: &[u8], index: &mut usize, ) -> Result<Self, ParseError>

Parse a fuzzy hash from given bytes (a slice of u8) of a string representation.

If the parser succeeds, it also updates the index argument to the first index not used to construct the fuzzy hash: either the end of the string or the position of the ',' character that separates the fuzzy hash from the file name field.

If the parser fails, index is not updated.

The behavior of this method is affected by the strict-parser feature. For more information, see The Strict Parser.

pub fn from_bytes(str: &[u8]) -> Result<Self, ParseError>

Parse a fuzzy hash from given bytes (a slice of u8) of a string representation.

The behavior of this method is affected by the strict-parser feature. For more information, see The Strict Parser.

pub fn is_normalized(&self) -> bool

Returns whether the fuzzy hash is normalized.

For a non-normalized fuzzy hash type (in raw form), it checks whether the fuzzy hash is already normalized.

Note that this method is only for convenience purposes and checking whether a fuzzy hash is normalized does not usually improve the performance.

pub fn normalize_in_place(&mut self)

Normalizes the fuzzy hash in place (or does nothing, depending on whether the type is already a normalized one).

After calling this method, self will be normalized.

See also: “Normalization” section of FuzzyHashData

pub fn normalize(&self) -> FuzzyHashData<S1, S2, true>

Converts the fuzzy hash to a normalized form (with normalization).

On the normalized variant, this is effectively a copy.

See also: “Normalization” section of FuzzyHashData

pub fn clone_normalized(&self) -> Self

Clones the fuzzy hash with normalization but without changing the type.

On the normalized variant, this is effectively a clone.

pub fn is_valid(&self) -> bool

Performs full validity checking of the internal structure.

The primary purpose of this is debugging and it should always return true unless…

  • There is a bug in this crate, corrupting this structure,
  • Memory corruption has occurred somewhere else, or
  • An unsafe function to construct this object is misused.

Because of its purpose, this method is not designed to be fast.

Note that, although this method is only relevant to users when the unchecked feature is enabled, it is public regardless of features because it is not unsafe or unchecked in any way.

§Safety: No Panic Guarantee

This method is guaranteed to be panic-free as long as the underlying memory region corresponding to self is sound. In other words, it will not panic by itself, whatever data this object contains.

pub fn full_eq(&self, other: &Self) -> bool

Performs full equality checking of the internal structure.

While PartialEq::eq() for this type is designed to be fast by ignoring non-block hash bytes, this method performs full equality checking, not ignoring “non-block hash” bytes.

The primary purpose of this is debugging and it should always return the same value as PartialEq::eq() result unless…

  • There is a bug in this crate, corrupting this structure,
  • Memory corruption has occurred somewhere else, or
  • An unsafe function to construct this object is misused.

Because of its purpose, this method is not designed to be fast.

Note that, although this method is only relevant to users when the unchecked feature is enabled, it is public regardless of features because it is not unsafe or unchecked in any way.

§Safety: No Panic Guarantee

This method is guaranteed to be panic-free as long as the underlying memory region corresponding to self is sound. In other words, it will not panic by itself, whatever data this object contains.

pub fn compare_block_sizes( lhs: impl AsRef<Self>, rhs: impl AsRef<Self>, ) -> BlockSizeRelation

Compares the block sizes of the two given fuzzy hashes to determine their block size relation.

pub fn is_block_sizes_near(lhs: impl AsRef<Self>, rhs: impl AsRef<Self>) -> bool

Checks whether the block sizes of the two given fuzzy hashes form a near relation.

§Compatibility Notice

This method will be renamed to is_block_size_near() in the next major release, taking the first argument as a reference to self.

pub fn is_block_sizes_near_eq( lhs: impl AsRef<Self>, rhs: impl AsRef<Self>, ) -> bool

Checks whether the block sizes of the two given fuzzy hashes form a BlockSizeRelation::NearEq relation.

§Compatibility Notice

This method will be renamed to is_block_size_near_eq() in the next major release, taking the first argument as a reference to self.

pub fn is_block_sizes_near_lt( lhs: impl AsRef<Self>, rhs: impl AsRef<Self>, ) -> bool

Checks whether the block sizes of the two given fuzzy hashes form a BlockSizeRelation::NearLt relation.

§Compatibility Notice

This method will be renamed to is_block_size_near_lt() in the next major release, taking the first argument as a reference to self.

pub fn is_block_sizes_near_gt( lhs: impl AsRef<Self>, rhs: impl AsRef<Self>, ) -> bool

Checks whether the block sizes of the two given fuzzy hashes form a BlockSizeRelation::NearGt relation.

§Compatibility Notice

This method will be renamed to is_block_size_near_gt() in the next major release, taking the first argument as a reference to self.

pub fn cmp_by_block_size(&self, other: &Self) -> Ordering

Compares two fuzzy hashes only by their block sizes.

impl<const S1: usize, const S2: usize> FuzzyHashData<S1, S2, true>

Implementation of normalized fuzzy hashes.

Methods below are available on normalized fuzzy hashes (FuzzyHash or LongFuzzyHash).

pub fn block_hash_1_windows(&self) -> Windows<'_, u8>

Windows representing normalized substrings suitable for filtering block hashes to match (block hash 1).

To compare two normalized block hashes with the same effective block size, the scoring function requires that two strings contain a common substring with a length of block_hash::MIN_LCS_FOR_COMPARISON.

This method provides access to substrings of that length, allowing a specialized clustering application to filter the fuzzy hashes to compare before the actual comparison. It makes it possible to implement a function equivalent to FuzzyHashCompareTarget::is_comparison_candidate() with pre-computation.

Note: This is particularly useful for large scale clustering because there is a guarantee that the final similarity score is greater than zero if we have a common substring. So, finding a common substring is a fundamental operation to split a set of unique fuzzy hashes into disjoint sets of single-linkage clusters (two elements in the same set may (or may not) be members of a cluster with a non-zero similarity, but elements in different sets cannot).

For instance, you may store fuzzy hashes indexed by the elements of this window.

§Example (pseudo code)
use ssdeep::FuzzyHash;

// Fuzzy hash index in the database
struct FuzzyHashIndex(u64);

// It generates the index of corresponding fuzzy hash.
fn get_idx_of_fuzzy_hash(hash: &FuzzyHash) -> FuzzyHashIndex { /* ... */ }

// It stores a fuzzy hash with keys (with duplicates) like this:
//     db_entries(log_block_size, substring).add(hash_index)
// ... to enable later filtering.
fn insert_to_database(key: (u8, &[u8]), value: &FuzzyHashIndex) { /* ... */ }

// let hash_str = ...;
let hash: FuzzyHash = str::parse(hash_str).unwrap();
let idx: FuzzyHashIndex = get_idx_of_fuzzy_hash(&hash);
for window in hash.block_hash_1_windows() {
    insert_to_database((hash.log_block_size(), window), &idx);
}
for window in hash.block_hash_2_windows() {
    insert_to_database((hash.log_block_size() + 1, window), &idx);
}
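
Once every fuzzy hash has been turned into a list of (effective log block size, window) keys as in the pseudo code above, the disjoint sets mentioned in the note can be built with a union-find structure. The following is a minimal, self-contained sketch of that idea; `group_by_shared_keys` is a hypothetical helper working on plain key lists, not a function of this crate.

```rust
use std::collections::HashMap;

/// A key as stored by the pseudo code: (effective log block size, window bytes).
type Key = (u8, Vec<u8>);

/// Groups items that share at least one key, directly or transitively.
/// `keys_per_item[i]` holds the keys of item `i`; the result assigns each
/// item the index of a representative item of its group.
fn group_by_shared_keys(keys_per_item: &[Vec<Key>]) -> Vec<usize> {
    // Finds the representative of `x`, shortening paths along the way.
    fn find(parent: &mut [usize], mut x: usize) -> usize {
        while parent[x] != x {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        x
    }

    let mut parent: Vec<usize> = (0..keys_per_item.len()).collect();
    // Remembers the first item seen for every key.
    let mut first_owner: HashMap<&Key, usize> = HashMap::new();
    for (item, keys) in keys_per_item.iter().enumerate() {
        for key in keys {
            let owner = *first_owner.entry(key).or_insert(item);
            // Merge the groups of this item and of the key's first owner.
            let (a, b) = (find(&mut parent, item), find(&mut parent, owner));
            parent[a] = b;
        }
    }
    (0..parent.len()).map(|i| find(&mut parent, i)).collect()
}
```

Items that end up with different representatives share no window at any effective block size, so no comparison across those groups is needed.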

pub fn block_hash_1_numeric_windows(&self) -> NumericWindows<'_>

Windows representing normalized substrings, converted to unique numeric value (block hash 1).

This is very similar to block_hash_1_windows() but each window is a numeric value corresponding to each substring.

See also: block_hash::NumericWindows

pub fn block_hash_1_index_windows(&self) -> IndexWindows<'_>

Windows representing normalized substrings with effective block size, converted to unique numeric value (block hash 1).

This is very similar to block_hash_1_numeric_windows() except that each window contains block hash 1’s effective block size (base-2 logarithm form of the block size of the hash).

See also: block_hash::IndexWindows

§Example (pseudo code)
use ssdeep::FuzzyHash;

// Fuzzy hash index in the database
struct FuzzyHashIndex(u64);

// It generates the index of corresponding fuzzy hash.
fn get_idx_of_fuzzy_hash(hash: &FuzzyHash) -> FuzzyHashIndex { /* ... */ }

// It stores a fuzzy hash with keys (with duplicates) like this:
//     db_entries(concat(log_block_size, substring)).add(hash_index)
// ... to enable later filtering.
fn insert_to_database(key: u64, value: &FuzzyHashIndex) { /* ... */ }

// let hash_str = ...;
let hash: FuzzyHash = str::parse(hash_str).unwrap();
let idx: FuzzyHashIndex = get_idx_of_fuzzy_hash(&hash);
for window in hash.block_hash_1_index_windows() {
    insert_to_database(window, &idx);
}
for window in hash.block_hash_2_index_windows() {
    insert_to_database(window, &idx);
}

Compared to numeric windows, the effective block size is embedded in the index windows. That makes writing an ssdeep database easier.
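
To make the difference concrete, the sketch below packs each window of 6-bit values into one integer and then places the effective block size (in base-2 logarithm form) above those bits. The bit layout, the window length of 7 (assumed value of block_hash::MIN_LCS_FOR_COMPARISON) and the function names are assumptions made for illustration; refer to block_hash::NumericWindows and block_hash::IndexWindows for the actual definitions.

```rust
/// Assumed window length (block_hash::MIN_LCS_FOR_COMPARISON).
const WINDOW_SIZE: usize = 7;
/// Number of bits occupied by one packed window (7 values of 6 bits each).
const WINDOW_BITS: u32 = 6 * WINDOW_SIZE as u32;

/// Packs every window of `WINDOW_SIZE` Base64 indices into a single number.
fn numeric_windows(block_hash: &[u8]) -> impl Iterator<Item = u64> + '_ {
    block_hash.windows(WINDOW_SIZE).map(|w| {
        w.iter()
            .fold(0u64, |acc, &v| (acc << 6) | u64::from(v & 0x3f))
    })
}

/// Same as `numeric_windows`, with the effective log block size placed
/// above the packed window bits.
fn index_windows(effective_log_block_size: u8, block_hash: &[u8]) -> impl Iterator<Item = u64> + '_ {
    numeric_windows(block_hash)
        .map(move |n| (u64::from(effective_log_block_size) << WINDOW_BITS) | n)
}
```

With such a layout, a single integer key is enough to tell apart equal substrings that belong to different effective block sizes.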

§Effectively Deprecated from the Start

This is a preview of a feature in the next major release. Because block hash handling functions are proliferating, the next version will introduce a basic block hash proxy object.

For instance, hash.block_hash_1_index_windows() will turn into something like: hash.block_hash_1().index_windows().

The only reason this function is not marked deprecated is that all block hash functions will change in the next major release, and deprecating all of them would give developers the wrong impression (there is no, and will be no, non-deprecated interface in v0.3.x anyway).

pub fn block_hash_2_windows(&self) -> Windows<'_, u8>

Windows representing substrings suitable for filtering block hashes to match (block hash 2).

See also: block_hash_1_windows()

pub fn block_hash_2_numeric_windows(&self) -> NumericWindows<'_>

Windows representing normalized substrings, each converted to a unique numeric value (block hash 2).

This is very similar to block_hash_2_windows(), but each window is a numeric value corresponding to each substring.

See also: block_hash::NumericWindows
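To illustrate how a substring can map to a unique number: if each character is stored as a 6-bit alphabet index (0 to 63) and a window holds 7 characters, packing the indices side by side gives a distinct 42-bit value per substring. The function below is a sketch under those assumptions. The window length, the bit order and the name numeric_windows are chosen for illustration, and the crate's actual layout may differ.

```rust
/// Window length assumed for this sketch.
const WINDOW_LEN: usize = 7;

/// Packs each 7-character window of 6-bit alphabet indices into one `u64`.
/// Earlier characters end up in higher bits; this ordering is an assumption.
pub fn numeric_windows(block_hash: &[u8]) -> Vec<u64> {
    block_hash
        .windows(WINDOW_LEN)
        .map(|window| {
            window
                .iter()
                // Keep only the low 6 bits of each character index.
                .fold(0u64, |acc, &ch| (acc << 6) | u64::from(ch & 0x3f))
        })
        .collect()
}
```

Because distinct substrings yield distinct integers, such values can serve directly as keys in an integer-indexed table.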

Source

pub fn block_hash_2_index_windows(&self) -> IndexWindows<'_>

Windows representing normalized substrings with the effective block size, each converted to a unique numeric value (block hash 2).

This is very similar to block_hash_2_numeric_windows() except that each window contains block hash 2’s effective block size (one larger than base-2 logarithm form of the block size of the hash) at the top.

See also: block_hash::IndexWindows
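A sketch of how an effective block size could sit "at the top" of a window, assuming a numeric window occupies the low 42 bits (7 characters of 6 bits each) and the effective base-2 logarithm of the block size is stored in the bits above it. The constant NUMERIC_BITS and both function names are hypothetical, and the exact layout used by this crate is not confirmed here.

```rust
/// Bits occupied by a numeric window in this sketch (7 characters x 6 bits).
const NUMERIC_BITS: u32 = 42;

/// Places an effective log block size above a 42-bit numeric window.
pub fn index_window(effective_log_block_size: u8, numeric_window: u64) -> u64 {
    (u64::from(effective_log_block_size) << NUMERIC_BITS)
        | (numeric_window & ((1u64 << NUMERIC_BITS) - 1))
}

/// Effective log block sizes of block hash 1 and block hash 2,
/// given the log block size of the whole fuzzy hash.
pub fn effective_log_block_sizes(log_block_size: u8) -> (u8, u8) {
    // Block hash 2 uses twice the block size, so its logarithm is one larger.
    (log_block_size, log_block_size + 1)
}
```

Under this layout, block hash 2 of a hash with log block size n and block hash 1 of a hash with log block size n + 1 produce the same key for the same substring, so a single lookup covers both cases.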

§Effectively Deprecated from the Start

This is a preview of a feature in the next major release. Because the number of block hash handling functions keeps growing, the next version will introduce a basic block hash proxy object.

For instance, hash.block_hash_2_index_windows() will turn into something like: hash.block_hash_2().index_windows().

This function is not marked deprecated only because all block hash functions will change in the next major release, and deprecating all of them would give developers the wrong impression (v0.3.x does not have, and will not have, a non-deprecated interface anyway).

Source

pub fn from_raw_form(source: &FuzzyHashData<S1, S2, false>) -> Self

Converts the fuzzy hash from a raw form, normalizing it.
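In ssdeep, normalization shortens every run of identical characters in a block hash to at most three characters. The following sketch applies that rule to a plain byte slice. The limit of 3 and the helper name normalize_block_hash are assumptions for illustration; this is not the crate's implementation.

```rust
/// Maximum run length kept by normalization (assumed to be 3).
const MAX_RUN: usize = 3;

/// Returns a copy of `block_hash` in which every run of identical bytes
/// longer than `MAX_RUN` is shortened to exactly `MAX_RUN` bytes.
pub fn normalize_block_hash(block_hash: &[u8]) -> Vec<u8> {
    let mut out: Vec<u8> = Vec::with_capacity(block_hash.len());
    let mut run = 0usize;
    for &ch in block_hash {
        // Count how long the current run of `ch` is in the input.
        if out.last() == Some(&ch) {
            run += 1;
        } else {
            run = 1;
        }
        // Drop everything past the third repetition.
        if run <= MAX_RUN {
            out.push(ch);
        }
    }
    out
}
```

Since the dropped characters cannot be recovered, converting a normalized hash back to a raw form does not restore the original raw hash.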

Source

pub fn to_raw_form(&self) -> FuzzyHashData<S1, S2, false>

Converts the fuzzy hash to a raw form.

Source

pub fn into_mut_raw_form(&self, dest: &mut FuzzyHashData<S1, S2, false>)

Copies the fuzzy hash to another (the output is in raw form).

Source§

impl<const S1: usize, const S2: usize> FuzzyHashData<S1, S2, false>

Implementation of non-normalized fuzzy hashes (in raw form).

Methods below are available on non-normalized fuzzy hashes (RawFuzzyHash or LongRawFuzzyHash).

Source

pub fn from_normalized(source: &FuzzyHashData<S1, S2, true>) -> Self

Converts the fuzzy hash from a normalized form.

Source§

impl<const NORM: bool> FuzzyHashData<{ block_hash::FULL_SIZE }, { block_hash::HALF_SIZE }, NORM>

Implementation of short fuzzy hashes.

Methods below are available on short (truncated) fuzzy hashes (FuzzyHash or RawFuzzyHash).

Source

pub fn to_long_form( &self, ) -> FuzzyHashData<{ block_hash::FULL_SIZE }, { block_hash::FULL_SIZE }, NORM>

Converts the fuzzy hash to a long form.

Source

pub fn into_mut_long_form( &self, dest: &mut FuzzyHashData<{ block_hash::FULL_SIZE }, { block_hash::FULL_SIZE }, NORM>, )

Copies the fuzzy hash to another (the output is in long form).

Source§

impl<const NORM: bool> FuzzyHashData<{ block_hash::FULL_SIZE }, { block_hash::FULL_SIZE }, NORM>

Implementation of long fuzzy hashes.

Methods below are available on long (non-truncated) fuzzy hashes (LongFuzzyHash or LongRawFuzzyHash).

Source

pub fn from_short_form( source: &FuzzyHashData<{ block_hash::FULL_SIZE }, { block_hash::HALF_SIZE }, NORM>, ) -> Self

Converts the fuzzy hash from a short, truncated form.

Source

pub fn try_into_mut_short( &self, dest: &mut FuzzyHashData<{ block_hash::FULL_SIZE }, { block_hash::HALF_SIZE }, NORM>, ) -> Result<(), FuzzyHashOperationError>

Tries to copy the fuzzy hash to another (output is a short form).
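The conversion to a short form is fallible because block hash 2 of a long hash may hold more characters than a short hash has room for. The sketch below models only that length check on byte slices. It assumes sizes of 64 and 32 for block_hash::FULL_SIZE and block_hash::HALF_SIZE, and TooLongForShortForm is a hypothetical stand-in for FuzzyHashOperationError.

```rust
/// Sizes assumed for this sketch.
const FULL_SIZE: usize = 64;
const HALF_SIZE: usize = 32;

/// Hypothetical error type used only in this example.
#[derive(Debug, PartialEq, Eq)]
pub struct TooLongForShortForm;

/// Copies a long-form block hash 2 into short-form storage if it fits.
pub fn try_shorten_block_hash_2(
    long: &[u8],
    dest: &mut Vec<u8>,
) -> Result<(), TooLongForShortForm> {
    assert!(long.len() <= FULL_SIZE, "not a valid long block hash 2");
    if long.len() > HALF_SIZE {
        // In this sketch, `dest` is left untouched on failure.
        return Err(TooLongForShortForm);
    }
    dest.clear();
    dest.extend_from_slice(long);
    Ok(())
}
```

The opposite direction (short to long) needs no such check, which is consistent with to_long_form() returning a value rather than a Result.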

Trait Implementations§

Source§

impl<const S1: usize, const S2: usize, const NORM: bool> AsRef<FuzzyHashData<S1, S2, NORM>> for FuzzyHashData<S1, S2, NORM>

Source§

fn as_ref(&self) -> &FuzzyHashData<S1, S2, NORM>

Converts this type into a shared reference of the (usually inferred) input type.
Source§

impl<const S1: usize, const S2: usize, const C1: usize, const C2: usize> AsRef<FuzzyHashData<S1, S2, true>> for FuzzyHashDualData<S1, S2, C1, C2>
where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes, ReconstructionBlockSize<S1, C1>: ConstrainedReconstructionBlockSize, ReconstructionBlockSize<S2, C2>: ConstrainedReconstructionBlockSize,

Source§

fn as_ref(&self) -> &FuzzyHashData<S1, S2, true>

Converts this type into a shared reference of the (usually inferred) input type.
Source§

impl<const S1: usize, const S2: usize, const NORM: bool> Clone for FuzzyHashData<S1, S2, NORM>

Source§

fn clone(&self) -> FuzzyHashData<S1, S2, NORM>

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl<const S1: usize, const S2: usize, const NORM: bool> Debug for FuzzyHashData<S1, S2, NORM>

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl<const S1: usize, const S2: usize, const NORM: bool> Default for FuzzyHashData<S1, S2, NORM>

Source§

fn default() -> Self

Returns the “default value” for a type. Read more
Source§

impl<const S1: usize, const S2: usize, const NORM: bool> Display for FuzzyHashData<S1, S2, NORM>

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using a given formatter.

§Safety

This method assumes that the fuzzy hash data is not broken.

Unlike this method, the Debug implementation does not cause problems if a given fuzzy hash is broken.

Source§

impl<const S1: usize, const S2: usize> From<&FuzzyHashData<S1, S2, true>> for FuzzyHashCompareTarget

Source§

fn from(value: &FuzzyHashData<S1, S2, true>) -> Self

Converts to this type from the input type.
Source§

impl<const NORM: bool> From<FuzzyHashData<{block_hash::FULL_SIZE}, {block_hash::HALF_SIZE}, NORM>> for FuzzyHashData<{ block_hash::FULL_SIZE }, { block_hash::FULL_SIZE }, NORM>

Source§

fn from( value: FuzzyHashData<{ block_hash::FULL_SIZE }, { block_hash::HALF_SIZE }, NORM>, ) -> Self

Converts to this type from the input type.
Source§

impl From<FuzzyHashData<{block_hash::FULL_SIZE}, {block_hash::HALF_SIZE}, true>> for FuzzyHashData<{ block_hash::FULL_SIZE }, { block_hash::FULL_SIZE }, false>

Source§

fn from( value: FuzzyHashData<{ block_hash::FULL_SIZE }, { block_hash::HALF_SIZE }, true>, ) -> Self

Converts to this type from the input type.
Source§

impl<const S1: usize, const S2: usize, const NORM: bool> From<FuzzyHashData<S1, S2, NORM>> for String

Available on crate feature alloc only.
Source§

fn from(value: FuzzyHashData<S1, S2, NORM>) -> Self

Converts to this type from the input type.
Source§

impl<const S1: usize, const S2: usize> From<FuzzyHashData<S1, S2, false>> for FuzzyHashData<S1, S2, true>

§Compatibility Note

Because this conversion breaks a semantic rule of the From trait, it will be removed in the next major release.

Source§

fn from(value: FuzzyHashData<S1, S2, false>) -> Self

Converts to this type from the input type.
Source§

impl<const S1: usize, const S2: usize, const C1: usize, const C2: usize> From<FuzzyHashData<S1, S2, false>> for FuzzyHashDualData<S1, S2, C1, C2>
where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes, ReconstructionBlockSize<S1, C1>: ConstrainedReconstructionBlockSize, ReconstructionBlockSize<S2, C2>: ConstrainedReconstructionBlockSize,

Source§

fn from(value: FuzzyHashData<S1, S2, false>) -> Self

Converts to this type from the input type.
Source§

impl<const S1: usize, const S2: usize> From<FuzzyHashData<S1, S2, true>> for FuzzyHashCompareTarget

Source§

fn from(value: FuzzyHashData<S1, S2, true>) -> Self

Converts to this type from the input type.
Source§

impl<const S1: usize, const S2: usize> From<FuzzyHashData<S1, S2, true>> for FuzzyHashData<S1, S2, false>

Source§

fn from(value: FuzzyHashData<S1, S2, true>) -> Self

Converts to this type from the input type.
Source§

impl<const S1: usize, const S2: usize, const C1: usize, const C2: usize> From<FuzzyHashData<S1, S2, true>> for FuzzyHashDualData<S1, S2, C1, C2>
where BlockHashSize<S1>: ConstrainedBlockHashSize, BlockHashSize<S2>: ConstrainedBlockHashSize, BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes, ReconstructionBlockSize<S1, C1>: ConstrainedReconstructionBlockSize, ReconstructionBlockSize<S2, C2>: ConstrainedReconstructionBlockSize,

Source§

fn from(value: FuzzyHashData<S1, S2, true>) -> Self

Converts to this type from the input type.
Source§

impl<const S1: usize, const S2: usize, const NORM: bool> FromStr for FuzzyHashData<S1, S2, NORM>

Source§

type Err = ParseError

The associated error which can be returned from parsing.
Source§

fn from_str(s: &str) -> Result<Self, Self::Err>

Parses a string s to return a value of this type. Read more
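The string layout described at the top of this page (block size, two block hashes, optional file name) can be split with standard string methods. The sketch below does only that and checks that the block size is the minimum block size times a power of two. It assumes a minimum block size of 3, skips alphabet and length validation, and split_fuzzy_hash is a hypothetical helper rather than this crate's parser.

```rust
/// Minimum block size assumed for this sketch.
const MIN_BLOCK_SIZE: u32 = 3;

/// Splits "block_size:block_hash_1:block_hash_2[,file name]" into
/// (log block size, block hash 1, block hash 2). The file name is ignored.
pub fn split_fuzzy_hash(s: &str) -> Option<(u32, &str, &str)> {
    // At most three parts, so colons inside a file name stay in the last part.
    let mut parts = s.splitn(3, ':');
    let block_size = parts.next()?.parse::<u32>().ok()?;
    let block_hash_1 = parts.next()?;
    let rest = parts.next()?;
    // Everything after the first comma is the optional file name.
    let block_hash_2 = rest.split(',').next()?;
    // A valid block size is MIN_BLOCK_SIZE * 2^n for some n >= 0.
    if block_size == 0 || block_size % MIN_BLOCK_SIZE != 0 {
        return None;
    }
    let quotient = block_size / MIN_BLOCK_SIZE;
    if !quotient.is_power_of_two() {
        return None;
    }
    Some((quotient.trailing_zeros(), block_hash_1, block_hash_2))
}
```

For the example hash shown at the top of this page, the block size 196608 equals 3 * 2^16, so the sketch reports a log block size of 16.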
Source§

impl<const S1: usize, const S2: usize, const NORM: bool> Hash for FuzzyHashData<S1, S2, NORM>

Source§

fn hash<H: Hasher>(&self, state: &mut H)

Feeds this value into the given Hasher. Read more
1.3.0 · Source§

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher. Read more
Source§

impl<const S1: usize, const S2: usize, const NORM: bool> Ord for FuzzyHashData<S1, S2, NORM>

Source§

fn cmp(&self, other: &Self) -> Ordering

This method returns an Ordering between self and other. Read more
1.21.0 · Source§

fn max(self, other: Self) -> Self
where Self: Sized,

Compares and returns the maximum of two values. Read more
1.21.0 · Source§

fn min(self, other: Self) -> Self
where Self: Sized,

Compares and returns the minimum of two values. Read more
1.50.0 · Source§

fn clamp(self, min: Self, max: Self) -> Self
where Self: Sized,

Restrict a value to a certain interval. Read more
Source§

impl<const S1: usize, const S2: usize, const NORM: bool> PartialEq for FuzzyHashData<S1, S2, NORM>

Source§

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl<const S1: usize, const S2: usize, const NORM: bool> PartialOrd for FuzzyHashData<S1, S2, NORM>

Source§

fn partial_cmp(&self, other: &Self) -> Option<Ordering>

This method returns an ordering between self and other values if one exists. Read more
1.0.0 · Source§

fn lt(&self, other: &Rhs) -> bool

Tests less than (for self and other) and is used by the < operator. Read more
1.0.0 · Source§

fn le(&self, other: &Rhs) -> bool

Tests less than or equal to (for self and other) and is used by the <= operator. Read more
1.0.0 · Source§

fn gt(&self, other: &Rhs) -> bool

Tests greater than (for self and other) and is used by the > operator. Read more
1.0.0 · Source§

fn ge(&self, other: &Rhs) -> bool

Tests greater than or equal to (for self and other) and is used by the >= operator. Read more
Source§

impl<const NORM: bool> TryFrom<FuzzyHashData<{block_hash::FULL_SIZE}, {block_hash::FULL_SIZE}, NORM>> for FuzzyHashData<{ block_hash::FULL_SIZE }, { block_hash::HALF_SIZE }, NORM>

Source§

type Error = FuzzyHashOperationError

The type returned in the event of a conversion error.
Source§

fn try_from( value: FuzzyHashData<{ block_hash::FULL_SIZE }, { block_hash::FULL_SIZE }, NORM>, ) -> Result<Self, Self::Error>

Performs the conversion.
Source§

impl<const S1: usize, const S2: usize, const NORM: bool> Copy for FuzzyHashData<S1, S2, NORM>

Source§

impl<const S1: usize, const S2: usize, const NORM: bool> Eq for FuzzyHashData<S1, S2, NORM>

Auto Trait Implementations§

§

impl<const S1: usize, const S2: usize, const NORM: bool> Freeze for FuzzyHashData<S1, S2, NORM>

§

impl<const S1: usize, const S2: usize, const NORM: bool> RefUnwindSafe for FuzzyHashData<S1, S2, NORM>

§

impl<const S1: usize, const S2: usize, const NORM: bool> Send for FuzzyHashData<S1, S2, NORM>

§

impl<const S1: usize, const S2: usize, const NORM: bool> Sync for FuzzyHashData<S1, S2, NORM>

§

impl<const S1: usize, const S2: usize, const NORM: bool> Unpin for FuzzyHashData<S1, S2, NORM>

§

impl<const S1: usize, const S2: usize, const NORM: bool> UnwindSafe for FuzzyHashData<S1, S2, NORM>

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T> ToString for T
where T: Display + ?Sized,

Source§

fn to_string(&self) -> String

Converts the given value to a String. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.