FuzzyHashType

Trait FuzzyHashType 

pub trait FuzzyHashType:
    Sized
    + FromStr<Err = ParseError>
    + Display {
    type ChecksumType: FuzzyHashChecksum;
    type BodyType: FuzzyHashBody;

    const NUMBER_OF_BUCKETS: usize;
    const SIZE_IN_BYTES: usize;
    const LEN_IN_STR_EXCEPT_PREFIX: usize;
    const LEN_IN_STR: usize;

    // Required methods
    fn checksum(&self) -> &Self::ChecksumType;
    fn length(&self) -> &FuzzyHashLengthEncoding;
    fn qratios(&self) -> &FuzzyHashQRatios;
    fn body(&self) -> &Self::BodyType;
    fn from_str_bytes(
        bytes: &[u8],
        prefix: Option<HexStringPrefix>,
    ) -> Result<Self, ParseError>;
    fn store_into_bytes(&self, out: &mut [u8]) -> Result<usize, OperationError>;
    fn store_into_str_bytes(
        &self,
        out: &mut [u8],
        prefix: HexStringPrefix,
    ) -> Result<usize, OperationError>;
    fn max_distance(config: ComparisonConfiguration) -> u32;
    fn compare_with_config(
        &self,
        other: &Self,
        config: ComparisonConfiguration,
    ) -> u32;

    // Provided methods
    fn from_str_with(
        s: &str,
        prefix: Option<HexStringPrefix>,
    ) -> Result<Self, ParseError> { ... }
    fn compare(&self, other: &Self) -> u32 { ... }
}

The trait to represent a fuzzy hash (TLSH).

§TLSH Internals

A fuzzy hash (TLSH) is composed of up to four parts:

  1. Checksum (checksum of the input, 1 or 3 bytes)
  2. Data Length (approximated, encoded as an 8-bit integer)
  3. Q ratio pair (each Q ratio value reflects the statistical distribution)
  4. Body (a fixed number of quartile values, 2 bits each; the number of quartile values equals the number of “buckets” used to gather statistical information, i.e. local features, of the given input)

Note that the checksum part may always be zero in some TLSH configurations (i.e. when multi-threading is enabled or the private flag is set).

This trait is implemented by FuzzyHash.

Required Associated Constants§

const NUMBER_OF_BUCKETS: usize

The number of buckets.

Specifically, this constant denotes the number of effective buckets that are used to construct a fuzzy hash.

Sometimes, the number of physical buckets (number of possible results after the Pearson hashing or its variant) differs from the number of effective buckets.

Variant   Effective Buckets   Physical Buckets
Short     48*                 49
Normal    128*                256
Long      256                 256

In the cases marked with *, only the first effective buckets are used and the rest are ignored (dropped).
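
As a rough illustration of the table, the following sketch keeps only the leading effective buckets of a physical bucket array. The function name `effective_buckets` and the `u32` counters are assumptions made for this example; they are not part of this crate's API.

```rust
/// Returns the first `effective` buckets and drops the rest.
///
/// Hypothetical helper for illustration only; not part of this crate.
/// Panics if `effective` exceeds the number of physical buckets.
fn effective_buckets(physical: &[u32], effective: usize) -> &[u32] {
    // Only the leading `effective` entries contribute to the fuzzy hash;
    // any trailing physical buckets are ignored.
    &physical[..effective]
}
```

For the normal variant this would be called with 256 physical buckets and `effective = 128`; for the short variant, with 49 and 48.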

const SIZE_IN_BYTES: usize

Total size of the fuzzy hash in bytes when stored in the binary representation (as a byte array).

This size is fixed and is also the required buffer size for the store_into_bytes() method.

const LEN_IN_STR_EXCEPT_PREFIX: usize

Length in the hexadecimal string representation (except the prefix "T1").

This is always LEN_IN_STR minus 2.

This length is fixed and is also the required buffer size for the store_into_str_bytes() method when the prefix is HexStringPrefix::Empty.

const LEN_IN_STR: usize

Length in the hexadecimal string representation.

This is always LEN_IN_STR_EXCEPT_PREFIX plus 2.

This length is fixed and is also the required buffer size for the store_into_str_bytes() method when the prefix is HexStringPrefix::WithVersion.
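
The relationship between the three size constants can be sketched from the part sizes listed under TLSH Internals. This is only an illustration: the type `Sizes` and the function `sizes` are hypothetical, and the sketch assumes that the Q ratio pair occupies one byte (which is consistent with the example hash in the store_into_bytes() documentation).

```rust
/// Sizes derived from the checksum size and the number of effective buckets.
/// Hypothetical helper type for illustration; not part of this crate.
#[derive(Debug, PartialEq, Eq)]
struct Sizes {
    size_in_bytes: usize,
    len_in_str_except_prefix: usize,
    len_in_str: usize,
}

const fn sizes(checksum_bytes: usize, buckets: usize) -> Sizes {
    // Checksum + 1-byte length + 1-byte Q ratio pair (assumed)
    // + body at 2 bits per bucket (4 buckets per byte).
    let size_in_bytes = checksum_bytes + 1 + 1 + buckets / 4;
    // Two hexadecimal digits per byte.
    let len_in_str_except_prefix = size_in_bytes * 2;
    // The prefix "T1" adds two characters.
    let len_in_str = len_in_str_except_prefix + 2;
    Sizes {
        size_in_bytes,
        len_in_str_except_prefix,
        len_in_str,
    }
}
```

Under these assumptions, `sizes(3, 128)` gives 37 bytes, 74 hexadecimal digits without the prefix, and 76 with it.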

Required Associated Types§

type ChecksumType: FuzzyHashChecksum

The type of the checksum part.

This is an instantiation of crate::hash::checksum::FuzzyHashChecksumData.

type BodyType: FuzzyHashBody

The type of the body part.

This is an instantiation of crate::hash::body::FuzzyHashBodyData.

Required Methods§

fn checksum(&self) -> &Self::ChecksumType

Returns the checksum part.

fn length(&self) -> &FuzzyHashLengthEncoding

Returns the length part.

fn qratios(&self) -> &FuzzyHashQRatios

Returns the Q ratio pair part.

fn body(&self) -> &Self::BodyType

Returns the body part.

fn from_str_bytes( bytes: &[u8], prefix: Option<HexStringPrefix>, ) -> Result<Self, ParseError>

Try parsing a fuzzy hash object from the given TLSH hexadecimal representation (as bytes), using the specified prefix mode.

If the argument prefix is None, the presence of the prefix is auto-detected. Otherwise, the input is checked against the specified prefix.
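
One plausible reading of the prefix handling described above is sketched below. The enum `Prefix` is a local stand-in for HexStringPrefix, and `split_prefix` is a hypothetical helper; neither is this crate's implementation. The sketch assumes that auto-detection only looks for a leading "T1", which is unambiguous because `T` is not a hexadecimal digit.

```rust
/// Local stand-in for the crate's `HexStringPrefix`, defined for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Prefix {
    Empty,
    WithVersion,
}

/// Splits `bytes` into the detected (or verified) prefix kind and the
/// remaining hexadecimal digits. Returns `None` if the expectation fails.
fn split_prefix(bytes: &[u8], expected: Option<Prefix>) -> Option<(Prefix, &[u8])> {
    let has_version = bytes.starts_with(b"T1");
    match (expected, has_version) {
        // Auto-detected, or the caller's expectation matches the input.
        (None, true) | (Some(Prefix::WithVersion), true) => {
            Some((Prefix::WithVersion, &bytes[2..]))
        }
        (None, false) | (Some(Prefix::Empty), false) => Some((Prefix::Empty, bytes)),
        // A required prefix is absent, or an unexpected prefix is present.
        _ => None,
    }
}
```

A real parser would go on to validate the length and the hexadecimal digits of the remainder; that part is omitted here.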

fn store_into_bytes(&self, out: &mut [u8]) -> Result<usize, OperationError>

Store the contents of this object to the specified slice (in a binary format).

The binary format is suitable for serialization and parsing.

§The Binary Format with a Warning

The binary format slightly differs from the representation you might expect from the TLSH’s hexadecimal representation.

The TLSH hexadecimal representation swaps the two nibbles of each byte in the header (checksum, length and Q ratio pair parts). For instance, a checksum part written as "42" in the TLSH hexadecimal representation means a real checksum value of 0x24.

The body part is also reversed in a sense, but this crate handles it equivalently (under one interpretation, only the byte ordering is reversed). So, at least for the body, you get the bytes you would expect from the TLSH hexadecimal representation.

The binary format used by this method does not apply that nibble swap to the header; it stores the real values.

For instance, the following TLSH hash (normal, 128 buckets, with a long 3-byte checksum):

T170F37CF0DC36520C1B007FD320B9B266559FD998A0200725E75AFCEAC99F5881184A4B1AA2 (raw)

T1 70F37C F0 DC 36520C1B007FD320B9B266559FD998A0200725E75AFCEAC99F5881184A4B1AA2 (decomposed)
|  |      |  |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|  |      |  |                Body / Buckets in quartiles (32-byte, 128 buckets)
|  |      |  +- Q ratio pair (reversed; Q1 ratio -> Q2 ratio)
|  |      +- Length (reversed)
|  +- 3-byte Checksum (reversed per byte; AB CD EF -> BA DC FE)
+- Header and version

will be written as the following byte sequence by this method:

073FC70FCD36520C1B007FD320B9B266559FD998A0200725E75AFCEAC99F5881184A4B1AA2 (raw)

__ 073FC7 0F CD 36520C1B007FD320B9B266559FD998A0200725E75AFCEAC99F5881184A4B1AA2 (decomposed)
|  |      |  |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|  |      |  |                Body / Buckets in quartiles (32-byte, 128 buckets)
|  |      |  +- Q ratio pair (Q2 ratio -> Q1 ratio)                 (kept as is)
|  |      +- Length
|  +- 3-byte Checksum
+- No header and version
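
The header conversion shown in the two diagrams can be sketched as follows. The helper `hex_to_binary` and its `header_len` parameter are hypothetical and exist only for this illustration; the crate's own parser and store_into_bytes() are the supported way to obtain the binary format.

```rust
/// Converts a TLSH hexadecimal string (with the "T1" prefix) into the bytes
/// of the binary format described above.
///
/// `header_len` is the number of header bytes (checksum + length + Q ratio
/// pair), e.g. 5 for a 3-byte checksum. Hypothetical helper, not crate API.
fn hex_to_binary(hex: &str, header_len: usize) -> Option<Vec<u8>> {
    let digits = hex.strip_prefix("T1")?.as_bytes();
    if digits.len() % 2 != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(digits.len() / 2);
    for (i, pair) in digits.chunks(2).enumerate() {
        let hi = (pair[0] as char).to_digit(16)? as u8;
        let lo = (pair[1] as char).to_digit(16)? as u8;
        let byte = (hi << 4) | lo;
        // Header bytes: swap the two nibbles. Body bytes: keep as is.
        out.push(if i < header_len { byte.rotate_left(4) } else { byte });
    }
    Some(out)
}
```

Applied to the example hash above with `header_len = 5`, the header bytes 70 F3 7C F0 DC become 07 3F C7 0F CD, while the 32 body bytes are unchanged.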

§The Specification

This method concatenates:

  1. Checksum
  2. Length encoding
  3. Q ratio pair
  4. Body

in that order (without any explicit variable-length encodings or separators). The binary representation of each part can be retrieved as either a u8 value or a slice / array of u8.

See struct documentation for details.
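
The concatenation described by the specification can be sketched with a toy type. `ToyHash`, its field layout (1-byte checksum, 32-byte body) and the `()` error type are assumptions made for this example; the real method returns OperationError and works on the crate's own part types.

```rust
/// Toy stand-in for a fuzzy hash with a 1-byte checksum and a 32-byte body.
/// The field layout is an assumption made for this sketch.
struct ToyHash {
    checksum: [u8; 1],
    length: u8,
    qratios: u8,
    body: [u8; 32],
}

impl ToyHash {
    const SIZE_IN_BYTES: usize = 1 + 1 + 1 + 32;

    /// Writes checksum, length, Q ratio pair and body back to back, with no
    /// separators. Returns the number of bytes written, or `Err(())` if
    /// `out` is too small.
    fn store_into_bytes(&self, out: &mut [u8]) -> Result<usize, ()> {
        if out.len() < Self::SIZE_IN_BYTES {
            return Err(());
        }
        out[..1].copy_from_slice(&self.checksum);
        out[1] = self.length;
        out[2] = self.qratios;
        out[3..Self::SIZE_IN_BYTES].copy_from_slice(&self.body);
        Ok(Self::SIZE_IN_BYTES)
    }
}
```

Because every part has a fixed size, a reader can split the buffer back into its parts by offset alone.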

fn store_into_str_bytes( &self, out: &mut [u8], prefix: HexStringPrefix, ) -> Result<usize, OperationError>

Store the contents of this object to the specified slice, as the TLSH hexadecimal string representation with the specified prefix.

fn max_distance(config: ComparisonConfiguration) -> u32

Compute the maximum distance that a comparison can yield with the specified comparison configuration.

If you need the maximum distance for the default configuration, pass Default::default() as the argument.

fn compare_with_config( &self, other: &Self, config: ComparisonConfiguration, ) -> u32

Compare with another instance (with a configuration) and return the distance between them.

In most cases the default configuration is sufficient, so you will likely use compare() instead.

Provided Methods§

fn from_str_with( s: &str, prefix: Option<HexStringPrefix>, ) -> Result<Self, ParseError>

Try parsing a fuzzy hash object from the given TLSH hexadecimal representation (as a string), using the specified prefix mode.

If the argument prefix is None, the presence of the prefix is auto-detected. Otherwise, the input is checked against the specified prefix.

fn compare(&self, other: &Self) -> u32

Compare with another instance with the default configuration and return the distance between them.

If you need to use a non-default option, use compare_with_config() instead.
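
The relationship between compare() and compare_with_config() is presumably the usual provided-method pattern, sketched below with stand-in types. `Config`, `Compare` and `Toy` are hypothetical, and the distance computed here is a toy metric, not the TLSH distance.

```rust
/// Stand-in for the crate's `ComparisonConfiguration`; the field is invented.
#[derive(Clone, Copy)]
struct Config {
    include_length: bool,
}

impl Default for Config {
    fn default() -> Self {
        Config { include_length: true }
    }
}

/// Reduced stand-in for the comparison part of `FuzzyHashType`.
trait Compare {
    /// Required: compare using an explicit configuration.
    fn compare_with_config(&self, other: &Self, config: Config) -> u32;

    /// Provided: delegate to the required method with the default configuration.
    fn compare(&self, other: &Self) -> u32 {
        self.compare_with_config(other, Config::default())
    }
}

struct Toy {
    length: u8,
    body: u8,
}

impl Compare for Toy {
    fn compare_with_config(&self, other: &Self, config: Config) -> u32 {
        // Toy metric: absolute differences, optionally ignoring the length.
        let body = u32::from(self.body.abs_diff(other.body));
        let length = if config.include_length {
            u32::from(self.length.abs_diff(other.length))
        } else {
            0
        };
        body + length
    }
}
```

An implementor only writes compare_with_config(); callers that are happy with the defaults call compare().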

Dyn Compatibility§

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.

Implementors§

impl<const SIZE_CKSUM: usize, const SIZE_BUCKETS: usize> FuzzyHashType for FuzzyHash<SIZE_CKSUM, SIZE_BUCKETS>
where FuzzyHashParams<SIZE_CKSUM, SIZE_BUCKETS>: ConstrainedFuzzyHashParams,

const NUMBER_OF_BUCKETS: usize = <<FuzzyHashParams<SIZE_CKSUM, SIZE_BUCKETS> as ConstrainedFuzzyHashParams>::InnerFuzzyHashType>::NUMBER_OF_BUCKETS

const SIZE_IN_BYTES: usize = <<FuzzyHashParams<SIZE_CKSUM, SIZE_BUCKETS> as ConstrainedFuzzyHashParams>::InnerFuzzyHashType>::SIZE_IN_BYTES

const LEN_IN_STR_EXCEPT_PREFIX: usize = <<FuzzyHashParams<SIZE_CKSUM, SIZE_BUCKETS> as ConstrainedFuzzyHashParams>::InnerFuzzyHashType>::LEN_IN_STR_EXCEPT_PREFIX

const LEN_IN_STR: usize = <<FuzzyHashParams<SIZE_CKSUM, SIZE_BUCKETS> as ConstrainedFuzzyHashParams>::InnerFuzzyHashType>::LEN_IN_STR

type ChecksumType = <<FuzzyHashParams<SIZE_CKSUM, SIZE_BUCKETS> as ConstrainedFuzzyHashParams>::InnerFuzzyHashType as FuzzyHashType>::ChecksumType

type BodyType = <<FuzzyHashParams<SIZE_CKSUM, SIZE_BUCKETS> as ConstrainedFuzzyHashParams>::InnerFuzzyHashType as FuzzyHashType>::BodyType