BlockHashPositionArrayData

Trait BlockHashPositionArrayData 

pub trait BlockHashPositionArrayData {
    // Required methods
    fn representation(&self) -> &[u64; 64];
    fn len(&self) -> u8;

    // Provided methods
    fn is_empty(&self) -> bool { ... }
    fn is_valid(&self) -> bool { ... }
    fn is_valid_and_normalized(&self) -> bool { ... }
}

An abstract representation of the block hash position array.

§Position Array Representation

Each element of the position array indicates which positions in the corresponding block hash hold the given alphabet (note that the array is indexed by alphabet index).

For instance, if representation()[5] == 0x81, it means the block hash contains the alphabet index 5 in the positions 0 and 7 (block hash glob: F??????F* except that wildcards don’t allow F).

This is because the bit 0 (0x01) at the index 5 means that position 0 has the alphabet with index 5 (F; the 6th alphabet). Likewise, the bit 7 (0x80) at the index 5 corresponds to the fact that position 7 has the alphabet with index 5 (F).
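The mapping described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `build_position_array` is a hypothetical helper, and the input is assumed to be a slice of Base64 indices no longer than 64 elements.

```rust
/// Builds a position array from a block hash given as Base64 *indices*.
/// Hypothetical helper for illustration; not part of the crate's API.
fn build_position_array(block_hash: &[u8]) -> [u64; 64] {
    assert!(block_hash.len() <= 64, "a block hash has at most 64 characters");
    let mut representation = [0u64; 64];
    for (position, &index) in block_hash.iter().enumerate() {
        assert!(index < 64, "each element must be a Base64 index (0..64)");
        // Set bit `position` in the entry that belongs to this alphabet index.
        representation[index as usize] |= 1u64 << position;
    }
    representation
}
```

For the index sequence `[5, 0, 1, 2, 3, 4, 6, 5]`, index 5 occupies positions 0 and 7, so `build_position_array(..)[5]` evaluates to `0x81`.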

This representation makes it possible to make some dynamic programming algorithms bit-parallel. In other words, some table updates of certain top-down dynamic programming algorithms can be represented as logical expressions (with some arithmetic ones to enable, for instance, horizontal propagation). This is particularly effective for ssdeep because each block hash has a maximum size of block_hash::FULL_SIZE (64): many 64-bit machines handle that size efficiently, and even 32-bit machines can benefit from it.

The bit-parallel approach is fast enough that it remains faster even when no batching is used.

For an example of such algorithms, see Bitap algorithm.
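As a rough sketch of how a position array drives a bit-parallel algorithm, the following shows the exact-match (Shift-And) variant of Bitap, with the position array acting as the per-character mask table. The function names are hypothetical and this is not the comparison algorithm the crate uses; it only illustrates how one shift and one AND update every pattern position at once.

```rust
/// Builds a position array from Base64 indices (hypothetical helper).
fn build_position_array(pattern: &[u8]) -> [u64; 64] {
    assert!(pattern.len() <= 64);
    let mut representation = [0u64; 64];
    for (position, &index) in pattern.iter().enumerate() {
        assert!(index < 64);
        representation[index as usize] |= 1u64 << position;
    }
    representation
}

/// Shift-And (Bitap) exact search. Returns the start offset in `text` of the
/// first occurrence of the pattern described by the position array.
fn bitap_find(representation: &[u64; 64], pattern_len: u8, text: &[u8]) -> Option<usize> {
    assert!(pattern_len <= 64);
    if pattern_len == 0 {
        return Some(0); // the empty pattern matches at the start
    }
    // Bit `pattern_len - 1` set means the whole pattern has been matched.
    let accept = 1u64 << (pattern_len - 1);
    let mut state = 0u64;
    for (i, &c) in text.iter().enumerate() {
        assert!(c < 64, "text must also consist of Base64 indices");
        // Extend every partial match by one character and start a new one,
        // then keep only the matches consistent with the current character.
        state = ((state << 1) | 1) & representation[c as usize];
        if state & accept != 0 {
            return Some(i + 1 - pattern_len as usize);
        }
    }
    None
}
```

With the pattern `[5, 1, 5]` and the text `[0, 5, 1, 5, 2]`, `bitap_find` returns `Some(1)`. The whole search state fits in one `u64` because a block hash never exceeds 64 characters.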

See also:

§Alphabet / Character Sets

Although the algorithm itself is independent of the number of alphabets in the string, this trait is defined for ssdeep and requires that every element of the string be less than block_hash::ALPHABET_SIZE (64).

In other words, a string must be an array of Base64 indices (not Base64 alphabets).
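To make the distinction between Base64 alphabets and Base64 indices concrete, here is a minimal conversion sketch. It assumes the standard Base64 alphabet order (`A`-`Z`, `a`-`z`, `0`-`9`, `+`, `/`), which agrees with `F` being index 5 in the example earlier; the function names are hypothetical.

```rust
/// Standard Base64 alphabet; the position of a character is its index.
const BASE64_ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Converts one Base64 character to its index, or `None` if it is not
/// part of the alphabet. (Hypothetical helper, linear scan for clarity.)
fn base64_char_to_index(ch: u8) -> Option<u8> {
    BASE64_ALPHABET
        .iter()
        .position(|&c| c == ch)
        .map(|i| i as u8)
}

/// Converts a whole block hash string to a vector of Base64 indices.
/// Returns `None` as soon as a non-Base64 character is found.
fn to_indices(block_hash: &str) -> Option<Vec<u8>> {
    block_hash.bytes().map(base64_char_to_index).collect()
}
```

For example, `to_indices("F/")` yields `Some(vec![5, 63])`, while a string containing a character outside the alphabet yields `None`.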

§Compatibility Note

Since version 0.3.0, all types implementing this trait will automatically implement the following public traits:

§Compatibility Notice

This trait is going to be completely private in the next major release. If you need to experiment with internal hashing functions, just vendor the source code for your needs.

Required Methods§

fn representation(&self) -> &[u64; 64]

Returns the raw representation of the block hash position array.

fn len(&self) -> u8

Returns the length of the block hash.

Provided Methods§

fn is_empty(&self) -> bool

Returns whether the block hash is empty.

fn is_valid(&self) -> bool

Performs full validity checking of a position array object.

§Compatibility Note

Note that, since version 0.2, this method does not check whether the object contains a normalized string. For this purpose, use is_valid_and_normalized() instead.
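The exact conditions checked by this method are not listed here. As an assumption-laden sketch, one plausible notion of validity is that the length does not exceed 64 and every position below the length is claimed by exactly one alphabet index, with no bits set at or beyond the length. The function below is hypothetical and may differ from the crate's actual checks.

```rust
/// Sketch of a validity check under the assumptions stated above.
/// Not the crate's implementation.
fn is_valid_sketch(representation: &[u64; 64], len: u8) -> bool {
    if len > 64 {
        return false;
    }
    // Mask with the low `len` bits set (avoid shifting by 64).
    let expected = if len == 64 { u64::MAX } else { (1u64 << len) - 1 };
    let mut seen = 0u64;
    for &entry in representation.iter() {
        // Two alphabet indices must never claim the same position.
        if entry & seen != 0 {
            return false;
        }
        seen |= entry;
    }
    // Every position below `len` is covered, and nothing beyond it.
    seen == expected
}
```

Under these assumptions, an all-zero array with length 0 passes, while an array with a gap or with two entries sharing a bit fails.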

fn is_valid_and_normalized(&self) -> bool

Performs full validity checking and the normalization test of a position array object.

If it returns true, the position array representation is valid and the corresponding string is already normalized.

To pass this validity test, the string cannot contain a sequence consisting of the same character longer than block_hash::MAX_SEQUENCE_SIZE.

See also: “Normalization” section of FuzzyHashData
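The run-length condition above maps naturally onto the position array: a run of the same character appears as consecutive set bits within a single entry. The sketch below assumes block_hash::MAX_SEQUENCE_SIZE is 3 and uses a hypothetical function name; it covers only the normalization part of the test, not the validity part.

```rust
/// Assumed value of block_hash::MAX_SEQUENCE_SIZE (an assumption of this sketch).
const MAX_SEQUENCE_SIZE: u32 = 3;

/// Returns true if some character repeats more than MAX_SEQUENCE_SIZE
/// times in a row, i.e. the corresponding string is not normalized.
fn has_long_run(representation: &[u64; 64]) -> bool {
    representation.iter().any(|&entry| {
        // After AND-ing with `entry >> 1 .. entry >> MAX_SEQUENCE_SIZE`,
        // bit i survives only if bits i..=i+MAX_SEQUENCE_SIZE are all set,
        // which marks a run of MAX_SEQUENCE_SIZE + 1 identical characters.
        let mut runs = entry;
        for shift in 1..=MAX_SEQUENCE_SIZE {
            runs &= entry >> shift;
        }
        runs != 0
    })
}
```

An entry of `0b0111` (a run of three) is accepted, while `0b1111` (a run of four) is flagged.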

Implementors§