pub trait BlockHashPositionArrayData {
// Required methods
fn representation(&self) -> &[u64; 64];
fn len(&self) -> u8;
// Provided methods
fn is_empty(&self) -> bool { ... }
fn is_valid(&self) -> bool { ... }
fn is_valid_and_normalized(&self) -> bool { ... }
}
An abstract representation of the block hash position array.
§Position Array Representation
Each element of the position array indicates which positions in the corresponding block hash hold a given alphabet (note that the array is indexed by the alphabet index).
For instance, if representation()[5] == 0x81, the block hash contains the alphabet index 5 at positions 0 and 7 (block hash glob: F??????F*, except that the wildcards do not match F).
This is because bit 0 (0x01) at index 5 means that position 0 has the alphabet with index 5 (F; the 6th alphabet). Likewise, bit 7 (0x80) at index 5 corresponds to the fact that position 7 has the alphabet with index 5 (F).
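The mapping described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: build_position_array is a hypothetical helper, and it assumes the block hash is given as a slice of Base64 indices.

```rust
/// Hypothetical helper (not part of the crate's API): builds a position
/// array from a block hash given as Base64 *indices* (each value < 64).
/// Returns the array together with the block hash length, or `None` if the
/// input is too long or contains an out-of-range index.
fn build_position_array(block_hash: &[u8]) -> Option<([u64; 64], u8)> {
    if block_hash.len() > 64 {
        return None;
    }
    let mut representation = [0u64; 64];
    for (pos, &ch) in block_hash.iter().enumerate() {
        if ch >= 64 {
            return None;
        }
        // Bit `pos` of element `ch` records that position `pos` holds `ch`.
        representation[usize::from(ch)] |= 1u64 << pos;
    }
    Some((representation, block_hash.len() as u8))
}
```

For the block hash with indices [5, 0, 1, 2, 3, 4, 6, 5] (F at positions 0 and 7), element 5 of the resulting array is 0x81.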
This representation makes it possible to make some dynamic programming algorithms bit-parallel. In other words, some table updates of certain bottom-up dynamic programming algorithms can be represented as logical expressions (with some arithmetic ones to enable, for instance, horizontal propagation). This is particularly effective on ssdeep because each block hash has a maximum size of block_hash::FULL_SIZE (64; many 64-bit machines handle that efficiently, and even 32-bit machines can benefit from it).
This is fast enough that the bit-parallel approach remains faster even if we don’t use any batching.
For an example of such algorithms, see Bitap algorithm.
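As a rough illustration of the bit-parallel idea, the sketch below uses the Shift-And variant of Bitap for exact substring search. It is not claimed to be an algorithm this crate implements; pattern_masks and contains_pattern are hypothetical names, and both strings are assumed to be arrays of Base64 indices.

```rust
/// Hypothetical helper: position array (pattern masks) for a pattern of
/// Base64 indices. Panics if the pattern is longer than 64 or contains
/// an index >= 64.
fn pattern_masks(pattern: &[u8]) -> [u64; 64] {
    assert!(pattern.len() <= 64, "pattern must fit in 64 bits");
    let mut masks = [0u64; 64];
    for (pos, &ch) in pattern.iter().enumerate() {
        masks[usize::from(ch)] |= 1u64 << pos;
    }
    masks
}

/// Shift-And search: returns true if the pattern described by `masks` and
/// `pattern_len` occurs as a contiguous substring of `text`.
/// Every element of `text` must be < 64.
fn contains_pattern(masks: &[u64; 64], pattern_len: u8, text: &[u8]) -> bool {
    assert!(pattern_len <= 64, "pattern must fit in 64 bits");
    if pattern_len == 0 {
        return true; // the empty pattern matches trivially
    }
    let accept = 1u64 << (pattern_len - 1);
    let mut state = 0u64;
    for &ch in text {
        // One shift, one OR and one AND update all prefix matches at once:
        // bit i is set iff pattern[0..=i] matches the text ending here.
        state = ((state << 1) | 1) & masks[usize::from(ch)];
        if state & accept != 0 {
            return true;
        }
    }
    false
}
```

Each text character costs a constant number of word operations regardless of the pattern length, which is why a 64-element limit maps so well onto 64-bit registers.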
See also:
BlockHashPositionArrayImpl for algorithms based on this representation.
FuzzyHashCompareTarget for the full fuzzy hash object based on this representation.
§Alphabet / Character Sets
Although the algorithm itself is independent of the number of alphabets in the string, this trait is defined for ssdeep and requires that every element of the string be less than block_hash::ALPHABET_SIZE (64).
In other words, a string must be an array of Base64 indices (not Base64 alphabets).
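To make the distinction concrete, here is a minimal sketch that converts Base64 alphabets to Base64 indices. It assumes the standard Base64 alphabet (A-Z, a-z, 0-9, +, /), which is consistent with F having index 5 above; base64_index and to_indices are hypothetical helpers, not functions of this crate.

```rust
/// Hypothetical helper: maps one Base64 alphabet (an ASCII byte) to its
/// index in the standard Base64 alphabet, or `None` if it is not in it.
fn base64_index(ch: u8) -> Option<u8> {
    match ch {
        b'A'..=b'Z' => Some(ch - b'A'),
        b'a'..=b'z' => Some(ch - b'a' + 26),
        b'0'..=b'9' => Some(ch - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Hypothetical helper: converts a block hash string to an array of
/// Base64 indices, failing on the first character outside the alphabet.
fn to_indices(block_hash: &str) -> Option<Vec<u8>> {
    block_hash.bytes().map(base64_index).collect()
}
```

Every value produced this way is below 64, so it can be used directly as an index into the position array.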
§Compatibility Note
Since version 0.3.0, all types implementing this trait automatically implement the following public traits:
BlockHashPositionArrayImpl
BlockHashPositionArrayImplUnchecked (when the unchecked feature is enabled)
§Compatibility Notice
This trait is going to be completely private on the next major release. If you need to experiment with internal hashing functions, just vendor the source code for your needs.
Required Methods§
fn representation(&self) -> &[u64; 64]
Returns the raw representation of the block hash position array.
Provided Methods§
fn is_valid(&self) -> bool
Performs full validity checking of a position array object.
§Compatibility Note
Note that, since version 0.2, this method does not check whether the object contains a normalized string. For this purpose, use is_valid_and_normalized() instead.
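The exact checks are not spelled out here. As one plausible reading, a position array derived from a real string must assign each position below len() to exactly one alphabet and leave all higher bits clear. The sketch below checks only those invariants; positions_are_consistent is a hypothetical function and should not be taken as the crate's actual implementation of is_valid().

```rust
/// Hypothetical consistency check (an assumption about what "valid" means):
/// - `len` is at most 64,
/// - no position is claimed by two different alphabets,
/// - exactly the positions `0..len` are claimed.
fn positions_are_consistent(representation: &[u64; 64], len: u8) -> bool {
    if len > 64 {
        return false;
    }
    // Mask with the low `len` bits set (avoiding an overflowing shift at 64).
    let expected = if len == 64 { u64::MAX } else { (1u64 << len) - 1 };
    let mut seen = 0u64;
    for &mask in representation.iter() {
        if seen & mask != 0 {
            return false; // two alphabets claim the same position
        }
        seen |= mask;
    }
    seen == expected
}
```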
fn is_valid_and_normalized(&self) -> bool
Performs full validity checking and the normalization test of a position array object.
If it returns true, the position array representation is valid and the corresponding string is already normalized.
To pass this validity test, the string cannot contain a sequence consisting of the same character longer than block_hash::MAX_SEQUENCE_SIZE.
See also: “Normalization” section of FuzzyHashData
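The run-length condition can itself be tested in a bit-parallel way, because a run of identical characters appears as consecutive set bits in a single array element. The sketch below assumes MAX_SEQUENCE_SIZE is 3 (the value is not stated on this page) and uses a hypothetical function name; it covers only the normalization part, not the validity part.

```rust
/// Assumed value of block_hash::MAX_SEQUENCE_SIZE (an assumption for this
/// sketch; check the crate's constant before relying on it).
const MAX_SEQUENCE_SIZE: u32 = 3;

/// Hypothetical helper: returns true if no alphabet occupies more than
/// MAX_SEQUENCE_SIZE consecutive positions.
fn has_no_long_runs(representation: &[u64; 64]) -> bool {
    representation.iter().all(|&mask| {
        // After the loop, bit p of `runs` is set iff positions
        // p ..= p + MAX_SEQUENCE_SIZE are all set in `mask`, i.e. a run
        // of MAX_SEQUENCE_SIZE + 1 identical characters starts at p.
        let mut runs = mask;
        for shift in 1..=MAX_SEQUENCE_SIZE {
            runs &= mask >> shift;
        }
        runs == 0
    })
}
```

With the assumed limit of 3, an element equal to 0b0111 passes, while 0b1111 (four identical characters in a row) does not.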