pub struct Generator(/* private fields */);
Fuzzy hash generator.
This type generates fuzzy hashes from given data.
§Default Output
§Normalization
The output of the generator is not normalized. If you want to convert it to a normalized form, use separate methods like RawFuzzyHash::normalize().
In other words, this generator (itself) does not have a direct equivalent of the FUZZY_FLAG_ELIMSEQ flag of libfuzzy’s fuzzy_digest function.
§Truncation
By default (using the finalize() method), the output has a short, truncated form.
By using finalize_without_truncation(), you can retrieve a non-truncated form as a result. This is equivalent to the FUZZY_FLAG_NOTRUNC flag of libfuzzy’s fuzzy_digest function.
§Input Types
This type has three update methods accepting three different types:
- update() (accepting a slice of u8 - byte buffer)
- update_by_iter() (accepting an iterator of u8 - stream of bytes)
- update_by_byte() (accepting u8 - single byte)
§Input Size
The input size has a hard maximum limit (inclusive): MAX_INPUT_SIZE (192GiB).
This is due to the mathematical limit of the 32-bit rolling hash and piece-splitting behavior.
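The arithmetic behind the 192GiB figure can be checked with a short, standard-library-only sketch. The first function simply converts 192GiB to bytes. The second factors the same number using parameters commonly cited for ssdeep (minimum block size 3, 31 block sizes, block hash length 64); those three values are assumptions not stated on this page, so treat the factorization as a plausibility check rather than the crate's own derivation.

```rust
/// One gibibyte in bytes.
const GIB: u64 = 1 << 30;

/// 192 GiB expressed in bytes.
fn limit_from_gib() -> u64 {
    192 * GIB
}

// Assumed ssdeep parameters (NOT taken from this page).
const MIN_BLOCK_SIZE: u64 = 3;
const NUM_BLOCK_SIZES: u32 = 31;
const BLOCK_HASH_LEN: u64 = 64;

/// Largest assumed block size times the assumed block hash length.
fn limit_from_assumed_params() -> u64 {
    let largest_block_size = MIN_BLOCK_SIZE << (NUM_BLOCK_SIZES - 1);
    largest_block_size * BLOCK_HASH_LEN
}

fn main() {
    println!("192 GiB in bytes:       {}", limit_from_gib());
    println!("from assumed parameters: {}", limit_from_assumed_params());
}
```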
On the other hand, if the input size is too small, the result will not be meaningful enough. This soft lower limit (inclusive) is declared as MIN_RECOMMENDED_INPUT_SIZE, and you can call the may_warn_about_small_input_size() method to check whether the size is too small to be meaningful enough.
Note: even if it’s doubtful to be meaningful enough, a fuzzy hash generated from such a small input is still valid. You don’t have to reject them just because they are too small. This soft limit is for diagnostics.
If you know the total size of the input, you can improve the performance by using either the set_fixed_input_size() method or the set_fixed_input_size_in_usize() method.
§Examples
use ssdeep::{Generator, RawFuzzyHash};
let mut generator = Generator::new();
let buf1: &[u8] = b"Hello, ";
let buf2: &[u8; 6] = b"World!";
// Optional but supplying the *total* input size first improves the performance.
// This is the total size of three update calls below.
generator.set_fixed_input_size_in_usize(buf1.len() + buf2.len() + 1).unwrap();
// Update the internal state of the generator.
// Of course, you can update multiple times.
generator.update(buf1);
generator.update_by_iter((*buf2).into_iter());
generator.update_by_byte(b'\n');
// Retrieve the fuzzy hash and convert to the string.
let hash: RawFuzzyHash = generator.finalize().unwrap();
assert_eq!(hash.to_string(), "3:aaX8v:aV");
§Compatibility Notice
The += operator is going to be removed in the next major release.
Implementations§
impl Generator
pub const MAX_INPUT_SIZE: u64 = 206_158_430_208u64
The maximum input size (inclusive).
ssdeep has an upper limit of 192GiB (inclusive).
This is a hard limit. Feeding data larger than this constant size is an invalid operation.
pub const MIN_RECOMMENDED_INPUT_SIZE: u64 = 4_097u64
The recommended minimum input size (inclusive).
This is a soft limit. Although it’s doubtful that the result from the input smaller than this constant size is meaningful enough, it’s still valid. It might be useful for diagnostics.
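To make the difference between the hard and the soft limit concrete, here is a minimal standard-library-only sketch that classifies an input size against the two constants above. The SizeClass enum and the classify function are hypothetical names introduced for illustration; they are not part of the crate.

```rust
// Values copied from the two constants documented above.
const MAX_INPUT_SIZE: u64 = 206_158_430_208;
const MIN_RECOMMENDED_INPUT_SIZE: u64 = 4_097;

/// Hypothetical classification of an input size.
#[derive(Debug, PartialEq, Eq)]
enum SizeClass {
    /// Above the hard limit: feeding this much data is invalid.
    Invalid,
    /// Below the soft limit: still valid, but worth a diagnostic.
    SmallButValid,
    /// Within both limits.
    Recommended,
}

fn classify(size: u64) -> SizeClass {
    if size > MAX_INPUT_SIZE {
        SizeClass::Invalid
    } else if size < MIN_RECOMMENDED_INPUT_SIZE {
        SizeClass::SmallButValid
    } else {
        SizeClass::Recommended
    }
}

fn main() {
    for size in [14, 4_096, 4_097, MAX_INPUT_SIZE, MAX_INPUT_SIZE + 1] {
        println!("{:>15} bytes -> {:?}", size, classify(size));
    }
}
```

Both limits are inclusive, so 4_097 bytes and exactly MAX_INPUT_SIZE bytes fall into the recommended range.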
pub fn reset(&mut self)
Performs a partial initialization.
It effectively resets the state to the initial one but does not necessarily reinitialize all internal fields.
pub fn input_size(&self) -> u64
Retrieves the input size fed to the generator object.
pub fn may_warn_about_small_input_size(&self) -> bool
Checks whether a ssdeep-compatible client may raise a warning due to its small input size (less meaningful fuzzy hashes will be generated on the finalization).
The result is based on either the fixed size or the current input size. So, this method should be used after calling either:
- set_fixed_input_size() or similar methods
- finalize() or similar methods
and before resetting the state.
pub fn set_fixed_input_size(&mut self, size: u64) -> Result<(), GeneratorError>
Set the fixed input size for optimal performance.
This method sets the internal upper limit of the block size to update per byte. It improves the performance by preventing unnecessary block hash updates (that will never be used by the final fuzzy hash).
This method returns an error if:
- size is larger than MAX_INPUT_SIZE (GeneratorError::FixedSizeTooLarge) or
- The fixed size is previously set but the new one is different (GeneratorError::FixedSizeMismatch).
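The two error conditions listed above can be modeled with a small standard-library-only sketch. FixedSizeModel and FixedSizeError are hypothetical stand-ins written for illustration, not the crate's implementation; in particular, the order in which the two checks run is an assumption.

```rust
const MAX_INPUT_SIZE: u64 = 206_158_430_208;

/// Hypothetical stand-ins for the two documented error cases.
#[derive(Debug, PartialEq, Eq)]
enum FixedSizeError {
    TooLarge,
    Mismatch,
}

/// Hypothetical model that only tracks the fixed size.
#[derive(Default)]
struct FixedSizeModel {
    fixed: Option<u64>,
}

impl FixedSizeModel {
    fn set(&mut self, size: u64) -> Result<(), FixedSizeError> {
        // Rule 1: the size must not exceed the hard limit.
        if size > MAX_INPUT_SIZE {
            return Err(FixedSizeError::TooLarge);
        }
        // Rule 2: a previously set size may be repeated but not changed.
        match self.fixed {
            Some(previous) if previous != size => Err(FixedSizeError::Mismatch),
            _ => {
                self.fixed = Some(size);
                Ok(())
            }
        }
    }
}

fn main() {
    let mut model = FixedSizeModel::default();
    println!("{:?}", model.set(14)); // first call
    println!("{:?}", model.set(14)); // same size again
    println!("{:?}", model.set(15)); // different size
    println!("{:?}", model.set(MAX_INPUT_SIZE + 1)); // over the limit
}
```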
pub fn set_fixed_input_size_in_usize(
    &mut self,
    size: usize,
) -> Result<(), GeneratorError>
Set the fixed input size for optimal performance.
This is a thin wrapper of the set_fixed_input_size() method.
Although this implementation handles u64 as the native input size type and the file size in the Rust standard library is represented as u64, it’s not rare that you want to give a usize to hash a buffer (or your program uses usize for its native size representation).
It accepts size in usize and, if this size does not fit in 64 bits, an error containing GeneratorError::FixedSizeTooLarge is returned. Other than that, this is the same as set_fixed_input_size().
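The usize-to-u64 step described above can be sketched with the standard library's checked conversion. This is an illustration of the idea, not the crate's source: SketchError and to_native_size are hypothetical names, and only the conversion itself (u64::try_from) is a real standard library call.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the documented error case.
#[derive(Debug, PartialEq, Eq)]
enum SketchError {
    FixedSizeTooLarge,
}

/// Converts a `usize` to the `u64` size type, failing only on
/// platforms where `usize` can hold values wider than 64 bits.
fn to_native_size(size: usize) -> Result<u64, SketchError> {
    u64::try_from(size).map_err(|_| SketchError::FixedSizeTooLarge)
}

fn main() {
    let buffer = b"Hello, World!\n";
    println!("{:?}", to_native_size(buffer.len()));
}
```

On common 32-bit and 64-bit targets the conversion cannot fail, so in practice the only remaining error source is the MAX_INPUT_SIZE check of the wrapped method.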
impl Generator
pub fn update(&mut self, buffer: &[u8]) -> &mut Self
Process data, updating the internal state.
pub fn update_by_iter(&mut self, iter: impl Iterator<Item = u8>) -> &mut Self
Process data (an iterator), updating the internal state.
pub fn update_by_byte(&mut self, ch: u8) -> &mut Self
Process a byte, updating the internal state.
pub fn finalize_raw<const TRUNC: bool, const S1: usize, const S2: usize>(
    &self,
) -> Result<FuzzyHashData<S1, S2, false>, GeneratorError>
where
    BlockHashSize<S1>: ConstrainedBlockHashSize,
    BlockHashSize<S2>: ConstrainedBlockHashSize,
    BlockHashSizes<S1, S2>: ConstrainedBlockHashSizes,
Retrieves the resulting fuzzy hash.
Usually, you should use the finalize() method (a wrapper of this method) instead because it passes true as the TRUNC option to this method (the default ssdeep behavior).
Although some methods, including this one, are named finalize, you can continue feeding more data and updating the internal state without problems. Still, use cases where using Generator like this is useful are hard to find.
pub fn finalize(&self) -> Result<RawFuzzyHash, GeneratorError>
Retrieves the resulting fuzzy hash.
The type of the resulting fuzzy hash (RawFuzzyHash) is in a raw form (not normalized). This is the default behavior of ssdeep.
This is equivalent to calling libfuzzy’s fuzzy_digest function with default flags.
pub fn finalize_without_truncation(
    &self,
) -> Result<LongRawFuzzyHash, GeneratorError>
Retrieves the resulting fuzzy hash, not truncating the second block hash.
Note that not doing the truncation is usually not what you want.
This is equivalent to calling libfuzzy’s fuzzy_digest function with the flag FUZZY_FLAG_NOTRUNC.
Trait Implementations§
impl AddAssign<&[u8]> for Generator
fn add_assign(&mut self, buffer: &[u8])
Updates the hash value by processing a slice of u8.
impl AddAssign<u8> for Generator
fn add_assign(&mut self, byte: u8)
Updates the hash value by processing a byte.