Struct Generator 

pub struct Generator(/* private fields */);

Fuzzy hash generator.

This type generates fuzzy hashes from given data.

§Default Output

§Normalization

The output of the generator is not normalized. If you want to convert it to a normalized form, use separate methods like RawFuzzyHash::normalize().

In other words, the generator itself has no direct equivalent of the FUZZY_FLAG_ELIMSEQ flag of libfuzzy’s fuzzy_digest function.
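As a rough illustration of what sequence elimination means, the following standalone sketch collapses long runs of one repeated character. It does not use the crate: eliminate_sequences is a hypothetical helper, and the run limit of 3 is an assumption based on libfuzzy's usual behavior, not something stated on this page.

```rust
/// Collapses every run of more than `max_run` identical characters down to
/// `max_run` characters. Hypothetical helper that only mirrors the idea of
/// sequence elimination; it is not the crate's implementation.
fn eliminate_sequences(block_hash: &str, max_run: usize) -> String {
    let mut out = String::with_capacity(block_hash.len());
    let mut prev: Option<char> = None;
    let mut run = 0usize;
    for ch in block_hash.chars() {
        if Some(ch) == prev {
            run += 1;
        } else {
            prev = Some(ch);
            run = 1;
        }
        // Keep the character only while the current run is short enough.
        if run <= max_run {
            out.push(ch);
        }
    }
    out
}

fn main() {
    // Assumed run limit of 3: the run of five 'A's and the run of four 'C's
    // are shortened, while "BB" is left untouched.
    println!("{}", eliminate_sequences("AAAAABBCCCC", 3));
}
```

In the crate itself, the conversion is done by methods such as RawFuzzyHash::normalize(), not by hand.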

§Truncation

By default (using the finalize() method), the output is in a short, truncated form.

By using finalize_without_truncation(), you can retrieve a non-truncated form as a result. This is equivalent to the FUZZY_FLAG_NOTRUNC flag of libfuzzy’s fuzzy_digest function.

§Input Types

This type has three update methods accepting three different types:

  1. update() (accepting a slice of u8 - byte buffer)
  2. update_by_iter() (accepting an iterator of u8 - stream of bytes)
  3. update_by_byte() (accepting u8 - single byte)

§Input Size

The input size has a hard maximum limit (inclusive): MAX_INPUT_SIZE (192GiB). This is due to the mathematical limit of the 32-bit rolling hash and piece-splitting behavior.
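One plausible way to arrive at the 192GiB figure is sketched below. The three constants are assumptions taken from libfuzzy's conventions (minimum block size 3, 31 block sizes, block hash length 64). They are not stated on this page, so treat the derivation as illustrative.

```rust
// Assumed parameters (libfuzzy conventions, not taken from this page).
const MIN_BLOCK_SIZE: u64 = 3;
const NUM_BLOCK_HASHES: u32 = 31;
const BLOCK_HASH_LEN: u64 = 64;

/// Largest block size multiplied by the maximum block hash length.
fn max_input_size() -> u64 {
    let largest_block_size = MIN_BLOCK_SIZE << (NUM_BLOCK_HASHES - 1);
    largest_block_size * BLOCK_HASH_LEN
}

fn main() {
    const GIB: u64 = 1 << 30;
    // Under the assumptions above, the product equals 192 GiB, which is the
    // value documented for MAX_INPUT_SIZE (206_158_430_208).
    assert_eq!(max_input_size(), 192 * GIB);
    println!("{} bytes", max_input_size());
}
```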

On the other hand, if the input is too small, the result will not be meaningful enough. This soft lower limit (inclusive) is declared as MIN_RECOMMENDED_INPUT_SIZE, and you can call the may_warn_about_small_input_size() method to check whether the size is below it.

Note: even if it’s doubtful to be meaningful enough, a fuzzy hash generated from such a small input is still valid. You don’t have to reject them just because they are too small. This soft limit is for diagnostics.
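The difference between the hard and soft limits can be sketched without the crate. Here check_input_size and SizeCheck are hypothetical names, and the minimum of 4097 used in main is a placeholder assumption. In real code, pass Generator::MIN_RECOMMENDED_INPUT_SIZE and Generator::MAX_INPUT_SIZE.

```rust
#[derive(Debug, PartialEq)]
enum SizeCheck {
    /// Above the hard limit: hashing such input is an invalid operation.
    TooLarge,
    /// Below the soft limit: still valid, but worth a diagnostic.
    SmallButValid,
    /// Within both limits.
    Ok,
}

/// Both limits are inclusive, so a size equal to either bound is `Ok`.
fn check_input_size(size: u64, min_recommended: u64, max: u64) -> SizeCheck {
    if size > max {
        SizeCheck::TooLarge
    } else if size < min_recommended {
        SizeCheck::SmallButValid
    } else {
        SizeCheck::Ok
    }
}

fn main() {
    let max = 206_158_430_208u64; // documented MAX_INPUT_SIZE
    let min = 4_097u64; // placeholder assumption for the soft limit
    println!("{:?}", check_input_size(14, min, max));
}
```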

If you know the total size of the input, you can improve the performance by using either the set_fixed_input_size() method or the set_fixed_input_size_in_usize() method.

§Examples

use ssdeep::{Generator, RawFuzzyHash};

let mut generator = Generator::new();
let buf1: &[u8]    = b"Hello, ";
let buf2: &[u8; 6] = b"World!";

// Optional but supplying the *total* input size first improves the performance.
// This is the total size of three update calls below.
generator.set_fixed_input_size_in_usize(buf1.len() + buf2.len() + 1).unwrap();

// Update the internal state of the generator.
// Of course, you can update multiple times.
generator.update(buf1);
generator.update_by_iter((*buf2).into_iter());
generator.update_by_byte(b'\n');

// Retrieve the fuzzy hash and convert to the string.
let hash: RawFuzzyHash = generator.finalize().unwrap();
assert_eq!(hash.to_string(), "3:aaX8v:aV");
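
The string in the assertion above follows the usual ssdeep text layout of block size, first block hash and second block hash, separated by colons. The standalone sketch below splits such a string. split_fuzzy_hash is a hypothetical helper and performs no validation beyond parsing the block size. The crate's own types should be preferred for real parsing.

```rust
/// Splits "block_size:block_hash_1:block_hash_2" into its three parts.
/// Hypothetical helper for illustration only.
fn split_fuzzy_hash(s: &str) -> Option<(u32, &str, &str)> {
    let mut parts = s.splitn(3, ':');
    let block_size = parts.next()?.parse::<u32>().ok()?;
    let block_hash_1 = parts.next()?;
    let block_hash_2 = parts.next()?;
    Some((block_size, block_hash_1, block_hash_2))
}

fn main() {
    // The hash string from the example above.
    if let Some((block_size, bh1, bh2)) = split_fuzzy_hash("3:aaX8v:aV") {
        println!("block size = {block_size}, hashes = {bh1} / {bh2}");
    }
}
```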

§Compatibility Notice

The += operator (the AddAssign implementations on this type) is going to be removed in the next major release.

Implementations§

impl Generator

pub const MAX_INPUT_SIZE: u64 = 206_158_430_208u64

The maximum input size (inclusive).

ssdeep has an upper limit of 192GiB (inclusive).

This is a hard limit. Feeding data larger than this constant size is an invalid operation.

pub const MIN_RECOMMENDED_INPUT_SIZE: u64

The recommended minimum input size (inclusive).

This is a soft limit. A result generated from an input smaller than this constant is unlikely to be meaningful enough, but it is still valid. The limit is mainly useful for diagnostics.

pub fn new() -> Self

Creates a new Generator object.

pub fn reset(&mut self)

Performs a partial initialization.

It effectively resets the state to the initial one but does not necessarily reinitialize all internal fields.

pub fn input_size(&self) -> u64

Retrieves the input size fed to the generator object.

pub fn may_warn_about_small_input_size(&self) -> bool

Checks whether an ssdeep-compatible client may raise a warning because the input is small (less meaningful fuzzy hashes will be generated on finalization).

The result is based on either the fixed size or the current input size. So, this method should be used after either the fixed input size has been set or the input has been fed to the generator, and before resetting the state.

pub fn set_fixed_input_size(&mut self, size: u64) -> Result<(), GeneratorError>

Sets the fixed input size for optimal performance.

This method sets the internal upper limit of the block size to update per byte. It improves the performance by preventing unnecessary block hash updates (that will never be used by the final fuzzy hash).

This method returns an error if:

  1. size is larger than MAX_INPUT_SIZE (GeneratorError::FixedSizeTooLarge) or
  2. The fixed size is previously set but the new one is different (GeneratorError::FixedSizeMismatch).
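
The two error conditions can be pictured with a small standalone model. FixedSize and SizeError are hypothetical stand-ins for the generator's internal state and for GeneratorError. The order in which the real method performs its checks is an assumption.

```rust
/// Documented value of Generator::MAX_INPUT_SIZE.
const MAX_INPUT_SIZE: u64 = 206_158_430_208;

/// Hypothetical stand-in for the two GeneratorError variants named above.
#[derive(Debug, PartialEq)]
enum SizeError {
    TooLarge,
    Mismatch,
}

/// Hypothetical model of the "fixed size" part of the generator state.
#[derive(Default)]
struct FixedSize {
    fixed: Option<u64>,
}

impl FixedSize {
    fn set(&mut self, size: u64) -> Result<(), SizeError> {
        // Condition 1: the size exceeds the hard limit.
        if size > MAX_INPUT_SIZE {
            return Err(SizeError::TooLarge);
        }
        match self.fixed {
            // Condition 2: a different size was already set.
            Some(prev) if prev != size => Err(SizeError::Mismatch),
            // First call, or a repeat of the same size: accepted.
            _ => {
                self.fixed = Some(size);
                Ok(())
            }
        }
    }
}

fn main() {
    let mut state = FixedSize::default();
    println!("{:?}", state.set(14)); // accepted
    println!("{:?}", state.set(14)); // same size again: accepted
    println!("{:?}", state.set(15)); // different size: rejected
}
```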

pub fn set_fixed_input_size_in_usize( &mut self, size: usize, ) -> Result<(), GeneratorError>

Sets the fixed input size for optimal performance.

This is a thin wrapper around the set_fixed_input_size() method.

Although this implementation handles u64 as the native input size type, and the Rust standard library represents file sizes as u64, it’s not rare that you want to pass a usize to hash a buffer (or that your program uses usize as its native size representation).

It accepts size as a usize. If the value does not fit in 64 bits, an error containing GeneratorError::FixedSizeTooLarge is returned. Other than that, this is the same as set_fixed_input_size().

impl Generator

pub fn update(&mut self, buffer: &[u8]) -> &mut Self

Process data, updating the internal state.

pub fn update_by_iter(&mut self, iter: impl Iterator<Item = u8>) -> &mut Self

Process data (an iterator), updating the internal state.

pub fn update_by_byte(&mut self, ch: u8) -> &mut Self

Process a byte, updating the internal state.

pub fn finalize_raw<const TRUNC: bool, const S1: usize, const S2: usize>( &self, ) -> Result<FuzzyHashData<S1, S2, false>, GeneratorError>

Retrieves the resulting fuzzy hash.

Usually, you should use the finalize() method (a wrapper of this method) instead, because it passes true as the TRUNC option to this method (the default ssdeep behavior).

Although some methods, including this one, are named finalize, you can continue feeding more data and updating the internal state afterwards without problems. Still, use cases where doing so is useful are hard to find.

pub fn finalize(&self) -> Result<RawFuzzyHash, GeneratorError>

Retrieves the resulting fuzzy hash.

The type of resulting fuzzy hash (RawFuzzyHash) is in a raw form (not normalized). This is the default behavior of ssdeep.

This is equivalent to calling libfuzzy’s fuzzy_digest function with default flags.

pub fn finalize_without_truncation( &self, ) -> Result<LongRawFuzzyHash, GeneratorError>

Retrieves the resulting fuzzy hash, not truncating the second block hash.

Note that not doing the truncation is usually not what you want.

This is equivalent to calling libfuzzy’s fuzzy_digest function with the flag FUZZY_FLAG_NOTRUNC.

Trait Implementations§

impl AddAssign<&[u8]> for Generator

fn add_assign(&mut self, buffer: &[u8])

Updates the hash value by processing a slice of u8.

impl<const N: usize> AddAssign<&[u8; N]> for Generator

fn add_assign(&mut self, buffer: &[u8; N])

Updates the hash value by processing an array of u8.

impl AddAssign<u8> for Generator

fn add_assign(&mut self, byte: u8)

Updates the hash value by processing a byte.

impl Clone for Generator

fn clone(&self) -> Generator

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Generator

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for Generator

fn default() -> Self

Returns the “default value” for a type.
