Struct img_hash::HasherConfig

pub struct HasherConfig<B = Box<[u8]>> { /* fields omitted */ }

Start here. Configuration builder for Hasher.

Playing with the various options on this struct allows you to tune the performance of image hashing to your needs.

Sane, reasonably fast defaults are provided by the ::new() constructor. If you just want to start hashing images and don’t care about the details, it’s as simple as:

use img_hash::HasherConfig;

let hasher = HasherConfig::new().to_hasher();
// let hash = hasher.hash_image(&image);

Configuration Options

The hash API is highly configurable to tune both performance characteristics and hash resilience.

Hash Size

Setter: .hash_size()

Dimensions of the final hash, as width x height, in bits. A hash size of 8, 8 produces an 8 x 8 bit (8 byte) hash. Larger hash sizes take more time and memory to compute, but aren’t necessarily better for comparing images. The best hash size depends on both the hash algorithm and the input dataset. If your images mostly have a wide (landscape) aspect ratio, a larger width and a smaller height may be preferable. Optimal values can only be found empirically.
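To make the size arithmetic concrete, here is a minimal sketch of how many bytes a given hash size occupies. The helper name `hash_len_bytes` is hypothetical and not part of this crate, and rounding a partial byte up is an assumption; only the 8 x 8 = 8 byte case comes from the text above.

```rust
/// Hypothetical helper: bytes needed to store a `width` x `height` bit hash.
/// Assumes bits are packed eight to a byte and a partial byte is rounded up.
fn hash_len_bytes(width: u32, height: u32) -> usize {
    let bits = width as usize * height as usize;
    (bits + 7) / 8
}
```

Under these assumptions, `hash_len_bytes(8, 8)` is 8 and `hash_len_bytes(16, 8)` is 16, which is the figure to budget for when choosing a fixed-size container.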

(As the author experiments, suggested values will be added here for various algorithms.)

Hash Algorithm

Setter: .hash_alg()

Definition: HashAlg

Multiple methods of calculating image hashes are provided in this crate under the HashAlg enum. Each algorithm is different but they all produce the same size hashes as governed by hash_size.
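For intuition about what such an algorithm does, the sketch below shows a simplified mean-style hash over an image that has already been converted to grayscale and scaled to the hash size. It is an illustration only, not this crate's implementation; the function names and the bit ordering are assumptions.

```rust
/// Simplified mean hash: one bit per pixel, set when the pixel is brighter
/// than the mean of all pixels. `pixels` is a row-major grayscale image
/// already scaled to `width * height` pixels, so the result has that many bits.
fn mean_hash_bits(pixels: &[u8]) -> Vec<bool> {
    if pixels.is_empty() {
        return Vec::new();
    }
    let sum: u64 = pixels.iter().map(|&p| u64::from(p)).sum();
    let mean = sum as f64 / pixels.len() as f64;
    pixels.iter().map(|&p| f64::from(p) > mean).collect()
}

/// Pack bits into bytes, least significant bit first.
/// (The bit order is arbitrary for this sketch.)
fn pack_bits(bits: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (bits.len() + 7) / 8];
    for (i, &bit) in bits.iter().enumerate() {
        if bit {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}
```

Any per-pixel rule that emits one bit per cell of the scaled image yields the same number of bits, which is why the hash length depends on the hash size and not on the algorithm.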

Hash Bytes Container / B Type Param

Use with_bytes_type::<B>() instead of new() to customize.

This hash API allows you to specify the bytes container type for generated hashes. The default allows for any arbitrary hash size (see above) but requires heap-allocation. Instead, you can select an array type which allows hashes to be allocated inline, but requires consideration of the possible sizes of hash you want to generate so you don’t waste memory.

Another advantage of using a constant-sized hash type is that the compiler may be able to produce better-optimized code for generating and comparing hashes.


use img_hash::HasherConfig;

// Use default container type, good for any hash size
let config = HasherConfig::new();

// Inline hash container that exactly fits the default hash size
let config = HasherConfig::with_bytes_type::<[u8; 8]>();
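Comparing two hashes usually means counting differing bits. The sketch below is a standalone Hamming-distance helper over byte slices, so it accepts either an inline array or a boxed slice. The function name is hypothetical; it illustrates the operation and is not this crate's comparison code.

```rust
/// Count the bits that differ between two hashes stored as bytes.
/// If the slices differ in length, only the common prefix is compared.
fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}
```

With a fixed-size `[u8; 8]` the length is known at compile time, which is the kind of information that may let the compiler unroll or vectorize a loop like this one.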

Implementations

impl HasherConfig<Box<[u8]>>

pub fn new() -> Self

Construct a new hasher config with sane, reasonably fast defaults.

The default hash container type, Box<[u8]>, is supplied as a default type parameter and is guaranteed to fit any hash size.

pub fn with_bytes_type<B_: HashBytes>() -> HasherConfig<B_>

Construct a new config with the selected HashBytes impl.

You may opt for an array type which allows inline allocation of hash data.

Note

The default hash size requires 64 bits / 8 bytes of storage. You can change this with .hash_size().

impl<B: HashBytes> HasherConfig<B>

pub fn hash_size(self, width: u32, height: u32) -> Self

Set a new hash width and height; these can be the same.

The number of bits in the resulting hash will be width * height. If you are using a fixed-size HashBytes type then you must ensure it can hold at least this many bits. You can check this with HashBytes::max_bits().
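The capacity check described above can be sketched without the crate. The `Capacity` trait below is a hypothetical stand-in that only mirrors the idea of a `max_bits()` query; it is not the real `HashBytes` trait, and treating the boxed container as unbounded is an assumption of this sketch.

```rust
/// Hypothetical stand-in for a hash container's capacity query.
trait Capacity {
    /// Maximum number of hash bits the container can hold.
    fn max_bits() -> usize;
}

/// A fixed-size array holds exactly N * 8 bits.
impl<const N: usize> Capacity for [u8; N] {
    fn max_bits() -> usize {
        N * 8
    }
}

/// A heap-allocated slice can grow to fit any hash size.
impl Capacity for Box<[u8]> {
    fn max_bits() -> usize {
        usize::MAX
    }
}

/// True when a `width` x `height` bit hash fits in container `C`.
fn fits<C: Capacity>(width: u32, height: u32) -> bool {
    (width as usize).saturating_mul(height as usize) <= C::max_bits()
}
```

Here `fits::<[u8; 8]>(8, 8)` holds, while `fits::<[u8; 8]>(16, 8)` does not, since 128 bits exceed the 64 available.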

Rounding Behavior

Certain hash algorithms need to round this value to function properly:

  • DoubleGradient rounds to the next multiple of 2;
  • Blockhash rounds to the next multiple of 4.

If the chosen values already satisfy these requirements then nothing is changed.
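A minimal sketch of rounding a dimension up to the next multiple of some step is shown below. The helper is hypothetical and `step` is a parameter of this illustration, not a value taken from the crate's code; a value that is already a multiple is returned unchanged, matching the behavior described above.

```rust
/// Hypothetical helper: round `value` up to the next multiple of `step`.
/// Values that are already a multiple are returned unchanged.
fn round_up_to_multiple(value: u32, step: u32) -> u32 {
    assert!(step > 0, "step must be non-zero");
    match value % step {
        0 => value,
        rem => value + (step - rem),
    }
}
```

Because rounding can increase width * height, a fixed-size container should be sized against the rounded dimensions, not the requested ones.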

The hash granularity increases with width * height, although there are diminishing returns for higher values. Start small. A good starting value to try is 8, 8.

When using DCT preprocessing, making width and height the same value improves hashing performance, as only one set of coefficients needs to be used.

pub fn resize_filter(self, resize_filter: FilterType) -> Self

Set the filter used to resize images during hashing.

When picking a filter, note that images are almost always reduced in size. This setting has no effect with the Blockhash algorithm, as it does not resize the image.

pub fn hash_alg(self, hash_alg: HashAlg) -> Self

Set the algorithm used to generate hashes.

Each algorithm has different performance characteristics.

pub fn preproc_dct(self) -> Self

Enable preprocessing with the Discrete Cosine Transform (DCT).

Does nothing when used with the Blockhash.io algorithm, which does not scale the image. (RFC: it would be possible to shoehorn a DCT into the Blockhash algorithm, but it’s not clear what benefits, if any, that would provide.)

After conversion to grayscale, the image is scaled down to width * 2 x height * 2 and then the Discrete Cosine Transform is performed on the luminance values. The DCT essentially transforms the 2D image from the spatial domain with luminance values to a 2D frequency domain where the values are amplitudes of cosine waves. The resulting 2D matrix is then cropped to its low-frequency width * height corner and the configured hash algorithm is performed on that.
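The transform-then-crop step can be sketched with a naive, unnormalized 2D DCT-II. This is an illustration under stated assumptions, not the crate's implementation: the function names are hypothetical, the input is assumed to be row-major luminance values, and the low-frequency corner is assumed to sit at the top-left of the coefficient matrix.

```rust
use std::f64::consts::PI;

/// Naive, unnormalized 2D DCT-II of a row-major `w` x `h` luminance matrix.
/// Quadratic in the pixel count; fine for a sketch, too slow for real use.
fn dct_2d(input: &[f64], w: usize, h: usize) -> Vec<f64> {
    assert_eq!(input.len(), w * h);
    let mut out = vec![0.0; w * h];
    for v in 0..h {
        for u in 0..w {
            let mut sum = 0.0;
            for y in 0..h {
                for x in 0..w {
                    let cx = (PI * (x as f64 + 0.5) * u as f64 / w as f64).cos();
                    let cy = (PI * (y as f64 + 0.5) * v as f64 / h as f64).cos();
                    sum += input[y * w + x] * cx * cy;
                }
            }
            out[v * w + u] = sum;
        }
    }
    out
}

/// Keep the top-left `cw` x `ch` block of a `w`-wide coefficient matrix,
/// i.e. the lowest horizontal and vertical frequencies.
fn crop_low_freq(coeffs: &[f64], w: usize, cw: usize, ch: usize) -> Vec<f64> {
    let mut out = Vec::with_capacity(cw * ch);
    for row in 0..ch {
        out.extend_from_slice(&coeffs[row * w..row * w + cw]);
    }
    out
}
```

For an 8 x 8 hash, this corresponds to transforming a 16 x 16 image and keeping an 8 x 8 block of coefficients. A perfectly flat image puts all of its energy in the first (zero-frequency) coefficient and leaves the others at approximately zero.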

In layman’s terms, this essentially converts the image into a mathematical representation of the “broad strokes” of the data, which allows the subsequent hashing step to be more robust against changes that may otherwise produce different hashes, such as significant edits to portions of the image.

However, on most machines this usually adds 50-100% to the average hash time.

This is a very similar process to JPEG compression, although the implementation is too different for this to be optimized specifically for JPEG encoded images.

Further Reading:

  • http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-It.html Krawetz describes a “pHash” algorithm which is equivalent to Mean + DCT preprocessing here. However there is nothing to say that DCT preprocessing cannot compose with other hash algorithms; Gradient + DCT might well perform better in some aspects.
  • https://en.wikipedia.org/wiki/Discrete_cosine_transform

pub fn preproc_diff_gauss(self) -> Self

Enable preprocessing with the Difference of Gaussians algorithm with default sigma values.

Recommended only for use with the Blockhash.io algorithm as it significantly reduces entropy in the scaled down image for other algorithms.

See Self::preproc_diff_gauss_sigmas() for more info.

pub fn preproc_diff_gauss_sigmas(self, sigma_a: f32, sigma_b: f32) -> Self

Enable preprocessing with the Difference of Gaussians algorithm with the given sigma values.

Recommended only for use with the Blockhash.io algorithm as it significantly reduces entropy in the scaled down image for other algorithms.

After the image is converted to grayscale, it is blurred with a Gaussian blur using two different sigmas, and then the images are subtracted from each other. This reduces the image to just sharp transitions in luminance, i.e. edges. Varying the sigma values changes how sharp the edges are [citation needed].
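The blur-twice-and-subtract idea is easiest to see in one dimension. The sketch below applies it to a 1D signal; it is not the crate's 2D implementation, and the function names, the 3-sigma kernel radius, and the clamped borders are all assumptions of this illustration.

```rust
/// Normalized 1D Gaussian kernel with a radius of 3 sigma (an assumed cutoff).
fn gaussian_kernel(sigma: f32) -> Vec<f32> {
    let radius = (3.0 * sigma).ceil() as i32;
    let mut kernel: Vec<f32> = (-radius..=radius)
        .map(|i| {
            let x = i as f32;
            (-(x * x) / (2.0 * sigma * sigma)).exp()
        })
        .collect();
    let sum: f32 = kernel.iter().sum();
    for weight in &mut kernel {
        *weight /= sum;
    }
    kernel
}

/// Blur a 1D signal, repeating the edge samples past the borders.
fn blur(signal: &[f32], sigma: f32) -> Vec<f32> {
    let kernel = gaussian_kernel(sigma);
    let radius = (kernel.len() / 2) as isize;
    let last = signal.len() as isize - 1;
    (0..signal.len() as isize)
        .map(|i| {
            kernel
                .iter()
                .enumerate()
                .map(|(j, &weight)| {
                    let idx = (i + j as isize - radius).clamp(0, last);
                    weight * signal[idx as usize]
                })
                .sum::<f32>()
        })
        .collect()
}

/// Difference of Gaussians: blur with two sigmas and subtract the results.
fn diff_of_gaussians(signal: &[f32], sigma_a: f32, sigma_b: f32) -> Vec<f32> {
    let a = blur(signal, sigma_a);
    let b = blur(signal, sigma_b);
    a.iter().zip(&b).map(|(x, y)| x - y).collect()
}
```

On a step signal the output stays near zero in the flat regions and is clearly non-zero around the step, which is the edge-isolating behavior described above.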

Further reading:

  • https://en.wikipedia.org/wiki/Difference_of_Gaussians
  • http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm (Difference of Gaussians is an approximation of a Laplacian of Gaussian filter)

pub fn to_hasher(&self) -> Hasher<B>

Create a Hasher from this config which can be used to hash images.

Panics

Panics if the chosen hash size (width x height, rounded for the algorithm if necessary) is too large for the chosen container type (B::max_bits()).

Trait Implementations

impl<B> Debug for HasherConfig<B>

impl<'de, B> Deserialize<'de> for HasherConfig<B>

impl<B> Serialize for HasherConfig<B>

Auto Trait Implementations

impl<B> RefUnwindSafe for HasherConfig<B> where
    B: RefUnwindSafe

impl<B> Send for HasherConfig<B> where
    B: Send

impl<B> Sync for HasherConfig<B> where
    B: Sync

impl<B> Unpin for HasherConfig<B> where
    B: Unpin

impl<B> UnwindSafe for HasherConfig<B> where
    B: UnwindSafe

Blanket Implementations

impl<T> Any for T where
    T: 'static + ?Sized

impl<T> Borrow<T> for T where
    T: ?Sized

impl<T> BorrowMut<T> for T where
    T: ?Sized

impl<T> DeserializeOwned for T where
    T: for<'de> Deserialize<'de>,

impl<T> From<T> for T

impl<T, U> Into<U> for T where
    U: From<T>,

impl<T, U> TryFrom<U> for T where
    U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

impl<T, U> TryInto<U> for T where
    U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.