Crate tlsh

§fast-tlsh: Fast TLSH-compatible Fuzzy Hashing Library in pure Rust

TLSH stands for Trend Micro Locality Sensitive Hash. TLSH can be used to detect similar files.

You can generate / parse / compare (TLSH-compatible) LSHs with this crate.

Thanks to SIMD-friendly optimizations and its memory layout, comparing two LSHs is significantly faster than in the original implementation. Even if you turn off real SIMD (to forbid any unsafe code), the crate uses pseudo-SIMD operations and additional tables to speed up the comparison.

Also, it speeds up generating fuzzy hashes (~50% faster) using the “double update” table optimization.
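
To give a feel for what "pseudo-SIMD" can mean, here is a minimal sketch that is not taken from this crate's source. It assumes a body packed as 2-bit lanes inside a u64 and uses the per-lane rule of the reference TLSH body comparison (an absolute difference of 3 counts as 6). The function names body_distance_naive and body_distance_swar are hypothetical, and the crate's real implementation may work quite differently.

```rust
/// Mask selecting the low bit of every 2-bit lane.
const LOW_BITS: u64 = 0x5555_5555_5555_5555;

/// Reference version (hypothetical name): walk the 32 lanes one by one.
fn body_distance_naive(a: u64, b: u64) -> u32 {
    let mut total = 0;
    for lane in 0..32u32 {
        let x = ((a >> (2 * lane)) & 3) as i32;
        let y = ((b >> (2 * lane)) & 3) as i32;
        let d = (x - y).unsigned_abs();
        // A difference of 3 is penalized as 6; smaller differences count as-is.
        total += if d == 3 { 6 } else { d };
    }
    total
}

/// Thermometer-encode every lane: returns the masks (v >= 1, v >= 2, v >= 3),
/// each holding one flag bit per lane at the lane's low bit position.
fn thermometer(v: u64) -> (u64, u64, u64) {
    let lo = v & LOW_BITS;
    let hi = (v >> 1) & LOW_BITS;
    (lo | hi, hi, lo & hi)
}

/// Pseudo-SIMD version (hypothetical name): all 32 lanes at once, using only
/// safe integer operations and population counts.
fn body_distance_swar(a: u64, b: u64) -> u32 {
    let (a1, a2, a3) = thermometer(a);
    let (b1, b2, b3) = thermometer(b);
    let (d1, d2, d3) = (a1 ^ b1, a2 ^ b2, a3 ^ b3);
    // The number of differing thermometer bits in a lane equals |x - y|.
    let abs_sum = d1.count_ones() + d2.count_ones() + d3.count_ones();
    // All three bits differ only when the lane values are 0 and 3,
    // so add 3 more for each such lane to reach the penalty of 6.
    abs_sum + 3 * (d1 & d2 & d3).count_ones()
}
```

Both functions return the same value for any pair of inputs; for instance, body_distance_swar(0b00, 0b11) is 6. The second one needs no unsafe code, which is the kind of trade-off the paragraph above describes.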

§Crate Features

  • alloc and std (default)
    This crate supports no_std (disable both features). The alloc and std features build on the minimal no_std implementation and enable the parts that depend on alloc and std, respectively.
  • easy-functions (default)
    It provides easy-to-use high-level functions.
  • simd (default; fast but unsafe)
    Because SIMD instructions require unsafe code, this crate uses unsafe code by default. You still benefit from the other optimizations if you disable this feature.
  • detect-features (default; marginally slow but convenient)
    This feature depends on std.
    If the simd feature is enabled and switching between the SIMD and non-SIMD implementations is feasible, this feature turns on runtime checks that select the implementation dynamically.
  • opt-default (default; recommended if no default features are enabled)
    This crate implements a number of optimizations that can be tuned separately. If you turn off all default features, all of these optimizations are turned off. Enable this feature to get the recommended set of optimizations, excluding the real SIMD-based ones (which are generally unsafe).
  • opt-embedded-default (turn off the default features if you use this)
    By default, this crate is optimized for cache-rich environments. For embedded devices with a smaller cache, you may use this feature to avoid generating large tables. It makes the code slightly bigger but currently reduces the static memory footprint by 128.25 KiB.
  • strict-parser
    It enables the strict parser, which enforces additional validity checks. It is disabled by default (because the official implementation has no equivalent), but enabling it makes the parser more robust.
  • unsafe (marginally fast but unsafe)
    Other unsafe code not related to SIMD is gated behind the default-disabled unsafe feature. Note that enabling this feature will not (normally) speed up the program significantly.
  • unstable
    This feature enables some features specific to Nightly Rust, except portable SIMD. Note that it depends heavily on the rustc version and should not be considered stable (do not expect SemVer-compatible semantics).
  • serde
    It enables integration with Serde to serialize / deserialize fuzzy hashes.
  • tests-slow
    It enables “slow” tests (including fuzzing tests).

For all features (including minor tuning-related ones), see the documentation.

§Documentation / Guides

Modules§

_docs
The documentation.
buckets
The TLSH buckets and their mappings.
compare
Comparison-related metrics and the configuration type.
errors
Types representing specific types of errors.
generate
The fuzzy hash generator.
hash
The fuzzy hash and its parts (unless a part has its own module).
hashes
Fuzzy hashes with specific parameters.
length
Data length encodings and other handlings.
pearson (Deprecated; requires experiment-pearson)
Pearson hashing and the TLSH’s B (bucket) mapping.
prelude
The recommended set (prelude) to import.

Traits§

FuzzyHashType
The trait to represent a fuzzy hash (TLSH).
GeneratorType
The trait to represent a fuzzy hash generator.

Functions§

compare (easy-functions)
Compare two fuzzy hashes.
compare_with (easy-functions)
Compare two fuzzy hashes with specified intermediate fuzzy hash type.
hash_buf (easy-functions)
Generates a fuzzy hash from a given buffer.
hash_buf_for (easy-functions)
Generates a fuzzy hash from a given buffer (with specified output type).
hash_file (std and easy-functions)
Generates a fuzzy hash from a given file.
hash_file_for (std and easy-functions)
Generates a fuzzy hash from a given file (with specified output type).
hash_stream (std and easy-functions)
Generates a fuzzy hash from a given reader stream.
hash_stream_for (std and easy-functions)
Generates a fuzzy hash from a given reader stream (with specified output type).

Type Aliases§

Tlsh
The default fuzzy hash type.
TlshGenerator
The fuzzy hash generator with the default parameter.
TlshGeneratorFor
The fuzzy hash generator with specified parameter (or output fuzzy hash type).