§fast-tlsh: Fast TLSH-compatible Fuzzy Hashing Library in pure Rust
TLSH stands for Trend Micro Locality Sensitive Hash. TLSH can be used to detect similar files.
You can generate / parse / compare (TLSH-compatible) LSHs with this crate.
Thanks to SIMD-friendly optimizations and its memory layout, comparing two LSHs is significantly faster than in the original implementation. Even if you turn off real SIMD (to forbid any unsafe code), it employs pseudo-SIMD operations and additional tables to speed up the comparison.
Also, it speeds up generating fuzzy hashes (~50% faster) using the “double update” table optimization.
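To give a rough idea of what "pseudo-SIMD" comparison can look like, here is a minimal sketch that is not taken from this crate. It assumes the commonly described TLSH body rule: each 2-bit value pair contributes its absolute difference, except that a difference of 3 counts as 6. The function names `body_distance_naive` and `body_distance_swar` are hypothetical. The second function processes 32 two-bit lanes at once using only ordinary 64-bit integer operations, with no SIMD intrinsics and no unsafe code.

```rust
/// Mask selecting the low bit of every 2-bit lane in a 64-bit word.
const LOW_BITS: u64 = 0x5555_5555_5555_5555;

/// Reference version: walk the 32 two-bit lanes one at a time.
/// (Assumed rule: |x - y| per lane, with a difference of 3 counted as 6.)
fn body_distance_naive(a: u64, b: u64) -> u32 {
    let mut total = 0;
    for lane in 0..32 {
        let x = ((a >> (2 * lane)) & 0b11) as i32;
        let y = ((b >> (2 * lane)) & 0b11) as i32;
        let d = (x - y).unsigned_abs();
        total += if d == 3 { 6 } else { d };
    }
    total
}

/// Pseudo-SIMD (SWAR) version: classify all 32 lanes with word-wide bit
/// operations, then count each class with a population count.
fn body_distance_swar(a: u64, b: u64) -> u32 {
    let x = a ^ b;
    let x_lo = x & LOW_BITS; // low bit of each lane's XOR
    let x_hi = (x >> 1) & LOW_BITS; // high bit of each lane's XOR
    let both = x_lo & x_hi; // lanes whose XOR is 0b11
    // Lanes where `a` is 0b00 or 0b11 (its two bits are equal).
    let a_same = !(a ^ (a >> 1)) & LOW_BITS;
    let far = both & a_same; // pair {0, 3}: difference 3, counted as 6
    let near = both & !a_same; // pair {1, 2}: difference 1
    let only_lo = x_lo & !x_hi; // XOR is 0b01: difference 1
    let only_hi = x_hi & !x_lo; // XOR is 0b10: difference 2
    only_lo.count_ones() + near.count_ones() + 2 * only_hi.count_ones() + 6 * far.count_ones()
}

fn main() {
    let a = 0x0123_4567_89AB_CDEF_u64;
    let b = 0xFEDC_BA98_7654_3210_u64;
    // Both versions must agree on every input.
    assert_eq!(body_distance_naive(a, b), body_distance_swar(a, b));
    println!("distance = {}", body_distance_swar(a, b));
}
```

The sketch trades a data-dependent loop for a fixed sequence of bitwise operations and four population counts, which is the general idea behind SIMD-within-a-register techniques. The crate's actual table layout and code paths may differ.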
§Crate Features
- alloc and std (default)
  This crate supports no_std (by disabling both of them); alloc and std are built on top of the minimal no_std implementation. These features enable implementations that depend on alloc and std, respectively.
- easy-functions (default)
  It provides easy-to-use high-level functions.
- simd (default; fast but unsafe)
  This crate is unsafe by default (due to the use of SIMD instructions), but you can still benefit from other optimizations if you disable this feature.
- detect-features (default; marginally slow but convenient)
  This feature depends on std.
  If the simd feature is enabled and switching between SIMD and non-SIMD implementations is feasible, it turns on runtime checks to switch the implementation dynamically.
- opt-default (default; recommended if no default features are enabled)
  This crate implements a number of optimizations that may be tuned separately. If you turn off all default features, all such optimizations are turned off. You may enable this feature for the recommended set of optimizations, except real SIMD-based ones (which are generally unsafe).
- opt-embedded-default (turn off the default features if you use this)
  By default, this crate is optimized for cache-rich environments. For embedded devices with a smaller cache memory, you may use this feature to turn off generating large tables. It makes the code slightly bigger but currently reduces the static memory footprint by 128.25KiB.
- strict-parser
  It enables the strict parser, which enforces additional validity checks. This is disabled by default (because it is not implemented in the official implementation), but enabling it makes the parser more robust.
- unsafe (marginally fast but unsafe)
  Other unsafe features not related to SIMD are gated behind the default-disabled unsafe feature. Note that enabling this feature will not (normally) speed up the program significantly.
- unstable
  This feature enables some features specific to Nightly Rust, except portable SIMD. Note that this feature heavily depends on the version of rustc and should not be considered stable (don’t expect SemVer-compatible semantics).
- serde
  It enables integration with Serde to serialize / deserialize fuzzy hashes.
- tests-slow
  It enables “slow” tests (including fuzzing tests).
For all features (including minor tuning-related ones), see the documentation.
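As a hedged illustration of the feature notes above, a dependency entry that drops the default features (and with them the SIMD-related unsafe code) while keeping the recommended non-SIMD optimizations might look like the fragment below. The version requirement "*" is only a placeholder, not a recommendation, and the feature selection is just one possible combination.

```toml
[dependencies]
# Placeholder version requirement; pin it to the release you actually use.
# default-features = false also disables alloc, std and easy-functions,
# so add those back explicitly if you need them.
fast-tlsh = { version = "*", default-features = false, features = ["opt-default", "strict-parser"] }
```

For a cache-constrained target, opt-embedded-default could be listed in place of opt-default, again with the default features turned off.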
§Documentation / Guides
Modules§
- _docs
- The documentation.
- buckets
- The TLSH buckets and their mappings.
- compare
- Comparison-related metrics and the configuration type.
- errors
- Types representing specific types of errors.
- generate
- The fuzzy hash generator.
- hash
- The fuzzy hash and its parts (unless a part has its own module).
- hashes
- Fuzzy hashes with specific parameters.
- length
- Data length encodings and other handlings.
- pearson (deprecated; requires experiment-pearson)
- Pearson hashing and the TLSH’s B (bucket) mapping.
- prelude
- The recommended set (prelude) to import.
Traits§
- FuzzyHashType
- The trait to represent a fuzzy hash (TLSH).
- GeneratorType
- The trait to represent a fuzzy hash generator.
Functions§
- compare (requires easy-functions)
- Compares two fuzzy hashes.
- compare_with (requires easy-functions)
- Compares two fuzzy hashes with a specified intermediate fuzzy hash type.
- hash_buf (requires easy-functions)
- Generates a fuzzy hash from a given buffer.
- hash_buf_for (requires easy-functions)
- Generates a fuzzy hash from a given buffer (with a specified output type).
- hash_file (requires std and easy-functions)
- Generates a fuzzy hash from a given file.
- hash_file_for (requires std and easy-functions)
- Generates a fuzzy hash from a given file (with a specified output type).
- hash_stream (requires std and easy-functions)
- Generates a fuzzy hash from a given reader stream.
- hash_stream_for (requires std and easy-functions)
- Generates a fuzzy hash from a given reader stream (with a specified output type).
Type Aliases§
- Tlsh
- The default fuzzy hash type.
- TlshGenerator
- The fuzzy hash generator with the default parameter.
- TlshGeneratorFor
- The fuzzy hash generator with specified parameter (or output fuzzy hash type).