imgfprint

High-performance image fingerprinting library for Rust with multi-algorithm perceptual hashing, exact matching, and semantic embeddings.

Overview

imgfprint provides multiple complementary approaches to image identification and similarity detection:

Method	Use Case	Speed	Precision
BLAKE3	Exact deduplication	~0.2ms	100% exact
AHash	Fast similarity	~0.3ms	Average-based, simplest
PHash	Perceptual similarity	~1.5ms	DCT-based, resilient to compression
DHash	Structural similarity	~0.5ms	Gradient-based, good for crops
Multi	Combined accuracy	~1.8ms	Weighted AHash+PHash+DHash (10/60/30)
Semantic	Content understanding	Local or API	Captures visual meaning

Perfect for:

Duplicate image detection
Similarity search
Content moderation
Image deduplication at scale
Content-based image retrieval (CBIR)

Features

Multi-Algorithm Support - AHash (average) + PHash (DCT-based) + DHash (gradient-based) with weighted combination
Deterministic Output - Same input always produces same fingerprint
BLAKE3 Exact Hash - Byte-identical detection (6-8x faster than SHA256)
Block-Level Hashing - 4x4 grid for crop resistance
EXIF Orientation - Automatically corrects JPEG orientation from camera metadata
Semantic Embeddings - CLIP-style vector representations via external providers or local ONNX models
Embedding Model ID - Tag embeddings with model identifiers to prevent comparing incompatible models
SIMD Acceleration - AVX2/NEON optimized resizing
Parallel Processing - Multi-core batch operations
Zero-Copy APIs - Minimal allocations in hot paths
Serde Support - JSON/binary serialization
Security Hardened - OOM protection (8192px max), no panics on malformed input
Multiple Formats - PNG, JPEG, GIF, WebP, BMP
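The model-ID tagging idea above can be illustrated with a small, std-only sketch. The `TaggedEmbedding` type and `cosine_similarity` function here are hypothetical names chosen for this example, not the crate's API; the point is only that a comparison should refuse to run when two vectors come from different models or have different lengths.

```rust
/// Hypothetical embedding type for illustration; not the crate's own struct.
#[derive(Debug, Clone, PartialEq)]
pub struct TaggedEmbedding {
    /// Identifier of the model that produced the vector (assumed to be a free-form string).
    pub model_id: String,
    /// The embedding itself.
    pub vector: Vec<f32>,
}

/// Cosine similarity in [-1, 1], or `None` when the embeddings are not comparable:
/// different model IDs, different lengths, empty vectors, or a zero-length vector norm.
pub fn cosine_similarity(a: &TaggedEmbedding, b: &TaggedEmbedding) -> Option<f32> {
    if a.model_id != b.model_id || a.vector.len() != b.vector.len() || a.vector.is_empty() {
        return None;
    }
    let dot: f32 = a.vector.iter().zip(b.vector.iter()).map(|(x, y)| x * y).sum();
    let norm_a = a.vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Returning `None` instead of a score makes an incompatible comparison impossible to mistake for "dissimilar".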

Installation

[dependencies]
imgfprint = "0.4.3"

Feature Flags

Feature	Default	Description
`serde`	Yes	Serialization support (JSON, binary)
`parallel`	Yes	Parallel batch processing with rayon
`local-embedding`	No	Local ONNX model inference for semantic embeddings
`tracing`	No	Observability hooks for production debugging

Minimal build (disables the default `serde` and `parallel` features):

[dependencies]
imgfprint = { version = "0.4.3", default-features = false }

With local embeddings (requires ONNX model):

[dependencies]
imgfprint = { version = "0.4.3", features = ["local-embedding"] }

Quick Start

use imgfprint::ImageFingerprinter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let img1 = std::fs::read("photo1.jpg")?;
    let img2 = std::fs::read("photo2.jpg")?;
    
    // Compute all hashes (AHash + PHash + DHash) for best accuracy
    let fp1 = ImageFingerprinter::fingerprint(&img1)?;
    let fp2 = ImageFingerprinter::fingerprint(&img2)?;
    
    let sim = fp1.compare(&fp2);
    
    println!("Similarity: {:.2}", sim.score);
    println!("Exact match: {}", sim.exact_match);
    
    if sim.score > 0.8 {
        println!("Images are perceptually similar");
    }
    
    Ok(())
}

Single Algorithm Mode

use imgfprint::{ImageFingerprinter, HashAlgorithm};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let img = std::fs::read("photo.jpg")?;
    
    // Use specific algorithm for better speed
    let fp = ImageFingerprinter::fingerprint_with(&img, HashAlgorithm::DHash)?;
    
    Ok(())
}

Documentation

For complete API reference and usage examples, see USAGE.md.

Architecture

Fingerprint Types

MultiHashFingerprint (Default)

Contains AHash, PHash, and DHash for enhanced accuracy:

MultiHashFingerprint
├── exact:       [u8; 32]     // BLAKE3 of original bytes
├── ahash:       ImageFingerprint  // AHash results
│   ├── global_hash: u64
│   └── block_hashes: [u64; 16]
├── phash:       ImageFingerprint  // PHash results
│   ├── global_hash: u64
│   └── block_hashes: [u64; 16]
└── dhash:       ImageFingerprint  // DHash results
    ├── global_hash: u64
    └── block_hashes: [u64; 16]
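As a rough aid for sizing storage, the tree above can be mirrored with plain structs. This is a hypothetical layout written for this example (`AlgoHashes` and `MultiHashLayout` are made-up names), and the crate's real types may carry extra fields or a different representation.

```rust
/// Hypothetical mirror of one per-algorithm entry in the tree above.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AlgoHashes {
    /// 64-bit hash of the whole image.
    pub global_hash: u64,
    /// One 64-bit hash per cell of the 4x4 grid.
    pub block_hashes: [u64; 16],
}

/// Hypothetical mirror of the multi-hash tree: one exact digest plus three algorithm entries.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MultiHashLayout {
    /// 32-byte BLAKE3 digest of the original bytes.
    pub exact: [u8; 32],
    pub ahash: AlgoHashes,
    pub phash: AlgoHashes,
    pub dhash: AlgoHashes,
}

/// Size in bytes of the sketch layout: 32 + 3 * (8 + 16 * 8).
pub const fn layout_size() -> usize {
    std::mem::size_of::<MultiHashLayout>()
}
```

Under these assumptions one fingerprint takes 440 bytes, so a million fingerprints fit in roughly 440 MB before any index overhead.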

Single Algorithm Mode

ImageFingerprint
├── exact:       [u8; 32]     // BLAKE3 of original bytes
├── global_hash: u64         // Algorithm-specific hash (center 32x32)
└── block_hashes: [u64; 16]   // Block-level hashes (4x4 grid, 64x64 each)

Algorithm Pipeline

Decode - Parse any supported format (PNG, JPEG, GIF, WebP, BMP) into RGB with EXIF orientation correction for JPEG
Normalize - Resize to 256x256 using SIMD-accelerated Lanczos3 filter
Convert - RGB to Grayscale (luminance)
Parallel Hash Computation - All three algorithms computed simultaneously:
- AHash: Average-based, resample to 8x8, compare to mean
- PHash: DCT-based, center 32x32 + 4x4 blocks
- DHash: Gradient-based, resample to 9x8
Exact Hash - BLAKE3 of original bytes
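To make the AHash and DHash steps concrete, here is a minimal std-only sketch that works on grids already resampled to 8x8 and 9x8 grayscale. The function names and the bit ordering are assumptions made for this example; the crate's own implementation may order bits or handle ties differently. PHash is left out because it needs a DCT.

```rust
/// Average hash over an 8x8 grayscale grid (row-major).
/// Bit `i` is set when pixel `i` is strictly brighter than the integer mean.
pub fn ahash_8x8(pixels: &[u8; 64]) -> u64 {
    let sum: u32 = pixels.iter().map(|&p| p as u32).sum();
    let mean = sum / 64;
    let mut hash = 0u64;
    for (i, &p) in pixels.iter().enumerate() {
        if p as u32 > mean {
            hash |= 1u64 << i;
        }
    }
    hash
}

/// Difference hash over a grid 9 pixels wide and 8 pixels tall (row-major).
/// Each of the 8 horizontal neighbor pairs per row contributes one bit,
/// set when the left pixel is brighter than the right one.
pub fn dhash_9x8(pixels: &[u8; 72]) -> u64 {
    let mut hash = 0u64;
    for row in 0..8 {
        for col in 0..8 {
            let left = pixels[row * 9 + col];
            let right = pixels[row * 9 + col + 1];
            if left > right {
                hash |= 1u64 << (row * 8 + col);
            }
        }
    }
    hash
}

/// Number of differing bits between two 64-bit hashes.
pub fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}
```

Two images are then compared by the Hamming distance between their hashes: 0 means identical bits, 64 means every bit differs.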

Multi-Algorithm Comparison

When using MultiHashFingerprint, the similarity score is a weighted combination of the three algorithms, each of which also factors in block-level similarity. The defaults below ship as MultiHashConfig::default() and are the ones compare() applies:

10% AHash similarity (average hash, fastest, simplest)
60% PHash similarity (DCT-based, robust to compression)
30% DHash similarity (gradient-based, good for structural changes)

Within each algorithm, similarity is computed as:

40% global hash similarity (overall structure)
60% block-level similarity (crop resistance via 4x4 grid)

The weights above, plus block_distance_threshold (default 32 of 64), are tunable via MultiHashConfig without forking the crate.

This provides superior crop resistance compared to global-only comparison.
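The two-level weighting described above can be sketched with the default percentages hardcoded. Everything here is illustrative: `PerAlgorithm`, `hash_similarity`, `algo_similarity`, and `combine_algorithms` are hypothetical names, similarity is assumed to be one minus the normalized Hamming distance, and block_distance_threshold is ignored because its exact semantics are not described here.

```rust
/// Hypothetical container for one algorithm's hashes.
pub struct PerAlgorithm {
    pub global_hash: u64,
    pub block_hashes: [u64; 16],
}

/// Similarity of two 64-bit hashes as 1 - (Hamming distance / 64).
pub fn hash_similarity(a: u64, b: u64) -> f32 {
    1.0 - (a ^ b).count_ones() as f32 / 64.0
}

/// Within one algorithm: 40% global similarity, 60% mean block similarity.
pub fn algo_similarity(a: &PerAlgorithm, b: &PerAlgorithm) -> f32 {
    let global = hash_similarity(a.global_hash, b.global_hash);
    let block_sum: f32 = a
        .block_hashes
        .iter()
        .zip(b.block_hashes.iter())
        .map(|(x, y)| hash_similarity(*x, *y))
        .sum();
    0.4 * global + 0.6 * (block_sum / 16.0)
}

/// Across algorithms: 10% AHash, 60% PHash, 30% DHash.
pub fn combine_algorithms(ahash: f32, phash: f32, dhash: f32) -> f32 {
    0.1 * ahash + 0.6 * phash + 0.3 * dhash
}
```

For example, if the global hashes disagree on every bit but all sixteen blocks match, the per-algorithm score is still 0.6, which is how the block term keeps a score up when only part of the image changed.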

Tuning similarity

Pass a MultiHashConfig to compare_with_config to shift the trade-off. This is useful when an integrator (UCFP, downstream pipelines) wants per-deployment scoring without forking:

use imgfprint::{ImageFingerprinter, MultiHashConfig};

let bytes_a = std::fs::read("a.jpg")?;
let bytes_b = std::fs::read("b.jpg")?;

let fp_a = ImageFingerprinter::fingerprint(&bytes_a)?;
let fp_b = ImageFingerprinter::fingerprint(&bytes_b)?;

// PHash-only scoring — useful when AHash/DHash aren't trusted on this corpus.
// Setting an algorithm weight to 0.0 removes it from the score.
let cfg = MultiHashConfig {
    ahash_weight: 0.0,
    phash_weight: 1.0,
    dhash_weight: 0.0,
    ..MultiHashConfig::default()
};
let sim = fp_a.compare_with_config(&fp_b, &cfg);
# Ok::<_, Box<dyn std::error::Error>>(())

Tuning decode-time guards

PreprocessConfig exposes the input-size and dimension caps that were previously hardcoded. The same config gates both the pre-read file-size check and the decode-time guards, so tightened limits aren't silently bypassed via the path API:

use imgfprint::{ImageFingerprinter, PreprocessConfig};

// Tight ingest path: 1 MiB max, 2048 max edge, default 32 min edge.
let cfg = PreprocessConfig {
    max_input_bytes: 1 * 1024 * 1024,
    max_dimension: 2048,
    ..PreprocessConfig::default()
};
let fp = ImageFingerprinter::fingerprint_path_with_preprocess("untrusted.jpg", &cfg)?;
# Ok::<_, Box<dyn std::error::Error>>(())

Performance

Benchmarked on an 11th-gen Intel i5 (16 GB RAM, 4 cores / 8 threads):

Operation	Time	Throughput
`fingerprint()`	1.35ms	~740 images/sec
`compare()`	385ns	2.6M comparisons/sec
`batch()` (10 images)	6.16ms	1,620 images/sec (parallel)
`semantic_similarity()`	~500ns	2M comparisons/sec

Run benchmarks:

cargo bench

Memory Safety

Maximum image dimension: 8192x8192 (OOM protection)
Dimension check before full decode
Pre-allocated buffers in context API
Zero-copy where possible
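A dimension check before full decode can be sketched for PNG, where width and height sit at fixed offsets in the IHDR chunk right after the 8-byte signature. This is a std-only illustration with hypothetical function names, not the crate's code, and other formats need their own header parsing.

```rust
/// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Reads (width, height) from the IHDR chunk without decoding any pixel data.
/// Returns `None` when the input is too short or is not a PNG header.
pub fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    // Layout: signature (8) + chunk length (4) + "IHDR" (4) + width (4) + height (4).
    if bytes.len() < 24 || !bytes.starts_with(&PNG_SIGNATURE) || &bytes[12..16] != b"IHDR" {
        return None;
    }
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}

/// True only when the header parses and both edges are non-zero and within `max_dimension`.
pub fn within_limit(bytes: &[u8], max_dimension: u32) -> bool {
    matches!(
        png_dimensions(bytes),
        Some((w, h)) if w > 0 && h > 0 && w <= max_dimension && h <= max_dimension
    )
}
```

Only the first 24 bytes are inspected, so an oversized image can be rejected before any pixel buffer is allocated.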

Security

OOM Protection: Maximum image size 8192x8192 pixels (configurable)
Deterministic Output: Same input always produces same output
No Panics: All error conditions return Result
Fast Hashing: BLAKE3 computation (6-8x faster than SHA256)
Input Validation: Comprehensive format and size validation

Comparison with Alternatives

Feature	imgfprint-rs	imagehash	img_hash
BLAKE3 exact	Yes	No	No
AHash	Yes	Yes	Yes
PHash	Yes	Yes	Yes
DHash	Yes	Yes	Yes
Multi-algorithm	Yes	No	No
Block hashes	Yes	No	No
Semantic embeddings	Yes	No	No
Local ONNX inference	Yes	No	No
Parallel batch	Yes	No	No
SIMD acceleration	Yes	No	No
Context API	Yes	No	No

Examples

See the examples/ directory for complete working examples:

batch_process.rs - Process millions of images efficiently
compare_images.rs - Compare two images and show similarity
find_duplicates.rs - Find duplicate images in a directory
serialize.rs - Serialize/deserialize fingerprints to JSON and binary
zero_copy_persistence.rs - Persist MultiHashFingerprint slices as bytes with bytemuck
similarity_search.rs - Perceptual similarity search in a directory
semantic_search.rs - Content-based image search with CLIP embeddings (requires local-embedding feature)

Run an example:

# Compare two images
cargo run --example compare_images -- images/photo1.jpg images/photo2.jpg

# Semantic search with local CLIP model (requires local-embedding feature)
cargo run --example semantic_search --features local-embedding -- model.onnx query.jpg ./images 0.85

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Run tests: cargo test
Run clippy: cargo clippy --all-targets -- -D warnings
Run benchmarks: cargo bench
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Setup

# Clone
git clone https://github.com/themankindproject/imgfprint-rs
cd imgfprint-rs

# Run tests
cargo test

# Run with all features
cargo test --all-features

# Generate documentation
cargo doc --no-deps --open

License

MIT License - See LICENSE file for details.
