imgfprint 0.3.3

High-performance, deterministic image fingerprinting library

imgfprint

High-performance image fingerprinting library for Rust with multi-algorithm perceptual hashing, exact matching, and semantic embeddings.

Overview

imgfprint provides multiple complementary approaches to image identification and similarity detection:

| Method   | Use Case              | Speed        | Precision                             |
|----------|-----------------------|--------------|---------------------------------------|
| BLAKE3   | Exact deduplication   | ~0.2ms       | 100% exact                            |
| AHash    | Fast similarity       | ~0.3ms       | Average-based, simplest               |
| PHash    | Perceptual similarity | ~1.5ms       | DCT-based, resilient to compression   |
| DHash    | Structural similarity | ~0.5ms       | Gradient-based, good for crops        |
| Multi    | Combined accuracy     | ~1.8ms       | Weighted AHash+PHash+DHash (10/60/30) |
| Semantic | Content understanding | Local or API | Captures visual meaning               |

Perfect for:

  • Duplicate image detection
  • Similarity search
  • Content moderation
  • Image deduplication at scale
  • Content-based image retrieval (CBIR)

Features

  • Multi-Algorithm Support - AHash (average) + PHash (DCT-based) + DHash (gradient-based) with weighted combination
  • Deterministic Output - Same input always produces same fingerprint
  • BLAKE3 Exact Hash - Byte-identical detection (6-8x faster than SHA256)
  • Block-Level Hashing - 4x4 grid for crop resistance
  • EXIF Orientation - Automatically corrects JPEG orientation from camera metadata
  • Semantic Embeddings - CLIP-style vector representations via external providers or local ONNX models
  • Embedding Model ID - Tag embeddings with model identifiers to prevent comparing incompatible models
  • SIMD Acceleration - AVX2/NEON optimized resizing
  • Parallel Processing - Multi-core batch operations
  • Zero-Copy APIs - Minimal allocations in hot paths
  • Serde Support - JSON/binary serialization
  • Security Hardened - OOM protection (8192px max), no panics on malformed input
  • Multiple Formats - PNG, JPEG, GIF, WebP, BMP
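
To illustrate why tagging embeddings with a model ID is useful, here is a minimal sketch of a guarded comparison. The `SketchEmbedding` type and `cosine_similarity` function are hypothetical names invented for this example, not the crate's API, and the use of cosine similarity as the metric is an assumption.

```rust
/// Hypothetical embedding type for illustration; not the crate's actual API.
#[derive(Debug, Clone)]
pub struct SketchEmbedding {
    /// Identifier of the model that produced the vector, if known.
    pub model_id: Option<String>,
    pub vector: Vec<f32>,
}

/// Cosine similarity in [-1, 1], or `None` when the embeddings are not
/// comparable (different model IDs, different lengths, or a zero vector).
pub fn cosine_similarity(a: &SketchEmbedding, b: &SketchEmbedding) -> Option<f32> {
    // Vectors from different models live in different spaces, so refuse early.
    if a.model_id != b.model_id || a.vector.len() != b.vector.len() {
        return None;
    }
    let dot: f32 = a.vector.iter().zip(&b.vector).map(|(x, y)| x * y).sum();
    let norm_a = a.vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Returning `None` instead of a score makes a model mismatch visible to the caller, rather than producing a number that looks valid but means nothing.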

Installation

[dependencies]
imgfprint = "0.3.3"

Feature Flags

| Feature         | Default | Description                                        |
|-----------------|---------|----------------------------------------------------|
| serde           | Yes     | Serialization support (JSON, binary)               |
| parallel        | Yes     | Parallel batch processing with rayon               |
| local-embedding | No      | Local ONNX model inference for semantic embeddings |
| tracing         | No      | Observability hooks for production debugging       |

Minimal build (default features disabled, so no serde and no parallel processing):

[dependencies]
imgfprint = { version = "0.3.3", default-features = false }

With local embeddings (requires ONNX model):

[dependencies]
imgfprint = { version = "0.3.3", features = ["local-embedding"] }

Quick Start

use imgfprint::ImageFingerprinter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let img1 = std::fs::read("photo1.jpg")?;
    let img2 = std::fs::read("photo2.jpg")?;
    
    // Compute all hashes (AHash + PHash + DHash) for best accuracy
    let fp1 = ImageFingerprinter::fingerprint(&img1)?;
    let fp2 = ImageFingerprinter::fingerprint(&img2)?;
    
    let sim = fp1.compare(&fp2);
    
    println!("Similarity: {:.2}", sim.score);
    println!("Exact match: {}", sim.exact_match);
    
    if sim.score > 0.8 {
        println!("Images are perceptually similar");
    }
    
    Ok(())
}

Single Algorithm Mode

use imgfprint::{ImageFingerprinter, HashAlgorithm};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let img = std::fs::read("photo.jpg")?;
    
    // Use a single algorithm for better speed
    let fp = ImageFingerprinter::fingerprint_with(&img, HashAlgorithm::DHash)?;
    
    Ok(())
}

Documentation

For complete API reference and usage examples, see USAGE.md.

Architecture

Fingerprint Types

MultiHashFingerprint (Default)

Contains AHash, PHash, and DHash for enhanced accuracy:

MultiHashFingerprint
├── exact:       [u8; 32]     // BLAKE3 of original bytes
├── ahash:       ImageFingerprint  // AHash results
│   ├── global_hash: u64
│   └── block_hashes: [u64; 16]
├── phash:       ImageFingerprint  // PHash results
│   ├── global_hash: u64
│   └── block_hashes: [u64; 16]
└── dhash:       ImageFingerprint  // DHash results
    ├── global_hash: u64
    └── block_hashes: [u64; 16]

Single Algorithm Mode

ImageFingerprint
├── exact:       [u8; 32]     // BLAKE3 of original bytes
├── global_hash: u64         // Algorithm-specific hash (center 32x32)
└── block_hashes: [u64; 16]   // Block-level hashes (4x4 grid, 64x64 each)
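
The layouts above imply a fixed raw size per fingerprint, which matters when storing fingerprints at scale. The sketch below mirrors the trees as drawn using hypothetical `Sketch*` types (the crate's real definitions and serialized sizes may differ) and derives the raw byte counts.

```rust
use std::mem::size_of;

/// Hypothetical per-algorithm hashes: one global hash plus a 4x4 grid of block hashes.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SketchHashes {
    pub global_hash: u64,
    pub block_hashes: [u64; 16],
}

/// Hypothetical mirror of the single-algorithm layout.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SketchSingle {
    pub exact: [u8; 32], // BLAKE3 digest of the original bytes
    pub hashes: SketchHashes,
}

/// Hypothetical mirror of the multi-algorithm layout.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SketchMulti {
    pub exact: [u8; 32],
    pub ahash: SketchHashes,
    pub phash: SketchHashes,
    pub dhash: SketchHashes,
}

/// Raw in-memory bytes needed for `image_count` fingerprints,
/// returned as (single-algorithm, multi-algorithm).
pub fn raw_storage_bytes(image_count: usize) -> (usize, usize) {
    (
        image_count * size_of::<SketchSingle>(),
        image_count * size_of::<SketchMulti>(),
    )
}
```

Under these assumptions a single-algorithm fingerprint is 32 + 8 + 128 = 168 bytes and a multi-algorithm fingerprint is 32 + 3 × 136 = 440 bytes, so one million images need roughly 168 MB or 440 MB of raw fingerprint data.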

Algorithm Pipeline

  1. Decode - Parse any supported format (PNG, JPEG, GIF, WebP, BMP) into RGB with EXIF orientation correction for JPEG
  2. Normalize - Resize to 256x256 using SIMD-accelerated Lanczos3 filter
  3. Convert - RGB to Grayscale (luminance)
  4. Parallel Hash Computation - All three algorithms computed simultaneously:
    • AHash: Average-based, resample to 8x8, compare to mean
    • PHash: DCT-based, center 32x32 + 4x4 blocks
    • DHash: Gradient-based, resample to 9x8
  5. Exact Hash - BLAKE3 of original bytes
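
The AHash and DHash steps can be sketched on already-resampled grayscale grids. This is a minimal illustration of the general techniques, not the crate's implementation: the function names, the row-major pixel layout, the bit ordering, and the strict `>` comparisons are all assumptions.

```rust
/// Average hash over an 8x8 grayscale grid (row-major).
/// Bit i is set when pixel i is strictly brighter than the integer mean.
pub fn ahash_8x8(pixels: &[u8; 64]) -> u64 {
    let sum: u32 = pixels.iter().map(|&p| u32::from(p)).sum();
    let mean = sum / 64;
    let mut hash = 0u64;
    for (i, &p) in pixels.iter().enumerate() {
        if u32::from(p) > mean {
            hash |= 1u64 << i;
        }
    }
    hash
}

/// Difference hash over a 9x8 grayscale grid (9 columns, 8 rows, row-major).
/// Each bit records whether a pixel is brighter than its right neighbor,
/// giving 8 comparisons per row and 64 bits in total.
pub fn dhash_9x8(pixels: &[u8; 72]) -> u64 {
    let mut hash = 0u64;
    for row in 0..8 {
        for col in 0..8 {
            let left = pixels[row * 9 + col];
            let right = pixels[row * 9 + col + 1];
            if left > right {
                hash |= 1u64 << (row * 8 + col);
            }
        }
    }
    hash
}
```

The extra ninth column in the DHash grid is what yields exactly 64 horizontal comparisons, so both hashes fit in a `u64`.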

Multi-Algorithm Comparison

When using MultiHashFingerprint, the similarity score is a weighted combination of the three per-algorithm similarities, each of which includes block-level similarity:

  • 10% AHash similarity (average hash, fastest, simplest)
  • 60% PHash similarity (DCT-based, robust to compression)
  • 30% DHash similarity (gradient-based, good for structural changes)

Within each algorithm, similarity is computed as:

  • 40% global hash similarity (overall structure)
  • 60% block-level similarity (crop resistance via 4x4 grid)

This provides superior crop resistance compared to global-only comparison.
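
The two weighting levels above can be sketched as follows. The weights come from this section; everything else is an assumption made for illustration: the `AlgoHashes` type and function names are hypothetical, per-hash similarity is taken to be normalized Hamming similarity, and block-level similarity is taken to be the mean over the 16 blocks.

```rust
/// Hypothetical per-algorithm hashes: one global hash plus 16 block hashes.
pub struct AlgoHashes {
    pub global: u64,
    pub blocks: [u64; 16],
}

/// Fraction of matching bits between two 64-bit hashes, in [0, 1].
pub fn hamming_similarity(a: u64, b: u64) -> f64 {
    1.0 - f64::from((a ^ b).count_ones()) / 64.0
}

/// Per-algorithm score: 40% global similarity, 60% mean block similarity.
pub fn algo_similarity(a: &AlgoHashes, b: &AlgoHashes) -> f64 {
    let global = hamming_similarity(a.global, b.global);
    let block_mean = a
        .blocks
        .iter()
        .zip(b.blocks.iter())
        .map(|(&x, &y)| hamming_similarity(x, y))
        .sum::<f64>()
        / 16.0;
    0.4 * global + 0.6 * block_mean
}

/// Combined score: 10% AHash, 60% PHash, 30% DHash.
pub fn multi_similarity(ahash: f64, phash: f64, dhash: f64) -> f64 {
    0.1 * ahash + 0.6 * phash + 0.3 * dhash
}
```

For example, per-algorithm scores of 1.0 (AHash), 0.5 (PHash), and 0.0 (DHash) combine to 0.1 + 0.3 + 0.0 = 0.4, which shows how strongly the PHash term dominates the result.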

Performance

Benchmarked on an 11th-gen Intel Core i5 (4 cores, 8 threads; 16 GB RAM):

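| Operation             | Time   | Throughput                     |
|-----------------------|--------|--------------------------------|
| fingerprint()         | 1.35ms | ~740 images/sec                |
| compare()             | 385ns  | ~2.6M comparisons/sec          |
| batch() (10 images)   | 6.16ms | ~1,620 images/sec (parallel)   |
| semantic_similarity() | ~500ns | ~2M comparisons/sec            |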

Run benchmarks:

cargo bench

Memory Safety

  • Maximum image dimension: 8192x8192 (OOM protection)
  • Dimension check before full decode
  • Pre-allocated buffers in context API
  • Zero-copy where possible
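
A dimension check before full decode might look like the sketch below. The names `MAX_DIMENSION`, `DimensionError`, and `check_dimensions` are hypothetical, and the 3-bytes-per-pixel estimate assumes an 8-bit RGB decode buffer; the crate's actual checks may differ.

```rust
/// Maximum accepted width or height, matching the documented 8192-pixel limit.
pub const MAX_DIMENSION: u32 = 8192;

/// Hypothetical error type for rejected dimensions.
#[derive(Debug, PartialEq, Eq)]
pub enum DimensionError {
    Empty,
    TooLarge { width: u32, height: u32 },
}

/// Validates header-reported dimensions before any pixel buffer is allocated.
/// On success, returns the size in bytes of an 8-bit RGB decode buffer.
pub fn check_dimensions(width: u32, height: u32) -> Result<u64, DimensionError> {
    if width == 0 || height == 0 {
        return Err(DimensionError::Empty);
    }
    if width > MAX_DIMENSION || height > MAX_DIMENSION {
        return Err(DimensionError::TooLarge { width, height });
    }
    // Widen to u64 before multiplying so the arithmetic cannot overflow.
    Ok(u64::from(width) * u64::from(height) * 3)
}
```

Under this assumption the largest accepted image, 8192x8192, decodes to 201,326,592 bytes (192 MiB), which bounds the allocation a single malicious or malformed header can trigger.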

Security

  • OOM Protection: Maximum image size 8192x8192 pixels (configurable)
  • Deterministic Output: Same input always produces same output
  • No Panics: All error conditions return Result
  • Fast Hashing: BLAKE3 computation (6-8x faster than SHA256)
  • Input Validation: Comprehensive format and size validation

Comparison with Alternatives

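| Feature              | imgfprint-rs | imagehash | img_hash |
|----------------------|--------------|-----------|----------|
| BLAKE3 exact         | Yes          | No        | No       |
| AHash                | Yes          | Yes       | Yes      |
| PHash                | Yes          | Yes       | Yes      |
| DHash                | Yes          | Yes       | Yes      |
| Multi-algorithm      | Yes          | No        | No       |
| Block hashes         | Yes          | No        | No       |
| Semantic embeddings  | Yes          | No        | No       |
| Local ONNX inference | Yes          | No        | No       |
| Parallel batch       | Yes          | No        | No       |
| SIMD acceleration    | Yes          | No        | No       |
| Context API          | Yes          | No        | No       |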

Examples

See the examples/ directory for complete working examples:

  • batch_process.rs - Process millions of images efficiently
  • compare_images.rs - Compare two images and show similarity
  • find_duplicates.rs - Find duplicate images in a directory
  • serialize.rs - Serialize/deserialize fingerprints to JSON and binary
  • similarity_search.rs - Perceptual similarity search in a directory
  • semantic_search.rs - Content-based image search with CLIP embeddings (requires local-embedding feature)

Run an example:

# Compare two images
cargo run --example compare_images -- images/photo1.jpg images/photo2.jpg

# Semantic search with local CLIP model (requires local-embedding feature)
cargo run --example semantic_search --features local-embedding -- model.onnx query.jpg ./images 0.85

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Run tests: cargo test
  4. Run clippy: cargo clippy --all-targets -- -D warnings
  5. Run benchmarks: cargo bench
  6. Commit changes (git commit -m 'Add amazing feature')
  7. Push to branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

Development Setup

# Clone
git clone https://github.com/themankindproject/imgfprint-rs
cd imgfprint-rs

# Run tests
cargo test

# Run with all features
cargo test --all-features

# Generate documentation
cargo doc --no-deps --open

License

MIT License - See LICENSE file for details.