# imgfprint
High-performance image fingerprinting library for Rust with perceptual hashing, exact matching, and semantic embeddings.
## Overview
imgfprint provides multiple complementary approaches to image identification and similarity detection:
| Method | Use Case | Speed | Precision |
|---|---|---|---|
| SHA256 | Exact deduplication | ~1ms | 100% exact |
| pHash | Perceptual similarity | ~1.5ms | Resilient to compression, resizing |
| Semantic | Content understanding | Local or API | Captures visual meaning |
Perfect for:
- Duplicate image detection
- Similarity search
- Content moderation
- Image deduplication at scale
- Content-based image retrieval (CBIR)
## Features
- Deterministic Output - Same input always produces same fingerprint
- SHA256 Exact Hash - Byte-identical detection
- pHash Perceptual Hash - DCT-based similarity (resilient to compression, resizing, minor edits)
- Block-Level Hashing - 4x4 grid for crop resistance
- Semantic Embeddings - CLIP-style vector representations via external providers or local ONNX models
- SIMD Acceleration - AVX2/NEON optimized resizing
- Parallel Processing - Multi-core batch operations
- Zero-Copy APIs - Minimal allocations in hot paths
- Serde Support - JSON/binary serialization
- Security Hardened - OOM protection (8192px max), no panics on malformed input
- Multiple Formats - PNG, JPEG, GIF, WebP, BMP
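The parallel-batch feature above can be sketched with std threads. This is a sketch only: the crate uses rayon, and `cheap_hash` (FNV-1a) is a stand-in for the real SHA256/pHash pipeline; `batch` and its signature are illustrative assumptions.

```rust
// Sketch of multi-core batch hashing with std threads. The crate itself uses
// rayon; `cheap_hash` here is a placeholder for the real fingerprint pipeline.
use std::thread;

// Placeholder hash (FNV-1a), not the library's SHA256/pHash pipeline.
fn cheap_hash(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0xcbf2_9ce4_8422_2325u64, |h, &b| {
        (h ^ b as u64).wrapping_mul(0x0000_0100_0000_01b3)
    })
}

// Split the batch across worker threads, preserving input order.
fn batch(images: &[Vec<u8>], workers: usize) -> Vec<u64> {
    let chunk = images.len().div_ceil(workers).max(1);
    let mut out = Vec::with_capacity(images.len());
    thread::scope(|s| {
        let handles: Vec<_> = images
            .chunks(chunk)
            .map(|c| {
                s.spawn(move || c.iter().map(|img| cheap_hash(img)).collect::<Vec<u64>>())
            })
            .collect();
        for h in handles {
            out.extend(h.join().unwrap());
        }
    });
    out
}

fn main() {
    let images: Vec<Vec<u8>> = (0..10).map(|i| vec![i as u8; 64]).collect();
    // Parallel result matches a sequential pass, in the same order.
    let sequential: Vec<u64> = images.iter().map(|img| cheap_hash(img)).collect();
    assert_eq!(batch(&images, 4), sequential);
}
```

Scoped threads let each worker borrow its slice of the batch without `Arc` or cloning, which mirrors the zero-copy goal stated above.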
## Installation
```toml
[dependencies]
imgfprint = "0.1.3"
```
### Feature Flags
| Feature | Default | Description |
|---|---|---|
| `serde` | Yes | Serialization support (JSON, binary) |
| `parallel` | Yes | Parallel batch processing with rayon |
| `local-embedding` | No | Local ONNX model inference for semantic embeddings |
Minimal build (no parallel processing):
```toml
[dependencies]
imgfprint = { version = "0.1.3", default-features = false }
```
With local embeddings (requires ONNX model):
```toml
[dependencies]
imgfprint = { version = "0.1.3", features = ["local-embedding"] }
```
## Quick Start
```rust
use imgfprint::ImageFingerprinter;
```
## Documentation
For complete API reference and usage examples, see USAGE.md.
## Architecture

### Fingerprint Structure (168 bytes)
```text
ImageFingerprint
├── exact: [u8; 32]          // SHA256 of original bytes
├── global_phash: u64        // DCT-based hash (center 32x32)
└── block_hashes: [u64; 16]  // DCT hashes (4x4 grid, 64x64 each)
```
### Algorithm Pipeline
- Decode - Parse any supported format (PNG, JPEG, GIF, WebP, BMP) into RGB
- Normalize - Resize to 256x256 using SIMD-accelerated Lanczos3 filter
- Convert - RGB to Grayscale (luminance)
- Global Hash - Extract center 32x32, DCT, pHash
- Block Hashes - Split into 4x4 grid (64x64 blocks), DCT, 16 pHashes
- Exact Hash - SHA256 of original bytes
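The global-hash step (step 4) can be sketched in plain Rust. This is an illustrative sketch, not the library's implementation: it uses a naive O(N⁴) 2-D DCT-II and a median threshold over the 8x8 low-frequency block, whereas the crate uses a SIMD-optimized transform and may select coefficients differently.

```rust
// Illustrative pHash over a 32x32 grayscale tile (assumption: the crate's
// actual DCT and thresholding details may differ).
const N: usize = 32;

// Naive 2-D DCT-II; the library uses an optimized transform.
fn dct_2d(pixels: &[f64; N * N]) -> [f64; N * N] {
    let mut out = [0.0; N * N];
    for u in 0..N {
        for v in 0..N {
            let mut sum = 0.0;
            for x in 0..N {
                for y in 0..N {
                    sum += pixels[x * N + y]
                        * ((2 * x + 1) as f64 * u as f64 * std::f64::consts::PI
                            / (2.0 * N as f64))
                            .cos()
                        * ((2 * y + 1) as f64 * v as f64 * std::f64::consts::PI
                            / (2.0 * N as f64))
                            .cos();
                }
            }
            out[u * N + v] = sum;
        }
    }
    out
}

// Keep the 8x8 low-frequency block, threshold against its median -> 64 bits.
fn phash(pixels: &[f64; N * N]) -> u64 {
    let coeffs = dct_2d(pixels);
    let low: Vec<f64> = (0..8)
        .flat_map(|u| (0..8).map(move |v| coeffs[u * N + v]))
        .collect();
    let mut sorted = low.clone();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = (sorted[31] + sorted[32]) / 2.0;
    low.iter()
        .enumerate()
        .fold(0u64, |h, (i, &c)| if c > median { h | (1 << i) } else { h })
}

fn main() {
    // A horizontal gradient as a stand-in for the normalized center tile.
    let mut tile = [0.0; N * N];
    for x in 0..N {
        for y in 0..N {
            tile[x * N + y] = y as f64 / N as f64;
        }
    }
    // Deterministic: hashing the same tile twice gives the same value.
    assert_eq!(phash(&tile), phash(&tile));
    println!("{:016x}", phash(&tile));
}
```

Because the hash depends only on the sign of each coefficient relative to the median, small brightness or compression changes that perturb coefficient magnitudes without flipping their rank tend to leave the hash unchanged.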
### Similarity Computation
The similarity score is a weighted combination:
- 40% Global perceptual hash similarity
- 60% Block-level hash similarity (crop-resistant)
Block distances > 32 are filtered (handles cropping by ignoring missing blocks).
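The weighting and filtering above can be sketched with plain Hamming distances. The struct mirrors the documented fingerprint layout (the SHA256 field is omitted here), but the field and function names are illustrative assumptions, not the crate's actual API.

```rust
// Illustrative similarity combining a global pHash with 16 block hashes,
// using the 40/60 weighting and the >32 block-distance filter described above.
// Names are assumptions, not the crate's actual API.
struct Fingerprint {
    global_phash: u64,
    block_hashes: [u64; 16],
}

fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

fn similarity(a: &Fingerprint, b: &Fingerprint) -> f64 {
    // Global: 64-bit Hamming distance mapped onto [0, 1].
    let global = 1.0 - hamming(a.global_phash, b.global_phash) as f64 / 64.0;

    // Blocks: ignore pairs with distance > 32 (e.g. cropped-away regions).
    let kept: Vec<u32> = a
        .block_hashes
        .iter()
        .zip(&b.block_hashes)
        .map(|(x, y)| hamming(*x, *y))
        .filter(|&d| d <= 32)
        .collect();
    let block = if kept.is_empty() {
        0.0
    } else {
        1.0 - kept.iter().sum::<u32>() as f64 / (64.0 * kept.len() as f64)
    };

    0.4 * global + 0.6 * block
}

fn main() {
    let fp = Fingerprint {
        global_phash: 0xDEAD_BEEF,
        block_hashes: [7; 16],
    };
    // Identical fingerprints score a perfect 1.0.
    assert!((similarity(&fp, &fp) - 1.0).abs() < 1e-12);
}
```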
## Performance
Benchmarked on AMD Ryzen 9 5900X (single core unless noted):
| Operation | Time | Throughput |
|---|---|---|
| `fingerprint()` | 1.35 ms | ~740 images/sec |
| `compare()` | 385 ns | ~2.6M comparisons/sec |
| `batch()` (10 images) | 6.16 ms | ~1,620 images/sec (parallel) |
| `semantic_similarity()` | ~500 ns | ~2M comparisons/sec |
Run benchmarks:

```sh
cargo bench
```
## Memory Safety
- Maximum image dimension: 8192x8192 (OOM protection)
- Dimension check before full decode
- Pre-allocated buffers in context API
- Zero-copy where possible
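The pre-decode dimension check can be illustrated for PNG input: the width and height sit at fixed offsets in the IHDR chunk, so they can be validated before any pixel data is decoded. This is a sketch, not the crate's code; `png_dimensions`, `within_limits`, and the PNG-only scope are assumptions for illustration.

```rust
// Illustrative pre-decode dimension check for PNG input. The crate performs
// an equivalent header probe across formats; this sketch handles PNG only.
const MAX_DIM: u32 = 8192;

fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    // PNG signature (8 bytes) + IHDR chunk length/type (8 bytes) precede
    // the 4-byte big-endian width and height.
    const SIG: [u8; 8] = [0x89, b'P', b'N', b'G', b'\r', b'\n', 0x1A, b'\n'];
    if bytes.len() < 24 || bytes[..8] != SIG || &bytes[12..16] != b"IHDR" {
        return None;
    }
    let w = u32::from_be_bytes(bytes[16..20].try_into().ok()?);
    let h = u32::from_be_bytes(bytes[20..24].try_into().ok()?);
    Some((w, h))
}

fn within_limits(bytes: &[u8]) -> bool {
    matches!(png_dimensions(bytes), Some((w, h)) if w <= MAX_DIM && h <= MAX_DIM)
}

fn main() {
    // Minimal synthetic header for a 100x50 PNG (not a complete file).
    let mut hdr: Vec<u8> = vec![0x89, b'P', b'N', b'G', b'\r', b'\n', 0x1A, b'\n'];
    hdr.extend_from_slice(&13u32.to_be_bytes()); // IHDR data length
    hdr.extend_from_slice(b"IHDR");
    hdr.extend_from_slice(&100u32.to_be_bytes()); // width
    hdr.extend_from_slice(&50u32.to_be_bytes()); // height
    assert_eq!(png_dimensions(&hdr), Some((100, 50)));
    assert!(within_limits(&hdr));
}
```

Rejecting oversized images from 24 header bytes means a hostile multi-gigapixel file never reaches the allocator, which is the point of the OOM limit above.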
## Security
- OOM Protection: Maximum image size 8192x8192 pixels (configurable)
- Deterministic Output: Same input always produces same output
- No Panics: All error conditions return `Result`
- Constant-Time Hashing: SHA256 computation
- Input Validation: Comprehensive format and size validation
## Comparison with Alternatives
| Feature | imgfprint-rs | imagehash | img_hash |
|---|---|---|---|
| SHA256 exact | Yes | No | No |
| pHash | Yes | Yes | Yes |
| Block hashes | Yes | No | No |
| Semantic embeddings | Yes | No | No |
| Local ONNX inference | Yes | No | No |
| Parallel batch | Yes | No | No |
| SIMD acceleration | Yes | No | No |
| Context API | Yes | No | No |
## Examples
See the examples/ directory for complete working examples:
- `batch_process.rs` - Process millions of images efficiently
- `compare_images.rs` - Compare two images and show similarity
- `find_duplicates.rs` - Find duplicate images in a directory
- `serialize.rs` - Serialize/deserialize fingerprints to JSON and binary
- `similarity_search.rs` - Perceptual similarity search in a directory
- `semantic_search.rs` - Content-based image search with CLIP embeddings (requires `local-embedding` feature)
Run an example:

```sh
# Compare two images
cargo run --example compare_images -- <image1> <image2>

# Semantic search with local CLIP model (requires local-embedding feature)
cargo run --features local-embedding --example semantic_search
```
## Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Run tests: `cargo test`
- Run clippy: `cargo clippy --all-targets -- -D warnings`
- Run benchmarks: `cargo bench`
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
### Development Setup

```sh
# Clone
git clone <repository-url>
cd imgfprint

# Run tests
cargo test

# Run with all features
cargo test --all-features

# Generate documentation
cargo doc --open
```
## License
MIT License - See LICENSE file for details.