# imgfprint
High-performance image fingerprinting library for Rust with multi-algorithm perceptual hashing, exact matching, and semantic embeddings.
## Overview
imgfprint provides multiple complementary approaches to image identification and similarity detection:
| Method | Use Case | Speed | Precision |
|---|---|---|---|
| BLAKE3 | Exact deduplication | ~0.2ms | 100% exact |
| AHash | Fast similarity | ~0.3ms | Average-based, simplest |
| PHash | Perceptual similarity | ~1.5ms | DCT-based, resilient to compression |
| DHash | Structural similarity | ~0.5ms | Gradient-based, good for crops |
| Multi | Combined accuracy | ~1.8ms | Weighted AHash+PHash+DHash (10/60/30) |
| Semantic | Content understanding | Local or API | Captures visual meaning |
Perfect for:
- Duplicate image detection
- Similarity search
- Content moderation
- Image deduplication at scale
- Content-based image retrieval (CBIR)
## Features
- Multi-Algorithm Support - AHash (average) + PHash (DCT-based) + DHash (gradient-based) with weighted combination
- Deterministic Output - Same input always produces same fingerprint
- BLAKE3 Exact Hash - Byte-identical detection (6-8x faster than SHA256)
- Block-Level Hashing - 4x4 grid for crop resistance
- EXIF Orientation - Automatically corrects JPEG orientation from camera metadata
- Semantic Embeddings - CLIP-style vector representations via external providers or local ONNX models
- Embedding Model ID - Tag embeddings with model identifiers to prevent comparing incompatible models
- SIMD Acceleration - AVX2/NEON optimized resizing
- Parallel Processing - Multi-core batch operations
- Zero-Copy APIs - Minimal allocations in hot paths
- Serde Support - JSON/binary serialization
- Security Hardened - OOM protection (8192px max), no panics on malformed input
- Multiple Formats - PNG, JPEG, GIF, WebP, BMP
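The embedding model ID feature above can be illustrated with a small standalone sketch. The `TaggedEmbedding` type below is hypothetical (not the crate's actual API); it shows the idea of pairing a vector with a model identifier and refusing cross-model comparison, assuming cosine similarity as the metric:

```rust
/// Hypothetical illustration of model-ID-tagged embeddings;
/// the real imgfprint types and method names may differ.
struct TaggedEmbedding {
    model_id: String,
    vector: Vec<f32>,
}

impl TaggedEmbedding {
    /// Cosine similarity, but only between embeddings from the same model.
    fn similarity(&self, other: &TaggedEmbedding) -> Option<f32> {
        if self.model_id != other.model_id {
            return None; // incompatible models: a score would be meaningless
        }
        let dot: f32 = self.vector.iter().zip(&other.vector).map(|(a, b)| a * b).sum();
        let na: f32 = self.vector.iter().map(|a| a * a).sum::<f32>().sqrt();
        let nb: f32 = other.vector.iter().map(|b| b * b).sum::<f32>().sqrt();
        Some(dot / (na * nb))
    }
}

fn main() {
    let a = TaggedEmbedding { model_id: "clip-vit-b32".into(), vector: vec![1.0, 0.0] };
    let b = TaggedEmbedding { model_id: "clip-vit-b32".into(), vector: vec![1.0, 0.0] };
    let c = TaggedEmbedding { model_id: "other-model".into(), vector: vec![1.0, 0.0] };
    assert_eq!(a.similarity(&b), Some(1.0)); // same model: comparable
    assert_eq!(a.similarity(&c), None);      // different model: rejected
    println!("ok");
}
```

Returning `None` (rather than a number) for mismatched models makes the incompatibility impossible to ignore at the call site.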
## Installation

```toml
[dependencies]
imgfprint = "0.3.3"
```
### Feature Flags
| Feature | Default | Description |
|---|---|---|
| `serde` | Yes | Serialization support (JSON, binary) |
| `parallel` | Yes | Parallel batch processing with rayon |
| `local-embedding` | No | Local ONNX model inference for semantic embeddings |
| `tracing` | No | Observability hooks for production debugging |
Minimal build (no parallel processing):

```toml
[dependencies]
imgfprint = { version = "0.3.3", default-features = false }
```
With local embeddings (requires ONNX model):

```toml
[dependencies]
imgfprint = { version = "0.3.3", features = ["local-embedding"] }
```
## Quick Start

The default `ImageFingerprinter` produces a `MultiHashFingerprint` that combines AHash, PHash, and DHash (see Architecture below).
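A minimal sketch of intended usage. The constructor `ImageFingerprinter::new()` and the exact method signatures are assumptions based on the `fingerprint()` / `compare()` names in the performance table below; treat this as illustrative and check USAGE.md for the authoritative API:

```rust
use imgfprint::ImageFingerprinter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let fp = ImageFingerprinter::new();

    // Fingerprint two images (hypothetical signatures).
    let a = fp.fingerprint(&std::fs::read("a.jpg")?)?;
    let b = fp.fingerprint(&std::fs::read("b.jpg")?)?;

    // Weighted multi-algorithm similarity score.
    println!("similarity: {:.3}", a.compare(&b));
    Ok(())
}
```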
### Single Algorithm Mode

Each algorithm (AHash, PHash, or DHash) can also be run on its own when you need lower latency than the combined mode.
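A hypothetical sketch of selecting one algorithm; the `HashAlgorithm` enum and `with_algorithm` constructor are assumed names for illustration, not confirmed API (see USAGE.md):

```rust
use imgfprint::{HashAlgorithm, ImageFingerprinter};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical: compute only the DCT-based PHash.
    let fp = ImageFingerprinter::with_algorithm(HashAlgorithm::PHash);
    let fingerprint = fp.fingerprint(&std::fs::read("a.jpg")?)?;
    println!("global hash: {:016x}", fingerprint.global_hash);
    Ok(())
}
```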
## Documentation
For complete API reference and usage examples, see USAGE.md.
## Architecture

### Fingerprint Types

#### MultiHashFingerprint (Default)
Contains AHash, PHash, and DHash for enhanced accuracy:
```
MultiHashFingerprint
├── exact: [u8; 32]          // BLAKE3 of original bytes
├── ahash: ImageFingerprint  // AHash results
│   ├── global_hash: u64
│   └── block_hashes: [u64; 16]
├── phash: ImageFingerprint  // PHash results
│   ├── global_hash: u64
│   └── block_hashes: [u64; 16]
└── dhash: ImageFingerprint  // DHash results
    ├── global_hash: u64
    └── block_hashes: [u64; 16]
```
#### Single Algorithm Mode
```
ImageFingerprint
├── exact: [u8; 32]          // BLAKE3 of original bytes
├── global_hash: u64         // Algorithm-specific hash (center 32x32)
└── block_hashes: [u64; 16]  // Block-level hashes (4x4 grid, 64x64 each)
```
### Algorithm Pipeline
1. Decode - Parse any supported format (PNG, JPEG, GIF, WebP, BMP) into RGB, with EXIF orientation correction for JPEG
2. Normalize - Resize to 256x256 using a SIMD-accelerated Lanczos3 filter
3. Convert - RGB to grayscale (luminance)
4. Parallel Hash Computation - All three algorithms computed simultaneously:
   - AHash: average-based; resample to 8x8, compare each pixel to the mean
   - PHash: DCT-based; center 32x32 plus 4x4 blocks
   - DHash: gradient-based; resample to 9x8
5. Exact Hash - BLAKE3 of the original bytes
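The AHash and DHash steps are simple enough to sketch independently of the crate. This standalone illustration assumes the grayscale image is already available as a row-major `Vec<u8>`, and uses nearest-neighbor resampling as a stand-in for the library's Lanczos3 resize:

```rust
/// Downsample a grayscale image (w x h, row-major) to tw x th using
/// nearest-neighbor sampling (a simplification of the real resize).
fn resample(px: &[u8], w: usize, h: usize, tw: usize, th: usize) -> Vec<u8> {
    (0..th)
        .flat_map(|y| (0..tw).map(move |x| (x, y)))
        .map(|(x, y)| px[(y * h / th) * w + (x * w / tw)])
        .collect()
}

/// AHash: 8x8 downsample, set a bit for each pixel above the mean.
fn ahash(px: &[u8], w: usize, h: usize) -> u64 {
    let small = resample(px, w, h, 8, 8);
    let mean = small.iter().map(|&p| p as u32).sum::<u32>() / 64;
    small.iter().enumerate()
        .fold(0u64, |acc, (i, &p)| if p as u32 > mean { acc | 1u64 << i } else { acc })
}

/// DHash: 9x8 downsample, set a bit where each pixel is brighter than
/// its right-hand neighbor (8 comparisons per row -> 64 bits).
fn dhash(px: &[u8], w: usize, h: usize) -> u64 {
    let small = resample(px, w, h, 9, 8);
    let mut bits = 0u64;
    for y in 0..8 {
        for x in 0..8 {
            if small[y * 9 + x] > small[y * 9 + x + 1] {
                bits |= 1u64 << (y * 8 + x);
            }
        }
    }
    bits
}

fn main() {
    // Synthetic test image: left half dark, right half bright.
    let (w, h) = (64usize, 64usize);
    let px: Vec<u8> = (0..w * h).map(|i| if i % w < w / 2 { 0 } else { 255 }).collect();
    println!("ahash={:016x} dhash={:016x}", ahash(&px, w, h), dhash(&px, w, h));
}
```

For the half-dark/half-bright image, AHash sets exactly the four right-hand bits of each 8-bit row (`0xF0F0F0F0F0F0F0F0`), while DHash is all zeros because no pixel is brighter than its right neighbor.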
### Multi-Algorithm Comparison
When using MultiHashFingerprint, the similarity score uses weighted combination with block-level similarity:
- 10% AHash similarity (average hash, fastest, simplest)
- 60% PHash similarity (DCT-based, robust to compression)
- 30% DHash similarity (gradient-based, good for structural changes)
Within each algorithm, similarity is computed as:
- 40% global hash similarity (overall structure)
- 60% block-level similarity (crop resistance via 4x4 grid)
This provides superior crop resistance compared to global-only comparison.
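The weighting scheme above can be written out as plain arithmetic over Hamming-based hash similarities. This is a simplified model of the computation, not the crate's internal code; real block handling may differ:

```rust
/// Similarity of two 64-bit hashes: fraction of matching bits.
fn hash_similarity(a: u64, b: u64) -> f64 {
    (64 - (a ^ b).count_ones()) as f64 / 64.0
}

/// Per-algorithm score: 40% global hash + 60% mean block-hash similarity.
fn algo_similarity(global: (u64, u64), blocks: (&[u64; 16], &[u64; 16])) -> f64 {
    let block_avg: f64 = blocks.0.iter().zip(blocks.1)
        .map(|(&x, &y)| hash_similarity(x, y))
        .sum::<f64>() / 16.0;
    0.4 * hash_similarity(global.0, global.1) + 0.6 * block_avg
}

/// Multi-algorithm score: 10% AHash + 60% PHash + 30% DHash.
fn multi_similarity(a: f64, p: f64, d: f64) -> f64 {
    0.1 * a + 0.6 * p + 0.3 * d
}

fn main() {
    // Identical fingerprints score 1.0 under every weighting.
    let blocks = [0u64; 16];
    let s = algo_similarity((0, 0), (&blocks, &blocks));
    let total = multi_similarity(s, s, s);
    assert!((total - 1.0).abs() < 1e-12);
    println!("identical fingerprints -> {}", total);
}
```

Because block similarity dominates the per-algorithm score (60%), a crop that disturbs only a few of the 16 blocks still leaves most of the score intact.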
## Performance

Benchmarked on an 11th-gen Intel i5 (16 GB RAM, 4 cores / 8 threads):
| Operation | Time | Throughput |
|---|---|---|
| `fingerprint()` | 1.35ms | ~740 images/sec |
| `compare()` | 385ns | ~2.6M comparisons/sec |
| `batch()` (10 images) | 6.16ms | ~1,620 images/sec (parallel) |
| `semantic_similarity()` | ~500ns | ~2M comparisons/sec |
Run benchmarks with `cargo bench`.
## Memory Safety
- Maximum image dimension: 8192x8192 (OOM protection)
- Dimension check before full decode
- Pre-allocated buffers in context API
- Zero-copy where possible
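The "dimension check before full decode" idea can be sketched standalone (this is an illustration, not the crate's internal code): for PNG, the declared width and height sit in the IHDR chunk immediately after the 8-byte signature, so an OOM guard needs only the first 24 bytes of the file:

```rust
const MAX_DIM: u32 = 8192;

/// Read the declared dimensions from a PNG header without decoding:
/// 8-byte signature, 4-byte IHDR length, 4-byte "IHDR" tag, then
/// big-endian u32 width and height at byte offsets 16 and 20.
fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    if bytes.len() < 24 || &bytes[..8] != b"\x89PNG\r\n\x1a\n" || &bytes[12..16] != b"IHDR" {
        return None;
    }
    let w = u32::from_be_bytes(bytes[16..20].try_into().ok()?);
    let h = u32::from_be_bytes(bytes[20..24].try_into().ok()?);
    Some((w, h))
}

/// OOM guard: reject images that declare oversized dimensions.
fn within_limits(bytes: &[u8]) -> bool {
    matches!(png_dimensions(bytes), Some((w, h)) if w <= MAX_DIM && h <= MAX_DIM)
}

fn main() {
    // Minimal header claiming a 100x50 image (no pixel data needed for the check).
    let mut hdr = b"\x89PNG\r\n\x1a\n".to_vec();
    hdr.extend_from_slice(&13u32.to_be_bytes()); // IHDR chunk length
    hdr.extend_from_slice(b"IHDR");
    hdr.extend_from_slice(&100u32.to_be_bytes()); // width
    hdr.extend_from_slice(&50u32.to_be_bytes());  // height
    assert_eq!(png_dimensions(&hdr), Some((100, 50)));
    assert!(within_limits(&hdr));
    println!("100x50 within limits");
}
```

Rejecting a hostile 100000x100000 header this way costs a 24-byte read instead of a multi-gigabyte allocation.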
## Security
- OOM Protection: Maximum image size 8192x8192 pixels (configurable)
- Deterministic Output: Same input always produces same output
- No Panics: All error conditions return `Result`
- Fast Hashing: BLAKE3 computation (6-8x faster than SHA256)
- Input Validation: Comprehensive format and size validation
## Comparison with Alternatives
| Feature | imgfprint | imagehash | img_hash |
|---|---|---|---|
| BLAKE3 exact | Yes | No | No |
| AHash | Yes | Yes | Yes |
| PHash | Yes | Yes | Yes |
| DHash | Yes | Yes | Yes |
| Multi-algorithm | Yes | No | No |
| Block hashes | Yes | No | No |
| Semantic embeddings | Yes | No | No |
| Local ONNX inference | Yes | No | No |
| Parallel batch | Yes | No | No |
| SIMD acceleration | Yes | No | No |
| Context API | Yes | No | No |
## Examples

See the `examples/` directory for complete working examples:

- `batch_process.rs` - Process millions of images efficiently
- `compare_images.rs` - Compare two images and show similarity
- `find_duplicates.rs` - Find duplicate images in a directory
- `serialize.rs` - Serialize/deserialize fingerprints to JSON and binary
- `similarity_search.rs` - Perceptual similarity search in a directory
- `semantic_search.rs` - Content-based image search with CLIP embeddings (requires `local-embedding` feature)
Run an example with `cargo run --example <name>`; for instance, `cargo run --example compare_images` to compare two images, or `cargo run --example semantic_search --features local-embedding` for semantic search with a local CLIP model.
## Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Run tests: `cargo test`
4. Run clippy: `cargo clippy --all-targets -- -D warnings`
5. Run benchmarks: `cargo bench`
6. Commit changes (`git commit -m 'Add amazing feature'`)
7. Push to branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request
### Development Setup

Clone the repository, then run the test suite with `cargo test` (add `--all-features` to cover the optional features) and generate documentation with `cargo doc --open`.
## License
MIT License - See LICENSE file for details.