imgfprint
High-performance image fingerprinting library for Rust with multi-algorithm perceptual hashing, exact matching, and semantic embeddings.
Overview
imgfprint provides multiple complementary approaches to image identification and similarity detection:
| Method | Use Case | Speed | Precision |
|---|---|---|---|
| SHA256 | Exact deduplication | ~1ms | 100% exact |
| PHash | Perceptual similarity | ~1.5ms | DCT-based, resilient to compression |
| DHash | Structural similarity | ~0.5ms | Gradient-based, good for crops |
| Multi | Combined accuracy | ~1.8ms | Weighted PHash+DHash (60/40) |
| Semantic | Content understanding | Local or API | Captures visual meaning |
Perfect for:
- Duplicate image detection
- Similarity search
- Content moderation
- Image deduplication at scale
- Content-based image retrieval (CBIR)
Features
- Multi-Algorithm Support - PHash (DCT-based) + DHash (gradient-based) with weighted combination
- Deterministic Output - Same input always produces same fingerprint
- SHA256 Exact Hash - Byte-identical detection
- Block-Level Hashing - 4x4 grid for crop resistance
- Semantic Embeddings - CLIP-style vector representations via external providers or local ONNX models
- SIMD Acceleration - AVX2/NEON optimized resizing
- Parallel Processing - Multi-core batch operations
- Zero-Copy APIs - Minimal allocations in hot paths
- Serde Support - JSON/binary serialization
- Security Hardened - OOM protection (8192px max), no panics on malformed input
- Multiple Formats - PNG, JPEG, GIF, WebP, BMP
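Semantic embeddings are vectors, so comparing two of them comes down to a vector similarity measure. The sketch below uses cosine similarity, which is a common choice for CLIP-style embeddings. It is an assumption that imgfprint uses this measure; the function name `cosine_similarity` is hypothetical and not part of the crate's API.

```rust
/// Cosine similarity between two embedding vectors.
/// Hypothetical helper for illustration; not imgfprint's API.
/// Returns None for empty, mismatched, or zero-length vectors
/// instead of panicking or dividing by zero.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    // Dot product of the two vectors.
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    // Euclidean norms of each vector.
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

A result near 1.0 means the vectors point in nearly the same direction; a result near 0.0 means they are close to orthogonal.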
Installation
[dependencies]
imgfprint = "0.2.0"
Feature Flags
| Feature | Default | Description |
|---|---|---|
| serde | Yes | Serialization support (JSON, binary) |
| parallel | Yes | Parallel batch processing with rayon |
| local-embedding | No | Local ONNX model inference for semantic embeddings |
Minimal build (no parallel processing):
[dependencies]
imgfprint = { version = "0.2.0", default-features = false }
With local embeddings (requires ONNX model):
[dependencies]
imgfprint = { version = "0.2.0", features = ["local-embedding"] }
Quick Start
use ImageFingerprinter;
Single Algorithm Mode
use ;
Documentation
For complete API reference and usage examples, see USAGE.md.
Architecture
Fingerprint Types
MultiHashFingerprint (Default)
Contains both PHash and DHash for enhanced accuracy:
MultiHashFingerprint
├── exact: [u8; 32] // SHA256 of original bytes
├── phash: ImageFingerprint // PHash results
│ ├── global_phash: u64
│ └── block_hashes: [u64; 16]
└── dhash: ImageFingerprint // DHash results
    ├── global_phash: u64
    └── block_hashes: [u64; 16]
Single Algorithm Mode
ImageFingerprint
├── exact: [u8; 32] // SHA256 of original bytes
├── global_phash: u64 // Algorithm-specific hash (center 32x32)
└── block_hashes: [u64; 16] // Block-level hashes (4x4 grid, 64x64 each)
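The layout above can be mirrored in a plain Rust struct, and the 4x4 grid of 64x64 blocks follows from dividing the 256x256 normalized image evenly. The sketch below is illustrative only: `FingerprintSketch`, `block_origin`, and the row-major block ordering are assumptions, not the crate's actual definitions.

```rust
/// Hypothetical mirror of the single-algorithm layout shown above.
/// The real type's fields may be private or named differently.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FingerprintSketch {
    /// SHA256 of the original bytes.
    pub exact: [u8; 32],
    /// Algorithm-specific global hash.
    pub global_hash: u64,
    /// One hash per block of the 4x4 grid.
    pub block_hashes: [u64; 16],
}

/// Side length of the normalized image.
pub const NORMALIZED_SIDE: usize = 256;
/// Number of blocks along each axis.
pub const GRID: usize = 4;
/// Side length of one block: 256 / 4 = 64.
pub const BLOCK_SIDE: usize = NORMALIZED_SIDE / GRID;

/// Returns the (x, y) pixel origin of block `index`, assuming
/// row-major ordering. Returns None when `index` is out of range.
pub fn block_origin(index: usize) -> Option<(usize, usize)> {
    if index >= GRID * GRID {
        return None;
    }
    let (row, col) = (index / GRID, index % GRID);
    Some((col * BLOCK_SIDE, row * BLOCK_SIDE))
}
```

Keeping the per-block hashes alongside the global hash is what allows partial matches when one image is a crop of another.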
Algorithm Pipeline
- Decode - Parse any supported format (PNG, JPEG, GIF, WebP, BMP) into RGB
- Normalize - Resize to 256x256 using SIMD-accelerated Lanczos3 filter
- Convert - RGB to Grayscale (luminance)
- Parallel Hash Computation - Both algorithms computed simultaneously:
  - PHash: DCT-based, center 32x32 + 4x4 blocks
  - DHash: Gradient-based, resample to 9x8
- Exact Hash - SHA256 of original bytes
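To make the Convert and DHash steps concrete, here is a minimal sketch of a luminance conversion and a gradient hash over a 9x8 grayscale grid. The Rec. 601 luminance weights, the bit order, and the "brighter to the right sets the bit" convention are assumptions chosen for illustration; the crate's implementation may differ in all three. The function names are hypothetical.

```rust
/// Converts one RGB pixel to 8-bit luminance.
/// Uses the common Rec. 601 weights (an assumption, not confirmed
/// for imgfprint): 0.299 R + 0.587 G + 0.114 B, in integer math.
pub fn luminance(r: u8, g: u8, b: u8) -> u8 {
    ((299 * r as u32 + 587 * g as u32 + 114 * b as u32) / 1000) as u8
}

/// Computes a 64-bit difference hash from a 9x8 grayscale grid
/// (8 rows of 9 pixels). Each row contributes 8 bits, one per pair
/// of horizontally adjacent pixels.
pub fn dhash_9x8(gray: &[[u8; 9]; 8]) -> u64 {
    let mut hash = 0u64;
    for row in gray.iter() {
        for x in 0..8 {
            // Shift first so the earliest comparison lands in the top bit.
            hash <<= 1;
            // Set the bit when brightness increases left to right.
            if row[x] < row[x + 1] {
                hash |= 1;
            }
        }
    }
    hash
}
```

Because only the sign of each neighboring difference is kept, a uniform brightness or contrast change that preserves the ordering of adjacent pixels leaves the hash unchanged.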
Multi-Algorithm Comparison
When using MultiHashFingerprint, the similarity score uses weighted combination:
- 60% PHash similarity (DCT-based, robust to compression)
- 40% DHash similarity (gradient-based, good for structural changes)
Combining the two is intended to give better accuracy than either algorithm alone, since each one tolerates a different kind of image change.
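The 60/40 weighting can be sketched as follows. This is a simplified illustration that compares only one 64-bit hash per algorithm using normalized Hamming distance; how the crate folds block hashes into the score is not described here, and the function names are hypothetical.

```rust
/// Weight given to PHash similarity, per the 60/40 split above.
pub const PHASH_WEIGHT: f32 = 0.6;
/// Weight given to DHash similarity.
pub const DHASH_WEIGHT: f32 = 0.4;

/// Similarity of two 64-bit hashes in [0.0, 1.0]:
/// one minus the normalized Hamming distance.
pub fn hash_similarity(a: u64, b: u64) -> f32 {
    let differing_bits = (a ^ b).count_ones();
    1.0 - differing_bits as f32 / 64.0
}

/// Weighted combination of PHash and DHash similarity.
/// Each argument is a pair of hashes (image A, image B).
/// Hypothetical helper; ignores block-level hashes.
pub fn combined_similarity(phash: (u64, u64), dhash: (u64, u64)) -> f32 {
    PHASH_WEIGHT * hash_similarity(phash.0, phash.1)
        + DHASH_WEIGHT * hash_similarity(dhash.0, dhash.1)
}
```

For example, if the PHash values match exactly but every DHash bit differs, the combined score under this sketch is 0.6.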
Performance
Benchmarked on an 11th-gen Intel i5 (4 cores, 8 threads, 16 GB RAM):
| Operation | Time | Throughput |
|---|---|---|
| fingerprint() | 1.35ms | ~740 images/sec |
| compare() | 385ns | ~2.6M comparisons/sec |
| batch() (10 images) | 6.16ms | ~1,620 images/sec (parallel) |
| semantic_similarity() | ~500ns | ~2M comparisons/sec |
Run benchmarks with cargo bench.
Memory Safety
- Maximum image dimension: 8192x8192 (OOM protection)
- Dimension check before full decode
- Pre-allocated buffers in context API
- Zero-copy where possible
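The dimension check can be illustrated with a small validation function that runs on the width and height read from the image header, before any pixel buffer is allocated. This is a sketch under stated assumptions: `DimensionError`, `check_dimensions`, and the rejection of zero-sized images are hypothetical and do not reflect the crate's actual error types.

```rust
use std::fmt;

/// Maximum allowed side length, matching the limit stated above.
pub const MAX_DIMENSION: u32 = 8192;

/// Hypothetical error type for rejected dimensions.
#[derive(Debug, PartialEq, Eq)]
pub enum DimensionError {
    /// Width or height is zero.
    Empty,
    /// Width or height exceeds MAX_DIMENSION.
    TooLarge { width: u32, height: u32 },
}

impl fmt::Display for DimensionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DimensionError::Empty => write!(f, "image has zero width or height"),
            DimensionError::TooLarge { width, height } => write!(
                f,
                "image {}x{} exceeds the {}x{} limit",
                width, height, MAX_DIMENSION, MAX_DIMENSION
            ),
        }
    }
}

impl std::error::Error for DimensionError {}

/// Validates header dimensions before a full decode is attempted.
/// Returns an error instead of panicking on out-of-range input.
pub fn check_dimensions(width: u32, height: u32) -> Result<(), DimensionError> {
    if width == 0 || height == 0 {
        return Err(DimensionError::Empty);
    }
    if width > MAX_DIMENSION || height > MAX_DIMENSION {
        return Err(DimensionError::TooLarge { width, height });
    }
    Ok(())
}
```

Checking the header first bounds the worst-case allocation: a 8192x8192 RGB image is at most 8192 * 8192 * 3 bytes, roughly 192 MiB, whereas an unchecked header could claim far more.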
Security
- OOM Protection: Maximum image size 8192x8192 pixels (configurable)
- Deterministic Output: Same input always produces same output
- No Panics: All error conditions return Result
- Constant-Time Hashing: SHA256 computation
- Input Validation: Comprehensive format and size validation
Comparison with Alternatives
| Feature | imgfprint-rs | imagehash | img_hash |
|---|---|---|---|
| SHA256 exact | Yes | No | No |
| PHash | Yes | Yes | Yes |
| DHash | Yes | Yes | Yes |
| Multi-algorithm | Yes | No | No |
| Block hashes | Yes | No | No |
| Semantic embeddings | Yes | No | No |
| Local ONNX inference | Yes | No | No |
| Parallel batch | Yes | No | No |
| SIMD acceleration | Yes | No | No |
| Context API | Yes | No | No |
Examples
See the examples/ directory for complete working examples:
- batch_process.rs - Process millions of images efficiently
- compare_images.rs - Compare two images and show similarity
- find_duplicates.rs - Find duplicate images in a directory
- serialize.rs - Serialize/deserialize fingerprints to JSON and binary
- similarity_search.rs - Perceptual similarity search in a directory
- semantic_search.rs - Content-based image search with CLIP embeddings (requires local-embedding feature)
Run an example:
# Compare two images
# Semantic search with local CLIP model (requires local-embedding feature)
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Run tests: cargo test
- Run clippy: cargo clippy --all-targets -- -D warnings
- Run benchmarks: cargo bench
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Setup
# Clone
# Run tests
# Run with all features
# Generate documentation
License
MIT License - See LICENSE file for details.