# imgfprint
High-performance image fingerprinting library for Rust with perceptual hashing, exact matching, and semantic embeddings.
## Overview
imgfprint provides multiple complementary approaches to image identification and similarity detection:
| Method | Use Case | Speed | Precision |
|---|---|---|---|
| SHA256 | Exact deduplication | ~1ms | 100% exact |
| pHash | Perceptual similarity | ~1.5ms | Resilient to compression, resizing |
| Semantic | Content understanding | Local or API | Captures visual meaning |
Perfect for:
- Duplicate image detection
- Similarity search
- Content moderation
- Image deduplication at scale
- Content-based image retrieval (CBIR)
## Features
- Deterministic Output - Same input always produces same fingerprint
- SHA256 Exact Hash - Byte-identical detection
- pHash Perceptual Hash - DCT-based similarity (resilient to compression, resizing, minor edits)
- Block-Level Hashing - 4x4 grid for crop resistance
- Semantic Embeddings - CLIP-style vector representations via external providers or local ONNX models
- SIMD Acceleration - AVX2/NEON optimized resizing
- Parallel Processing - Multi-core batch operations
- Zero-Copy APIs - Minimal allocations in hot paths
- Serde Support - JSON/binary serialization
- Security Hardened - OOM protection (8192px max), no panics on malformed input
- Multiple Formats - PNG, JPEG, GIF, WebP, BMP
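The parallel-batch feature above can be sketched with std threads. This is a sketch only: the crate uses rayon, and `cheap_hash` (FNV-1a) is a stand-in for the real SHA256/pHash pipeline; `batch` and its signature are illustrative assumptions.

```rust
// Sketch of multi-core batch hashing with std threads. The crate itself uses
// rayon; `cheap_hash` here is a placeholder for the real fingerprint pipeline.
use std::thread;

// Placeholder hash (FNV-1a), not the library's SHA256/pHash pipeline.
fn cheap_hash(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0xcbf2_9ce4_8422_2325u64, |h, &b| {
        (h ^ b as u64).wrapping_mul(0x0000_0100_0000_01b3)
    })
}

// Split the batch across worker threads, preserving input order.
fn batch(images: &[Vec<u8>], workers: usize) -> Vec<u64> {
    let chunk = images.len().div_ceil(workers).max(1);
    let mut out = Vec::with_capacity(images.len());
    thread::scope(|s| {
        let handles: Vec<_> = images
            .chunks(chunk)
            .map(|c| {
                s.spawn(move || c.iter().map(|img| cheap_hash(img)).collect::<Vec<u64>>())
            })
            .collect();
        for h in handles {
            out.extend(h.join().unwrap());
        }
    });
    out
}

fn main() {
    let images: Vec<Vec<u8>> = (0..10).map(|i| vec![i as u8; 64]).collect();
    // Parallel result matches a sequential pass, in the same order.
    let sequential: Vec<u64> = images.iter().map(|img| cheap_hash(img)).collect();
    assert_eq!(batch(&images, 4), sequential);
}
```

Scoped threads let each worker borrow its slice of the batch without `Arc` or cloning, which mirrors the zero-copy goal stated above.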
## Installation
```toml
[dependencies]
imgfprint = "0.1.3"
```
### Feature Flags
| Feature | Default | Description |
|---|---|---|
| `serde` | Yes | Serialization support (JSON, binary) |
| `parallel` | Yes | Parallel batch processing with rayon |
| `local-embedding` | No | Local ONNX model inference for semantic embeddings |
Minimal build (no parallel processing):
```toml
[dependencies]
imgfprint = { version = "0.1.3", default-features = false }
```
With local embeddings (requires ONNX model):
```toml
[dependencies]
imgfprint = { version = "0.1.3", features = ["local-embedding"] }
```
## Quick Start
```rust
use imgfprint::ImageFingerprinter;
```
## Documentation
For complete API reference and usage examples, see USAGE.md.
## Architecture

### Fingerprint Structure (168 bytes)
```text
ImageFingerprint
├── exact: [u8; 32]          // SHA256 of original bytes
├── global_phash: u64        // DCT-based hash (center 32x32)
└── block_hashes: [u64; 16]  // DCT hashes (4x4 grid, 64x64 each)
```
### Algorithm Pipeline
- Decode - Parse any supported format (PNG, JPEG, GIF, WebP, BMP) into RGB
- Normalize - Resize to 256x256 using SIMD-accelerated Lanczos3 filter
- Convert - RGB to Grayscale (luminance)
- Global Hash - Extract center 32x32, DCT, pHash
- Block Hashes - Split into 4x4 grid (64x64 blocks), DCT, 16 pHashes
- Exact Hash - SHA256 of original bytes
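The global-hash step (step 4) can be sketched in plain Rust. This is an illustrative sketch, not the library's implementation: it uses a naive O(N⁴) 2-D DCT-II and a median threshold over the 8x8 low-frequency block, whereas the crate uses a SIMD-optimized transform and may select coefficients differently.

```rust
// Illustrative pHash over a 32x32 grayscale tile (assumption: the crate's
// actual DCT and thresholding details may differ).
const N: usize = 32;

// Naive 2-D DCT-II; the library uses an optimized transform.
fn dct_2d(pixels: &[f64; N * N]) -> [f64; N * N] {
    let mut out = [0.0; N * N];
    for u in 0..N {
        for v in 0..N {
            let mut sum = 0.0;
            for x in 0..N {
                for y in 0..N {
                    sum += pixels[x * N + y]
                        * ((2 * x + 1) as f64 * u as f64 * std::f64::consts::PI
                            / (2.0 * N as f64))
                            .cos()
                        * ((2 * y + 1) as f64 * v as f64 * std::f64::consts::PI
                            / (2.0 * N as f64))
                            .cos();
                }
            }
            out[u * N + v] = sum;
        }
    }
    out
}

// Keep the 8x8 low-frequency block, threshold against its median -> 64 bits.
fn phash(pixels: &[f64; N * N]) -> u64 {
    let coeffs = dct_2d(pixels);
    let low: Vec<f64> = (0..8)
        .flat_map(|u| (0..8).map(move |v| coeffs[u * N + v]))
        .collect();
    let mut sorted = low.clone();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = (sorted[31] + sorted[32]) / 2.0;
    low.iter()
        .enumerate()
        .fold(0u64, |h, (i, &c)| if c > median { h | (1 << i) } else { h })
}

fn main() {
    // A horizontal gradient as a stand-in for the normalized center tile.
    let mut tile = [0.0; N * N];
    for x in 0..N {
        for y in 0..N {
            tile[x * N + y] = y as f64 / N as f64;
        }
    }
    // Deterministic: hashing the same tile twice gives the same value.
    assert_eq!(phash(&tile), phash(&tile));
    println!("{:016x}", phash(&tile));
}
```

Because the hash depends only on the sign of each coefficient relative to the median, small brightness or compression changes that perturb coefficient magnitudes without flipping their rank tend to leave the hash unchanged.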
### Similarity Computation
The similarity score is a weighted combination:
- 40% Global perceptual hash similarity
- 60% Block-level hash similarity (crop-resistant)
Block distances > 32 are filtered (handles cropping by ignoring missing blocks).
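The weighting and filtering above can be sketched with plain Hamming distances. The struct mirrors the documented fingerprint layout (the SHA256 field is omitted here), but the field and function names are illustrative assumptions, not the crate's actual API.

```rust
// Illustrative similarity combining a global pHash with 16 block hashes,
// using the 40/60 weighting and the >32 block-distance filter described above.
// Names are assumptions, not the crate's actual API.
struct Fingerprint {
    global_phash: u64,
    block_hashes: [u64; 16],
}

fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

fn similarity(a: &Fingerprint, b: &Fingerprint) -> f64 {
    // Global: 64-bit Hamming distance mapped onto [0, 1].
    let global = 1.0 - hamming(a.global_phash, b.global_phash) as f64 / 64.0;

    // Blocks: ignore pairs with distance > 32 (e.g. cropped-away regions).
    let kept: Vec<u32> = a
        .block_hashes
        .iter()
        .zip(&b.block_hashes)
        .map(|(x, y)| hamming(*x, *y))
        .filter(|&d| d <= 32)
        .collect();
    let block = if kept.is_empty() {
        0.0
    } else {
        1.0 - kept.iter().sum::<u32>() as f64 / (64.0 * kept.len() as f64)
    };

    0.4 * global + 0.6 * block
}

fn main() {
    let fp = Fingerprint {
        global_phash: 0xDEAD_BEEF,
        block_hashes: [7; 16],
    };
    // Identical fingerprints score a perfect 1.0.
    assert!((similarity(&fp, &fp) - 1.0).abs() < 1e-12);
}
```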
## Performance
Benchmarked on AMD Ryzen 9 5900X (single core unless noted):
| Operation | Time | Throughput |
|---|---|---|
| `fingerprint()` | 1.35 ms | ~740 images/sec |
| `compare()` | 385 ns | ~2.6M comparisons/sec |
| `batch()` (10 images) | 6.16 ms | ~1,620 images/sec (parallel) |
| `semantic_similarity()` | ~500 ns | ~2M comparisons/sec |
Run benchmarks:

```sh
cargo bench
```
## Memory Safety
- Maximum image dimension: 8192x8192 (OOM protection)
- Dimension check before full decode
- Pre-allocated buffers in context API
- Zero-copy where possible
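The pre-decode dimension check can be illustrated for PNG input: the width and height sit at fixed offsets in the IHDR chunk, so they can be validated before any pixel data is decoded. This is a sketch, not the crate's code; `png_dimensions`, `within_limits`, and the PNG-only scope are assumptions for illustration.

```rust
// Illustrative pre-decode dimension check for PNG input. The crate performs
// an equivalent header probe across formats; this sketch handles PNG only.
const MAX_DIM: u32 = 8192;

fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    // PNG signature (8 bytes) + IHDR chunk length/type (8 bytes) precede
    // the 4-byte big-endian width and height.
    const SIG: [u8; 8] = [0x89, b'P', b'N', b'G', b'\r', b'\n', 0x1A, b'\n'];
    if bytes.len() < 24 || bytes[..8] != SIG || &bytes[12..16] != b"IHDR" {
        return None;
    }
    let w = u32::from_be_bytes(bytes[16..20].try_into().ok()?);
    let h = u32::from_be_bytes(bytes[20..24].try_into().ok()?);
    Some((w, h))
}

fn within_limits(bytes: &[u8]) -> bool {
    matches!(png_dimensions(bytes), Some((w, h)) if w <= MAX_DIM && h <= MAX_DIM)
}

fn main() {
    // Minimal synthetic header for a 100x50 PNG (not a complete file).
    let mut hdr: Vec<u8> = vec![0x89, b'P', b'N', b'G', b'\r', b'\n', 0x1A, b'\n'];
    hdr.extend_from_slice(&13u32.to_be_bytes()); // IHDR data length
    hdr.extend_from_slice(b"IHDR");
    hdr.extend_from_slice(&100u32.to_be_bytes()); // width
    hdr.extend_from_slice(&50u32.to_be_bytes()); // height
    assert_eq!(png_dimensions(&hdr), Some((100, 50)));
    assert!(within_limits(&hdr));
}
```

Rejecting oversized images from 24 header bytes means a hostile multi-gigapixel file never reaches the allocator, which is the point of the OOM limit above.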
## Security
- OOM Protection: Maximum image size 8192x8192 pixels (configurable)
- Deterministic Output: Same input always produces same output
- No Panics: All error conditions return `Result`
- Constant-Time Hashing: SHA256 computation
- Input Validation: Comprehensive format and size validation
## Comparison with Alternatives
| Feature | imgfprint-rs | imagehash | img_hash |
|---|---|---|---|
| SHA256 exact | Yes | No | No |
| pHash | Yes | Yes | Yes |
| Block hashes | Yes | No | No |
| Semantic embeddings | Yes | No | No |
| Local ONNX inference | Yes | No | No |
| Parallel batch | Yes | No | No |
| SIMD acceleration | Yes | No | No |
| Context API | Yes | No | No |
## Examples
See the examples/ directory for complete working examples:
- `batch_process.rs` - Process millions of images efficiently
- `compare_images.rs` - Compare two images and show similarity
- `find_duplicates.rs` - Find duplicate images in a directory
- `serialize.rs` - Serialize/deserialize fingerprints to JSON and binary
- `similarity_search.rs` - Perceptual similarity search in a directory
- `semantic_search.rs` - Content-based image search with CLIP embeddings (requires `local-embedding` feature)
Run an example:

```sh
# Compare two images
cargo run --example compare_images -- <image1> <image2>

# Semantic search with local CLIP model (requires local-embedding feature)
cargo run --features local-embedding --example semantic_search
```
## Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Run tests: `cargo test`
- Run clippy: `cargo clippy --all-targets -- -D warnings`
- Run benchmarks: `cargo bench`
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
### Development Setup

```sh
# Clone
git clone <repository-url>
cd imgfprint

# Run tests
cargo test

# Run with all features
cargo test --all-features

# Generate documentation
cargo doc --open
```
## License
MIT License - See LICENSE file for details.