imgfprint
High-performance image fingerprinting library for Rust with perceptual hashing, exact matching, and semantic embeddings.
Overview
imgfprint provides multiple complementary approaches to image identification and similarity detection:
| Method | Use Case | Speed | Precision |
|---|---|---|---|
| SHA256 | Exact deduplication | ~1ms | 100% exact |
| pHash | Perceptual similarity | ~1-5ms | Resilient to compression, resizing |
| Semantic | Content understanding | Local or API | Captures visual meaning |
Perfect for:
- Duplicate image detection
- Similarity search
- Content moderation
- Image deduplication at scale
- Content-based image retrieval (CBIR)
Features
- Deterministic Output - Same input always produces same fingerprint
- SHA256 Exact Hash - Byte-identical detection
- pHash Perceptual Hash - DCT-based similarity (resilient to compression, resizing, minor edits)
- Block-Level Hashing - 4×4 grid for crop resistance
- Semantic Embeddings - CLIP-style vector representations via external providers or local ONNX models
- SIMD Acceleration - AVX2/NEON optimized resizing
- Parallel Processing - Multi-core batch operations
- Zero-Copy APIs - Minimal allocations in hot paths
- Serde Support - JSON/binary serialization
- Security Hardened - OOM protection (8192px max), no panics on malformed input
- Multiple Formats - PNG, JPEG, GIF, WebP, BMP
Installation
[dependencies]
imgfprint = "0.1"
Feature Flags
| Feature | Default | Description |
|---|---|---|
| serde | ✅ | Serialization support (JSON, binary) |
| parallel | ✅ | Parallel batch processing with rayon |
| local-embedding | ❌ | Local ONNX model inference for semantic embeddings |
Minimal build (no parallel processing):
[dependencies]
imgfprint = { version = "0.1", default-features = false }
With local embeddings (requires ONNX model):
[dependencies]
imgfprint = { version = "0.1", features = ["local-embedding"] }
Quick Start
Basic Fingerprinting
use imgfprint::ImageFingerprinter;
Semantic Embeddings
use ;
// Implement your provider (OpenAI, HuggingFace, local model, etc.)
;
Local Embeddings (ONNX)
With the local-embedding feature, you can run CLIP models locally without external APIs:
use ;
API Reference
Creating Fingerprints
Single Image
use imgfprint::ImageFingerprinter;
let fp = fingerprint?;
Batch Processing
Process thousands of images efficiently with automatic parallelization:
use imgfprint::ImageFingerprinter;
let images: = vec!;
let results = fingerprint_batch;
for in results
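To illustrate the idea behind parallel batch processing, here is a minimal standard-library sketch that splits a slice into chunks and processes each chunk on its own scoped thread. It is not the library's implementation (which the feature table says uses rayon); `process_batch` and its `workers` parameter are hypothetical names chosen for this example.

```rust
use std::thread;

/// Hypothetical helper: applies `f` to every item using up to `workers`
/// scoped threads and returns the results in input order.
pub fn process_batch<T, R, F>(items: &[T], workers: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if items.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so every item lands in exactly one chunk.
    let chunk_len = (items.len() + workers - 1) / workers;
    let f = &f;
    thread::scope(|scope| {
        // Spawn all workers first so the chunks run concurrently.
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
            .collect();
        // Joining in spawn order preserves the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

In a real batch, `f` would be the per-image fingerprinting call and `R` its `Result`, so one failing image does not abort the rest.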
High-Throughput Context
For sustained high-throughput scenarios, use FingerprinterContext to enable buffer reuse:
use imgfprint::FingerprinterContext;
let mut ctx = new;
for path in &image_paths
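The benefit of a reusable context is that scratch buffers are allocated once and then recycled for every image. The sketch below shows that pattern in isolation; `ScratchContext` and its methods are hypothetical and say nothing about how FingerprinterContext is actually laid out.

```rust
/// Hypothetical scratch context: owns one buffer and reuses it per image.
pub struct ScratchContext {
    gray: Vec<u8>,
}

impl ScratchContext {
    pub fn new() -> Self {
        // Room for one 256×256 grayscale image, allocated once up front.
        Self { gray: Vec::with_capacity(256 * 256) }
    }

    /// Current capacity of the reused buffer, in bytes.
    pub fn capacity(&self) -> usize {
        self.gray.capacity()
    }

    /// Converts interleaved RGB bytes to grayscale inside the reused buffer.
    pub fn to_gray(&mut self, rgb: &[u8]) -> &[u8] {
        // `clear` drops the old contents but keeps the allocation.
        self.gray.clear();
        self.gray.extend(rgb.chunks_exact(3).map(|p| {
            // Integer BT.601 luma weights (77 + 150 + 29 = 256).
            ((77 * p[0] as u32 + 150 * p[1] as u32 + 29 * p[2] as u32) >> 8) as u8
        }));
        &self.gray
    }
}
```

As long as each input fits the initial capacity, repeated calls to `to_gray` perform no further heap allocation.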
Accessing Fingerprint Components
// Exact SHA256 hash (32 bytes)
let exact: &[u8; 32] = fp.exact_hash();
// Global perceptual hash (center 32×32 region)
let global: u64 = fp.global_phash();
// Block hashes (16 regions in 4×4 grid)
let blocks: &[u64; 16] = fp.block_hashes();
Comparing Fingerprints
Using Similarity Scores
use imgfprint::ImageFingerprinter;
let sim = compare;
println!; // 0.0 - 1.0
println!; // true/false
println!; // 0-64
Direct Distance Methods
// Hamming distance between global hashes (0-64)
let dist = fp1.distance;
// Check against threshold
if fp1.is_similar
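For reference, the Hamming distance between two 64-bit hashes is the number of bit positions in which they differ. A minimal sketch of the computation and of a threshold check follows; the function names are illustrative and are not taken from the crate.

```rust
/// Number of differing bits between two 64-bit hashes (0 to 64).
pub fn hamming_distance(a: u64, b: u64) -> u32 {
    // XOR leaves a 1 wherever the hashes disagree; count those bits.
    (a ^ b).count_ones()
}

/// True when the two hashes differ in at most `max_distance` bits.
pub fn is_within(a: u64, b: u64, max_distance: u32) -> bool {
    hamming_distance(a, b) <= max_distance
}
```

A distance of 0 means the hashes are identical, and 64 means every bit differs.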
Semantic Embeddings
Creating Embeddings with Custom Provider
use ;
// Your provider implementation
Computing Cosine Similarity
use ;
let emb1 = new?;
let emb2 = new?;
// Returns f32 in range [-1.0, 1.0]
let sim = semantic_similarity?;
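Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below is a plain standard-library version for illustration; `cosine_similarity` is a hypothetical helper, and returning `None` for mismatched or zero-length input is an assumption of this example rather than the crate's error type.

```rust
/// Cosine similarity of two equal-length vectors, in [-1.0, 1.0].
/// Returns `None` for empty input, mismatched dimensions, or a zero vector.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    let denom = norm_a.sqrt() * norm_b.sqrt();
    if denom == 0.0 {
        return None;
    }
    // Clamp to absorb rounding error that could push the value past ±1.
    Some((dot / denom).clamp(-1.0, 1.0))
}
```

Vectors pointing the same way score close to 1.0, orthogonal vectors score 0.0, and opposite vectors score -1.0.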
Using ImageFingerprinter Methods
Alternatively, you can use the convenience methods on ImageFingerprinter:
use ;
;
Architecture
Fingerprint Structure (168 bytes)
ImageFingerprint
├── exact: [u8; 32] // SHA256 of original bytes
├── global_phash: u64 // DCT-based hash (center 32×32)
└── block_hashes: [u64; 16] // DCT hashes (4×4 grid, 64×64 each)
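The 168-byte figure follows from the field sizes: 32 + 8 + 16 × 8 = 168. The sketch below mirrors the fields in a stand-in struct to check that arithmetic; `FingerprintLayout` is a hypothetical type for this example, and the `#[repr(C)]` attribute is an assumption made so the layout is fixed.

```rust
use std::mem::size_of;

/// Hypothetical mirror of the fields listed above, for size checking only.
#[repr(C)]
pub struct FingerprintLayout {
    pub exact: [u8; 32],         // SHA256 digest: 32 bytes
    pub global_phash: u64,       // one 64-bit hash: 8 bytes
    pub block_hashes: [u64; 16], // sixteen 64-bit hashes: 128 bytes
}

/// Total size in bytes; no padding is needed because every field
/// starts at an offset that is a multiple of 8.
pub const FINGERPRINT_SIZE: usize = size_of::<FingerprintLayout>();
```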
Algorithm Pipeline
- Decode - Parse any supported format (PNG, JPEG, GIF, WebP, BMP) into RGB
- Normalize - Resize to 256×256 using SIMD-accelerated Lanczos3 filter
- Convert - RGB → Grayscale (luminance)
- Global Hash - Extract center 32×32 → DCT → pHash
- Block Hashes - Split into 4×4 grid (64×64 blocks) → DCT → 16 pHashes
- Exact Hash - SHA256 of original bytes
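The DCT → pHash step can be sketched as follows. This is one common pHash formulation (keep the 8×8 lowest-frequency DCT coefficients of a 32×32 grayscale tile and threshold them at their median); the source does not specify which variant the library uses, so the coefficient count, the median rule, and the name `dct_phash` are all assumptions.

```rust
use std::f64::consts::PI;

const N: usize = 32; // side length of the grayscale input tile
const K: usize = 8; // side length of the low-frequency block that is kept

/// 64-bit DCT hash of a 32×32 grayscale tile stored row-major.
pub fn dct_phash(gray: &[f64; N * N]) -> u64 {
    // basis[u][x] = cos((2x + 1) · u · π / 2N) for the K lowest frequencies.
    let mut basis = [[0.0f64; N]; K];
    for u in 0..K {
        for x in 0..N {
            basis[u][x] = ((2 * x + 1) as f64 * u as f64 * PI / (2 * N) as f64).cos();
        }
    }
    // Separable, unnormalized 2D DCT-II: transform rows, then columns.
    let mut rows = [[0.0f64; K]; N];
    for y in 0..N {
        for u in 0..K {
            rows[y][u] = (0..N).map(|x| gray[y * N + x] * basis[u][x]).sum();
        }
    }
    let mut coeffs = [0.0f64; K * K];
    for v in 0..K {
        for u in 0..K {
            coeffs[v * K + u] = (0..N).map(|y| rows[y][u] * basis[v][y]).sum();
        }
    }
    // Median of the coefficients, skipping the DC term at index 0.
    let mut sorted = coeffs[1..].to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let median = sorted[sorted.len() / 2];
    // Bit i is set when coefficient i exceeds the median.
    coeffs
        .iter()
        .enumerate()
        .fold(0u64, |hash, (i, &c)| if c > median { hash | (1u64 << i) } else { hash })
}
```

Because the hash depends only on how each coefficient compares to the median, scaling every pixel by the same positive factor leaves it unchanged, which is part of why this family of hashes tolerates brightness and compression changes.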
Similarity Computation
The similarity score is a weighted combination:
- 40% Global perceptual hash similarity
- 60% Block-level hash similarity (crop-resistant)
Block pairs whose Hamming distance exceeds 32 are excluded from the block score, so regions lost to cropping do not drag the overall score down.
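A minimal sketch of the weighting described above, under stated assumptions: each hash pair is scored as 1 - distance / 64, the block score is the mean over the block pairs that pass the distance filter, and the block score is 0 when no pair passes. The source gives only the weights and the cutoff, so these details and the name `combined_similarity` are illustrative.

```rust
const GLOBAL_WEIGHT: f32 = 0.4;
const BLOCK_WEIGHT: f32 = 0.6;
const MAX_BLOCK_DISTANCE: u32 = 32;

/// Similarity of two 64-bit hashes: 1.0 when identical, 0.0 when all bits differ.
fn hash_similarity(a: u64, b: u64) -> f32 {
    1.0 - (a ^ b).count_ones() as f32 / 64.0
}

/// Weighted score in [0.0, 1.0]: 40% global hash, 60% filtered block hashes.
pub fn combined_similarity(
    global_a: u64,
    blocks_a: &[u64; 16],
    global_b: u64,
    blocks_b: &[u64; 16],
) -> f32 {
    let global = hash_similarity(global_a, global_b);
    // Keep only block pairs that are at most 32 bits apart.
    let kept: Vec<f32> = blocks_a
        .iter()
        .zip(blocks_b)
        .filter(|(a, b)| (**a ^ **b).count_ones() <= MAX_BLOCK_DISTANCE)
        .map(|(a, b)| hash_similarity(*a, *b))
        .collect();
    // Assumption: with no surviving blocks, the block term contributes nothing.
    let block = if kept.is_empty() {
        0.0
    } else {
        kept.iter().sum::<f32>() / kept.len() as f32
    };
    GLOBAL_WEIGHT * global + BLOCK_WEIGHT * block
}
```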
Embedding Validation
Embeddings are validated on creation:
- ✅ Non-empty vector
- ✅ All values finite (no NaN or infinity)
- ✅ Consistent dimensions for comparison
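The checks listed above can be sketched with the standard library alone. `EmbeddingError`, `validate_embedding`, and `same_dimension` are hypothetical names for this example; the crate's own error type is not shown in this document.

```rust
/// Hypothetical error type for the two creation-time checks.
#[derive(Debug, PartialEq)]
pub enum EmbeddingError {
    Empty,
    NonFinite { index: usize },
}

/// Rejects empty vectors and vectors containing NaN or infinity.
pub fn validate_embedding(values: &[f32]) -> Result<(), EmbeddingError> {
    if values.is_empty() {
        return Err(EmbeddingError::Empty);
    }
    // Report the position of the first NaN or infinite value, if any.
    match values.iter().position(|v| !v.is_finite()) {
        Some(index) => Err(EmbeddingError::NonFinite { index }),
        None => Ok(()),
    }
}

/// Comparison-time check: both embeddings must have the same length.
pub fn same_dimension(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len()
}
```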
Performance
Benchmarked on AMD Ryzen 9 5900X (single core unless noted):
| Operation | Time | Throughput |
|---|---|---|
| fingerprint() | 1.35ms | ~740 images/sec |
| compare() | 385ns | ~2.6M comparisons/sec |
| batch() (10 images) | 6.16ms | ~1,620 images/sec (parallel) |
| semantic_similarity() | ~500ns | ~2M comparisons/sec |
Note: semantic_similarity() performance depends on embedding dimension (typically 512-1024). The figure above is for 512-dim embeddings.
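Throughput for a single-threaded operation is the reciprocal of its latency, so the latency figures can be converted directly. The helpers below are an illustrative sketch with hypothetical names: 1.35 ms per call works out to roughly 740 calls per second, 385 ns to roughly 2.6 million, and 10 images in 6.16 ms to roughly 1,620 images per second.

```rust
/// Operations per second for a single operation taking `latency_ns` nanoseconds.
pub fn ops_per_second(latency_ns: f64) -> f64 {
    1e9 / latency_ns
}

/// Items per second when `items` are processed in `total_ns` nanoseconds overall.
pub fn items_per_second(items: u32, total_ns: f64) -> f64 {
    items as f64 * 1e9 / total_ns
}
```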
Run benchmarks with cargo bench.
Memory Safety
- Maximum image dimension: 8192×8192 (OOM protection)
- Dimension check before full decode
- Pre-allocated buffers in context API
- Zero-copy where possible
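Checking dimensions before a full decode means reading only the header. As an illustration, the sketch below reads width and height from a PNG's IHDR chunk, which per the PNG format follows the 8-byte signature, and rejects anything over 8192 pixels on a side. The function and error names are hypothetical, and the library's actual check covers other formats and may be structured differently.

```rust
const MAX_DIMENSION: u32 = 8192;
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Hypothetical error type for the header check.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    NotPng,
    TooLarge { width: u32, height: u32 },
}

/// Reads (width, height) from the PNG header without decoding pixel data.
pub fn check_png_dimensions(bytes: &[u8]) -> Result<(u32, u32), HeaderError> {
    // Layout: signature (8) + chunk length (4) + "IHDR" (4) + width (4) + height (4).
    if bytes.len() < 24 || bytes[..8] != PNG_SIGNATURE || bytes[12..16] != *b"IHDR" {
        return Err(HeaderError::NotPng);
    }
    // PNG stores both dimensions as big-endian 32-bit integers.
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    if width > MAX_DIMENSION || height > MAX_DIMENSION {
        return Err(HeaderError::TooLarge { width, height });
    }
    Ok((width, height))
}
```

Only 24 bytes are inspected, so an oversized image is rejected before any pixel buffer is allocated.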
Error Handling
use ;
match fingerprint
Serialization
JSON
use imgfprint::ImageFingerprinter;
use serde_json;
let fp = fingerprint?;
// Serialize
let json = serde_json::to_string(&fp)?;
// Deserialize
let fp: ImageFingerprint = serde_json::from_str(&json)?;
Binary
use bincode;
// Serialize
let bytes = bincode::serialize(&fp)?;
// Deserialize
let fp: ImageFingerprint = bincode::deserialize(&bytes)?;
Security
- OOM Protection: Maximum image size 8192×8192 pixels (configurable)
- Deterministic Output: Same input always produces same output
- No Panics: All error conditions return Result
- Constant-Time Hashing: SHA256 computation
- Input Validation: Comprehensive format and size validation
Comparison with Alternatives
| Feature | imgfprint-rs | imagehash | img_hash |
|---|---|---|---|
| SHA256 exact | ✅ | ❌ | ❌ |
| pHash | ✅ | ✅ | ✅ |
| Block hashes | ✅ | ❌ | ❌ |
| Semantic embeddings | ✅ | ❌ | ❌ |
| Local ONNX inference | ✅ | ❌ | ❌ |
| Parallel batch | ✅ | ❌ | ❌ |
| SIMD acceleration | ✅ | ❌ | ❌ |
| Context API | ✅ | ❌ | ❌ |
Examples
See the examples/ directory for complete working examples:
- batch_process.rs - Process millions of images efficiently
- compare_images.rs - Compare two images and show similarity
- find_duplicates.rs - Find duplicate images in a directory
- serialize.rs - Serialize/deserialize fingerprints to JSON and binary
- similarity_search.rs - Perceptual similarity search in a directory
- semantic_search.rs - Content-based image search with CLIP embeddings (requires local-embedding feature)
Run an example:
# Compare two images
# Semantic search with local CLIP model (requires local-embedding feature)
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Run tests: cargo test
- Run clippy: cargo clippy --all-targets -- -D warnings
- Run benchmarks: cargo bench
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Setup
# Clone
# Run tests
# Run with all features
# Generate documentation