imgfprint

High-performance image fingerprinting library for Rust with perceptual hashing, exact matching, and semantic embeddings.

Overview

imgfprint provides multiple complementary approaches to image identification and similarity detection:

Method	Use Case	Speed	Precision
SHA256	Exact deduplication	~1ms	100% exact
pHash	Perceptual similarity	~1-5ms	Resilient to compression, resizing
Semantic	Content understanding	Local or API	Captures visual meaning

Perfect for:

Duplicate image detection
Similarity search
Content moderation
Image deduplication at scale
Content-based image retrieval (CBIR)

Features

Deterministic Output - Same input always produces same fingerprint
SHA256 Exact Hash - Byte-identical detection
pHash Perceptual Hash - DCT-based similarity (resilient to compression, resizing, minor edits)
Block-Level Hashing - 4×4 grid for crop resistance
Semantic Embeddings - CLIP-style vector representations via external providers or local ONNX models
SIMD Acceleration - AVX2/NEON optimized resizing
Parallel Processing - Multi-core batch operations
Zero-Copy APIs - Minimal allocations in hot paths
Serde Support - JSON/binary serialization
Security Hardened - OOM protection (8192px max), no panics on malformed input
Multiple Formats - PNG, JPEG, GIF, WebP, BMP

Installation

[dependencies]
imgfprint = "0.1"

Feature Flags

Feature	Default	Description
`serde`	✅	Serialization support (JSON, binary)
`parallel`	✅	Parallel batch processing with rayon
`local-embedding`	❌	Local ONNX model inference for semantic embeddings

Minimal build (no parallel processing):

[dependencies]
imgfprint = { version = "0.1", default-features = false }

With local embeddings (requires ONNX model):

[dependencies]
imgfprint = { version = "0.1", features = ["local-embedding"] }

Quick Start

Basic Fingerprinting

use imgfprint::ImageFingerprinter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let img1 = std::fs::read("photo1.jpg")?;
    let img2 = std::fs::read("photo2.jpg")?;
    
    let fp1 = ImageFingerprinter::fingerprint(&img1)?;
    let fp2 = ImageFingerprinter::fingerprint(&img2)?;
    
    let sim = ImageFingerprinter::compare(&fp1, &fp2);
    
    println!("Similarity: {:.2}", sim.score);
    println!("Exact match: {}", sim.exact_match);
    
    if sim.score > 0.8 {
        println!("Images are perceptually similar");
    }
    
    Ok(())
}

Semantic Embeddings

use imgfprint::{EmbeddingProvider, Embedding, ImgFprintError};

// Implement your provider (OpenAI, HuggingFace, local model, etc.)
struct MyClipProvider;

impl EmbeddingProvider for MyClipProvider {
    fn embed(&self, image: &[u8]) -> Result<Embedding, ImgFprintError> {
        // Call your embedding API or local model
        // Return a Vec<f32> with 512-1024 dimensions
        let vector = call_your_embedding_api(image)?;
        Embedding::new(vector)
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = MyClipProvider;
    
    let img1 = std::fs::read("cat.jpg")?;
    let img2 = std::fs::read("dog.jpg")?;
    
    let emb1 = provider.embed(&img1)?;
    let emb2 = provider.embed(&img2)?;
    
    // Cosine similarity in range [-1.0, 1.0]
    let similarity = imgfprint::semantic_similarity(&emb1, &emb2)?;
    
    println!("Semantic similarity: {:.4}", similarity);
    // 1.0 = identical content
    // 0.0 = unrelated content
    // -1.0 = opposite content (rare with CLIP)
    
    Ok(())
}

Local Embeddings (ONNX)

With the local-embedding feature, you can run CLIP models locally without external APIs:

use imgfprint::{LocalProvider, LocalProviderConfig, EmbeddingProvider};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load a CLIP model in ONNX format
    let provider = LocalProvider::from_file("clip-vit-base-patch32.onnx")?;
    
    // Or with custom configuration
    let provider = LocalProvider::from_file_with_config(
        "clip.onnx",
        LocalProviderConfig::clip_vit_large_patch14()
    )?;
    
    // Generate embedding
    let image = std::fs::read("photo.jpg")?;
    let embedding = provider.embed(&image)?;
    
    println!("Generated {}-dimensional embedding", embedding.len());
    
    Ok(())
}

API Reference

Creating Fingerprints

Single Image

use imgfprint::ImageFingerprinter;

let fp = ImageFingerprinter::fingerprint(&image_bytes)?;

Batch Processing

Process thousands of images efficiently with automatic parallelization:

use imgfprint::ImageFingerprinter;

let images: Vec<(String, Vec<u8>)> = vec![
    ("img1.jpg".into(), bytes1),
    ("img2.jpg".into(), bytes2),
    // ... thousands more
];

let results = ImageFingerprinter::fingerprint_batch(&images);

for (id, result) in results {
    match result {
        Ok(fp) => println!("{}: fingerprinted", id),
        Err(e) => println!("{}: error - {}", id, e),
    }
}

High-Throughput Context

For sustained high-throughput scenarios, use FingerprinterContext to enable buffer reuse:

use imgfprint::FingerprinterContext;

let mut ctx = FingerprinterContext::new();

for path in &image_paths {
    let bytes = std::fs::read(path)?;
    let fp = ctx.fingerprint(&bytes)?;
    // Process fingerprint...
}

Accessing Fingerprint Components

// Exact SHA256 hash (32 bytes)
let exact: &[u8; 32] = fp.exact_hash();

// Global perceptual hash (center 32×32 region)
let global: u64 = fp.global_phash();

// Block hashes (16 regions in 4×4 grid)
let blocks: &[u64; 16] = fp.block_hashes();

Comparing Fingerprints

Using Similarity Scores

use imgfprint::ImageFingerprinter;

let sim = ImageFingerprinter::compare(&fp1, &fp2);

println!("Score: {:.2}", sim.score);           // 0.0 - 1.0
println!("Exact match: {}", sim.exact_match);   // true/false
println!("Hamming distance: {}", sim.perceptual_distance); // 0-64

Direct Distance Methods

// Hamming distance between global hashes (0-64)
let dist = fp1.distance(&fp2);

// Check against threshold
if fp1.is_similar(&fp2, 0.85) {
    println!("Images are similar (threshold: 0.85)");
}

Semantic Embeddings

Creating Embeddings with Custom Provider

use imgfprint::{EmbeddingProvider, Embedding, ImgFprintError};

// Your provider implementation
struct HuggingFaceProvider { /* ... */ }

impl EmbeddingProvider for HuggingFaceProvider {
    fn embed(&self, image: &[u8]) -> Result<Embedding, ImgFprintError> {
        // Call HuggingFace Inference API
        let response = self.client.post("https://api-inference.huggingface.co/models/openai/clip-vit-base-patch32")
            .body(image.to_vec())
            .send()?;
        
        let vector: Vec<f32> = response.json()?;
        Embedding::new(vector)
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = HuggingFaceProvider;
    
    let img1 = std::fs::read("cat.jpg")?;
    let img2 = std::fs::read("dog.jpg")?;
    
    let emb1 = provider.embed(&img1)?;
    let emb2 = provider.embed(&img2)?;
    
    // Compare embeddings
    let similarity = imgfprint::semantic_similarity(&emb1, &emb2)?;
    println!("Similarity: {:.4}", similarity);
    
    Ok(())
}

Computing Cosine Similarity

use imgfprint::{semantic_similarity, Embedding};

let emb1 = Embedding::new(vec![0.1, 0.2, 0.3, 0.4])?;
let emb2 = Embedding::new(vec![0.2, 0.3, 0.4, 0.5])?;

// Returns f32 in range [-1.0, 1.0]
let sim = semantic_similarity(&emb1, &emb2)?;

Using ImageFingerprinter Methods

Alternatively, you can use the convenience methods on ImageFingerprinter:

use imgfprint::{ImageFingerprinter, EmbeddingProvider, Embedding, ImgFprintError};

struct MyProvider;

impl EmbeddingProvider for MyProvider {
    fn embed(&self, _image: &[u8]) -> Result<Embedding, ImgFprintError> {
        Embedding::new(vec![0.1, 0.2, 0.3, 0.4])
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = MyProvider;
    
    let img = std::fs::read("photo.jpg")?;
    let embedding = ImageFingerprinter::semantic_embedding(&provider, &img)?;
    
    // Compare two embeddings
    let similarity = ImageFingerprinter::semantic_similarity(&embedding, &embedding)?;
    println!("Similarity: {:.4}", similarity);
    
    Ok(())
}

Architecture

Fingerprint Structure (168 bytes)

ImageFingerprint
├── exact:      [u8; 32]     // SHA256 of original bytes
├── global_phash: u64        // DCT-based hash (center 32×32)
└── block_hashes: [u64; 16]  // DCT hashes (4×4 grid, 64×64 each)

Algorithm Pipeline

Decode - Parse any supported format (PNG, JPEG, GIF, WebP, BMP) into RGB
Normalize - Resize to 256×256 using SIMD-accelerated Lanczos3 filter
Convert - RGB → Grayscale (luminance)
Global Hash - Extract center 32×32 → DCT → pHash
Block Hashes - Split into 4×4 grid (64×64 blocks) → DCT → 16 pHashes
Exact Hash - SHA256 of original bytes

Similarity Computation

The similarity score is a weighted combination:

40% Global perceptual hash similarity
60% Block-level hash similarity (crop-resistant)

Block distances > 32 are filtered (handles cropping by ignoring missing blocks).

Embedding Validation

Embeddings are validated on creation:

✅ Non-empty vector
✅ All values finite (no NaN or infinity)
✅ Consistent dimensions for comparison

Performance

Benchmarked on AMD Ryzen 9 5900X (single core unless noted):

Operation	Time	Throughput
`fingerprint()`	1.35ms	~740 images/sec
`compare()`	385ns	2.6B comparisons/sec
`batch()` (10 images)	6.16ms	1,620 images/sec (parallel)
`semantic_similarity()`	~500ns	2M comparisons/sec

Note: semantic_similarity() performance depends on embedding dimension (typically 512-1024). The above is for 512-dim embeddings.

Run benchmarks:

cargo bench

Memory Safety

Maximum image dimension: 8192×8192 (OOM protection)
Dimension check before full decode
Pre-allocated buffers in context API
Zero-copy where possible

Error Handling

use imgfprint::{ImageFingerprinter, ImgFprintError};

match ImageFingerprinter::fingerprint(&bytes) {
    Ok(fp) => { /* success */ }
    
    // Image-related errors
    Err(ImgFprintError::InvalidImage(msg)) => { /* empty or too large */ }
    Err(ImgFprintError::UnsupportedFormat(msg)) => { /* format not supported */ }
    Err(ImgFprintError::DecodeError(msg)) => { /* corrupted data */ }
    Err(ImgFprintError::ImageTooSmall(msg)) => { /* < 32×32 pixels */ }
    
    // Processing errors
    Err(ImgFprintError::ProcessingError(msg)) => { /* internal error */ }
    
    // Embedding errors (new in 0.2)
    Err(ImgFprintError::EmbeddingDimensionMismatch { expected, actual }) => { /* dimension mismatch */ }
    Err(ImgFprintError::ProviderError(msg)) => { /* external provider failed */ }
    Err(ImgFprintError::InvalidEmbedding(msg)) => { /* empty vector or NaN */ }
}

Serialization

JSON

use imgfprint::ImageFingerprinter;
use serde_json;

let fp = ImageFingerprinter::fingerprint(&bytes)?;

// Serialize
let json = serde_json::to_string(&fp)?;

// Deserialize
let fp: ImageFingerprint = serde_json::from_str(&json)?;

Binary

use bincode;

// Serialize
let bytes = bincode::serialize(&fp)?;

// Deserialize
let fp: ImageFingerprint = bincode::deserialize(&bytes)?;

Security

OOM Protection: Maximum image size 8192×8192 pixels (configurable)
Deterministic Output: Same input always produces same output
No Panics: All error conditions return Result
Constant-Time Hashing: SHA256 computation
Input Validation: Comprehensive format and size validation

Comparison with Alternatives

Feature	imgfprint-rs	imagehash	img_hash
SHA256 exact	✅	❌	❌
pHash	✅	✅	✅
Block hashes	✅	❌	❌
Semantic embeddings	✅	❌	❌
Local ONNX inference	✅	❌	❌
Parallel batch	✅	❌	❌
SIMD acceleration	✅	❌	❌
Context API	✅	❌	❌

Examples

See the examples/ directory for complete working examples:

batch_process.rs - Process millions of images efficiently
compare_images.rs - Compare two images and show similarity
find_duplicates.rs - Find duplicate images in a directory
serialize.rs - Serialize/deserialize fingerprints to JSON and binary
similarity_search.rs - Perceptual similarity search in a directory
semantic_search.rs - Content-based image search with CLIP embeddings (requires local-embedding feature)

Run an example:

# Compare two images
cargo run --example compare_images -- images/photo1.jpg images/photo2.jpg

# Semantic search with local CLIP model (requires local-embedding feature)
cargo run --example semantic_search --features local-embedding -- model.onnx query.jpg ./images 0.85

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Run tests: cargo test
Run clippy: cargo clippy --all-targets -- -D warnings
Run benchmarks: cargo bench
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Setup

# Clone
git clone https://github.com/themankindproject/imgfprint-rs
cd imgfprint-rs

# Run tests
cargo test

# Run with all features
cargo test --all-features

# Generate documentation
cargo doc --no-deps --open

imgfprint 0.1.2

imgfprint

Overview

Features

Installation

Feature Flags

Quick Start

Basic Fingerprinting

Semantic Embeddings

Local Embeddings (ONNX)

API Reference

Creating Fingerprints

Single Image

Batch Processing

High-Throughput Context

Accessing Fingerprint Components

Comparing Fingerprints

Using Similarity Scores

Direct Distance Methods

Semantic Embeddings

Creating Embeddings with Custom Provider

Computing Cosine Similarity

Using ImageFingerprinter Methods

Architecture

Fingerprint Structure (168 bytes)

Algorithm Pipeline

Similarity Computation

Embedding Validation

Performance

Memory Safety

Error Handling

Serialization

JSON

Binary

Security

Comparison with Alternatives

Examples

Contributing

Development Setup