Module image_dedup

Image deduplication: identifies duplicate images across pages by hashing their content, to reduce output size and avoid processing the same image more than once.

Structs§

DeduplicatedImage
Information about a deduplicated image.
ImageFingerprint
An image fingerprint for deduplication.
ImageRef
An image reference before deduplication.

Functions§

deduplicate
Deduplicate a list of image references, grouping identical images.
duplicate_count
Count how many images are duplicates (total - unique).
fingerprint
Compute a fingerprint for image data using the FNV-1a hash.
savings_ratio
Compute deduplication savings ratio (0.0 = no savings, 1.0 = all duplicates).
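
The function summaries above can be illustrated with a minimal sketch. This page does not show the real signatures, so every name below (`fnv1a_64`, `group_by_fingerprint`, `duplicates`, `savings`) is a hypothetical stand-in rather than the module's API. The sketch assumes the 64-bit FNV-1a variant with its standard published constants, and assumes the savings ratio is the duplicate count divided by the total count; the module may instead use another width or weight by byte size.

```rust
use std::collections::HashMap;

// Standard 64-bit FNV-1a constants. The hash width is an assumption;
// this page only says "FNV-1a".
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hypothetical stand-in for `fingerprint`: FNV-1a over the raw bytes.
/// FNV-1a XORs each byte into the state first, then multiplies by the prime.
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical stand-in for `deduplicate`: maps each fingerprint to the
/// indices of the images that share it, in input order.
fn group_by_fingerprint(images: &[&[u8]]) -> HashMap<u64, Vec<usize>> {
    let mut groups: HashMap<u64, Vec<usize>> = HashMap::new();
    for (index, data) in images.iter().enumerate() {
        groups.entry(fnv1a_64(data)).or_default().push(index);
    }
    groups
}

/// Hypothetical stand-in for `duplicate_count`: total minus unique.
fn duplicates(total: usize, unique: usize) -> usize {
    total.saturating_sub(unique)
}

/// Hypothetical stand-in for `savings_ratio`, assuming duplicates / total.
/// Returns 0.0 for an empty input to avoid dividing by zero.
fn savings(total: usize, unique: usize) -> f64 {
    if total == 0 {
        0.0
    } else {
        duplicates(total, unique) as f64 / total as f64
    }
}
```

With three images of which two are byte-identical, this sketch yields two groups, one duplicate, and a ratio of one third. FNV-1a is fast but not collision-resistant, so a fingerprint match alone does not prove two images are identical; comparing lengths or the bytes themselves on a match is a common safeguard, though this page does not say whether the module does so.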