Image deduplication: identifies duplicate images across pages by content hashing, reducing output size and the processing work spent on identical images.
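The `fingerprint` function in this module is documented as using the FNV-1a hash. Below is a minimal sketch of 64-bit FNV-1a. The 64-bit width and the name `fnv1a_64` are assumptions for illustration; the real signature of `fingerprint` and the layout of `ImageFingerprint` are not shown on this page.

```rust
/// Standard 64-bit FNV-1a parameters (offset basis and prime).
/// Assumption: the module uses the 64-bit variant.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hypothetical helper: FNV-1a over raw image bytes.
/// For each byte, XOR it into the hash first, then multiply by the prime
/// (wrapping on overflow). XOR-then-multiply is what distinguishes
/// FNV-1a from the older FNV-1.
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}
```

FNV-1a is fast and simple but not collision-resistant. Equal fingerprints strongly suggest identical bytes but do not prove it.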
Structs
- DeduplicatedImage - Information about a deduplicated image.
- ImageFingerprint - An image fingerprint for deduplication.
- ImageRef - An image reference before deduplication.
Functions
- deduplicate - Deduplicate a list of image references, grouping identical images.
- duplicate_count - Count how many images are duplicates (total - unique).
- fingerprint - Compute a fingerprint for image data using the FNV-1a hash.
- savings_ratio - Compute the deduplication savings ratio (0.0 = no savings, 1.0 = all duplicates).
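The functions above can be sketched as follows: references are grouped by fingerprint, duplicates are counted as total minus unique, and a ratio is derived from that count. This is a hypothetical illustration, not the crate's API. The struct fields (`page`, `fingerprint`, `pages`) and all signatures are assumptions. The sketch also assumes fingerprints are precomputed. The exact formula behind `savings_ratio` is not documented here; this version assumes duplicates divided by total references.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for `ImageRef`: where an image appears,
/// plus its precomputed content fingerprint.
#[derive(Debug, Clone)]
struct ImageRef {
    page: usize,
    fingerprint: u64,
}

/// Hypothetical stand-in for `DeduplicatedImage`: one unique image
/// and every page that references it.
#[derive(Debug)]
struct DeduplicatedImage {
    fingerprint: u64,
    pages: Vec<usize>,
}

/// Group references sharing a fingerprint, keeping first-seen order.
fn deduplicate(refs: &[ImageRef]) -> Vec<DeduplicatedImage> {
    // Maps fingerprint -> index of its group in `out`.
    let mut index: HashMap<u64, usize> = HashMap::new();
    let mut out: Vec<DeduplicatedImage> = Vec::new();
    for r in refs {
        match index.get(&r.fingerprint).copied() {
            // Seen before: record another page for the existing group.
            Some(i) => out[i].pages.push(r.page),
            // First occurrence: start a new group.
            None => {
                index.insert(r.fingerprint, out.len());
                out.push(DeduplicatedImage {
                    fingerprint: r.fingerprint,
                    pages: vec![r.page],
                });
            }
        }
    }
    out
}

/// Duplicates = total references - unique fingerprints.
fn duplicate_count(refs: &[ImageRef]) -> usize {
    let unique: HashSet<u64> = refs.iter().map(|r| r.fingerprint).collect();
    refs.len() - unique.len()
}

/// Assumed formula: duplicates / total, with 0.0 for an empty list.
fn savings_ratio(refs: &[ImageRef]) -> f64 {
    if refs.is_empty() {
        return 0.0;
    }
    duplicate_count(refs) as f64 / refs.len() as f64
}
```

Consider four references where pages 1, 3 and 4 share a fingerprint and page 2 differs. These collapse into two unique images, giving a duplicate count of 2 and a savings ratio of 0.5.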