Media deduplication and duplicate detection for OxiMedia.
oximedia-dedup provides comprehensive duplicate detection and media deduplication
for the OxiMedia multimedia framework. This includes:
- Cryptographic hashing: BLAKE3-based exact duplicate detection
- Visual similarity: Perceptual hashing, SSIM, histogram, and feature matching
- Audio fingerprinting: Audio fingerprint comparison and waveform similarity
- Metadata matching: Fuzzy metadata comparison for near-duplicates
- Storage optimization: Fast SQLite-based indexing for large libraries
- Reporting: Comprehensive duplicate reports with similarity scoring
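Exact duplicate detection comes down to grouping files whose full-content hashes are equal. The sketch below shows only that grouping step. It uses the standard library's `DefaultHasher` so it has no dependencies. That hasher is a 64-bit, non-cryptographic stand-in for the BLAKE3 digest described above, and its output may change between Rust releases, so it is unsuitable for a persistent index. The helper names are hypothetical and are not part of the crate's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a BLAKE3 digest: `DefaultHasher` is 64-bit, non-cryptographic,
/// and not stable across Rust versions.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: group (name, contents) pairs by content hash and keep
/// only groups with more than one member, i.e. the exact-duplicate sets.
fn group_exact_duplicates<'a>(files: &[(&'a str, &[u8])]) -> Vec<Vec<&'a str>> {
    let mut by_hash: HashMap<u64, Vec<&'a str>> = HashMap::new();
    for (name, bytes) in files {
        by_hash.entry(content_hash(bytes)).or_default().push(*name);
    }
    let mut groups: Vec<Vec<&'a str>> =
        by_hash.into_values().filter(|g| g.len() > 1).collect();
    // Sort for deterministic output (HashMap iteration order is unspecified).
    for g in &mut groups {
        g.sort();
    }
    groups.sort();
    groups
}

fn main() {
    let files: [(&str, &[u8]); 4] = [
        ("a.mp4", b"frame-data-1"),
        ("b.mp4", b"frame-data-2"),
        ("a_copy.mp4", b"frame-data-1"),
        ("c.mp4", b"other"),
    ];
    println!("{:?}", group_exact_duplicates(&files)); // [["a.mp4", "a_copy.mp4"]]
}
```

With a 64-bit hash, accidental collisions are unlikely but possible. A production index would use the full 256-bit BLAKE3 digest or confirm candidates byte-for-byte.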
§Modules
- hash: Cryptographic and content-based hashing
- visual: Visual similarity detection
- audio: Audio fingerprint comparison
- metadata: Metadata-based deduplication
- database: SQLite-based indexing and lookup
- report: Duplicate detection reports
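Unlike cryptographic hashes, perceptual hashes change only slightly when an image is re-encoded, resized, or brightened. Near-duplicates are therefore found by counting differing bits, which is the Hamming distance. The sketch below uses the simple average hash (aHash) on an 8x8 grayscale thumbnail. It illustrates the technique and is not the crate's `visual` or `phash` implementation (pHash proper works on DCT coefficients). The thumbnail input, the function names, and the 10-bit threshold are assumptions.

```rust
/// Illustrative aHash (not the crate's implementation): bit i is set when
/// pixel i of an 8x8 grayscale thumbnail is brighter than the mean pixel.
fn average_hash(pixels: &[u8; 64]) -> u64 {
    let sum: u32 = pixels.iter().map(|&p| p as u32).sum();
    let mean = sum / 64;
    pixels.iter().enumerate().fold(0u64, |hash, (i, &p)| {
        if p as u32 > mean { hash | (1u64 << i) } else { hash }
    })
}

/// Number of differing bits between two 64-bit hashes.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Hypothetical threshold: hashes within 10 bits count as near-duplicates.
fn is_near_duplicate(a: u64, b: u64) -> bool {
    hamming_distance(a, b) <= 10
}

fn main() {
    // A synthetic gradient "thumbnail" and a uniformly brightened copy.
    let original: [u8; 64] = std::array::from_fn(|i| (i * 3) as u8);
    let brightened: [u8; 64] = std::array::from_fn(|i| (i * 3 + 5) as u8);
    let (ha, hb) = (average_hash(&original), average_hash(&brightened));
    println!("distance = {}", hamming_distance(ha, hb)); // distance = 0
    println!("near duplicate: {}", is_near_duplicate(ha, hb)); // near duplicate: true
}
```

A uniform brightness shift moves the mean along with every pixel, so the hash is unchanged here. A cryptographic hash of the two thumbnails would differ completely.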
§Example
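```rust
use oximedia_dedup::{DedupConfig, DetectionStrategy, DuplicateDetector};

// `?` needs a Result-returning async fn driven by an async runtime;
// this assumes `DedupError` implements `std::error::Error` so it can be boxed.
async fn find_library_duplicates() -> Result<(), Box<dyn std::error::Error>> {
    let config = DedupConfig::default();
    let mut detector = DuplicateDetector::new(config).await?;
    // Add files to the index
    detector.add_file("/path/to/video1.mp4").await?;
    detector.add_file("/path/to/video2.mp4").await?;
    // Find duplicates using all detection strategies
    let duplicates = detector.find_duplicates(DetectionStrategy::All).await?;
    Ok(())
}
```
Re-exports§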
pub use report::DuplicateGroup;
pub use report::DuplicateReport;
pub use report::SimilarityScore;
Modules§
- audio - Audio fingerprinting and similarity detection for deduplication.
- bloom_filter - Near-duplicate detection using a Bloom filter.
- cluster - Duplicate clustering: similarity groups, cluster merging, representative selection.
- content_id - Content ID and fingerprinting for media assets.
- content_signature - Content-signature types for robust media identification.
- cross_format - Cross-format duplicate detection: same content in different containers/codecs.
- dedup_cache - LRU cache for deduplication hash lookups.
- dedup_index - Persistent-style deduplication index.
- dedup_policy - Policy types for controlling deduplication behaviour.
- dedup_report - Deduplication reporting: statistics, summaries, and formatted reports.
- dedup_report_ext - Extended deduplication reporting and statistics.
- dedup_stats - Extended deduplication statistics: space savings, group statistics, action recommendations.
- frame_hash - Frame-level hash types for fast perceptual deduplication.
- fuzzy_match - Fuzzy / approximate matching for media deduplication.
- hash - Cryptographic and content-based hashing for deduplication.
- hash_store - Persistent hash store for deduplication lookups.
- incremental - Incremental deduplication: only scan new or modified files.
- lsh_index - Locality-Sensitive Hashing (LSH) index for approximate nearest-neighbour deduplication of high-dimensional media feature vectors.
- merge_strategy - Merge strategies for resolving duplicate file groups.
- metadata - Metadata-based deduplication and fuzzy matching.
- near_duplicate - Near-duplicate detection using locality-sensitive hashing (LSH).
- perceptual_hash - Perceptual hashing for image/video deduplication.
- phash - Perceptual hashing (pHash) and near-duplicate detection for video frames.
- progress - Progress reporting callbacks for long-running deduplication operations.
- report - Duplicate detection reports and recommendations.
- rolling_hash - Rolling hash for content-defined chunking in media deduplication.
- segment_dedup - Segment-level deduplication for media streams.
- similarity_index - Similarity index: fast lookup structures for near-duplicate candidate retrieval.
- video_dedup - Video-level deduplication.
- video_segment_dedup - Video segment deduplication using perceptual hashing and temporal windowing.
- visual - Visual similarity detection for image and video deduplication.
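A Bloom filter answers "have I possibly seen this hash before?" in fixed memory. It can return false positives but never false negatives, which makes it a cheap pre-filter in front of an exact lookup. The sketch below is a generic Bloom filter over precomputed 64-bit content hashes. It derives the k bit positions by double hashing, (h1 + i*h2) mod m. The bit-array size, hash count, and mixing constant are illustrative assumptions and are not taken from the `bloom_filter` module.

```rust
/// Minimal Bloom filter keyed by precomputed 64-bit content hashes.
/// Illustrative parameters; not taken from the crate's `bloom_filter` module.
struct BloomFilter {
    bits: Vec<u64>, // bit array packed into 64-bit words
    num_bits: u64,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: u64, num_hashes: u64) -> Self {
        let words = ((num_bits + 63) / 64) as usize;
        BloomFilter { bits: vec![0; words], num_bits, num_hashes }
    }

    /// i-th bit index via double hashing: (h1 + i * h2) mod num_bits.
    fn index(&self, key: u64, i: u64) -> u64 {
        let h1 = key;
        // Second hash from a multiplicative mix, forced odd so it is never zero.
        let h2 = key.wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(31) | 1;
        h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits
    }

    fn insert(&mut self, key: u64) {
        for i in 0..self.num_hashes {
            let bit = self.index(key, i);
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// `false` means definitely absent; `true` means possibly present.
    fn might_contain(&self, key: u64) -> bool {
        (0..self.num_hashes).all(|i| {
            let bit = self.index(key, i);
            self.bits[(bit / 64) as usize] & (1u64 << (bit % 64)) != 0
        })
    }
}

fn main() {
    let mut filter = BloomFilter::new(1 << 16, 4);
    for hash in [0xDEAD_BEEF_u64, 0x1234_5678, 42] {
        filter.insert(hash);
    }
    assert!(filter.might_contain(42)); // inserted keys are always reported
    println!("0xCAFE possibly seen: {}", filter.might_contain(0xCAFE));
}
```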
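Content-defined chunking slides a window over the byte stream and cuts a chunk wherever the rolling hash of the window meets a condition. Each boundary therefore depends only on nearby content, and an insertion tends to move only the boundaries around it rather than every fixed-size block after it. The sketch below uses a polynomial (Rabin-Karp style) rolling hash with O(1) updates. The window size, base, boundary condition, and minimum chunk length are assumptions chosen for illustration, and the `rolling_hash` module may use a different scheme.

```rust
const WINDOW: usize = 16; // bytes in the rolling window (assumed)
const BASE: u64 = 257; // polynomial base (assumed)
const BOUNDARY_BITS: u32 = 6; // cut when the top 6 hash bits are zero, ~1 in 64 positions (assumed)

/// Split `data` into content-defined chunks and return the end offset of each chunk.
fn chunk_boundaries(data: &[u8]) -> Vec<usize> {
    // BASE^(WINDOW - 1) mod 2^64: the weight of the byte about to leave the window.
    let mut out_weight = 1u64;
    for _ in 1..WINDOW {
        out_weight = out_weight.wrapping_mul(BASE);
    }
    let mut hash = 0u64;
    let mut boundaries = Vec::new();
    let mut chunk_start = 0;
    for (i, &byte) in data.iter().enumerate() {
        if i >= WINDOW {
            // Remove the outgoing byte before shifting in the new one (O(1) update).
            hash = hash.wrapping_sub((data[i - WINDOW] as u64).wrapping_mul(out_weight));
        }
        hash = hash.wrapping_mul(BASE).wrapping_add(byte as u64);
        // Require at least WINDOW bytes per chunk, then test the boundary condition.
        if i + 1 - chunk_start >= WINDOW && hash >> (64 - BOUNDARY_BITS) == 0 {
            boundaries.push(i + 1);
            chunk_start = i + 1;
        }
    }
    if chunk_start < data.len() {
        boundaries.push(data.len()); // trailing partial chunk
    }
    boundaries
}

fn main() {
    // Pseudo-random bytes so that boundaries land at varied offsets.
    let data: Vec<u8> = (0..4096u32)
        .map(|i| (i.wrapping_mul(2_654_435_761) >> 13) as u8)
        .collect();
    let boundaries = chunk_boundaries(&data);
    println!("{} chunks; last boundary {:?}", boundaries.len(), boundaries.last());
}
```

The boundary test reads the high bits of the hash rather than the low bits. With a power-of-two modulus and base 257, the low six bits reduce to the byte sum of the window modulo 64, which is a weak signal.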
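Pairwise comparisons produce matches such as "A resembles B" and "B resembles C". Turning those matches into duplicate groups is a connected-components problem. The sketch below uses a union-find (disjoint-set) structure, which merges clusters transitively. This is one common way to implement the cluster-merging step that the `cluster` module describes. Using item indices as file IDs is an assumption. Transitive merging can chain dissimilar items (A~B and B~C, yet A and C differ), so a representative-selection or re-verification pass usually follows.

```rust
use std::collections::BTreeMap;

/// Disjoint-set forest with path compression and union by size.
struct UnionFind {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect(), size: vec![1; n] }
    }

    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let root = self.find(self.parent[x]);
            self.parent[x] = root; // path compression
        }
        self.parent[x]
    }

    fn union(&mut self, a: usize, b: usize) {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb {
            return;
        }
        if self.size[ra] < self.size[rb] {
            std::mem::swap(&mut ra, &mut rb);
        }
        self.parent[rb] = ra; // attach the smaller tree under the larger one
        self.size[ra] += self.size[rb];
    }
}

/// Group `n` items into clusters from pairwise matches; singletons are dropped.
fn cluster(n: usize, matches: &[(usize, usize)]) -> Vec<Vec<usize>> {
    let mut uf = UnionFind::new(n);
    for &(a, b) in matches {
        uf.union(a, b);
    }
    let mut groups: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for item in 0..n {
        let root = uf.find(item);
        groups.entry(root).or_default().push(item);
    }
    let mut out: Vec<Vec<usize>> = groups.into_values().filter(|g| g.len() > 1).collect();
    out.sort(); // each group is already ascending; order groups by first member
    out
}

fn main() {
    // Hypothetical matches among items 0..6 from some pairwise comparison.
    let matches = [(0, 1), (1, 2), (4, 5)];
    println!("{:?}", cluster(6, &matches)); // [[0, 1, 2], [4, 5]]
}
```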
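Comparing every perceptual hash with every other one is quadratic in library size. LSH avoids this by splitting each 64-bit hash into b bands and bucketing items that share any band exactly. By the pigeonhole principle, two hashes that differ in fewer than b bits must agree on at least one band, so they are guaranteed to become candidates. The sketch below assumes 8 bands of 8 bits each. The crate's `lsh_index` and `near_duplicate` modules may use a different LSH family, for example random projections for real-valued feature vectors.

```rust
use std::collections::{BTreeSet, HashMap};

const BANDS: u32 = 8; // assumed band layout: 8 bands...
const BAND_BITS: u32 = 64 / BANDS; // ...of 8 bits each

/// Bucket 64-bit perceptual hashes by band and return candidate pairs (i, j) with i < j.
fn candidate_pairs(hashes: &[u64]) -> BTreeSet<(usize, usize)> {
    // Key: (band index, band value) -> items sharing that exact band.
    let mut buckets: HashMap<(u32, u64), Vec<usize>> = HashMap::new();
    for (idx, &h) in hashes.iter().enumerate() {
        for band in 0..BANDS {
            let value = (h >> (band * BAND_BITS)) & ((1u64 << BAND_BITS) - 1);
            buckets.entry((band, value)).or_default().push(idx);
        }
    }
    let mut pairs = BTreeSet::new();
    for items in buckets.values() {
        // Items were pushed in index order, so i < j within each bucket.
        for (k, &i) in items.iter().enumerate() {
            for &j in &items[k + 1..] {
                pairs.insert((i, j));
            }
        }
    }
    pairs
}

fn main() {
    let hashes = [
        0xFF_u64,              // item 0
        0xFE,                  // item 1: one bit away from item 0
        0xABCD_1234_5678_9EF0, // item 2: unrelated, shares no band
    ];
    println!("{:?}", candidate_pairs(&hashes)); // {(0, 1)}
}
```

Candidates are only a shortlist. Each pair would still be confirmed with an exact Hamming-distance check, because sharing a single band does not imply that two hashes are close.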
Structs§
- DedupConfig - Configuration for deduplication.
- DedupStats - Deduplication statistics.
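Space-savings statistics come from simple arithmetic over duplicate groups. If one representative per group is kept, the reclaimable bytes are the group's total size minus the kept file's size. The sketch below assumes a hypothetical `Candidate` record and a keep-the-largest-file policy. The crate's `DedupStats` fields and `merge_strategy` policies are not documented here and may differ.

```rust
/// Hypothetical file record; not the crate's type.
struct Candidate {
    path: &'static str,
    size_bytes: u64,
}

/// Assumed policy: keep the largest file in the group (ties: the first one listed).
fn representative(group: &[Candidate]) -> Option<&Candidate> {
    // `max_by_key` returns the last maximum, so reverse to prefer the first.
    group.iter().rev().max_by_key(|c| c.size_bytes)
}

/// Bytes reclaimable from one group when only the representative is kept.
fn reclaimable_bytes(group: &[Candidate]) -> u64 {
    let total: u64 = group.iter().map(|c| c.size_bytes).sum();
    total - representative(group).map_or(0, |c| c.size_bytes)
}

fn main() {
    let groups = vec![
        vec![
            Candidate { path: "a.mp4", size_bytes: 700 },
            Candidate { path: "a_copy.mp4", size_bytes: 700 },
            Candidate { path: "a_small.mp4", size_bytes: 300 },
        ],
        vec![
            Candidate { path: "b.mov", size_bytes: 1_000 },
            Candidate { path: "b.mp4", size_bytes: 950 },
        ],
    ];
    for g in &groups {
        if let Some(keep) = representative(g) {
            println!("keep {}, reclaim {} bytes", keep.path, reclaimable_bytes(g));
        }
    }
    let saved: u64 = groups.iter().map(|g| reclaimable_bytes(g)).sum();
    println!("reclaimable: {saved} bytes"); // reclaimable: 1950 bytes
}
```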
Enums§
- DedupError - Deduplication error type.
- DetectionStrategy - Detection strategy for finding duplicates.
Type Aliases§
- DedupResult - Deduplication result type.