oximedia-dedup
Media deduplication and duplicate detection for OxiMedia, providing cryptographic, visual, audio, and metadata-based duplicate finding with SQLite-backed indexing.
Part of the oximedia workspace — a comprehensive pure-Rust media processing framework.
Version: 0.1.7 — 2026-05-16 — 647 tests
Features
- Cryptographic Hashing — BLAKE3-based exact duplicate detection
- Visual Similarity — Perceptual hashing, SSIM, histogram matching, and feature matching
- Audio Fingerprinting — Audio fingerprint comparison and waveform similarity
- Metadata Matching — Fuzzy metadata comparison for near-duplicates
- Rolling Hash — Segment-level deduplication for partial matches
- LSH Index — Locality-sensitive hashing for fast approximate nearest neighbor search
- Bloom Filter — Probabilistic fast duplicate screening
- Cluster Analysis — Group similar media into clusters
- Storage Optimization — SQLite-based indexing for large libraries
- Comprehensive Reporting — Duplicate reports with similarity scoring
- Content ID — Content-based identity tracking
- Content Signatures — Robust perceptual signatures
- Dedup Policy — Configurable dedup policies
- Fuzzy Matching — Fuzzy metadata and filename matching
- Merge Strategy — MergeExecutor::apply() / dry_run() with AppliedAction { Symlinked, Hardlinked, Deleted, Kept, Skipped } and MergeReport; Unix symlink + hardlink; Windows symlink_file with fallback
- Segment Dedup — Segment-level partial matching
- Similarity Index — Fast similarity index
- Video Dedup — Video-specific deduplication
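
As an illustration of the perceptual-hash comparison mentioned under Visual Similarity, the sketch below compares two 64-bit hashes by Hamming distance and converts the distance to a score between 0 and 1. It uses only the standard library and does not call into oximedia-dedup. The function names, the 64-bit hash width, and the 0.9 threshold are assumptions made for this example, not the crate's API or defaults.

```rust
/// Number of differing bits between two 64-bit perceptual hashes.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Map the Hamming distance to a similarity score in [0.0, 1.0],
/// where 1.0 means the hashes are identical.
fn similarity(a: u64, b: u64) -> f64 {
    1.0 - f64::from(hamming_distance(a, b)) / 64.0
}

/// Hypothetical threshold check. The threshold is chosen by the caller;
/// it is not taken from oximedia-dedup.
fn is_near_duplicate(a: u64, b: u64, threshold: f64) -> bool {
    similarity(a, b) >= threshold
}

fn main() {
    let original: u64 = 0xF0F0_F0F0_F0F0_F0F0;
    let recompressed = original ^ 0b111; // three low bits flipped
    let unrelated = !original; // every bit differs

    println!("recompressed: {}", similarity(original, recompressed));
    println!("unrelated:    {}", similarity(original, unrelated));
    println!("near-duplicate? {}", is_near_duplicate(original, recompressed, 0.9));
}
```

A small Hamming distance tolerates re-encoding noise, which is why perceptual hashes suit near-duplicate detection while a cryptographic hash such as BLAKE3 only matches byte-identical files.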
Usage
Add to your Cargo.toml:
[dependencies]
oximedia-dedup = "0.1.7"
use ;
async
API Overview
Core types:
- DuplicateDetector — Main deduplication engine
- DedupConfig — Configuration (thresholds, paths, parallel mode)
- DetectionStrategy — ExactHash / PerceptualHash / Ssim / Histogram / FeatureMatch / AudioFingerprint / Metadata / All / VisualAll / Fast
- DuplicateGroup, DuplicateReport, SimilarityScore — Results
- DedupDatabase — SQLite index backend
- DedupStats — Index statistics
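
To make the idea of duplicate groups concrete, here is a minimal sketch of one way pairwise similarity scores could be merged into groups using a union-find structure. It is not the crate's implementation: `group_duplicates`, the `(index, index, score)` tuple layout, and the threshold are all hypothetical, and files are identified by plain indices rather than by the crate's result types.

```rust
use std::collections::BTreeMap;

/// Find the representative of `x`, compressing the path on the way.
fn find(parent: &mut [usize], x: usize) -> usize {
    let mut root = x;
    while parent[root] != root {
        root = parent[root];
    }
    // Point every node on the path directly at the root.
    let mut cur = x;
    while parent[cur] != root {
        let next = parent[cur];
        parent[cur] = root;
        cur = next;
    }
    root
}

/// Group `n` files given scored pairs `(i, j, score)`.
/// Pairs below `threshold` are ignored; files without a match are omitted.
fn group_duplicates(n: usize, pairs: &[(usize, usize, f64)], threshold: f64) -> Vec<Vec<usize>> {
    let mut parent: Vec<usize> = (0..n).collect();
    for &(a, b, score) in pairs {
        if score >= threshold {
            let ra = find(&mut parent, a);
            let rb = find(&mut parent, b);
            if ra != rb {
                parent[rb] = ra; // merge the two sets
            }
        }
    }
    // Collect members under their representative, in index order.
    let mut groups: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for i in 0..n {
        let root = find(&mut parent, i);
        groups.entry(root).or_default().push(i);
    }
    groups.into_values().filter(|g| g.len() > 1).collect()
}

fn main() {
    // Files 0, 1 and 2 are linked transitively; 3 and 4 score too low.
    let pairs = [(0, 1, 0.95), (1, 2, 0.92), (3, 4, 0.50)];
    let groups = group_duplicates(5, &pairs, 0.9);
    println!("{:?}", groups);
}
```

Grouping is transitive here: if A matches B and B matches C, all three land in one group even when A and C were never compared directly. Whether that behaviour is desirable depends on how strict the threshold is.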
Modules:
- hash, frame_hash, rolling_hash — Hashing strategies
- visual, perceptual_hash, phash — Visual similarity
- audio — Audio fingerprint comparison
- metadata — Metadata-based matching
- database, hash_store, dedup_index — Storage backends
- lsh_index — Locality-sensitive hashing index
- bloom_filter — Probabilistic screening
- cluster — Similarity clustering
- near_duplicate — Near-duplicate detection
- report, dedup_report, dedup_report_ext — Reporting
- content_id, content_signature — Content identity
- dedup_cache, dedup_policy — Caching and policy
- dedup_stats — Statistics
- fuzzy_match — Fuzzy matching
- merge_strategy — MergeExecutor, AppliedAction, MergeReport (real filesystem duplicate resolution)
- segment_dedup — Segment dedup
- similarity_index — Similarity index
- video_dedup — Video-specific dedup
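
For readers unfamiliar with probabilistic screening, the following is a small standalone Bloom filter sketch. It shows the general technique only and makes no claim about how the bloom_filter module is implemented: the `BloomFilter` type, its methods, and the use of the standard library's `DefaultHasher` with double hashing are assumptions chosen to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative Bloom filter; not the oximedia-dedup type.
struct BloomFilter {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl BloomFilter {
    fn new(num_bits: u64, num_hashes: u32) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: vec![0; words], num_bits, num_hashes }
    }

    /// Two base hashes; the second is salted and forced odd so it is never zero.
    fn base_hashes<T: Hash + ?Sized>(item: &T) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        item.hash(&mut h1);
        let mut h2 = DefaultHasher::new();
        0xA5u8.hash(&mut h2);
        item.hash(&mut h2);
        (h1.finish(), h2.finish() | 1)
    }

    /// Double hashing: the i-th probe is h1 + i * h2, reduced modulo the bit count.
    fn bit_index(&self, (h1, h2): (u64, u64), i: u32) -> u64 {
        h1.wrapping_add(u64::from(i).wrapping_mul(h2)) % self.num_bits
    }

    fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        let hashes = Self::base_hashes(item);
        for i in 0..self.num_hashes {
            let bit = self.bit_index(hashes, i);
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// `false` means definitely absent; `true` means possibly present.
    fn might_contain<T: Hash + ?Sized>(&self, item: &T) -> bool {
        let hashes = Self::base_hashes(item);
        (0..self.num_hashes).all(|i| {
            let bit = self.bit_index(hashes, i);
            (self.bits[(bit / 64) as usize] & (1u64 << (bit % 64))) != 0
        })
    }
}

fn main() {
    let mut seen = BloomFilter::new(1024, 3);
    seen.insert("clip_a.mp4");
    // A `true` result here only means a full comparison is worth running.
    println!("clip_a.mp4 seen before? {}", seen.might_contain("clip_a.mp4"));
}
```

Because a Bloom filter never reports a false negative, a negative answer lets a scanner skip the expensive hash or similarity comparison entirely; positive answers still need confirmation against the real index.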
License
Apache-2.0 — Copyright 2024-2026 COOLJAPAN OU (Team Kitasan)