oximedia-dedup 0.1.7

Media deduplication and duplicate detection for OxiMedia

oximedia-dedup

Status: Stable | Version: 0.1.7

Media deduplication and duplicate detection for OxiMedia, providing cryptographic, visual, audio, and metadata-based duplicate finding with SQLite-backed indexing.
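Exact duplicates are the simplest case: files whose bytes are identical. The sketch below illustrates the idea by bucketing files on a 64-bit digest and then confirming every match with a byte comparison. It uses std's `DefaultHasher` only so the example has no dependencies. That is an assumption of this sketch, not what the crate does: the feature list below describes BLAKE3 hashing. The function name `exact_duplicate_groups` is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// 64-bit digest of a file's bytes.
/// Stand-in only: DefaultHasher is NOT cryptographic (the crate uses BLAKE3).
fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Hypothetical helper: groups named byte buffers with identical contents.
/// Returns only groups with two or more members, each sorted by name.
fn exact_duplicate_groups(files: &[(&str, &[u8])]) -> Vec<Vec<String>> {
    // Cheap pass: bucket by digest.
    let mut buckets: HashMap<u64, Vec<(&str, &[u8])>> = HashMap::new();
    for &(name, bytes) in files {
        buckets.entry(digest(bytes)).or_default().push((name, bytes));
    }

    let mut groups = Vec::new();
    for (_, bucket) in buckets {
        // A 64-bit digest can collide, so confirm matches byte-for-byte.
        let mut remaining = bucket;
        while let Some((name, bytes)) = remaining.pop() {
            let (same, rest): (Vec<_>, Vec<_>) =
                remaining.into_iter().partition(|&(_, b)| b == bytes);
            remaining = rest;
            if !same.is_empty() {
                let mut group: Vec<String> =
                    same.iter().map(|&(n, _)| n.to_string()).collect();
                group.push(name.to_string());
                group.sort();
                groups.push(group);
            }
        }
    }
    groups.sort();
    groups
}

fn main() {
    let files: [(&str, &[u8]); 4] = [
        ("a.mp4", "same bytes".as_bytes()),
        ("b.mp4", "other bytes".as_bytes()),
        ("c.mp4", "same bytes".as_bytes()),
        ("d.mp4", "unique".as_bytes()),
    ];
    println!("{:?}", exact_duplicate_groups(&files)); // prints [["a.mp4", "c.mp4"]]
}
```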

Part of the oximedia workspace — a comprehensive pure-Rust media processing framework.

Version: 0.1.7 — 2026-05-16 — 647 tests

Features

  • Cryptographic Hashing — BLAKE3-based exact duplicate detection
  • Visual Similarity — Perceptual hashing, SSIM, histogram matching, and feature matching
  • Audio Fingerprinting — Audio fingerprint comparison and waveform similarity
  • Metadata Matching — Fuzzy metadata comparison for near-duplicates
  • Rolling Hash — Segment-level deduplication for partial matches
  • LSH Index — Locality-sensitive hashing for fast approximate nearest neighbor search
  • Bloom Filter — Probabilistic fast duplicate screening
  • Cluster Analysis — Group similar media into clusters
  • Storage Optimization — SQLite-based indexing for large libraries
  • Comprehensive Reporting — Duplicate reports with similarity scoring
  • Content ID — Content-based identity tracking
  • Content Signatures — Robust perceptual signatures
  • Dedup Policy — Configurable dedup policies
  • Fuzzy Matching — Fuzzy metadata and filename matching
  • Merge Strategy — Real filesystem duplicate resolution via MergeExecutor::apply() / dry_run(), with outcomes reported as AppliedAction (Symlinked, Hardlinked, Deleted, Kept, Skipped) and summarised in a MergeReport; Unix uses symlinks and hardlinks, Windows uses symlink_file with a fallback
  • Segment Dedup — Segment-level partial matching
  • Similarity Index — Fast similarity index
  • Video Dedup — Video-specific deduplication
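Perceptual hashing is what lets media that looks the same but differs at the byte level (a re-encode, a brightness tweak) still match. The following is a minimal sketch of one well-known perceptual hash, the difference hash (dHash), compared by Hamming distance. It is not necessarily the hash this crate implements. It assumes each frame has already been reduced to a 9x8 grayscale thumbnail, and `dhash`, `hamming`, and `similarity` are hypothetical names.

```rust
/// Difference hash ("dHash") of a 9x8 grayscale thumbnail, row-major.
/// Each of the 64 bits records whether a pixel is brighter than its
/// right-hand neighbour. Assumption: the frame was already downscaled
/// to 9x8 and converted to grayscale.
fn dhash(thumb: &[u8; 72]) -> u64 {
    let mut hash = 0u64;
    for row in 0..8 {
        for col in 0..8 {
            let left = thumb[row * 9 + col];
            let right = thumb[row * 9 + col + 1];
            hash = (hash << 1) | u64::from(left > right);
        }
    }
    hash
}

/// Number of bits that differ between two hashes.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Similarity in [0, 1]; 1.0 means the hashes are identical.
fn similarity(a: u64, b: u64) -> f64 {
    1.0 - f64::from(hamming(a, b)) / 64.0
}

fn main() {
    // A horizontal gradient whose brightness falls to the right...
    let mut original = [0u8; 72];
    for (i, px) in original.iter_mut().enumerate() {
        *px = 255 - (i % 9) as u8 * 20;
    }
    // ...and a uniformly brightened copy, as a re-encode might produce.
    let brighter = original.map(|p| p.saturating_add(10));

    let (h1, h2) = (dhash(&original), dhash(&brighter));
    // prints "distance = 0, similarity = 1.00"
    println!("distance = {}, similarity = {:.2}", hamming(h1, h2), similarity(h1, h2));
}
```

Because dHash encodes only relative brightness between neighbours, a uniform brightness shift leaves the hash unchanged, while the byte-level digest of the two frames would differ completely.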

Usage

Add to your Cargo.toml:

[dependencies]
oximedia-dedup = "0.1.7"
use oximedia_dedup::{DuplicateDetector, DetectionStrategy, DedupConfig};

async fn example() -> Result<(), Box<dyn std::error::Error>> {
    let config = DedupConfig::default();
    let mut detector = DuplicateDetector::new(config).await?;

    detector.add_file("/path/to/video1.mp4").await?;
    detector.add_file("/path/to/video2.mp4").await?;

    let duplicates = detector.find_duplicates(DetectionStrategy::All).await?;
    Ok(())
}

API Overview

Core types:

  • DuplicateDetector — Main deduplication engine
  • DedupConfig — Configuration (thresholds, paths, parallel mode)
  • DetectionStrategy — ExactHash / PerceptualHash / Ssim / Histogram / FeatureMatch / AudioFingerprint / Metadata / All / VisualAll / Fast
  • DuplicateGroup, DuplicateReport, SimilarityScore — Results
  • DedupDatabase — SQLite index backend
  • DedupStats — Index statistics
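Pairwise similarity scores still have to be turned into duplicate groups. One common way to do that, shown here as a hypothetical sketch rather than the crate's actual grouping logic, is a union-find over every pair whose score clears a threshold. File indices stand in for paths, and `DisjointSet` and `group_pairs` are illustrative names, not part of the oximedia-dedup API.

```rust
use std::collections::BTreeMap;

/// Minimal union-find (disjoint-set) with path halving.
struct DisjointSet {
    parent: Vec<usize>,
}

impl DisjointSet {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect() }
    }

    fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            self.parent[x] = self.parent[self.parent[x]]; // path halving
            x = self.parent[x];
        }
        x
    }

    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[rb] = ra;
        }
    }
}

/// Hypothetical helper: turns scored pairs (i, j, score) over `n` files into
/// groups of indices. Pairs below `threshold` are ignored; singletons dropped.
fn group_pairs(n: usize, pairs: &[(usize, usize, f64)], threshold: f64) -> Vec<Vec<usize>> {
    let mut set = DisjointSet::new(n);
    for &(i, j, score) in pairs {
        if score >= threshold {
            set.union(i, j);
        }
    }
    // Collect members under their root; indices are pushed in ascending order.
    let mut by_root: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for i in 0..n {
        let root = set.find(i);
        by_root.entry(root).or_default().push(i);
    }
    let mut groups: Vec<Vec<usize>> =
        by_root.into_values().filter(|g| g.len() > 1).collect();
    groups.sort();
    groups
}

fn main() {
    // Scores for five files; only pairs at or above 0.9 count as duplicates.
    let pairs: [(usize, usize, f64); 3] = [(0, 1, 0.95), (1, 2, 0.91), (3, 4, 0.40)];
    println!("{:?}", group_pairs(5, &pairs, 0.9)); // prints [[0, 1, 2]]
}
```

Union-find merges transitively, so this is single-linkage grouping: files 0 and 2 end up together through file 1 even though that pair was never scored. That keeps grouping cheap but can chain loosely related files; a stricter policy would require every pair within a group to clear the threshold.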

Modules:

  • hash, frame_hash, rolling_hash — Hashing strategies
  • visual, perceptual_hash, phash — Visual similarity
  • audio — Audio fingerprint comparison
  • metadata — Metadata-based matching
  • database, hash_store, dedup_index — Storage backends
  • lsh_index — Locality-sensitive hashing index
  • bloom_filter — Probabilistic screening
  • cluster — Similarity clustering
  • near_duplicate — Near-duplicate detection
  • report, dedup_report, dedup_report_ext — Reporting
  • content_id, content_signature — Content identity
  • dedup_cache, dedup_policy — Caching and policy
  • dedup_stats — Statistics
  • fuzzy_match — Fuzzy matching
  • merge_strategy — MergeExecutor, AppliedAction, MergeReport: real filesystem duplicate resolution
  • segment_dedup — Segment dedup
  • similarity_index — Similarity index
  • video_dedup — Video-specific dedup
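The bloom_filter module provides probabilistic screening. The idea can be sketched as follows: a Bloom filter answers either "definitely not seen" or "possibly seen", so a lookup that returns false can skip the expensive comparison steps entirely. This standalone version derives `k` bit positions per item by double hashing with std's `DefaultHasher`. The `BloomFilter` type and its methods are hypothetical, and the crate's own filter may size and hash differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical fixed-size Bloom filter over a packed bit vector.
struct BloomFilter {
    bits: Vec<u64>,
    num_bits: usize,
    k: u32,
}

impl BloomFilter {
    fn new(num_bits: usize, k: u32) -> Self {
        assert!(num_bits > 0 && k > 0);
        Self { bits: vec![0; (num_bits + 63) / 64], num_bits, k }
    }

    /// Two base hashes; the second is seeded with a prefix and forced odd.
    fn base_hashes<T: Hash + ?Sized>(item: &T) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        item.hash(&mut h1);
        let mut h2 = DefaultHasher::new();
        0xA5A5_u16.hash(&mut h2);
        item.hash(&mut h2);
        (h1.finish(), h2.finish() | 1)
    }

    /// Double hashing: position_i = (h1 + i * h2) mod num_bits.
    fn positions<T: Hash + ?Sized>(&self, item: &T) -> Vec<usize> {
        let (h1, h2) = Self::base_hashes(item);
        (0..u64::from(self.k))
            .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits as u64) as usize)
            .collect()
    }

    fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        for p in self.positions(item) {
            self.bits[p / 64] |= 1u64 << (p % 64);
        }
    }

    /// `false` means definitely absent; `true` means only "possibly present".
    fn might_contain<T: Hash + ?Sized>(&self, item: &T) -> bool {
        self.positions(item)
            .into_iter()
            .all(|p| (self.bits[p / 64] & (1u64 << (p % 64))) != 0)
    }
}

fn main() {
    let mut filter = BloomFilter::new(1024, 3);
    for digest in ["digest-a", "digest-b", "digest-c"] {
        filter.insert(digest);
    }
    // Inserted items are never reported absent.
    println!("{}", filter.might_contain("digest-a")); // prints true
    // Unseen items are usually, but not always, reported absent.
    println!("{}", filter.might_contain("digest-z"));
}
```

The trade-off is one-sided error: false positives cost an unnecessary full comparison, while false negatives cannot occur, which is why a filter like this suits a fast pre-screen in front of exact or perceptual checks.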

License

Apache-2.0 — Copyright 2024-2026 COOLJAPAN OU (Team Kitasan)