Crate oximedia_dedup

Media deduplication and duplicate detection for OxiMedia.

oximedia-dedup provides duplicate detection and media deduplication for the OxiMedia multimedia framework. It covers:

  • Cryptographic hashing: BLAKE3-based exact duplicate detection
  • Visual similarity: Perceptual hashing, SSIM, histogram, and feature matching
  • Audio fingerprinting: Audio fingerprint comparison and waveform similarity
  • Metadata matching: Fuzzy metadata comparison for near-duplicates
  • Storage optimization: Fast SQLite-based indexing for large libraries
  • Reporting: Comprehensive duplicate reports with similarity scoring
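To illustrate the exact-duplicate case from the list above, here is a minimal sketch that groups in-memory items by content digest. It does not use the crate's API: `content_digest` and `group_exact_duplicates` are hypothetical names, and the standard library's `DefaultHasher` stands in for BLAKE3 so that the example has no external dependencies. Because that stand-in is not cryptographic, the sketch confirms byte equality inside each digest bucket.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content digest (hypothetical helper). `DefaultHasher` is NOT
/// cryptographic; a real exact-match pass would use a hash such as BLAKE3.
fn content_digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Groups item names whose contents are byte-identical.
/// Only groups with two or more members are returned, sorted by name.
fn group_exact_duplicates<'a>(items: &[(&'a str, &'a [u8])]) -> Vec<Vec<&'a str>> {
    // First pass: bucket by digest so only same-digest items are compared.
    let mut buckets: HashMap<u64, Vec<(&'a str, &'a [u8])>> = HashMap::new();
    for &(name, bytes) in items {
        buckets.entry(content_digest(bytes)).or_default().push((name, bytes));
    }

    let mut groups: Vec<Vec<&'a str>> = Vec::new();
    for bucket in buckets.into_values() {
        // Second pass: confirm byte equality, since a 64-bit digest can collide.
        let mut confirmed: Vec<(&'a [u8], Vec<&'a str>)> = Vec::new();
        for (name, bytes) in bucket {
            if let Some(pos) = confirmed.iter().position(|(b, _)| *b == bytes) {
                confirmed[pos].1.push(name);
            } else {
                confirmed.push((bytes, vec![name]));
            }
        }
        groups.extend(
            confirmed
                .into_iter()
                .map(|(_, names)| names)
                .filter(|names| names.len() > 1),
        );
    }

    // HashMap iteration order is unspecified, so sort for stable output.
    for group in &mut groups {
        group.sort();
    }
    groups.sort();
    groups
}
```

Bucketing first keeps the number of byte-for-byte comparisons small: only items that already share a digest are compared directly.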

Modules

  • hash: Cryptographic and content-based hashing
  • visual: Visual similarity detection
  • audio: Audio fingerprint comparison
  • metadata: Metadata-based deduplication
  • database: SQLite-based indexing and lookup
  • report: Duplicate detection reports
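As a rough illustration of what visual similarity detection can involve, the sketch below computes an average hash (aHash) over an 8x8 grayscale thumbnail and compares two hashes by Hamming distance. This is an assumption-laden example, not the algorithm used by the `visual` module: the function names, the 8x8 input size, and the choice of aHash are all hypothetical.

```rust
/// Computes a 64-bit average hash (aHash) from an 8x8 grayscale thumbnail.
/// Bit `i` is set when pixel `i` is brighter than the mean brightness.
/// (Hypothetical helper; not part of oximedia_dedup.)
fn average_hash(pixels: &[u8; 64]) -> u64 {
    let sum: u32 = pixels.iter().map(|&p| u32::from(p)).sum();
    let mean = sum / 64;
    pixels.iter().enumerate().fold(0u64, |hash, (i, &p)| {
        if u32::from(p) > mean {
            hash | (1u64 << i)
        } else {
            hash
        }
    })
}

/// Number of differing bits between two hashes.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Similarity in [0.0, 1.0]: 1.0 means identical hashes, 0.0 means all bits differ.
fn similarity(a: u64, b: u64) -> f64 {
    1.0 - f64::from(hamming_distance(a, b)) / 64.0
}
```

Because each bit only records whether a pixel is above the mean, a uniform brightness shift tends to leave the hash unchanged, while structurally different images produce a large Hamming distance. A caller would typically treat two images as near-duplicates when the distance falls below a chosen threshold.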

Example

use oximedia_dedup::{DuplicateDetector, DetectionStrategy, DedupConfig};

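// The calls below use `.await?`, so this snippet must run inside an async
// function that returns a `Result` compatible with the crate's error type.
let config = DedupConfig::default();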
let mut detector = DuplicateDetector::new(config).await?;

// Add files to the index
detector.add_file("/path/to/video1.mp4").await?;
detector.add_file("/path/to/video2.mp4").await?;

// Find duplicates
let duplicates = detector.find_duplicates(DetectionStrategy::All).await?;

Re-exports

pub use report::DuplicateGroup;
pub use report::DuplicateReport;
pub use report::SimilarityScore;

Modules

  • audio: Audio fingerprinting and similarity detection for deduplication.
  • bloom_filter: Near-duplicate detection using a Bloom filter.
  • cluster: Duplicate clustering: similarity groups, cluster merging, representative selection.
  • content_id: Content ID and fingerprinting for media assets.
  • content_signature: Content-signature types for robust media identification.
  • cross_format: Cross-format duplicate detection: same content in different containers/codecs.
  • dedup_cache: LRU cache for deduplication hash lookups.
  • dedup_index: Persistent-style deduplication index.
  • dedup_policy: Policy types for controlling deduplication behaviour.
  • dedup_report: Deduplication reporting: statistics, summaries, and formatted reports.
  • dedup_report_ext: Extended deduplication reporting and statistics.
  • dedup_stats: Extended deduplication statistics: space savings, group statistics, action recommendations.
  • frame_hash: Frame-level hash types for fast perceptual deduplication.
  • fuzzy_match: Fuzzy / approximate matching for media deduplication.
  • hash: Cryptographic and content-based hashing for deduplication.
  • hash_store: Persistent hash store for deduplication lookups.
  • incremental: Incremental deduplication: only scan new or modified files.
  • lsh_index: Locality-Sensitive Hashing (LSH) index for approximate nearest-neighbour deduplication of high-dimensional media feature vectors.
  • merge_strategy: Merge strategies for resolving duplicate file groups.
  • metadata: Metadata-based deduplication and fuzzy matching.
  • near_duplicate: Near-duplicate detection using locality-sensitive hashing (LSH).
  • perceptual_hash: Perceptual hashing for image/video deduplication.
  • phash: Perceptual hashing (pHash) and near-duplicate detection for video frames.
  • progress: Progress reporting callbacks for long-running deduplication operations.
  • report: Duplicate detection reports and recommendations.
  • rolling_hash: Rolling hash for content-defined chunking in media deduplication.
  • segment_dedup: Segment-level deduplication for media streams.
  • similarity_index: Similarity index: fast lookup structures for near-duplicate candidate retrieval.
  • video_dedup: Video-level deduplication.
  • video_segment_dedup: Video segment deduplication using perceptual hashing and temporal windowing.
  • visual: Visual similarity detection for image and video deduplication.
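Several of the modules above rely on probabilistic lookup structures. As one hedged example of the idea behind a Bloom filter used for duplicate pre-screening, the sketch below implements a small filter with double hashing. The `BloomFilter` type and its methods here are hypothetical and are not the API of the `bloom_filter` module; the standard library's `DefaultHasher` is used only to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter sketch (hypothetical type, not the crate's own).
struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        Self { bits: vec![false; num_bits], num_hashes }
    }

    /// Derives two base hashes; the second is salted so it differs from the
    /// first, and forced odd so the probe step is never zero.
    fn hash_pair(item: &[u8]) -> (u64, u64) {
        let mut first = DefaultHasher::new();
        item.hash(&mut first);
        let mut second = DefaultHasher::new();
        0xA5u8.hash(&mut second);
        item.hash(&mut second);
        (first.finish(), second.finish() | 1)
    }

    /// Double hashing: the i-th probe position is (h1 + i * h2) mod m.
    fn index(&self, (h1, h2): (u64, u64), i: u64) -> usize {
        (h1.wrapping_add(i.wrapping_mul(h2)) % self.bits.len() as u64) as usize
    }

    fn insert(&mut self, item: &[u8]) {
        let pair = Self::hash_pair(item);
        for i in 0..self.num_hashes {
            let idx = self.index(pair, i);
            self.bits[idx] = true;
        }
    }

    /// `false` means the item was definitely never inserted;
    /// `true` means it may have been (false positives are possible).
    fn might_contain(&self, item: &[u8]) -> bool {
        let pair = Self::hash_pair(item);
        (0..self.num_hashes).all(|i| self.bits[self.index(pair, i)])
    }
}
```

In a deduplication pipeline, a filter like this would typically act as a cheap first check: a negative answer rules out a duplicate immediately, while a positive answer still needs confirmation against an exact index.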

Structs

  • DedupConfig: Configuration for deduplication.
  • DedupStats: Deduplication statistics.

Enums

  • DedupError: Deduplication error type.
  • DetectionStrategy: Detection strategy for finding duplicates.

Type Aliases

  • DedupResult: Deduplication result type.