Network-aware deduplication for distributed media libraries.
This module provides mechanisms to deduplicate media across multiple nodes in a distributed system. Rather than requiring every node to download every file, nodes exchange compact fingerprint manifests and only transfer content when necessary.
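One way to read "only transfer content when necessary" is as a set difference over digests: a node compares the digests it already holds with those listed in a remote manifest and requests only the ones it lacks. The sketch below illustrates that idea with the standard library alone. The function name `digests_to_fetch` is hypothetical and not part of this module's API, and the module's actual transfer decision may take more into account.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of this module): returns the digests that
/// appear in the remote manifest but are absent locally, i.e. the only
/// content that would need to be transferred.
fn digests_to_fetch(local: &HashSet<String>, remote: &HashSet<String>) -> Vec<String> {
    // Set difference: everything the remote node has that we do not.
    let mut missing: Vec<String> = remote.difference(local).cloned().collect();
    // Sort so the result is deterministic regardless of hash ordering.
    missing.sort();
    missing
}
```

Files whose digests appear on both sides never cross the network; only the manifests do.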
§Design
Each node maintains a local NodeManifest containing fingerprint summaries
(Blake3 hex digest, perceptual hash bits, duration, file size) for its local
media files. Manifests are serialisable as JSON so they can be transmitted over
HTTP or any byte channel without coupling to a particular transport.
The NetworkDedupEngine accepts manifests from multiple remote nodes and
computes cross-node duplicate groups by:
- Exact match – identical Blake3 digests → definite duplicate.
- Perceptual match – Hamming distance on 64-bit pHash ≤ configured threshold → near-duplicate candidate.
- Duration guard – files whose durations differ by more than duration_tolerance_s are excluded from perceptual matching to reduce false positives.
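The three rules above can be sketched as a single pairwise classification function. This is a minimal illustration, not the engine's implementation: `Fingerprint`, `MatchKind`, `classify` and the `max_hamming` parameter are hypothetical names, and the order of checks (exact first, then the duration guard, then the perceptual comparison) is an assumption. It also assumes the guard applies only when both durations are known.

```rust
/// Illustrative outcome of comparing two records (hypothetical type).
#[derive(Debug, PartialEq)]
enum MatchKind {
    Exact,
    Perceptual,
}

/// Hypothetical stand-in for a manifest entry; not the crate's FileRecord.
struct Fingerprint {
    blake3_hex: String,
    phash: Option<u64>,
    duration_s: Option<f64>,
}

fn classify(
    a: &Fingerprint,
    b: &Fingerprint,
    max_hamming: u32,
    duration_tolerance_s: f64,
) -> Option<MatchKind> {
    // Exact match: identical digests are a definite duplicate.
    if a.blake3_hex == b.blake3_hex {
        return Some(MatchKind::Exact);
    }
    // Duration guard (assumed to apply only when both durations are known):
    // skip perceptual matching if the durations differ too much.
    if let (Some(da), Some(db)) = (a.duration_s, b.duration_s) {
        if (da - db).abs() > duration_tolerance_s {
            return None;
        }
    }
    // Perceptual match: Hamming distance is the number of differing bits,
    // i.e. the popcount of the XOR of the two 64-bit hashes.
    match (a.phash, b.phash) {
        (Some(pa), Some(pb)) if (pa ^ pb).count_ones() <= max_hamming => {
            Some(MatchKind::Perceptual)
        }
        _ => None,
    }
}
```

Checking the digest first means byte-identical files are grouped even when no perceptual hash or duration was recorded for them.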
§Example
use oximedia_dedup::network_dedup::{
    NetworkDedupEngine, NetworkDedupConfig, NodeManifest, FileRecord,
};

let mut engine = NetworkDedupEngine::new(NetworkDedupConfig::default());

// Manifest for the first node, holding one file record.
let mut manifest_a = NodeManifest::new("node-a".to_string());
manifest_a.add_file(FileRecord::new(
    "node-a:/videos/movie.mp4".to_string(),
    "abcdef01".repeat(8), // 64-character hex digest
    Some(0xDEAD_BEEF_1234_5678),
    Some(7200.0),
    Some(4_000_000_000),
));

// Manifest for the second node; its file carries the same digest.
let mut manifest_b = NodeManifest::new("node-b".to_string());
manifest_b.add_file(FileRecord::new(
    "node-b:/archive/movie_copy.mp4".to_string(),
    "abcdef01".repeat(8),
    Some(0xDEAD_BEEF_1234_5678),
    Some(7200.0),
    Some(4_000_000_000),
));

engine.add_manifest(manifest_a);
engine.add_manifest(manifest_b);

// Identical digests on two nodes should yield at least one duplicate group.
let groups = engine.find_cross_node_duplicates();
assert!(!groups.is_empty());
Structs§
- CrossNodeGroup - A group of cross-node duplicate files.
- CrossNodeSummary - Summary of cross-node deduplication results.
- FileRecord - A single file entry within a node’s manifest.
- NetworkDedupConfig - Configuration for the NetworkDedupEngine.
- NetworkDedupEngine - Engine for detecting duplicates across distributed media nodes.
- NodeManifest - Fingerprint manifest for a single node.
Enums§
- DuplicateMethod - Detection method for cross-node duplicates.