Module network_dedup

Network-aware deduplication for distributed media libraries.

This module provides mechanisms to deduplicate media across multiple nodes in a distributed system. Rather than requiring every node to download every file, nodes exchange compact fingerprint manifests and transfer content only when necessary.

§Design

Each node maintains a NodeManifest containing a fingerprint summary (Blake3 hex digest, perceptual hash bits, duration, file size) for each of its local media files. Manifests are serialisable as JSON so they can be transmitted over HTTP or any byte channel without coupling to a particular transport.
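
To make the manifest idea concrete, here is a minimal, standard-library-only sketch of how a manifest might be written out as JSON. Everything in it is an assumption made for illustration: `SketchEntry`, `json_escape`, `entry_to_json`, `manifest_to_json`, the JSON key names, and the choice to encode the perceptual hash as a hex string are hypothetical and do not describe the crate's actual types or wire format. A real implementation would more likely rely on a serialisation library than on hand-built strings.

```rust
/// Hypothetical stand-in for one manifest entry (NOT the crate's `FileRecord`).
struct SketchEntry {
    path: String,
    blake3_hex: String,
    phash: Option<u64>,
    duration_s: Option<f64>,
    size_bytes: Option<u64>,
}

/// Escapes quotes, backslashes and control characters for use in a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialises one entry; missing optional fields become JSON `null`.
fn entry_to_json(e: &SketchEntry) -> String {
    // Assumption: the 64-bit hash is written as hex text so that consumers
    // limited to 53-bit integers do not lose bits.
    let phash = match e.phash {
        Some(p) => format!("\"{:016x}\"", p),
        None => "null".to_string(),
    };
    // NaN and infinities are not valid JSON numbers, so they also map to null.
    let duration = match e.duration_s {
        Some(d) if d.is_finite() => d.to_string(),
        _ => "null".to_string(),
    };
    let size = match e.size_bytes {
        Some(s) => s.to_string(),
        None => "null".to_string(),
    };
    format!(
        "{{\"path\":\"{}\",\"blake3\":\"{}\",\"phash\":{},\"duration_s\":{},\"size\":{}}}",
        json_escape(&e.path),
        json_escape(&e.blake3_hex),
        phash,
        duration,
        size
    )
}

/// Serialises a whole manifest: a node identifier plus its file entries.
fn manifest_to_json(node_id: &str, entries: &[SketchEntry]) -> String {
    let files: Vec<String> = entries.iter().map(entry_to_json).collect();
    format!(
        "{{\"node_id\":\"{}\",\"files\":[{}]}}",
        json_escape(node_id),
        files.join(",")
    )
}
```

Because the output is a plain `String`, it can be sent over HTTP, a socket, or a file without the manifest code knowing anything about the transport, which is the decoupling the paragraph above describes.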

The NetworkDedupEngine accepts manifests from multiple remote nodes and computes cross-node duplicate groups by:

  1. Exact match – identical Blake3 digests → definite duplicate.
  2. Perceptual match – Hamming distance on 64-bit pHash ≤ configured threshold → near-duplicate candidate.
  3. Duration guard – files whose durations differ by more than duration_tolerance_s are excluded from perceptual matching to reduce false positives.
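
The three rules above can be sketched as a single pairwise comparison. This is an illustration under stated assumptions, not the engine's actual code: `SketchRecord`, `SketchMatch`, `classify`, and the `max_hamming` parameter are hypothetical names, and the handling of missing values (the guard applies only when both durations are known; perceptual matching needs both hashes) is a guess at reasonable behaviour.

```rust
/// Hypothetical record for this sketch; field names are assumptions and
/// do not mirror the crate's `FileRecord`.
#[derive(Debug, Clone)]
struct SketchRecord {
    blake3_hex: String,
    phash: Option<u64>,
    duration_s: Option<f64>,
}

/// Outcome of comparing two records.
#[derive(Debug, PartialEq)]
enum SketchMatch {
    /// Identical content digests.
    Exact,
    /// Perceptual hashes within the threshold; carries the Hamming distance.
    Perceptual { distance: u32 },
    /// Neither rule matched, or the duration guard rejected the pair.
    NoMatch,
}

fn classify(
    a: &SketchRecord,
    b: &SketchRecord,
    max_hamming: u32,
    duration_tolerance_s: f64,
) -> SketchMatch {
    // Rule 1: identical digests are a definite duplicate.
    if a.blake3_hex == b.blake3_hex {
        return SketchMatch::Exact;
    }

    // Rule 3: the duration guard runs before perceptual matching.
    // Assumption: it only applies when both durations are known.
    if let (Some(da), Some(db)) = (a.duration_s, b.duration_s) {
        if (da - db).abs() > duration_tolerance_s {
            return SketchMatch::NoMatch;
        }
    }

    // Rule 2: Hamming distance is the number of differing bits,
    // i.e. the population count of the XOR of the two hashes.
    if let (Some(pa), Some(pb)) = (a.phash, b.phash) {
        let distance = (pa ^ pb).count_ones();
        if distance <= max_hamming {
            return SketchMatch::Perceptual { distance };
        }
    }

    SketchMatch::NoMatch
}
```

Checking the digest first keeps the common exact-copy case cheap, and applying the duration guard before the bit comparison means a perceptual match is only ever reported for pairs that pass the guard.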

§Example

use oximedia_dedup::network_dedup::{
    NetworkDedupEngine, NetworkDedupConfig, NodeManifest, FileRecord,
};

let mut engine = NetworkDedupEngine::new(NetworkDedupConfig::default());

// Manifest for the first node, holding one file record.
let mut manifest_a = NodeManifest::new("node-a".to_string());
manifest_a.add_file(FileRecord::new(
    "node-a:/videos/movie.mp4".to_string(),
    "abcdef01".repeat(8),        // Blake3 digest as 64 hex characters
    Some(0xDEAD_BEEF_1234_5678), // 64-bit perceptual hash
    Some(7200.0),                // duration
    Some(4_000_000_000),         // file size
));

// A second node holds a copy with the same digest and perceptual hash.
let mut manifest_b = NodeManifest::new("node-b".to_string());
manifest_b.add_file(FileRecord::new(
    "node-b:/archive/movie_copy.mp4".to_string(),
    "abcdef01".repeat(8),
    Some(0xDEAD_BEEF_1234_5678),
    Some(7200.0),
    Some(4_000_000_000),
));

engine.add_manifest(manifest_a);
engine.add_manifest(manifest_b);

// The identical digests should yield at least one cross-node group.
let groups = engine.find_cross_node_duplicates();
assert!(!groups.is_empty());

Structs§

CrossNodeGroup
A group of cross-node duplicate files.
CrossNodeSummary
Summary of cross-node deduplication results.
FileRecord
A single file entry within a node’s manifest.
NetworkDedupConfig
Configuration for the NetworkDedupEngine.
NetworkDedupEngine
Engine for detecting duplicates across distributed media nodes.
NodeManifest
Fingerprint manifest for a single node.

Enums§

DuplicateMethod
Detection method for cross-node duplicates.