RustDupe

Smart Duplicate File Finder — A high-performance, cross-platform duplicate file finder built in Rust with an interactive TUI.

Features
Installation
Usage
CLI Reference
Performance
Contributing
License

Features

High Performance: Parallel directory walking and BLAKE3 hashing for maximum speed.
Interactive TUI: Review duplicate groups, preview files, and select copies for deletion in a navigable interface.
Multi-Phase Optimization:
1. Group by file size (instant filtering).
2. Compare 4KB pre-hashes (fast rejection).
3. Full content hash for final confirmation.
4. Optional byte-by-byte verification (paranoid mode).
Safe Deletion: Moves files to system trash by default (cross-platform support).
Hardlink Aware: Automatically detects and skips hardlinks (same inode) to prevent false positives.
Unicode Support: Handles macOS NFD vs. Windows/Linux NFC normalization issues.
Machine Readable: Export results to JSON or CSV for scripting and automation.
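
The hardlink handling listed above can be illustrated with a short sketch. This is not RustDupe's actual implementation: the names `FileId`, `drop_hardlinks`, and `file_id` are hypothetical, and the sketch assumes a Unix platform where a file is identified by its device id and inode number.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Identity of a file on disk: (device id, inode number).
/// Hypothetical type for this sketch, not part of RustDupe's API.
pub type FileId = (u64, u64);

/// Keep only the first path seen for each file identity, so that several
/// hardlinks to one inode are not reported as duplicates of each other.
pub fn drop_hardlinks(entries: Vec<(PathBuf, FileId)>) -> Vec<PathBuf> {
    let mut seen: HashSet<FileId> = HashSet::new();
    entries
        .into_iter()
        // `insert` returns false when the identity was already present.
        .filter(|(_, id)| seen.insert(*id))
        .map(|(path, _)| path)
        .collect()
}

/// Read a file's identity from its metadata (Unix only).
#[cfg(unix)]
pub fn file_id(path: &Path) -> std::io::Result<FileId> {
    use std::os::unix::fs::MetadataExt;
    let meta = std::fs::metadata(path)?;
    Ok((meta.dev(), meta.ino()))
}
```

Comparing the device id together with the inode matters because inode numbers are only unique within a single filesystem.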

Installation

From crates.io (Recommended)

cargo install rustdupe

Requires Rust 1.85 or later. Install Rust via rustup.

Pre-built Binaries

Download the latest release for your platform from the GitHub Releases page.

Platform	Architecture	Download
Linux	x86_64	`rustdupe-*-x86_64-unknown-linux-gnu`
Linux (musl)	x86_64	`rustdupe-*-x86_64-unknown-linux-musl`
macOS	x86_64	`rustdupe-*-x86_64-apple-darwin`
macOS	Apple Silicon	`rustdupe-*-aarch64-apple-darwin`
Windows	x86_64	`rustdupe-*-x86_64-pc-windows-msvc.exe`

From Source

git clone https://github.com/MasuRii/RustDupe.git
cd RustDupe
cargo build --release

The binary will be available at target/release/rustdupe (target\release\rustdupe.exe on Windows).

Usage

Basic Scan (Interactive TUI)

rustdupe scan ~/Downloads

Non-Interactive Modes (Automation)

# Export to JSON
rustdupe scan ~/Documents --output json > duplicates.json

# Export to CSV
rustdupe scan /path/to/media --output csv > duplicates.csv

Advanced Options

# Filter by size
rustdupe scan . --min-size 1MB --max-size 1GB

# Ignore specific patterns
rustdupe scan . --ignore "*.tmp" --ignore "node_modules"

# Enable paranoid byte-by-byte verification
rustdupe scan . --paranoid

# Custom I/O threads (default: 4)
rustdupe scan . --io-threads 8
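
To make the size filter concrete, here is a minimal sketch of how values such as `1MB` or `1GB` could be turned into byte counts and applied to a file length. The functions `parse_size` and `in_range` are hypothetical, and the sketch assumes binary units (1KB = 1024 bytes); RustDupe's own parser may accept other suffixes or use decimal units.

```rust
/// Parse a size such as "512", "4KB", "1MB" or "1GB" into bytes.
/// Assumption for this sketch: units are binary multiples (1KB = 1024 bytes).
pub fn parse_size(input: &str) -> Option<u64> {
    let s = input.trim().to_ascii_uppercase();
    // Split at the first character that is not an ASCII digit.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, unit) = s.split_at(split);
    let value: u64 = digits.parse().ok()?;
    let multiplier: u64 = match unit.trim() {
        "" | "B" => 1,
        "KB" => 1 << 10,
        "MB" => 1 << 20,
        "GB" => 1 << 30,
        _ => return None,
    };
    // Reject values that would overflow instead of wrapping around.
    value.checked_mul(multiplier)
}

/// Check a file length against optional lower and upper bounds (inclusive).
pub fn in_range(len: u64, min: Option<u64>, max: Option<u64>) -> bool {
    min.map_or(true, |m| len >= m) && max.map_or(true, |m| len <= m)
}
```

With these helpers, a 2,000,000-byte file passes `--min-size 1MB --max-size 1GB`, while a 100-byte file is filtered out.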

CLI Reference

Usage: rustdupe [OPTIONS] <COMMAND>

Commands:
  scan  Scan a directory for duplicate files
  help  Print this message or the help of the given subcommand(s)

Arguments:
  <PATH>  Directory path to scan for duplicates

Options:
  -v, --verbose...       Increase verbosity level (-v for debug, -vv for trace)
  -q, --quiet            Suppress all output except errors
      --no-color         Disable colored output
  -h, --help             Print help
  -V, --version          Print version

Scan Subcommand Options:
  -o, --output <OUTPUT>  Output format (tui, json, csv) [default: tui]
      --min-size <SIZE>  Minimum file size to consider (e.g., 1KB, 1MB)
      --max-size <SIZE>  Maximum file size to consider (e.g., 1KB, 1MB)
  -i, --ignore <PATTERN> Glob patterns to ignore
      --follow-symlinks  Follow symbolic links
      --skip-hidden      Skip hidden files and directories
      --io-threads <N>   Number of I/O threads for hashing [default: 4]
      --paranoid         Enable byte-by-byte verification
      --permanent        Use permanent deletion instead of trash
  -y, --yes              Skip confirmation prompts

Performance

RustDupe is optimized for speed through several techniques:

Technique	Benefit
BLAKE3 hashing	2.8-10x faster than SHA-256, with multi-threaded scaling
Parallel directory walking	Uses `jwalk` for 4x faster traversal than sequential walking
Multi-phase deduplication	Early rejection via size grouping and 4KB pre-hashes
Work-stealing thread pool	Near-linear scaling with CPU cores via Rayon
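
The multi-phase idea can be sketched on in-memory data. This is an illustration under stated assumptions, not RustDupe's code: the names `Entry`, `refine`, and `find_duplicates` are hypothetical, the standard library's `DefaultHasher` stands in for BLAKE3, and everything runs on a single thread. Each phase splits the surviving candidate groups by a progressively more expensive key and drops any group left with a single member.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Number of leading bytes used for the cheap pre-hash (4 KB).
const PREHASH_LEN: usize = 4096;

/// A file name paired with its contents (hypothetical type for this sketch).
type Entry<'a> = (&'a str, &'a [u8]);

/// Stand-in hash function; the real tool uses BLAKE3.
fn hash_bytes(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

/// Split one candidate group by `key` and drop subgroups with one member.
fn refine<'a, K, F>(group: Vec<Entry<'a>>, key: F) -> Vec<Vec<Entry<'a>>>
where
    K: Hash + Eq,
    F: Fn(&[u8]) -> K,
{
    let mut buckets: HashMap<K, Vec<Entry<'a>>> = HashMap::new();
    for (name, data) in group {
        buckets.entry(key(data)).or_default().push((name, data));
    }
    buckets.into_values().filter(|g| g.len() > 1).collect()
}

/// Return groups of file names whose contents are identical.
pub fn find_duplicates<'a>(files: &[Entry<'a>]) -> Vec<Vec<&'a str>> {
    let mut groups: Vec<Vec<Entry<'a>>> = vec![files.to_vec()];
    // Phase 1: group by size.
    groups = groups.into_iter().flat_map(|g| refine(g, |d| d.len())).collect();
    // Phase 2: compare hashes of the first 4 KB.
    groups = groups
        .into_iter()
        .flat_map(|g| refine(g, |d| hash_bytes(&d[..d.len().min(PREHASH_LEN)])))
        .collect();
    // Phase 3: compare full-content hashes.
    groups = groups.into_iter().flat_map(|g| refine(g, |d| hash_bytes(d))).collect();
    // Phase 4 (paranoid): compare the bytes themselves.
    groups = groups.into_iter().flat_map(|g| refine(g, |d| d.to_vec())).collect();

    // Sort names and groups so the result is deterministic.
    let mut out: Vec<Vec<&str>> = groups
        .into_iter()
        .map(|g| {
            let mut names: Vec<&str> = g.into_iter().map(|(name, _)| name).collect();
            names.sort_unstable();
            names
        })
        .collect();
    out.sort();
    out
}
```

Because most files have a unique size or differ within their first 4 KB, the expensive phases only run on a small fraction of the input.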

Benchmarks

On a typical workstation (8-core CPU, NVMe SSD):

Dataset	Files	Total Size	Time
Home directory	~50,000	100 GB	~15s
Photo library	~20,000	200 GB	~25s
Source code	~100,000	10 GB	~5s

Note: Actual performance varies with disk speed, file sizes, and duplicate ratio, because only files that survive size grouping and the 4KB pre-hash are read in full.

Contributing

Contributions are welcome! Please read our Contributing Guidelines before submitting a Pull Request.

Quick Start

1. Fork the repository.
2. Create your feature branch (git checkout -b feature/amazing-feature).
3. Commit your changes (git commit -m 'feat: add amazing feature').
4. Push to the branch (git push origin feature/amazing-feature).
5. Open a Pull Request.

See CONTRIBUTING.md for detailed guidelines on:

Development setup
Code style and linting
Testing requirements
Commit message conventions

Security

For security vulnerabilities, please see our Security Policy.

License

Distributed under the MIT License. See LICENSE for more information.

rustdupe 0.1.0