rustdupe 0.1.0

Smart duplicate file finder with interactive TUI

RustDupe

Smart Duplicate File Finder — A high-performance, cross-platform duplicate file finder built in Rust with an interactive TUI.

Features

  • High Performance: Parallel directory walking and BLAKE3 hashing for maximum speed.
  • Interactive TUI: Review duplicate groups, preview files, and select copies for deletion in a navigable interface.
  • Multi-Phase Optimization:
    1. Group by file size (instant filtering).
    2. Compare 4KB pre-hashes (fast rejection).
    3. Full content hash for final confirmation.
    4. Optional byte-by-byte verification (paranoid mode).
  • Safe Deletion: Moves files to system trash by default (cross-platform support).
  • Hardlink Aware: Automatically detects and skips hardlinks (same inode) to prevent false positives.
  • Unicode Support: Handles macOS NFD vs. Windows/Linux NFC normalization issues.
  • Machine Readable: Export results to JSON or CSV for scripting and automation.
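
The multi-phase optimization listed above can be sketched with the standard library alone. This is a minimal illustration and not RustDupe's actual implementation: the names `hash_file`, `refine`, and `find_duplicates` are hypothetical, and `DefaultHasher` stands in for BLAKE3 only so that the example needs no external crates (it is not collision resistant and should not be used for real deduplication).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs::{self, File};
use std::hash::Hasher;
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Hash at most `limit` bytes of a file (`None` means the whole file).
/// ASSUMPTION: DefaultHasher is a std-only stand-in for BLAKE3.
fn hash_file(path: &Path, limit: Option<u64>) -> io::Result<u64> {
    let file = File::open(path)?;
    let mut reader: Box<dyn Read> = match limit {
        Some(n) => Box::new(file.take(n)),
        None => Box::new(file),
    };
    let mut hasher = DefaultHasher::new();
    let mut buf = [0u8; 8192];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.write(&buf[..n]);
    }
    Ok(hasher.finish())
}

/// Split every group by `key` and keep only subgroups with two or more files.
fn refine<K, F>(groups: Vec<Vec<PathBuf>>, key: F) -> Vec<Vec<PathBuf>>
where
    K: std::hash::Hash + Eq,
    F: Fn(&Path) -> io::Result<K>,
{
    let mut out = Vec::new();
    for group in groups {
        let mut buckets: HashMap<K, Vec<PathBuf>> = HashMap::new();
        for path in group {
            // Files that cannot be read are silently skipped in this sketch.
            if let Ok(k) = key(&path) {
                buckets.entry(k).or_default().push(path);
            }
        }
        out.extend(buckets.into_values().filter(|g| g.len() > 1));
    }
    out
}

/// Phases 1 to 3: group by size, then by 4 KB pre-hash, then by full hash.
fn find_duplicates(paths: Vec<PathBuf>) -> Vec<Vec<PathBuf>> {
    let by_size = refine(vec![paths], |p| fs::metadata(p).map(|m| m.len()));
    let by_prefix = refine(by_size, |p| hash_file(p, Some(4096)));
    refine(by_prefix, |p| hash_file(p, None))
}
```

Each phase only reads files that survived the previous one, so most non-duplicates are rejected after a metadata lookup or a single 4 KB read.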

Installation

From crates.io (Recommended)

cargo install rustdupe

Requires Rust 1.85 or later. Install Rust via rustup.

Pre-built Binaries

Download the latest release for your platform from the GitHub Releases page.

Platform      Architecture   Download
Linux         x86_64         rustdupe-*-x86_64-unknown-linux-gnu
Linux (musl)  x86_64         rustdupe-*-x86_64-unknown-linux-musl
macOS         x86_64         rustdupe-*-x86_64-apple-darwin
macOS         Apple Silicon  rustdupe-*-aarch64-apple-darwin
Windows       x86_64         rustdupe-*-x86_64-pc-windows-msvc.exe

From Source

git clone https://github.com/MasuRii/RustDupe.git
cd RustDupe
cargo build --release

The binary will be available at target/release/rustdupe.

Usage

Basic Scan (Interactive TUI)

rustdupe scan ~/Downloads

Non-Interactive Modes (Automation)

# Export to JSON
rustdupe scan ~/Documents --output json > duplicates.json

# Export to CSV
rustdupe scan /path/to/media --output csv > duplicates.csv

Advanced Options

# Filter by size
rustdupe scan . --min-size 1MB --max-size 1GB

# Ignore specific patterns
rustdupe scan . --ignore "*.tmp" --ignore "node_modules"

# Enable paranoid byte-by-byte verification
rustdupe scan . --paranoid

# Custom I/O threads (default: 4)
rustdupe scan . --io-threads 8
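
To illustrate what a byte-by-byte check such as `--paranoid` guards against (two different files that happen to share a hash), here is a minimal sketch of a direct content comparison. It is an assumption about the general technique, not RustDupe's code; `read_full` and `files_identical` are hypothetical names.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Fill `buf` as far as possible; the result is shorter than `buf` only at EOF.
fn read_full(reader: &mut impl Read, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Return true only if both files have exactly the same bytes.
fn files_identical(a: &Path, b: &Path) -> io::Result<bool> {
    let (mut fa, mut fb) = (File::open(a)?, File::open(b)?);
    // Cheap early exit: files of different length can never match.
    if fa.metadata()?.len() != fb.metadata()?.len() {
        return Ok(false);
    }
    let (mut ba, mut bb) = ([0u8; 8192], [0u8; 8192]);
    loop {
        let na = read_full(&mut fa, &mut ba)?;
        let nb = read_full(&mut fb, &mut bb)?;
        if na != nb || ba[..na] != bb[..nb] {
            return Ok(false);
        }
        if na == 0 {
            return Ok(true); // both files reached EOF together
        }
    }
}
```

The comparison stops at the first differing chunk, so it only reads both files in full when they really are identical.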

CLI Reference

Usage: rustdupe [OPTIONS] <COMMAND>

Commands:
  scan  Scan a directory for duplicate files
  help  Print this message or the help of the given subcommand(s)

Arguments:
  <PATH>  Directory path to scan for duplicates

Options:
  -v, --verbose...       Increase verbosity level (-v for debug, -vv for trace)
  -q, --quiet            Suppress all output except errors
      --no-color         Disable colored output
  -h, --help             Print help
  -V, --version          Print version

Scan Subcommand Options:
  -o, --output <OUTPUT>  Output format (tui, json, csv) [default: tui]
      --min-size <SIZE>  Minimum file size to consider (e.g., 1KB, 1MB)
      --max-size <SIZE>  Maximum file size to consider (e.g., 1KB, 1MB)
  -i, --ignore <PATTERN> Glob patterns to ignore
      --follow-symlinks  Follow symbolic links
      --skip-hidden      Skip hidden files and directories
      --io-threads <N>   Number of I/O threads for hashing [default: 4]
      --paranoid         Enable byte-by-byte verification
      --permanent        Use permanent deletion instead of trash
  -y, --yes              Skip confirmation prompts

Performance

RustDupe is optimized for speed through several techniques:

Technique                   Benefit
BLAKE3 hashing              2.8-10x faster than SHA-256, with multi-threaded scaling
Parallel directory walking  Uses jwalk for 4x faster traversal than sequential walking
Multi-phase deduplication   Early rejection via size grouping and 4KB pre-hashes
Work-stealing thread pool   Near-linear scaling with CPU cores via Rayon
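
As a rough picture of how hashing work can be spread over a fixed number of threads, the sketch below lets worker threads claim items from a shared counter. It is deliberately simpler than what the table describes: it uses only the standard library, a shared atomic index rather than Rayon's work stealing, and `DefaultHasher` rather than BLAKE3. The function name `hash_all` and its `threads` parameter are hypothetical and only loosely mirror an option like `--io-threads`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hash every buffer in `items` using `threads` worker threads.
/// ASSUMPTION: a shared counter replaces Rayon's work-stealing scheduler.
fn hash_all(items: &[Vec<u8>], threads: usize) -> Vec<u64> {
    let next = AtomicUsize::new(0); // index of the next unclaimed item
    let results = Mutex::new(vec![0u64; items.len()]);
    thread::scope(|s| {
        for _ in 0..threads.max(1) {
            s.spawn(|| loop {
                // Each worker claims one item at a time until none are left.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= items.len() {
                    break;
                }
                let mut hasher = DefaultHasher::new();
                hasher.write(&items[i]);
                let digest = hasher.finish();
                results.lock().unwrap()[i] = digest;
            });
        }
    });
    results.into_inner().unwrap()
}
```

Because workers pull items one at a time, a thread that finishes a small item early simply claims the next one, which keeps all threads busy even when item sizes vary.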

Benchmarks

On a typical workstation (8-core CPU, NVMe SSD):

Dataset         Files      Total Size  Time
Home directory  ~50,000    100 GB      ~15s
Photo library   ~20,000    200 GB      ~25s
Source code     ~100,000   10 GB       ~5s

Note: Actual performance varies based on disk speed, file sizes, and duplicate ratio.

Contributing

Contributions are welcome! Please read our Contributing Guidelines before submitting a Pull Request.

Quick Start

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines on:

  • Development setup
  • Code style and linting
  • Testing requirements
  • Commit message conventions

Security

For security vulnerabilities, please see our Security Policy.

License

Distributed under the MIT License. See LICENSE for more information.