rustdupe 0.1.0

# RustDupe

[![CI](https://github.com/MasuRii/RustDupe/actions/workflows/ci.yml/badge.svg)](https://github.com/MasuRii/RustDupe/actions/workflows/ci.yml)
[![Crates.io](https://img.shields.io/crates/v/rustdupe.svg)](https://crates.io/crates/rustdupe)
[![Downloads](https://img.shields.io/crates/d/rustdupe.svg)](https://crates.io/crates/rustdupe)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Rust Version](https://img.shields.io/badge/rust-1.85%2B-blue.svg)](https://www.rust-lang.org)

**Smart Duplicate File Finder** — A high-performance, cross-platform duplicate file finder built in Rust with an interactive TUI.

![ScreenShot](public/images/rustdupe_tuiscreenshot.png)

---

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [CLI Reference](#cli-reference)
- [Performance](#performance)
- [Contributing](#contributing)
- [License](#license)

## Features

- **High Performance**: Parallel directory walking and BLAKE3 hashing for maximum speed.
- **Interactive TUI**: Review duplicate groups, preview files, and select copies for deletion in a navigable interface.
- **Multi-Phase Optimization**:
  1. Group by file size (instant filtering).
  2. Compare 4KB pre-hashes (fast rejection).
  3. Full content hash for final confirmation.
  4. Optional byte-by-byte verification (paranoid mode).
- **Safe Deletion**: Moves files to system trash by default (cross-platform support).
- **Hardlink Aware**: Automatically detects and skips hardlinks (same inode) to prevent false positives.
- **Unicode Support**: Handles macOS NFD vs. Windows/Linux NFC normalization issues.
- **Machine Readable**: Export results to JSON or CSV for scripting and automation.
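
The first three phases of the multi-phase optimization can be sketched with the standard library alone. This is an illustration, not RustDupe's actual code: `hash_file`, `regroup`, and `find_duplicates` are hypothetical names, and `DefaultHasher` stands in for BLAKE3 so the example has no external dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs::{self, File};
use std::hash::{Hash, Hasher};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Pre-hash window: the first 4 KB of each file (phase 2).
const PREHASH_BYTES: u64 = 4096;

/// Hash at most `limit` bytes of a file, or the whole file when `limit` is `None`.
/// Assumption: DefaultHasher is only a stand-in for BLAKE3 in this sketch.
fn hash_file(path: &Path, limit: Option<u64>) -> io::Result<u64> {
    let mut reader = File::open(path)?.take(limit.unwrap_or(u64::MAX));
    let mut hasher = DefaultHasher::new();
    let mut buf = [0u8; 8192];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.write(&buf[..n]);
    }
    Ok(hasher.finish())
}

/// Split every group into subgroups that share the same key and
/// drop subgroups with a single member (they cannot be duplicates).
fn regroup<K, F>(groups: Vec<Vec<PathBuf>>, key: F) -> Vec<Vec<PathBuf>>
where
    K: Hash + Eq,
    F: Fn(&Path) -> io::Result<K>,
{
    let mut out = Vec::new();
    for group in groups {
        let mut buckets: HashMap<K, Vec<PathBuf>> = HashMap::new();
        for path in group {
            // Files that cannot be read are silently skipped in this sketch.
            if let Ok(k) = key(path.as_path()) {
                buckets.entry(k).or_default().push(path);
            }
        }
        out.extend(buckets.into_values().filter(|g| g.len() > 1));
    }
    out
}

/// Phase 1: file size. Phase 2: 4 KB pre-hash. Phase 3: full content hash.
fn find_duplicates(paths: Vec<PathBuf>) -> Vec<Vec<PathBuf>> {
    let by_size = regroup(vec![paths], |p| fs::metadata(p).map(|m| m.len()));
    let by_prehash = regroup(by_size, |p| hash_file(p, Some(PREHASH_BYTES)));
    regroup(by_prehash, |p| hash_file(p, None))
}
```

Each phase only touches files that survived the previous one, so most files are rejected after a cheap metadata lookup or a single 4 KB read.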

## Installation

### From crates.io (Recommended)

```bash
cargo install rustdupe
```

> **Requires Rust 1.85 or later.** Install Rust via [rustup](https://rustup.rs/).

### Pre-built Binaries

Download the latest release for your platform from the [GitHub Releases](https://github.com/MasuRii/RustDupe/releases) page.

| Platform | Architecture | Download |
|----------|--------------|----------|
| Linux | x86_64 | `rustdupe-*-x86_64-unknown-linux-gnu` |
| Linux (musl) | x86_64 | `rustdupe-*-x86_64-unknown-linux-musl` |
| macOS | x86_64 | `rustdupe-*-x86_64-apple-darwin` |
| macOS | Apple Silicon | `rustdupe-*-aarch64-apple-darwin` |
| Windows | x86_64 | `rustdupe-*-x86_64-pc-windows-msvc.exe` |

### From Source

```bash
git clone https://github.com/MasuRii/RustDupe.git
cd RustDupe
cargo build --release
```

The binary will be available at `target/release/rustdupe`.

## Usage

### Basic Scan (Interactive TUI)

```bash
rustdupe scan ~/Downloads
```

### Non-Interactive Modes (Automation)

```bash
# Export to JSON
rustdupe scan ~/Documents --output json > duplicates.json

# Export to CSV
rustdupe scan /path/to/media --output csv > duplicates.csv
```

### Advanced Options

```bash
# Filter by size
rustdupe scan . --min-size 1MB --max-size 1GB

# Ignore specific patterns
rustdupe scan . --ignore "*.tmp" --ignore "node_modules"

# Enable paranoid byte-by-byte verification
rustdupe scan . --paranoid

# Custom I/O threads (default: 4)
rustdupe scan . --io-threads 8
```
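
To illustrate what `--paranoid` adds on top of hashing, the following is a minimal sketch of a byte-by-byte comparison of two files. The names `read_full` and `files_identical` and the 64 KB chunk size are assumptions made for this example; RustDupe's own verification code may be structured differently.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Fill `buf` as far as possible; returns the number of bytes read
/// (less than `buf.len()` only at end of file).
fn read_full(reader: &mut impl Read, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Compare two files chunk by chunk, stopping at the first difference.
fn files_identical(a: &Path, b: &Path) -> io::Result<bool> {
    let (mut fa, mut fb) = (File::open(a)?, File::open(b)?);
    // Different lengths can never match, so skip reading entirely.
    if fa.metadata()?.len() != fb.metadata()?.len() {
        return Ok(false);
    }
    let mut buf_a = vec![0u8; 64 * 1024];
    let mut buf_b = vec![0u8; 64 * 1024];
    loop {
        let na = read_full(&mut fa, &mut buf_a)?;
        let nb = read_full(&mut fb, &mut buf_b)?;
        if na != nb || buf_a[..na] != buf_b[..nb] {
            return Ok(false);
        }
        if na == 0 {
            return Ok(true); // both files ended without a mismatch
        }
    }
}
```

A check like this rules out hash collisions entirely, at the cost of reading every candidate pair in full a second time.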

## CLI Reference

```text
Usage: rustdupe [OPTIONS] <COMMAND>

Commands:
  scan  Scan a directory for duplicate files
  help  Print this message or the help of the given subcommand(s)

Arguments:
  <PATH>  Directory path to scan for duplicates

Options:
  -v, --verbose...       Increase verbosity level (-v for debug, -vv for trace)
  -q, --quiet            Suppress all output except errors
      --no-color         Disable colored output
  -h, --help             Print help
  -V, --version          Print version

Scan Subcommand Options:
  -o, --output <OUTPUT>  Output format (tui, json, csv) [default: tui]
      --min-size <SIZE>  Minimum file size to consider (e.g., 1KB, 1MB)
      --max-size <SIZE>  Maximum file size to consider (e.g., 1KB, 1MB)
  -i, --ignore <PATTERN> Glob patterns to ignore
      --follow-symlinks  Follow symbolic links
      --skip-hidden      Skip hidden files and directories
      --io-threads <N>   Number of I/O threads for hashing [default: 4]
      --paranoid         Enable byte-by-byte verification
      --permanent        Use permanent deletion instead of trash
  -y, --yes              Skip confirmation prompts
```

## Performance

RustDupe is optimized for speed through several techniques:

| Technique | Benefit |
|-----------|---------|
| **BLAKE3 hashing** | 2.8-10x faster than SHA-256, with multi-threaded scaling |
| **Parallel directory walking** | Uses `jwalk` for 4x faster traversal than sequential walking |
| **Multi-phase deduplication** | Early rejection via size grouping and 4KB pre-hashes |
| **Work-stealing thread pool** | Near-linear scaling with CPU cores via Rayon |
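
To give a rough feel for why a fixed number of workers scales with core count, here is a small standard-library sketch. It is not RustDupe's implementation and it is not work stealing: instead of Rayon's per-thread deques, the hypothetical `parallel_map` below lets workers pull the next index from a shared atomic counter, which gives similar load balancing for independent tasks such as hashing files.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Apply `work` to every item using `threads` workers.
/// Each worker claims the next unprocessed index from a shared counter,
/// so a worker that finishes early simply picks up more items.
fn parallel_map<T, R, F>(items: &[T], threads: usize, work: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<(usize, R)>> = Mutex::new(Vec::with_capacity(items.len()));

    thread::scope(|s| {
        for _ in 0..threads.max(1) {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= items.len() {
                    break;
                }
                let r = work(&items[i]);
                results.lock().unwrap().push((i, r));
            });
        }
    });

    // Restore input order, since workers finish in arbitrary order.
    let mut results = results.into_inner().unwrap();
    results.sort_by_key(|entry| entry.0);
    results.into_iter().map(|(_, r)| r).collect()
}
```

With a function that hashes one path as `work` and the `--io-threads` value as `threads`, this pattern would hash several files concurrently while keeping the output in input order.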

### Benchmarks

On a typical workstation (8-core CPU, NVMe SSD):

| Dataset | Files | Total Size | Time |
|---------|-------|------------|------|
| Home directory | ~50,000 | 100 GB | ~15s |
| Photo library | ~20,000 | 200 GB | ~25s |
| Source code | ~100,000 | 10 GB | ~5s |

> **Note**: Actual performance varies based on disk speed, file sizes, and duplicate ratio.

## Contributing

Contributions are welcome! Please read our [Contributing Guidelines](CONTRIBUTING.md) before submitting a Pull Request.

### Quick Start

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'feat: add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines on:
- Development setup
- Code style and linting
- Testing requirements
- Commit message conventions

## Security

For security vulnerabilities, please see our [Security Policy](SECURITY.md).

## License

Distributed under the MIT License. See [LICENSE](LICENSE) for more information.

---

<p align="center">
  Made with ❤️ in Rust
</p>