# Duplicate File Finder
A command-line tool that finds duplicate files in a directory tree by comparing file contents rather than file names.

---
## Features
- **Partial Hashing** for quick initial grouping (reads first 4 KB).
- **Full Hashing** for final confirmation (full file read or memory-mapped).
- **Parallelized** using Rayon for high performance.
- **Progress Bars** for visual feedback.
- **Supports large datasets** and very large files.
- **Colored terminal output** for better readability.
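
To make the parallelization feature concrete, here is a minimal sketch that hashes a list of in-memory buffers across several worker threads. It is an illustration under stated assumptions, not the project's code: it uses `std::thread::scope` so that it runs without dependencies (with Rayon, the same shape is usually written with `par_iter()`), `DefaultHasher` stands in for BLAKE3, and the names `hash_bytes` and `parallel_hashes` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::thread;

/// Hash one buffer.
/// ASSUMPTION: `DefaultHasher` is a std-only stand-in for BLAKE3.
fn hash_bytes(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

/// Hash every buffer, splitting the work across up to `workers` scoped threads.
/// Results come back in the same order as `inputs`.
fn parallel_hashes(inputs: &[Vec<u8>], workers: usize) -> Vec<u64> {
    let workers = workers.max(1);
    // Ceiling division; `chunks` panics on a size of zero, so clamp to 1.
    let chunk_size = ((inputs.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn all workers first; collecting the handles keeps the lazy
        // iterator from spawning and joining one thread at a time.
        let handles: Vec<_> = inputs
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|b| hash_bytes(b)).collect::<Vec<u64>>())
            })
            .collect();

        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```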
---
## Usage
### 1. Install Rust (if you don't have it)
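```bash
# Linux/macOS; see https://rustup.rs for other platforms
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```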
### 2. Clone and build the project
```bash
git clone https://github.com/yourusername/duplicate-file-finder.git
cd duplicate-file-finder
cargo build --release
```
### 3. Run the program
```bash
cargo run -- --path /path/to/your/directory
```
Or run the compiled release binary directly:
```bash
./target/release/duplicate-file-finder --path /path/to/your/directory
```
---
## Example
```bash
cargo run -- --path ./Downloads
```
Sample output:
```
Scanning files...
Found 5321 files. Computing partial hashes...
Grouping files by partial hash...
421 candidate files after partial hashing. Computing full hashes...
Grouping by full hash...
❌ Duplicates found:
Group 1 (2 files) - Hash: d2f1d7e91c8b...
/path/to/file1.jpg
/path/to/file1_copy.jpg
Group 2 (3 files) - Hash: a34e1b1fe98d...
/path/to/doc1.pdf
/path/to/backup/doc1.pdf
/path/to/archive/old/doc1.pdf
Found 2 duplicate groups.
Summary: Scanned 5321 files in 1m 12s.
```
---
## Command-Line Arguments
| Argument | Description | Example |
| --- | --- | --- |
| `--path` or `-p` | Directory to scan recursively | `--path ./Documents` |
---
## How It Works
- **Step 1**: Scan all files under the given directory recursively.
- **Step 2**: Compute a partial hash (first 4 KB) of each file.
- **Step 3**: Group files with identical partial hashes.
- **Step 4**: Compute full hashes for the candidate groups.
- **Step 5**: Report groups of true duplicates based on full file content.
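
This two-stage approach (a cheap partial hash first, then a full hash only for the remaining candidates) avoids reading most files in full, which keeps scans fast even for very large folders.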
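
The steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's actual code: `DefaultHasher` stands in for BLAKE3, `std::fs::read_dir` stands in for `walkdir`, files are read into memory rather than streamed or memory-mapped, and the names `collect_files`, `hash_prefix`, `group_by_hash`, and `find_duplicates` are hypothetical. Calling `collect_files` on a root directory and passing the result to `find_duplicates` yields one `Vec<PathBuf>` per duplicate group.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs::{self, File};
use std::hash::Hasher;
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Bytes read for the cheap first pass (4 KB, matching Step 2).
const PARTIAL_LEN: u64 = 4096;

/// Step 1: recursively collect regular files under `root` (symlinks are skipped).
fn collect_files(root: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            collect_files(&entry.path(), out)?;
        } else if file_type.is_file() {
            out.push(entry.path());
        }
    }
    Ok(())
}

/// Hash at most `limit` leading bytes of a file; pass `u64::MAX` for the whole file.
/// ASSUMPTION: `DefaultHasher` is a 64-bit, non-cryptographic stand-in for BLAKE3,
/// and reading into a `Vec` replaces streaming or memory-mapping for brevity.
fn hash_prefix(path: &Path, limit: u64) -> io::Result<u64> {
    let mut bytes = Vec::new();
    File::open(path)?.take(limit).read_to_end(&mut bytes)?;
    let mut hasher = DefaultHasher::new();
    hasher.write(&bytes);
    Ok(hasher.finish())
}

/// Group paths by the hash of their first `limit` bytes and keep only
/// groups with more than one member.
fn group_by_hash(paths: Vec<PathBuf>, limit: u64) -> Vec<Vec<PathBuf>> {
    let mut groups: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for path in paths {
        // Unreadable files are silently skipped in this sketch.
        if let Ok(key) = hash_prefix(&path, limit) {
            groups.entry(key).or_default().push(path);
        }
    }
    groups.into_values().filter(|g| g.len() > 1).collect()
}

/// Steps 2-5: group by partial hash, then confirm with full hashes
/// inside each candidate group.
fn find_duplicates(paths: Vec<PathBuf>) -> Vec<Vec<PathBuf>> {
    group_by_hash(paths, PARTIAL_LEN)
        .into_iter()
        .flat_map(|candidates| group_by_hash(candidates, u64::MAX))
        .collect()
}
```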
---
## Dependencies
This project uses:
- [`blake3`](https://docs.rs/blake3/latest/blake3/) for fast cryptographic hashing.
- [`clap`](https://docs.rs/clap/latest/clap/) for argument parsing.
- [`rayon`](https://docs.rs/rayon/latest/rayon/) for parallel processing.
- [`indicatif`](https://docs.rs/indicatif/latest/indicatif/) for progress bars.
- [`colored`](https://docs.rs/colored/latest/colored/) for colored terminal output.
- [`walkdir`](https://docs.rs/walkdir/latest/walkdir/) for recursive file walking.
- [`memmap2`](https://docs.rs/memmap2/latest/memmap2/) for memory-mapping large files.

Cargo downloads and builds all of these automatically when you run `cargo build`.
---
## License
This project is licensed under the MIT License. See [`LICENSE`](LICENSE) for more information.