catfish 0.1.1

catfish is a CLI tool that compares two directories by hashing all files. It reports which files are in the 'right' folder but not in 'left', regardless of how things were moved or renamed. Great for making sure your 'left' folder has all the files from the 'right' one.
# catfish 🥸

**Because sometimes files pretend to be something they’re not.**  
_(No matter the name or location, catfish will find out if they’re the same.)_

> I needed this functionality today and threw this tool together.
> I decided to share it here, in case someone else finds it useful!

## Why “catfish”?

- **cat** is a Unix tool.
- **fish** … well, we’re fishing for the truth in your file system.
- A catfish is a sneaky creature, just like files that might be identical under different names or locations.
- catfish 🥸 unmask duplicates for what they really are.

## What does it do?

`catfish` recursively scans two folders—let’s call them “left” and “right”—and hashes every file (using SHA256).

- If a file in the **right** folder has the same content (hash) as **any** file in the **left**
  folder, it won’t be listed.
- We **don’t** check the file’s location in the left folder. Any matching hash anywhere in “left” is
  enough to exclude it.
- We **don’t** check for duplicates within the left folder itself—if “left” has duplicates, that’s
  not our concern.
- We can optionally ignore duplicates in the **right** folder, so that only the **first** occurrence
  of any given hash in “right” is shown.

**Backstory**:

Some time ago, I switched from cloud provider X to cloud provider Y. I had both drives fully synced
locally (a full copy, not a "lite" sync), so I copied all my files from X to Y and then turned off
sync for X. But I forgot to delete the local X folder, and ended up adding new files to it by
mistake. When I went to delete it, a simple path comparison with Y wasn't enough because I'd moved
and renamed files in Y, which would have caused a lot of false positives. What I really needed was
to find out which files in X didn't exist anywhere in Y - so I could copy them over if necessary,
and then safely delete X without losing anything important.

## Installation

1. **Ensure you have [Rust]https://www.rust-lang.org/ and Cargo installed.**
2. Run:
   ```bash
   cargo install catfish
   ```
   or clone this repo and:
   ```bash
   git clone https://github.com/samvdst/catfish.git
   cd catfish
   cargo install --path .
   ```
3. That’s it! You can now run `catfish` from anywhere.

## Usage

```bash
catfish [OPTIONS] <LEFT_FOLDER> <RIGHT_FOLDER>
```

- **`-i, --ignore-duplicates`**: if there are multiple files with the same hash in `RIGHT_FOLDER`, only list the first occurrence.

## Example

Suppose we have two folders: foo (left) and bar (right). In bar, we have a file that appears twice with identical content.

```bash
catfish foo bar
Files in "bar" but not in "foo":
f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c bar/example.txt
f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c bar/example_dupe.txt
af89f7d49b0c8ded732a9a2b3aff738cd1a3c1cd0d3635742adfee47faa31cba bar/another_file.txt
```

If we then run:

```bash
catfish foo bar --ignore-duplicates
Files in "bar" but not in "foo":
f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c bar/example.txt
af89f7d49b0c8ded732a9a2b3aff738cd1a3c1cd0d3635742adfee47faa31cba bar/another_file.txt
```

## Contributing

Ideas, improvements, and pull requests are always welcome.
But please note: I can’t guarantee that I’ll have much time to work on this.
So if you open a PR, thanks in advance for your patience!