catfish 🥸
Because sometimes files pretend to be something they’re not.
(No matter the name or location, catfish will find out if they’re the same.)
I needed this functionality today and threw this tool together. I decided to share it here, in case someone else finds it useful!
Why “catfish”?
- cat is a Unix tool.
- fish … well, we’re fishing for the truth in your file system.
- A catfish is a sneaky creature, just like files that might be identical under different names or locations.
- catfish 🥸 unmasks duplicates for what they really are.
What does it do?
catfish recursively scans two folders—let’s call them “left” and “right”—and hashes every file (using SHA256).
- If a file in the right folder has the same content (hash) as any file in the left folder, it won’t be listed.
- We don’t check the file’s location in the left folder. Any matching hash anywhere in “left” is enough to exclude it.
- We don’t check for duplicates within the left folder itself—if “left” has duplicates, that’s not our concern.
- We can optionally ignore duplicates in the right folder, so that only the first occurrence of any given hash in “right” is shown.
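The rules above can be sketched roughly as follows. This is a minimal illustration, not the actual catfish source: the function names (`content_hash`, `collect_files`, `only_in_right`) are made up for this example, and because the Rust standard library has no SHA256, the sketch swaps in `DefaultHasher` as a stand-in. A real tool would use a cryptographic hash, since a 64-bit hash can collide.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

// Stand-in for SHA256 (assumption): hashes the file's bytes with std's DefaultHasher.
fn content_hash(path: &Path) -> io::Result<u64> {
    let bytes = fs::read(path)?;
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    Ok(hasher.finish())
}

// Recursively collect every regular file below `dir`.
fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_files(&path, out)?;
        } else if path.is_file() {
            out.push(path);
        }
    }
    Ok(())
}

// Return the files in `right` whose content does not appear anywhere in `left`.
// With `ignore_duplicates`, only the first file per hash in `right` is kept.
fn only_in_right(left: &Path, right: &Path, ignore_duplicates: bool) -> io::Result<Vec<PathBuf>> {
    let mut left_files = Vec::new();
    collect_files(left, &mut left_files)?;

    // Location in "left" is irrelevant: only the set of hashes matters,
    // so duplicates inside "left" collapse naturally.
    let mut left_hashes = HashSet::new();
    for file in &left_files {
        left_hashes.insert(content_hash(file)?);
    }

    let mut right_files = Vec::new();
    collect_files(right, &mut right_files)?;
    right_files.sort(); // makes "first occurrence" deterministic in this sketch

    let mut seen_in_right = HashSet::new();
    let mut result = Vec::new();
    for file in right_files {
        let hash = content_hash(&file)?;
        if left_hashes.contains(&hash) {
            continue; // same content exists somewhere in "left"
        }
        if ignore_duplicates && !seen_in_right.insert(hash) {
            continue; // an earlier file in "right" already had this content
        }
        result.push(file);
    }
    Ok(result)
}
```

Calling `only_in_right(Path::new("foo"), Path::new("bar"), true)` would return the paths under `bar` whose content is missing from `foo`, one per distinct content. Sorting the paths is a choice made for this sketch; which file counts as the "first occurrence" in catfish itself may differ.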
Backstory:
Some time ago, I switched from cloud provider X to cloud provider Y. I had both drives fully synced locally (a full copy, not a "lite" sync), so I copied all my files from X to Y and then turned off sync for X. But I forgot to delete the local X folder, and ended up adding new files to it by mistake. When I went to delete it, a simple path comparison with Y wasn't enough because I'd moved and renamed files in Y, which would have caused a lot of false positives. What I really needed was to find out which files in X didn't exist anywhere in Y - so I could copy them over if necessary, and then safely delete X without losing anything important.
Installation
- Ensure you have Rust and Cargo installed.
- Run:
or clone this repo and:
- That’s it! You can now run catfish from anywhere.
Usage
-i, --ignore-duplicates: if there are multiple files with the same hash in RIGHT_FOLDER, only list the first occurrence.
Example
Suppose we have two folders: foo (left) and bar (right). In bar, we have a file that appears twice with identical content.
If we then run:
Contributing
Ideas, improvements, and pull requests are always welcome. But please note: I can’t guarantee that I’ll have much time to work on this. So if you open a PR, thanks in advance for your patience!