compare-dir
Compare two directories and show the differences, or find duplicated files within a single directory.
- For two directories, it compares file contents when the file sizes are the same, which is useful for verifying backup copies.
- For a single directory, it cryptographically hashes files with matching sizes to find exact duplicates.
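The duplicate search described above can be pictured as a two-stage filter: group files by size, then hash only the groups that contain more than one file. The sketch below illustrates that idea in Python. It is not compare-dir's implementation; the function names `file_hash` and `find_duplicates` are hypothetical, and SHA-256 is an assumption because the README does not name the hash algorithm.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks (SHA-256 is an assumption)."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents are identical."""
    # Stage 1: group by size; a file with a unique size cannot have a duplicate.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Stage 2: hash only files that share their size with at least one other file.
    groups: list[list[Path]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in paths:
            by_hash[file_hash(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Sizing first is cheap (a `stat` call), so the more expensive hashing only runs on files that could possibly be duplicates.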
Installation
See Releases for the change history.
Usage
Compare two directories:
compare-dir <dir1> <dir2>
Find duplicated files in a single directory:
compare-dir <dir>
Please use the -h option to see all options.
Symbols
When comparing two directories,
the output is human-readable by default.
The --symbol (or -s) option changes the output format to be symbolized,
which is easier for programs to read.
| Position | Character | Meaning |
|---|---|---|
| 1st | = | In both directories. |
| | > | Only in dir1. |
| | < | Only in dir2. |
| 2nd | = | Modified times are the same. |
| | > | dir1 is newer. |
| | < | dir2 is newer. |
| 3rd | = | Same file sizes and contents. |
| | ! | Same file sizes but contents differ. |
| | > | dir1 is larger. |
| | < | dir2 is larger. |
For example:
=>= dir/path
means that dir/path in dir1 is newer than the file in dir2,
but they have the same file sizes and contents.
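A program consuming the `-s` output could decode each line with a lookup table per position, as in this sketch. The names `MEANINGS` and `parse_symbol_line` are hypothetical, and the sketch assumes each line is exactly three symbol characters, one space, and the path, as in the `=>= dir/path` example; unrecognized characters are reported rather than rejected.

```python
# Meanings of each symbol position, transcribed from the symbol table.
MEANINGS = [
    {"=": "in both directories", ">": "only in dir1", "<": "only in dir2"},
    {"=": "same modified time", ">": "dir1 is newer", "<": "dir2 is newer"},
    {"=": "same size and contents", "!": "same size, contents differ",
     ">": "dir1 is larger", "<": "dir2 is larger"},
]


def parse_symbol_line(line: str) -> tuple[list[str], str]:
    """Split a line like '=>= dir/path' into decoded meanings and the path."""
    # Assumption: 3 symbol characters, then a single space, then the path.
    symbols, path = line[:3], line[4:].rstrip("\n")
    meanings = [MEANINGS[i].get(c, f"unknown symbol {c!r}")
                for i, c in enumerate(symbols)]
    return meanings, path
```

For `=>= dir/path`, this yields `["in both directories", "dir1 is newer", "same size and contents"]` and the path `dir/path`.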
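The following bash example lists the paths whose contents are the same in both directories:
compare-dir -s <dir1> <dir2> | grep '^..=' | cut -c5-
If you prefer sed over cut:
compare-dir -s <dir1> <dir2> | grep '^..=' | sed 's/^....//'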
To do this in PowerShell:
compare-dir -s <dir1> <dir2> | sls '^..=' | %{$_ -replace '^....',''}
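The same filter can be written in Python when a shell pipeline is inconvenient. This sketch mirrors the PowerShell pipeline: keep lines whose third character is `=`, then drop the first four characters (three symbols plus the separating space). The function name `same_content_paths` is hypothetical.

```python
from typing import Iterable, Iterator


def same_content_paths(lines: Iterable[str]) -> Iterator[str]:
    """Yield paths whose 3rd symbol is '=' (same sizes and contents)."""
    for line in lines:
        # Equivalent of matching '^..=' and then removing '^....'.
        if len(line) >= 4 and line[2] == "=":
            yield line[4:].rstrip("\n")
```

To use it on real output, pipe `compare-dir -s <dir1> <dir2>` into a script that loops over `same_content_paths(sys.stdin)` and prints each path.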
Hash
compare-dir uses file hashes
when comparing file contents if file sizes are the same, and
when finding duplicated files.
The --compare (or -c) option can change
how files are compared.
| --compare | Meaning |
|---|---|
| size | Compare by file sizes only. |
| hash | Compare file contents by their hashes. |
| rehash | Same as hash, but recompute hashes without using the data in the hash cache. |
| full | Compare file contents byte-by-byte. |
Hash collisions are unlikely, but -c full can be used to double-check.
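The difference between the `--compare` modes can be sketched as a single function. This is an illustration, not the tool's code: `same_file` and `_sha256` are hypothetical names, SHA-256 is an assumed algorithm, and because this sketch keeps no hash cache, `hash` and `rehash` behave identically here. The `full` mode uses Python's `filecmp.cmp(..., shallow=False)`, which reads both files and compares their bytes.

```python
import filecmp
import hashlib
from pathlib import Path


def _sha256(path: Path) -> str:
    # SHA-256 is an assumption; the README does not name the algorithm.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def same_file(a: Path, b: Path, mode: str = "hash") -> bool:
    """Compare two files in the spirit of each --compare mode."""
    if a.stat().st_size != b.stat().st_size:
        return False                      # different sizes never match
    if mode == "size":
        return True                       # sizes match; contents are not read
    if mode in ("hash", "rehash"):
        # No cache in this sketch, so hash and rehash behave the same.
        return _sha256(a) == _sha256(b)
    if mode == "full":
        return filecmp.cmp(a, b, shallow=False)   # byte-by-byte comparison
    raise ValueError(f"unknown mode: {mode}")
```

Note how `size` can report two files as equal even when their contents differ, which is why the content-reading modes exist.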
Hash Cache
File hashes are saved to a file named .hash_cache
to make subsequent runs faster.
If file contents are changed without changing their modified time,
the cache needs to be invalidated.
You can invalidate the hash cache
with the -c rehash option
or by deleting the cache file.
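Why a content change without a modified-time change goes unnoticed becomes clearer with a small model of a cache keyed by file size and modified time. The real `.hash_cache` format is not described here; the class `HashCache`, its JSON layout, and its keys relative to the cache file's directory are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


class HashCache:
    """Hypothetical hash cache keyed by path, size and modified time."""

    def __init__(self, cache_file: Path, rehash: bool = False) -> None:
        self.cache_file = cache_file
        self.root = cache_file.parent.resolve()
        self.entries: dict[str, dict] = {}
        # rehash=True mimics -c rehash: ignore whatever the cache file holds.
        if cache_file.exists() and not rehash:
            self.entries = json.loads(cache_file.read_text())

    def get_hash(self, path: Path) -> str:
        st = path.stat()
        # Keys are relative to the cache file's directory (assumption).
        key = path.resolve().relative_to(self.root).as_posix()
        entry = self.entries.get(key)
        # Reuse the stored hash only if size and modified time are unchanged.
        if entry and entry["size"] == st.st_size and entry["mtime_ns"] == st.st_mtime_ns:
            return entry["hash"]
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        self.entries[key] = {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "hash": digest}
        return digest

    def save(self) -> None:
        self.cache_file.write_text(json.dumps(self.entries))
```

In this model, rewriting a file with same-sized contents and then restoring its modified time makes `get_hash` return the stale hash; constructing the cache with `rehash=True`, or deleting the cache file, forces the hash to be recomputed.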
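[!NOTE] When backing up, do not copy
.hash_cache if you intend to use this tool to verify the backup copies; otherwise hashes cached from the originals could be reused instead of reading the copies' actual contents.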
Hash Cache Directory
The directory in which the cache is created is determined by the following steps:
- Look for .hash_cache in the specified directory.
- If not found, look for it in the ancestor directories.
- If still not found, create it in the specified directory.
You can create the cache file in one of the ancestor directories. This is useful if you also run the tool on the parent directory. For example:
Then ~/data/.hash_cache is used as the cache file,
instead of ~/data/subdir/.hash_cache.
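The lookup steps above can be sketched as a walk from the specified directory up through its ancestors. `find_cache_file` is a hypothetical name; the sketch only returns the path to use and does not create the file.

```python
from pathlib import Path

CACHE_NAME = ".hash_cache"


def find_cache_file(directory: Path) -> Path:
    """Return the .hash_cache path to use for `directory`."""
    directory = directory.resolve()
    # Check the directory itself first, then each ancestor, nearest first.
    for folder in (directory, *directory.parents):
        candidate = folder / CACHE_NAME
        if candidate.is_file():
            return candidate
    # Not found anywhere: the cache would be created in the directory itself.
    return directory / CACHE_NAME
```

With an existing `~/data/.hash_cache`, calling `find_cache_file(Path("~/data/subdir").expanduser())` returns `~/data/.hash_cache`, matching the example above.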