find_duplicate_files
Find duplicate files by comparing their sizes and the hash values of their contents.
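A hash function is a mathematical algorithm that takes an input (in this case, a file) and produces a fixed-size string of characters, known as a hash value or checksum. For a well-designed hash function this value is practically unique to the input data: even a slight change in the input results in a completely different hash value. Collisions (two different inputs with the same hash value) are possible in principle, and are more likely with a short non-cryptographic hash such as fxhash than with blake3, sha256 or sha512.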
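The size-then-hash strategy can be sketched in a few lines of Rust. This is a minimal illustration, not the tool's actual code. It makes three assumptions:

- The input is in-memory (name, contents) pairs rather than files read from disk.
- The standard library's DefaultHasher stands in for blake3, fxhash, sha256 or sha512.
- The names content_hash and find_duplicates are hypothetical.

Files with a unique size are discarded before any hashing, which is why checking sizes first saves work.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash a byte slice. DefaultHasher stands in for blake3/fxhash/sha256/sha512.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: return groups of names whose contents are identical.
/// Input is (name, contents) pairs instead of real files on disk.
fn find_duplicates(files: &[(&str, &[u8])]) -> Vec<Vec<String>> {
    // Stage 1: group by size. A file with a unique size cannot have a duplicate.
    let mut by_size: HashMap<usize, Vec<(&str, &[u8])>> = HashMap::new();
    for &(name, bytes) in files {
        by_size.entry(bytes.len()).or_default().push((name, bytes));
    }

    let mut groups = Vec::new();
    for same_size in by_size.values().filter(|g| g.len() > 1) {
        // Stage 2: hash only the files that share a size.
        let mut by_hash: HashMap<u64, Vec<String>> = HashMap::new();
        for &(name, bytes) in same_size {
            by_hash.entry(content_hash(bytes)).or_default().push(name.to_string());
        }
        // Keep only hash buckets that contain at least two files.
        groups.extend(by_hash.into_values().filter(|g| g.len() > 1));
    }

    // HashMap iteration order is unspecified, so sort for stable output.
    for group in &mut groups {
        group.sort();
    }
    groups.sort();
    groups
}

fn main() {
    let files: [(&str, &[u8]); 4] = [
        ("a.txt", b"hello"),
        ("b.txt", b"hello"),
        ("c.txt", b"hellp"), // same size as a.txt, one byte different
        ("d.txt", b"hi"),    // unique size: never hashed
    ];
    println!("{:?}", find_duplicates(&files)); // prints [["a.txt", "b.txt"]]
}
```

Here c.txt has the same size as a.txt and b.txt but differs by one byte, so its hash differs and only a.txt and b.txt are reported.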
Hash algorithm options are:
- blake version 3 (blake3, default)
- fxhash
- sha256
- sha512
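The algorithm names accepted by -a/--algorithm could be modeled as an enum with blake3 as the default. The help output lists these names as blake3, fxhash, sha256 and sha512. The sketch below is hypothetical: the enum name Algorithm and its hand-written FromStr parsing are assumptions, not the tool's source.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the values accepted by -a/--algorithm.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Algorithm {
    #[default]
    Blake3, // default when -a is not given
    Fxhash,
    Sha256,
    Sha512,
}

impl FromStr for Algorithm {
    type Err = String;

    /// Parse the lowercase names shown in the help output.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "blake3" => Ok(Algorithm::Blake3),
            "fxhash" => Ok(Algorithm::Fxhash),
            "sha256" => Ok(Algorithm::Sha256),
            "sha512" => Ok(Algorithm::Sha512),
            other => Err(format!("invalid algorithm: {other}")),
        }
    }
}

fn main() {
    // Without -a the default applies; with -a fxhash the name is parsed.
    let default_algo = Algorithm::default();
    let chosen: Algorithm = "fxhash".parse().expect("valid algorithm name");
    println!("{default_algo:?} {chosen:?}"); // prints "Blake3 Fxhash"
}
```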
To find duplicate files in the current directory, run:
find_duplicate_files
Another example: to find duplicate files using the fxhash algorithm, print the result in yaml format, sort it by file size (-s) and show the total execution time (-t), run:
find_duplicate_files -tsa fxhash -r yaml
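fxhash is a fast, non-cryptographic hash. The sketch below shows the core mixing step of the 64-bit FxHash function used in rustc: rotate left by 5, xor in the next word, then multiply by a fixed odd constant.

It is a simplified illustration only. How this tool feeds file bytes into fxhash is not documented here, and the hypothetical fx_hash function is not guaranteed to match the output of the fxhash crate.

Because the output is only 64 bits and the function is not designed to resist deliberate collisions, blake3 (the default) or the SHA-2 options are the more conservative choice when correctness matters more than speed.

```rust
/// Multiplier of the 64-bit FxHash family (as in rustc's FxHasher).
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

/// Mix one word into the running hash: rotate, xor, multiply.
fn add_to_hash(hash: u64, word: u64) -> u64 {
    (hash.rotate_left(5) ^ word).wrapping_mul(SEED)
}

/// Hypothetical, simplified FxHash over a byte slice.
/// Not bit-identical to the fxhash crate.
fn fx_hash(bytes: &[u8]) -> u64 {
    let mut hash = 0u64;
    let mut chunks = bytes.chunks_exact(8);
    // Full 8-byte words, read as little-endian integers.
    for chunk in &mut chunks {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        hash = add_to_hash(hash, u64::from_le_bytes(word));
    }
    // Leftover bytes (fewer than 8), mixed in one at a time.
    for &byte in chunks.remainder() {
        hash = add_to_hash(hash, u64::from(byte));
    }
    hash
}

fn main() {
    // A one-byte change yields a different hash value.
    println!("{:016x}", fx_hash(b"hello"));
    println!("{:016x}", fx_hash(b"hellp"));
}
```

Multiplying by an odd constant is a bijection on 64-bit integers, so two equal-length inputs that differ only in their last byte always hash differently. Collisions between unrelated inputs remain possible.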
Help
Type find_duplicate_files -h in the terminal to see the help message and all available options:
find duplicate files according to their size and hashing algorithm
Usage: find_duplicate_files [OPTIONS]
Options:
-a, --algorithm <ALGORITHM>
Choose the hash algorithm [default: blake3] [possible values: blake3, fxhash, sha256, sha512]
-f, --full_path
Prints full path of duplicate files, otherwise relative path
-g, --generate <GENERATOR>
If provided, outputs the completion file for given shell [possible values: bash, elvish, fish, powershell, zsh]
-m, --max_depth <MAX_DEPTH>
Set the maximum depth to search for duplicate files
-o, --omit_hidden
Omit hidden files (starts with '.'), otherwise search all files
-p, --path <PATH>
Set the path where to look for duplicate files, otherwise use the current directory
-r, --result_format <RESULT_FORMAT>
Print the result in the chosen format [default: personal] [possible values: json, yaml, personal]
-s, --sort
Sort result by file size, otherwise sort by number of duplicate files
-t, --time
Show total execution time
-h, --help
Print help (see more with '--help')
-V, --version
Print version
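The -s/--sort option switches the ordering key between file size and the number of duplicate files in each group. Below is a minimal sketch. It uses a hypothetical DuplicateGroup struct and assumes ascending order, since the help text does not say whether the tool sorts ascending or descending.

```rust
/// Hypothetical group of identical files.
#[derive(Debug)]
struct DuplicateGroup {
    size: u64,          // size of each file in bytes
    paths: Vec<String>, // paths of the identical files
}

/// Sort by file size when `sort_by_size` is true (like -s),
/// otherwise by the number of duplicate files in each group.
fn sort_groups(groups: &mut [DuplicateGroup], sort_by_size: bool) {
    if sort_by_size {
        groups.sort_by_key(|g| g.size);
    } else {
        groups.sort_by_key(|g| g.paths.len());
    }
}

fn main() {
    let mut groups = vec![
        DuplicateGroup { size: 4096, paths: vec!["a".into(), "b".into()] },
        DuplicateGroup { size: 10, paths: vec!["c".into(), "d".into(), "e".into()] },
    ];
    sort_groups(&mut groups, true);
    println!("{:?}", groups.iter().map(|g| g.size).collect::<Vec<_>>()); // [10, 4096]
    sort_groups(&mut groups, false);
    println!("{:?}", groups.iter().map(|g| g.paths.len()).collect::<Vec<_>>()); // [2, 3]
}
```

sort_by_key is a stable sort, so groups with equal keys keep their previous relative order.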
Building
To download, build, and install the latest published version from crates.io, run:
cargo install find_duplicate_files
Alternatively, clone the project from GitHub, then compile and install the executable:
git clone https://github.com/claudiofsr/find_duplicate_files.git
cd find_duplicate_files
cargo b -r && cargo install --path=.
Mutually exclusive features
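Directories are walked recursively with either jwalk or walkdir, chosen at build time through mutually exclusive Cargo features.
In general, jwalk (the default), which walks directories in parallel, is faster than walkdir.
To build with walkdir instead: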
cargo b -r && cargo install --path=. --features walkdir
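For illustration, a sequential recursive walk with the standard library shows what -m/--max_depth and -o/--omit_hidden control. This is not how jwalk or walkdir are implemented. The following are assumptions:

- The function names walk and visit are hypothetical.
- Entries directly inside the starting directory are at depth 1.
- Hidden directories are skipped as well as hidden files.

Symbolic links are neither followed nor collected, because DirEntry::file_type does not follow them.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical walker: collect regular files under `root`.
/// `max_depth`: None means unlimited; Some(1) means only files directly in `root`.
/// `omit_hidden`: skip files and directories whose name starts with '.'.
fn walk(root: &Path, max_depth: Option<usize>, omit_hidden: bool) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    visit(root, 1, max_depth, omit_hidden, &mut files)?;
    Ok(files)
}

fn visit(
    dir: &Path,
    depth: usize,
    max_depth: Option<usize>,
    omit_hidden: bool,
    files: &mut Vec<PathBuf>,
) -> io::Result<()> {
    // Stop once entries at this depth would exceed the limit.
    if max_depth.map_or(false, |max| depth > max) {
        return Ok(());
    }
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if omit_hidden && entry.file_name().to_string_lossy().starts_with('.') {
            continue;
        }
        // file_type() does not follow symlinks, so links match neither branch.
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            visit(&entry.path(), depth + 1, max_depth, omit_hidden, files)?;
        } else if file_type.is_file() {
            files.push(entry.path());
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Comparable in spirit to searching "." with -m 2 -o.
    let files = walk(Path::new("."), Some(2), true)?;
    println!("found {} files", files.len());
    Ok(())
}
```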