cargo-dupes
A cargo subcommand that detects duplicate and near-duplicate code blocks in Rust codebases.
How It Works
cargo-dupes parses Rust source files into ASTs using syn, then normalizes each function, method, and closure into a canonical form where:
- Identifiers are replaced with positional placeholders (so `foo(x)` and `bar(y)` are identical)
- Literal values are erased but types preserved (`42` and `99` are both "integer literal")
- Control flow structure is preserved exactly
- Macro invocations become opaque nodes
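The normalization rules above can be illustrated on a toy expression tree. This is only a sketch under stated assumptions: the real tool works on `syn` ASTs, and the `Expr` and `Norm` types and the `normalize` function below are hypothetical names invented for illustration, not cargo-dupes internals.

```rust
use std::collections::HashMap;

/// A toy expression tree standing in for a real `syn` AST (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Ident(String),
    IntLit(i64),
    StrLit(String),
    Call(Box<Expr>, Vec<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    Macro(String),
}

/// Normalized form: names become positional indices, literals keep only their type.
#[derive(Debug, Clone, PartialEq)]
enum Norm {
    Placeholder(usize),
    IntLit,
    StrLit,
    Call(Box<Norm>, Vec<Norm>),
    If(Box<Norm>, Box<Norm>, Box<Norm>),
    OpaqueMacro,
}

/// Normalize `expr`, numbering identifiers in order of first appearance.
fn normalize(expr: &Expr, names: &mut HashMap<String, usize>) -> Norm {
    match expr {
        Expr::Ident(name) => {
            // The first new name gets 0, the next new name 1, and so on.
            let next = names.len();
            Norm::Placeholder(*names.entry(name.clone()).or_insert(next))
        }
        // Literal values are erased; only the literal's type survives.
        Expr::IntLit(_) => Norm::IntLit,
        Expr::StrLit(_) => Norm::StrLit,
        Expr::Call(callee, args) => Norm::Call(
            Box::new(normalize(callee, names)),
            args.iter().map(|arg| normalize(arg, names)).collect(),
        ),
        // Control flow keeps its exact shape.
        Expr::If(cond, then_branch, else_branch) => Norm::If(
            Box::new(normalize(cond, names)),
            Box::new(normalize(then_branch, names)),
            Box::new(normalize(else_branch, names)),
        ),
        // Macro contents are not inspected.
        Expr::Macro(_) => Norm::OpaqueMacro,
    }
}
```

In this sketch, `foo(x, 42)` and `bar(y, 99)` normalize to the same tree, while `foo(foo, 1)` does not, because the repeated name maps to the same placeholder twice.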
This normalized AST is hashed into a fingerprint for exact duplicate detection, and compared tree-by-tree using the Dice coefficient for near-duplicate detection.
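A minimal sketch of the two comparisons, assuming a simplified tree of labelled nodes. The `Node` type and both functions are hypothetical, and computing the Dice coefficient over multisets of node kinds is one plausible reading of "tree-by-tree", not necessarily how cargo-dupes does it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// A normalized tree node: a kind label plus children (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Node {
    kind: &'static str,
    children: Vec<Node>,
}

fn node(kind: &'static str, children: Vec<Node>) -> Node {
    Node { kind, children }
}

/// Hash the whole tree; structurally equal trees get equal fingerprints.
fn fingerprint(tree: &Node) -> u64 {
    let mut hasher = DefaultHasher::new();
    tree.hash(&mut hasher);
    hasher.finish()
}

/// Count how often each node kind occurs in the tree (a multiset of labels).
fn count_kinds(tree: &Node, counts: &mut HashMap<&'static str, usize>) {
    *counts.entry(tree.kind).or_insert(0) += 1;
    for child in &tree.children {
        count_kinds(child, counts);
    }
}

/// Dice coefficient: 2 * |A ∩ B| / (|A| + |B|), over multisets of node kinds.
fn dice_similarity(a: &Node, b: &Node) -> f64 {
    let (mut counts_a, mut counts_b) = (HashMap::new(), HashMap::new());
    count_kinds(a, &mut counts_a);
    count_kinds(b, &mut counts_b);
    // Every tree has at least one node, so `total` is never zero.
    let total: usize = counts_a.values().sum::<usize>() + counts_b.values().sum::<usize>();
    // The multiset intersection takes the smaller count for each shared kind.
    let shared: usize = counts_a
        .iter()
        .map(|(kind, n)| (*n).min(*counts_b.get(kind).unwrap_or(&0)))
        .sum();
    2.0 * shared as f64 / total as f64
}
```

Two trees that share three of their four nodes each score 2 * 3 / 8 = 0.75 in this sketch, which falls below the documented default threshold of 0.8.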
Installation
Or, to run directly:
When installed, it's available as a cargo subcommand, invoked as `cargo dupes`.
Usage
cargo dupes [OPTIONS] [COMMAND]

Commands:
  stats    Show duplication statistics only
  report   Show full duplication report (default)
  check    Check for duplicates and exit with non-zero if thresholds exceeded
  ignore   Add a fingerprint to the ignore list
  ignored  List all ignored fingerprints

Options:
  -p, --path <PATH>            Path to analyze (defaults to current directory)
      --min-nodes <MIN_NODES>  Minimum AST node count for analysis [default: 10]
      --min-lines <MIN_LINES>  Minimum source line count for analysis [default: 0 (disabled)]
      --threshold <THRESHOLD>  Similarity threshold for near-duplicates (0.0-1.0) [default: 0.8]
      --format <FORMAT>        Output format [default: text] [possible values: text, json]
      --exclude <EXCLUDE>      Exclude patterns (can be repeated)
      --exclude-tests          Exclude test code (#[test] functions and #[cfg(test)] modules)
Examples
Full report (the default command):
cargo dupes report
Statistics only:
cargo dupes stats
JSON output:
cargo dupes --format json
CI check (fail if any exact duplicates exist):
cargo dupes check --max-exact 0
# Exits with code 1 if exact duplicate groups > 0
# Exits with code 0 if within thresholds
CI check with percentage thresholds (fail if >5% of lines are exact duplicates):
cargo dupes check --max-exact-percent 5.0
# Exits with code 1 if exact duplicate lines exceed 5% of total lines
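Exclude test code (inline #[cfg(test)] modules and #[test] functions):
cargo dupes --exclude-tests
Exclude test directories by path:
cargo dupes --exclude tests
Only report duplicates that are at least 10 lines long:
cargo dupes --min-lines 10
Lower the similarity threshold (the default is 0.8), for example to 0.7:
cargo dupes --threshold 0.7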
Configuration
Configuration can be provided in three ways (in order of precedence):
- CLI flags (highest priority)
- `dupes.toml` in the project root
- `Cargo.toml` under `[package.metadata.dupes]`
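The precedence order above could be resolved roughly as follows. This is a sketch, not the tool's implementation: `PartialConfig`, `Config`, and `resolve` are hypothetical names, only three of the options are modelled, and merging field by field (rather than letting one file replace another wholesale) is an assumption.

```rust
/// Settings from one source; `None` means "not set here" (hypothetical type).
#[derive(Debug, Clone, Default, PartialEq)]
struct PartialConfig {
    min_nodes: Option<usize>,
    similarity_threshold: Option<f64>,
    exclude_tests: Option<bool>,
}

/// Fully resolved settings (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Config {
    min_nodes: usize,
    similarity_threshold: f64,
    exclude_tests: bool,
}

/// Resolve each field from the highest-priority source that sets it:
/// CLI flags, then dupes.toml, then Cargo.toml, then the documented default.
fn resolve(
    cli: &PartialConfig,
    dupes_toml: &PartialConfig,
    cargo_toml: &PartialConfig,
) -> Config {
    Config {
        min_nodes: cli
            .min_nodes
            .or(dupes_toml.min_nodes)
            .or(cargo_toml.min_nodes)
            .unwrap_or(10),
        similarity_threshold: cli
            .similarity_threshold
            .or(dupes_toml.similarity_threshold)
            .or(cargo_toml.similarity_threshold)
            .unwrap_or(0.8),
        exclude_tests: cli
            .exclude_tests
            .or(dupes_toml.exclude_tests)
            .or(cargo_toml.exclude_tests)
            .unwrap_or(false),
    }
}
```

Under these assumptions, a `--threshold` flag overrides a `similarity_threshold` set in either file, while options the flag does not touch still come from `dupes.toml` first and `Cargo.toml` second.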
dupes.toml
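min_nodes = 15
min_lines = 5
similarity_threshold = 0.85
exclude = ["tests", "benches"]
exclude_tests = true
max_exact_duplicates = 0
max_near_duplicates = 10
max_exact_percent = 5.0
max_near_percent = 10.0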
Cargo.toml
[package.metadata.dupes]
min_nodes = 15
similarity_threshold = 0.85
exclude = ["tests"]
Configuration Options
| Option | Default | Description |
|---|---|---|
| `min_nodes` | `10` | Minimum AST node count for a code unit to be analyzed. Increase to skip trivial functions. |
| `min_lines` | `0` | Minimum source line count for a code unit to be analyzed. 0 means disabled. |
| `similarity_threshold` | `0.8` | Minimum similarity score (0.0-1.0) for near-duplicate detection. |
| `exclude` | `[]` | Path patterns to exclude from scanning (substring match). |
| `exclude_tests` | `false` | Exclude #[test] functions and #[cfg(test)] modules from analysis. |
| `max_exact_duplicates` | `None` | For check subcommand: maximum allowed exact duplicate groups. |
| `max_near_duplicates` | `None` | For check subcommand: maximum allowed near-duplicate groups. |
| `max_exact_percent` | `None` | For check subcommand: maximum allowed exact duplicate line percentage. |
| `max_near_percent` | `None` | For check subcommand: maximum allowed near-duplicate line percentage. |
Ignoring Duplicates
Some duplicates are intentional (e.g., test helpers, trait implementations). You can ignore them by fingerprint:
Use the ignore subcommand to add a fingerprint to the ignore list, and the ignored subcommand to list the fingerprints currently ignored. Ignored groups are automatically filtered from reports and checks, so an ignored group will not appear in later output.
The ignore list is stored in .dupes-ignore.toml in the project root.
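Filtering by an ignore list can be pictured as a set lookup on each group's fingerprint. The sketch below is illustrative only: `DuplicateGroup` and `filter_ignored` are hypothetical names, and representing fingerprints as strings is an assumption, since the format of `.dupes-ignore.toml` is not described here.

```rust
use std::collections::HashSet;

/// A group of duplicated code units sharing one fingerprint (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct DuplicateGroup {
    fingerprint: String,
    locations: Vec<String>,
}

/// Drop every group whose fingerprint appears in the ignore list.
fn filter_ignored(
    groups: Vec<DuplicateGroup>,
    ignored: &HashSet<String>,
) -> Vec<DuplicateGroup> {
    groups
        .into_iter()
        .filter(|group| !ignored.contains(&group.fingerprint))
        .collect()
}
```

Applying the filter before both reporting and threshold checks would give the behavior described above, where ignored groups count toward neither.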
CI Integration
Use the check subcommand in CI pipelines:
# GitHub Actions example
- name: Check for code duplication
  run: cargo dupes check --max-exact 0 --max-exact-percent 5.0
Exit codes:
- 0 — Check passed (within thresholds)
- 1 — Check failed (thresholds exceeded)
- 2 — Error (no source files, invalid path, etc.)
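The pass/fail decision behind exit codes 0 and 1 can be sketched as a comparison of summary numbers against optional limits. `Stats`, `Thresholds`, and `check_exit_code` are hypothetical names, only the two exact-duplicate limits are modelled, and exit code 2 (errors) is left out.

```rust
/// Summary numbers a check would compare against limits (hypothetical type).
struct Stats {
    total_lines: usize,
    exact_groups: usize,
    exact_duplicate_lines: usize,
}

/// Optional limits; `None` means the limit is not enforced (hypothetical type).
struct Thresholds {
    max_exact: Option<usize>,
    max_exact_percent: Option<f64>,
}

/// Percentage of all lines that belong to exact duplicates.
fn exact_percent(stats: &Stats) -> f64 {
    if stats.total_lines == 0 {
        0.0
    } else {
        100.0 * stats.exact_duplicate_lines as f64 / stats.total_lines as f64
    }
}

/// Returns 0 when every enforced limit holds, 1 when any limit is exceeded.
fn check_exit_code(stats: &Stats, limits: &Thresholds) -> i32 {
    let too_many_groups = limits
        .max_exact
        .map_or(false, |max| stats.exact_groups > max);
    let too_many_lines = limits
        .max_exact_percent
        .map_or(false, |max| exact_percent(stats) > max);
    if too_many_groups || too_many_lines { 1 } else { 0 }
}
```

The comparisons are strict, matching the wording "thresholds exceeded": a project with exactly 5% duplicate lines would still pass a 5.0 limit in this sketch.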
What Gets Analyzed
| Code Unit | Description |
|---|---|
| Functions | Top-level fn items |
| Methods | fn items inside impl blocks |
| Trait impls | fn items inside impl Trait for Type blocks |
| Closures | Closure expressions (above the min node threshold) |
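For orientation, here is one small example of each code unit kind from the table. The names are made up for illustration, and nothing here says whether bodies this small would clear the `min_nodes` threshold.

```rust
// Function: a top-level `fn` item.
fn area(width: u32, height: u32) -> u32 {
    width * height
}

struct Rect {
    width: u32,
    height: u32,
}

// Method: a `fn` item inside an `impl` block.
impl Rect {
    fn area(&self) -> u32 {
        self.width * self.height
    }
}

trait Shape {
    fn perimeter(&self) -> u32;
}

// Trait impl: a `fn` item inside an `impl Trait for Type` block.
impl Shape for Rect {
    fn perimeter(&self) -> u32 {
        2 * (self.width + self.height)
    }
}

// Closure: the `|r| ...` expression passed to `map` below. Per the table,
// a closure is only analyzed when it reaches the minimum node threshold.
fn scaled_areas(rects: &[Rect], factor: u32) -> Vec<u32> {
    rects.iter().map(|r| r.area() * factor).collect()
}
```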
The scanner automatically:
- Skips `target/` directories
- Skips hidden directories (starting with `.`)
- Respects exclude patterns
- Handles parse errors gracefully (skips unparseable files with a warning)
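The path rules above could be expressed as a single predicate. This is a sketch under assumptions: `should_skip` is a hypothetical helper, it checks every path component (so a file whose name starts with `.` is skipped too), and exclude patterns are treated as plain substrings, following the configuration table.

```rust
use std::path::{Component, Path};

/// Decide whether a path should be skipped by the scanner (hypothetical helper).
fn should_skip(path: &Path, exclude: &[String]) -> bool {
    // Skip anything under `target` or under a hidden (dot-prefixed) component.
    // A leading `./` is `Component::CurDir`, not `Normal`, so it is not hidden.
    let in_skipped_dir = path.components().any(|component| match component {
        Component::Normal(name) => {
            let name = name.to_string_lossy();
            name == "target" || name.starts_with('.')
        }
        _ => false,
    });

    // Exclude patterns match as substrings of the whole path.
    let path_text = path.to_string_lossy();
    in_skipped_dir || exclude.iter().any(|pattern| path_text.contains(pattern.as_str()))
}
```

With `exclude = ["tests"]`, this sketch skips `tests/integration.rs` and also any path that merely contains the text `tests`, which is the usual trade-off of substring matching.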
Development
Requirements: Rust 1.85+ (edition 2024)
Pre-commit hooks (via cargo-husky) run clippy and rustfmt automatically.
License
This project is licensed under the MIT License.