PolyDup CLI
Command-line interface for PolyDup, the cross-language duplicate code detector.
Installation
From Source
# Binary will be at: target/release/polydup
System-wide Installation
# Or from the workspace root:
Usage
Basic Scan
Scan Multiple Paths
Adjust Detection Parameters
# Set minimum block size (default: 50 tokens)
# Set similarity threshold (default: 0.85 = 85%)
# Combine both
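As a rough sketch of how the two parameters combine, the wrapper below passes both flags on every call and falls back to the documented defaults. The `scan` function and the `POLYDUP_THRESHOLD` / `POLYDUP_SIMILARITY` variable names are hypothetical conveniences, not part of PolyDup; only `--threshold` and `--similarity` come from the CLI itself.

```shell
# Hypothetical wrapper: scan() and the POLYDUP_* variables are illustrative only.
# Defaults mirror the documented ones (50 tokens, 0.85 similarity).
scan() {
  polydup "$@" \
    --threshold "${POLYDUP_THRESHOLD:-50}" \
    --similarity "${POLYDUP_SIMILARITY:-0.85}"
}

# Example calls (require polydup on PATH, so they are left commented out):
#   scan ./src
#   POLYDUP_THRESHOLD=30 scan ./src
#   POLYDUP_THRESHOLD=30 POLYDUP_SIMILARITY=0.9 scan ./src ./tests
```

Keeping the defaults in one place makes it harder for a local run and a CI run to drift apart.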
Output Formats
Text output (default):
Output:
📊 Scan Results
═══════════════════════════════════════════════════════════
Files scanned: 4
Functions analyzed: 45
Duplicates found: 0
✅ No duplicates found!
JSON output (for scripting):
Output:
Verbose Mode
Show additional performance metrics:
Output includes:
- Total tokens processed
- Number of unique hashes
- Scan duration
Command-Line Options
polydup [OPTIONS] <PATHS>...

Arguments:
  <PATHS>...  Paths to scan (files or directories)

Options:
  -f, --format <FORMAT>
          Output format [default: text] [possible values: text, json]
  -t, --threshold <MIN_BLOCK_SIZE>
          Minimum code block size in tokens [default: 50]
  -s, --similarity <SIMILARITY>
          Similarity threshold (0.0-1.0) [default: 0.85]
  -v, --verbose
          Show verbose output
  -h, --help
          Print help
  -V, --version
          Print version
Exit Codes
- 0: No duplicates found
- 1: Duplicates found (or an error occurred)
This allows usage in CI/CD pipelines:
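#!/bin/bash
# ./src is an example path; point it at your own sources.
if ! polydup ./src; then
  echo "Duplicate code detected (or the scan failed)." >&2
  exit 1
fi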
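Since exit code 1 covers both duplicates and errors, a pipeline cannot tell the two apart from the status alone. One possible mitigation, sketched below, is to rule out the most common error (a missing path) before invoking the scanner. The `scan_paths` function and its status 2 are this sketch's own convention, not PolyDup behavior, and the function accepts paths only (no options).

```shell
# Hypothetical helper: check that every path exists, then run the scan.
# Status 2 is this wrapper's own signal for "bad input"; PolyDup itself
# only documents 0 and 1.
scan_paths() {
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "scan_paths: no such path: $path" >&2
      return 2
    fi
  done
  polydup "$@"
}

# Example (requires polydup on PATH):
#   scan_paths ./src ./lib
```

With this in place, a status of 2 points at the invocation, while 1 still means duplicates or some other scanner error.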
Examples
CI/CD Integration
GitHub Actions:
name: Check Duplicates
on:
jobs:
  check-dupes:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install Rust
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
      - name: Install PolyDup
        run: cargo install --path crates/dupe-cli
      - name: Check for duplicates
        run: |
          polydup ./src --threshold 50 --similarity 0.85 --format json > duplicates.json
      - name: Upload results
        uses: actions/upload-artifact@v3
        if: failure()
        with:
          name: duplicate-report
          path: duplicates.json
Pre-commit Hook
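#!/bin/bash
# .git/hooks/pre-commit
# ./src is an example path; point it at your own sources.
if ! polydup ./src; then
  echo "Commit blocked: duplicate code detected (or the scan failed)." >&2
  exit 1
fi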
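A hook that scans the whole tree can be slow on large repositories. A possible variation, assuming it is acceptable to check only the files being committed, is to filter the staged file list down to the extensions PolyDup supports. The `filter_supported` function is a hypothetical helper; the extension list is the one given under Supported Languages.

```shell
# Hypothetical helper: read file names on stdin (one per line) and keep
# only those with an extension PolyDup can parse.
filter_supported() {
  grep -E '\.(rs|py|js|jsx|ts|tsx)$'
}

# Possible use inside .git/hooks/pre-commit (left commented out here):
#   git diff --cached --name-only --diff-filter=ACM | filter_supported
```

The resulting names can then be passed to `polydup` as paths. Note that duplicates between a staged file and an unstaged one would go unnoticed with this approach.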
Makefile Integration
: :
: :
Shell Script for Multiple Projects
#!/bin/bash
# scan-all-projects.sh
projects=(
"project1/src"
"project2/lib"
"project3/backend"
)
for project in "${projects[@]}"; do
  echo "Scanning $project..."
  polydup "$project"
done
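When scanning several projects in one run, it is often more useful to finish every scan and report a single overall status than to stop at the first failure. The sketch below shows one way to do that; `scan_all` is a hypothetical function name, and it relies only on the documented exit codes.

```shell
# Hypothetical helper: scan each path, keep going after a failure,
# and return non-zero if any scan reported duplicates or errors.
scan_all() {
  failed=0
  for project in "$@"; do
    echo "== $project"
    polydup "$project" || failed=$((failed + 1))
  done
  echo "$failed path(s) reported duplicates or errors"
  [ "$failed" -eq 0 ]
}

# Example (requires polydup on PATH):
#   scan_all project1/src project2/lib project3/backend
```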
Performance Tuning
Fast Scan (Lower Accuracy)
# Large block size = fewer comparisons = faster
Thorough Scan (Higher Accuracy)
# Small block size = more comparisons = slower but catches smaller duplicates
Recommended Settings
| Use Case | Threshold | Similarity |
|---|---|---|
| Quick check | 100 | 0.85 |
| Standard scan | 50 | 0.85 |
| Thorough analysis | 30 | 0.90 |
| Refactoring prep | 20 | 0.95 |
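The table above can be captured as named presets so that scripts do not repeat the numbers. This is only a sketch: `preset_flags` and the preset names are hypothetical, while the values are copied from the table.

```shell
# Hypothetical helper: print the flags for a named use case.
preset_flags() {
  case "$1" in
    quick)    echo "--threshold 100 --similarity 0.85" ;;
    standard) echo "--threshold 50 --similarity 0.85" ;;
    thorough) echo "--threshold 30 --similarity 0.90" ;;
    refactor) echo "--threshold 20 --similarity 0.95" ;;
    *)        echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

# Example (requires polydup on PATH). The substitution is left unquoted
# on purpose so the flags are split into separate words:
#   polydup ./src $(preset_flags thorough)
```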
Troubleshooting
No Duplicates Found (But You Expected Some)
- Lower the threshold: Try --threshold 20 to catch smaller duplicates
- Lower similarity: Try --similarity 0.7 for looser matching
- Check file types: Only Rust, Python, and JavaScript/TypeScript are supported
Too Many False Positives
- Raise the threshold: Try --threshold 100 to only catch large duplicates
- Raise similarity: Try --similarity 0.95 for stricter matching
Slow Performance
- Increase threshold: Larger blocks = fewer comparisons
- Scan fewer files: Be more specific with paths
- Use release build: cargo build --release (already done if installed)
Supported Languages
- Rust: .rs files
- Python: .py files
- JavaScript/TypeScript: .js, .jsx, .ts, .tsx files
More languages coming soon!
Algorithm
PolyDup uses:
- Tree-sitter for AST-based parsing
- Token normalization (identifiers → $$ID, strings → $$STR, numbers → $$NUM)
- Rabin-Karp rolling hash with window size 50
- Parallel processing via Rayon for multi-core performance
See architecture-research.md for details.
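To make the normalization and hashing steps listed above more concrete, here is a toy illustration in shell. It is not PolyDup's implementation: the real tool classifies tokens with Tree-sitter, whereas this sketch guesses from the first character, and the keyword list, `BASE`, `MOD`, and all function names are assumptions chosen for the example.

```shell
BASE=257          # assumed polynomial base
MOD=1000000007    # assumed prime modulus

# Replace identifiers, strings, and numbers with placeholders.
# Tokens are assumed to be pre-split; the keyword list is a tiny sample.
normalize_token() {
  case "$1" in
    def|return|if|else|for|while|fn|let) printf '%s\n' "$1" ;;
    \"*\"|\'*\')                         printf '%s\n' '$$STR' ;;
    [0-9]*)                              printf '%s\n' '$$NUM' ;;
    [A-Za-z_]*)                          printf '%s\n' '$$ID' ;;
    *)                                   printf '%s\n' "$1" ;;
  esac
}

# Map a token to a stable integer below MOD, using the CRC from cksum.
token_code() {
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( crc % MOD ))
}

# window_hashes W TOKEN...: print one rolling hash per window of W tokens.
window_hashes() {
  w=$1; shift
  [ "$#" -ge "$w" ] || return 0
  high=1; i=1                      # high = BASE^(w-1) mod MOD
  while [ "$i" -lt "$w" ]; do
    high=$(( high * BASE % MOD )); i=$(( i + 1 ))
  done
  count=0; hash=0
  for tok in "$@"; do              # the list is expanded once, before any shift
    code=$(token_code "$tok")
    if [ "$count" -ge "$w" ]; then
      old=$(token_code "$1"); shift   # $1 is the token leaving the window
      hash=$(( (hash - old * high % MOD + MOD) % MOD ))
    fi
    hash=$(( (hash * BASE + code) % MOD ))
    count=$(( count + 1 ))
    if [ "$count" -ge "$w" ]; then echo "$hash"; fi
  done
}
```

After normalization, `x = y + 1` and `a = b + 2` both become `$$ID = $$ID + $$NUM`, so their windows hash to the same value. Each step of the rolling hash drops one token and adds one, so a window is never rehashed from scratch.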
License
MIT OR Apache-2.0
Links
- Core Library: dupe-core
- Node.js Bindings: dupe-node
- Python Bindings: dupe-py
- GitHub: https://github.com/wiesnerbernard/polydup