jscpd-rs
50x+ faster duplicate-code detector for local development, CI/CD, and code
quality gates.
jscpd-rs scans a codebase, finds copy-paste fragments across files, writes
console, JSON, SARIF, HTML, XML, CSV, Markdown, badge, and Xcode reports, and
can fail a build when duplication crosses a configured threshold.
It is a native Rust implementation of the common
jscpd command-line workflow:
upstream-style CLI flags, .jscpd.json and package.json#jscpd
configuration, report formats, exit-code behavior, Git blame, and server
snippet checks. The practical goal is simple: keep copy-paste detection
always-on without spending unnecessary CI minutes, developer waiting time,
cloud compute budget, or electricity on repeated quality gates.
Recorded public benchmark baselines show 50x+ speedups over upstream on the
covered repositories. The compatibility gate is coverage-first: on the same
inputs and options, jscpd-rs must not miss duplicated source lines reported
by upstream jscpd.
Install
Cargo:
npm/npx:
Current npm packaging note: jscpd-rs installs prebuilt Linux, macOS, and
Windows binaries through optional platform packages and does not run
install-time build scripts. Unsupported npm platforms should use Cargo. See
docs/prebuilt-binaries.md.
From this repository:
Build without installing:
Quick Start
Scan a project:
Tune the detection threshold:
Generate reports:
Fail CI when duplication is above a threshold:
Use the upstream-compatible command help and format list:
Start the native REST/MCP server:
The current server exposes /, /api/health, /api/stats, /api/check,
/api/recheck, and /mcp. Snippet checks reuse project token maps refreshed
by /api/recheck.
For full CLI, configuration, reporter, server, MCP, and Rust API examples, see
docs/user-guide.md.
If you already use upstream jscpd, see
docs/migrating-from-jscpd.md.
GitHub Actions
Install from crates.io:
jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo install jscpd-rs --locked
      - run: jscpd src --reporters console,json --threshold 5 --exitCode 1
Use npm/npx when a Node-based CI environment is already available:
jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: actions/setup-node@v5
        with:
          node-version: 22
      - run: npx jscpd-rs src --reporters console,json --threshold 5 --exitCode 1
Install from a checked-out source tree:
jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo install --path . --bins --locked
      - run: jscpd src --reporters console,json --threshold 5 --exitCode 1
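The threshold gate used in the workflows above can be pictured with a small sketch. This is not jscpd-rs source code: the function names are hypothetical, and the sketch assumes that the threshold is a percentage of duplicated lines and that the gate fails only when the measured value is strictly above it.

```rust
/// Percentage of duplicated lines; an empty scan counts as 0%.
fn duplication_percent(duplicated_lines: u64, total_lines: u64) -> f64 {
    if total_lines == 0 {
        return 0.0;
    }
    duplicated_lines as f64 * 100.0 / total_lines as f64
}

/// Hypothetical gate: returns `failure_code` when duplication is strictly
/// above `threshold_percent` (an assumption of this sketch), else 0.
fn gate_exit_code(
    duplicated_lines: u64,
    total_lines: u64,
    threshold_percent: f64,
    failure_code: i32,
) -> i32 {
    if duplication_percent(duplicated_lines, total_lines) > threshold_percent {
        failure_code
    } else {
        0
    }
}

fn main() {
    // 62 duplicated lines out of 1000 is 6.2%, which is above a 5% threshold.
    let code = gate_exit_code(62, 1000, 5.0, 1);
    println!("duplication gate exit code: {}", code);
}
```

A CI step would treat any non-zero code as a failed job, which is what makes the duplication check a blocking quality gate.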
AI Refactoring Skills
Install the project skills for duplication detection and guided dry refactoring:
Why jscpd-rs
- Fast CI/CD gates: duplicate detection should be cheap enough to run on every pull request.
- Low-friction rollout: npm installs use prebuilt binaries on supported Linux, macOS, and Windows targets without install-time build scripts. Other platforms should install through Cargo.
- Actionable reports: console, JSON, SARIF, HTML, XML, CSV, Markdown, badge, Xcode, threshold, and AI-oriented reports are implemented natively.
- Lower operating cost: shorter scans reduce paid compute minutes and repeated developer wait time.
- Native detector path: the detector does not embed or spawn JavaScript for core behavior.
- Practical compatibility: the CLI, config, reporter, server, and exit-code workflows are designed to map onto common upstream jscpd usage.
- Small project-specific core: use battle-tested Rust crates for CLI parsing, config formats, ignore handling, serialization, token processing, concurrency, and reporting wherever that keeps the implementation simpler.
What Works Today
The current release is a coverage-first compatible CLI replacement for common
jscpd duplicate-code and copy-paste detection workflows:
- jscpd and jscpd-server binaries with upstream-compatible command names;
- CLI and config option surface covered by compatibility scripts;
- native built-in reporters: ai, console, consoleFull, csv, html, json, markdown, silent, sarif, threshold, xcode, xml, and badge;
- upstream-synchronized format registry with native JS/TS/JSX/TSX tokenization and generic native tokenization for long-tail formats;
- native Git blame support;
- native REST/MCP server workflow;
- Rust library API for running detection from paths or prepared in-memory sources.
Dynamic npm reporters, stores, listeners, and plugins are intentionally out of scope for the first release. Unknown external reporters/stores keep upstream-style warnings and continue where upstream continues.
Compatibility Contract
The release gate is coverage-first. For the same inputs and options,
jscpd-rs must not miss duplicated lines reported by upstream jscpd. Extra
Rust duplicates are allowed while compatibility converges, but compatibility
reports keep them visible as extra findings.
Exact pair ordering and token totals are quality metrics rather than the default blocking gate. This matters for multi-way clones: different pair selection can still cover the same duplicated source lines.
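The coverage-first rule can be illustrated with a minimal sketch. It is not the project's compatibility script: the Fragment type, the helper names, and the sample file names are hypothetical. Each report is reduced to the set of (file, line) positions it marks as duplicated; the gate only requires that no upstream position is missing from the Rust set, while Rust-only positions are listed as extra findings.

```rust
use std::collections::BTreeSet;

/// One side of a clone pair: a file plus an inclusive line range (hypothetical type).
#[derive(Clone, Debug)]
struct Fragment {
    file: &'static str,
    start: u32,
    end: u32,
}

type Line = (&'static str, u32);

fn frag(file: &'static str, start: u32, end: u32) -> Fragment {
    Fragment { file, start, end }
}

/// Collect every (file, line) touched by either side of any clone pair.
fn covered_lines(clones: &[(Fragment, Fragment)]) -> BTreeSet<Line> {
    let mut lines = BTreeSet::new();
    for (a, b) in clones {
        for f in [a, b].iter() {
            for line in f.start..=f.end {
                lines.insert((f.file, line));
            }
        }
    }
    lines
}

struct CoverageReport {
    missed: Vec<Line>, // reported by upstream only: blocks the gate
    extra: Vec<Line>,  // reported by Rust only: allowed, but kept visible
}

fn compare(upstream: &[(Fragment, Fragment)], rust: &[(Fragment, Fragment)]) -> CoverageReport {
    let up = covered_lines(upstream);
    let rs = covered_lines(rust);
    CoverageReport {
        missed: up.difference(&rs).copied().collect(),
        extra: rs.difference(&up).copied().collect(),
    }
}

fn main() {
    // A three-way clone: the two tools pick different pairs for the same lines.
    let upstream = vec![
        (frag("a.js", 10, 20), frag("b.js", 10, 20)),
        (frag("a.js", 10, 20), frag("c.js", 10, 20)),
    ];
    let rust = vec![
        (frag("a.js", 10, 20), frag("b.js", 10, 20)),
        (frag("b.js", 10, 20), frag("c.js", 10, 20)),
        (frag("d.js", 1, 5), frag("e.js", 1, 5)), // extra Rust-only finding
    ];
    let report = compare(&upstream, &rust);
    println!("missed: {}, extra: {}", report.missed.len(), report.extra.len());
}
```

In this example the pair lists differ, yet both cover lines 10 to 20 of the same three files, so nothing is missed; the d.js/e.js pair appears only as extra lines.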
The upstream repository is checked out as jscpd/ and treated as the
executable specification. Compatibility scripts run both implementations and
compare their reports.
Performance
Latest recorded public benchmark baseline for duplicate-code detection:
| Repo | Format | Rust avg | Upstream avg | Speedup |
|---|---|---|---|---|
| React | JavaScript | 0.199097s | 10.079214s | 50.62x |
| Next.js | TypeScript | 0.262433s | 14.715736s | 56.07x |
| Prometheus | Go | 0.085239s | 4.642435s | 54.46x |
Reproduce the public benchmark and coverage suite:
Benchmark native server snippet checks against upstream:
RUNS=20
Release-candidate workflows rerun the public suite before each new publication and enforce the core coverage gate, so README numbers stay tied to a concrete commit and gate output. The public release gate fails below a 45x speedup on the default benchmark cases to prevent silent performance regressions while preserving room for normal runner noise.
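The Speedup column is the upstream average divided by the Rust average, and the release gate compares that ratio with the 45x floor. The sketch below recomputes the table values under that reading; the type and function names are hypothetical and are not part of the benchmark scripts.

```rust
/// One row of the benchmark table (hypothetical type for this sketch).
struct Case {
    repo: &'static str,
    rust_avg_s: f64,
    upstream_avg_s: f64,
}

/// Minimum speedup enforced by the public release gate, per the text above.
const MIN_SPEEDUP: f64 = 45.0;

fn speedup(case: &Case) -> f64 {
    case.upstream_avg_s / case.rust_avg_s
}

/// Rows copied from the table above.
fn readme_cases() -> Vec<Case> {
    vec![
        Case { repo: "React", rust_avg_s: 0.199097, upstream_avg_s: 10.079214 },
        Case { repo: "Next.js", rust_avg_s: 0.262433, upstream_avg_s: 14.715736 },
        Case { repo: "Prometheus", rust_avg_s: 0.085239, upstream_avg_s: 4.642435 },
    ]
}

/// Names of the cases that would fail the gate.
fn below_gate(cases: &[Case], min_speedup: f64) -> Vec<&'static str> {
    cases
        .iter()
        .filter(|c| speedup(c) < min_speedup)
        .map(|c| c.repo)
        .collect()
}

fn main() {
    let cases = readme_cases();
    for case in cases.iter() {
        println!("{}: {:.2}x", case.repo, speedup(case));
    }
    let failing = below_gate(&cases, MIN_SPEEDUP);
    println!("cases below {}x: {:?}", MIN_SPEEDUP, failing);
}
```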
Library API
The crate exposes the detector core for native integrations:
let options = get_default_options;
let result = detect_clones_and_statistic?;
let clones = result.clones;
let clones = jscpd?;
detect_clones_and_statistics is also available as the idiomatic Rust spelling.
jscpd and jscpd_with_exit_callback provide a native embeddable argv runner
similar to upstream jscpd(argv, exitCallback?). get_options_from_args parses
upstream-style argv into normalized Options for native integrations.
Tokenizer provides a native generate-maps entrypoint over the same tokenizer
used by detection. Detector, Statistic, and MemoryStore expose native
counterparts for the main upstream core classes without loading JavaScript.
detect_source_files accepts in-memory SourceFile values, which is the
foundation for the upstream-style snippet/server workflow. Format helpers are
available through get_supported_formats, get_format_by_file, and
get_format_by_file_with_mappings.
Architecture
The implementation keeps the hot path small and native:
paths/config
-> ignore-aware discovery
-> native token maps
-> parallel duplicate detection
-> reporters / exit codes / server API
Core crates and libraries:
- clap, serde, and config parsers for the CLI/config surface;
- ignore, globset, and Git ignore handling for file discovery;
- Oxc-backed JS/TS/JSX/TSX token processing for the highest-volume languages;
- native generic tokenizers for long-tail formats and embedded code blocks;
- rayon and Rust data structures for parallel discovery/detection work;
- native reporters for JSON, SARIF, XML, CSV, Markdown, HTML, console, badge, Xcode, threshold, silent, and AI refactoring output.
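To give a feel for the token-map and duplicate-detection stages of the pipeline above, here is a deliberately simplified, single-threaded sketch. It is not the jscpd-rs detector: whitespace splitting stands in for real tokenization, the function name is hypothetical, and the approach shown (indexing fixed-size token windows in a hash map) is only one common way to find repeated token runs.

```rust
use std::collections::HashMap;

/// Returns (first_start, repeat_start) token indexes for every window of
/// `window` tokens that repeats an earlier window.
fn duplicate_windows(tokens: &[&str], window: usize) -> Vec<(usize, usize)> {
    let mut hits = Vec::new();
    if window == 0 || tokens.len() < window {
        return hits;
    }
    // Maps each window's contents to the index where it was first seen.
    let mut seen: HashMap<&[&str], usize> = HashMap::new();
    for (start, w) in tokens.windows(window).enumerate() {
        let previous = seen.get(w).copied();
        match previous {
            Some(first) => hits.push((first, start)),
            None => {
                seen.insert(w, start);
            }
        }
    }
    hits
}

fn main() {
    // Whitespace splitting is a stand-in for a real tokenizer.
    let source = "let a = 1 ; log a ; let a = 1 ;";
    let tokens: Vec<&str> = source.split_whitespace().collect();
    for (first, repeat) in duplicate_windows(&tokens, 5) {
        println!("tokens {}.. repeat tokens {}..", repeat, first);
    }
}
```

A real detector would additionally merge adjacent window hits into maximal clones, map token indexes back to source lines, and spread the work across files in parallel.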
Known First-Release Deviations
The first release is native-only and coverage-first. These differences from the JavaScript package are intentional unless a real workflow proves otherwise:
- dynamic npm reporters, stores, listeners, and plugins are not loaded;
- token totals and exact clone pair ordering may differ from Prism-based upstream reports while duplicated upstream lines remain covered;
- HTML reports are self-contained and practically compatible, not pixel-perfect;
- the Rust crate exposes a native Rust API, not the upstream JavaScript package API.
Development
The upstream repository is checked out as the jscpd/ git submodule and is the
executable specification for behavior.
Useful focused checks:
STRICT=coverage
Rust code coverage is optional and intentionally kept out of the default fast gate:
SUMMARY=1
SCOPE=core SUMMARY=1
SCOPE=core FAIL_UNDER_LINES=93
Black-box behavior tests that exercise the public API live in tests/. Small
private-helper tests stay next to the module they protect.
Known upstream bug candidates and intentional compatibility exceptions are tracked in docs/upstream-bugs.md. GitHub-ready issue drafts are prepared in docs/upstream-issue-drafts.md.
Release Gates
Fast local gate:
The fast gate includes cargo fmt, cargo test, shell syntax checks,
shellcheck, package/install checks, and the focused compatibility gates.
Package/install gate:
npm package/npx gate:
Full compatibility matrix:
FULL=1
Public benchmark and coverage gate:
Release candidate gate:
The GitHub Actions workflow runs the fast gate on pushes and pull requests.
Manual workflow runs can enable the full compatibility matrix and public
benchmark suite before a release, or set release_candidate=true to run the
full release-candidate gate in CI.
See docs/compat-baseline.md for the current gate baseline, docs/release-readiness.md for component status, docs/release-checklist.md for the publication checklist, CHANGELOG.md for release notes, and docs/release-decisions.md for approved first-release compatibility decisions.
License
MIT