jscpd-rs
Fast native Rust clone of jscpd
for copy-paste and duplicate-code detection in local development and CI/CD.
jscpd-rs keeps the upstream command shape, configuration formats, reports,
exit-code workflows, and server workflow, while moving the hot path to native
Rust. The practical goal is simple: keep duplication checks always-on without
spending unnecessary CI minutes, developer waiting time, cloud compute budget,
or electricity on repeated quality gates.
In recorded public release-candidate benchmarks, jscpd-rs is currently more
than 50x faster than upstream on the covered repositories. The compatibility
gate is coverage-first: on the same inputs and options, jscpd-rs must not miss
duplicated source lines reported by upstream jscpd.
Install
Cargo:
npm/npx:
The first npm package is a source-build package: install/postinstall compiles the native Rust binaries with Cargo. A Rust toolchain must be available on the installing machine. Prebuilt platform packages are a planned publication improvement.
From this repository:
Build without installing:
Quick Start
Scan a project:
Tune the detection threshold:
Generate reports:
Fail CI when duplication is above a threshold:
Use the upstream-compatible command help and format list:
Start the native REST/MCP server:
The current server exposes /, /api/health, /api/stats, /api/check,
/api/recheck, and /mcp. Snippet checks reuse project token maps refreshed
by /api/recheck.
For full CLI, configuration, reporter, server, MCP, and Rust API examples, see docs/user-guide.md.
GitHub Actions
Install from crates.io after publication:
jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo install jscpd-rs --locked
      - run: jscpd src --reporters console,json --threshold 5 --exitCode 1
Use npm/npx when a Node-based CI environment is already available:
jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable
      - uses: actions/setup-node@v5
        with:
          node-version: 22
      - run: npx jscpd-rs src --reporters console,json --threshold 5 --exitCode 1
Install from a checked-out source tree:
jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo install --path . --bins --locked
      - run: jscpd src --reporters console,json --threshold 5 --exitCode 1
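The workflows above gate a job on a duplication threshold. As a rough illustration of that kind of gate, here is a minimal sketch. It assumes the percentage is duplicated lines over total lines and that the gate fails only when the percentage is strictly above the threshold. The function names are hypothetical and are not part of the jscpd-rs API, and the exact interaction between --threshold and --exitCode in jscpd-rs may differ from this simplification.

```rust
/// Percentage of duplicated lines; 0.0 for an empty project.
/// Assumption: the gate works on line counts, not token counts.
fn duplication_percentage(duplicated_lines: u64, total_lines: u64) -> f64 {
    if total_lines == 0 {
        0.0
    } else {
        duplicated_lines as f64 * 100.0 / total_lines as f64
    }
}

/// Exit code for the gate: `failure_code` when the percentage is strictly
/// above `threshold`, otherwise 0. Hypothetical helper for illustration only.
fn gate_exit_code(percentage: f64, threshold: f64, failure_code: i32) -> i32 {
    if percentage > threshold {
        failure_code
    } else {
        0
    }
}

fn main() {
    // 620 duplicated lines out of 10,000 is 6.2%, above a threshold of 5.
    let percentage = duplication_percentage(620, 10_000);
    let code = gate_exit_code(percentage, 5.0, 1);
    println!("duplication {:.1}% -> exit code {}", percentage, code);
}
```

A CI step that receives a non-zero exit code fails the job, which is what turns the threshold into a blocking quality gate.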
AI Refactoring Skills
Install the project skills for duplication detection and guided dry refactoring:
Why jscpd-rs
- Fast CI/CD gates: duplicate detection should be cheap enough to run on every pull request.
- Lower operating cost: shorter scans reduce paid compute minutes and repeated developer wait time.
- Native detector path: the detector does not embed or spawn JavaScript for core behavior.
- Practical compatibility: the CLI, config, reporter, server, and exit-code workflows are designed to map onto common upstream jscpd usage.
- Small project-specific core: use battle-tested Rust crates for CLI parsing, config formats, ignore handling, serialization, token processing, concurrency, and reporting wherever that keeps the implementation simpler.
What Works Today
This is pre-release software. The first release target is a coverage-first
compatible CLI replacement for common jscpd workflows:
- jscpd and jscpd-server binaries with upstream-compatible command names;
- CLI and config option surface covered by compatibility scripts;
- native built-in reporters: ai, console, consoleFull, csv, html, json, markdown, silent, sarif, threshold, xcode, xml, and badge;
- upstream-synchronized format registry with native JS/TS/JSX/TSX tokenization and generic native tokenization for long-tail formats;
- native Git blame support;
- native REST/MCP server workflow;
- Rust library API for running detection from paths or prepared in-memory sources.
Dynamic npm reporters, stores, listeners, and plugins are intentionally out of scope for the first release. Unknown external reporters/stores keep upstream-style warnings and continue where upstream continues.
Compatibility Contract
The release gate is coverage-first. For the same inputs and options,
jscpd-rs must not miss duplicated lines reported by upstream jscpd. Extra
Rust duplicates are allowed while compatibility converges, but compatibility
reports keep them visible as extra findings.
Exact pair ordering and token totals are quality metrics rather than the default blocking gate. This matters for multi-way clones: different pair selection can still cover the same duplicated source lines.
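The coverage-first rule can be read as a set comparison: every duplicated line upstream reports must also be covered by the Rust report, while extra Rust lines are recorded but do not block. The sketch below illustrates that idea under stated assumptions. The `Coverage` type, the `(file, start_line, end_line)` fragment shape, and both function names are hypothetical and are not taken from the project's compatibility scripts.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Duplicated line numbers per file, as extracted from one tool's report.
type Coverage = BTreeMap<String, BTreeSet<u32>>;

/// Build a coverage map from (file, start_line, end_line) clone fragments.
/// Overlapping or repeated fragments collapse into the same line set.
fn coverage(fragments: &[(&str, u32, u32)]) -> Coverage {
    let mut map = Coverage::new();
    for &(file, start, end) in fragments {
        map.entry(file.to_string()).or_default().extend(start..=end);
    }
    map
}

/// Lines present in `reference` that `candidate` does not cover.
fn missed_lines(reference: &Coverage, candidate: &Coverage) -> Coverage {
    let mut missed = Coverage::new();
    for (file, lines) in reference {
        let gap: BTreeSet<u32> = match candidate.get(file) {
            Some(covered) => lines.difference(covered).copied().collect(),
            None => lines.clone(),
        };
        if !gap.is_empty() {
            missed.insert(file.clone(), gap);
        }
    }
    missed
}

fn main() {
    // A three-way clone: upstream pairs a-b and b-c, the candidate pairs
    // a-b and a-c. The pairs differ, but the covered lines are the same.
    let upstream = coverage(&[
        ("a.js", 1, 10), ("b.js", 5, 14),
        ("b.js", 5, 14), ("c.js", 20, 29),
    ]);
    let candidate = coverage(&[
        ("a.js", 1, 10), ("b.js", 5, 14),
        ("a.js", 1, 10), ("c.js", 20, 29),
        ("d.js", 1, 3), // an extra finding: visible, but not blocking
    ]);

    let missed = missed_lines(&upstream, &candidate);
    let extra = missed_lines(&candidate, &upstream);
    println!("files with missed lines: {}", missed.len());
    println!("files with extra lines: {}", extra.len());
}
```

Comparing line sets rather than clone pairs is what makes the check insensitive to pair selection in multi-way clones.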
The upstream repository is checked out as jscpd/ and treated as the
executable specification. Compatibility scripts run both implementations and
compare their reports.
Performance
Latest recorded public benchmark baseline:
| Repo | Format | Rust avg | Upstream avg | Speedup |
|---|---|---|---|---|
| React | JavaScript | 0.199097s | 10.079214s | 50.62x |
| Next.js | TypeScript | 0.262433s | 14.715736s | 56.07x |
| Prometheus | Go | 0.085239s | 4.642435s | 54.46x |
Reproduce the public benchmark and coverage suite:
PUBLIC=1 PUBLIC_RUNS=3
The release-candidate workflow reruns the public suite before publication so README numbers stay tied to a concrete commit and gate output.
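The Speedup column in the benchmark table is consistent with dividing the upstream average by the Rust average. The short check below recomputes it from the recorded averages. The `speedup` helper is a hypothetical name used only for this illustration.

```rust
/// Speedup factor: how many times faster the candidate ran than the baseline.
fn speedup(baseline_secs: f64, candidate_secs: f64) -> f64 {
    baseline_secs / candidate_secs
}

fn main() {
    // (repo, Rust avg seconds, upstream avg seconds) from the benchmark table.
    let rows = [
        ("React", 0.199097, 10.079214),
        ("Next.js", 0.262433, 14.715736),
        ("Prometheus", 0.085239, 4.642435),
    ];
    for (repo, rust_avg, upstream_avg) in rows.iter() {
        println!("{}: {:.2}x", repo, speedup(*upstream_avg, *rust_avg));
    }
}
```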
Library API
The crate exposes the detector core for native integrations:
let options = get_default_options;
let result = detect_clones_and_statistic?;
let clones = result.clones;
let clones = jscpd?;
detect_clones_and_statistics is also available as the idiomatic Rust spelling.
jscpd and jscpd_with_exit_callback provide a native embeddable argv runner
similar to upstream jscpd(argv, exitCallback?). get_options_from_args parses
upstream-style argv into normalized Options for native integrations.
Tokenizer provides a native generate-maps entrypoint over the same tokenizer
used by detection. Detector, Statistic, and MemoryStore expose native
counterparts for the main upstream core classes without loading JavaScript.
detect_source_files accepts in-memory SourceFile values, which is the
foundation for the upstream-style snippet/server workflow. Format helpers are
available through get_supported_formats, get_format_by_file, and
get_format_by_file_with_mappings.
Architecture
The implementation keeps the hot path small and native:
paths/config
-> ignore-aware discovery
-> native token maps
-> parallel duplicate detection
-> reporters / exit codes / server API
Core crates and libraries:
- clap, serde, and config parsers for the CLI/config surface;
- ignore, globset, and Git ignore handling for file discovery;
- Oxc-backed JS/TS/JSX/TSX token processing for the highest-volume languages;
- native generic tokenizers for long-tail formats and embedded code blocks;
- rayon and Rust data structures for parallel discovery/detection work;
- native reporters for JSON, SARIF, XML, CSV, Markdown, HTML, console, badge, Xcode, threshold, silent, and AI refactoring output.
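To make the token-map and detection stages of the pipeline concrete, here is a deliberately simplified, standard-library-only sketch. It is not the jscpd-rs implementation: it tokenizes on whitespace instead of using Oxc or the native tokenizers, runs sequentially instead of in parallel, and reports fixed-size token windows rather than merged clone ranges. All names in it are hypothetical.

```rust
use std::collections::HashMap;

/// Location of a token window: (source index, index of its first token).
type Location = (usize, usize);

/// Toy stand-in for the token-map stage: split on whitespace.
fn tokenize(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

/// Toy stand-in for the detection stage: index every window of `min_tokens`
/// tokens and keep the windows that occur at more than one location.
fn find_duplicates<'a>(
    sources: &[(&str, &'a str)],
    min_tokens: usize,
) -> Vec<(Vec<&'a str>, Vec<Location>)> {
    let size = min_tokens.max(1); // `windows(0)` would panic
    let mut index: HashMap<Vec<&'a str>, Vec<Location>> = HashMap::new();
    for (source_idx, &(_, text)) in sources.iter().enumerate() {
        let tokens = tokenize(text);
        for (start, window) in tokens.windows(size).enumerate() {
            index.entry(window.to_vec()).or_default().push((source_idx, start));
        }
    }
    let mut duplicates: Vec<_> = index
        .into_iter()
        .filter(|(_, locations)| locations.len() > 1)
        .collect();
    duplicates.sort(); // deterministic output order for reporting
    duplicates
}

fn main() {
    // In-memory sources as (name, text) pairs.
    let sources = [
        ("a.rs", "let total = price * count ; print ( total )"),
        ("b.rs", "let sum = 0 ; let total = price * count ; return sum"),
    ];
    // Toy stand-in for the reporter stage: print each duplicated window.
    for (window, locations) in find_duplicates(&sources, 5) {
        let places: Vec<String> = locations
            .iter()
            .map(|&(source, token)| format!("{}@{}", sources[source].0, token))
            .collect();
        println!("{} -> {}", window.join(" "), places.join(", "));
    }
}
```

Keying a hash map on token windows keeps the sketch short. A real detector would typically hash windows incrementally and merge adjacent matches into larger clone ranges.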
Known First-Release Deviations
The first release is native-only and coverage-first. These differences from the JavaScript package are intentional unless a real workflow proves otherwise:
- dynamic npm reporters, stores, listeners, and plugins are not loaded;
- token totals and exact clone pair ordering may differ from Prism-based upstream reports while duplicated upstream lines remain covered;
- HTML reports are self-contained and practically compatible, not pixel-perfect;
- the Rust crate exposes a native Rust API, not the upstream JavaScript package API.
Development
The upstream repository is checked out as the jscpd/ git submodule and is the
executable specification for behavior.
Useful focused checks:
STRICT=coverage
Known upstream bug candidates and intentional compatibility exceptions are tracked in docs/upstream-bugs.md. GitHub-ready issue drafts are prepared in docs/upstream-issue-drafts.md.
Release Gates
Fast local gate:
Package/install gate:
npm package/npx gate:
Full compatibility matrix:
FULL=1
Public benchmark and coverage gate:
PUBLIC=1 PUBLIC_RUNS=3
Release candidate gate:
The GitHub Actions workflow runs the fast gate on pushes and pull requests.
Manual workflow runs can enable the full compatibility matrix and public
benchmark suite before a release, or set release_candidate=true to run the
full release-candidate gate in CI.
See docs/compat-baseline.md for the current gate baseline, docs/release-readiness.md for component status, docs/release-checklist.md for the publication checklist, CHANGELOG.md for release notes, and docs/release-decisions.md for approved first-release compatibility decisions.
License
MIT