jscpd-rs 0.1.1

50x+ faster duplicate-code detector for CI/CD; jscpd-compatible CLI, SARIF, JSON, HTML reports

jscpd-rs

jscpd-rs is a duplicate-code detector for local development, CI/CD, and code quality gates that runs 50x+ faster than upstream jscpd on the recorded public benchmarks. It scans a codebase, finds copy-paste fragments across files, writes console, JSON, SARIF, HTML, XML, CSV, Markdown, badge, and Xcode reports, and can fail a build when duplication crosses a configured threshold.

It is a native Rust implementation of the common jscpd command-line workflow: upstream-style CLI flags, .jscpd.json and package.json#jscpd configuration, report formats, exit-code behavior, Git blame, and server snippet checks. The practical goal is simple: keep copy-paste detection always-on without spending unnecessary CI minutes, developer waiting time, cloud compute budget, or electricity on repeated quality gates.

Recorded public benchmark baselines show 50x+ speedups over upstream on the covered repositories. The compatibility gate is coverage-first: on the same inputs and options, jscpd-rs must not miss duplicated source lines reported by upstream jscpd.

Install

Cargo:

cargo install jscpd-rs --locked
jscpd --version

npm/npx:

npm install -g jscpd-rs
jscpd --version

npx jscpd-rs --version
npx jscpd-rs .

Current npm packaging note: jscpd-rs installs prebuilt Linux, macOS, and Windows binaries where available, then falls back to building from source with Cargo for unsupported platforms. The original 0.1.0 npm package was source-build only; use 0.1.1+ for the prebuilt-first path. See docs/prebuilt-binaries.md.

From this repository:

git clone https://github.com/vv-bogdanov/jscpd-rs.git
cd jscpd-rs
cargo install --path . --bins --locked

Build without installing:

cargo build --release --bin jscpd
cargo build --release --bin jscpd-server

Quick Start

Scan a project:

jscpd .

Tune the minimum clone size:

jscpd --min-lines 5 --min-tokens 50 src

Generate reports:

jscpd --reporters console,json,html --output report src

Fail CI when the duplication percentage exceeds a threshold:

jscpd --threshold 5 --exitCode 1 src
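
As a rough illustration of what a gate like this decides, here is a minimal sketch. It assumes the threshold is compared against the percentage of duplicated lines and that the gate fails only when the percentage is strictly above the threshold. The ScanSummary type and both function names are hypothetical and are not part of the jscpd-rs API.

```rust
/// Hypothetical scan summary; the field names are illustrative only
/// and do not mirror any jscpd-rs type.
struct ScanSummary {
    total_lines: usize,
    duplicated_lines: usize,
}

/// Percentage of scanned lines that belong to at least one clone.
fn duplication_percentage(summary: &ScanSummary) -> f64 {
    if summary.total_lines == 0 {
        return 0.0;
    }
    summary.duplicated_lines as f64 * 100.0 / summary.total_lines as f64
}

/// Returns `fail_code` when the percentage is strictly above `threshold`
/// (assumed semantics), otherwise 0.
fn gate_exit_code(summary: &ScanSummary, threshold: f64, fail_code: i32) -> i32 {
    if duplication_percentage(summary) > threshold {
        fail_code
    } else {
        0
    }
}

fn main() {
    let summary = ScanSummary { total_lines: 2_000, duplicated_lines: 130 };
    let pct = duplication_percentage(&summary);
    let code = gate_exit_code(&summary, 5.0, 1);
    println!("duplication {pct:.2}% -> exit code {code}");
}
```

With these sample numbers the sketch computes 6.5% duplication, which is above the 5% threshold, so the gate returns the failing exit code.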

Show the upstream-compatible command help and the list of supported formats:

jscpd --help
jscpd --list

Start the native REST/MCP server:

jscpd-server . --host 127.0.0.1 --port 3000
curl http://127.0.0.1:3000/api/health

The current server exposes /, /api/health, /api/stats, /api/check, /api/recheck, and /mcp. Snippet checks reuse project token maps refreshed by /api/recheck.

For full CLI, configuration, reporter, server, MCP, and Rust API examples, see docs/user-guide.md. If you already use upstream jscpd, see docs/migrating-from-jscpd.md.

GitHub Actions

Install from crates.io:

jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo install jscpd-rs --locked
      - run: jscpd src --reporters console,json --threshold 5 --exitCode 1

Use npm/npx when a Node-based CI environment is already available:

jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable
      - uses: actions/setup-node@v5
        with:
          node-version: 22
      - run: npx jscpd-rs src --reporters console,json --threshold 5 --exitCode 1

Install from a checked-out source tree:

jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo install --path . --bins --locked
      - run: jscpd src --reporters console,json --threshold 5 --exitCode 1

AI Refactoring Skills

Install the project skills for duplication detection and guided dry refactoring:

npx skills add vv-bogdanov/jscpd-rs --skill jscpd
npx skills add vv-bogdanov/jscpd-rs --skill dry-refactoring

Why jscpd-rs

  • Fast CI/CD gates: duplicate detection should be cheap enough to run on every pull request.
  • Low-friction rollout: npm installs use prebuilt binaries on supported Linux, macOS, and Windows targets, with a Cargo fallback for other platforms.
  • Actionable reports: console, JSON, SARIF, HTML, XML, CSV, Markdown, badge, Xcode, threshold, and AI-oriented reports are implemented natively.
  • Lower operating cost: shorter scans reduce paid compute minutes and repeated developer wait time.
  • Native detector path: the detector does not embed or spawn JavaScript for core behavior.
  • Practical compatibility: the CLI, config, reporter, server, and exit-code workflows are designed to map onto common upstream jscpd usage.
  • Small project-specific core: the project relies on battle-tested Rust crates for CLI parsing, config formats, ignore handling, serialization, token processing, concurrency, and reporting wherever that keeps the implementation simpler.

What Works Today

The current release is a coverage-first, compatible CLI replacement for common jscpd duplicate-code and copy-paste detection workflows:

  • jscpd and jscpd-server binaries with upstream-compatible command names;
  • CLI and config option surface covered by compatibility scripts;
  • native built-in reporters: ai, console, consoleFull, csv, html, json, markdown, silent, sarif, threshold, xcode, xml, and badge;
  • upstream-synchronized format registry with native JS/TS/JSX/TSX tokenization and generic native tokenization for long-tail formats;
  • native Git blame support;
  • native REST/MCP server workflow;
  • Rust library API for running detection from paths or prepared in-memory sources.

Dynamic npm reporters, stores, listeners, and plugins are intentionally out of scope for the first release. Unknown external reporters and stores produce upstream-style warnings, and the run continues wherever upstream would continue.

Compatibility Contract

The release gate is coverage-first. For the same inputs and options, jscpd-rs must not miss duplicated lines reported by upstream jscpd. Extra duplicates reported only by jscpd-rs are allowed while compatibility converges, but compatibility reports keep them visible as extra findings.

Exact pair ordering and token totals are quality metrics rather than the default blocking gate. This matters for multi-way clones: different pair selection can still cover the same duplicated source lines.
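
The coverage-first rule can be pictured as a set comparison over duplicated source lines. The sketch below is only an illustration under that reading: the Line alias, CoverageReport, and all function names are hypothetical, and the actual compatibility scripts may compare reports differently.

```rust
use std::collections::BTreeSet;

/// A duplicated source line: (file path, 1-based line number).
type Line = (String, usize);

/// Expands one reported fragment (file, start, end inclusive) into its lines.
fn fragment_lines(file: &str, start: usize, end: usize) -> Vec<Line> {
    (start..=end).map(|n| (file.to_string(), n)).collect()
}

/// Collects every line covered by a list of fragments.
fn covered_lines(fragments: &[(&str, usize, usize)]) -> BTreeSet<Line> {
    fragments
        .iter()
        .flat_map(|&(file, start, end)| fragment_lines(file, start, end))
        .collect()
}

/// Lines only upstream reported (missed) and lines only jscpd-rs reported (extra).
struct CoverageReport {
    missed: BTreeSet<Line>,
    extra: BTreeSet<Line>,
}

fn compare_coverage(upstream: &BTreeSet<Line>, rust: &BTreeSet<Line>) -> CoverageReport {
    CoverageReport {
        missed: upstream.difference(rust).cloned().collect(),
        extra: rust.difference(upstream).cloned().collect(),
    }
}

/// The gate blocks only on missed lines; extra lines stay visible but pass.
fn gate_passes(report: &CoverageReport) -> bool {
    report.missed.is_empty()
}

fn main() {
    // A three-way clone: upstream pairs a/b and a/c, the Rust run pairs a/b and b/c,
    // and the Rust run additionally reports a small fragment in d.js.
    let upstream = covered_lines(&[("a.js", 1, 10), ("b.js", 1, 10), ("a.js", 1, 10), ("c.js", 1, 10)]);
    let rust = covered_lines(&[("a.js", 1, 10), ("b.js", 1, 10), ("b.js", 1, 10), ("c.js", 1, 10), ("d.js", 1, 3)]);
    let report = compare_coverage(&upstream, &rust);
    println!("missed: {}, extra: {}, gate passes: {}", report.missed.len(), report.extra.len(), gate_passes(&report));
}
```

In this example the two runs choose different pairs for the three-way clone, yet both cover the same lines in a.js, b.js, and c.js, so nothing is missed and only the d.js fragment shows up as an extra finding.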

The upstream repository is checked out as jscpd/ and treated as the executable specification. Compatibility scripts run both implementations and compare their reports.

Performance

Latest recorded public benchmark baseline for duplicate-code detection:

Repo        Format      Rust avg    Upstream avg  Speedup
React       JavaScript  0.199097s   10.079214s    50.62x
Next.js     TypeScript  0.262433s   14.715736s    56.07x
Prometheus  Go          0.085239s   4.642435s     54.46x
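
The speedup column is the upstream average divided by the Rust average. A small sketch that recomputes it from the timings in the table above (the speedup helper is just an illustrative name):

```rust
/// How many times faster the Rust run is than the upstream run.
fn speedup(upstream_avg_secs: f64, rust_avg_secs: f64) -> f64 {
    upstream_avg_secs / rust_avg_secs
}

fn main() {
    // (repo, Rust avg seconds, upstream avg seconds), copied from the table.
    let rows = [
        ("React", 0.199097, 10.079214),
        ("Next.js", 0.262433, 14.715736),
        ("Prometheus", 0.085239, 4.642435),
    ];
    for (repo, rust_avg, upstream_avg) in rows {
        println!("{repo}: {:.2}x", speedup(upstream_avg, rust_avg));
    }
}
```

Rounded to two decimals, the ratios match the 50.62x, 56.07x, and 54.46x figures listed in the table.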

Reproduce the public benchmark and coverage suite:

PUBLIC=1 PUBLIC_RUNS=3 scripts/release-gate.sh

Release-candidate workflows rerun the public suite before each new publication so README numbers stay tied to a concrete commit and gate output.

Library API

The crate exposes the detector core for native integrations:

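// Run detection with default options and read the clones from the result.
let options = jscpd_rs::get_default_options();
let result = jscpd_rs::detect_clones_and_statistic(&options)?;
let clones = result.clones;

// Alternatively, drive the detector with upstream-style argv.
let clones = jscpd_rs::jscpd(["jscpd", "src", "--silent", "--noTips"])?;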

detect_clones_and_statistics is also available as the idiomatic Rust spelling. jscpd and jscpd_with_exit_callback provide a native embeddable argv runner similar to upstream jscpd(argv, exitCallback?). get_options_from_args parses upstream-style argv into normalized Options for native integrations.

Tokenizer provides a native generate-maps entrypoint over the same tokenizer used by detection. Detector, Statistic, and MemoryStore expose native counterparts for the main upstream core classes without loading JavaScript. detect_source_files accepts in-memory SourceFile values, which is the foundation for the upstream-style snippet/server workflow. Format helpers are available through get_supported_formats, get_format_by_file, and get_format_by_file_with_mappings.

Architecture

The implementation keeps the hot path small and native:

paths/config
  -> ignore-aware discovery
  -> native token maps
  -> parallel duplicate detection
  -> reporters / exit codes / server API

Core crates and libraries:

  • clap, serde, and config parsers for the CLI/config surface;
  • ignore, globset, and Git ignore handling for file discovery;
  • Oxc-backed JS/TS/JSX/TSX token processing for the highest-volume languages;
  • native generic tokenizers for long-tail formats and embedded code blocks;
  • rayon and Rust data structures for parallel discovery/detection work;
  • native reporters for JSON, SARIF, XML, CSV, Markdown, HTML, console, badge, Xcode, threshold, silent, and AI refactoring output.
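
To make the middle of the pipeline (token maps, then duplicate detection) more concrete, here is a deliberately simplified, sequential toy sketch. It is not the jscpd-rs algorithm: it tokenizes on whitespace instead of using language-aware tokenizers, hashes fixed-size token windows in a standard HashMap, and does not merge overlapping windows into fragments. All names are hypothetical.

```rust
use std::collections::HashMap;

/// Toy "token map": whitespace-separated tokens.
fn tokenize(source: &str) -> Vec<&str> {
    source.split_whitespace().collect()
}

/// The same window of tokens seen at two places, as (file name, token offset).
#[derive(Debug, PartialEq)]
struct Duplicate {
    first: (String, usize),
    second: (String, usize),
}

/// Reports every window of `min_tokens` tokens that was already seen earlier.
fn detect(files: &[(&str, &str)], min_tokens: usize) -> Vec<Duplicate> {
    let mut duplicates = Vec::new();
    if min_tokens == 0 {
        return duplicates; // `windows(0)` would panic, so bail out early.
    }
    let mut seen: HashMap<Vec<&str>, (String, usize)> = HashMap::new();
    for &(name, source) in files {
        let tokens = tokenize(source);
        for (offset, window) in tokens.windows(min_tokens).enumerate() {
            let previous = seen.get(window).cloned();
            match previous {
                Some(first) => duplicates.push(Duplicate {
                    first,
                    second: (name.to_string(), offset),
                }),
                None => {
                    seen.insert(window.to_vec(), (name.to_string(), offset));
                }
            }
        }
    }
    duplicates
}

fn main() {
    let files = [
        ("a.js", "let x = 1 ; let y = 2 ;"),
        ("b.js", "let y = 2 ; return y ;"),
    ];
    for duplicate in detect(&files, 5) {
        println!("{:?}", duplicate);
    }
}
```

A real detector would additionally extend matching windows into maximal fragments, map token offsets back to source lines, and split the work across threads.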

Known First-Release Deviations

The first release is native-only and coverage-first. These differences from the JavaScript package are intentional unless a real workflow proves otherwise:

  • dynamic npm reporters, stores, listeners, and plugins are not loaded;
  • token totals and exact clone pair ordering may differ from Prism-based upstream reports while duplicated upstream lines remain covered;
  • HTML reports are self-contained and practically compatible, not pixel-perfect;
  • the Rust crate exposes a native Rust API, not the upstream JavaScript package API.

Development

The upstream repository is checked out as the jscpd/ git submodule and is the executable specification for behavior.

git submodule update --init --recursive
cargo test

Useful focused checks:

scripts/compat-cli.sh
scripts/compat-config.sh
scripts/compat-reporters.sh
STRICT=coverage scripts/compat-matrix.sh

Known upstream bug candidates and intentional compatibility exceptions are tracked in docs/upstream-bugs.md. GitHub-ready issue drafts are prepared in docs/upstream-issue-drafts.md.

Release Gates

Fast local gate:

scripts/release-gate.sh

Package/install gate:

scripts/package-check.sh

npm package/npx gate:

scripts/npm-package-check.sh

Full compatibility matrix:

FULL=1 scripts/release-gate.sh

Public benchmark and coverage gate:

PUBLIC=1 PUBLIC_RUNS=3 scripts/release-gate.sh

Release candidate gate:

scripts/release-candidate.sh

The GitHub Actions workflow runs the fast gate on pushes and pull requests. Manual workflow runs can enable the full compatibility matrix and public benchmark suite before a release, or set release_candidate=true to run the full release-candidate gate in CI.

See docs/compat-baseline.md for the current gate baseline, docs/release-readiness.md for component status, docs/release-checklist.md for the publication checklist, CHANGELOG.md for release notes, and docs/release-decisions.md for approved first-release compatibility decisions.

License

MIT