jscpd-rs 0.1.0

Fast Rust clone of jscpd

jscpd-rs

Fast native Rust clone of jscpd for copy-paste and duplicate-code detection in local development and CI/CD.

jscpd-rs keeps the upstream command shape, configuration formats, reports, exit-code workflows, and server workflow, while moving the hot path to native Rust. The practical goal is simple: keep duplication checks always-on without spending unnecessary CI minutes, developer waiting time, cloud compute budget, or electricity on repeated quality gates.

In the recorded public release-candidate benchmarks, jscpd-rs is currently more than 50x faster than upstream jscpd on the covered repositories. The compatibility gate is coverage-first: on the same inputs and options, jscpd-rs must not miss duplicated source lines reported by upstream jscpd.

Install

Cargo:

cargo install jscpd-rs --locked
jscpd --version

npm/npx:

npm install -g jscpd-rs
jscpd --version

npx jscpd-rs --version
npx jscpd-rs .

The first npm package is a source-build package: install/postinstall compiles the native Rust binaries with Cargo. A Rust toolchain must be available on the installing machine. Prebuilt platform packages are a planned publication improvement.

From this repository:

git clone https://github.com/vv-bogdanov/jscpd-rs.git
cd jscpd-rs
cargo install --path . --bins --locked

Build without installing:

cargo build --release --bin jscpd
cargo build --release --bin jscpd-server

Quick Start

Scan a project:

jscpd .

Tune the minimum clone size (lines and tokens):

jscpd --min-lines 5 --min-tokens 50 src

Generate reports:

jscpd --reporters console,json,html --output report src

Fail CI when duplication is above a threshold:

jscpd --threshold 5 --exitCode 1 src
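
The gate above can be pictured as a small decision function. The following is only a sketch in plain Rust, not the crate's implementation: `ScanSummary`, its fields, and `gate_exit_code` are hypothetical names, and treating the gate as "strictly above the threshold" is an assumption about the comparison.

```rust
/// Hypothetical scan summary; the field names are illustrative only.
pub struct ScanSummary {
    pub total_lines: usize,
    pub duplicated_lines: usize,
}

impl ScanSummary {
    /// Percentage of lines that belong to at least one clone.
    pub fn duplication_percent(&self) -> f64 {
        if self.total_lines == 0 {
            return 0.0;
        }
        self.duplicated_lines as f64 * 100.0 / self.total_lines as f64
    }
}

/// Returns the process exit code for a threshold gate.
/// Assumption: the gate trips only when the percentage is strictly above
/// the threshold; the real comparison may differ.
pub fn gate_exit_code(summary: &ScanSummary, threshold_percent: f64, exit_code: i32) -> i32 {
    if summary.duplication_percent() > threshold_percent {
        exit_code
    } else {
        0
    }
}
```

Under these assumptions, a scan with 80 duplicated lines out of 1000 (8%) checked against a threshold of 5 would return the configured exit code, while 30 out of 1000 (3%) would return 0.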

Show the upstream-compatible command help and the list of supported formats:

jscpd --help
jscpd --list

Start the native REST/MCP server:

jscpd-server . --host 127.0.0.1 --port 3000
curl http://127.0.0.1:3000/api/health

The current server exposes /, /api/health, /api/stats, /api/check, /api/recheck, and /mcp. Snippet checks reuse project token maps refreshed by /api/recheck.
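
For callers that prefer not to shell out to curl, the same health probe can be sketched with the Rust standard library alone. The helper names below are hypothetical, the response body format is not assumed (only the HTTP status line is read), and the body is assumed to be valid UTF-8.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Builds a minimal HTTP/1.1 GET request for the given path.
pub fn build_get_request(host: &str, port: u16, path: &str) -> String {
    format!(
        "GET {} HTTP/1.1\r\nHost: {}:{}\r\nConnection: close\r\n\r\n",
        path, host, port
    )
}

/// Extracts the numeric status code from a raw HTTP response, if present.
pub fn parse_status(response: &str) -> Option<u16> {
    response.lines().next()?.split_whitespace().nth(1)?.parse().ok()
}

/// Connects to a running server, requests /api/health, and returns the status code.
pub fn check_health(host: &str, port: u16) -> std::io::Result<Option<u16>> {
    let mut stream = TcpStream::connect((host, port))?;
    stream.write_all(build_get_request(host, port, "/api/health").as_bytes())?;
    // `Connection: close` lets us read until the server closes the socket.
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(parse_status(&response))
}
```

With the server from the command above running, `check_health("127.0.0.1", 3000)` would be the call that mirrors the curl line.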

For full CLI, configuration, reporter, server, MCP, and Rust API examples, see docs/user-guide.md.

GitHub Actions

Install from crates.io after publication:

jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo install jscpd-rs --locked
      - run: jscpd src --reporters console,json --threshold 5 --exitCode 1

Use npm/npx when a Node-based CI environment is already available:

jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable
      - uses: actions/setup-node@v5
        with:
          node-version: 22
      - run: npx jscpd-rs src --reporters console,json --threshold 5 --exitCode 1

Install from a checked-out source tree:

jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo install --path . --bins --locked
      - run: jscpd src --reporters console,json --threshold 5 --exitCode 1

AI Refactoring Skills

Install the project skills for duplication detection and guided DRY refactoring:

npx skills add vv-bogdanov/jscpd-rs --skill jscpd
npx skills add vv-bogdanov/jscpd-rs --skill dry-refactoring

Why jscpd-rs

  • Fast CI/CD gates: duplicate detection should be cheap enough to run on every pull request.
  • Lower operating cost: shorter scans reduce paid compute minutes and repeated developer wait time.
  • Native detector path: the detector does not embed or spawn JavaScript for core behavior.
  • Practical compatibility: the CLI, config, reporter, server, and exit-code workflows are designed to map onto common upstream jscpd usage.
  • Small project-specific core: battle-tested Rust crates handle CLI parsing, config formats, ignore handling, serialization, token processing, concurrency, and reporting wherever that keeps the implementation simpler.

What Works Today

This is pre-release software. The first release target is a coverage-first compatible CLI replacement for common jscpd workflows:

  • jscpd and jscpd-server binaries with upstream-compatible command names;
  • CLI and config option surface covered by compatibility scripts;
  • native built-in reporters: ai, console, consoleFull, csv, html, json, markdown, silent, sarif, threshold, xcode, xml, and badge;
  • upstream-synchronized format registry with native JS/TS/JSX/TSX tokenization and generic native tokenization for long-tail formats;
  • native Git blame support;
  • native REST/MCP server workflow;
  • Rust library API for running detection from paths or prepared in-memory sources.

Dynamic npm reporters, stores, listeners, and plugins are intentionally out of scope for the first release. Unknown external reporters/stores keep upstream-style warnings and continue where upstream continues.

Compatibility Contract

The release gate is coverage-first. For the same inputs and options, jscpd-rs must not miss duplicated lines reported by upstream jscpd. Extra Rust duplicates are allowed while compatibility converges, but compatibility reports keep them visible as extra findings.

Exact pair ordering and token totals are quality metrics rather than the default blocking gate. This matters for multi-way clones: different pair selection can still cover the same duplicated source lines.
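
The coverage-first rule reduces to a set comparison over duplicated source lines. The sketch below is illustrative only: `Line`, `Coverage`, `compare`, and `expand` are hypothetical names, and the actual compatibility scripts may represent reports differently.

```rust
use std::collections::BTreeSet;

/// A duplicated source line, identified by file path and 1-based line number.
pub type Line = (String, usize);

/// Outcome of comparing two sets of duplicated lines.
pub struct Coverage {
    /// Lines reported by upstream but not by the Rust run (blocking).
    pub missed: BTreeSet<Line>,
    /// Lines reported only by the Rust run (allowed, but kept visible).
    pub extra: BTreeSet<Line>,
}

impl Coverage {
    /// The gate passes when no upstream line is missed.
    pub fn passes(&self) -> bool {
        self.missed.is_empty()
    }
}

/// Compares the duplicated lines of both implementations.
pub fn compare(upstream: &BTreeSet<Line>, rust: &BTreeSet<Line>) -> Coverage {
    Coverage {
        missed: upstream.difference(rust).cloned().collect(),
        extra: rust.difference(upstream).cloned().collect(),
    }
}

/// Expands one clone fragment (inclusive line range) into individual lines.
pub fn expand(file: &str, start: usize, end: usize) -> BTreeSet<Line> {
    (start..=end).map(|n| (file.to_string(), n)).collect()
}
```

Because the comparison works on lines rather than clone pairs, two runs that pair up a multi-way clone differently still compare as equal as long as the union of their fragments covers the same lines.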

The upstream repository is checked out as jscpd/ and treated as the executable specification. Compatibility scripts run both implementations and compare their reports.

Performance

Latest recorded public benchmark baseline:

Repo         Format       Rust avg     Upstream avg   Speedup
React        JavaScript   0.199097s    10.079214s     50.62x
Next.js      TypeScript   0.262433s    14.715736s     56.07x
Prometheus   Go           0.085239s    4.642435s      54.46x

Reproduce the public benchmark and coverage suite:

PUBLIC=1 PUBLIC_RUNS=3 scripts/release-gate.sh

The release-candidate workflow reruns the public suite before publication so README numbers stay tied to a concrete commit and gate output.
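
The Speedup column is consistent with dividing the upstream average by the Rust average and rounding to two decimals. A minimal sketch of that arithmetic follows; the function name is hypothetical and is not part of the release-gate scripts.

```rust
/// Speedup of the Rust run relative to upstream, rounded to two decimals.
/// Both arguments are average wall-clock times in seconds.
pub fn speedup(rust_avg_secs: f64, upstream_avg_secs: f64) -> f64 {
    ((upstream_avg_secs / rust_avg_secs) * 100.0).round() / 100.0
}
```

For the React row, 10.079214 / 0.199097 is about 50.6246, which rounds to the 50.62x shown in the table.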

Library API

The crate exposes the detector core for native integrations:

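// Run detection with default options (upstream-style spelling).
let options = jscpd_rs::get_default_options();
let result = jscpd_rs::detect_clones_and_statistic(&options)?;
let clones = result.clones;

// Or use the embeddable argv runner, similar to upstream jscpd(argv).
let clones = jscpd_rs::jscpd(["jscpd", "src", "--silent", "--noTips"])?;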

detect_clones_and_statistics is also available as the idiomatic Rust spelling. jscpd and jscpd_with_exit_callback provide a native embeddable argv runner similar to upstream jscpd(argv, exitCallback?). get_options_from_args parses upstream-style argv into normalized Options for native integrations.

Tokenizer provides a native generate-maps entrypoint over the same tokenizer used by detection. Detector, Statistic, and MemoryStore expose native counterparts for the main upstream core classes without loading JavaScript. detect_source_files accepts in-memory SourceFile values, which is the foundation for the upstream-style snippet/server workflow. Format helpers are available through get_supported_formats, get_format_by_file, and get_format_by_file_with_mappings.

Architecture

The implementation keeps the hot path small and native:

paths/config
  -> ignore-aware discovery
  -> native token maps
  -> parallel duplicate detection
  -> reporters / exit codes / server API
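
To make the shape of this pipeline concrete, here is a deliberately tiny, sequential toy version in plain Rust. It is not the crate's algorithm: it treats each trimmed non-blank line as a "token", uses fixed-size line windows, and skips discovery, parallelism, and reporting entirely. All names are hypothetical.

```rust
use std::collections::HashMap;

/// Location of a window: (file name, 1-based start line).
pub type Location = (String, usize);

/// Finds windows of `min_lines` consecutive normalized lines that occur
/// more than once across the given in-memory (name, source) files.
pub fn find_duplicate_windows(files: &[(&str, &str)], min_lines: usize) -> Vec<Vec<Location>> {
    if min_lines == 0 {
        return Vec::new();
    }
    let mut seen: HashMap<Vec<String>, Vec<Location>> = HashMap::new();
    for (name, source) in files {
        // Toy "token map": trimmed, non-blank lines with their original line numbers.
        let lines: Vec<(usize, String)> = source
            .lines()
            .enumerate()
            .map(|(i, l)| (i + 1, l.trim().to_string()))
            .filter(|(_, l)| !l.is_empty())
            .collect();
        // Index every window by its content so identical windows collide.
        for window in lines.windows(min_lines) {
            let key: Vec<String> = window.iter().map(|(_, l)| l.clone()).collect();
            seen.entry(key)
                .or_default()
                .push((name.to_string(), window[0].0));
        }
    }
    // Keep only windows seen at least twice; sort for deterministic output.
    let mut groups: Vec<Vec<Location>> = seen.into_values().filter(|g| g.len() > 1).collect();
    groups.sort();
    groups
}
```

A real detector works on language-aware tokens and merges overlapping windows into maximal clones; the sketch only shows why indexing windows by content makes duplicates cheap to find.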

Core crates and libraries:

  • clap, serde, and config parsers for the CLI/config surface;
  • ignore, globset, and Git ignore handling for file discovery;
  • Oxc-backed JS/TS/JSX/TSX token processing for the highest-volume languages;
  • native generic tokenizers for long-tail formats and embedded code blocks;
  • rayon and Rust data structures for parallel discovery/detection work;
  • native reporters for JSON, SARIF, XML, CSV, Markdown, HTML, console, badge, Xcode, threshold, silent, and AI refactoring output.

Known First-Release Deviations

The first release is native-only and coverage-first. These differences from the JavaScript package are intentional unless a real workflow proves otherwise:

  • dynamic npm reporters, stores, listeners, and plugins are not loaded;
  • token totals and exact clone pair ordering may differ from Prism-based upstream reports while duplicated upstream lines remain covered;
  • HTML reports are self-contained and practically compatible, not pixel-perfect;
  • the Rust crate exposes a native Rust API, not the upstream JavaScript package API.

Development

The upstream repository is checked out as the jscpd/ git submodule and is the executable specification for behavior.

git submodule update --init --recursive
cargo test

Useful focused checks:

scripts/compat-cli.sh
scripts/compat-config.sh
scripts/compat-reporters.sh
STRICT=coverage scripts/compat-matrix.sh

Known upstream bug candidates and intentional compatibility exceptions are tracked in docs/upstream-bugs.md. GitHub-ready issue drafts are prepared in docs/upstream-issue-drafts.md.

Release Gates

Fast local gate:

scripts/release-gate.sh

Package/install gate:

scripts/package-check.sh

npm package/npx gate:

scripts/npm-package-check.sh

Full compatibility matrix:

FULL=1 scripts/release-gate.sh

Public benchmark and coverage gate:

PUBLIC=1 PUBLIC_RUNS=3 scripts/release-gate.sh

Release candidate gate:

scripts/release-candidate.sh

The GitHub Actions workflow runs the fast gate on pushes and pull requests. Manual workflow runs can enable the full compatibility matrix and public benchmark suite before a release, or set release_candidate=true to run the full release-candidate gate in CI.

See docs/compat-baseline.md for the current gate baseline, docs/release-readiness.md for component status, docs/release-checklist.md for the publication checklist, CHANGELOG.md for release notes, and docs/release-decisions.md for approved first-release compatibility decisions.

License

MIT