polydup-core 0.1.0

Cross-language duplicate code detection library using Tree-sitter and Rabin-Karp

PolyDup

Cross-language duplicate code detector powered by Tree-sitter and Rust.

Architecture

Shared Core Architecture: the heavy lifting is done in Rust and exposed to other languages through FFI bindings.

  • dupe-core: Pure Rust library with Tree-sitter parsing, hashing (Rabin-Karp/MinHash), and reporting
  • dupe-cli: Standalone Rust CLI tool
  • dupe-node: Node.js native addon via napi-rs
  • dupe-py: Python extension module via PyO3
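The list above says dupe-core hashes code with Rabin-Karp. The Rust sketch below illustrates that technique on its own and is not the crate's actual implementation. It assumes the source has already been reduced to a flat list of tokens (in PolyDup that step would involve Tree-sitter), computes a polynomial rolling hash over every window of `window` tokens, and groups windows whose hashes match. The helpers `token_value`, `rolling_hashes`, and `candidate_clones`, and the constants `BASE` and `MODULUS`, are hypothetical. Each hash bucket is re-checked token by token because distinct windows can share a hash.

```rust
use std::collections::HashMap;

// Illustrative rolling-hash parameters, not values taken from PolyDup.
const BASE: u64 = 257;
const MODULUS: u64 = 1_000_000_007;

/// Maps a token to a value in 1..MODULUS (FNV-1a, reduced; never zero).
fn token_value(token: &str) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in token.bytes() {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h % (MODULUS - 1) + 1
}

/// Rabin-Karp hash of every window of `window` consecutive tokens.
fn rolling_hashes(tokens: &[&str], window: usize) -> Vec<u64> {
    if window == 0 || tokens.len() < window {
        return Vec::new();
    }
    let values: Vec<u64> = tokens.iter().map(|t| token_value(t)).collect();
    // BASE^(window - 1) mod MODULUS: the weight of the token leaving the window.
    let high = (1..window).fold(1u64, |acc, _| acc * BASE % MODULUS);
    // Hash the first window directly.
    let mut hash = values[..window].iter().fold(0u64, |acc, &v| (acc * BASE + v) % MODULUS);
    let mut hashes = vec![hash];
    for i in window..values.len() {
        // Drop the outgoing token, shift by one position, add the incoming token.
        let outgoing = values[i - window] * high % MODULUS;
        hash = (hash + MODULUS - outgoing) % MODULUS;
        hash = (hash * BASE + values[i]) % MODULUS;
        hashes.push(hash);
    }
    hashes
}

/// Start offsets of windows that occur more than once, grouped by identical content.
fn candidate_clones(tokens: &[&str], window: usize) -> Vec<Vec<usize>> {
    let mut buckets: HashMap<u64, Vec<usize>> = HashMap::new();
    for (start, h) in rolling_hashes(tokens, window).into_iter().enumerate() {
        buckets.entry(h).or_default().push(start);
    }
    let mut groups: Vec<Vec<usize>> = Vec::new();
    for starts in buckets.into_values() {
        // Equal hashes can still be collisions, so split each bucket by exact token equality.
        let mut exact: Vec<Vec<usize>> = Vec::new();
        for s in starts {
            let win = &tokens[s..s + window];
            match exact.iter_mut().find(|g| &tokens[g[0]..g[0] + window] == win) {
                Some(group) => group.push(s),
                None => exact.push(vec![s]),
            }
        }
        groups.extend(exact.into_iter().filter(|g| g.len() > 1));
    }
    groups.sort();
    groups
}

fn main() {
    // Hypothetical pre-tokenized input; a real pipeline would derive tokens from a parse tree.
    let tokens: Vec<&str> = "let x = a + b ; let y = a + b ;".split_whitespace().collect();
    for group in candidate_clones(&tokens, 4) {
        let shown = tokens[group[0]..group[0] + 4].join(" ");
        println!("window {:?} repeats at offsets {:?}", shown, group);
    }
}
```

With this input, the windows `= a + b` (offsets 2 and 9) and `a + b ;` (offsets 3 and 10) are reported. The rolling update removes the outgoing token and adds the incoming one in constant time, which is what makes Rabin-Karp cheaper than rehashing every window from scratch.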

Installation

Rust CLI (Recommended)

Install the CLI tool from crates.io:

cargo install dupe-cli
polydup scan ./src

Or download pre-built binaries from GitHub Releases.

Node.js/npm

Install as a project dependency:

npm install @polydup/core

Or globally:

npm install -g @polydup/core

Usage in your project:

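const { findDuplicates } = require('@polydup/core');

// Arguments: paths to scan, minimum block size, similarity threshold
// (the same values the Python example passes by keyword)
const duplicates = findDuplicates(['src/', 'tests/'], 10, 0.85);
console.log(duplicates);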

Python/pip

Install from PyPI:

pip install polydup

Usage in your project:

import polydup

duplicates = polydup.find_duplicates(
    paths=['src/', 'tests/'],
    min_block_size=10,
    similarity_threshold=0.85
)

Building from Source

CLI

cargo build --release -p dupe-cli
./target/release/polydup scan ./src

Node.js

cd crates/dupe-node
npm install
npm run build

Python

cd crates/dupe-py
maturin develop
python -c "import polydup; print(polydup.version())"

CLI Usage

Scan directories for duplicate code:

# Basic usage
polydup scan ./src

# Custom threshold and output format
polydup scan ./src ./tests --threshold 0.85 --format json

# Raise the minimum block size to report only larger duplicates
polydup scan ./src --min-block-size 50
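If you want to drive the CLI from Rust (for example, from a test or a build tool), a minimal sketch using `std::process::Command` could assemble the same flags shown above. `scan_command` and `arg_strings` are hypothetical helpers, not part of PolyDup. The sketch only prints the argument list, so it runs even when `polydup` is not installed.

```rust
use std::process::Command;

/// Hypothetical helper that assembles a `polydup scan` call from the documented flags.
fn scan_command(paths: &[&str], threshold: f64, min_block_size: u32) -> Command {
    let mut cmd = Command::new("polydup");
    cmd.arg("scan")
        .args(paths)
        .arg("--threshold")
        .arg(threshold.to_string())
        .arg("--min-block-size")
        .arg(min_block_size.to_string())
        .arg("--format")
        .arg("json");
    cmd
}

/// Collects the arguments (without the program name) as plain strings.
fn arg_strings(cmd: &Command) -> Vec<String> {
    cmd.get_args().map(|a| a.to_string_lossy().into_owned()).collect()
}

fn main() {
    let cmd = scan_command(&["./src", "./tests"], 0.85, 10);
    // Only print the invocation; actually running it requires polydup on PATH.
    println!("polydup {}", arg_strings(&cmd).join(" "));
}
```

Calling `cmd.output()` instead would run the scan and capture its JSON on stdout. The JSON schema is not described in this README, so check it against your installed version before parsing.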

Output Formats

  • Text (default): Human-readable colored output
  • JSON: Machine-readable format with full details

Options

  • --threshold: Similarity threshold (0.0-1.0, default: 0.9)
  • --min-block-size: Minimum lines per block (default: 10)
  • --format: Output format (text or json, default: text)
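This README does not spell out how the similarity score compared against --threshold is computed. Because dupe-core lists MinHash among its hashing methods, here is a hedged Rust sketch of one common approach: turn each block into a set of k-token shingles, estimate the Jaccard similarity of two blocks from their MinHash signatures, and flag the pair when the estimate reaches the threshold. All names (`shingles`, `seeded_hash`, `signature`, `estimated_similarity`, `jaccard`, `is_duplicate`, `NUM_HASHES`) are hypothetical, a seeded `DefaultHasher` stands in for a proper hash family, and the `>=` comparison in `is_duplicate` is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Number of seeded hash functions per signature (illustrative value).
const NUM_HASHES: u64 = 128;

/// Set of all runs of `k` consecutive tokens (k must be greater than 0).
fn shingles<'a>(tokens: &[&'a str], k: usize) -> HashSet<Vec<&'a str>> {
    tokens.windows(k).map(|w| w.to_vec()).collect()
}

/// Hashes one shingle under a given seed; each seed acts as a separate hash function.
fn seeded_hash(seed: u64, shingle: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    shingle.hash(&mut hasher);
    hasher.finish()
}

/// MinHash signature: for each seed, the smallest hash over all shingles.
fn signature(set: &HashSet<Vec<&str>>) -> Vec<u64> {
    (0..NUM_HASHES)
        .map(|seed| set.iter().map(|s| seeded_hash(seed, s)).min().unwrap_or(u64::MAX))
        .collect()
}

/// The fraction of matching signature slots estimates the Jaccard similarity.
fn estimated_similarity(a: &[u64], b: &[u64]) -> f64 {
    let matches = a.iter().zip(b).filter(|(x, y)| x == y).count();
    matches as f64 / a.len() as f64
}

/// Exact Jaccard similarity |A ∩ B| / |A ∪ B|, useful for checking the estimate.
fn jaccard(a: &HashSet<Vec<&str>>, b: &HashSet<Vec<&str>>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 1.0; // two empty sets are treated as identical
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Assumed comparison rule: report a duplicate when the score reaches the threshold.
fn is_duplicate(similarity: f64, threshold: f64) -> bool {
    similarity >= threshold
}

fn main() {
    let a: Vec<&str> = "fn total ( xs ) { let s = 0 ; for x in xs { s += x ; } s }"
        .split_whitespace()
        .collect();
    let b: Vec<&str> = "fn total ( ys ) { let s = 0 ; for y in ys { s += y ; } s }"
        .split_whitespace()
        .collect();
    let (sa, sb) = (shingles(&a, 3), shingles(&b, 3));
    let estimate = estimated_similarity(&signature(&sa), &signature(&sb));
    println!("exact = {:.2}, estimate = {:.2}", jaccard(&sa, &sb), estimate);
    println!("duplicate at the 0.9 default? {}", is_duplicate(estimate, 0.9));
}
```

The shingle size of 3 is an arbitrary choice. For this pair the exact Jaccard score is 10/32 (about 0.31), far below the 0.9 default, even though the two functions differ only in identifier names.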

License

MIT OR Apache-2.0