# PolyDup
Cross-language duplicate code detector powered by Tree-sitter and Rust.
## Architecture
**Shared Core Architecture**: The heavy lifting is done in a Rust core, which is exposed to other languages through FFI bindings.
- **dupe-core**: Pure Rust library with Tree-sitter parsing, hashing (Rabin-Karp/MinHash), and reporting
- **dupe-cli**: Standalone Rust CLI tool
- **dupe-node**: Node.js native addon via napi-rs
- **dupe-py**: Python extension module via PyO3
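
To make the hashing step in `dupe-core` more concrete, here is a minimal Python sketch of Rabin-Karp-style rolling hashes over fixed-size token windows. It illustrates the general technique only and is not PolyDup's actual Rust implementation: the function names, the `BASE` and `MOD` constants, and the whitespace tokenizer are all assumptions made for this example (the real tool parses with Tree-sitter).

```python
from collections import defaultdict

# Illustrative constants (assumptions, not taken from PolyDup).
BASE = 257
MOD = (1 << 61) - 1  # large Mersenne prime to keep collisions rare


def rolling_hashes(tokens, window):
    """Yield (start_index, hash) for every window of `window` tokens."""
    if window <= 0 or len(tokens) < window:
        return
    # Map each token to an integer; hash() is stable within one process.
    values = [hash(tok) % MOD for tok in tokens]
    high = pow(BASE, window - 1, MOD)  # weight of the token leaving the window
    h = 0
    for v in values[:window]:
        h = (h * BASE + v) % MOD
    yield 0, h
    for i in range(window, len(values)):
        # Drop the oldest token, shift, and append the newest one.
        h = ((h - values[i - window] * high) * BASE + values[i]) % MOD
        yield i - window + 1, h


def find_repeated_windows(tokens, window):
    """Group window start positions whose token sequences are identical."""
    buckets = defaultdict(list)
    for start, h in rolling_hashes(tokens, window):
        buckets[h].append(start)
    groups = []
    for starts in buckets.values():
        if len(starts) < 2:
            continue
        # Confirm matches token by token, since equal hashes can collide.
        by_content = defaultdict(list)
        for s in starts:
            by_content[tuple(tokens[s:s + window])].append(s)
        groups.extend(g for g in by_content.values() if len(g) > 1)
    return sorted(groups)


# Hypothetical input: a whitespace-split snippet standing in for parsed tokens.
tokens = "a = b + c ; x = 1 ; a = b + c ;".split()
print(find_repeated_windows(tokens, 6))  # [[0, 10]]
```

Each window's hash is derived from the previous one in constant time, which is what makes it affordable to hash every window of a large file.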
## Installation
### Rust CLI (Recommended)
Install the CLI from crates.io. The crate is published as `dupe-cli`, and the installed binary is named `polydup`:
```bash
cargo install dupe-cli
polydup scan ./src
```
Or download pre-built binaries from [GitHub Releases](https://github.com/wiesnerbernard/polydup/releases).
### Node.js/npm
Install as a project dependency:
```bash
npm install @polydup/core
```
Or globally:
```bash
npm install -g @polydup/core
```
Usage in your project:
```javascript
const { findDuplicates } = require('@polydup/core');
// Arguments, by analogy with the Python example: paths, minimum block size, similarity threshold
const duplicates = findDuplicates(['src/', 'tests/'], 10, 0.85);
console.log(duplicates);
```
### Python/pip
Install from PyPI:
```bash
pip install polydup
```
Usage in your project:
```python
import polydup
duplicates = polydup.find_duplicates(
    paths=['src/', 'tests/'],
    min_block_size=10,
    similarity_threshold=0.85,
)
```
## Building from Source
### CLI
```bash
cargo build --release -p dupe-cli
./target/release/polydup scan ./src
```
### Node.js
```bash
cd crates/dupe-node
npm install
npm run build
```
### Python
```bash
cd crates/dupe-py
maturin develop
python -c "import polydup; print(polydup.version())"
```
## CLI Usage
Scan directories for duplicate code:
```bash
# Basic usage
polydup scan ./src
# Custom threshold and output format
polydup scan ./src ./tests --threshold 0.85 --format json
# Adjust block size for granularity
polydup scan ./src --min-block-size 50
```
### Output Formats
- **Text** (default): Human-readable colored output
- **JSON**: Machine-readable format with full details
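
Because the JSON format is machine-readable, a scan can be driven from a script. The sketch below wraps the CLI with Python's standard `subprocess` and `json` modules, using only the flags shown in this README. The helper names `build_scan_command` and `run_scan_json` are hypothetical, and the sketch makes no assumption about the structure of the JSON report or about the meaning of the exit code, since neither is documented here.

```python
import json
import subprocess


def build_scan_command(paths, threshold=None, min_block_size=None, output_format=None):
    """Build the argv list for `polydup scan` from the documented flags."""
    cmd = ["polydup", "scan", *paths]
    if threshold is not None:
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0.0 and 1.0")
        cmd += ["--threshold", str(threshold)]
    if min_block_size is not None:
        cmd += ["--min-block-size", str(min_block_size)]
    if output_format is not None:
        if output_format not in ("text", "json"):
            raise ValueError("output_format must be 'text' or 'json'")
        cmd += ["--format", output_format]
    return cmd


def run_scan_json(paths, **options):
    """Run a scan with --format json and return (exit_code, parsed_report).

    Requires `polydup` on PATH. The exit code is returned rather than
    treated as an error because its semantics are not documented here.
    """
    cmd = build_scan_command(paths, output_format="json", **options)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, json.loads(result.stdout)


# Show the command that would be executed, without running it.
print(build_scan_command(["./src", "./tests"], threshold=0.85, output_format="json"))
```

Keeping command construction separate from execution makes the wrapper easy to test on machines where `polydup` is not installed.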
### Options
- `--threshold`: Similarity threshold (0.0-1.0, default: 0.9)
- `--min-block-size`: Minimum lines per block (default: 10)
- `--format`: Output format (`text` or `json`, default: `text`)
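
To give a feel for what a value of `--threshold` means, the following Python sketch scores two token sequences with Jaccard similarity over token shingles and compares the score to a threshold. This README does not define the similarity metric PolyDup uses; Jaccard is assumed here only because it is the quantity MinHash is designed to estimate. The function names and the shingle size of 3 are illustrative.

```python
def shingles(tokens, size):
    """Return the set of all contiguous token runs of length `size`."""
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_duplicate(tokens_a, tokens_b, threshold=0.9, shingle_size=3):
    """Return (meets_threshold, similarity) under the assumed Jaccard metric."""
    sim = jaccard(shingles(tokens_a, shingle_size), shingles(tokens_b, shingle_size))
    return sim >= threshold, sim


# Two hypothetical statements that differ only in their last token.
left = "total = price * qty + tax".split()
right = "total = price * qty + fee".split()

# 4 of 6 distinct shingles are shared, so the similarity is about 0.67.
print(is_duplicate(left, right, threshold=0.9))  # below the default threshold
print(is_duplicate(left, right, threshold=0.6))  # accepted at a looser threshold
```

Under this assumption, lowering the threshold admits near-duplicates with small edits, while values close to 1.0 report only almost identical blocks.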
## License
MIT OR Apache-2.0