dupe-core 0.1.0

Cross-language duplicate code detection library using Tree-sitter and Rabin-Karp
Documentation
# PolyDup

Cross-language duplicate code detector powered by Tree-sitter and Rust.

## Architecture

**Shared Core Architecture**: Heavy lifting done in Rust, exposed via FFI bindings.

- **dupe-core**: Pure Rust library with Tree-sitter parsing, hashing (Rabin-Karp/MinHash), and reporting
- **dupe-cli**: Standalone Rust CLI tool
- **dupe-node**: Node.js native addon via napi-rs
- **dupe-py**: Python extension module via PyO3

## Installation

### Rust CLI (Recommended)

Install the CLI tool from crates.io:

```bash
cargo install dupe-cli
polydup scan ./src
```

Or download pre-built binaries from [GitHub Releases](https://github.com/wiesnerbernard/polydup/releases).

### Node.js/npm

Install as a project dependency:

```bash
npm install @polydup/core
```

Or globally:

```bash
npm install -g @polydup/core
```

Usage in your project:

```javascript
const { findDuplicates } = require('@polydup/core');

const duplicates = findDuplicates(['src/', 'tests/'], 10, 0.85);
console.log(duplicates);
```

### Python/pip

Install from PyPI:

```bash
pip install polydup
```

Usage in your project:

```python
import polydup

duplicates = polydup.find_duplicates(
    paths=['src/', 'tests/'],
    min_block_size=10,
    similarity_threshold=0.85
)
```

## Building from Source

### CLI
```bash
cargo build --release -p dupe-cli
./target/release/polydup scan ./src
```

### Node.js
```bash
cd crates/dupe-node
npm install
npm run build
```

### Python
```bash
cd crates/dupe-py
maturin develop
python -c "import polydup; print(polydup.version())"
```

## CLI Usage

Scan directories for duplicate code:

```bash
# Basic usage
polydup scan ./src

# Custom threshold and output format
polydup scan ./src ./tests --threshold 0.85 --format json

# Adjust block size for granularity
polydup scan ./src --min-block-size 50
```

### Output Formats

- **Text** (default): Human-readable colored output
- **JSON**: Machine-readable format with full details

### Options

- `--threshold`: Similarity threshold (0.0-1.0, default: 0.9)
- `--min-block-size`: Minimum lines per block (default: 10)
- `--format`: Output format (text or json)

## License

MIT OR Apache-2.0