PolyDup Core - Cross-language duplicate code detection engine
This library provides the core functionality for detecting duplicate code across Node.js, Python, and Rust codebases using Tree-sitter parsing, Rabin-Karp/MinHash algorithms, and parallel processing.
Structs§
- Baseline - Baseline snapshot for comparing duplicate detection across runs
- CacheStats - Cache statistics for reporting
- CloneMatch - Represents a detected duplicate code block
- CodeLocation - Location of a code block in the codebase
- Directive - Represents a directive found in source code
- DuplicateMatch - Represents a detected duplicate code fragment
- FileCacheMetadata - Metadata about a cached file
- FileDirectives - Directive detection result for a single file
- FileRange - Represents a range within a file (e.g., “src/main.rs:10-25”)
- FunctionNode - Represents a parsed function node from source code
- HashCache - The complete hash cache for a codebase
- IgnoreEntry - A single ignore entry representing an acceptable duplicate
- IgnoreManager - Manages loading, saving, and querying ignore entries
- Report - Report containing scan results
- RollingHash - Rabin-Karp rolling hash for efficient substring comparison
- ScanConfig - Configuration used for scanning
- ScanStats - Statistics from the scanning process
- Scanner - Main scanner for detecting duplicates
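The Rabin-Karp idea behind a rolling hash can be sketched as follows. This is a minimal illustration, not the crate's RollingHash: the function name rolling_hashes, the constants BASE and MODULUS, and the use of plain u64 token IDs are all assumptions made for the example.

```rust
// Hypothetical constants for the sketch; the crate may use different values.
const BASE: u64 = 257;
const MODULUS: u64 = 1_000_000_007;

/// Returns one polynomial hash per window of `window` consecutive token IDs.
/// Each hash after the first is derived from the previous one in O(1).
fn rolling_hashes(tokens: &[u64], window: usize) -> Vec<u64> {
    if window == 0 || tokens.len() < window {
        return Vec::new();
    }

    // BASE^(window - 1) mod MODULUS: the weight of the token leaving the window.
    let mut high = 1u64;
    for _ in 1..window {
        high = high * BASE % MODULUS;
    }

    // Hash the first window from scratch.
    let mut hash = 0u64;
    for &t in &tokens[..window] {
        hash = (hash * BASE + t % MODULUS) % MODULUS;
    }

    let mut out = Vec::with_capacity(tokens.len() - window + 1);
    out.push(hash);

    // Slide the window: drop the outgoing token, shift, add the incoming token.
    for i in window..tokens.len() {
        let outgoing = tokens[i - window] % MODULUS * high % MODULUS;
        hash = (hash + MODULUS - outgoing) % MODULUS;
        hash = (hash * BASE + tokens[i] % MODULUS) % MODULUS;
        out.push(hash);
    }
    out
}
```

Two windows with equal hashes are only candidate duplicates, because distinct windows can collide. A detector built this way would typically compare the underlying tokens before reporting a match.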
Enums§
- CloneType - Clone type classification
- PolyDupError
- Token - Normalized token representation
Functions§
- compute_duplicate_id - Compute a content-based ID for a duplicate
- compute_rolling_hashes - Computes rolling hashes for a token stream
- compute_symmetric_duplicate_id - Compute a symmetric ID for a pair of token windows
- compute_token_edit_distance - Computes Levenshtein edit distance between two token sequences
- compute_token_similarity - Computes token-level similarity between two token sequences using edit distance
- compute_window_hash - Computes hash for a specific token window
- detect_directives - Detects polydup-ignore directives in source code
- detect_directives_in_file - Detects directives in a file
- detect_duplicates_with_extension - Detects duplicates using rolling hash with greedy extension
- detect_type3_clones - Detects Type-3 clones (gap-tolerant) between two token sequences
- extend_match - Extends a token match greedily beyond the initial window size
- extract_functions - Extracts all function definitions from the given source code
- extract_javascript_functions - Convenience function to extract functions from JavaScript code
- extract_python_functions - Convenience function to extract functions from Python code
- extract_rust_functions - Convenience function to extract functions from Rust code
- find_duplicates - Public API: Find duplicates in the given file paths
- find_duplicates_with_config - Public API with custom configuration
- normalize - Normalizes source code into a token stream for duplicate detection
- normalize_with_line_numbers - Normalizes source code into tokens while tracking line offsets
- verify_cross_window_match - Verifies that two token windows from different slices are exactly identical
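The edit-distance and similarity entries above describe a standard technique, sketched below. This is not the crate's code: the names token_edit_distance and token_similarity, the generic token type, and the normalization 1 - distance / max(len) are assumptions for illustration, and the crate's own signatures and scoring formula may differ.

```rust
/// Levenshtein distance between two token sequences, using a two-row
/// dynamic-programming table (O(len(a) * len(b)) time, O(len(b)) space).
fn token_edit_distance<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    // prev[j] = distance between the first i tokens of `a` and first j of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];

    for (i, x) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, y) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(x != y);
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr[j + 1] = substitute.min(delete).min(insert);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Similarity in [0.0, 1.0], assuming the normalization
/// 1 - distance / max(len(a), len(b)). Two empty sequences count as identical.
fn token_similarity<T: PartialEq>(a: &[T], b: &[T]) -> f64 {
    let longest = a.len().max(b.len());
    if longest == 0 {
        return 1.0;
    }
    1.0 - token_edit_distance(a, b) as f64 / longest as f64
}
```

Under this normalization, two four-token sequences that differ in a single token score 0.75. A gap-tolerant (Type-3) detector could then accept pairs whose score meets a configured threshold.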