
Crate dupe_core

PolyDup Core - Cross-language duplicate code detection engine

This library provides the core functionality for detecting duplicate code across Node.js, Python, and Rust codebases using Tree-sitter parsing, Rabin-Karp/MinHash algorithms, and parallel processing.
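The summary above names MinHash as one of the detection algorithms. As an illustrative, self-contained sketch of the idea only (the seeded hash family, signature size, and string tokens below are assumptions, not the crate's actual implementation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash a token together with a seed, giving one independent hash family per seed.
fn seeded_hash(token: &str, seed: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    token.hash(&mut h);
    h.finish()
}

// MinHash signature: for each seed, keep the minimum hash over the token set.
fn minhash_signature(tokens: &[&str], num_hashes: u64) -> Vec<u64> {
    (0..num_hashes)
        .map(|seed| {
            tokens
                .iter()
                .map(|t| seeded_hash(t, seed))
                .min()
                .unwrap_or(u64::MAX)
        })
        .collect()
}

// Estimated Jaccard similarity: fraction of signature slots that agree.
fn estimated_similarity(a: &[u64], b: &[u64]) -> f64 {
    let matches = a.iter().zip(b).filter(|(x, y)| x == y).count();
    matches as f64 / a.len() as f64
}

fn main() {
    // Two token streams that share most of their tokens.
    let f1 = ["fn", "ident", "(", "ident", ")", "{", "return", "ident", "}"];
    let f2 = ["fn", "ident", "(", "ident", ")", "{", "ident", "+=", "ident", "}"];
    let sim = estimated_similarity(&minhash_signature(&f1, 64), &minhash_signature(&f2, 64));
    assert!(sim > 0.0 && sim <= 1.0);
    println!("estimated similarity: {sim:.2}");
}
```

Comparing fixed-size signatures instead of full token sets is what makes all-pairs candidate filtering cheap before any exact verification runs.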

Structs§

Baseline
Baseline snapshot for comparing duplicate detection across runs
CacheStats
Cache statistics for reporting
CloneMatch
Represents a detected duplicate code block
CodeLocation
Location of a code block in the codebase
Directive
Represents a directive found in source code
DuplicateMatch
Represents a detected duplicate code fragment
FileCacheMetadata
Metadata about a cached file
FileDirectives
Directive detection result for a single file
FileRange
Represents a range within a file (e.g., “src/main.rs:10-25”)
FunctionNode
Represents a parsed function node from source code
HashCache
The complete hash cache for a codebase
IgnoreEntry
A single ignore entry representing an acceptable duplicate
IgnoreManager
Manages loading, saving, and querying ignore entries
LanguageInfo
Information about a supported programming language
Report
Report containing scan results
RollingHash
Rabin-Karp rolling hash for efficient substring comparison
ScanConfig
Configuration used for scanning
ScanStats
Statistics from the scanning process
Scanner
Main scanner for detecting duplicates
SkippedFile
A file that was skipped during scanning
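`RollingHash` is documented as a Rabin-Karp rolling hash. A minimal sketch of that technique, not the crate's code — the base constant, wrapping arithmetic, and integer token IDs are all assumptions:

```rust
// Rabin-Karp rolling hash over a token stream: hash a window of k tokens,
// then slide the window one token at a time in O(1) per step.
const BASE: u64 = 1_000_003;

fn roll(tokens: &[u64], k: usize) -> Vec<u64> {
    let mut hashes = Vec::new();
    if tokens.len() < k {
        return hashes;
    }
    // BASE^(k-1), the weight of the outgoing token (wrapping stands in for mod 2^64).
    let high = (0..k - 1).fold(1u64, |acc, _| acc.wrapping_mul(BASE));
    // Hash of the first window, computed directly.
    let mut h = tokens[..k]
        .iter()
        .fold(0u64, |acc, &t| acc.wrapping_mul(BASE).wrapping_add(t));
    hashes.push(h);
    for i in k..tokens.len() {
        // Remove the outgoing token, shift, add the incoming token.
        h = h
            .wrapping_sub(tokens[i - k].wrapping_mul(high))
            .wrapping_mul(BASE)
            .wrapping_add(tokens[i]);
        hashes.push(h);
    }
    hashes
}

fn main() {
    // Token IDs for two streams sharing the window [3, 4, 5].
    let a = [1, 2, 3, 4, 5];
    let b = [9, 3, 4, 5, 8];
    // Equal windows produce equal hashes; candidates still need exact verification.
    assert_eq!(roll(&a, 3)[2], roll(&b, 3)[1]);
    println!("window hashes: {:?}", roll(&a, 3));
}
```

Because hash collisions are possible, matches found this way are only candidates, which is presumably why the crate also exposes an exact check like `verify_cross_window_match`.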

Enums§

CloneType
Clone type classification
LanguageStatus
Support status for a language
PolyDupError
Errors that can occur during duplicate detection
Token
Normalized token representation

Functions§

compute_duplicate_id
Compute a content-based ID for a duplicate
compute_rolling_hashes
Computes rolling hashes for a token stream
compute_symmetric_duplicate_id
Compute a symmetric ID for a pair of token windows
compute_token_edit_distance
Computes Levenshtein edit distance between two token sequences
compute_token_similarity
Computes token-level similarity between two token sequences using edit distance
compute_window_hash
Computes hash for a specific token window
detect_directives
Detects polydup-ignore directives in source code
detect_directives_in_file
Detects directives in a file
detect_duplicates_with_extension
Detects duplicates using rolling hash with greedy extension
detect_type3_clones
Detects Type-3 clones (gap-tolerant) between two token sequences
extend_match
Extends a token match greedily beyond the initial window size
extract_functions
Extracts all function definitions from the given source code
extract_javascript_functions
Convenience function to extract functions from JavaScript code
extract_python_functions
Convenience function to extract functions from Python code
extract_rust_functions
Convenience function to extract functions from Rust code
find_duplicates
Public API: Find duplicates in the given file paths
find_duplicates_with_config
Public API with custom configuration
get_supported_languages
Returns information about all supported programming languages
normalize
Normalizes source code into a token stream for duplicate detection
normalize_with_line_numbers
Normalizes source code into tokens while tracking line offsets
verify_cross_window_match
Verifies that two token windows from different slices are exactly identical
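`compute_token_edit_distance` is documented as Levenshtein distance over token sequences. A self-contained sketch of that computation — the `u32` token IDs and the length-normalized similarity formula are assumptions, not the crate's definitions:

```rust
// Levenshtein edit distance between two token-ID sequences, using a
// rolling one-row dynamic-programming table.
fn token_edit_distance(a: &[u32], b: &[u32]) -> usize {
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ta) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &tb) in b.iter().enumerate() {
            let cost = if ta == tb { 0 } else { 1 };
            // Substitution, deletion, or insertion: take the cheapest.
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

// Similarity normalized by the longer sequence: 1.0 means identical.
fn token_similarity(a: &[u32], b: &[u32]) -> f64 {
    let longest = a.len().max(b.len());
    if longest == 0 {
        return 1.0;
    }
    1.0 - token_edit_distance(a, b) as f64 / longest as f64
}

fn main() {
    // One substitution apart: distance 1, similarity 1 - 1/3.
    assert_eq!(token_edit_distance(&[1, 2, 3], &[1, 9, 3]), 1);
    println!("similarity: {:.2}", token_similarity(&[1, 2, 3], &[1, 9, 3]));
}
```

A threshold on this kind of similarity score is one common way to decide whether two near-identical fragments count as Type-3 (gap-tolerant) clones.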

Type Aliases§

Result
Convenience result type used throughout the crate