Crate dupe_core

PolyDup Core - Cross-language duplicate code detection engine

This library provides the core functionality for detecting duplicate code across Node.js, Python, and Rust codebases using Tree-sitter parsing, Rabin-Karp/MinHash algorithms, and parallel processing.
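
This index does not show function signatures, so the usage sketch below is only a guess at the public entry point: it assumes find_duplicates takes a slice of paths and returns Result<Report>, and that the Result alias takes a single type parameter; none of that is confirmed by the listing.

```rust
use std::path::PathBuf;

// Hypothetical usage; the exact signature of `find_duplicates` and the
// shape of `Report` are assumptions, not documented on this page.
fn main() -> dupe_core::Result<()> {
    let paths = vec![PathBuf::from("src/"), PathBuf::from("scripts/")];

    // Assumed shape: find_duplicates(&[PathBuf]) -> Result<Report>
    let report = dupe_core::find_duplicates(&paths)?;

    // `Report` is documented above only as "Report containing scan results".
    println!("{report:?}"); // assumes `Report` implements `Debug`
    Ok(())
}
```

find_duplicates_with_config presumably takes a ScanConfig in addition to the paths, since it is described below as the same public API with custom configuration.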

Structs§

Baseline
Baseline snapshot for comparing duplicate detection across runs
CacheStats
Cache statistics for reporting
CloneMatch
Represents a detected duplicate code block
CodeLocation
Location of a code block in the codebase
Directive
Represents a directive found in source code
DuplicateMatch
Represents a detected duplicate code fragment
FileCacheMetadata
Metadata about a cached file
FileDirectives
Directive detection result for a single file
FileRange
Represents a range within a file (e.g., “src/main.rs:10-25”)
FunctionNode
Represents a parsed function node from source code
HashCache
The complete hash cache for a codebase
IgnoreEntry
A single ignore entry representing an acceptable duplicate
IgnoreManager
Manages loading, saving, and querying ignore entries
Report
Report containing scan results
RollingHash
Rabin-Karp rolling hash for efficient substring comparison (see the sketch after this list)
ScanConfig
Configuration used for scanning
ScanStats
Statistics from the scanning process
Scanner
Main scanner for detecting duplicates
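
RollingHash is described above as a Rabin-Karp rolling hash; its fields and constants are not shown here, so the sketch below is a generic, self-contained illustration of the technique rather than the crate's implementation. The hypothetical window_hashes helper mirrors what compute_rolling_hashes and compute_window_hash in the functions list appear to do.

```rust
/// Minimal Rabin-Karp rolling hash over a fixed-size window of token hashes.
/// Generic illustration only; the constants and layout are not dupe_core's.
struct Rolling {
    base: u64,
    window: usize,
    base_pow: u64, // base^(window - 1), used to drop the oldest element
    hash: u64,
}

impl Rolling {
    fn new(window: usize) -> Self {
        let base: u64 = 1_000_003;
        let base_pow = (0..window.saturating_sub(1)).fold(1u64, |acc, _| acc.wrapping_mul(base));
        Rolling { base, window, base_pow, hash: 0 }
    }

    /// Hash the first `window` elements from scratch.
    fn init(&mut self, first: &[u64]) {
        assert_eq!(first.len(), self.window);
        self.hash = first
            .iter()
            .fold(0u64, |h, &t| h.wrapping_mul(self.base).wrapping_add(t));
    }

    /// Slide the window one position in O(1): drop `oldest`, append `next`.
    fn roll(&mut self, oldest: u64, next: u64) {
        self.hash = self
            .hash
            .wrapping_sub(oldest.wrapping_mul(self.base_pow))
            .wrapping_mul(self.base)
            .wrapping_add(next);
    }
}

/// Hash every window of `tokens`; analogous in spirit to `compute_rolling_hashes`.
fn window_hashes(tokens: &[u64], window: usize) -> Vec<u64> {
    if window == 0 || tokens.len() < window {
        return Vec::new();
    }
    let mut r = Rolling::new(window);
    r.init(&tokens[..window]);
    let mut out = vec![r.hash];
    for i in window..tokens.len() {
        r.roll(tokens[i - window], tokens[i]);
        out.push(r.hash);
    }
    out
}
```

Equal windows hash equally, so candidate duplicates can be found by bucketing window hashes before any exact comparison, which is what verify_cross_window_match in the functions list suggests the crate does afterward.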

Enums§

CloneType
Clone type classification
PolyDupError
Error type for duplicate-detection operations
Token
Normalized token representation
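
Token is described only as a normalized token representation and its variants are not listed, so the sketch below uses made-up variant names to show the usual normalization idea behind clone detection: identifiers and literals are collapsed so that renamed (Type-2) clones produce identical token streams. A toy whitespace tokenizer stands in for the crate's Tree-sitter-based normalize.

```rust
/// Illustrative only: the real `Token` enum's variants are not shown above.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Tok {
    Keyword(&'static str), // keywords stay distinct
    Ident,                 // every identifier collapses to one token
    Literal,               // every literal collapses to one token
    Punct(char),           // structural punctuation is kept as-is
}

/// Toy normalizer over a pre-split word stream; a real normalizer would walk
/// a Tree-sitter parse tree instead of splitting on whitespace.
fn normalize_words<'a>(words: impl Iterator<Item = &'a str>) -> Vec<Tok> {
    const KEYWORDS: &[&str] = &["fn", "def", "function", "return", "if", "else"];
    words
        .map(|w| {
            if let Some(&k) = KEYWORDS.iter().find(|&&k| k == w) {
                Tok::Keyword(k)
            } else if w.starts_with(|c: char| c.is_ascii_digit() || c == '"') {
                Tok::Literal
            } else if w.len() == 1 && !w.chars().next().unwrap().is_alphanumeric() {
                Tok::Punct(w.chars().next().unwrap())
            } else {
                Tok::Ident
            }
        })
        .collect()
}
```

With this toy normalizer, `fn add ( a , b ) { return a + b ; }` and `fn sum ( x , y ) { return x + y ; }` normalize to the same token stream, which is exactly the property the rolling-hash stage relies on.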

Functions§

compute_duplicate_id
Computes a content-based ID for a duplicate
compute_rolling_hashes
Computes rolling hashes for a token stream
compute_symmetric_duplicate_id
Computes a symmetric ID for a pair of token windows
compute_token_edit_distance
Computes Levenshtein edit distance between two token sequences
compute_token_similarity
Computes token-level similarity between two token sequences using edit distance (see the sketch after this list)
compute_window_hash
Computes hash for a specific token window
detect_directives
Detects polydup-ignore directives in source code
detect_directives_in_file
Detects directives in a file
detect_duplicates_with_extension
Detects duplicates using rolling hash with greedy extension
detect_type3_clones
Detects Type-3 clones (gap-tolerant) between two token sequences
extend_match
Extends a token match greedily beyond the initial window size (see the sketch after this list)
extract_functions
Extracts all function definitions from the given source code
extract_javascript_functions
Convenience function to extract functions from JavaScript code
extract_python_functions
Convenience function to extract functions from Python code
extract_rust_functions
Convenience function to extract functions from Rust code
find_duplicates
Public API: Find duplicates in the given file paths
find_duplicates_with_config
Public API with custom configuration
normalize
Normalizes source code into a token stream for duplicate detection
normalize_with_line_numbers
Normalizes source code into tokens while tracking line offsets
verify_cross_window_match
Verifies that two token windows from different slices are exactly identical
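
compute_token_edit_distance and compute_token_similarity are described above as Levenshtein-based. The sketch below is the standard dynamic-programming formulation over token IDs; the similarity scaling, 1 - distance / max(len), is one common convention and only an assumption about the crate's exact formula.

```rust
/// Classic Levenshtein distance over two token sequences: O(n*m) time, O(m) space.
fn token_edit_distance(a: &[u64], b: &[u64]) -> usize {
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for (i, &ta) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, &tb) in b.iter().enumerate() {
            let cost = if ta == tb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitute (or match when cost == 0)
                .min(prev[j + 1] + 1)      // delete from `a`
                .min(curr[j] + 1);         // insert from `b`
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Similarity in [0, 1]; the exact scaling used by the crate is assumed here.
fn token_similarity(a: &[u64], b: &[u64]) -> f64 {
    let max_len = a.len().max(b.len());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - token_edit_distance(a, b) as f64 / max_len as f64
}
```

A near-miss clone with a few inserted or substituted tokens scores just below 1.0, which is presumably how the gap-tolerant detect_type3_clones decides whether a candidate pair is close enough.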
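
detect_duplicates_with_extension and extend_match describe growing a seed window match greedily. The sketch below shows the core idea with plain token slices; the crate's actual parameter and return types are not shown above, and whether the real extend_match also extends backward is not stated.

```rust
/// Grow a window-sized seed match while the surrounding tokens keep agreeing.
/// Returns the (possibly moved) start positions and the extended length.
fn extend_match_greedy(
    a: &[u64],
    b: &[u64],
    mut start_a: usize,
    mut start_b: usize,
    window: usize,
) -> (usize, usize, usize) {
    let mut len = window;
    // Extend forward past the seed window.
    while start_a + len < a.len()
        && start_b + len < b.len()
        && a[start_a + len] == b[start_b + len]
    {
        len += 1;
    }
    // Extend backward before the seed window.
    while start_a > 0 && start_b > 0 && a[start_a - 1] == b[start_b - 1] {
        start_a -= 1;
        start_b -= 1;
        len += 1;
    }
    (start_a, start_b, len)
}
```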

Type Aliases§

Result
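
Result carries no description on this page. A conventional definition for such an alias, assumed here rather than taken from the crate source, is:

```rust
// Assumed, not shown on this page: a crate-wide alias over PolyDupError.
pub type Result<T> = std::result::Result<T, PolyDupError>;
```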