py-canon — the Python Frontend for find-dup-defs, over Ruff’s
native parser (modern syntax: PEP 695 / 701).
Python walks each file once and lowers every top-level function, class, constant, and
type alias, plus every class method, to a Def. It computes each Def's canonical strings
directly from the AST node: a CPython ast.dump-shaped, name-preserving, docstring-stripped
structural canonical (the representation that difflib-fast clusters). ast_canonical and
analyze_functions expose that canonicalization over a source string for tooling and
golden checks; LineMap and AnalyzedFn are the supporting source-location and analysis
types.
Structs
- LineMap: Precomputed line-start offsets for one source string (starts[i] = byte offset of line i).
- Python: Python frontend over Ruff’s parser.
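
The line-start table described for LineMap can be illustrated with a minimal, self-contained sketch. The type name LineStarts and the method line_col are hypothetical and are not py-canon's API; the sketch only assumes what the description states, namely that starts[i] holds the byte offset at which line i begins.

```rust
/// Hypothetical stand-in for a line map (not py-canon's actual type):
/// `starts[i]` is the byte offset at which line `i` begins.
pub struct LineStarts {
    pub starts: Vec<usize>,
}

impl LineStarts {
    /// Scan the source once and record the offset that follows every '\n'.
    pub fn new(src: &str) -> Self {
        let mut starts = vec![0];
        for (i, b) in src.bytes().enumerate() {
            if b == b'\n' {
                starts.push(i + 1);
            }
        }
        LineStarts { starts }
    }

    /// Map a byte offset to a 0-based (line, byte column) pair by binary search.
    pub fn line_col(&self, offset: usize) -> (usize, usize) {
        // `partition_point` counts the line starts that are <= offset;
        // the containing line is the last of those. `starts` always holds 0,
        // so the count is at least 1.
        let line = self.starts.partition_point(|&s| s <= offset) - 1;
        (line, offset - self.starts[line])
    }
}
```

Building the table costs one pass over the source, after which each offset-to-line lookup is a binary search rather than a rescan of the text.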
Statics
- CLASSES: Body-bearing nominal types (class / struct / enum / union).
- CONSTANTS: UPPER_SNAKE module/namespace constants (const / static).
- FUNCTIONS: Top-level functions (def f, fn f, function f).
- METHODS: Methods, qualified Type.method / Type::method.
- TYPE_ALIASES: type X = … aliases (note the space in noun_plural, distinct from the hyphenated id).
Functions
- analyze_functions: Batch analyze_one in parallel: one parse per function, all dup-defs canonical forms at once.
- ast_canonical: CPython-ast.dump-shaped canonical of the leading def in text (names preserved, docstrings stripped), or the raw text if it does not parse / has no statements. Single-text entry point.
- ast_canonical_many: Batch canonicalize def texts (functions / classes / …) in parallel; replaces the Python ast_canonical loop. Returns one canonical string per input, in order.
- normalize_functions: Batch alpha-rename canonicalize function texts in parallel; replaces the Python _analyze (cross-name + Type-3 canonicalization). None entries are non-function texts.
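
To make the difference between the name-preserving canonical and the alpha-renamed one concrete, here is a toy sketch of alpha-renaming. It is not how normalize_functions is implemented: py-canon works on Ruff's AST, whereas this hypothetical alpha_rename helper operates on whitespace-separated tokens with a tiny, assumed keyword list. It also assumes the function name is renamed along with locals, which is one reading of "cross-name".

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of py-canon): rename identifiers to
/// v0, v1, ... in order of first appearance. Tokens are whitespace-separated;
/// keywords, numbers, and punctuation pass through unchanged.
pub fn alpha_rename(tokens: &str) -> String {
    // Assumed, deliberately tiny keyword list for the illustration.
    const KEYWORDS: [&str; 4] = ["def", "return", "if", "else"];
    let mut names: HashMap<&str, usize> = HashMap::new();
    let mut out: Vec<String> = Vec::new();
    for tok in tokens.split_whitespace() {
        let starts_like_ident = tok
            .chars()
            .next()
            .map_or(false, |c| c.is_ascii_alphabetic() || c == '_');
        if starts_like_ident && !KEYWORDS.contains(&tok) {
            // A name seen for the first time gets the next free index;
            // a repeated name reuses the index it was given earlier.
            let next = names.len();
            let id = *names.entry(tok).or_insert(next);
            out.push(format!("v{}", id));
        } else {
            out.push(tok.to_string());
        }
    }
    out.join(" ")
}
```

Under this scheme, `def add ( a , b ) : return a + b` and `def sum2 ( x , y ) : return x + y` reduce to the same string, so definitions that differ only in naming compare as equal. A name-preserving canonical would keep them distinct.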
Type Aliases
- AnalyzedFn: (cluster_canonical, xname_canonical, type3_lines, node_count): the analysis tuple the scan reads to build a Def’s cluster canonical + Analysis. py-canon’s own type (was shared via dup-defs-core, now local since the engine consumes Def, not this tuple). The result of analyzing one callable: (cluster-canonical, xname-canonical, type3-lines, size). Identical across the three frontends; they differ only in how they produce the strings.