Crate py_canon


py-canon: the Python Frontend for find-dup-defs, built on Ruff’s native parser, which handles modern syntax (PEP 695 type parameters, PEP 701 f-strings).

Python walks each file once and lowers every top-level function, class, constant, and type alias, plus every class method, to a Def. It computes each Def’s canonical strings directly from the AST node: a structural canonical shaped like CPython’s ast.dump, with names preserved and docstrings stripped (the representation that difflib-fast clusters).

ast_canonical and analyze_functions expose that canonicalization over a source string for tooling and golden checks. LineMap and AnalyzedFn are the supporting source-location and analysis types.
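To make "name-preserving, docstring-stripped" concrete, here is a minimal sketch over a toy AST. The `Node` enum, the `func` helper, and the exact output format are hypothetical illustrations; they are not Ruff’s node types and not necessarily the strings py-canon emits.

```rust
// Toy AST for illustration only (hypothetical; not Ruff's node types).
enum Node {
    FunctionDef { name: String, body: Vec<Node> },
    Expr(Box<Node>),
    Str(String),
    Name(String),
    Return(Box<Node>),
}

/// Render a node in an `ast.dump`-like shape, keeping names but
/// dropping a leading docstring from function bodies.
fn dump(node: &Node) -> String {
    match node {
        Node::FunctionDef { name, body } => {
            // A docstring is a bare string expression in first position.
            let skip = match body.first() {
                Some(Node::Expr(inner)) if matches!(**inner, Node::Str(_)) => 1,
                _ => 0,
            };
            let parts: Vec<String> = body[skip..].iter().map(dump).collect();
            format!("FunctionDef(name='{}', body=[{}])", name, parts.join(", "))
        }
        Node::Expr(v) => format!("Expr(value={})", dump(v)),
        Node::Str(s) => format!("Constant(value='{}')", s),
        Node::Name(id) => format!("Name(id='{}')", id),
        Node::Return(v) => format!("Return(value={})", dump(v)),
    }
}

/// Hypothetical helper: builds `def <name>(): "<doc>"; return x`.
fn func(name: &str, doc: &str) -> Node {
    Node::FunctionDef {
        name: name.to_string(),
        body: vec![
            Node::Expr(Box::new(Node::Str(doc.to_string()))),
            Node::Return(Box::new(Node::Name("x".to_string()))),
        ],
    }
}
```

Under this sketch, two functions that differ only in their docstring produce the same string, while renaming the function changes it, which is the property a name-preserving canonical needs.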

Structs§

LineMap
Precomputed line-start offsets for one source string (starts[i] = byte offset of line i).
Python
Python frontend over Ruff’s parser.
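The line-start table described for LineMap can be sketched as follows. `LineStarts` and `line_of` are hypothetical names used for illustration; the real struct’s fields and methods may differ, and only the `starts[i]` invariant comes from the description above.

```rust
/// Hypothetical stand-in for `LineMap`: starts[i] is the byte offset
/// at which line i begins (lines are 0-based here, by assumption).
struct LineStarts {
    starts: Vec<usize>,
}

impl LineStarts {
    /// Scan the source once and record where each line begins.
    fn new(src: &str) -> Self {
        let mut starts = vec![0];
        for (i, b) in src.bytes().enumerate() {
            if b == b'\n' {
                starts.push(i + 1);
            }
        }
        LineStarts { starts }
    }

    /// Map a byte offset to the index of the line containing it.
    fn line_of(&self, offset: usize) -> usize {
        match self.starts.binary_search(&offset) {
            Ok(i) => i,      // offset is exactly a line start
            Err(i) => i - 1, // otherwise it belongs to the preceding start
        }
    }
}
```

Precomputing the table makes each offset-to-line lookup a binary search rather than a rescan of the source, which matters when many definitions in one file need source locations.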

Statics§

CLASSES
Body-bearing nominal types (class / struct / enum / union).
CONSTANTS
UPPER_SNAKE module/namespace constants (const / static).
FUNCTIONS
Top-level functions (def f, fn f, function f).
METHODS
Methods, qualified Type.method / Type::method.
TYPE_ALIASES
type X = … aliases (note the space in noun_plural, distinct from the hyphenated id).

Functions§

analyze_functions
Runs analyze_one over a batch in parallel: one parse per function, producing all dup-defs canonical forms at once.
ast_canonical
CPython-ast.dump-shaped canonical of the leading def in text (names preserved, docstrings stripped). Returns the raw text if it does not parse or has no statements. Single-text entry point.
ast_canonical_many
Batch canonicalize def texts (functions / classes / …) in parallel — replaces the Python ast_canonical loop. Returns one canonical string per input, in order.
normalize_functions
Alpha-renames and canonicalizes a batch of function texts in parallel; replaces the Python _analyze (cross-name + Type-3 canonicalization). None entries mark non-function texts.
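The batch contract above (alpha-renaming, one result per input in order, None for non-function texts) can be illustrated with a toy version. Everything here is an assumption made for the sketch: `alpha_rename` and `normalize_batch` are hypothetical names, the renaming works on whitespace-separated tokens rather than a parsed AST, and one scoped thread per input stands in for whatever parallelism the crate actually uses.

```rust
use std::collections::HashMap;
use std::thread;

// Tokens never renamed in this toy (assumption: a tiny keyword set).
const KEYWORDS: &[&str] = &["def", "return"];

/// Toy alpha-rename: replace each identifier with v0, v1, ... in order
/// of first appearance. Returns None for texts not starting with `def`.
fn alpha_rename(text: &str) -> Option<String> {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    if tokens.first() != Some(&"def") {
        return None;
    }
    let mut names: HashMap<&str, String> = HashMap::new();
    let renamed: Vec<String> = tokens
        .iter()
        .map(|&t| {
            let starts_like_ident = t
                .chars()
                .next()
                .map_or(false, |c| c.is_alphabetic() || c == '_');
            if starts_like_ident && !KEYWORDS.contains(&t) {
                let next = names.len();
                names.entry(t).or_insert_with(|| format!("v{}", next)).clone()
            } else {
                t.to_string()
            }
        })
        .collect();
    Some(renamed.join(" "))
}

/// Process every text on its own scoped thread. Joining the handles in
/// spawn order keeps the output aligned with the input.
fn normalize_batch(texts: &[&str]) -> Vec<Option<String>> {
    thread::scope(|s| {
        let handles: Vec<_> = texts
            .iter()
            .map(|&t| s.spawn(move || alpha_rename(t)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

In this sketch, `def add ( a , b ) : return a + b` and `def sum2 ( x , y ) : return x + y` normalize to the same string, which is the point of a cross-name canonical: functions that differ only in identifier choice compare equal.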

Type Aliases§

AnalyzedFn
(cluster_canonical, xname_canonical, type3_lines, node_count): the result of analyzing one callable, and the tuple the scan reads to build a Def’s cluster canonical + Analysis. The last element, node_count, is the callable’s size. This is py-canon’s own type; it was previously shared via dup-defs-core and is now local, since the engine consumes Def, not this tuple. The tuple is identical across the three frontends; they differ only in how they produce the strings.