py-canon
The Python frontend for find-dup-defs:
Python source → a CPython ast.dump-shape canonical form plus a top-level definition scan.
Parses with the Ruff Python parser (modern syntax — PEP 695 /
PEP 701). It implements find-dup-defs's Frontend trait:
- Python::scan walks each file once and lowers every module-level definition (function, class, UPPER_CASE constant, type alias) and class method to a Def, with its canonical strings precomputed off the AST node.
- The canonical is a structural form matching CPython's ast.dump shape (docstrings stripped), the input to byte-for-byte Ratcliff–Obershelp similarity.
- ast_canonical / analyze_functions expose it over a source string (used for tooling / golden checks).
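To make the target shape concrete, here is a minimal sketch of the same idea written against CPython's own ast module rather than this crate. It is an illustration under assumptions, not py-canon's implementation: the helper names (strip_docstrings, canonical, scan), the "Class.method" naming, and the choice to skip annotated assignments are all hypothetical.

```python
import ast

# Node kinds whose first statement may be a docstring.
_DOC_OWNERS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Module)
_FUNCS = (ast.FunctionDef, ast.AsyncFunctionDef)
# ast.TypeAlias only exists on Python 3.12+; an empty tuple never matches.
_TYPE_ALIAS = getattr(ast, "TypeAlias", ())


def strip_docstrings(node):
    """Remove leading docstring statements from node and everything nested in it."""
    for sub in ast.walk(node):
        if not isinstance(sub, _DOC_OWNERS) or not sub.body:
            continue
        first = sub.body[0]
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            del sub.body[0]


def canonical(node):
    """ast.dump-shaped string for a node, with docstrings stripped (assumed rule)."""
    strip_docstrings(node)
    return ast.dump(node)  # no line/column attributes by default


def scan(source):
    """Return (name, canonical) pairs for top-level definitions and class methods."""
    out = []
    for node in ast.parse(source).body:
        if isinstance(node, _FUNCS + (ast.ClassDef,)):
            out.append((node.name, canonical(node)))
            if isinstance(node, ast.ClassDef):
                for item in node.body:
                    if isinstance(item, _FUNCS):
                        # "Class.method" naming is a hypothetical convention.
                        out.append((f"{node.name}.{item.name}", canonical(item)))
        elif (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id.isupper()):
            out.append((node.targets[0].id, canonical(node)))
        elif isinstance(node, _TYPE_ALIAS):
            out.append((node.name.id, canonical(node)))
    return out


SAMPLE = '''
MAX_SIZE = 10
helper = 1

def area(w, h):
    """Return the area."""
    return w * h

class Shape:
    """A shape."""
    def scale(self, k):
        return k
'''

defs = scan(SAMPLE)
for name, text in defs:
    print(name, text[:60])
```

Because ast.dump omits line and column attributes by default, two definitions that differ only in position or docstring produce identical strings, which is the property a duplicate finder needs from its canonical form.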
The canonicalization is validated byte-for-byte against a golden corpus produced by CPython's own
ast module (examples/verify_golden.rs).
use Arc;
use Frontend;
use Python;
let files = ;
let defs = Python.scan; // reads the files, returns Defs with canon precomputed
for d in &defs
Reusable on its own; pairs with difflib-fast for the
similarity/clustering step.
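As a rough picture of the similarity step, the sketch below scores ast.dump strings with CPython's difflib.SequenceMatcher, whose documentation describes its algorithm as a relative of Ratcliff–Obershelp matching. That difflib-fast produces the same ratios is an assumption here, as are the helper names canon and similarity and the autojunk=False setting.

```python
import ast
from difflib import SequenceMatcher


def canon(src):
    """ast.dump of the first top-level statement (stand-in for a canonical string)."""
    return ast.dump(ast.parse(src).body[0])


def similarity(p, q):
    """Character-level ratio in [0, 1]; autojunk disabled so long dumps are not pruned."""
    return SequenceMatcher(None, p, q, autojunk=False).ratio()


a = canon("def add(x, y):\n    return x + y\n")
b = canon("def plus(a, b):\n    return a + b\n")       # same structure, renamed
c = canon("def greet(name):\n    print('hi', name)\n")  # different structure

print(f"add vs plus:  {similarity(a, b):.3f}")
print(f"add vs greet: {similarity(a, c):.3f}")
```

Renamed but structurally identical functions score close to 1.0, while structurally different ones score lower; a clustering step would then group definitions whose pairwise ratio exceeds some chosen threshold.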
License
MIT.