Module commented_out_code

commented_out_code — heuristic detector for blocks of commented-out source code (as opposed to prose comments, license headers, doc comments, or ASCII banners).

Targets the “agent left dead code behind” pattern: agents tend to comment-rather-than-delete during iteration, and the leftovers accumulate. Existing primitives can ban specific phrasings but can’t catch the generic “block-of-code-shaped-comments” pattern.

Design doc: docs/design/v0.7/commented_out_code.md.

Heuristic

For each consecutive run of comment lines (at least min_lines long), compute the fraction of non-whitespace characters that are structural punctuation strongly biased toward code:

  strong_chars = ( ) { } [ ] ; = < > & | ^
  raw_density  = count(strong_chars) / non-whitespace-char-count

Backticks and quotes are deliberately excluded — backticks show up constantly in rustdoc / TSDoc prose to delimit code references (`foo` matches `bar`), and double quotes appear in normal English. Including either inflates the score on legitimate prose comments.
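The raw-density count can be sketched as follows. This is a minimal illustration, not the crate's implementation: the names STRONG_CHARS and raw_density are hypothetical, and the sketch assumes the comment markers have already been stripped from the text, which the description above does not specify.

```rust
/// Characters treated as strong code signals, taken from the list above.
/// Backticks and quotes are intentionally absent.
const STRONG_CHARS: &[char] = &[
    '(', ')', '{', '}', '[', ']', ';', '=', '<', '>', '&', '|', '^',
];

/// Fraction of non-whitespace characters that are strong code characters.
/// Hypothetical helper; returns 0.0 when there are no non-whitespace chars.
fn raw_density(text: &str) -> f64 {
    let mut strong = 0usize;
    let mut total = 0usize;
    for c in text.chars().filter(|c| !c.is_whitespace()) {
        total += 1;
        if STRONG_CHARS.contains(&c) {
            strong += 1;
        }
    }
    if total == 0 {
        0.0
    } else {
        strong as f64 / total as f64
    }
}
```

For instance, `raw_density("a=b;")` is 0.5 (two of four characters are strong), while a prose fragment made only of words, backticks, and quotes scores 0.0.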

Then normalise so the user-facing threshold field has a useful midpoint at 0.5:

  density = min(raw_density / 0.20, 1.0)

At raw_density = 0.20 (i.e. one-fifth of non-whitespace chars are strong-code chars), the normalised density is 1.0. Real code blocks comfortably exceed this; English prose is well below it because everyday writing rarely uses brackets, semicolons, or assignment operators.
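A small sketch of the normalisation step, assuming the 0.20 divisor from the formula above. The names SATURATION and normalise are hypothetical and chosen only for illustration.

```rust
/// Raw density at which the normalised score saturates at 1.0
/// (the 0.20 divisor in the formula above).
const SATURATION: f64 = 0.20;

/// Map a raw density onto the 0.0..=1.0 scale used by the threshold field.
/// Hypothetical helper mirroring `min(raw_density / 0.20, 1.0)`.
fn normalise(raw_density: f64) -> f64 {
    (raw_density / SATURATION).min(1.0)
}
```

Under this mapping a raw density of 0.05 normalises to 0.25, 0.10 to 0.5, and anything at or above 0.20 is clamped to 1.0.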

A block whose density is ≥ threshold (default 0.5, which corresponds to raw_density = 0.10) is marked as code-shaped. Doc comments (///, /** */) and the file’s first skip_leading_lines lines (typically a license header) are excluded by construction.
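One way the run collection and exclusions could look is sketched below. It is an assumption-laden illustration rather than the actual rule: it handles only whole-line `//` comments, treats `///` as the sole doc-comment marker, ignores block comments entirely, and the names plain_comment_text and comment_runs are hypothetical.

```rust
/// Returns the text of a plain `//` comment line with the marker stripped,
/// or `None` for code lines, blank lines, and `///` doc comments.
/// Assumption: only whole-line `//` comments are considered.
fn plain_comment_text(line: &str) -> Option<&str> {
    let trimmed = line.trim_start();
    if trimmed.starts_with("///") {
        return None; // doc comment: excluded by construction
    }
    trimmed.strip_prefix("//")
}

/// Collects consecutive runs of plain comment lines as
/// `(first_line, texts)`, with `first_line` 1-indexed.
/// Lines inside the first `skip_leading_lines` are ignored, and runs
/// shorter than `min_lines` are dropped.
fn comment_runs(
    source: &str,
    skip_leading_lines: usize,
    min_lines: usize,
) -> Vec<(usize, Vec<String>)> {
    let min_lines = min_lines.max(1); // never report empty runs
    let mut runs = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut start = 0usize;

    for (idx, line) in source.lines().enumerate() {
        let text = if idx < skip_leading_lines {
            None
        } else {
            plain_comment_text(line)
        };
        match text {
            Some(t) => {
                if current.is_empty() {
                    start = idx + 1;
                }
                current.push(t.to_string());
            }
            None => {
                // A non-comment line ends the current run.
                if current.len() >= min_lines {
                    runs.push((start, std::mem::take(&mut current)));
                } else {
                    current.clear();
                }
            }
        }
    }
    if current.len() >= min_lines {
        runs.push((start, current));
    }
    runs
}
```

Each returned run would then be scored as a unit, so a two-line license header inside the skipped prefix or a lone `// note` below min_lines never reaches the density check.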

The score deliberately does NOT use identifier-token density: English prose is dominated by 3+-letter words that look identifier-shaped, so identifier counts can’t discriminate code from explanation. Punctuation can.

Structs

CommentedOutCodeRule

Functions

build