commented_out_code — heuristic detector for blocks of
commented-out source code (as opposed to prose comments,
license headers, doc comments, or ASCII banners).
Targets the “agent left dead code behind” pattern: agents tend to comment-rather-than-delete during iteration, and the leftovers accumulate. Existing primitives can ban specific phrasings but can’t catch the generic “block-of-code-shaped-comments” pattern.
Design doc: docs/design/v0.7/commented_out_code.md.
Heuristic
For each consecutive run of comment lines (≥ min_lines),
count the fraction of non-whitespace characters that are
structural punctuation strongly biased toward code:
strong_chars = ( ) { } [ ] ; = < > & | ^
raw_density = count(strong_chars) / non-whitespace-char-count

Backticks and quotes are deliberately excluded: backticks
show up constantly in rustdoc / TSDoc prose to delimit code
references (`foo` matches `bar`), and double quotes
appear in normal English. Including either inflates the
score on legitimate prose comments.
Then normalise so the user-facing threshold field has a
useful midpoint at 0.5:
density = min(raw_density / 0.20, 1.0)

At raw_density = 0.20 (i.e. one-fifth of non-whitespace
chars are strong-code chars), the normalised density is
1.0. Real code blocks comfortably exceed this; English
prose is well below it because everyday writing rarely
uses brackets, semicolons, or assignment operators.
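
The two formulas above can be sketched in a few lines of Rust. This is an illustrative sketch, not the crate's actual implementation: the function names `raw_density` and `density` and the constant `STRONG_CHARS` are hypothetical, and the sketch assumes the comment markers have already been stripped from the text it receives.

```rust
/// Structural punctuation treated as a strong code signal (the list given above).
/// Backticks and quotes are intentionally absent.
const STRONG_CHARS: &[char] = &[
    '(', ')', '{', '}', '[', ']', ';', '=', '<', '>', '&', '|', '^',
];

/// Fraction of non-whitespace characters that are strong code characters.
/// Hypothetical helper; assumes `text` is comment content without its markers.
fn raw_density(text: &str) -> f64 {
    let mut strong = 0usize;
    let mut total = 0usize;
    for c in text.chars().filter(|c| !c.is_whitespace()) {
        total += 1;
        if STRONG_CHARS.contains(&c) {
            strong += 1;
        }
    }
    if total == 0 {
        0.0
    } else {
        strong as f64 / total as f64
    }
}

/// Normalised density: a raw density of 0.20 or more saturates at 1.0.
fn density(text: &str) -> f64 {
    (raw_density(text) / 0.20).min(1.0)
}

fn main() {
    let code = "let x = foo(a, b); if (x > 0) { return x; }";
    let prose = "This function returns the largest value seen so far";
    println!("code:  {:.2}", density(code));
    println!("prose: {:.2}", density(prose));
}
```

With the default threshold of 0.5, the code-like string lands above the threshold and the prose string lands below it, which is the separation the normalisation is meant to provide.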
Density ≥ threshold (default 0.5) marks the block as
code-shaped. Doc-comment markers (///, /** */) and
the file’s first skip_leading_lines lines (license
headers) are excluded by construction.
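
To make the block-level flow concrete, here is a minimal sketch of how runs of comment lines might be grouped and scored. It rests on several assumptions: only `//` line comments are handled (block comments are ignored), the `Config` struct and the functions `comment_body` and `find_blocks` are hypothetical names, and the `min_lines` value used in `main` is chosen for illustration rather than taken from the crate.

```rust
/// Hypothetical configuration; the field names follow the text above.
struct Config {
    min_lines: usize,
    threshold: f64,
    skip_leading_lines: usize,
}

const STRONG_CHARS: &str = "(){}[];=<>&|^";

/// Normalised density of strong code characters, saturating at 1.0.
fn density(text: &str) -> f64 {
    let total = text.chars().filter(|c| !c.is_whitespace()).count();
    if total == 0 {
        return 0.0;
    }
    let strong = text.chars().filter(|c| STRONG_CHARS.contains(*c)).count();
    ((strong as f64 / total as f64) / 0.20).min(1.0)
}

/// Body of a plain `//` comment line; None for code lines and `///` doc comments.
fn comment_body(line: &str) -> Option<&str> {
    let t = line.trim_start();
    if t.starts_with("///") {
        return None;
    }
    t.strip_prefix("//")
}

/// Returns 1-based inclusive (start, end) line ranges of code-shaped comment runs.
fn find_blocks(source: &str, cfg: &Config) -> Vec<(usize, usize)> {
    let lines: Vec<&str> = source.lines().collect();
    let mut hits = Vec::new();
    let mut run: Vec<&str> = Vec::new();
    let mut run_start = 0usize;

    // Iterate one step past the end so the final run is flushed.
    for idx in 0..=lines.len() {
        let body = if idx < lines.len() && idx >= cfg.skip_leading_lines {
            comment_body(lines[idx])
        } else {
            None
        };
        match body {
            Some(b) => {
                if run.is_empty() {
                    run_start = idx + 1;
                }
                run.push(b);
            }
            None => {
                if run.len() >= cfg.min_lines && density(&run.join("\n")) >= cfg.threshold {
                    hits.push((run_start, idx));
                }
                run.clear();
            }
        }
    }
    hits
}

fn main() {
    let source = "\
// Copyright notice
fn live() {}
// let x = compute(a, b);
// if (x > 0) { log(x); }
// return x;
fn other() {}
// This explains why the cache is warmed
// before the first request arrives and
// what happens when warming fails
";
    let cfg = Config { min_lines: 3, threshold: 0.5, skip_leading_lines: 1 };
    for (start, end) in find_blocks(source, &cfg) {
        println!("code-shaped comment block: lines {start}-{end}");
    }
}
```

In this sketch, doc comments and skipped leading lines simply break a run rather than joining it, which is one way to read "excluded by construction"; the real detector may draw that boundary differently.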
The score deliberately does NOT use identifier-token density: English prose is dominated by 3+-letter words that look identifier-shaped, so identifier counts can’t discriminate code from explanation. Punctuation can.