#!/usr/bin/env python3
"""
Scan public-text surfaces (issue/PR/comment bodies) for sensitive-info
patterns and emit pattern CATEGORY names, never the matched value.

This is the GitHub Actions sibling of scripts/scan-secrets.sh. The two
guards share a pattern set and an allow-list convention so contributors
get the same answer locally and remotely.

Contract
--------
Input (preferred: env vars set by the workflow):
    TEXT_SCAN_TITLE   one-line title text (issue or PR title)
    TEXT_SCAN_BODY    multi-line body text (issue/PR/comment body)
  Either may be unset or empty. If both are unset *and* stdin has data,
  the script reads stdin instead (convenient for local development and
  for the verification gate in the dispatch prompt).

Output:
    One line per finding, exactly:  CATEGORY: <category-name>
    Distinct categories are deduplicated across the input.

  If GITHUB_OUTPUT is set (the workflow runner), the same list is also
  written to that file in heredoc form:

      findings<<DELIM
      CATEGORY: aws_access_key
      DELIM

  That makes the categories available to the next step as
  `steps.<id>.outputs.findings`. This addresses Gemini-3-Pro round-1
  finding #1 against the original Spec F draft, which used
  `print('FINDINGS::...')` and never wrote to $GITHUB_OUTPUT.

Exit codes:
    0   no findings
    1   findings detected (used by the workflow as the gate)

Pattern set + allow-list
------------------------
Patterns and category names mirror scripts/scan-secrets.sh exactly so
that a string that fails the local pre-commit hook also fails the
public-text guard (and vice versa). When updating one side, update the
other in the same PR.

Bash POSIX-ERE -> Python `re` translation: [[:space:]] -> \\s, the rest
is character-class identical.

Allow-list: a line containing `<word>`-style placeholder syntax (regex
`<[A-Za-z_][A-Za-z0-9_-]*>`) is treated as a documentation example and
skipped. This matches the local guard's behavior; see SECURITY.md.

The allow-list check looks at the WHOLE LINE, not the regex match
span. This addresses Gemini-3-Pro round-1 finding #4 against the
original Spec F draft, which used `m.group(0)` (only the matched
secret-shaped substring) and so could not see surrounding placeholder
markers on the same line.

Security discipline
-------------------
This script MUST NEVER print, log, or return the matched value. Only
the category name is exposed. Reviewers updating this file: search
for `print(` and verify each call site only emits literal strings or
category names from CATEGORIES. There is no `print(line)`,
`print(m.group(0))`, etc.
"""
# (category_name, compiled_pattern)
# Category names match scripts/scan-secrets.sh PATTERN_NAMES.
: =
=
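# The POSIX-ERE -> Python `re` translation described in the module
# docstring could look roughly like the sketch below. The helper name
# `ere_to_python` and the example pattern are hypothetical and are not
# taken from scripts/scan-secrets.sh. `re.ASCII` is an added assumption:
# without it, Python's `\s` also matches Unicode whitespace on str
# patterns, which is wider than [[:space:]] in the C locale.

```python
import re


def ere_to_python(ere: str) -> str:
    """Translate the one POSIX class named in the docstring; keep the rest."""
    # "[[:space:]]" becomes "[\s]", which Python treats the same as "\s".
    return ere.replace("[:space:]", r"\s")


# Hypothetical ERE pattern, for illustration only.
ERE_EXAMPLE = "token[[:space:]]*=[[:space:]]*[A-Za-z0-9]{8,}"
PY_EXAMPLE = re.compile(ere_to_python(ERE_EXAMPLE), re.ASCII)
```

# Replacing only the class name (and keeping the outer brackets) also
# covers bracket expressions that mix [:space:] with other characters.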
"""Gather scan input from env vars, falling back to stdin."""
=
=
# Title is treated as its own line so the line-level allow-list
# logic applies cleanly to each input region.
return + +
return or
# No env input — fall through to stdin (used for local testing and
# the dispatch's verification gate).
return
return
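# A minimal sketch of the input-gathering step described above, assuming
# a hypothetical function name `read_input`. The `env` and `stdin`
# parameters are added here only so the sketch can be exercised without
# touching the real environment, and using `isatty()` to decide whether
# stdin has data is an assumption.

```python
import io
import os
import sys
from typing import Mapping, Optional, TextIO


def read_input(env: Optional[Mapping[str, str]] = None,
               stdin: Optional[TextIO] = None) -> str:
    """Gather scan input from env vars, falling back to stdin."""
    env = os.environ if env is None else env
    stdin = sys.stdin if stdin is None else stdin
    title = env.get("TEXT_SCAN_TITLE", "")
    body = env.get("TEXT_SCAN_BODY", "")
    if title and body:
        # Title goes on its own line so line-level checks see it separately.
        return title + "\n" + body
    if title or body:
        return title or body
    # No env input: read stdin only when something is piped in.
    if not stdin.isatty():
        return stdin.read()
    return ""


# Local usage with a fake environment and a fake stdin.
example = read_input({"TEXT_SCAN_TITLE": "t", "TEXT_SCAN_BODY": "b"},
                     io.StringIO("ignored"))
```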
"""Return distinct category names that fire on `text`.
Order is preserved: the order in which a category first matches
determines its order in the output. That makes the workflow's
comment deterministic across multiple-finding inputs.
"""
: =
: =
# Allow-list: any line with a <word>-style placeholder is
# treated as documentation and skipped. Check the WHOLE LINE,
# not a per-match span (Gemini round-1 finding #4).
continue
continue
# NOTE: never print the matched value. We append only
# the category name. The matched line and substring
# are intentionally discarded.
return
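# The scan loop and whole-line allow-list described above can be
# sketched as follows. The function name `scan` and both regexes in
# CATEGORIES are illustrative assumptions; the real pattern set lives in
# scripts/scan-secrets.sh and is not reproduced here. Only the
# placeholder regex is taken from the module docstring.

```python
import re
from typing import List, Pattern, Tuple

# (category_name, compiled_pattern); patterns are hypothetical stand-ins.
CATEGORIES: List[Tuple[str, Pattern[str]]] = [
    ("aws_access_key", re.compile(r"AKIA[0-9A-Z]{16}")),
    ("private_key_header", re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")),
]
PLACEHOLDER = re.compile(r"<[A-Za-z_][A-Za-z0-9_-]*>")


def scan(text: str) -> List[str]:
    """Return distinct category names in first-match order."""
    seen = set()
    ordered: List[str] = []
    for line in text.splitlines():
        # Allow-list looks at the whole line, not at a match span.
        if PLACEHOLDER.search(line):
            continue
        for name, pattern in CATEGORIES:
            if name in seen:
                continue
            if pattern.search(line):
                # Keep only the category name; the match is discarded.
                seen.add(name)
                ordered.append(name)
    return ordered
```

# Because the allow-list is per line, a placeholder on one line does not
# excuse a secret-shaped string on another line of the same input.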
"""Write findings to $GITHUB_OUTPUT in heredoc form, if available.
Heredoc delimiter is a fresh random hex token so it cannot collide
with payload content (the payload is fixed-shape category names,
but the defensive choice is cheap).
"""
=
return
= +
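# A hedged sketch of the $GITHUB_OUTPUT heredoc write described above.
# The function name `write_github_output`, the "EOF_" delimiter prefix,
# the 8-byte token length, and the boolean return value are assumptions;
# the `env` parameter exists only to make the sketch testable.

```python
import os
import secrets
from typing import List, Mapping, Optional


def write_github_output(findings: List[str],
                        env: Optional[Mapping[str, str]] = None) -> bool:
    """Append findings to $GITHUB_OUTPUT in heredoc form, if available."""
    env = os.environ if env is None else env
    path = env.get("GITHUB_OUTPUT")
    if not path:
        return False
    # Fresh random delimiter so it cannot collide with the payload.
    delim = "EOF_" + secrets.token_hex(8)
    payload = "\n".join(f"CATEGORY: {name}" for name in findings)
    # Append mode: other steps may already have written outputs.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"findings<<{delim}\n{payload}\n{delim}\n")
    return True
```

# The file then holds `findings<<DELIM`, one CATEGORY line per finding,
# and a closing DELIM line, matching the form shown in the docstring.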
=
=
return 0
=
# Only the category name leaves this process. Verified by
# construction: `name` is a key in CATEGORIES.
return 1
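# The reporting step and exit-code contract (0 for no findings, 1 for
# findings) might be sketched like this. The function name `report` is
# hypothetical, and the sketch takes the findings list as an argument
# instead of calling the scan itself.

```python
import sys
from typing import List


def report(findings: List[str]) -> int:
    """Print one CATEGORY line per finding and return the exit code."""
    if not findings:
        return 0
    for name in findings:
        # Only the category name leaves this process.
        print(f"CATEGORY: {name}")
    return 1


if __name__ == "__main__":
    # Hypothetical entry point: exit status acts as the workflow gate.
    sys.exit(report([]))
```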