1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
// What: `pub fn stacked_quantifier(src: &str) -> Option<String>`
// detects two regex quantifier suffixes appearing back-to-back
// without an atom or group between them: `a**`, `\D{5,11}{5,11}`,
// `(?:a){2}{3}`, `a*?+`, `a*??`. Returns `Some(reason)` when the
// shape is found and `None` otherwise.
// Why: Stacked quantifiers expand multiplicatively in NFA-based
// regex engines. The `regex` crate parses `a{5,11}{5,11}` as
// "5-11 reps of (5-11 reps of `a`)" -- 25-121 reps -- and
// five-deep stacking like the fuzz-discovered slow-unit
// `\D{5,11}{5,11}{5,11}{5,11}{5,11}\D*****aa` reaches 5^5
// through 11^5 (~161,051 reps). With `(?u)` widening `\D`
// to the full non-decimal-digit Unicode class, NFA size
// exceeds the crate's 256 MB limit and `RegexBuilder::build`
// wall-clocks at 1.4-1.5 seconds before erroring with
// `CompiledTooBig`. The plain path tries `unicode(false)`
// first then `unicode(true)`, so the total compile call
// takes ~2.9s -- well past libFuzzer's `report_slow_units`
// threshold of 10s once ASAN overhead and the fuzz-corpus
// replay loop are accounted for. The resharp parser rejects
// stacked quantifiers in microseconds with
// `UnsupportedResharpRegex`, but the slow-unit shape lacks
// any `requires_resharp` trigger (no `&`, `_`, `~(`,
// lookaround), so it never reaches resharp.
//
// Stacked quantifiers are virtually never a legitimate
// authored pattern: the regex crate accepts them by treating
// the outer as a quantifier on the inner-quantified atom,
// but no realistic secret-detection rule needs this shape.
// Rejecting at the pre-validator level is conservative and
// keeps the fuzz target moving past inputs that would
// otherwise burn the entire budget on one compile attempt.
//
// Detection runs as a single-pass byte walker over `src`
// and skips:
// - escape pairs `\X` (so `\{` is a literal `{`, not a
// quantifier-brace start);
// - character class interiors `[...]` (where the
// metacharacters are literal bytes);
// - the `?` byte immediately after `(` (so `(?:`, `(?i)`,
// `(?<=`, `(?P<name>` are not counted as a `?`
// quantifier).
// A quantifier is recognised as one of `*`, `+`, `?`, or
// `{` followed by an ASCII digit (so a literal `{abc}` is
// NOT treated as a quantifier). After a primary quantifier,
// a single trailing `?` (lazy) or `+` (possessive) modifier
// is consumed as part of the same quantifier; a second
// quantifier byte after that is the stacked case we flag.
//
// Place the call BEFORE `requires_resharp` in
// `compile_rule_src` so the rejection reads as "this
// pattern is structurally bad", not "the plain path
// specifically dislikes it". Either engine would either
// reject the shape outright (resharp) or wall-clock on it
// (regex crate); the structural rejection is the honest
// framing.
// TS map: `function stackedQuantifier(src: string): string | null`.
//
// In TS you'd write (pseudocode):
// ```ts
// function stackedQuantifier(src: string): string | null {
// // walk bytes; track in_class and just_consumed_quant;
// // skip \X escapes and class interiors and `(?` group prefixes;
// // when a quantifier start is seen and just_consumed_quant is true,
// // return Some(reason); otherwise consume the quantifier and any
// // single trailing `?`/`+` modifier, set just_consumed_quant=true.
// }
// ```
// What: `pub fn nested_grouped_quantifier(src: &str) -> Option<String>`
// detects regex source containing four or more consecutive
// `)`+quantifier adjacencies. The shape the fuzz target
// actually emits looks like
// `(?:(?:(?:(?:(?:\d){5,11}){5,11}){5,11}){5,11}){5,11}` --
// five `(?:` opens, an inner atom, then five `){5,11}`
// close+quantifier pairs back-to-back. Sibling shape
// `(?:(?:(?:(?:(?:\d)*)*)*)*)*` is the same with `*`
// instead of `{5,11}`. The `Node::Quant` renderer at
// `fuzz/src/generators.rs:1292` ALWAYS wraps a quantified
// atom in `(?:...)`, so the fuzz generator can never emit
// the bare-stacked shape `stacked_quantifier` was written
// for -- the grouped form is the actual blowup vector.
// Why: Each `{N,M}` (or `*`/`+`/`?`) on a group multiplies the
// inner NFA's state count by up to M (or by a large but
// bounded factor for `*`/`+`). Five-deep multiplication
// reaches `11^5 = 161051` repetitions of the inner atom;
// combined with `(?u)` widening `\d` to the full Unicode
// decimal-digit class (~700 codepoint sub-states), the
// regex crate's NFA construction exceeds the 256 MB
// DFA size limit and takes ~3 s of wall to error with
// `CompiledTooBig`. Under ASAN in the fuzz target, the
// single iteration crosses libFuzzer's `report_slow_units`
// threshold of 10 s and halts the run before the
// interesting (?u)-Unicode shapes for the e49d8694
// case-fold soundness bug can be discovered. Rejecting
// depth-4 chains at the pre-validator level rejects the
// shape in microseconds with a clear message, keeping
// the fuzz target moving.
//
// Threshold: depth 4 (= chain of four `){quant}` pairs).
// Empirically depth 3 still compiles in milliseconds even
// under Unicode `\D`/`\d`; depth 4 starts to noticeably
// bloat NFA construction; depth 5+ reliably wall-clocks.
// Catching at 4 is conservative enough to spare any
// plausibly-authored rule (real secret-detection rules
// seldom nest beyond 2 levels of quantified groups) while
// covering both halves of the slow-unit source (chain=5
// for each half; flags as soon as chain reaches 4).
//
// Detection runs as a single-pass byte walker over `src`:
// - Escape pairs `\X` and class interiors `[...]` are
// skipped (escapes are atoms; class bytes are literal
// and quantifiers don't apply structurally there).
// - Any `(` resets the chain (a new group opening breaks
// a `)`+quant chain). The optional `?` after `(` for
// `(?...)` group prefixes (non-capturing `(?:`, flag
// groups `(?i)`, lookarounds `(?=`/`(?!`/`(?<=`/`(?<!`,
// named captures `(?P<...>`/`(?<...>`, comments `(?#...)`)
// is skipped so it isn't misread as a `?` quantifier.
// - On `)`, peek the next byte: if it is a quantifier
// start (`*`/`+`/`?`/`{` followed by a digit), consume
// the quantifier and an optional `?` lazy modifier;
// increment the chain counter. If the chain reaches
// THRESHOLD, return Some(reason).
// - On `)` with no quantifier following, reset the chain
// (a close that is not quantified breaks the pattern).
// - Any other byte (atom, anchor, `|`, literal) resets
// the chain.
//
// Place the call alongside `stacked_quantifier` in
// `compile_rule_src` -- both are structural pre-validators
// that apply regardless of engine routing, and both
// rejection messages read as "this source shape is
// structurally bad" rather than "the plain path
// specifically dislikes it". Either engine would either
// reject the shape outright or wall-clock on it.
// TS map: `function nestedGroupedQuantifier(src: string): string | null`.
//
// In TS you'd write (pseudocode):
// ```ts
// function nestedGroupedQuantifier(src: string): string | null {
// // walk bytes; skip \X escapes and class interiors;
// // on `(` reset chain (and skip optional `?` after);
// // on `)` peek next byte: if quantifier start, consume it
// // (plus optional lazy `?`) and chain++; if chain >= 4 return reason;
// // otherwise reset chain. Any other byte resets chain.
// }
// ```