//! SIMD-accelerated prefilter for the top N most common secret patterns.
//!
//! `simdsieve` provides 50+ GB/s scanning for up to 8 patterns using AVX-512/AVX2.
//! This module integrates it as Layer 1 of the scanning pipeline:
//! hot patterns are checked first, and if found, we can often skip AC/Regex.
/// Common high-value secret prefixes that trigger Layer 1 SIMD.
pub const HOT_PATTERNS: & = &;
/// `service` field per hot pattern - the CANONICAL service of the detector
/// this fast-path stands in for, NOT an internal `*_key` label. The hot path
/// is a perf optimization, not a distinct detector: a leaked `AKIA…` is an
/// `aws-access-key` finding however the engine found it. Before 2026-05-29
/// these were `aws_key`/`github_pat`/… so the SAME secret surfaced as
/// `hot-aws_key`/service `aws_key` on Linux (Hyperscan path) but
/// `aws-access-key`/service `aws` on macOS/Windows (portable, no hot path) -
/// a cross-platform id divergence. Emitting canonical identity here makes all
/// platforms agree and matches what `keyhog explain` already resolves hot ids
/// to. Index-parallel with [`HOT_PATTERNS`] and the two arrays below.
pub const HOT_PATTERN_NAMES: & = &;
/// Canonical `detector_id` per hot pattern - the id of the named detector the
/// fast-path represents, so scan output (JSON/SARIF/text/baselines) is
/// identical regardless of which engine path made the find. `sq0csp-` keeps
/// `hot-square_secret`: no standalone square-secret detector exists yet, so it
/// is genuinely fast-path-only (`keyhog explain` documents this). Static (not
/// `format!`-per-match) so the per-hit allocation the perf audit removed is
/// not reintroduced.
///
/// `ASIA` maps to `aws-access-key`, NOT `aws-session-token`: an `ASIA…` string
/// is a temporary STS *access key ID* (the same shape as `AKIA…` - the
/// `aws-access-key` detector regex is literally `(?-i)(AKIA|ASIA)[0-9A-Z]{16}`
/// and the verifier lists `ASIA` in `AWS_VALID_ACCESS_KEY_PREFIXES`). The
/// *session token* is the separate long base64 blob the `aws-session-token`
/// detector matches via the `AWS_SESSION_TOKEN=`/`X-Amz-Security-Token=`
/// anchors - none of which begin with `ASIA`. The old `ASIA→aws-session-token`
/// mapping mis-attributed every `ASIA` key ID and (once the hot path gained
/// precise-regex validation) would have rejected them outright, since the
/// session-token regex can never match an `ASIA…` literal.
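///
/// A minimal sketch of why `AKIA` and `ASIA` share one detector: a `std`-only,
/// hand-rolled version of the shape check quoted above. The helper
/// `is_aws_access_key_id` is hypothetical and exists only for illustration; the
/// real detector uses its compiled regex.
///
/// ```rust
/// // Hypothetical stand-in for `(?-i)(AKIA|ASIA)[0-9A-Z]{16}`, anchored at
/// // the start of `candidate` (trailing bytes are ignored, as with a
/// // start-anchored regex match).
/// fn is_aws_access_key_id(candidate: &str) -> bool {
///     let bytes = candidate.as_bytes();
///     if bytes.len() < 20 {
///         return false;
///     }
///     let prefix_ok = bytes.starts_with(b"AKIA") || bytes.starts_with(b"ASIA");
///     // Case-sensitive body: 16 bytes drawn from `[0-9A-Z]`.
///     let body_ok = bytes[4..20]
///         .iter()
///         .all(|b| b.is_ascii_digit() || b.is_ascii_uppercase());
///     prefix_ok && body_ok
/// }
/// ```
///
/// Under this sketch `AKIAIOSFODNN7EXAMPLE` and `ASIAIOSFODNN7EXAMPLE` are both
/// accepted, while a lowercase body or a bare `ASIA` prefix is rejected.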
pub const HOT_PATTERN_DETECTOR_IDS: & = &;
/// Canonical human-readable detector name per hot pattern (matches the `name`
/// field of the corresponding `detectors/*.toml`). Square has no canonical
/// detector, so it carries a plain "Square Secret" label.
pub const HOT_PATTERN_DISPLAY_NAMES: & = &;
/// Build a precise-regex validator for each hot-pattern slot, index-parallel
/// with [`HOT_PATTERNS`].
///
/// The hot path is a literal-prefix prefilter: a 50+ GB/s SIMD sieve finds
/// `ghp_`/`xoxp-`/`AKIA`/… and historically emitted a `Critical` finding
/// gated ONLY by a per-prefix length floor (`PER_PATTERN_MIN_LEN` in
/// `engine/hot_patterns.rs`). A length floor is a crude proxy for the
/// detector's real regex and admits wrong-character-class tokens the precise
/// pattern rejects:
/// - `ghp_THIS_HAS_UNDERSCORES_IN_IT_NOT_A_TOKEN0` (43 ≥ 40 floor, but `_`
///   is not in `[A-Za-z0-9]` and the body is 39 chars, not 36), and
/// - `xoxp-123-456-789-abc` (20 ≥ 16 floor, but the segments are far short
///   of the 10-13-digit Slack shape)
///
/// both cleared the floor and surfaced as `Critical` false positives that the
/// AC+regex path correctly rejected. Validating each candidate against the
/// detector's own regex (anchored at the candidate start) restores parity: the
/// fast path emits exactly what the precise path would, just sooner.
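///
/// A minimal sketch of the gap between the two gates, using the `ghp_` case
/// above. Both helpers are hypothetical and `std`-only. The 36-character
/// `[A-Za-z0-9]` body is an assumption inferred from that example, not a copy
/// of the detector's actual regex.
///
/// ```rust
/// // Old gate (sketch): literal prefix plus a total-length floor of 40.
/// fn passes_length_floor(candidate: &str) -> bool {
///     candidate.starts_with("ghp_") && candidate.len() >= 40
/// }
///
/// // Precise gate (sketch): prefix, then 36 ASCII alphanumerics, anchored at
/// // the candidate start.
/// fn passes_precise_shape(candidate: &str) -> bool {
///     match candidate.strip_prefix("ghp_") {
///         Some(body) => {
///             let b = body.as_bytes();
///             b.len() >= 36 && b[..36].iter().all(u8::is_ascii_alphanumeric)
///         }
///         None => false,
///     }
/// }
/// ```
///
/// With these helpers, `ghp_THIS_HAS_UNDERSCORES_IN_IT_NOT_A_TOKEN0` passes
/// `passes_length_floor` but fails `passes_precise_shape`, which is the false
/// positive described above.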
///
/// A slot is `None` only when its `HOT_PATTERN_DETECTOR_IDS` entry names no
/// loaded detector (`hot-square_secret`, genuinely fast-path-only); that slot
/// keeps the length-floor as its sole gate.
///
/// This module (`mod simdsieve_prefilter`) and the sole caller in
/// `engine::compile` are both gated on `feature = "simdsieve"`, so whenever
/// this function is compiled its caller is too: no `#[allow(dead_code)]` is
/// needed.