//! Offline AWS account-ID recovery and canary-token classification.
//!
//! This is the **single source of truth** for two facts that can be derived
//! from an AWS access-key ID string alone. Every keyhog crate shares them: the
//! scanner attaches them to findings as metadata without verifying, and the
//! verifier consults the canary check so that `--verify` never trips a canary.
//! The module lives in `keyhog-core`, the one crate that both `keyhog-scanner`
//! and `keyhog-verifier` depend on, so there is exactly one decode and one
//! canary list, never a fork.
//!
//! 1. **Account decode.** Every modern AWS access-key ID (`AKIA…` long-term,
//! `ASIA…` temporary STS) has the 12-digit account number mathematically
//! embedded in it, recoverable with a pure base32-decode + bit-shift — NO
//! network call, NO STS `GetCallerIdentity`, and it works on LIVE *and*
//! revoked keys. Algorithm matches the trufflesecurity write-up
//! <https://trufflesecurity.com/blog/research-uncovers-aws-account-numbers-hidden-in-access-keys>:
//! drop the 4-char prefix; base32-decode the body; the first 6 decoded bytes
//! are a big-endian u48; `account = (u48 & 0x7fff_ffff_ff80) >> 7`, rendered
//! as a 12-digit zero-padded decimal string.
//!
//! 2. **Canary classification.** An access key whose decoded account belongs to
//! a known canary issuer (canarytokens.org / Thinkst and off-brand clones) is
//! a tripwire: any live verification alerts whoever planted it. The baseline
//! list of canary accounts is Tier-B data embedded from
//! `data/aws-canary-accounts.toml`; at first use it is unioned with the
//! runtime extension file named by `KEYHOG_AWS_CANARY_ACCOUNTS`, if one is set.
//! Baseline source: <https://trufflesecurity.com/blog/canaries>.
use std::collections::HashMap;
use std::sync::LazyLock;
/// The two access-key-ID prefixes whose 12-digit account number is embedded.
/// `AKIA` is a long-term IAM key, `ASIA` a temporary STS session key. Both use
/// the identical embedding, so both decode with the same routine.
const AWS_KEY_ID_PREFIXES: [&str; 2] = ["AKIA", "ASIA"];
/// Length of a canonical AWS access-key ID: 4-char prefix + 16 base32 chars.
const AWS_KEY_ID_LEN: usize = 20;
/// Mask and right shift that extract the account number from the leading 6
/// decoded bytes read as a big-endian u48, as documented by trufflesecurity.
/// The mask keeps bits 7..=46: the low 7 bits are a non-account discriminator,
/// and bit 47 is always 0 for the account.
const ACCOUNT_MASK: u64 = 0x7fff_ffff_ff80;
const ACCOUNT_SHIFT: u64 = 7;
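
// Sketch (assumption): the mask-and-shift step in isolation, applied to the
// big-endian u48 built from the first 6 decoded bytes. `account_from_u48` is
// a hypothetical helper, not part of this module; its literals repeat
// ACCOUNT_MASK and ACCOUNT_SHIFT so the sketch stands alone.
fn account_from_u48(u48: u64) -> u64 {
    // Keep bits 7..=46 (bit 47 and the low 7 discriminator bits are cleared),
    // then shift the discriminator bits away.
    (u48 & 0x7fff_ffff_ff80) >> 7
}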
/// Decode an RFC-4648 standard base32 character (`A`-`Z`, `2`-`7`) to its 5-bit
/// value. Returns `None` for any out-of-alphabet byte (lowercase, padding,
/// digits 0/1/8/9), which makes the whole decode fail closed on a malformed id.
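// Sketch (assumption): the RFC 4648 alphabet maps 'A'..='Z' to 0..=25 and
// '2'..='7' to 26..=31. `base32_value` is a hypothetical name for the
// function documented above, whose body is not shown in this excerpt.
fn base32_value(c: u8) -> Option<u8> {
    match c {
        b'A'..=b'Z' => Some(c - b'A'),
        b'2'..=b'7' => Some(c - b'2' + 26),
        // Lowercase, '=' padding, and the digits 0, 1, 8, 9 fall through.
        _ => None,
    }
}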
/// Recover the 12-digit AWS account ID embedded in an access-key ID, fully
/// offline. Returns `None` when `key_id` is not a well-formed `AKIA…`/`ASIA…`
/// access-key ID (wrong length, wrong prefix, or a non-base32 body), so a
/// caller can blindly try every credential and only act on `Some`.
///
/// The returned string is always exactly 12 ASCII digits, zero-padded — AWS
/// account numbers are 12-digit identifiers and the leading-zero form (e.g.
/// `052310077262`) is the canonical rendering, matching the STS `Account`
/// field and trufflehog's output.
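// Sketch (assumption): one way to implement the decode documented above.
// `aws_account_from_key_id` is a hypothetical name, and the literals repeat
// AWS_KEY_ID_LEN, AWS_KEY_ID_PREFIXES, ACCOUNT_MASK, and ACCOUNT_SHIFT so the
// sketch stands alone. Instead of materializing 10 decoded bytes, it packs
// all 16 base32 characters (16 * 5 = 80 bits) into a u128 and keeps the top
// 48 bits, which are exactly the first 6 decoded bytes read big-endian.
fn aws_account_from_key_id(key_id: &str) -> Option<String> {
    // Fail closed on anything that is not `AKIA`/`ASIA` + 16 characters.
    let body = key_id
        .strip_prefix("AKIA")
        .or_else(|| key_id.strip_prefix("ASIA"))?;
    if body.len() != 16 {
        return None;
    }
    let mut bits: u128 = 0;
    for c in body.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'2'..=b'7' => c - b'2' + 26,
            _ => return None, // non-base32 byte
        };
        bits = (bits << 5) | u128::from(v);
    }
    // Drop the trailing 80 - 48 = 32 bits to get the leading u48.
    let u48 = (bits >> 32) as u64;
    let account = (u48 & 0x7fff_ffff_ff80) >> 7;
    // Zero-pad to 12 digits, the canonical AWS rendering.
    Some(format!("{account:012}"))
}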
/// The Tier-B baseline canary account list, compiled into the binary from
/// `data/aws-canary-accounts.toml`, unioned at first use with any runtime
/// extension file pointed to by `KEYHOG_AWS_CANARY_ACCOUNTS`.
///
/// Soft-fails to an empty set so a corrupted data file degrades canary
/// awareness rather than crashing.
static CANARY_ACCOUNTS: LazyLock = new;
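
// Sketch (assumption): one way a first-use initializer could union the
// compiled-in baseline with the optional extension file and soft-fail at
// every step. `build_canary_set` is a hypothetical name, and `parse` stands
// in for the real TOML parsing, which this excerpt does not show.
fn build_canary_set(
    baseline: &str,
    parse: fn(&str) -> Result<Vec<String>, String>,
) -> std::collections::HashSet<String> {
    let mut set = std::collections::HashSet::new();
    // A corrupted baseline contributes nothing instead of panicking.
    if let Ok(accounts) = parse(baseline) {
        set.extend(accounts);
    }
    // A missing variable, unreadable file, or unparsable extension is
    // skipped, so whatever the baseline produced still stands.
    if let Ok(path) = std::env::var("KEYHOG_AWS_CANARY_ACCOUNTS") {
        if let Ok(text) = std::fs::read_to_string(path) {
            if let Ok(accounts) = parse(&text) {
                set.extend(accounts);
            }
        }
    }
    set
}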
/// `[canary]`/`[knockoff]` TOML shape shared by the baseline and any runtime
/// extension file. Both tables are merged into the same account set — keyhog
/// treats off-brand knockoffs identically to first-party canaries.
/// Parse one canary TOML document and union its accounts into `set`. Trims each
/// account so stray whitespace in a hand-edited extension file never causes a
/// silent miss.
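// Sketch (assumption): the union-and-trim step on an already-parsed document.
// `CanaryDoc` reduces the `[canary]` and `[knockoff]` tables to plain account
// lists; it and `union_canary_doc` are hypothetical, since the real TOML shape
// and parser are not shown in this excerpt.
struct CanaryDoc {
    canary: Vec<String>,
    knockoff: Vec<String>,
}

fn union_canary_doc(doc: &CanaryDoc, set: &mut std::collections::HashSet<String>) {
    // Knockoffs get no special treatment: both tables feed the same set.
    for account in doc.canary.iter().chain(&doc.knockoff) {
        // " 123456789012\n" from a hand-edited file still matches lookups.
        set.insert(account.trim().to_string());
    }
}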
/// True when `account_id` (a 12-digit AWS account string) belongs to a known
/// canary-token issuer.
/// True when `key_id` is a decodable AWS access-key ID whose offline-decoded
/// account belongs to a known canary issuer. The verifier uses this to refuse
/// sending a live probe (which would trip the canary) without re-implementing
/// the decode.
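// Sketch (assumption): both canary checks over an explicitly passed set, so
// the sketch stands alone. `is_canary_account_in` and `is_canary_key_in` are
// hypothetical names; the functions documented above consult the shared
// canary set instead.
fn is_canary_account_in(canaries: &std::collections::HashSet<String>, account_id: &str) -> bool {
    canaries.contains(account_id)
}

fn is_canary_key_in(canaries: &std::collections::HashSet<String>, key_id: &str) -> bool {
    // Compact copy of the offline decode: AKIA/ASIA + 16 base32 chars,
    // top 48 of 80 bits, mask, shift, 12-digit zero padding.
    fn decode(key_id: &str) -> Option<String> {
        let body = key_id
            .strip_prefix("AKIA")
            .or_else(|| key_id.strip_prefix("ASIA"))?;
        if body.len() != 16 {
            return None;
        }
        let mut bits: u128 = 0;
        for c in body.bytes() {
            let v = match c {
                b'A'..=b'Z' => c - b'A',
                b'2'..=b'7' => c - b'2' + 26,
                _ => return None,
            };
            bits = (bits << 5) | u128::from(v);
        }
        Some(format!("{:012}", (((bits >> 32) as u64) & 0x7fff_ffff_ff80) >> 7))
    }
    // A key that does not decode is never reported as a canary.
    decode(key_id).is_some_and(|account| is_canary_account_in(canaries, &account))
}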
/// Operator-facing note attached to a canary finding so the report explains why
/// verification was skipped. Mirrors trufflehog's responder message.
pub const CANARY_MESSAGE: &str =
"AWS canary token (canarytokens.org / Thinkst-style). Do NOT verify: a \
verification request alerts whoever planted it. See \
https://trufflesecurity.com/canaries";
/// Build the offline metadata for an AWS-access-key finding: always
/// `{ "account_id": "<12 digits>" }` for a decodable `AKIA…`/`ASIA…` key, plus
/// `{ "is_canary": "true", "canary_message": <note> }` when the decoded account
/// belongs to a known canary issuer. `None` when `credential` is not a
/// well-formed AWS access-key ID.
///
/// The `HashMap<String, String>` shape lets the `metadata` of a
/// [`crate::VerifiedFinding`] absorb it directly, without verification and
/// without any network call.
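// Sketch (assumption): one way to assemble the metadata map documented above.
// `aws_key_metadata_in` is a hypothetical name; it takes the canary set and
// the note as parameters (in this module those would be the shared set and
// `CANARY_MESSAGE`) so the sketch stands alone.
fn aws_key_metadata_in(
    credential: &str,
    canaries: &std::collections::HashSet<String>,
    canary_message: &str,
) -> Option<std::collections::HashMap<String, String>> {
    // Compact copy of the offline decode described in the module docs.
    fn decode(key_id: &str) -> Option<String> {
        let body = key_id
            .strip_prefix("AKIA")
            .or_else(|| key_id.strip_prefix("ASIA"))?;
        if body.len() != 16 {
            return None;
        }
        let mut bits: u128 = 0;
        for c in body.bytes() {
            let v = match c {
                b'A'..=b'Z' => c - b'A',
                b'2'..=b'7' => c - b'2' + 26,
                _ => return None,
            };
            bits = (bits << 5) | u128::from(v);
        }
        Some(format!("{:012}", (((bits >> 32) as u64) & 0x7fff_ffff_ff80) >> 7))
    }
    // Not a well-formed AWS access-key ID: no metadata at all.
    let account = decode(credential)?;
    let mut meta = std::collections::HashMap::new();
    if canaries.contains(&account) {
        meta.insert("is_canary".to_string(), "true".to_string());
        meta.insert("canary_message".to_string(), canary_message.to_string());
    }
    meta.insert("account_id".to_string(), account);
    Some(meta)
}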