1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
//! Static-string interner for the frozen detector-metadata universe.
//!
//! Built once at scanner construction from the universe of metadata
//! strings that are stable across a scan run - every detector's
//! `id`, `name`, `service`, plus the small set of `source_type`
//! literals every source backend uses (`filesystem`, `git`,
//! `git/history`, `stdin`, `s3`, `docker`, `web`, `github`, `slack`).
//!
//! At scan time, `lookup(s)` returns a pre-allocated `Arc<str>` for
//! known strings without touching the global allocator. Unknown
//! strings (file paths, commit SHAs, author names, dates) fall
//! through to the per-scan `HashSet` interner in `ScanState`.
//!
//! ## Lookup backing: single-hash `ahash` map (PERF-locality_intern-1)
//!
//! The interner previously used vyre's CHD perfect hash. CHD is O(1) in the
//! big-O sense, but its constant factor is FOUR full-key traversals per lookup:
//! two seeded FNV-1a passes (bucket + slot), one xxHash-style verify pass, and a
//! final byte-for-byte `arc == s` compare. FNV-1a folds one byte per loop
//! iteration, so on the per-match hot path (three metadata fields per emitted
//! finding) that is twelve whole-key traversals per match - the dominant cost
//! the locality tripwire pins.
//!
//! `lookup` now resolves through a single `ahash` map keyed by the interned
//! string. `ahash` mixes the key in 8-byte words with hardware multiply/rotate
//! ops rather than one function call per byte, so a lookup is ONE fast hash +
//! one bucket compare instead of three byte-loops + a compare. The map stores
//! the arena index, and we return `arena[idx].clone()` - byte-identical to the
//! string the CHD path returned. The map is built once and read-only at scan
//! time, so every rayon worker shares it lock-free (an `&HashMap` read needs no
//! synchronization). For callers that already hold the detector index, the
//! scanner caches the resolved triple by index and skips `lookup` entirely
//! (`CompiledScanner::interned_detector_metadata`).
use Arc;
/// Stable source-type identifiers every keyhog source backend
/// emits. Pre-interned because every match lands a copy of one of
/// these in `MatchLocation::source`. Keep this list in sync with
/// `keyhog_sources::Source::name()` implementations.
pub const SEED_SOURCE_TYPES: & = &;
/// Frozen static-string interner. Built once at scanner
/// construction; cloneable via `Arc` so every rayon worker shares
/// one read-only instance.
///
/// `index` maps each interned string to its slot in `arena`; it is read-only
/// after construction, so concurrent `lookup`s need no synchronization. The
/// `ahash` hasher gives a single fast (8-byte-word, hardware-mixed) hash per
/// lookup instead of the CHD perfect hash's three per-byte hash passes.