1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
//! Context-sensitive inline analysis — cache, body, and attribution types.
//!
//! Extracted from the monolithic `ssa_transfer.rs`. Contains:
//! * [`ArgTaintSig`] — compact per-arg cap signature used as a cache key.
//! * [`InlineResult`] / [`CachedInlineShape`] / [`ReturnShape`] — the
//! callsite-adapted and callsite-agnostic inline-analysis result types.
//! * [`InlineCache`] — the shared cache map keyed by
//! `(FuncKey, ArgTaintSig)`.
//! * [`CrossFileNodeMeta`] / [`CalleeSsaBody`] — the serde-able bodies
//! persisted to SQLite for cross-file context-sensitive analysis.
//! * [`populate_node_meta`] / [`rebuild_body_graph`] — bookkeeping for
//! cross-file body proxy CFGs.
//!
//! The implementation functions (`inline_analyse_callee`,
//! `apply_cached_shape`, `extract_inline_return_taint`) remain in the
//! parent `mod.rs` because they depend tightly on the block worklist, the
//! `run_ssa_taint_full` entry point, and the callee-resolution pipeline.
//!
//! # Cache key scope and origin attribution
//!
//! The inline-analysis cache below ([`InlineCache`]) is keyed by
//! `(FuncKey, ArgTaintSig)`, where [`ArgTaintSig`] encodes **per-arg
//! capability bits only** — not the identity of the source
//! [`crate::taint::domain::TaintOrigin`]s that produced those caps. The
//! stored value ([`CachedInlineShape`]) captures **only the structural**
//! shape of the callee's return taint: return caps, callee-internal
//! origins (from `Source` ops inside the callee body), and per-parameter
//! provenance flags that record which formal parameters contributed to
//! the return. Caller-specific origin identity is *not* stored — it is
//! re-attributed at cache-apply time from the current call site's
//! argument taint.
use crateCap;
use crate;
use cratePathFactReturnEntry;
use crateFuncKey;
use crate;
use NodeIndex;
use SmallVec;
use HashMap;
/// Maximum SSA blocks in a callee body before skipping inline analysis.
pub const MAX_INLINE_BLOCKS: usize = 500;
/// Compact cache key: per-arg-position cap bits (sorted, non-empty only).
///
/// Two calls with identical `ArgTaintSig` produce identical inline results
/// for soundness purposes (return caps, callee-internal sink activations).
/// Origin identity is **not** part of the key — see the module-level note
/// above on origin-attribution non-determinism.
pub );
/// Call-site-adapted result of inline-analyzing a callee.
///
/// Constructed fresh per call site by `apply_cached_shape` from a stored
/// [`CachedInlineShape`]; carries origins that point to the *current*
/// caller's source chain, not to whichever caller first populated the
/// cache entry.
pub
/// Structural (callsite-agnostic) summary of an inline-analyzed callee.
///
/// Stored in [`InlineCache`] in place of a fully-attributed `InlineResult`.
/// Origin-identity information that depends on the caller's argument chain
/// is *not* kept here; instead, [`ReturnShape::param_provenance`]
/// records which callee parameter positions contributed seed taint to the
/// return, and the actual caller origins are re-unioned in at apply time.
///
/// `None` means "this callee produced no return taint for the given
/// argument shape". A cached `None` is still a meaningful result — it
/// short-circuits re-analysis on subsequent calls with matching caps.
pub );
/// Structural parts of a non-trivial inline-analysis result.
///
/// Split from the full [`VarTaint`] so that cached entries can be re-used
/// across call sites with matching arg-cap signatures but differing source
/// origins. See the module-level note above on origin attribution.
pub
/// Cache for context-sensitive inline analysis results.
///
/// Keyed by the callee's canonical [`FuncKey`] rather than a bare function
/// name so that same-name definitions (e.g. two `process/1` methods on
/// different classes in the same file) never share or overwrite each
/// other's cache entries. Values are stored as [`CachedInlineShape`]; see
/// the module-level note above for why origins are stripped from the
/// cache value and re-attributed at apply time.
pub type InlineCache = ;
/// Drop every entry from an inline cache, marking the start of a new
/// convergence epoch.
///
/// Cross-file SCC fixed-point iteration runs pass 2 repeatedly until the
/// merged summaries stop changing. Between iterations the callee-summary
/// inputs to inline analysis may have changed, so results cached under a
/// stale snapshot must not leak into the next iteration — otherwise the
/// engine could converge to a non-fixed-point (reporting a taint result
/// that would not reproduce on a fresh run of the same file order).
///
/// The per-file inline cache is already reconstructed fresh at the top of
/// each [`crate::taint::analyse_file`] call, so in the current code this
/// call is effectively a no-op plumbing hook. Keeping the method (instead
/// of relying on ambient re-construction) makes the lifecycle explicit for
/// any future refactor that moves the cache up into the SCC orchestrator.
// semantic hook; used by tests and future shared-cache refactor
pub
/// Set-equal fingerprint of an inline cache, used by the SCC orchestrator
/// to detect when cross-file inline analysis has reached a fixed point
/// alongside summary convergence.
///
/// Returns a `HashMap` mapping each `(FuncKey, ArgTaintSig)` cache key to
/// the return-value capability bits of its inline result. `HashMap`
/// equality is set-equal (unordered), so two caches with the same entries
/// compare equal regardless of insertion order.
///
/// Origins are intentionally omitted — they are non-deterministic across
/// callers with identical caps (see the module-level note on origin
/// attribution) and would cause the fingerprint to oscillate without
/// reflecting a real precision change.
// observability hook; used by tests and future shared-cache refactor
pub
/// CFG node metadata embedded in cross-file callee bodies.
///
/// ## Why a full [`crate::cfg::NodeInfo`] lives here
///
/// An earlier variant carried only the two fields the symex executor reads
/// (`bin_op`, `labels`). That was sufficient for symex but not for the
/// taint engine, which reads ~20 fields off `cfg[inst.cfg_node]` across
/// `transfer_inst`, `collect_block_events`, `compute_succ_states`, and
/// helpers (callee name, `arg_uses`, `arg_callees`, `call_ordinal`,
/// `outer_callee`, `kwargs`, `arg_string_literals`, `ast.span`,
/// `ast.enclosing_func`, `condition_*`, `all_args_literal`, `catch_param`,
/// `parameterized_query`, `in_defer`, `cast_target_type`, `string_prefix`,
/// `taint.uses`, `taint.defines`, `taint.extra_defines`,
/// `taint.const_text`, …). Rather than shuttling each of those through a
/// `CfgView` accessor at every callsite, we store a full serde-able
/// [`crate::cfg::NodeInfo`] snapshot here so the indexed-scan path can
/// rehydrate an equivalent `Cfg` on load (see [`rebuild_body_graph`]).
/// Both scan paths then feed the same `&Cfg` into the taint engine, and
/// cross-file inline fires regardless of whether the body came from pass
/// 1 or from SQLite.
/// Pre-lowered and optimized SSA body for a function,
/// ready for context-sensitive re-analysis with different argument taint.
///
/// For intra-file use, `node_meta` is empty and the original CFG is used.
/// For cross-file persistence, `node_meta` carries the minimal CFG
/// metadata needed by the symex executor.
/// Populate `node_meta` from the original CFG for cross-file persistence.
///
/// Returns `true` if all referenced NodeIndex values were resolved
/// successfully. Returns `false` if any node was out of bounds (body is
/// ineligible for cross-file use).
/// Synthesize a proxy [`crate::cfg::Cfg`] from `node_meta` so the taint
/// engine can index `cfg[inst.cfg_node]` uniformly on the indexed-scan
/// path.
///
/// When the callee body was loaded from SQLite, `body_graph` is `None`
/// (it is `#[serde(skip)]`), but `node_meta` carries a full
/// [`crate::cfg::NodeInfo`] for every referenced NodeIndex (see
/// [`populate_node_meta`]). This helper rebuilds a petgraph `Cfg` with
/// nodes at exactly the right NodeIndex positions so the taint engine's
/// existing indexing works without change.
///
/// Returns `true` if a proxy graph was freshly installed. Idempotent:
/// subsequent calls are cheap no-ops once `body_graph` is `Some`. No-op
/// for intra-file bodies (which arrive with `body_graph` already set and
/// `node_meta` empty).