//! Lazy retry wrappers for the boot-race-sensitive accessors.
//!
//! Three coordinator-state caches need to be populated lazily because
//! their construction depends on guest-memory bootstrap symbols that
//! the kernel only writes during boot (`page_offset_base`,
//! `pgtable_l5_enabled`, `init_top_pgt`, `__per_cpu_offset[]`):
//!
//! 1. [`crate::monitor::bpf_map::GuestMemMapAccessorOwned`] — backs
//! every BPF map discovery the freeze coord performs (probe `.bss`
//! PA cache, watchpoint target resolution, dump map rendering).
//! 2. [`crate::monitor::bpf_prog::GuestMemProgAccessorOwned`] — backs
//! the prog-runtime-stats capture in `dump_state`.
//! 3. The per-CPU offset array — read once via
//! [`crate::monitor::symbols::read_per_cpu_offsets`] and cached
//! for the rest of the run; gated on every entry being non-zero so
//! a partially-online VM doesn't poison the cache with a CPU 0
//! alias for not-yet-online CPUs (the rq PA invariant).
//!
//! All three retry blocks share the same `(mem, vmlinux, tcr_el1,
//! cr3)` input shape, which is the GuestKernel handshake context the
//! coordinator captures at run_vm scope. Lifting them into named
//! `pub(super) fn` lets unit tests drive the boot-race window
//! deterministically: a test constructs a `GuestMem` and feeds
//! controlled `(tcr, cr3)` snapshots through `try_init_*` to assert
//! the cache transitions from None → Some on the first successful
//! attempt and stays Some thereafter.
//!
//! # No state-machine semantics change
//!
//! Each `try_init_*` is byte-for-byte identical to the inline
//! retry block: same Acquire load on the cr3 / tcr_el1 atoms (the
//! cr3 cache may flip mid-run as the BSP loop refines the
//! page-table root, so the load happens INSIDE the helper —
//! capturing it pre-call would freeze a stale value), same gate on
//! every per-CPU offset slot being non-zero before caching. The
//! accessor helpers return the constructor's `anyhow::Result`
//! verbatim so the caller can capture the most recent error
//! message and surface it as a warn after enough retries (the
//! per-CPU offsets helper still returns `Option` because its
//! failure mode includes the "any slot still zero" non-error
//! retry condition).
use std::sync::Arc;
use std::sync::atomic::AtomicU64;

use crate::monitor;
/// Try to construct a [`monitor::bpf_map::GuestMemMapAccessorOwned`]
/// for the lifetime of `mem` and `vmlinux`. Returns `Err` if the
/// constructor's `GuestKernel` handshake fails (still-booting guest
/// has not yet populated the boot-time symbols); caller leaves its
/// cache `None`, retries on the next scan tick, and tracks the most
/// recent `Err` so a permanent failure (e.g. stripped vmlinux missing
/// `map_idr`) can be surfaced as a warn after enough retries instead
/// of disappearing silently behind `.ok()`.
///
/// `tcr_el1` is `Option<&Arc<AtomicU64>>` because aarch64 holds the
/// register cache while x86_64 always passes `None`. The Acquire load
/// happens INSIDE the helper so a fresh value is observed each
/// iteration — capturing it pre-call would freeze a stale snapshot
/// the BSP loop hasn't refined yet.
///
/// `data` is the cached vmlinux bytes the coordinator reads once at
/// run scope; the helper re-parses the ELF on each retry (parsing the
/// cached bytes is microseconds, the original `std::fs::read` was
/// 14-28 s on cold disk cache). `vmlinux` is still passed through for
/// the BTF sidecar cache lookup inside `BpfMapOffsets::from_elf`.
// The two signatures below are reconstructed from the doc comments;
// the `GuestMem` type path and parameter names are assumptions, and
// the bodies were lost in extraction.
pub(super) fn try_init_map_accessor(
    mem: &GuestMem,
    vmlinux: &std::path::Path,
    data: &[u8],
    tcr_el1: Option<&Arc<AtomicU64>>,
    cr3: &Arc<AtomicU64>,
) -> anyhow::Result<monitor::bpf_map::GuestMemMapAccessorOwned> {
    /* … */
}

/// Same retry shape for the prog accessor that backs the
/// prog-runtime-stats capture in `dump_state`; parameters mirror
/// [`try_init_map_accessor`].
pub(super) fn try_init_prog_accessor(
    mem: &GuestMem,
    vmlinux: &std::path::Path,
    data: &[u8],
    tcr_el1: Option<&Arc<AtomicU64>>,
    cr3: &Arc<AtomicU64>,
) -> anyhow::Result<monitor::bpf_prog::GuestMemProgAccessorOwned> {
    /* … */
}
/// Resolve and cache the per-CPU offset array. Returns `Some(offsets)`
/// only when every slot is non-zero so a partially-online VM does
/// not poison the cache with a CPU 0 alias for not-yet-online CPUs
/// (rq PA invariant; fix for `compute_rq_pas` wraparound when a
/// `pco_offset == 0` is fed downstream). Returns `None` when:
///
/// * `per_cpu_offset_kva == 0` (caller's symbol cache had no entry
/// for `__per_cpu_offset` — typically a stripped vmlinux image),
/// OR
/// * any slot is still zero (caller leaves cache `None`, retries
/// next scan tick).
///
/// Takes a pre-resolved `per_cpu_offset_kva` from the coordinator's
/// `dump_cpu_time_symbols` cache instead of re-running
/// `KernelSymbols::from_vmlinux` on every scan tick. The previous
/// in-helper parse re-read the entire vmlinux ELF (50 MB+) and
/// re-built every symbol-table entry every 100 ms while waiting for
/// the per-CPU areas to come up — visible as ~MB/s of constant
/// post-boot file I/O on every ktstr run. The KVA is fixed at
/// kernel link time and the caller already resolved it once at
/// coord start; passing it through eliminates the redundant work
/// without changing the post-resolution invariants.
///
/// `phys_base` is sourced by the caller from the owned accessor's
/// `GuestKernel::phys_base()` when it has landed; otherwise `0`
/// (correct on non-KASLR boots and the bootstrap value before the
/// accessor's page-table walk has resolved phys_base for the live
/// kernel).
// Signature reconstructed from the doc comment above; the `GuestMem`
// type path and parameter names are assumptions, and the body was
// lost in extraction.
pub(super) fn try_init_per_cpu_offsets(
    mem: &GuestMem,
    per_cpu_offset_kva: u64,
    phys_base: u64,
    tcr_el1: Option<&Arc<AtomicU64>>,
    cr3: &Arc<AtomicU64>,
) -> Option<Vec<u64>> {
    /* … */
}
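// The "every slot non-zero" gate described above can be modeled with a
// small self-contained sketch (illustrative only): the name
// `gate_per_cpu_offsets` and the plain-slice input are assumptions; the
// real helper reads the guest's `__per_cpu_offset[]` array through the
// owned accessor.

```rust
/// Cache the per-CPU offsets only when every slot is non-zero; a zero
/// slot means that CPU's per-CPU area is not yet online, and caching it
/// would alias not-yet-online CPUs to CPU 0 (the rq PA invariant).
#[allow(dead_code)]
fn gate_per_cpu_offsets(slots: &[u64]) -> Option<Vec<u64>> {
    if slots.is_empty() || slots.iter().any(|&s| s == 0) {
        None // leave the cache `None`; retry on the next scan tick
    } else {
        Some(slots.to_vec())
    }
}
```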