// lucet_runtime_internals/instance/execution.rs
//! The `execution` module contains state for an instance's execution, and exposes functions
//! that build that state into something appropriate for safe use externally.
//!
//! So far as state tracked in this module is concerned, there are two key items: "terminability"
//! and "execution domain".
//!
//! ## Terminability
//! This specifically answers the question "is it safe to initiate termination of this instance
//! right now?". An instance becomes terminable when it begins executing, and stops being
//! terminable when it is terminated, or when it stops executing. Termination does not directly map
//! to the idea of guest code currently executing on a processor, because termination can occur
//! during host code, or while a guest has yielded execution. As a result, termination can only be
//! treated as a best-effort attempt to deschedule a guest. It is typically quick when it occurs
//! during guest code execution, or immediately upon resuming execution of guest code (exiting
//! host code, or resuming a yielded instance).
//!
//! ## Execution Domain
//! Execution domains allow us to distinguish which mechanism is appropriate for signalling
//! termination. This means that changing the execution domain must be atomic: it would be an
//! error to read the current execution domain, use that value to decide how to terminate, and
//! have execution simultaneously continue, possibly into a different execution domain. For
//! example, it would be an error to begin termination right at the start of a hostcall, while
//! sending `SIGALRM` may still be appropriate, and then have the domain switch to `Hostcall`,
//! where signalling is no longer appropriate.
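/*
Editorial sketch, not part of this module's API: the atomicity requirement described under
"Execution Domain" can be illustrated by reading the domain and acting on it while holding
one lock. `SketchDomain`, `Mechanism`, and `choose_mechanism` are hypothetical names made
up for this example; the sketch assumes only that the domain lives behind a `Mutex`.

```rust
use std::sync::Mutex;

/// Simplified stand-in for the module's execution domain.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SketchDomain {
    Guest,
    Hostcall,
    Terminated,
}

/// Hypothetical type naming the mechanism that was chosen.
#[derive(Debug, PartialEq)]
pub enum Mechanism {
    /// Guest code is running, so an asynchronous signal is appropriate.
    Signal,
    /// A hostcall is running, so termination is deferred until it exits.
    Deferred,
    /// Termination had already been requested.
    AlreadyRequested,
}

/// Reads the domain and acts on it while holding the lock, so a domain
/// switch cannot happen between the read and the action.
pub fn choose_mechanism(domain: &Mutex<SketchDomain>) -> Mechanism {
    let mut current = domain.lock().unwrap();
    match *current {
        // A real implementation would send the signal here, still under the lock.
        SketchDomain::Guest => Mechanism::Signal,
        SketchDomain::Hostcall => {
            // Record the request so the hostcall exit path can observe it.
            *current = SketchDomain::Terminated;
            Mechanism::Deferred
        }
        SketchDomain::Terminated => Mechanism::AlreadyRequested,
    }
}
```

Any code that switches the domain would have to take the same lock, which is what rules
out the "read, then switch, then signal" interleaving.
*/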
//!
//! ## Instance Lifecycle and `KillState`
//!
//! Now we can enumerate interleavings of execution and timeout, to see the expected state at
//! possible points of interest in an instance's lifecycle:
//!
//! * `Instance created`
//!   - terminable: `false`
//!   - execution_domain: `Guest`
//! * `Instance::run called`
//!   - terminable: `true`
//!   - execution_domain: `Guest`
//! * `Instance::run executing`
//!   - terminable: `true, or false`
//!   - execution_domain: `Guest, Hostcall, or Terminated`
//!   - `execution_domain` will only be `Guest` when executing guest code, only be `Hostcall` when
//!     executing a hostcall, but may also be `Terminated` while in a hostcall to indicate that it
//!     should exit when the hostcall completes.
//!   - `terminable` will be false if and only if `execution_domain` is `Terminated`.
//! * `Instance::run returns`
//!   - terminable: `false`
//!   - execution_domain: `Guest, Hostcall, or Terminated`
//!   - `execution_domain` will be `Guest` when the initial guest function returns, `Hostcall` when
//!     terminated by `lucet_hostcall_terminate!`, and `Terminated` when exiting due to a
//!     termination request.
//! * `Guest function executing`
//!   - terminable: `true`
//!   - execution_domain: `Guest`
//! * `Guest function returns`
//!   - terminable: `true`
//!   - execution_domain: `Guest`
//! * `Hostcall called`
//!   - terminable: `true`
//!   - execution_domain: `Hostcall`
//! * `Hostcall executing`
//!   - terminable: `true, or false`
//!   - execution_domain: `Hostcall, or Terminated`
//!   - `execution_domain` will typically be `Hostcall`, but may be `Terminated` if termination of
//!     the instance is requested during the hostcall.
//!   - `terminable` will be false if and only if `execution_domain` is `Terminated`.
//! * `Hostcall yields`
//!   - This is a specific point in "Hostcall executing" and has no further semantics.
//! * `Hostcall resumes`
//!   - This is a specific point in "Hostcall executing" and has no further semantics.
//! * `Hostcall returns`
//!   - terminable: `true`
//!   - execution_domain: `Guest`
//!   - `execution_domain` may be `Terminated` before returning, in which case `terminable` will be
//!     false, but the hostcall would then exit. If a hostcall successfully returns to its caller it
//!     was not terminated, so the only state an instance will have after returning from a hostcall
//!     will be that it's executing terminable guest code.
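/*
Editorial sketch, not part of this module's API: the hostcall portion of the lifecycle
listed above can be modelled as a small value-level state machine. `ModelState`,
`ModelDomain`, and the method names are hypothetical. The model covers only a termination
request that arrives during a hostcall; requests in any other state are left out of the
sketch and return the state unchanged.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ModelDomain {
    Guest,
    Hostcall,
    Terminated,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ModelState {
    pub terminable: bool,
    pub domain: ModelDomain,
}

impl ModelState {
    /// `Instance created`: not terminable, domain `Guest`.
    pub fn created() -> Self {
        ModelState { terminable: false, domain: ModelDomain::Guest }
    }

    /// `Instance::run called`: the instance becomes terminable.
    pub fn run_called(self) -> Self {
        ModelState { terminable: true, ..self }
    }

    /// `Hostcall called`: the domain switches to `Hostcall`.
    pub fn hostcall_called(self) -> Self {
        ModelState { domain: ModelDomain::Hostcall, ..self }
    }

    /// A termination request that arrives during a hostcall.
    pub fn terminate_requested(self) -> Self {
        if self.terminable && self.domain == ModelDomain::Hostcall {
            ModelState { terminable: false, domain: ModelDomain::Terminated }
        } else {
            // Other cases are outside the scope of this sketch.
            self
        }
    }

    /// `Hostcall returns`: `Some(next)` if control goes back to guest code,
    /// `None` if the instance exits instead.
    pub fn hostcall_returns(self) -> Option<Self> {
        match self.domain {
            ModelDomain::Hostcall => Some(ModelState { domain: ModelDomain::Guest, ..self }),
            ModelDomain::Terminated => None,
            ModelDomain::Guest => panic!("model error: not in a hostcall"),
        }
    }
}
```

Walking `created().run_called().hostcall_called()` through `hostcall_returns()` with and
without an intervening `terminate_requested()` reproduces the two outcomes described
under `Hostcall returns`.
*/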

use libc::{pthread_kill, pthread_t, SIGALRM};
use std::mem;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Condvar, Mutex, Weak};

use crate::instance::{Instance, TerminationDetails};

/// All instance state a remote kill switch needs to determine if and how to signal that execution
/// should stop.
///
/// Some definitions for reference in this struct's documentation:
/// * "stopped" means "stop executing at some point before reaching the end of the entrypoint
///   wasm function".
/// * "critical section" means what it typically means - an uninterruptible region of code. The
///   detail here is that currently "critical section" and "hostcall" are interchangeable, but in
///   the future this may change. Hostcalls may one day be able to opt out of criticalness, or
///   perhaps guest code may include critical sections.
///
/// "Stopped" is a particularly loose word here because it encompasses the worst case: trying to
/// stop a guest that is currently in a critical section. Because the signal will only be checked
/// when exiting the critical section, the latency is bounded only by whatever guarantees the
/// embedder makes. In fact, it is possible for a kill signal to be successfully sent and still
/// never have an effect, if a hostcall itself invokes `lucet_hostcall_terminate!`. In this
/// circumstance, the hostcall would terminate the instance if it returned, but
/// `lucet_hostcall_terminate!` will terminate the guest before the termination request would
/// even be checked.
pub struct KillState {
    /// Can the instance be terminated? This must be `true` only when the instance can be stopped.
    /// This may be false while the instance can safely be stopped, such as immediately after
    /// completing a host->guest context swap. Regions such as this should be minimized, but are
    /// not a problem of correctness.
    ///
    /// Typically, this is true while in any guest code, or hostcalls made from guest code.
    terminable: AtomicBool,
    /// The kind of code currently executing in the instance this `KillState` describes.
    ///
    /// This allows a `KillSwitch` to determine what the appropriate signalling mechanism is in
    /// `terminate`. Locks on `execution_domain` prohibit modification while signalling, ensuring
    /// both that:
    /// * we don't enter a hostcall while someone may decide it is safe to signal, and
    /// * no one may try to signal in a hostcall-safe manner after exiting a hostcall, where it
    ///   may never again be checked by the guest.
    execution_domain: Mutex<Domain>,
    /// The current `thread_id` the associated instance is running on. This is the thread to which
    /// `SIGALRM` will be sent if the instance is killed via `KillSwitch::terminate` and a signal
    /// is an appropriate mechanism.
    thread_id: Mutex<Option<pthread_t>>,
    /// `tid_change_notifier` allows functions that may cause a change in `thread_id` to wait,
    /// without spinning, for the signal to be processed.
    tid_change_notifier: Condvar,
}
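/*
Editorial sketch, not part of this module's API: the `thread_id` / `tid_change_notifier`
pairing documented above follows the usual `Mutex` plus `Condvar` pattern for waiting
without spinning. `Scheduled`, `wait_until_descheduled`, and `demo` are hypothetical
names, and a plain `u64` stands in for `pthread_t`.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for the `thread_id` / `tid_change_notifier` pair.
pub struct Scheduled {
    pub thread_id: Mutex<Option<u64>>,
    pub changed: Condvar,
}

/// Blocks until `thread_id` is `None`, sleeping on the condition variable
/// instead of spinning. The loop guards against spurious wakeups.
pub fn wait_until_descheduled(s: &Scheduled) {
    let mut tid = s.thread_id.lock().unwrap();
    while tid.is_some() {
        tid = s.changed.wait(tid).unwrap();
    }
}

/// Runs one waiter and one "descheduling" thread, then returns the final id.
pub fn demo() -> Option<u64> {
    let s = Arc::new(Scheduled {
        thread_id: Mutex::new(Some(7)),
        changed: Condvar::new(),
    });
    let worker = {
        let s = Arc::clone(&s);
        thread::spawn(move || {
            // Clear the id under the lock, then wake every waiter.
            *s.thread_id.lock().unwrap() = None;
            s.changed.notify_all();
        })
    };
    wait_until_descheduled(&s);
    worker.join().unwrap();
    let final_tid = *s.thread_id.lock().unwrap();
    final_tid
}
```

Because the waiter re-checks the value under the lock, it does not matter whether the
notification is sent before or after the waiter starts waiting.
*/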

/// Clears the `terminable` flag as the instance leaves a guest region.
///
/// If the flag has already been taken by something else, this function does not return: it
/// spins until the pending signal arrives.
///
/// # Safety
///
/// `instance` must be a valid, non-null pointer to a live `Instance`.
pub unsafe extern "C" fn exit_guest_region(instance: *mut Instance) {
    let terminable = (*instance)
        .kill_state
        .terminable
        .swap(false, Ordering::SeqCst);
    if !terminable {
        // Something else has taken the terminable flag, so it's not safe to actually exit a
        // guest context yet. Because this is called when exiting a guest context, the
        // termination mechanism will be a signal, delivered at some point (hopefully soon!).
        // Further, because the termination mechanism will be a signal, we are constrained to
        // only signal-safe behavior.
        //
        // For now, hang indefinitely, waiting for the SIGALRM to arrive.

        loop {}
    }
}
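/*
Editorial sketch, not part of this module's API: `exit_guest_region` and
`KillSwitch::terminate` both "take" the `terminable` flag with an atomic `swap(false)`,
so at most one of them can observe `true`. The names `try_take` and `count_winners` are
hypothetical and exist only to demonstrate that property with several racing threads.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Tries to take the flag. Returns `true` only for the caller that
/// observed the flag set; the flag is always left cleared.
pub fn try_take(flag: &AtomicBool) -> bool {
    flag.swap(false, Ordering::SeqCst)
}

/// Spawns `n` threads that race to take one flag and returns how many won.
pub fn count_winners(n: usize) -> usize {
    let flag = Arc::new(AtomicBool::new(true));
    let winners = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let flag = Arc::clone(&flag);
            let winners = Arc::clone(&winners);
            thread::spawn(move || {
                if try_take(&flag) {
                    winners.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    winners.load(Ordering::SeqCst)
}
```

A separate load followed by a store would not give this guarantee, because two callers
could both read `true` before either one writes `false`.
*/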

impl KillState {
    /// Creates the state for a new instance: not terminable, in the `Guest` domain, and not
    /// scheduled on any thread.
    pub fn new() -> KillState {
        KillState {
            terminable: AtomicBool::new(false),
            tid_change_notifier: Condvar::new(),
            execution_domain: Mutex::new(Domain::Guest),
            thread_id: Mutex::new(None),
        }
    }

    pub fn is_terminable(&self) -> bool {
        self.terminable.load(Ordering::SeqCst)
    }

    pub fn enable_termination(&self) {
        self.terminable.store(true, Ordering::SeqCst);
    }

    pub fn disable_termination(&self) {
        self.terminable.store(false, Ordering::SeqCst);
    }

    pub fn terminable_ptr(&self) -> *const AtomicBool {
        &self.terminable as *const AtomicBool
    }

    /// Switches the execution domain from `Guest` to `Hostcall`.
    ///
    /// Panics if the instance is not currently marked as executing guest code.
    pub fn begin_hostcall(&self) {
        // Lock the current execution domain, so we can update to `Hostcall`.
        let mut current_domain = self.execution_domain.lock().unwrap();
        match *current_domain {
            Domain::Guest => {
                // Guest is the expected domain until this point. Switch to the Hostcall
                // domain so we know to not interrupt this instance.
                *current_domain = Domain::Hostcall;
            }
            Domain::Hostcall => {
                panic!(
                    "Invalid state: Instance marked as in a hostcall while entering a hostcall."
                );
            }
            Domain::Terminated => {
                panic!("Invalid state: Instance marked as terminated while in guest code. This should be an error.");
            }
        }
    }

    /// Switches the execution domain back to `Guest`, or returns the termination details if
    /// termination was requested while the hostcall was executing.
    ///
    /// Panics if the instance is marked as executing guest code.
    pub fn end_hostcall(&self) -> Option<TerminationDetails> {
        let mut current_domain = self.execution_domain.lock().unwrap();
        match *current_domain {
            Domain::Guest => {
                panic!("Invalid state: Instance marked as in guest code while exiting a hostcall.");
            }
            Domain::Hostcall => {
                *current_domain = Domain::Guest;
                None
            }
            Domain::Terminated => {
                // The instance was stopped in the hostcall we were executing.
                debug_assert!(!self.terminable.load(Ordering::SeqCst));
                mem::drop(current_domain);
                Some(TerminationDetails::Remote)
            }
        }
    }

    /// Records the thread the instance is now running on and wakes any waiters.
    pub fn schedule(&self, tid: pthread_t) {
        *self.thread_id.lock().unwrap() = Some(tid);
        self.tid_change_notifier.notify_all();
    }

    /// Records that the instance is no longer running on any thread and wakes any waiters.
    pub fn deschedule(&self) {
        *self.thread_id.lock().unwrap() = None;
        self.tid_change_notifier.notify_all();
    }
}
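
/// The kind of code an instance is executing, as far as termination is concerned.
pub enum Domain {
    /// Guest code is executing, so a signal is an appropriate way to terminate.
    Guest,
    /// A hostcall is executing, so termination must wait until the hostcall exits.
    Hostcall,
    /// Termination was requested during a hostcall; the instance should exit when the hostcall
    /// completes.
    Terminated,
}

/// An object that can be used to terminate an instance's execution from a separate thread.
pub struct KillSwitch {
    state: Weak<KillState>,
}

#[derive(Debug, PartialEq)]
pub enum KillSuccess {
    /// A `SIGALRM` was sent to the thread running the instance.
    Signalled,
    /// The instance was in a hostcall and is marked to terminate when the hostcall exits.
    Pending,
}

#[derive(Debug, PartialEq)]
pub enum KillError {
    /// The instance was discarded, is not currently terminable, or is already terminating.
    NotTerminable,
}

type KillResult = Result<KillSuccess, KillError>;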

impl KillSwitch {
    pub(crate) fn new(state: Weak<KillState>) -> Self {
        KillSwitch { state }
    }

    /// Signal the instance associated with this `KillSwitch` to stop, if possible.
    ///
    /// The returned `Result` only describes the behavior taken by this function, not necessarily
    /// what caused the associated instance to stop.
    ///
    /// As an example, if a `KillSwitch` fires, sending a SIGALRM to an instance at the same
    /// moment it begins handling a SIGSEGV which is determined to be fatal, the instance may
    /// stop with `State::Faulted` before actually _handling_ the SIGALRM we'd send here. So the
    /// host code will see `State::Faulted` as an instance state, where `KillSwitch::terminate`
    /// would return `Ok(KillSuccess::Signalled)`.
    pub fn terminate(&self) -> KillResult {
        // Get the underlying kill state. If this fails, it means the instance exited and was
        // discarded, so we can't terminate.
        let state = self.state.upgrade().ok_or(KillError::NotTerminable)?;

        // Attempt to take the flag indicating the instance may terminate.
        let terminable = state.terminable.swap(false, Ordering::SeqCst);
        if !terminable {
            return Err(KillError::NotTerminable);
        }

        // We got it! We can signal the instance.

        // Now check what domain the instance is in. We can signal in guest code, but want
        // to avoid signalling in host code lest we interrupt some function operating on
        // guest/host shared memory, and invalidate invariants. For example, interrupting
        // in the middle of a resize operation on a `Vec` could be extremely dangerous.
        //
        // Hold this lock through all signalling logic to prevent the instance from
        // switching domains (and invalidating safety of whichever mechanism we choose here).
        let mut execution_domain = state.execution_domain.lock().unwrap();

        let result = match *execution_domain {
            Domain::Guest => {
                let mut curr_tid = state.thread_id.lock().unwrap();
                // We're in guest code, so we can just send a signal.
                if let Some(thread_id) = *curr_tid {
                    unsafe {
                        pthread_kill(thread_id, SIGALRM);
                    }

                    // Wait for the SIGALRM handler to deschedule the instance.
                    //
                    // The body of this loop should normally run only once. Running it again
                    // would indicate the instance was moved to another thread, or that we got
                    // spuriously notified.
                    while curr_tid.is_some() {
                        curr_tid = state.tid_change_notifier.wait(curr_tid).unwrap();
                    }
                    Ok(KillSuccess::Signalled)
                } else {
                    panic!("logic error: instance is terminable but not actually running.");
                }
            }
            Domain::Hostcall => {
                // The guest is in a hostcall, so the only thing we can do is indicate it
                // should terminate and wait.
                *execution_domain = Domain::Terminated;
                Ok(KillSuccess::Pending)
            }
            Domain::Terminated => {
                // Something else (another KillSwitch?) has already signalled this instance
                // to exit when it has completed its hostcall. Nothing to do here.
                Err(KillError::NotTerminable)
            }
        };
        // Explicitly drop the lock to be clear about how long we want to hold it, which is
        // until all signalling is complete.
        mem::drop(execution_domain);
        result
    }
}
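/*
Editorial sketch, not part of this module's API: `KillSwitch` holds a `Weak<KillState>`,
so a switch that outlives its instance fails cleanly instead of keeping the state alive.
The sketch below isolates that one step of `terminate`. `SharedState`, `SketchSwitch`,
and `SketchError` are hypothetical stand-ins, and the real method does considerably more
after the upgrade succeeds.

```rust
use std::sync::{Arc, Weak};

/// Hypothetical stand-in for the shared kill state.
pub struct SharedState;

#[derive(Debug, PartialEq)]
pub enum SketchError {
    NotTerminable,
}

/// Holds only a weak reference, so it never keeps the state alive.
pub struct SketchSwitch {
    state: Weak<SharedState>,
}

impl SketchSwitch {
    pub fn new(state: &Arc<SharedState>) -> Self {
        SketchSwitch {
            state: Arc::downgrade(state),
        }
    }

    /// Succeeds only while some owner still holds the state.
    pub fn terminate(&self) -> Result<(), SketchError> {
        // `upgrade` returns `None` once every `Arc` has been dropped.
        let _state = self.state.upgrade().ok_or(SketchError::NotTerminable)?;
        Ok(())
    }
}
```

With this shape, dropping the owning `Arc` is enough to make every outstanding switch
report `NotTerminable`, with no extra bookkeeping.
*/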