lucet_runtime_internals/instance/
execution.rs

//! The `execution` module contains state for an instance's execution, and exposes functions
//! building that state into something appropriate for safe use externally.
//!
//! So far as state tracked in this module is concerned, there are two key items: "terminability"
//! and "execution domain".
//!
//! ## Terminability
//! This specifically answers the question "is it safe to initiate termination of this instance
//! right now?". An instance becomes terminable when it begins executing, and stops being
//! terminable when it is terminated, or when it stops executing. Terminability does not directly
//! map to the idea of guest code currently executing on a processor, because termination can
//! occur during host code, or while a guest has yielded execution. As a result, termination can
//! only be treated as a best-effort attempt to deschedule a guest. It is typically quick when it
//! occurs during guest code execution, or immediately upon resuming execution of guest code
//! (exiting host code, or resuming a yielded instance).
//!
//! ## Execution Domain
//! Execution domains allow us to distinguish which mechanism is appropriate for signalling
//! termination. This means that changing an execution domain must be atomic: it would be an
//! error to read the current execution domain, choose a termination mechanism based on that
//! domain, and simultaneously have execution continue into a different execution domain. For
//! example, suppose termination begins just as a hostcall starts. The domain is read as `Guest`,
//! where sending `SIGALRM` may be appropriate, but the domain then switches to `Hostcall`, where
//! signalling is no longer appropriate. That interleaving would be an error.
//!
//! ## Instance Lifecycle and `KillState`
//!
//! With both pieces of state defined, we can enumerate interleavings of execution and timeout,
//! to see the expected state at possible points of interest in an instance's lifecycle:
//!
//! * `Instance created`
//!   - terminable: `false`
//!   - execution_domain: `Guest`
//! * `Instance::run called`
//!   - terminable: `true`
//!   - execution_domain: `Guest`
//! * `Instance::run executing`
//!   - terminable: `true, or false`
//!   - execution_domain: `Guest, Hostcall, or Terminated`
//!   - `execution_domain` will only be `Guest` when executing guest code, only be `Hostcall` when
//!   executing a hostcall, but may also be `Terminated` while in a hostcall to indicate that it
//!   should exit when the hostcall completes.
//!   - `terminable` will be false if and only if `execution_domain` is `Terminated`.
//! * `Instance::run returns`
//!   - terminable: `false`
//!   - execution_domain: `Guest, Hostcall, or Terminated`
//!   - `execution_domain` will be `Guest` when the initial guest function returns, `Hostcall` when
//!   terminated by `lucet_hostcall_terminate!`, and `Terminated` when exiting due to a termination
//!   request.
//! * `Guest function executing`
//!   - terminable: `true`
//!   - execution_domain: `Guest`
//! * `Guest function returns`
//!   - terminable: `true`
//!   - execution_domain: `Guest`
//! * `Hostcall called`
//!   - terminable: `true`
//!   - execution_domain: `Hostcall`
//! * `Hostcall executing`
//!   - terminable: `true, or false`
//!   - execution_domain: `Hostcall, or Terminated`
//!   - `execution_domain` will typically be `Hostcall`, but may be `Terminated` if termination of
//!   the instance is requested during the hostcall.
//!   - `terminable` will be false if and only if `execution_domain` is `Terminated`.
//! * `Hostcall yields`
//!   - This is a specific point in "Hostcall executing" and has no further semantics.
//! * `Hostcall resumes`
//!   - This is a specific point in "Hostcall executing" and has no further semantics.
//! * `Hostcall returns`
//!   - terminable: `true`
//!   - execution_domain: `Guest`
//!   - `execution_domain` may be `Terminated` before returning, in which case `terminable` will be
//!   false, but the hostcall would then exit instead of returning. If a hostcall successfully
//!   returns to its caller it was not terminated, so the only state an instance will have after
//!   returning from a hostcall is that it is executing terminable guest code.

use libc::{pthread_kill, pthread_t, SIGALRM};
use std::mem;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Condvar, Mutex, Weak};

use crate::instance::{Instance, TerminationDetails};

/// All instance state a remote kill switch needs to determine if and how to signal that execution
/// should stop.
///
/// Some definitions for reference in this struct's documentation:
/// * "stopped" means "stop executing at some point before reaching the end of the entrypoint
/// wasm function".
/// * "critical section" means what it typically means: an uninterruptible region of code. The
/// detail here is that currently "critical section" and "hostcall" are interchangeable, but in
/// the future this may change. Hostcalls may one day be able to opt out of criticalness, or
/// perhaps guest code may include critical sections.
///
/// "Stopped" is a particularly loose word here because it encompasses the worst case: trying to
/// stop a guest that is currently in a critical section. Because the signal will only be checked
/// when exiting the critical section, the latency is bounded only by whatever guarantees the
/// embedder makes. In fact, it is possible for a kill signal to be sent successfully and still
/// have no effect, if a hostcall itself invokes `lucet_hostcall_terminate!`. In this
/// circumstance, the pending request would terminate the instance if the hostcall returned, but
/// `lucet_hostcall_terminate!` will terminate the guest before the termination request would even
/// be checked.
pub struct KillState {
    /// Can the instance be terminated? This must be `true` only when the instance can be stopped.
    /// This may be `false` while the instance can safely be stopped, such as immediately after
    /// completing a host->guest context swap. Regions such as this should be minimized, but are
    /// not a problem of correctness.
    ///
    /// Typically, this is `true` while in any guest code, or hostcalls made from guest code.
    terminable: AtomicBool,
    /// The kind of code currently executing in the instance this `KillState` describes.
    ///
    /// This allows a `KillSwitch` to determine what the appropriate signalling mechanism is in
    /// `terminate`. Locks on `execution_domain` prohibit modification while signalling, ensuring
    /// both that:
    /// * we don't enter a hostcall while someone may decide it is safe to signal, and
    /// * no one may try to signal in a hostcall-safe manner after exiting a hostcall, where it
    ///   may never again be checked by the guest.
    execution_domain: Mutex<Domain>,
    /// The current `thread_id` the associated instance is running on. This is the TID where
    /// `SIGALRM` will be sent if the instance is killed via `KillSwitch::terminate` and a signal
    /// is an appropriate mechanism.
    thread_id: Mutex<Option<pthread_t>>,
    /// `tid_change_notifier` allows functions that may cause a change in `thread_id` to wait,
    /// without spinning, for the signal to be processed.
    tid_change_notifier: Condvar,
}

/// Give up the instance's `terminable` flag when exiting a guest context.
///
/// If the flag has already been taken, a termination is in progress and a signal is expected, so
/// this function spins instead of returning.
///
/// # Safety
///
/// `instance` must be a valid pointer to a live `Instance`; it is dereferenced unconditionally.
pub unsafe extern "C" fn exit_guest_region(instance: *mut Instance) {
    let terminable = (*instance)
        .kill_state
        .terminable
        .swap(false, Ordering::SeqCst);
    if !terminable {
        // Something else has taken the terminable flag, so it's not safe to actually exit a
        // guest context yet. Because this is called when exiting a guest context, the
        // termination mechanism will be a signal, delivered at some point (hopefully soon!).
        // Further, because the termination mechanism will be a signal, we are constrained to
        // only signal-safe behavior.
        //
        // For now, hang indefinitely, waiting for the `SIGALRM` to arrive.

        loop {}
    }
}
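
// Illustrative sketch, not part of the runtime: `exit_guest_region` above and
// `KillSwitch::terminate` below both call `swap(false, ..)` on the same flag, so exactly one of
// them observes `true` and "wins". The module and function names here (`terminable_flag_sketch`,
// `try_claim`, `race_once`) are hypothetical and exist only to demonstrate that handoff using
// the standard library.
#[allow(dead_code)]
mod terminable_flag_sketch {
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;
    use std::thread;

    /// Atomically take the flag. Returns `true` only for the caller that found it set.
    pub fn try_claim(flag: &AtomicBool) -> bool {
        flag.swap(false, Ordering::SeqCst)
    }

    /// Race two threads for a single flag and count how many of them claimed it.
    pub fn race_once() -> usize {
        let flag = Arc::new(AtomicBool::new(true));
        let handles: Vec<_> = (0..2)
            .map(|_| {
                let flag = Arc::clone(&flag);
                thread::spawn(move || try_claim(&flag))
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .filter(|won| *won)
            .count()
    }
}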

impl KillState {
    pub fn new() -> KillState {
        KillState {
            terminable: AtomicBool::new(false),
            tid_change_notifier: Condvar::new(),
            execution_domain: Mutex::new(Domain::Guest),
            thread_id: Mutex::new(None),
        }
    }

    pub fn is_terminable(&self) -> bool {
        self.terminable.load(Ordering::SeqCst)
    }

    pub fn enable_termination(&self) {
        self.terminable.store(true, Ordering::SeqCst);
    }

    pub fn disable_termination(&self) {
        self.terminable.store(false, Ordering::SeqCst);
    }

    pub fn terminable_ptr(&self) -> *const AtomicBool {
        &self.terminable as *const AtomicBool
    }

    /// Switch the execution domain from `Guest` to `Hostcall`.
    ///
    /// Panics if the instance is not currently marked as executing guest code.
    pub fn begin_hostcall(&self) {
        // Lock the current execution domain, so we can update to `Hostcall`.
        let mut current_domain = self.execution_domain.lock().unwrap();
        match *current_domain {
            Domain::Guest => {
                // Guest is the expected domain until this point. Switch to the Hostcall
                // domain so we know to not interrupt this instance.
                *current_domain = Domain::Hostcall;
            }
            Domain::Hostcall => {
                panic!(
                    "Invalid state: Instance marked as in a hostcall while entering a hostcall."
                );
            }
            Domain::Terminated => {
                panic!("Invalid state: Instance marked as terminated while in guest code. This should be an error.");
            }
        }
    }

    /// Switch the execution domain from `Hostcall` back to `Guest`.
    ///
    /// Returns `Some(TerminationDetails::Remote)` if termination was requested during the
    /// hostcall, in which case the domain is left as `Terminated`.
    pub fn end_hostcall(&self) -> Option<TerminationDetails> {
        let mut current_domain = self.execution_domain.lock().unwrap();
        match *current_domain {
            Domain::Guest => {
                panic!("Invalid state: Instance marked as in guest code while exiting a hostcall.");
            }
            Domain::Hostcall => {
                *current_domain = Domain::Guest;
                None
            }
            Domain::Terminated => {
                // The instance was stopped in the hostcall we were executing.
                debug_assert!(!self.terminable.load(Ordering::SeqCst));
                mem::drop(current_domain);
                Some(TerminationDetails::Remote)
            }
        }
    }

    /// Record the thread the instance is now running on, and wake any waiters.
    pub fn schedule(&self, tid: pthread_t) {
        *self.thread_id.lock().unwrap() = Some(tid);
        self.tid_change_notifier.notify_all();
    }

    /// Record that the instance is no longer running on any thread, and wake any waiters.
    pub fn deschedule(&self) {
        *self.thread_id.lock().unwrap() = None;
        self.tid_change_notifier.notify_all();
    }
}
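
// Illustrative sketch, not part of the runtime: `schedule` and `deschedule` above pair a
// `Mutex<Option<..>>` with a `Condvar` so that a waiter can block, without spinning, until the
// thread id is cleared. The names below (`tid_wait_sketch`, `Slot`, `wait_until_descheduled`,
// `demo`) are hypothetical, and a plain `u64` stands in for `pthread_t` so the example needs
// only the standard library.
#[allow(dead_code)]
mod tid_wait_sketch {
    use std::sync::{Arc, Condvar, Mutex};
    use std::thread;

    pub struct Slot {
        tid: Mutex<Option<u64>>,
        changed: Condvar,
    }

    impl Slot {
        pub fn new() -> Self {
            Slot {
                tid: Mutex::new(None),
                changed: Condvar::new(),
            }
        }

        pub fn schedule(&self, tid: u64) {
            *self.tid.lock().unwrap() = Some(tid);
            self.changed.notify_all();
        }

        pub fn deschedule(&self) {
            *self.tid.lock().unwrap() = None;
            self.changed.notify_all();
        }

        /// Block until the slot is empty. The loop re-checks the condition after every wakeup,
        /// which guards against spurious notifications.
        pub fn wait_until_descheduled(&self) {
            let mut guard = self.tid.lock().unwrap();
            while guard.is_some() {
                guard = self.changed.wait(guard).unwrap();
            }
        }

        pub fn current(&self) -> Option<u64> {
            *self.tid.lock().unwrap()
        }
    }

    /// Schedule on one thread, deschedule from another, and wait for the change.
    pub fn demo() -> Option<u64> {
        let slot = Arc::new(Slot::new());
        slot.schedule(7);
        let worker = {
            let slot = Arc::clone(&slot);
            thread::spawn(move || slot.deschedule())
        };
        slot.wait_until_descheduled();
        worker.join().unwrap();
        slot.current()
    }
}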
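
/// The kind of code an instance is executing, as tracked by `KillState`.
pub enum Domain {
    /// The instance is executing guest code. This is also the initial domain.
    Guest,
    /// The instance is executing a hostcall.
    Hostcall,
    /// Termination was requested during a hostcall; the instance should exit when the hostcall
    /// completes.
    Terminated,
}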
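
// Illustrative sketch, not part of the runtime: the transitions that `begin_hostcall` and
// `end_hostcall` apply to `Domain`, written as pure functions. This is a simplified model under
// two assumptions: invalid transitions return `Err` here where the real methods panic, and the
// `bool` in `on_end_hostcall` stands in for `Some(TerminationDetails::Remote)`. All names in the
// module are hypothetical.
#[allow(dead_code)]
mod domain_transition_sketch {
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum SketchDomain {
        Guest,
        Hostcall,
        Terminated,
    }

    /// Entering a hostcall is only valid from `Guest`.
    pub fn on_begin_hostcall(current: SketchDomain) -> Result<SketchDomain, &'static str> {
        match current {
            SketchDomain::Guest => Ok(SketchDomain::Hostcall),
            SketchDomain::Hostcall => Err("already in a hostcall"),
            SketchDomain::Terminated => Err("terminated while in guest code"),
        }
    }

    /// Leaving a hostcall returns the new domain and whether a remote termination was observed.
    pub fn on_end_hostcall(current: SketchDomain) -> Result<(SketchDomain, bool), &'static str> {
        match current {
            SketchDomain::Guest => Err("not in a hostcall"),
            SketchDomain::Hostcall => Ok((SketchDomain::Guest, false)),
            // The domain stays `Terminated`; the caller is expected to exit the instance.
            SketchDomain::Terminated => Ok((SketchDomain::Terminated, true)),
        }
    }
}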
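
/// An object that can be used to terminate an instance's execution from a separate thread.
pub struct KillSwitch {
    state: Weak<KillState>,
}

/// What `KillSwitch::terminate` did when it succeeded.
#[derive(Debug, PartialEq)]
pub enum KillSuccess {
    /// The instance was in guest code, and `SIGALRM` was sent to the thread running it.
    Signalled,
    /// The instance was in a hostcall, and has been marked to exit when the hostcall completes.
    Pending,
}

/// Why `KillSwitch::terminate` could not terminate the instance.
#[derive(Debug, PartialEq)]
pub enum KillError {
    /// The instance is not terminable: it has been discarded, is not currently terminable, or
    /// has already been asked to terminate.
    NotTerminable,
}

type KillResult = Result<KillSuccess, KillError>;
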
impl KillSwitch {
    pub(crate) fn new(state: Weak<KillState>) -> Self {
        KillSwitch { state }
    }

    /// Signal the instance associated with this `KillSwitch` to stop, if possible.
    ///
    /// The returned `Result` only describes the behavior taken by this function, not necessarily
    /// what caused the associated instance to stop.
    ///
    /// As an example, suppose a `KillSwitch` fires, sending a `SIGALRM` to an instance at the
    /// same moment it begins handling a `SIGSEGV` which is determined to be fatal. The instance
    /// may stop with `State::Faulted` before actually _handling_ the `SIGALRM` we'd send here. The
    /// host code will then see `State::Faulted` as the instance state, even though
    /// `KillSwitch::terminate` returned `Ok(KillSuccess::Signalled)`.
    pub fn terminate(&self) -> KillResult {
        // Get the underlying kill state. If this fails, it means the instance exited and was
        // discarded, so we can't terminate.
        let state = self.state.upgrade().ok_or(KillError::NotTerminable)?;

        // Attempt to take the flag indicating the instance may terminate.
        let terminable = state.terminable.swap(false, Ordering::SeqCst);
        if !terminable {
            return Err(KillError::NotTerminable);
        }

        // We got it! We can signal the instance.

        // Now check what domain the instance is in. We can signal in guest code, but want
        // to avoid signalling in host code lest we interrupt some function operating on
        // guest/host shared memory, and invalidate invariants. For example, interrupting
        // in the middle of a resize operation on a `Vec` could be extremely dangerous.
        //
        // Hold this lock through all signalling logic to prevent the instance from
        // switching domains (and invalidating safety of whichever mechanism we choose here).
        let mut execution_domain = state.execution_domain.lock().unwrap();

        let result = match *execution_domain {
            Domain::Guest => {
                let mut curr_tid = state.thread_id.lock().unwrap();
                // We're in guest code, so we can just send a signal.
                if let Some(thread_id) = *curr_tid {
                    unsafe {
                        pthread_kill(thread_id, SIGALRM);
                    }

                    // Wait for the `SIGALRM` handler to deschedule the instance.
                    //
                    // This should not need to wait more than once. Waking with the thread id
                    // still set would indicate the instance was moved to another thread, or
                    // that we were spuriously notified.
                    while curr_tid.is_some() {
                        curr_tid = state.tid_change_notifier.wait(curr_tid).unwrap();
                    }
                    Ok(KillSuccess::Signalled)
                } else {
                    panic!("logic error: instance is terminable but not actually running.");
                }
            }
            Domain::Hostcall => {
                // The guest is in a hostcall, so the only thing we can do is indicate it
                // should terminate and wait.
                *execution_domain = Domain::Terminated;
                Ok(KillSuccess::Pending)
            }
            Domain::Terminated => {
                // Something else (another KillSwitch?) has already signalled this instance
                // to exit when it has completed its hostcall. Nothing to do here.
                Err(KillError::NotTerminable)
            }
        };
        // Explicitly drop the lock to be clear about how long we want to hold it, which is
        // until all signalling is complete.
        mem::drop(execution_domain);
        result
    }
}
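
// Illustrative sketch, not part of the runtime: the decision `KillSwitch::terminate` makes,
// modelled without signals or thread ids. It first claims the `terminable` flag, then picks a
// mechanism based on the execution domain while holding the domain lock. All names here
// (`terminate_model_sketch`, `ModelState`, `ModelOutcome`, ...) are hypothetical; in particular
// `WouldSignal` only records that the real code would send `SIGALRM` at that point, and the real
// code's panic for a terminable instance with no thread id is omitted.
#[allow(dead_code)]
mod terminate_model_sketch {
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Mutex;

    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum ModelDomain {
        Guest,
        Hostcall,
        Terminated,
    }

    #[derive(Debug, PartialEq)]
    pub enum ModelOutcome {
        WouldSignal,
        Pending,
    }

    #[derive(Debug, PartialEq)]
    pub struct ModelNotTerminable;

    pub struct ModelState {
        pub terminable: AtomicBool,
        pub domain: Mutex<ModelDomain>,
    }

    impl ModelState {
        /// A running (terminable) instance currently in `domain`.
        pub fn running_in(domain: ModelDomain) -> Self {
            ModelState {
                terminable: AtomicBool::new(true),
                domain: Mutex::new(domain),
            }
        }

        pub fn terminate(&self) -> Result<ModelOutcome, ModelNotTerminable> {
            // Only one caller can take the flag; everyone else is told "not terminable".
            if !self.terminable.swap(false, Ordering::SeqCst) {
                return Err(ModelNotTerminable);
            }
            // Hold the lock while deciding so the domain cannot change underneath us.
            let mut domain = self.domain.lock().unwrap();
            match *domain {
                ModelDomain::Guest => Ok(ModelOutcome::WouldSignal),
                ModelDomain::Hostcall => {
                    *domain = ModelDomain::Terminated;
                    Ok(ModelOutcome::Pending)
                }
                ModelDomain::Terminated => Err(ModelNotTerminable),
            }
        }
    }
}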
321}