// Copyright (c) 2023 Daniel Fox Franke
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

#![warn(missing_docs)]
//! This crate handles POSIX signals which are triggered in response to hardware
//! exceptions. These signals include:
//!
//! * `SIGILL`
//! * `SIGFPE`
//! * `SIGSEGV`
//! * `SIGBUS`
//! * `SIGTRAP`
//!
//! Examples of hardware exceptions which trigger them include:
//!
//! * Illegal instructions
//! * General protection faults
//! * Divide-by-zero errors
//! * Floating point exceptions
//! * Page faults
//! * Machine check exceptions (raised, *e.g.*, on double-bit errors from ECC
//!   memory)
//! * Hardware breakpoints
//!
//! Normally, receiving any of these signals indicates either a hardware failure
//! or certain kinds of bugs which shouldn't be possible in safe Rust code. When
//! they're received unexpectedly, the only sensible way to proceed is to abort
//! the process and dump core, which is exactly what would normally happen.
//! However, many use cases exist where such signals are expected, and recovery
//! is possible. Here are just a few:
//!
//! * **Stop-and-copy garbage collectors**. Certain garbage-collection
//!   techniques routinely trigger segmentation faults. The signal handler can
//!   map a valid page into the faulting address and then execution can resume
//!   where it left off. (Consider the
//!   [`userfaultfd`](https://docs.rs/userfaultfd) crate as an alternative for
//!   this and similar use cases.)
//! * **Sharing memory with untrusted peers**. Writers to a shared memory
//!   segment can do various unfriendly things, such as truncating it
//!   unexpectedly, which will cause other processes accessing the segment to
//!   get a `SIGBUS`, which they can't guard against without running into TOCTOU
//!   problems. Victims of such behavior can catch the signal and jump back to a
//!   recovery point. (Consider the [`memfd`](https://docs.rs/memfd) crate as an
//!   alternative to avoid such complications.)
//! * **Fancy numerical stuff**. Sometimes it's more efficient to let a
//!   divide-by-zero or a floating point exception occur than it is to check
//!   every operation which might trigger it.
//! * **Robust storage layers**. As the size of disk or memory approaches
//!   infinity, the probability of a hardware error approaches one. Catching
//!   machine check exceptions makes it possible to handle such failures
//!   robustly by switching to redundant storage or by tolerating small amounts
//!   of data loss.
//! * **Debuggers**, which will get a `SIGTRAP` upon hitting a breakpoint
//!   they've set.
//!
//! Hardware exceptions are generally handled in one of three ways; this crate
//! supports all of them to varying degrees. They are:
//!
//! * **Patch and continue**: Fix the problem from within the signal handler,
//!   and then return from it to re-execute the excepting instruction. For
//!   example, by mapping a valid page to correct a segmentation fault.
//! * **Catch and recover**: Use `setjmp` to store a recovery point, and then
//!   `longjmp` back to it from the signal handler.
//! * **Scream and die**: Don't attempt to recover from the exception at all;
//!   just use the signal handler to log some diagnostics before aborting.
//!
//! In all three cases, your first entrypoint into this crate will be the
//! [`register_hook`] function, to which you will provide a callback which
//! receives [`ExceptionInfo`]. For catch-and-recover, you will wrap the
//! potentially-excepting block using [`catch`] and have the hook that you
//! registered call [`throw`].
//!
//! Hardware exceptions are synchronous signals, which means that the usual
//! cautions about ["signal
//! safety"](https://man7.org/linux/man-pages/man7/signal-safety.7.html) don't
//! apply; you can safely allocate memory, for example. However, do be cautious
//! that signal handlers run on an alternate stack which is usually much smaller
//! than the main one, typically 8 KiB, and is easy to overflow by accident. If
//! you find that a `SIGSEGV` hook is mysteriously hanging, it may be that the
//! hook is itself segfaulting due to a stack overflow, resulting in an infinite
//! loop. [`Backtrace`](std::backtrace::Backtrace)'s
//! [`Display`](std::fmt::Display) implementation seems to be particularly
//! stack-hungry, so printing a backtrace from a signal handler is likely to
//! lead to an overflow. Two good ways to get around this are either to
//! [`throw`] the [`Backtrace`](std::backtrace::Backtrace) and print it upon
//! returning from [`catch`], or to spawn a thread from the signal handler and
//! do your work from the child thread.
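/*!
The second workaround can be sketched with the standard library alone. The
helper `run_on_fresh_stack` below is hypothetical and not part of this crate,
and the stack size is whatever the caller chooses. The sketch assumes the hook
can afford to block until the child thread has finished.

```rust
use std::thread;

/// Hypothetical helper: runs `work` on a newly spawned thread that has its own
/// `stack_size`-byte stack, waits for it, and returns its result.
fn run_on_fresh_stack<F, R>(stack_size: usize, work: F) -> std::io::Result<R>
where
    F: FnOnce() -> R + Send + 'static,
    R: Send + 'static,
{
    // The child gets a stack of the requested size instead of sharing the
    // small alternate signal stack of the caller.
    let handle = thread::Builder::new().stack_size(stack_size).spawn(work)?;
    // If the child panicked, resume that panic here rather than swallowing it.
    Ok(handle
        .join()
        .unwrap_or_else(|payload| std::panic::resume_unwind(payload)))
}
```

A hook could then wrap its stack-hungry work, such as formatting a backtrace,
in a call like `run_on_fresh_stack(1 << 20, || { /* ... */ })`.
*/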
//!
//! # Example
//!
//! The following example triggers a segmentation fault by dereferencing a null
//! pointer, catches and recovers from it, and then prints a backtrace showing
//! where the segfault occurred.
//!
//! ```
//!   use hw_exception::*;
//!   use std::backtrace::Backtrace;
//!
//!   unsafe {
//!     // Register a hook for SIGSEGV, which captures and throws a backtrace.
//!     register_hook(&[Signo::SIGSEGV], |e| {
//!       let bt = Backtrace::force_capture();
//!       throw((e, bt))
//!     });
//!   }
//!
//!   // Dereference a null pointer from within a `catch` block. Using `read_volatile`
//!   // prevents this from being UB.
//!   let result = catch(|| unsafe {
//!      std::ptr::null::<usize>().read_volatile()
//!   });
//!
//!   // Assert that this block resulted in an exception, and extract it.
//!   let e = result.expect_err("dereferencing a null pointer should have segfaulted, but gave");
//!
//!   // Extract and print the backtrace
//!   let bt: &Backtrace = e
//!      .additional()
//!      .expect("thrown exception info should have included additional data")
//!      .downcast_ref()
//!      .expect("additional data should have been a `Backtrace`");
//!   println!("{}", bt);
//! ```

mod cdecls;
mod signals;

pub use cdecls::*;
pub use signals::*;
use std::any::Any;
use std::cell::UnsafeCell;
use std::ffi::{c_int, c_void};
use std::mem::ManuallyDrop;
use std::panic::UnwindSafe;
use std::ptr::{addr_of, addr_of_mut};
use std::sync::{Mutex, MutexGuard};

/// A boxed closure which can be invoked in response to an exception being raised.
pub type DynExceptionHook = Box<dyn (Fn(ExceptionInfo<'_>) -> bool) + Send + Sync + 'static>;

#[derive(Debug, Copy, Clone)]
struct ExceptionHookRaw(*const (dyn (Fn(ExceptionInfo<'_>) -> bool) + Send + Sync + 'static));

unsafe impl Send for ExceptionHookRaw {}
unsafe impl Sync for ExceptionHookRaw {}

/// A handle to a registered exception hook.
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ExceptionHookId(u128); // Make this huge so we never have to think about overflow.

static ID_COUNTER: Mutex<u128> = Mutex::new(0);

static SIGILL_HOOKS: Mutex<Vec<(ExceptionHookId, ExceptionHookRaw)>> = Mutex::new(Vec::new());
static SIGFPE_HOOKS: Mutex<Vec<(ExceptionHookId, ExceptionHookRaw)>> = Mutex::new(Vec::new());
static SIGSEGV_HOOKS: Mutex<Vec<(ExceptionHookId, ExceptionHookRaw)>> = Mutex::new(Vec::new());
static SIGBUS_HOOKS: Mutex<Vec<(ExceptionHookId, ExceptionHookRaw)>> = Mutex::new(Vec::new());
static SIGTRAP_HOOKS: Mutex<Vec<(ExceptionHookId, ExceptionHookRaw)>> = Mutex::new(Vec::new());

static SIGILL_OLDACTION: Mutex<Option<libc::sigaction>> = Mutex::new(None);
static SIGFPE_OLDACTION: Mutex<Option<libc::sigaction>> = Mutex::new(None);
static SIGSEGV_OLDACTION: Mutex<Option<libc::sigaction>> = Mutex::new(None);
static SIGBUS_OLDACTION: Mutex<Option<libc::sigaction>> = Mutex::new(None);
static SIGTRAP_OLDACTION: Mutex<Option<libc::sigaction>> = Mutex::new(None);

thread_local!(static JMP_BUF_PTR: UnsafeCell<*mut c_void> = UnsafeCell::new(std::ptr::null_mut()));
thread_local!(static ERR_BUF_PTR: UnsafeCell<*mut c_void> = UnsafeCell::new(std::ptr::null_mut()));

extern "C" fn run_hooks(
    signo_raw: c_int,
    siginfo: *mut libc::siginfo_t,
    context: *mut c_void,
) -> bool {
    unsafe {
        let signo = if let Ok(signal) = Signal::from_raw(signo_raw, (*siginfo).si_code) {
            signal.signo()
        } else {
            return false;
        };

        let hooks: Vec<_> = lookup_hooks(signo)
            .iter()
            .rev()
            .map(|(_, raw)| raw.0)
            .collect();
        for hook in hooks {
            let exception_info = ExceptionInfo::new(signo_raw, siginfo, context);
            let f = &*hook;
            if f(exception_info) {
                return true;
            }
        }

        false
    }
}

extern "C" fn handler(signo_raw: c_int, siginfo: *mut libc::siginfo_t, context: *mut c_void) {
    if run_hooks(signo_raw, siginfo, context) {
        return;
    }

    let signo = match signo_raw {
        libc::SIGILL => Signo::SIGILL,
        libc::SIGFPE => Signo::SIGFPE,
        libc::SIGSEGV => Signo::SIGSEGV,
        libc::SIGBUS => Signo::SIGBUS,
        libc::SIGTRAP => Signo::SIGTRAP,
        _ => return,
    };

    if let Some(old_action) = *lookup_oldaction(signo) {
        if old_action.sa_sigaction == libc::SIG_DFL {
            // Reinstall the default action and return. The excepting
            // instruction should then re-trigger the exception and dump core.
            unsafe {
                if libc::signal(signo_raw, libc::SIG_DFL) == libc::SIG_ERR {
                    std::process::abort();
                }
            }
        } else if old_action.sa_sigaction == libc::SIG_IGN {
            // The prior action ignored this signal, so there is nothing to do.
        } else if old_action.sa_flags & libc::SA_SIGINFO != 0 {
            let f: extern "C" fn(c_int, *mut libc::siginfo_t, context: *mut c_void) =
                unsafe { std::mem::transmute(old_action.sa_sigaction) };
            f(signo_raw, siginfo, context);
        } else {
            let f: extern "C" fn(c_int) = unsafe { std::mem::transmute(old_action.sa_sigaction) };
            f(signo_raw);
        }
    }
}

fn lookup_oldaction(signo: Signo) -> MutexGuard<'static, Option<libc::sigaction>> {
    match signo {
        Signo::SIGILL => SIGILL_OLDACTION.lock().unwrap(),
        Signo::SIGFPE => SIGFPE_OLDACTION.lock().unwrap(),
        Signo::SIGSEGV => SIGSEGV_OLDACTION.lock().unwrap(),
        Signo::SIGBUS => SIGBUS_OLDACTION.lock().unwrap(),
        Signo::SIGTRAP => SIGTRAP_OLDACTION.lock().unwrap(),
    }
}

fn lookup_hooks(signo: Signo) -> MutexGuard<'static, Vec<(ExceptionHookId, ExceptionHookRaw)>> {
    match signo {
        Signo::SIGILL => SIGILL_HOOKS.lock().unwrap(),
        Signo::SIGFPE => SIGFPE_HOOKS.lock().unwrap(),
        Signo::SIGSEGV => SIGSEGV_HOOKS.lock().unwrap(),
        Signo::SIGBUS => SIGBUS_HOOKS.lock().unwrap(),
        Signo::SIGTRAP => SIGTRAP_HOOKS.lock().unwrap(),
    }
}

fn register_handler(signo: Signo) {
    let mut old_action_mutex = lookup_oldaction(signo);

    if old_action_mutex.is_some() {
        return;
    }

    unsafe {
        let mut sigset: libc::sigset_t = std::mem::zeroed();
        assert!(libc::sigemptyset(&mut sigset) == 0);
        let mut action: libc::sigaction = std::mem::zeroed();

        let handler_ptr = handler as extern "C" fn(c_int, *mut libc::siginfo_t, *mut c_void);
        action.sa_mask = sigset;
        action.sa_sigaction = handler_ptr as usize;
        action.sa_flags = libc::SA_NODEFER | libc::SA_ONSTACK | libc::SA_SIGINFO;

        let old_action = old_action_mutex.insert(std::mem::zeroed());

        assert!(libc::sigaction(signo.into(), &action, old_action) == 0);
    }
}

fn unregister_handler(signo: Signo) {
    let mut old_handler = lookup_oldaction(signo);

    let action = match old_handler.take() {
        Some(action) => action,
        None => return,
    };

    unsafe {
        assert!(libc::sigaction(signo.into(), &action, std::ptr::null_mut()) == 0);
    }
}

/// Registers an exception hook.
///
/// The hook will be invoked from a signal handler when any of the specified
/// signals are raised.
///
/// If multiple exception hooks are registered for one signal, they will be
/// invoked in reverse order of registration until one of them returns `true`.
/// If all hooks return `false`, whatever signal action was installed prior to
/// any hooks being registered will be taken. In the case of the prior signal
/// action being the default signal handler, this is implemented by reinstalling
/// `SIG_DFL` and then returning, which should re-trigger the exception and dump
/// core.
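/**
The dispatch order described above can be modeled with plain closures. This is
only an illustrative sketch: `Hook` and `dispatch` are hypothetical names, the
hooks take a bare signal number instead of an [`ExceptionInfo`], and none of it
is this crate's actual implementation.

```rust
/// Hypothetical stand-in for a hook: takes a signal number and reports
/// whether it handled the exception.
type Hook = Box<dyn Fn(i32) -> bool>;

/// Tries `hooks` from most recently registered to least recently registered.
/// Returns the index of the first hook that returned `true`, or `None` if
/// every hook declined (the point at which the prior signal action would be
/// taken).
fn dispatch(hooks: &[Hook], signo: i32) -> Option<usize> {
    hooks
        .iter()
        .enumerate()
        .rev()
        .find(|(_, hook)| hook(signo))
        .map(|(index, _)| index)
}
```

With three hooks registered in order, where only the first two accept a given
signal, `dispatch` reports index 1: the newest hook is asked first and
declines, and the second one handles the signal before the oldest is consulted.
*/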
///
/// Hooks will be called only when the signal's subtype code indicates that it
/// was triggered synchronously by a hardware exception. In particular, hooks
/// will *not* run for any signal which was delivered using `kill(2)`. Hooks
/// therefore do not need to worry about async-signal safety and can do things
/// like allocate memory without risking deadlocks.
///
/// The returned [`ExceptionHookId`] can later be used to unregister the hook.
///
/// # Safety
/// *In isolation*, this call is always safe. The function is declared as
/// `unsafe` in order to simplify reasoning about the soundness of unsafe code
/// which potentially triggers exceptions. Making the registration of exception
/// hooks an unsafe operation means that unsafe code can rely on knowing exactly
/// what will happen when an exception occurs, without having to account for the
/// possibility that untrusted safe code may have installed a rogue exception
/// hook.
pub unsafe fn register_hook<S, H>(signals: &S, hook: H) -> ExceptionHookId
where
    S: AsRef<[Signo]> + ?Sized,
    H: for<'a> Fn(ExceptionInfo<'a>) -> bool + Send + Sync + 'static,
{
    let mut counter = ID_COUNTER.lock().unwrap();
    let id = ExceptionHookId(*counter);
    *counter += 1;
    std::mem::drop(counter);

    let raw = ExceptionHookRaw(Box::into_raw(hook.into()));

    for signo in signals.as_ref().iter().copied() {
        let mut hooks = lookup_hooks(signo);
        hooks.push((id, raw));
        register_handler(signo);
    }

    id
}

/// Unregisters an exception hook.
///
/// Returns the exception hook, if found. Calling `unregister_hook` multiple
/// times with the same hook id will result in subsequent calls returning
/// `None`.
///
/// # Safety
/// *In isolation*, this call is always safe. The function is declared as
/// `unsafe` in order to simplify reasoning about the soundness of unsafe code
/// which potentially triggers exceptions. Making unregistration of exception
/// hooks an unsafe operation means that unsafe code can rely on knowing exactly
/// what will happen when an exception occurs, without having to account for the
/// possibility that untrusted safe code may have uninstalled an exception hook
/// it was relying on.
pub unsafe fn unregister_hook(id: ExceptionHookId) -> Option<DynExceptionHook> {
    let mut maybe_raw = None;

    for signo in Signo::all().iter().copied() {
        let mut hooks = lookup_hooks(signo);

        if let Ok(index) = hooks.binary_search_by(|probe| probe.0.cmp(&id)) {
            maybe_raw = Some(hooks.remove(index).1);
            if hooks.is_empty() {
                unregister_handler(signo);
            }
        }
    }

    maybe_raw.map(|raw| Box::from_raw(raw.0.cast_mut()))
}

/// Throws an exception which can be caught by [`catch`].
///
/// If there is no `catch` invocation anywhere on the stack, this
/// function returns `false`; otherwise, it does not return. It will never
/// return `true`; its return type is `bool` rather than `()` just to make it
/// more ergonomic to use as the final statement of an exception hook.
pub fn throw<F: Into<ExtExceptionInfo>>(exception: F) -> bool {
    let extinfo: ExtExceptionInfo = exception.into();
    JMP_BUF_PTR.with(|jmp_buf| {
        ERR_BUF_PTR.with(|err_buf| {
            unsafe {
                hwexception_throw(
                    addr_of!(extinfo).cast(),
                    std::mem::size_of_val(&extinfo),
                    jmp_buf.get(),
                    err_buf.get(),
                )
            }
            false
        })
    })
}

/// Catches an exception raised by [`throw`].
///
/// Runs `block`. If an exception is thrown during its execution, returns boxed
/// exception details. Otherwise, returns the return value of the block.
///
/// Internally, this function sets up a `setjmp` buffer, and [`throw`] performs
/// a `longjmp` back to it. No unwinding occurs: drop methods for any objects
/// created in the dynamic scope of the callback will not run. So, be wary of
/// resource leaks.
///
/// Despite this function's similar type signature to
/// [`std::panic::catch_unwind`], exceptions and panics are distinct. This
/// function will not catch panics and `catch_unwind` will not catch exceptions.
/// If `block` panics, the panic will continue to propagate out from this call.
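/**
The way a panic keeps propagating can be sketched with the standard library
alone. `forward_panic` below is a hypothetical helper, not part of this crate,
and it leaves out the `setjmp` machinery entirely: it only models capturing a
panic at a boundary and resuming it on the far side.

```rust
use std::panic::{catch_unwind, resume_unwind, UnwindSafe};

/// Hypothetical helper: runs `block`, captures a panic instead of letting it
/// unwind through the boundary, and resumes the same panic afterwards.
fn forward_panic<F, R>(block: F) -> R
where
    F: FnOnce() -> R + UnwindSafe,
{
    // Stop the unwind here and keep the payload as an ordinary value.
    let stashed = catch_unwind(block);
    // Back on the caller's side of the boundary: hand the value back, or
    // resume the captured panic with its original payload.
    match stashed {
        Ok(value) => value,
        Err(payload) => resume_unwind(payload),
    }
}
```

From the caller's point of view the panic is unchanged: it still unwinds out of
`forward_panic` with its original payload.
*/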
///
/// This function will not catch anything that was not thrown by [`throw`], so a
/// signal being raised during the execution of `block` will not automatically
/// result in it being caught. You first need to use [`register_hook`] to
/// register a hook which calls `throw`.
pub fn catch<F, R>(block: F) -> Result<R, ExtExceptionInfo>
where
    F: FnOnce() -> R + UnwindSafe,
{
    // Okay, so here our control flow is spaghetti embedded in five-dimensional
    // non-Euclidean space. We're going to be calling into the C function
    // hwexception_catch. hwexception_catch *has* to be written in C, because it
    // calls setjmp. setjmp returns twice, which is something Rust fundamentally
    // can't cope with. But hwexception_catch only returns once, so it's okay
    // for us to call into it from Rust code.
    //
    // hwexception_catch takes five arguments:
    // 1. A pointer to a (C ABI) callback function which accepts a void* context
    //    argument.
    // 2. A context argument to pass to the callback.
    // 3. A pointer to a buffer where `hwexception_throw` can record exception
    //    info. Specifically, this buffer will be holding an ExtExceptionInfo.
    // 4. A pointer to thread-local storage which can be used to stash a pointer
    //    to a setjmp buffer.
    // 5. A pointer to thread-local storage which can be used to stash a copy of
    //    the pointer from argument 3.
    //
    // hwexception_catch is going to set up a jump buffer, store the two
    // pointers in the TLS cells we've given it, call the callback, restore the
    // TLS cells to their original values, and then return 0 if no exception was
    // thrown or nonzero otherwise. Whatever's stored in the TLS cells will be
    // looked up by our throw function.
    //
    // The job of this function is to wrap up the Rust-ABI closure `block` into
    // something callable from C. In addition to the obvious pointer-wrangling,
    // this also necessitates dealing with panics. Unwinding from a panic will
    // cause an immediate process abort if the unwind crosses a non-Rust frame,
    // and we don't want that to happen. So, panics that come out of `block`
    // need to get caught before they hit any C frame, and then rethrown once
    // `hwexception_catch` has returned and we're safely back in Rust-land.

    // Before we call into hwexception_catch, this union will hold `block` in
    // its `call` member. After hwexception_catch returns, `call` (an FnOnce)
    // will have been consumed, and this union will be in one of three states:
    // 1. If the block returned normally, this union's `noexception` member will
    //    hold an Ok() of its return value.
    // 2. If the block panicked, this union's `noexception` member will hold an
    //    Err() of the panic payload.
    // 3. If the block raised an exception, this union will hold an
    //    ExtExceptionInfo in its `exception` member.
    // We'll be using a pointer to a result buffer as the context argument to
    // our callback.
    union ResultBuffer<F, R> {
        call: ManuallyDrop<F>,
        noexception: ManuallyDrop<Result<R, Box<dyn Any + Send>>>,
        exception: ManuallyDrop<ExtExceptionInfo>,
    }

    // This is the callback we'll be passing as our first argument to
    // hwexception_catch.
    unsafe extern "C" fn callback<F, R>(ctx: *mut c_void)
    where
        F: FnOnce() -> R + UnwindSafe,
    {
        // Cast the void* context argument back into its real type.
        let ctx: &mut ResultBuffer<F, R> = &mut *ctx.cast();
        // Move the closure out of the buffer's `call` member.
        let f = ManuallyDrop::take(&mut ctx.call);
        // Call it, and if it panics, catch it.
        let result = std::panic::catch_unwind(f);
        // If we reach this line, no exception was raised. Write the result into
        // the `noexception` member of the ResultBuffer.
        ctx.noexception = ManuallyDrop::new(result);
    }

    // Create the result buffer in its initial state with its `call` member
    // active.
    let mut result_buffer = ResultBuffer::<F, R> {
        call: ManuallyDrop::new(block),
    };

    unsafe {
        JMP_BUF_PTR.with(|jmp_buf| {
            ERR_BUF_PTR.with(|err_buf| {
                if hwexception_catch(
                    callback::<F, R>,
                    addr_of_mut!(result_buffer).cast(),
                    addr_of_mut!(result_buffer.exception).cast(),
                    jmp_buf.get(),
                    err_buf.get(),
                ) == 0
                {
                    // A zero return means that no exception was raised, so we reached
                    // the end of the callback and now the `noexception` member contains
                    // what the callback stored into it, so extract it. If it's a normal
                    // return, return it. If it's a panic, resume panicking.
                    match ManuallyDrop::into_inner(result_buffer.noexception) {
                        Ok(success) => Ok(success),
                        Err(payload) => std::panic::resume_unwind(payload),
                    }
                } else {
                    // A non-zero return means that an exception was raised, and now the
                    // `exception` member contains what the throw function put there.
                    // Extract this and return it.
                    Err(ManuallyDrop::into_inner(result_buffer.exception))
                }
            })
        })
    }
}