hw_exception/lib.rs
// Copyright (c) 2023 Daniel Fox Franke
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

#![warn(missing_docs)]
//! This crate handles POSIX signals which are triggered in response to hardware
//! exceptions. These signals include:
//!
//! * `SIGILL`
//! * `SIGFPE`
//! * `SIGSEGV`
//! * `SIGBUS`
//! * `SIGTRAP`
//!
//! Examples of hardware exceptions which trigger them include:
//!
//! * Illegal instructions
//! * General protection faults
//! * Divide-by-zero errors
//! * Floating point exceptions
//! * Page faults
//! * Machine check exceptions (raised, *e.g.*, on double-bit errors from ECC
//!   memory)
//! * Hardware breakpoints
//!
//! Normally, receiving any of these signals indicates either a hardware failure
//! or certain kinds of bugs which shouldn't be possible in safe Rust code. When
//! they're received unexpectedly, the only sensible way to proceed is to abort
//! the process and dump core, which is exactly what would normally happen.
//! However, many use cases exist where such signals are expected, and recovery
//! is possible. Here are just a few:
//!
//! * **Stop-and-copy garbage collectors**. Certain garbage-collection
//!   techniques routinely trigger segmentation faults. The signal handler can
//!   map a valid page into the faulting address and then execution can resume
//!   where it left off. (Consider the
//!   [`userfaultfd`](https://docs.rs/userfaultfd) crate as an alternative for
//!   this and similar use cases.)
//! * **Sharing memory with untrusted peers**. Writers to a shared memory
//!   segment can do various unfriendly things, such as truncating it
//!   unexpectedly, which will cause other processes accessing the segment to
//!   get a `SIGBUS`, which they can't guard against without running into TOCTOU
//!   problems. Victims of such behavior can catch the signal and jump back to a
//!   recovery point. (Consider the [`memfd`](https://docs.rs/memfd) crate as an
//!   alternative to avoid such complications.)
//! * **Fancy numerical stuff**. Sometimes it's more efficient to let a
//!   divide-by-zero or a floating point exception occur than it is to check
//!   every operation which might trigger it.
//! * **Robust storage layers**. As the size of disk or memory approaches
//!   infinity, the probability of a hardware error approaches one. Catching
//!   machine check exceptions makes it possible to handle such failures
//!   robustly by switching to redundant storage or by tolerating small amounts
//!   of data loss.
//! * **Debuggers**, which will get a `SIGTRAP` upon hitting a breakpoint
//!   they've set.
//!
//! Hardware exceptions are generally handled in one of three ways; this crate
//! supports all of them to varying degrees. They are:
//!
//! * **Patch and continue**: Fix the problem from within the signal handler,
//!   and then return from it to re-execute the excepting instruction. For
//!   example, by mapping a valid page to correct a segmentation fault.
//! * **Catch and recover**: Use `setjmp` to store a recovery point, and then
//!   `longjmp` back to it from the signal handler.
//! * **Scream and die**: Don't attempt to recover from the exception at all;
//!   just use the signal handler to log some diagnostics before aborting.
//!
//! In all three cases, your first entrypoint into this crate will be the
//! [`register_hook`] function, to which you will provide a callback which
//! receives [`ExceptionInfo`]. For catch-and-recover, you will wrap the
//! potentially-excepting block using [`catch`] and have the hook that you
//! registered call [`throw`].
//!
//! Hardware exceptions are synchronous signals, which means that the usual
//! cautions about ["signal
//! safety"](https://man7.org/linux/man-pages/man7/signal-safety.7.html) don't
//! apply; you can safely allocate memory, for example. However, do be cautious
//! that signal handlers run on an alternate stack which is usually much smaller
//! than the main one, typically 8 KiB, and is easy to overflow by accident. If
//! you find that a `SIGSEGV` hook is mysteriously hanging, it may be that the
//! hook is itself segfaulting due to a stack overflow, resulting in an infinite
//! loop. [`Backtrace`](std::backtrace::Backtrace)'s
//! [`Display`](std::fmt::Display) implementation seems to be particularly
//! stack-hungry, so printing a backtrace from a signal handler is likely to
//! lead to an overflow. Two good ways to get around this are either to
//! [`throw`] the [`Backtrace`](std::backtrace::Backtrace) and print it upon
//! returning from [`catch`], or to spawn a thread from the signal handler and
//! do your work from the child thread.
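//!
//! The second workaround can be sketched with the standard library alone. This
//! is a minimal illustration rather than code from this crate: the helper name
//! `render_backtrace_off_stack` and the 2 MiB stack size are assumptions chosen
//! for the example.
//!
//! ```rust
//! use std::backtrace::Backtrace;
//! use std::thread;
//!
//! /// Formats `bt` on a freshly spawned thread, so that the stack-hungry
//! /// `Display` implementation runs on that thread's stack instead of the
//! /// caller's. The 2 MiB stack size is an arbitrary choice for this sketch.
//! fn render_backtrace_off_stack(bt: Backtrace) -> String {
//!     thread::Builder::new()
//!         .stack_size(2 * 1024 * 1024)
//!         .spawn(move || bt.to_string())
//!         .expect("failed to spawn helper thread")
//!         .join()
//!         .expect("helper thread panicked")
//! }
//! ```
//!
//! A hook could capture the backtrace itself, hand it to a helper like this
//! one, and log the returned string, keeping the formatting work off the small
//! signal stack.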
//!
//! # Example
//!
//! The following example triggers a segmentation fault by dereferencing a null
//! pointer, catches and recovers from it, and then prints a backtrace showing
//! where the segfault occurred.
//!
//! ```
//! use hw_exception::*;
//! use std::backtrace::Backtrace;
//!
//! unsafe {
//!     // Register a hook for SIGSEGV, which captures and throws a backtrace.
//!     register_hook(&[Signo::SIGSEGV], |e| {
//!         let bt = Backtrace::force_capture();
//!         throw((e, bt))
//!     });
//! }
//!
//! // Dereference a null pointer from within a `catch` block. Using `read_volatile`
//! // prevents this from being UB.
//! let result = catch(|| unsafe {
//!     std::ptr::null::<usize>().read_volatile()
//! });
//!
//! // Assert that this block resulted in an exception, and extract it.
//! let e = result.expect_err("dereferencing a null pointer should have segfaulted, but gave");
//!
//! // Extract and print the backtrace
//! let bt: &Backtrace = e
//!     .additional()
//!     .expect("thrown exception info should have included additional data")
//!     .downcast_ref()
//!     .expect("additional data should have been a `Backtrace`");
//! println!("{}", bt);
//! ```

mod cdecls;
mod signals;

pub use cdecls::*;
pub use signals::*;
use std::any::Any;
use std::cell::UnsafeCell;
use std::ffi::{c_int, c_void};
use std::mem::ManuallyDrop;
use std::panic::UnwindSafe;
use std::ptr::{addr_of, addr_of_mut};
use std::sync::{Mutex, MutexGuard};

/// A boxed closure which can be invoked in response to an exception being raised.
pub type DynExceptionHook = Box<dyn (Fn(ExceptionInfo<'_>) -> bool) + Send + Sync + 'static>;

#[derive(Debug, Copy, Clone)]
struct ExceptionHookRaw(*const (dyn (Fn(ExceptionInfo<'_>) -> bool) + Send + Sync + 'static));

unsafe impl Send for ExceptionHookRaw {}
unsafe impl Sync for ExceptionHookRaw {}

/// A handle to a registered exception hook.
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ExceptionHookId(u128); // Make this huge so we never have to think about overflow.

static ID_COUNTER: Mutex<u128> = Mutex::new(0);

static SIGILL_HOOKS: Mutex<Vec<(ExceptionHookId, ExceptionHookRaw)>> = Mutex::new(Vec::new());
static SIGFPE_HOOKS: Mutex<Vec<(ExceptionHookId, ExceptionHookRaw)>> = Mutex::new(Vec::new());
static SIGSEGV_HOOKS: Mutex<Vec<(ExceptionHookId, ExceptionHookRaw)>> = Mutex::new(Vec::new());
static SIGBUS_HOOKS: Mutex<Vec<(ExceptionHookId, ExceptionHookRaw)>> = Mutex::new(Vec::new());
static SIGTRAP_HOOKS: Mutex<Vec<(ExceptionHookId, ExceptionHookRaw)>> = Mutex::new(Vec::new());

static SIGILL_OLDACTION: Mutex<Option<libc::sigaction>> = Mutex::new(None);
static SIGFPE_OLDACTION: Mutex<Option<libc::sigaction>> = Mutex::new(None);
static SIGSEGV_OLDACTION: Mutex<Option<libc::sigaction>> = Mutex::new(None);
static SIGBUS_OLDACTION: Mutex<Option<libc::sigaction>> = Mutex::new(None);
static SIGTRAP_OLDACTION: Mutex<Option<libc::sigaction>> = Mutex::new(None);

thread_local!(static JMP_BUF_PTR: UnsafeCell<*mut c_void> = UnsafeCell::new(std::ptr::null_mut()));
thread_local!(static ERR_BUF_PTR: UnsafeCell<*mut c_void> = UnsafeCell::new(std::ptr::null_mut()));

extern "C" fn run_hooks(
    signo_raw: c_int,
    siginfo: *mut libc::siginfo_t,
    context: *mut c_void,
) -> bool {
    unsafe {
        let signo = if let Ok(signal) = Signal::from_raw(signo_raw, (*siginfo).si_code) {
            signal.signo()
        } else {
            return false;
        };

        // Snapshot the hook pointers, newest first. The registry lock is
        // released at the end of this statement, before any hook runs.
        let hooks: Vec<_> = lookup_hooks(signo)
            .iter()
            .rev()
            .map(|(_, raw)| raw.0)
            .collect();
        for hook in hooks {
            let exception_info = ExceptionInfo::new(signo_raw, siginfo, context);
            let f = &*hook;
            if f(exception_info) {
                return true;
            }
        }

        false
    }
}

extern "C" fn handler(signo_raw: c_int, siginfo: *mut libc::siginfo_t, context: *mut c_void) {
    if run_hooks(signo_raw, siginfo, context) {
        return;
    }

    let signo = match signo_raw {
        libc::SIGILL => Signo::SIGILL,
        libc::SIGFPE => Signo::SIGFPE,
        libc::SIGSEGV => Signo::SIGSEGV,
        libc::SIGBUS => Signo::SIGBUS,
        libc::SIGTRAP => Signo::SIGTRAP,
        _ => return,
    };

    // No hook claimed the signal, so fall back to whatever action was
    // installed before ours.
    if let Some(old_action) = *lookup_oldaction(signo) {
        if old_action.sa_sigaction == libc::SIG_DFL {
            // Reinstall the default disposition and return, so that the
            // exception is re-triggered and handled by the default action.
            unsafe {
                if libc::signal(signo_raw, libc::SIG_DFL) == libc::SIG_ERR {
                    std::process::abort();
                }
            }
        } else if old_action.sa_sigaction == libc::SIG_IGN {
            // The previous disposition ignored the signal, so do nothing.
        } else if old_action.sa_flags & libc::SA_SIGINFO != 0 {
            let f: extern "C" fn(c_int, *mut libc::siginfo_t, context: *mut c_void) =
                unsafe { std::mem::transmute(old_action.sa_sigaction) };
            f(signo_raw, siginfo, context);
        } else {
            let f: extern "C" fn(c_int) = unsafe { std::mem::transmute(old_action.sa_sigaction) };
            f(signo_raw);
        }
    }
}

fn lookup_oldaction(signo: Signo) -> MutexGuard<'static, Option<libc::sigaction>> {
    match signo {
        Signo::SIGILL => SIGILL_OLDACTION.lock().unwrap(),
        Signo::SIGFPE => SIGFPE_OLDACTION.lock().unwrap(),
        Signo::SIGSEGV => SIGSEGV_OLDACTION.lock().unwrap(),
        Signo::SIGBUS => SIGBUS_OLDACTION.lock().unwrap(),
        Signo::SIGTRAP => SIGTRAP_OLDACTION.lock().unwrap(),
    }
}

fn lookup_hooks(signo: Signo) -> MutexGuard<'static, Vec<(ExceptionHookId, ExceptionHookRaw)>> {
    match signo {
        Signo::SIGILL => SIGILL_HOOKS.lock().unwrap(),
        Signo::SIGFPE => SIGFPE_HOOKS.lock().unwrap(),
        Signo::SIGSEGV => SIGSEGV_HOOKS.lock().unwrap(),
        Signo::SIGBUS => SIGBUS_HOOKS.lock().unwrap(),
        Signo::SIGTRAP => SIGTRAP_HOOKS.lock().unwrap(),
    }
}

fn register_handler(signo: Signo) {
    let mut old_action_mutex = lookup_oldaction(signo);

    if old_action_mutex.is_some() {
        return;
    }

    unsafe {
        let mut sigset: libc::sigset_t = std::mem::zeroed();
        assert!(libc::sigemptyset(&mut sigset) == 0);
        let mut action: libc::sigaction = std::mem::zeroed();

        let handler_ptr = handler as extern "C" fn(c_int, *mut libc::siginfo_t, *mut c_void);
        action.sa_mask = sigset;
        action.sa_sigaction = handler_ptr as usize;
        action.sa_flags = libc::SA_NODEFER | libc::SA_ONSTACK | libc::SA_SIGINFO;

        let old_action = old_action_mutex.insert(std::mem::zeroed());

        assert!(libc::sigaction(signo.into(), &action, old_action) == 0);
    }
}

fn unregister_handler(signo: Signo) {
    let mut old_handler = lookup_oldaction(signo);

    let action = match old_handler.take() {
        Some(action) => action,
        None => return,
    };

    unsafe {
        assert!(libc::sigaction(signo.into(), &action, std::ptr::null_mut()) == 0);
    }
}

/// Registers an exception hook.
///
/// The hook will be invoked from a signal handler when any of the specified
/// signals are raised.
///
/// If multiple exception hooks are registered for one signal, they will be
/// invoked in reverse order of registration until one of them returns `true`.
/// If all hooks return `false`, whatever signal action was installed prior to
/// any hooks being registered will be taken. In the case of the prior signal
/// action being the default signal handler, this is implemented by reinstalling
/// `SIG_DFL` and then returning, which should re-trigger the exception and dump
/// core.
///
/// Hooks will be called only when the signal's subtype code indicates that it
/// was triggered synchronously by a hardware exception. In particular, hooks
/// will *not* run for any signal which was delivered using `kill(2)`. Hooks
/// therefore do not need to worry about async-signal safety and can do things
/// like allocate memory without risking deadlocks.
///
/// The returned [`ExceptionHookId`] can later be used to unregister the hook.
///
/// # Safety
/// *In isolation*, this call is always safe. The function is declared as
/// `unsafe` in order to simplify reasoning about the soundness of unsafe code
/// which potentially triggers exceptions. Making the registration of exception
/// hooks an unsafe operation means that unsafe code can rely on knowing exactly
/// what will happen when an exception occurs, without having to account for the
/// possibility that untrusted safe code may have installed a rogue exception
/// hook.
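///
/// # Example
///
/// The dispatch rule described above (newest hook first, stop at the first
/// `true`) can be modeled without raising any signals. This is a standalone
/// sketch, not this crate's implementation: `Hook`, `hook`, `dispatch`, and
/// `demo` are hypothetical names, and the `i32` argument stands in for
/// [`ExceptionInfo`].
///
/// ```rust
/// /// Hypothetical stand-in for a registered hook. The real hooks receive an
/// /// `ExceptionInfo`; a plain signal number is used here instead.
/// type Hook = Box<dyn Fn(i32) -> bool + Send + Sync>;
///
/// /// Boxes a closure as a `Hook`.
/// fn hook(f: impl Fn(i32) -> bool + Send + Sync + 'static) -> Hook {
///     Box::new(f)
/// }
///
/// /// Invokes hooks newest-first and stops at the first one that returns
/// /// `true`. Returns the labels of the hooks that ran and whether one of
/// /// them claimed the signal.
/// fn dispatch(hooks: &[(&'static str, Hook)], signo: i32) -> (Vec<&'static str>, bool) {
///     let mut ran = Vec::new();
///     for (label, f) in hooks.iter().rev() {
///         ran.push(*label);
///         if f(signo) {
///             return (ran, true);
///         }
///     }
///     (ran, false)
/// }
///
/// /// Registers three hooks in order and dispatches signal number 11 to them.
/// fn demo() -> (Vec<&'static str>, bool) {
///     let hooks = vec![
///         ("first", hook(|_| true)),
///         ("second", hook(|signo| signo == 11)),
///         ("third", hook(|_| false)),
///     ];
///     dispatch(&hooks, 11)
/// }
/// ```
///
/// In this model `demo()` returns `(vec!["third", "second"], true)`: the hook
/// registered last runs first and declines, the second one claims the signal,
/// and the hook registered first never runs.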
pub unsafe fn register_hook<S, H>(signals: &S, hook: H) -> ExceptionHookId
where
    S: AsRef<[Signo]> + ?Sized,
    H: for<'a> Fn(ExceptionInfo<'a>) -> bool + Send + Sync + 'static,
{
    let mut counter = ID_COUNTER.lock().unwrap();
    let id = ExceptionHookId(*counter);
    *counter += 1;
    std::mem::drop(counter);

    let raw = ExceptionHookRaw(Box::into_raw(hook.into()));

    for signo in signals.as_ref().iter().copied() {
        let mut hooks = lookup_hooks(signo);
        hooks.push((id, raw));
        register_handler(signo);
    }

    id
}

/// Unregisters an exception hook.
///
/// Returns the exception hook, if found. Calling `unregister_hook` multiple
/// times with the same hook id will result in subsequent calls returning
/// `None`.
///
/// # Safety
/// *In isolation*, this call is always safe. The function is declared as
/// `unsafe` in order to simplify reasoning about the soundness of unsafe code
/// which potentially triggers exceptions. Making unregistration of exception
/// hooks an unsafe operation means that unsafe code can rely on knowing exactly
/// what will happen when an exception occurs, without having to account for the
/// possibility that untrusted safe code may have uninstalled an exception hook
/// it was relying on.
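///
/// # Example
///
/// The "found once, then `None`" behavior follows from removing the entry and
/// reclaiming the boxed closure in one step. The sketch below models that with
/// the standard library only. It is an illustration, not this crate's code:
/// `Hook`, `RawHook`, `remove`, and `demo` are hypothetical names, and `u64`
/// ids stand in for [`ExceptionHookId`].
///
/// ```rust
/// /// Hypothetical stand-ins for this crate's hook types.
/// type Hook = Box<dyn Fn() -> bool>;
/// type RawHook = *mut (dyn (Fn() -> bool) + 'static);
///
/// /// Removes the entry with the given id from a registry sorted by id, and
/// /// reclaims ownership of the closure that was leaked with `Box::into_raw`.
/// fn remove(registry: &mut Vec<(u64, RawHook)>, id: u64) -> Option<Hook> {
///     let index = registry.binary_search_by(|probe| probe.0.cmp(&id)).ok()?;
///     let (_, raw) = registry.remove(index);
///     // SAFETY: `raw` came from `Box::into_raw` and its entry was just
///     // removed, so nothing else can reclaim it.
///     Some(unsafe { Box::from_raw(raw) })
/// }
///
/// /// Registers three hooks, then removes the hook with id 1 twice. Returns
/// /// what the removed hook reports and whether the second removal was `None`.
/// fn demo() -> (bool, bool) {
///     let mut registry: Vec<(u64, RawHook)> = Vec::new();
///     for id in 0..3u64 {
///         let hook: Hook = Box::new(move || id == 1);
///         registry.push((id, Box::into_raw(hook)));
///     }
///     let first = remove(&mut registry, 1);
///     let second = remove(&mut registry, 1);
///     let outcome = (first.map_or(false, |hook| hook()), second.is_none());
///     // Reclaim the remaining entries so the sketch does not leak.
///     for (_, raw) in registry.drain(..) {
///         drop(unsafe { Box::from_raw(raw) });
///     }
///     outcome
/// }
/// ```
///
/// Here `demo()` returns `(true, true)`: the first removal hands back the
/// closure registered under id 1, and the second finds nothing. The binary
/// search relies on entries having been pushed in increasing id order.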
pub unsafe fn unregister_hook(id: ExceptionHookId) -> Option<DynExceptionHook> {
    let mut maybe_raw = None;

    for signo in Signo::all().iter().copied() {
        let mut hooks = lookup_hooks(signo);

        if let Ok(index) = hooks.binary_search_by(|probe| probe.0.cmp(&id)) {
            maybe_raw = Some(hooks.remove(index).1);
            if hooks.is_empty() {
                unregister_handler(signo);
            }
        }
    }

    maybe_raw.map(|raw| Box::from_raw(raw.0.cast_mut()))
}

/// Throws an exception which can be caught by [`catch`].
///
/// If there is no `catch` invocation anywhere on the stack, this
/// function returns `false`; otherwise, it does not return. It will never
/// return `true`; its return type is `bool` rather than `()` just to make it
/// more ergonomic to use as the final statement of an exception hook.
pub fn throw<F: Into<ExtExceptionInfo>>(exception: F) -> bool {
    let extinfo: ExtExceptionInfo = exception.into();
    JMP_BUF_PTR.with(|jmp_buf| {
        ERR_BUF_PTR.with(|err_buf| {
            unsafe {
                hwexception_throw(
                    addr_of!(extinfo).cast(),
                    std::mem::size_of_val(&extinfo),
                    jmp_buf.get(),
                    err_buf.get(),
                )
            }
            false
        })
    })
}

/// Catches an exception raised by [`throw`].
///
/// Runs `block`. If an exception is thrown during its execution, returns boxed
/// exception details. Otherwise, returns the return value of the block.
///
/// Internally, this function sets up a `setjmp` buffer, and [`throw`] performs
/// a `longjmp` back to it. No unwinding occurs: drop methods for any objects
/// created in the dynamic scope of the callback will not run. So, be wary of
/// resource leaks.
///
/// Despite this function's similar type signature to
/// [`std::panic::catch_unwind`], exceptions and panics are distinct. This
/// function will not catch panics and `catch_unwind` will not catch exceptions.
/// If `block` panics, the panic will continue to propagate out from this call.
///
/// This function will not catch anything that was not thrown by [`throw`], so a
/// signal being raised during the execution of `block` will not automatically
/// result in it being caught. You first need to use [`register_hook`] to
/// register a hook which calls `throw`.
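///
/// One way to let a panic from `block` keep propagating even though the
/// closure runs beneath a C frame is to catch the panic inside the C-ABI
/// callback and resume it once control is back in Rust. The sketch below shows
/// that pattern in isolation. It is a simplified illustration under stated
/// assumptions, not this function's code: `Slot`, `trampoline`, `foreign_call`,
/// and `call_through_c` are hypothetical names, `foreign_call` is a pure-Rust
/// stand-in for a real C function, and no `setjmp`/`longjmp` is involved.
///
/// ```rust
/// use std::any::Any;
/// use std::ffi::c_void;
/// use std::panic::{catch_unwind, resume_unwind, UnwindSafe};
///
/// /// Holds the closure before the call and its outcome afterwards.
/// struct Slot<F, R> {
///     call: Option<F>,
///     outcome: Option<Result<R, Box<dyn Any + Send>>>,
/// }
///
/// /// C-ABI callback: runs the closure and stores either its value or its
/// /// panic payload, so that no unwind ever leaves this function.
/// extern "C" fn trampoline<F: FnOnce() -> R + UnwindSafe, R>(ctx: *mut c_void) {
///     // SAFETY: `call_through_c` passes a pointer to a live `Slot<F, R>`.
///     let slot = unsafe { &mut *ctx.cast::<Slot<F, R>>() };
///     let f = slot.call.take().expect("closure already consumed");
///     slot.outcome = Some(catch_unwind(f));
/// }
///
/// /// Hypothetical stand-in for a C function that simply invokes `cb(ctx)`.
/// fn foreign_call(cb: extern "C" fn(*mut c_void), ctx: *mut c_void) {
///     cb(ctx)
/// }
///
/// /// Runs `f` behind the C-ABI boundary and re-raises any panic afterwards.
/// fn call_through_c<F: FnOnce() -> R + UnwindSafe, R>(f: F) -> R {
///     let mut slot = Slot { call: Some(f), outcome: None };
///     foreign_call(trampoline::<F, R>, (&mut slot as *mut Slot<F, R>).cast());
///     match slot.outcome.take().expect("trampoline did not run") {
///         Ok(value) => value,
///         Err(payload) => resume_unwind(payload),
///     }
/// }
/// ```
///
/// With these definitions, `call_through_c(|| 2 + 2)` evaluates to `4`, while a
/// panicking closure has its panic re-raised by `call_through_c` itself, after
/// the C-ABI frame has returned normally.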
pub fn catch<F, R>(block: F) -> Result<R, ExtExceptionInfo>
where
    F: FnOnce() -> R + UnwindSafe,
{
    // Okay, so here our control flow is spaghetti embedded in five-dimensional
    // non-Euclidean space. We're going to be calling into the C function
    // hwexception_catch. hwexception_catch *has* to be written in C, because it
    // calls setjmp. setjmp returns twice, which is something Rust fundamentally
    // can't cope with. But hwexception_catch only returns once, so it's okay
    // for us to call into it from Rust code.
    //
    // hwexception_catch takes five arguments:
    // 1. A pointer to a (C ABI) callback function which accepts a void* context
    //    argument.
    // 2. A context argument to pass to the callback.
    // 3. A pointer to a buffer where `hwexception_throw` can record exception
    //    info. Specifically, this buffer will be holding an ExtExceptionInfo.
    // 4. A pointer to thread-local storage which can be used to stash a pointer
    //    to a setjmp buffer.
    // 5. A pointer to thread-local storage which can be used to stash a copy of
    //    the pointer from argument 3.
    //
    // hwexception_catch is going to set up a jump buffer, store the two
    // pointers in the TLS cells we've given it, call the callback, restore the
    // TLS cells to their original values, and then return 0 if no exception was
    // thrown or nonzero otherwise. Whatever's stored in the TLS cells will be
    // looked up by our throw function.
    //
    // The job of this function is to wrap up the Rust-ABI closure `block` into
    // something callable from C. In addition to the obvious pointer-wrangling,
    // this also necessitates dealing with panics. Unwinding from a panic will
    // cause an immediate process abort if the unwind crosses a non-Rust frame,
    // and we don't want that to happen. So, panics that come out of `block`
    // need to get caught before they hit any C frame, and then rethrown once
    // `hwexception_catch` has returned and we're safely back in Rust-land.

    // Before we call into hwexception_catch, this union will hold `block` in
    // its `call` member. After hwexception_catch returns, `call` (an FnOnce)
    // will have been consumed, and this union will be in one of three states:
    // 1. If the block returned normally, this union's `noexception` member will
    //    hold an Ok() of its return value.
    // 2. If the block panicked, this union's `noexception` member will hold an
    //    Err() of the panic payload.
    // 3. If the block raised an exception, this union will hold an
    //    ExtExceptionInfo in its `exception` member.
    // We'll be using a pointer to a result buffer as the context argument to
    // our callback.
    union ResultBuffer<F, R> {
        call: ManuallyDrop<F>,
        noexception: ManuallyDrop<Result<R, Box<dyn Any + Send>>>,
        exception: ManuallyDrop<ExtExceptionInfo>,
    }

    // This is the callback we'll be passing as our first argument to
    // hwexception_catch.
    unsafe extern "C" fn callback<F, R>(ctx: *mut c_void)
    where
        F: FnOnce() -> R + UnwindSafe,
    {
        // Cast the void* context argument back into its real type.
        let ctx: &mut ResultBuffer<F, R> = &mut *ctx.cast();
        // Take the closure out of the `call` member.
        let f = ManuallyDrop::take(&mut ctx.call);
        // Call it, and if it panics, catch it.
        let result = std::panic::catch_unwind(f);
        // If we reach this line, no exception was raised. Write the result into
        // the `noexception` member of the ResultBuffer.
        ctx.noexception = ManuallyDrop::new(result);
    }

    // Create the result buffer in its initial state with its `call` member
    // active.
    let mut result_buffer = ResultBuffer::<F, R> {
        call: ManuallyDrop::new(block),
    };

    unsafe {
        JMP_BUF_PTR.with(|jmp_buf| {
            ERR_BUF_PTR.with(|err_buf| {
                if hwexception_catch(
                    callback::<F, R>,
                    addr_of_mut!(result_buffer).cast(),
                    addr_of_mut!(result_buffer.exception).cast(),
                    jmp_buf.get(),
                    err_buf.get(),
                ) == 0
                {
                    // A zero return means that no exception was raised, so we
                    // reached the end of the callback and now the `noexception`
                    // member contains what the callback stored into it, so
                    // extract it. If it's a normal return, return it. If it's a
                    // panic, resume panicking.
                    match ManuallyDrop::into_inner(result_buffer.noexception) {
                        Ok(success) => Ok(success),
                        Err(payload) => std::panic::resume_unwind(payload),
                    }
                } else {
                    // A non-zero return means that an exception was raised, and
                    // now the `exception` member contains what the throw
                    // function put there. Extract this and return it.
                    Err(ManuallyDrop::into_inner(result_buffer.exception))
                }
            })
        })
    }
}
515}