tracing_mutex/
lib.rs

1//! Mutexes can deadlock each other, but you can avoid this by always acquiring your locks in a
2//! consistent order. This crate provides tracing to ensure that you do.
3//!
4//! This crate tracks a virtual "stack" of locks that the current thread holds, and whenever a new
5//! lock is acquired, a dependency is created from the last lock to the new one. These dependencies
6//! together form a graph. As long as that graph does not contain any cycles, your program is
7//! guaranteed to never deadlock.
8//!
9//! # Panics
10//!
11//! The primary method by which this crate signals an invalid lock acquisition order is by
12//! panicking. When a cycle is created in the dependency graph when acquiring a lock, the thread
13//! will instead panic. This panic will not poison the underlying mutex.
14//!
15//! This conflicting dependency is not added to the graph, so future attempts at locking should
16//! succeed as normal.
17//!
18//! # Structure
19//!
20//! Each module in this crate exposes wrappers for a specific base-mutex with dependency trakcing
21//! added. This includes [`stdsync`] which provides wrappers for the base locks in the standard
22//! library, and more depending on enabled compile-time features. More back-ends may be added as
23//! features in the future.
24//!
25//! # Feature flags
26//!
27//! `tracing-mutex` uses feature flags to reduce the impact of this crate on both your compile time
28//! and runtime overhead. Below are the available flags. Modules are annotated with the features
29//! they require.
30//!
31//! - `backtraces`: Enables capturing backtraces of mutex dependencies, to make it easier to
32//!   determine what sequence of events would trigger a deadlock. This is enabled by default, but if
33//!   the performance overhead is unacceptable, it can be disabled by disabling default features.
34//!
35//! - `lockapi`: Enables the wrapper lock for [`lock_api`][lock_api] locks
36//!
37//! - `parkinglot`: Enables wrapper types for [`parking_lot`][parking_lot] mutexes
38//!
39//! - `experimental`: Enables experimental features. Experimental features are intended to test new
40//!   APIs and play with new APIs before committing to them. As such, breaking changes may be
41//!   introduced in it between otherwise semver-compatible versions, and the MSRV does not apply to
42//!   experimental features.
43//!
44//! # Performance considerations
45//!
46//! Tracing a mutex adds overhead to certain mutex operations in order to do the required
47//! bookkeeping. The following actions have the following overhead.
48//!
49//! - **Acquiring a lock** locks the global dependency graph temporarily to check if the new lock
50//!   would introduce a cyclic dependency. This crate uses the algorithm proposed in ["A Dynamic
51//!   Topological Sort Algorithm for Directed Acyclic Graphs" by David J. Pearce and Paul H.J.
52//!   Kelly][paper] to detect cycles as efficently as possible. In addition, a thread local lock set
53//!   is updated with the new lock.
54//!
55//! - **Releasing a lock** updates a thread local lock set to remove the released lock.
56//!
57//! - **Allocating a lock** performs an atomic update to a shared counter.
58//!
59//! - **Deallocating a mutex** temporarily locks the global dependency graph to remove the lock
60//!   entry in the dependency graph.
61//!
62//! These operations have been reasonably optimized, but the performance penalty may yet be too much
63//! for production use. In those cases, it may be beneficial to instead use debug-only versions
64//! (such as [`stdsync::Mutex`]) which evaluate to a tracing mutex when debug assertions are
65//! enabled, and to the underlying mutex when they're not.
66//!
67//! For ease of debugging, this crate will, by default, capture a backtrace when establishing a new
68//! dependency between two mutexes. This has an additional overhead of over 60%. If this additional
69//! debugging aid is not required, it can be disabled by disabling default features.
70//!
71//! [paper]: https://whileydave.com/publications/pk07_jea/
72//! [lock_api]: https://docs.rs/lock_api/0.4/lock_api/index.html
73//! [parking_lot]: https://docs.rs/parking_lot/0.12.1/parking_lot/
74#![cfg_attr(docsrs, feature(doc_cfg))]
75use std::cell::RefCell;
76use std::fmt;
77use std::marker::PhantomData;
78use std::ops::Deref;
79use std::ops::DerefMut;
80use std::sync::Mutex;
81use std::sync::MutexGuard;
82use std::sync::OnceLock;
83use std::sync::PoisonError;
84use std::sync::atomic::AtomicUsize;
85use std::sync::atomic::Ordering;
86
87#[cfg(feature = "lock_api")]
88#[cfg_attr(docsrs, doc(cfg(feature = "lockapi")))]
89#[deprecated = "The top-level re-export `lock_api` is deprecated. Use `tracing_mutex::lockapi::raw` instead"]
90pub use lock_api;
91#[cfg(feature = "parking_lot")]
92#[cfg_attr(docsrs, doc(cfg(feature = "parkinglot")))]
93#[deprecated = "The top-level re-export `parking_lot` is deprecated. Use `tracing_mutex::parkinglot::raw` instead"]
94pub use parking_lot;
95
96use graph::DiGraph;
97use reporting::Dep;
98use reporting::Reportable;
99
100mod graph;
101#[cfg(any(feature = "lock_api", feature = "lockapi"))]
102#[cfg_attr(docsrs, doc(cfg(feature = "lock_api")))]
103#[cfg_attr(
104    all(not(docsrs), feature = "lockapi", not(feature = "lock_api")),
105    deprecated = "The `lockapi` feature has been renamed `lock_api`"
106)]
107pub mod lockapi;
108#[cfg(any(feature = "parking_lot", feature = "parkinglot"))]
109#[cfg_attr(docsrs, doc(cfg(feature = "parking_lot")))]
110#[cfg_attr(
111    all(not(docsrs), feature = "parkinglot", not(feature = "parking_lot")),
112    deprecated = "The `parkinglot` feature has been renamed `parking_lot`"
113)]
114pub mod parkinglot;
115mod reporting;
116pub mod stdsync;
117pub mod util;
118
119thread_local! {
120    /// Stack to track which locks are held
121    ///
122    /// Assuming that locks are roughly released in the reverse order in which they were acquired,
123    /// a stack should be more efficient to keep track of the current state than a set would be.
124    static HELD_LOCKS: RefCell<Vec<usize>> = const { RefCell::new(Vec::new()) };
125}
126
127/// Dedicated ID type for Mutexes
128///
129/// # Unstable
130///
131/// This type is currently private to prevent usage while the exact implementation is figured out,
132/// but it will likely be public in the future.
133struct MutexId(usize);
134
135impl MutexId {
136    /// Get a new, unique, mutex ID.
137    ///
138    /// This ID is guaranteed to be unique within the runtime of the program.
139    ///
140    /// # Panics
141    ///
142    /// This function may panic when there are no more mutex IDs available. The number of mutex ids
143    /// is `usize::MAX - 1` which should be plenty for most practical applications.
144    pub fn new() -> Self {
145        // Counter for Mutex IDs. Atomic avoids the need for locking.
146        static ID_SEQUENCE: AtomicUsize = AtomicUsize::new(0);
147
148        ID_SEQUENCE
149            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |id| id.checked_add(1))
150            .map(Self)
151            .expect("Mutex ID wraparound happened, results unreliable")
152    }
153
154    pub fn value(&self) -> usize {
155        self.0
156    }
157
158    /// Get a borrowed guard for this lock.
159    ///
160    /// This method adds checks adds this Mutex ID to the dependency graph as needed, and adds the
161    /// lock to the list of
162    ///
163    /// # Panics
164    ///
165    /// This method panics if the new dependency would introduce a cycle.
166    pub fn get_borrowed(&self) -> BorrowedMutex<'_> {
167        self.mark_held();
168        BorrowedMutex {
169            id: self,
170            _not_send: PhantomData,
171        }
172    }
173
174    /// Mark this lock as held for the purposes of dependency tracking.
175    ///
176    /// # Panics
177    ///
178    /// This method panics if the new dependency would introduce a cycle.
179    pub fn mark_held(&self) {
180        let opt_cycle = HELD_LOCKS.with(|locks| {
181            if let Some(&previous) = locks.borrow().last() {
182                let mut graph = get_dependency_graph();
183
184                graph.add_edge(previous, self.value(), Dep::capture).err()
185            } else {
186                None
187            }
188        });
189
190        if let Some(cycle) = opt_cycle {
191            panic!("{}", Dep::panic_message(&cycle))
192        }
193
194        HELD_LOCKS.with(|locks| locks.borrow_mut().push(self.value()));
195    }
196
197    pub unsafe fn mark_released(&self) {
198        HELD_LOCKS.with(|locks| {
199            let mut locks = locks.borrow_mut();
200
201            for (i, &lock) in locks.iter().enumerate().rev() {
202                if lock == self.value() {
203                    locks.remove(i);
204                    return;
205                }
206            }
207
208            // Drop impls shouldn't panic but if this happens something is seriously broken.
209            unreachable!("Tried to drop lock for mutex {:?} but it wasn't held", self)
210        });
211    }
212
213    /// Execute the given closure while the guard is held.
214    pub fn with_held<T>(&self, f: impl FnOnce() -> T) -> T {
215        // Note: we MUST construct the RAII guard, we cannot simply mark held + mark released, as
216        // f() may panic and corrupt our state.
217        let _guard = self.get_borrowed();
218        f()
219    }
220}
221
222impl Default for MutexId {
223    fn default() -> Self {
224        Self::new()
225    }
226}
227
228impl fmt::Debug for MutexId {
229    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
230        write!(f, "MutexID({:?})", self.0)
231    }
232}
233
234impl Drop for MutexId {
235    fn drop(&mut self) {
236        get_dependency_graph().remove_node(self.value());
237    }
238}
239
240/// `const`-compatible version of [`crate::MutexId`].
241///
242/// This struct can be used similarly to the normal mutex ID, but to be const-compatible its ID is
243/// generated on first use. This allows it to be used as the mutex ID for mutexes with a `const`
244/// constructor.
245///
246/// This type can be largely replaced once std::lazy gets stabilized.
247struct LazyMutexId {
248    inner: OnceLock<MutexId>,
249}
250
251impl LazyMutexId {
252    pub const fn new() -> Self {
253        Self {
254            inner: OnceLock::new(),
255        }
256    }
257}
258
259impl fmt::Debug for LazyMutexId {
260    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
261        write!(f, "{:?}", self.deref())
262    }
263}
264
265impl Default for LazyMutexId {
266    fn default() -> Self {
267        Self::new()
268    }
269}
270
271impl Deref for LazyMutexId {
272    type Target = MutexId;
273
274    fn deref(&self) -> &Self::Target {
275        self.inner.get_or_init(MutexId::new)
276    }
277}
278
279/// Borrowed mutex ID
280///
281/// This type should be used as part of a mutex guard wrapper. It can be acquired through
282/// [`MutexId::get_borrowed`] and will automatically mark the mutex as not borrowed when it is
283/// dropped.
284///
285/// This type intentionally is [`!Send`](std::marker::Send) because the ownership tracking is based
286/// on a thread-local stack which doesn't work if a guard gets released in a different thread from
287/// where they're acquired.
288#[derive(Debug)]
289struct BorrowedMutex<'a> {
290    /// Reference to the mutex we're borrowing from
291    id: &'a MutexId,
292    /// This value serves no purpose but to make the type [`!Send`](std::marker::Send)
293    _not_send: PhantomData<MutexGuard<'static, ()>>,
294}
295
296/// Drop a lock held by the current thread.
297///
298/// # Panics
299///
300/// This function panics if the lock did not appear to be handled by this thread. If that happens,
301/// that is an indication of a serious design flaw in this library.
302impl Drop for BorrowedMutex<'_> {
303    fn drop(&mut self) {
304        // Safety: the only way to get a BorrowedMutex is by locking the mutex.
305        unsafe { self.id.mark_released() };
306    }
307}
308
309/// Get a reference to the current dependency graph
310fn get_dependency_graph() -> impl DerefMut<Target = DiGraph<usize, Dep>> {
311    static DEPENDENCY_GRAPH: OnceLock<Mutex<DiGraph<usize, Dep>>> = OnceLock::new();
312
313    DEPENDENCY_GRAPH
314        .get_or_init(Default::default)
315        .lock()
316        .unwrap_or_else(PoisonError::into_inner)
317}
318
319#[cfg(test)]
320mod tests {
321    use rand::seq::SliceRandom;
322    use rand::thread_rng;
323
324    use super::*;
325
326    #[test]
327    fn test_next_mutex_id() {
328        let initial = MutexId::new();
329        let next = MutexId::new();
330
331        // Can't assert N + 1 because multiple threads running tests
332        assert!(initial.0 < next.0);
333    }
334
335    #[test]
336    fn test_lazy_mutex_id() {
337        let a = LazyMutexId::new();
338        let b = LazyMutexId::new();
339        let c = LazyMutexId::new();
340
341        let mut graph = get_dependency_graph();
342        assert!(graph.add_edge(a.value(), b.value(), Dep::capture).is_ok());
343        assert!(graph.add_edge(b.value(), c.value(), Dep::capture).is_ok());
344
345        // Creating an edge c → a should fail as it introduces a cycle.
346        assert!(graph.add_edge(c.value(), a.value(), Dep::capture).is_err());
347
348        // Drop graph handle so we can drop vertices without deadlocking
349        drop(graph);
350
351        drop(b);
352
353        // If b's destructor correctly ran correctly we can now add an edge from c to a.
354        assert!(
355            get_dependency_graph()
356                .add_edge(c.value(), a.value(), Dep::capture)
357                .is_ok()
358        );
359    }
360
361    /// Test creating a cycle, then panicking.
362    #[test]
363    #[should_panic]
364    fn test_mutex_id_conflict() {
365        let ids = [MutexId::new(), MutexId::new(), MutexId::new()];
366
367        for i in 0..3 {
368            let _first_lock = ids[i].get_borrowed();
369            let _second_lock = ids[(i + 1) % 3].get_borrowed();
370        }
371    }
372
373    /// Fuzz the global dependency graph by fake-acquiring lots of mutexes in a valid order.
374    ///
375    /// This test generates all possible forward edges in a 100-node graph consisting of natural
376    /// numbers, shuffles them, then adds them to the graph. This will always be a valid directed,
377    /// acyclic graph because there is a trivial order (the natural numbers) but because the edges
378    /// are added in a random order the DiGraph will still occassionally need to reorder nodes.
379    #[test]
380    fn fuzz_mutex_id() {
381        const NUM_NODES: usize = 100;
382
383        let ids: Vec<MutexId> = (0..NUM_NODES).map(|_| Default::default()).collect();
384
385        let mut edges = Vec::with_capacity(NUM_NODES * NUM_NODES);
386        for i in 0..NUM_NODES {
387            for j in i..NUM_NODES {
388                if i != j {
389                    edges.push((i, j));
390                }
391            }
392        }
393
394        edges.shuffle(&mut thread_rng());
395
396        for (x, y) in edges {
397            // Acquire the mutexes, smallest first to ensure a cycle-free graph
398            let _ignored = ids[x].get_borrowed();
399            let _ = ids[y].get_borrowed();
400        }
401    }
402}