//! Shareable mutable containers that ease the detection of race conditions in
//! thread synchronization test code.
//!
//! # Motivation
//!
//! The main purpose of a thread synchronization protocol is to ensure that
//! some operations which are not atomic in hardware, such as writing to two
//! unrelated memory locations, appear to occur as atomic transactions from the
//! point of view of other threads: at any point of time, either a given
//! operation appears to be done, or it appears not to have started.
//!
//! Testing a thread synchronization primitive involves showing that
//! inconsistent states (where an operation appears to be half-performed) have a
//! negligible probability of being exposed to the outside world in all expected
//! usage scenarios. It is typically done by showing that a given non-atomic
//! operation is properly encapsulated by the transactional semantics of the
//! synchronization protocol, and will never appear as half-done to observers.
//!
//! Writing a genuinely non-atomic operation is easier said than done, however,
//! because the set of operations which can be atomic in hardware is larger
//! than most people think (at the time of writing, current Intel CPUs can
//! access memory in blocks of 128 bits, and current NVidia GPUs can do so in
//! blocks of 1024 bits), is not well-defined by the architecture (and is thus
//! subject to growing in future hardware), and is highly dependent on the
//! compiler's optimization choices in a high-level programming language such
//! as Rust.
//!
//! This is why I think we need some types whose operations are guaranteed to
//! be non-atomic under a set of reasonable assumptions, and which can easily
//! be observed to be in an inconsistent state.
//!
//! # Functionality
//!
//! This module provides the RaceCell type, which can hold a value of a
//! certain type T like a `Cell<T>` would, but is guaranteed **not** to be read
//! or written in a single atomic operation, even if the corresponding type T
//! can be atomically read or written in hardware.
//!
//! Furthermore, a RaceCell can detect scenarios where it is read at the same
//! time as it is being written (which constitutes a read/write race condition)
//! and report them to the reader thread.
//!
//! Such "controlled data races" can, in turn, be used to detect failures in
//! thread synchronization protocols, manifesting as inconsistent shared states
//! being exposed to the outside world.
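//!
//! As an illustration, here is a minimal single-threaded usage sketch,
//! assuming RaceCell and Racey are in scope (not compiled as a doc-test):
//!
//! ```ignore
//! let cell = RaceCell::new(0usize);
//! cell.set(42);
//!
//! // Without a concurrent writer, reads are always consistent. A reader
//! // racing against set() could observe Racey::Inconsistent instead.
//! match cell.get() {
//!     Racey::Consistent(value) => assert_eq!(value, 42),
//!     Racey::Inconsistent => println!("a concurrent write was detected"),
//! }
//! ```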
//!
//! # Requirements on T
//!
//! In principle, any Clone + Eq type T whose equality operator and clone()
//! implementation behave well even if the inner data is in an inconsistent
//! state (e.g. filled with random bits) could be used. This is true of all
//! primitive integral types and aggregates thereof, for example.
//!
//! However, in practice, unsynchronized concurrent read/write access to
//! arbitrary data from multiple threads constitutes a data race, which is
//! undefined behaviour in Rust even when it occurs inside an UnsafeCell. This
//! is problematic because rustc's optimizer allows itself to transform code
//! which relies on undefined behaviour however it likes, leading to breakages
//! in release builds such as infinite loops appearing out of nowhere.
//!
//! For this reason, we currently only support some specific types T which have
//! atomic load and store operations, implemented as part of an atomic wrapper.
//! Note that although individual loads and stores to T are atomic, loads and
//! stores to RaceCell<T> are still guaranteed not to be atomic.

#![deny(missing_docs)]

use std::sync::atomic::{
    AtomicBool, AtomicI16, AtomicI32, AtomicI64, AtomicI8, AtomicIsize, AtomicPtr, AtomicU16,
    AtomicU32, AtomicU64, AtomicU8, AtomicUsize, Ordering,
};

/// Shareable mutable container for triggering and detecting read/write
/// data races in a well-controlled fashion.
#[derive(Debug, Default)]
pub struct RaceCell<T: AtomicData> {
    /// Two copies of a value of type T are made. One is stored inline in the
    /// RaceCell, typically on the stack...
    local_contents: T::AtomicWrapper,

    /// ...and one is stored on the heap, which in all popular OSs is too far
    /// away from the stack to allow any significant probability of the
    /// hardware writing both copies in a single atomic transaction.
    ///
    /// Of course, a malicious optimizer could still use hardware transactional
    /// memory or a software emulation thereof to achieve this effect, but
    /// there is no performance benefit in doing so. In fact, it would rather
    /// have an adverse effect on performance, so a realistic optimizer won't
    /// do it.
    ///
    remote_version: Box<T::AtomicWrapper>,
}
//
impl<T: AtomicData> RaceCell<T> {
    /// Create a new RaceCell with a certain initial content
    pub fn new(value: T) -> Self {
        RaceCell {
            local_contents: T::AtomicWrapper::new(value.clone()),
            remote_version: Box::new(T::AtomicWrapper::new(value)),
        }
    }

    /// Update the internal contents of the RaceCell in a non-atomic fashion
    pub fn set(&self, value: T) {
        self.local_contents.relaxed_store(value.clone());
        self.remote_version.relaxed_store(value);
    }

    /// Read the current contents of the RaceCell, detecting any data race
    /// caused by a concurrently occurring write along the way.
    pub fn get(&self) -> Racey<T> {
        let local_data = self.local_contents.relaxed_load();
        let remote_data = self.remote_version.relaxed_load();
        if local_data == remote_data {
            Racey::Consistent(local_data)
        } else {
            Racey::Inconsistent
        }
    }
}
//
impl<T: AtomicData> Clone for RaceCell<T> {
    /// Making RaceCells cloneable allows putting them in concurrent containers
    fn clone(&self) -> Self {
        let local_copy = self.local_contents.relaxed_load();
        let remote_copy = self.remote_version.relaxed_load();
        RaceCell {
            local_contents: T::AtomicWrapper::new(local_copy),
            remote_version: Box::new(T::AtomicWrapper::new(remote_copy)),
        }
    }
}

/// This is the result of a RaceCell read
#[derive(Debug, Eq, PartialEq)]
pub enum Racey<U: AtomicData> {
    /// The RaceCell was internally consistent, and its content was copied
    Consistent(U),

    /// The RaceCell was internally inconsistent: a data race has occurred
    Inconsistent,
}

/// Requirements on the data held by a RaceCell
pub trait AtomicData: Clone + Eq + Sized {
    /// Atomic wrapper type for this data implementing relaxed atomic load/store
    type AtomicWrapper: AtomicLoadStore<Content = Self>;
}
///
/// Atomic wrapper type for a certain kind of value
///
/// For the implementation of RaceCell not to be considered undefined
/// behaviour and trashed by the Rust compiler in release mode, the underlying
/// type must be loaded and stored atomically. This requires synchronizing
/// accesses to it using things like the std::sync::atomic::AtomicXyz wrappers.
///
/// The only guarantee that we need is that loads and stores are atomic. We do
/// not need any other memory ordering guarantee. This is why we allow for
/// more wrapper type implementation freedom by not specifying memory orderings,
/// and internally relying only on relaxed ordering.
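///
/// As an illustration, support for another type could hypothetically be added
/// to this module by writing a wrapper along these lines (a sketch, not part
/// of the actual module, and not compiled as a doc-test):
///
/// ```ignore
/// // AtomicU32 and Ordering come from the std::sync::atomic imports at the
/// // top of this module. Hypothetical wrapper storing a char's code point.
/// struct AtomicChar(AtomicU32);
///
/// impl AtomicData for char {
///     type AtomicWrapper = AtomicChar;
/// }
///
/// impl AtomicLoadStore for AtomicChar {
///     type Content = char;
///
///     fn new(v: char) -> Self {
///         AtomicChar(AtomicU32::new(v as u32))
///     }
///
///     fn relaxed_load(&self) -> char {
///         // Only valid code points are ever stored, so this cannot fail
///         std::char::from_u32(self.0.load(Ordering::Relaxed)).unwrap()
///     }
///
///     fn relaxed_store(&self, val: char) {
///         self.0.store(val as u32, Ordering::Relaxed)
///     }
/// }
/// ```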
///
pub trait AtomicLoadStore: Sized {
    /// Type of data that is being wrapped
    type Content: AtomicData<AtomicWrapper = Self>;

    /// Create an atomic wrapper for a value of the wrapped content type
    fn new(v: Self::Content) -> Self;

    /// Atomically load a value from the wrapper
    fn relaxed_load(&self) -> Self::Content;

    /// Atomically store a new value into the wrapper
    fn relaxed_store(&self, val: Self::Content);
}
///
/// This macro implements support for non-generic standard atomic types
///
macro_rules! impl_atomic_data {
    ($($data:ty => $wrapper:ty),*) => ($(
        impl AtomicData for $data {
            type AtomicWrapper = $wrapper;
        }

        impl AtomicLoadStore for $wrapper {
            type Content = $data;

            fn new(v: $data) -> $wrapper {
                <$wrapper>::new(v)
            }

            fn relaxed_load(&self) -> $data {
                <$wrapper>::load(self, Ordering::Relaxed)
            }

            fn relaxed_store(&self, val: $data) {
                <$wrapper>::store(self, val, Ordering::Relaxed)
            }
        }
    )*)
}
//
impl_atomic_data! {
    bool  => AtomicBool,
    i8    => AtomicI8,
    i16   => AtomicI16,
    i32   => AtomicI32,
    i64   => AtomicI64,
    isize => AtomicIsize,
    u8    => AtomicU8,
    u16   => AtomicU16,
    u32   => AtomicU32,
    u64   => AtomicU64,
    usize => AtomicUsize
}
//
// Atomic pointers are a bit special as they are generic; for now, we will just
// treat them as a special case.
//
impl<V> AtomicData for *mut V {
    type AtomicWrapper = AtomicPtr<V>;
}
///
impl<V> AtomicLoadStore for AtomicPtr<V> {
    type Content = *mut V;

    fn new(v: *mut V) -> AtomicPtr<V> {
        <AtomicPtr<V>>::new(v)
    }

    fn relaxed_load(&self) -> *mut V {
        <AtomicPtr<V>>::load(self, Ordering::Relaxed)
    }

    fn relaxed_store(&self, val: *mut V) {
        <AtomicPtr<V>>::store(self, val, Ordering::Relaxed)
    }
}

// FIXME: The astute reader will have noted that any data could theoretically
//        be put in a RaceCell by using a Mutex as the AtomicWrapper. However,
//        this will only be implemented once Rust has specialization, to avoid
//        pessimizing the common case where a primitive type is enough.

/// Here are some RaceCell tests
#[cfg(test)]
mod tests {
    use super::{AtomicLoadStore, RaceCell, Racey};
    use std::sync::Mutex;

    /// A RaceCell should be created in a consistent and correct state
    #[test]
    fn initial_state() {
        let cell = RaceCell::new(true);
        assert_eq!(cell.local_contents.relaxed_load(), true);
        assert_eq!(cell.remote_version.relaxed_load(), true);
    }

    /// Reading a consistent RaceCell should work as expected
    #[test]
    fn consistent_read() {
        let cell = RaceCell::new(-42_isize);
        assert_eq!(cell.get(), Racey::Consistent(-42));
    }

    /// Reading an inconsistent RaceCell should work as expected
    #[test]
    fn inconsistent_read() {
        let cell = RaceCell::new(0xbad_usize);
        cell.local_contents.relaxed_store(0xdead);
        assert_eq!(cell.get(), Racey::Inconsistent);
    }

    /// RaceCells should be cloned as-is, even if in an inconsistent state
    #[test]
    fn clone() {
        let cell = RaceCell::new(0xbeef_usize);
        cell.local_contents.relaxed_store(0xdeaf);
        let clone = cell.clone();
        assert_eq!(clone.local_contents.relaxed_load(), 0xdeaf);
        assert_eq!(clone.remote_version.relaxed_load(), 0xbeef);
    }

    /// Unprotected concurrent reads and writes to a RaceCell should trigger
    /// detectable race conditions, illustrating its non-atomic nature.
    ///
    /// To maximize the odds of race conditions, this kind of test should be
    /// run with the test harness in single-threaded mode (e.g. via
    /// `cargo test -- --ignored --test-threads=1`).
    ///
    #[test]
    #[ignore]
    fn unprotected_race() {
        // Number of writes to carry out
        const WRITES_COUNT: usize = 100_000_000;

        // RaceCell in which the writes will be carried out
        let cell = RaceCell::new(0);

        // Make sure that RaceCell does expose existing data races, with a
        // detection probability better than 1% for very obvious ones :)
        crate::concurrent_test_2(
            || {
                for i in 1..=WRITES_COUNT {
                    cell.set(i);
                }
            },
            || {
                let mut last_value = 0;
                let mut data_race_count = 0usize;
                while last_value != WRITES_COUNT {
                    match cell.get() {
                        Racey::Consistent(value) => last_value = value,
                        Racey::Inconsistent => data_race_count += 1,
                    }
                }
                print!("{} races detected: ", data_race_count);
                assert!(data_race_count > WRITES_COUNT / 100);
            },
        );
    }

    /// Appropriately protected concurrent reads and writes to a RaceCell should
    /// not yield any detectable race conditions.
    ///
    /// To maximize the odds of race conditions, this kind of test should be
    /// run with the test harness in single-threaded mode (e.g. via
    /// `cargo test -- --ignored --test-threads=1`).
    ///
    #[test]
    #[ignore]
    fn protected_transaction() {
        // Number of writes to carry out
        const WRITES_COUNT: usize = 10_000_000;

        // Mutex-protected RaceCell in which the writes will be carried out
        let cell = Mutex::new(RaceCell::new(0));

        // Make sure that RaceCell does not incorrectly detect race conditions
        crate::concurrent_test_2(
            || {
                for i in 1..=WRITES_COUNT {
                    cell.lock().unwrap().set(i);
                }
            },
            || {
                let mut last_value = 0;
                let mut data_race_count = 0usize;
                while last_value != WRITES_COUNT {
                    match cell.lock().unwrap().get() {
                        Racey::Consistent(value) => last_value = value,
                        Racey::Inconsistent => data_race_count += 1,
                    }
                }
                assert_eq!(data_race_count, 0);
            },
        );
    }
}