//! Process-wide memory barrier for Linux, Windows, OSX, FreeBSD, Android, iOS,
//! [Miri](https://github.com/rust-lang/miri) and [Loom](https://github.com/tokio-rs/loom).
//!
//! A memory barrier is one of the strongest synchronization primitives in modern relaxed-memory
//! concurrency. Under relaxed memory, two threads may have different viewpoints on the underlying
//! memory system: e.g. thread T1 may have observed a value V at location X, while T2 does not know
//! of X=V at all. This discrepancy is one of the main reasons why concurrent programming is hard.
//! A memory barrier synchronizes threads in such a way that, after the barriers, they have the
//! same viewpoint on the underlying memory system.
//!
//! Unfortunately, a memory barrier is not cheap. Modern computer systems usually provide a
//! designated memory barrier instruction, e.g. `MFENCE` on x86 and `DMB SY` on ARM, which may
//! take more than 100 cycles. Such a cost may be tolerable for some use cases, e.g. context
//! switching among a few threads, or synchronizing events that happen only once in the lifetime
//! of a long process. Sometimes, however, a memory barrier is needed in the fast path, where it
//! significantly degrades performance.
//!
//! In order to reduce this synchronization cost, some OSs provide a *process-wide memory
//! barrier*, which performs a memory barrier on behalf of every thread in the process. Given
//! that it is even slower than an ordinary memory barrier instruction, what's the benefit? In
//! exchange for one expensive process-wide barrier, the other threads may be exempted from
//! issuing any memory barrier instruction at all! In other words, a process-wide memory barrier
//! lets you optimize the fast path at the performance cost of the slow path.
//!
//! This crate provides an abstraction of process-wide memory barrier over different operating
//! systems and hardware. It is implemented as follows. For Linux 4.14+ and FreeBSD 14.1+ systems,
//! we use the `membarrier()` system call. On older x86 and x86_64 Linux systems without
//! support for `membarrier()`, we fall back to the `mprotect()` system call that is known to provide
//! process-wide memory barrier semantics. For Windows, we use the `FlushProcessWriteBuffers()`
//! API. For Apple systems, we call `thread_get_register_pointer_values()` for every thread.
//! For all the other systems, we fall back to the normal `SeqCst` fence for both fast and slow
//! paths.
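//!
//! As a rough sketch (not the crate's actual dispatch code), the compile-time selection can be
//! pictured with `cfg!` checks; the strings below merely summarize the strategies listed above,
//! and the real selection also involves runtime probing (e.g. whether the running kernel
//! actually supports `membarrier()`):
//!
//! ```
//! // Hedged sketch: which process-wide barrier strategy applies per platform.
//! fn strategy() -> &'static str {
//!     if cfg!(any(target_os = "linux", target_os = "android")) {
//!         "membarrier(2), or mprotect() on old x86/x86_64 kernels"
//!     } else if cfg!(target_os = "freebsd") {
//!         "membarrier(2)"
//!     } else if cfg!(windows) {
//!         "FlushProcessWriteBuffers()"
//!     } else if cfg!(any(target_os = "macos", target_os = "ios")) {
//!         "thread_get_register_pointer_values() per thread"
//!     } else {
//!         "SeqCst fence fallback"
//!     }
//! }
//!
//! fn main() {
//!     println!("barrier strategy: {}", strategy());
//! }
//! ```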
//!
//! # Usage
//!
//! Use this crate as follows:
//!
//! ```
//! use std::sync::atomic::{fence, Ordering};
//!
//! membarrier2::light(); // light-weight barrier
//! membarrier2::heavy(); // heavy-weight barrier
//! fence(Ordering::SeqCst); // normal barrier
//! ```
//!
//! # Semantics
//!
//! Formally, there are three kinds of memory barriers: the light one `membarrier2::light()`, the
//! heavy one `membarrier2::heavy()`, and the normal one `fence(Ordering::SeqCst)`. In an
//! execution of a program, there is a total order over all instances of memory barriers. If
//! thread A issues barrier X, thread B issues barrier Y, and X is ordered before Y, then A's
//! knowledge of the underlying memory system at the time of X is transferred to B after Y,
//! provided that:
//!
//! - Either of A's or B's barrier is heavy; or
//! - Both of A's and B's barriers are normal.
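//!
//! For example, the "normal barrier" case can be exercised with two `SeqCst` fences pairing a
//! relaxed flag with relaxed data. With this crate, by the first bullet above, the reader's
//! fence could instead be `membarrier2::light()` whenever the writer uses
//! `membarrier2::heavy()`. This is an illustrative sketch, not code taken from the crate:
//!
//! ```
//! use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};
//! use std::thread;
//!
//! static DATA: AtomicUsize = AtomicUsize::new(0);
//! static READY: AtomicBool = AtomicBool::new(false);
//!
//! fn main() {
//!     let writer = thread::spawn(|| {
//!         DATA.store(42, Ordering::Relaxed);
//!         fence(Ordering::SeqCst); // slow path: membarrier2::heavy() would also work here
//!         READY.store(true, Ordering::Relaxed);
//!     });
//!
//!     while !READY.load(Ordering::Relaxed) {
//!         std::hint::spin_loop();
//!     }
//!     fence(Ordering::SeqCst); // fast path: replaceable by membarrier2::light()
//!     // The two fences pair through the READY flag, so the write to DATA is visible.
//!     assert_eq!(DATA.load(Ordering::Relaxed), 42);
//!     writer.join().unwrap();
//!     println!("observed DATA = {}", DATA.load(Ordering::Relaxed));
//! }
//! ```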
//!
//! # Reference
//!
//! For more information, see the [Linux `man` page for
//! `membarrier`](http://man7.org/linux/man-pages/man2/membarrier.2.html).
cfg_if! {
    // Platform backend selection (details elided in this excerpt). Each arm is
    // assumed to expose an `imp` module providing `light()` and `heavy()`.
    if #[cfg(any(unix, windows))] {
        use self::platform as imp;
    } else {
        use self::default as imp;
    }
}

/// Issues a light memory barrier for the fast path.
///
/// On supported systems, it issues a *compiler* fence, which disallows compiler optimizations
/// across itself. On unsupported systems, it falls back to the normal memory barrier instruction.
pub fn light() {
    imp::light();
}

/// Issues a heavy memory barrier for the slow path.
///
/// On supported systems, it uses an OS-specific syscall or API to issue a process-wide memory
/// barrier. On unsupported systems, it falls back to the normal memory barrier instruction.
pub fn heavy() {
    imp::heavy();
}