membarrier 0.2.0

# Process-wide memory barrier

[![Build Status](https://travis-ci.org/jeehoonkang/membarrier-rs.svg?branch=master)](https://travis-ci.org/jeehoonkang/membarrier-rs)
[![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg)](https://github.com/jeehoonkang/membarrier-rs)
[![Cargo](https://img.shields.io/crates/v/membarrier.svg)](https://crates.io/crates/membarrier)
[![Documentation](https://docs.rs/membarrier/badge.svg)](https://docs.rs/membarrier)

Memory barrier is one of the strongest synchronization primitives in modern relaxed-memory
concurrency. In relaxed-memory concurrency, two threads may have different viewpoint on the
underlying memory system, e.g. thread T1 may have recognized a value V at location X, while T2 does
not know of X=V at all. This discrepancy is one of the main reasons why concurrent programming is
hard. Memory barrier synchronizes threads in such a way that after memory barriers, threads have the
same viewpoint on the underlying memory system.

Unfortunately, memory barrier is not cheap. Usually, in modern computer systems, there's a
designated memory barrier instruction, e.g. `MFENCE` in x86 and `DMB SY` in ARM, and they may
take more than 100 cycles. Use of memory barrier instruction may be tolerable for several use
cases, e.g. context switching of a few threads, or synchronizing events that happen only once in
the lifetime of a long process. However, sometimes memory barrier is necessary in a fast path,
which significantly degrades the performance.

In order to reduce the synchronization cost of memory barrier, Linux and Windows provides
*process-wide memory barrier*, which basically performs memory barrier for every thread in the
process. Provided that it's even slower than the ordinary memory barrier instruction, what's the
benefit? At the cost of process-wide memory barrier, other threads may be exempted form issuing a
memory barrier instruction at all! In other words, by using process-wide memory barrier, you can
optimize fast path at the performance cost of slow path.

For process-wide memory barrier, Linux recently introduced the `sys_membarrier()` system call, but
it's known that in older Linux, the `mprotect()` system call with appropriate arguments provides
process-wide memory barrier semantics. Windows provides `FlushProcessWriteBuffers()` API.


## Usage

Use this crate as follows:

```rust
extern crate membarrier;
use std::sync::atomic::{fence, Ordering};

membarrier::light();     // light-weight barrier
membarrier::heavy();     // heavy-weight barrier
fence(Ordering::SeqCst); // normal barrier
```

## Semantics

Formally, there are three kinds of memory barrier: light one (`membarrier::light()`), heavy one
(`membarrier::heavy()`), and the normal one (`fence(Ordering::SeqCst)`). In an execution of a
program, there is a total order over all instances of memory barrier. If thread A issues barrier X
and thread B issues barrier Y and X is ordered before Y, then A's knowledge on the underlying memory
system at the time of X is transferred to B after Y, provided that:

- Either of A's or B's barrier is heavy; or
- Both of A's and B's barriers are normal.

## Reference

For more information, see the [Linux `man` page for
`membarrier`](http://man7.org/linux/man-pages/man2/membarrier.2.html).