Disruptor
This is a low-latency, inter-thread communication library written in Rust.
It's heavily inspired by the brilliant Disruptor library from LMAX.
Use it when you want to trade CPU resources for lower latency and higher throughput than e.g. Crossbeam or `std::sync::mpsc` channels provide.
Contents
- Getting Started
- Features
- Patterns
- Design Choices
- Correctness
- Performance
- Related Work
- Contributions
- Roadmap
Getting Started
Add the following to your Cargo.toml file:
```toml
[dependencies]
disruptor = "3.7.0"
```
For details on how to use the library, check out the documentation on docs.rs/disruptor.
Processing Events
There are two ways to process events:
- Supply a closure to the Disruptor and let it manage the processing thread(s).
- Use the `EventPoller` API where you can poll for events (and manage your own threads).
Both have comparable performance, so use whichever fits your use case best. (See also the benchmarks.)
Single and Batch Publication With Managed Threads
Here's a small example demonstrating both single and batch publication. Note that batch publication should be used whenever possible for the best latency and throughput (see the benchmarks below).
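The following is a sketch assuming the builder API documented on docs.rs/disruptor (`build_single_producer` with the `BusySpin` wait strategy, and `publish`/`batch_publish` on the producer); check the docs for the exact signatures:

```rust
use disruptor::*;

// The event on the ring buffer.
struct Event {
    price: f64,
}

fn main() {
    // Factory closure for initializing events in the ring buffer.
    let factory = || { Event { price: 0.0 } };

    // Closure for processing events on the consumer thread.
    let processor = |event: &Event, _sequence: Sequence, _end_of_batch: bool| {
        // Process `event` here.
    };

    // The ring buffer size (here 8) must be a power of two.
    let mut producer = build_single_producer(8, factory, BusySpin)
        .handle_events_with(processor)
        .build();

    // Single publication.
    producer.publish(|e| {
        e.price = 42.0;
    });

    // Batch publication: reserve 5 slots and fill them in one go.
    producer.batch_publish(5, |iter| {
        for e in iter {
            e.price = 42.0;
        }
    });
} // When the producer is dropped, the processor thread finishes the
  // remaining events and the Disruptor shuts down.
```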
Pinning Threads and Dependencies Between Processors
The library also supports pinning threads on cores to avoid latency induced by context switching. A more advanced usage, demonstrating this together with multiple producers and multiple interdependent consumers, is sketched below.
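This sketch assumes the `build_multi_producer`, `pin_at_core` and `and_then` builder methods as documented on docs.rs/disruptor, where `and_then` makes the following consumer depend on the previous group:

```rust
use disruptor::*;
use std::thread;

// The event on the ring buffer.
struct Event {
    price: f64,
}

fn main() {
    let factory = || { Event { price: 0.0 } };

    // Processing closures; h3 depends on h1 and h2 (see the builder below).
    let h1 = |e: &Event, _: Sequence, _: bool| { /* Processing logic. */ };
    let h2 = |e: &Event, _: Sequence, _: bool| { /* Processing logic. */ };
    let h3 = |e: &Event, _: Sequence, _: bool| { /* Processing logic. */ };

    let mut producer1 = build_multi_producer(8, factory, BusySpin)
        // h1 and h2 run concurrently, pinned on cores 1 and 2.
        .pin_at_core(1).handle_events_with(h1)
        .pin_at_core(2).handle_events_with(h2)
        .and_then()
            // h3 only sees an event after both h1 and h2 have processed it.
            .pin_at_core(3).handle_events_with(h3)
        .build();

    // Clone the producer to publish from a second thread.
    let mut producer2 = producer1.clone();

    // Publish into the Disruptor from different threads.
    thread::scope(|s| {
        s.spawn(move || { producer1.publish(|e| { e.price = 24.0; }); });
        s.spawn(move || { producer2.publish(|e| { e.price = 42.0; }); });
    });
}
```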
Processors with State
If you need to store some state in the processor thread which is neither `Send` nor `Sync`, e.g. an `Rc<RefCell<i32>>`, then you can create a closure for initializing that state and pass it along with the processing closure when you build the Disruptor. The Disruptor will then pass a mutable reference to your state on each event. As an example:
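This sketch assumes the `handle_events_and_state_with` builder method as documented on docs.rs/disruptor:

```rust
use disruptor::*;
use std::{cell::RefCell, rc::Rc};

// The event on the ring buffer.
struct Event {
    price: f64,
}

// State that is neither Send nor Sync.
struct State {
    count: Rc<RefCell<i32>>,
}

fn main() {
    let factory = || { Event { price: 0.0 } };

    // Closure for initializing the state - invoked on the processor thread.
    let initial_state = || { State { count: Rc::new(RefCell::new(0)) } };

    // The processing closure receives a mutable reference to the state.
    let processor = |state: &mut State, e: &Event, _: Sequence, _: bool| {
        *state.count.borrow_mut() += 1;
    };

    let mut producer = build_single_producer(8, factory, BusySpin)
        .handle_events_and_state_with(processor, initial_state)
        .build();

    producer.publish(|e| { e.price = 42.0; });
}
```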
Event Polling
An alternative to storing state in the processor is to use the `EventPoller` API and keep the state on a thread you manage yourself.
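The concrete `EventPoller` construction and method names are best taken from docs.rs/disruptor; the sketch below only shows the general shape of a poll loop on a self-managed thread, with a stand-in `poll` function rather than the crate's actual API:

```rust
// A generic sketch of a poll loop on a self-managed thread. The `poll`
// parameter stands in for the crate's EventPoller; see docs.rs/disruptor
// for the actual type and method names.
fn poll_loop<E>(
    mut poll: impl FnMut(&mut dyn FnMut(&E)) -> bool,
    mut handler: impl FnMut(&E),
) {
    loop {
        // Hand any available events to the handler; returns false when
        // nothing was ready.
        let processed_any = poll(&mut handler);
        if !processed_any {
            // Busy-spin for the lowest latency, or yield/park here to
            // trade latency for CPU time.
            std::hint::spin_loop();
        }
    }
}
```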
Features
- Single Producer Single Consumer (SPSC).
- Single Producer Multi Consumer (SPMC) with consumer interdependencies.
- Multi Producer Single Consumer (MPSC).
- Multi Producer Multi Consumer (MPMC) with consumer interdependencies.
- Busy-spin wait strategies.
- Batch publication of events.
- Batch consumption of events.
- Event Poller API.
- Thread affinity can be set for the event processor thread(s).
- The thread name of each event processor thread can be set.
Patterns
A Disruptor with Different Event Types
Let's assume you have multiple producers that each publish distinct events. It could be an exchange where you receive client logins, logouts, orders, etc.
You can model this by using an enum as the event type.
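A sketch, with hypothetical variants for the exchange example:

```rust
// Hypothetical event type for the exchange example; variant names are illustrative.
enum Event {
    // Initial value used by the factory closure when the ring buffer is allocated.
    None,
    ClientLogin { client_id: u64 },
    ClientLogout { client_id: u64 },
    Order { client_id: u64, price: f64, quantity: u64 },
}
```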
Then you can differentiate between the event types in your processor.
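For example, by matching on the variant in the processing closure (same closure shape as in the earlier examples):

```rust
let processor = |event: &Event, _sequence: Sequence, _end_of_batch: bool| {
    match event {
        Event::None               => (), // Only the initial value; never published.
        Event::ClientLogin { .. } => { /* Handle login. */ }
        Event::ClientLogout { .. } => { /* Handle logout. */ }
        Event::Order { .. }       => { /* Handle order. */ }
    }
};
```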
Splitting Workload Across Processors
Let's assume you have a high ingress rate of events and you need to split the work across multiple processors to cope with the load. You can do that by assigning an id to each processor and then only processing events whose sequence number, modulo the number of processors, equals the id.
Here, for simplicity, we split the work across two processors.
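A sketch, reusing the processing closure shape from the earlier examples:

```rust
// Each processor only handles events whose sequence number maps to its id.
let processor0 = |e: &Event, sequence: Sequence, _end_of_batch: bool| {
    if sequence % 2 == 0 { /* Process `e`. */ }
};
let processor1 = |e: &Event, sequence: Sequence, _end_of_batch: bool| {
    if sequence % 2 == 1 { /* Process `e`. */ }
};
```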
This scheme ensures that each event is processed by exactly one processor.
Design Choices
Everything in this library is about low latency, and that goal heavily influences every design choice. As an example, you cannot allocate an event and move it into the ring buffer. Instead, events are allocated on startup to ensure they are co-located in memory, improving cache locality. However, you can still allocate e.g. a struct on the heap and move ownership into a field of the event on the ring buffer.
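For example (a sketch; the `Order` and `Event` types are illustrative):

```rust
// Events are pre-allocated by the factory closure, but an event can still
// own heap-allocated data that is moved in at publication time.
struct Order {
    id: u64,
}

struct Event {
    order: Option<Box<Order>>, // The factory initializes this to `None`.
}

// At publication, move ownership of a fresh allocation into the event:
// producer.publish(|e| { e.order = Some(Box::new(Order { id: 1 })); });
```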
There's also no use of dynamic dispatch - everything is monomorphized.
Correctness
This library needs `unsafe` code to achieve low latency. Although the absence of bugs cannot be guaranteed, the following approaches have been used to eliminate them:
- Minimal usage of `unsafe` blocks.
- High test coverage.
- All tests are run under Miri in CI/CD.
- Verification in TLA+ (see the `verification/` folder).
Performance
The SPSC and MPSC Disruptor variants have been benchmarked and compared to Crossbeam. See the code in the `benches/spsc.rs` and `benches/mpsc.rs` files.
The SPSC benchmark results below are gathered from running the benchmarks on a 2016 MacBook Pro with a 2.6 GHz Quad-Core Intel Core i7, so on a modern Intel Xeon the numbers should be even better. Furthermore, it's not possible to isolate cores and pin threads on macOS, which would produce even more stable results. This is future work.
If you have any suggestions for improving the benchmarks, please feel free to open an issue.
To make the benchmark somewhat realistic, it considers not only bursts of different sizes but also variable pauses between bursts: 0 ms, 1 ms, and 10 ms.
The latencies below are the mean latency per element with a 95% confidence interval (standard Criterion settings). Capturing all latencies and calculating various percentiles (in particular the max latency) is future work. However, I expect the measurements below to be representative of the actual performance you can achieve in a real application.
No Pause Between Bursts
Latency:
| Burst Size | Crossbeam | Disruptor | Improvement |
|---|---|---|---|
| 1 | 65 ns | 32 ns | 51% |
| 10 | 68 ns | 9 ns | 87% |
| 100 | 29 ns | 8 ns | 72% |
Throughput:
| Burst Size | Crossbeam | Disruptor | Improvement |
|---|---|---|---|
| 1 | 15.2M/s | 31.7M/s | 109% |
| 10 | 14.5M/s | 117.3M/s | 709% |
| 100 | 34.3M/s | 119.7M/s | 249% |
1 ms Pause Between Bursts
Latency:
| Burst Size | Crossbeam | Disruptor | Improvement |
|---|---|---|---|
| 1 | 63 ns | 33 ns | 48% |
| 10 | 67 ns | 8 ns | 88% |
| 100 | 30 ns | 9 ns | 70% |
Throughput:
| Burst Size | Crossbeam | Disruptor | Improvement |
|---|---|---|---|
| 1 | 15.9M/s | 30.7M/s | 93% |
| 10 | 14.9M/s | 117.7M/s | 690% |
| 100 | 33.8M/s | 105.0M/s | 211% |
10 ms Pause Between Bursts
Latency:
| Burst Size | Crossbeam | Disruptor | Improvement |
|---|---|---|---|
| 1 | 51 ns | 32 ns | 37% |
| 10 | 67 ns | 9 ns | 87% |
| 100 | 30 ns | 10 ns | 67% |
Throughput:
| Burst Size | Crossbeam | Disruptor | Improvement |
|---|---|---|---|
| 1 | 19.5M/s | 31.6M/s | 62% |
| 10 | 14.9M/s | 114.5M/s | 668% |
| 100 | 33.6M/s | 105.0M/s | 213% |
Conclusion
There's clearly a difference between the Disruptor and Crossbeam. However, this is not because Crossbeam isn't a great piece of software. It is. The Disruptor trades CPU and memory resources for lower latency and higher throughput, and that is why it's able to achieve these results. The Disruptor also excels if you can publish batches of events, as demonstrated in the benchmarks with bursts of 10 and 100 events.
Both libraries improve greatly as the burst size goes up, but the Disruptor's performance is more resilient to the pauses between bursts, which is one of its design goals.
Related Work
There are multiple other Rust projects that mimic the LMAX Disruptor library.
A key feature of this library is support for multiple producers from different threads, which none of those libraries support (at the time of writing).
Contributions
You are welcome to create a Pull Request or open an issue with suggestions for improvements.
Changes are accepted solely at my discretion, and I will focus on whether the changes are a good fit for the purpose and design of this crate.
Roadmap
Empty! All the items have been implemented.