rust-coroutines
'Native' stackful coroutine library for Rust.
Replace std::thread::spawn
with coroutines::spawn
and that's all. Everything will just work!
Features
- Most
libstd
functions that was blocking get patched and become asynchronous automatically - Efficient asynchronous I/O and task scheduling
- Voluntary preemption and work stealing
How does it work?
rust-coroutines includes a separate runtime library which provides coroutine scheduling and patches the runtime environment, hooking the entries to blocking system calls in libc.
Build and use
The runtime library includes a lot of platform-specific code and writing it in Rust won't provide much benefit. Therefore, I decide to write it in C. Before using the library, you need to build (and optionally 'install') core_impl/unblock_hook
first (clang required):
cd core_impl
make
sudo cp libunblock_hook.so /usr/lib/ # Optional (see below)
If you can't or don't want to install libunblock_hook
as a system-wide library, you could specify RUSTFLAGS="-L/path/to/core_impl/"
for cargo commands and specify LD_LIBRARY_PATH=/path/to/core_impl/
when running the built binaries.
After setting up the runtime library, you will be able to use the coroutines
crate in your application.
For example:
sleep.rs
extern crate coroutines;
tcp_listener.rs
extern crate coroutines;
use ;
use ;
More examples can be found in examples/
.
It should be noted that all data structures related to blocking libstd
API calls should be constructed in a coroutine to inform the runtime library to apply necessary patches. Typically, this can be done by spawning a coroutine from main
and do everything there. Not doing so may lead to unexpected blocking behaviors (e.g. TcpListener::accept
blocks instead of yielding out if the TcpListener
object is constructed outside a coroutine and does not have the nonblocking
flag bit set. )
Performance
The performance of rust-coroutines is similar to most other (stackful/stackless) coroutine libraries in Rust, including may and futures (with tokio).
With a modified version of corona benchmarks:
test async ... bench: 31,028,198 ns/iter (+/- 2,738,679)
test async_cpupool ... bench: 29,813,426 ns/iter (+/- 14,394,817)
test async_cpupool_cpus ... bench: 31,497,284 ns/iter (+/- 7,074,686)
test async_cpupool_many ... bench: 30,785,178 ns/iter (+/- 7,982,811)
test async_cpus ... bench: 25,660,063 ns/iter (+/- 204,811,182)
test async_many ... bench: 24,347,067 ns/iter (+/- 1,980,457)
test corona ... bench: 38,565,408 ns/iter (+/- 1,717,367)
test corona_blocking_wrapper ... bench: 38,856,394 ns/iter (+/- 2,242,614)
test corona_blocking_wrapper_cpus ... bench: 29,147,673 ns/iter (+/- 89,727,410)
test corona_blocking_wrapper_many ... bench: 28,384,512 ns/iter (+/- 2,480,628)
test corona_cpus ... bench: 28,862,550 ns/iter (+/- 2,197,395)
test corona_many ... bench: 28,509,142 ns/iter (+/- 2,767,814)
test coroutines ... bench: 26,673,276 ns/iter (+/- 3,232,604)
test coroutines_cpus ... bench: 27,194,849 ns/iter (+/- 2,787,879)
test coroutines_many ... bench: 26,744,986 ns/iter (+/- 3,161,595)
test futures ... bench: 30,695,434 ns/iter (+/- 2,447,777)
test futures_cpupool ... bench: 29,626,141 ns/iter (+/- 3,090,787)
test futures_cpupool_cpus ... bench: 30,573,408 ns/iter (+/- 3,549,979)
test futures_cpupool_many ... bench: 30,276,154 ns/iter (+/- 4,121,736)
test futures_cpus ... bench: 24,814,705 ns/iter (+/- 2,107,849)
test futures_many ... bench: 24,750,719 ns/iter (+/- 1,875,699)
test may ... bench: 27,553,510 ns/iter (+/- 3,164,095)
test may_cpus ... bench: 28,315,233 ns/iter (+/- 2,892,440)
test may_many ... bench: 27,812,071 ns/iter (+/- 2,814,072)
test threads ... bench: 42,440,270 ns/iter (+/- 4,181,660)
test threads_cpus ... bench: 40,202,798 ns/iter (+/- 6,837,590)
test threads_many ... bench: 40,259,324 ns/iter (+/- 864,916,488)
Known issues
- Entering asynchronous operations while holding a Mutex or RwLock on the current coroutine is currently a very expensive operation because the current coroutine has to be pinned to an execution unit and cannot yield out while waiting for the lock to be available. However, doing I/O while holding a lock is a slow operation itself so performance penalty with coroutine pinning / blocking wait will not be noticed most of the time.
- Thread local storage (TLS) is not yet patched and might be dangerous to use across an async_enter/exit boundary. It needs some work to figure out how to do the hookings correctly.
- Only x86-64 Linux is supported at the moment. Porting to other Unix-like systems and i386 & ARM architectures is planned.