Expand description

A global, auto-scaling, preemptive scheduler using work-balancing.

What? Another executor?

smolscale is a work-balancing executor based on [async-task], designed to be a drop-in replacement to smol and async-global-executor. It is designed based on the thesis that work-stealing, the usual approach in async executors like async-executor and tokio, is not the right algorithm for scheduling huge amounts of tiny, interdependent work units, which are what message-passing futures end up being. Instead, smolscale uses work-balancing, an approach also found in Erlang, where a global “balancer” thread periodically balances work between workers, but workers do not attempt to steal tasks from each other. This avoids the extremely frequent stealing attempts that work-stealing schedulers generate when applied to async tasks.

smolscale’s approach especially excels in two circumstances:

  • When the CPU cores are not fully loaded: Traditional work stealing optimizes for the case where most workers have work to do, which is only the case in fully-loaded scenarios. When workers often wake up and go back to sleep, however, a lot of CPU time is wasted stealing work. smolscale will instead drastically reduce CPU usage in these circumstances — a async-executor app that takes 80% of CPU time may now take only 20%. Although this does not improve fully-loaded throughput, it significantly reduces power consumption and does increase throughput in circumstances where multiple thread pools compete for CPU time.
  • When a lot of message-passing is happening: Message-passing workloads often involve tasks quickly waking up and going back to sleep. In a work-stealing scheduler, this again floods the scheduler with stealing requests. smolscale can significantly improve throughput, especially compared to executors like async-executor that do not special-case message passing.

Furthermore, smolscale has a preemptive thread pool that ensures that tasks cannot block other tasks no matter what. This means that you can do things like run expensive computations or even do blocking I/O within a task without worrying about causing deadlocks. Even with “traditional” tasks that do not block, this approach can reduce worst-case latency. Preemption is heavily inspired by Stjepan Glavina’s previous work on async-std.

smolscale also experimentally includes Nursery, a helper for structured concurrency on the smolscale global executor.

Show me the benchmarks!

Right now, smolscale uses a very naive implementation (for example, stealable local queues are implemented as SPSC queues with a spinlock on the consumer side, and worker parking is done naively through event-listener), and its performance is expected to drastically increase. However, at most tasks it is already much faster than async-global-executor (the de-facto standard “non-Tokio-world” executor, which powers async-std), sometimes an order of magnitude faster. Here are some unscientific benchmark results; percentages are compared to async-global-executor:

spawn_one               time:   [105.08 ns 105.21 ns 105.36 ns]                      
                        change: [-98.570% -98.549% -98.530%] (p = 0.00 < 0.05)
                        Performance has improved.

spawn_many              time:   [3.0585 ms 3.0598 ms 3.0613 ms]                        
                        change: [-87.576% -87.291% -86.948%] (p = 0.00 < 0.05)
                        Performance has improved.

yield_now               time:   [4.1676 ms 4.1917 ms 4.2166 ms]                       
                        change: [-50.455% -49.994% -49.412%] (p = 0.00 < 0.05)
                        Performance has improved.
ping_pong               time:   [8.5389 ms 8.6990 ms 8.8525 ms]                      
                        change: [+12.264% +14.548% +16.917%] (p = 0.00 < 0.05)
                        Performance has regressed.

Benchmarking spawn_executors_recursively:
spawn_executors_recursively                                                                            
                        time:   [180.26 ms 180.40 ms 180.56 ms]
                        change: [+497.14% +500.08% +502.97%] (p = 0.00 < 0.05)
                        Performance has regressed.

context_switch_quiet    time:   [100.67 us 102.05 us 103.07 us]                                 
                        change: [-42.789% -41.170% -39.490%] (p = 0.00 < 0.05)
                        Performance has improved.

context_switch_busy     time:   [8.7637 ms 8.9012 ms 9.0561 ms]                                
                        change: [+3.3147% +5.5719% +7.6684%] (p = 0.00 < 0.05)
                        Performance has regressed.

Structs

A self-contained executor context.

A nursery represents a dynamic scope in which tasks can be spawned. It is used for structured concurrency, and it ensures that tasks spawned within the nursery terminate before the nursery falls out of scope.

Enums

The strategy used to recover from errors that a task returns.

Functions

Returns the current number of active tasks.

Spawns a future onto the global executor and immediately blocks on it.

Irrevocably puts smolscale into single-threaded mode.

Returns the number of running threads.

Spawns a task onto the lazily-initialized global executor.