A deterministic simulation framework for distributed systems using async.
Refault takes your existing async Rust system and runs one or more instances of it in a deterministic simulation. Determinism allows you to reliably reproduce test failures.
SimBuilder::new().run(|| async move {
    let t1 = Instant::now();
    sleep(Duration::from_nanos(5)).await;
    assert_eq!(t1.elapsed().as_nanos(), 5);
}).unwrap()
§How it works
Refault aims to eliminate sources of non-determinism from your (distributed) system:
- Random number generation is intercepted to return deterministic PRNG sequences.
- Timekeeping functions are intercepted to return a simulated time that progresses deterministically:
  - std::time::Instant
  - std::time::SystemTime
  - the clock_gettime function from libc
- Custom async executor:
  - Single-threaded deterministic scheduling
  - Support for multiple nodes to simulate distributed systems
  - Fast-forwarding of simulated time when all tasks are sleeping
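The fast-forward idea in the list above can be illustrated with a small standalone sketch. It is not refault's implementation: `SimClock`, its methods, and the use of plain `u64` task ids are all hypothetical names chosen for illustration. The clock only advances when the scheduler finds no runnable task, at which point it jumps directly to the earliest pending deadline.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::Duration;

/// Hypothetical simulated clock: time only moves when the scheduler says so.
pub struct SimClock {
    now: Duration,
    // Min-heap of (wake-up time, task id); the task id breaks ties deterministically.
    timers: BinaryHeap<Reverse<(Duration, u64)>>,
}

impl SimClock {
    pub fn new() -> Self {
        SimClock { now: Duration::ZERO, timers: BinaryHeap::new() }
    }

    /// Current simulated time, measured from the start of the simulation.
    pub fn now(&self) -> Duration {
        self.now
    }

    /// Register that `task` sleeps for `dur`, starting at the current simulated time.
    pub fn sleep(&mut self, task: u64, dur: Duration) {
        self.timers.push(Reverse((self.now + dur, task)));
    }

    /// Called when no task is runnable: jump straight to the earliest deadline
    /// and return the task to wake, or `None` if nothing is sleeping.
    pub fn fast_forward(&mut self) -> Option<u64> {
        let Reverse((deadline, task)) = self.timers.pop()?;
        self.now = deadline;
        Some(task)
    }
}
```

Because no wall-clock time is ever consulted, a task that sleeps for an hour costs the same as one that sleeps for five nanoseconds, and the wake-up order is the same on every run.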
§Utilities
Refault ships with a few useful building blocks to help you port your system to the refault simulation.
§Simulated Packet network
Net implements a simulated packet network. Network abnormalities can be simulated using a user provided SendFunction.
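As a rough illustration of the kind of decision such a function might make, the sketch below models a lossy, slow link. It does not use refault's actual SendFunction signature, which is not shown here: `Fate`, `XorShift64`, and `lossy_link` are hypothetical names, and the PRNG is a hand-rolled xorshift so that the fault pattern depends only on the seed.

```rust
use std::time::Duration;

/// Hypothetical outcome of sending a packet; not a refault type.
#[derive(Debug, PartialEq, Eq)]
pub enum Fate {
    Deliver { delay: Duration },
    Drop,
}

/// Tiny deterministic PRNG (xorshift64): the same seed yields the same sequence.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The xorshift state must never be zero.
        XorShift64(seed.max(1))
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Drop roughly `drop_percent` percent of packets and delay the rest by 1 to 10 ms.
pub fn lossy_link(rng: &mut XorShift64, drop_percent: u64) -> Fate {
    if rng.next_u64() % 100 < drop_percent {
        Fate::Drop
    } else {
        let delay = Duration::from_millis(1 + rng.next_u64() % 10);
        Fate::Deliver { delay }
    }
}
```

Inside a real simulation the randomness would come from the intercepted, deterministic source rather than a hand-rolled generator; the explicit seed here only keeps the sketch self-contained.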
§Send type wrappers
Many refault types are not Send or Sync, as synchronization is not needed in the single-threaded runtime. For compatibility with the Rust async ecosystem, refault provides wrappers in send_bind that make the wrapped value Send and Sync and ensure it is not accidentally moved out of the simulation or between nodes.
§Id generation
You can generate unique Ids for use in your simulation code using id.
§Non-Determinism detection:
If non-determinism is accidentally introduced, the cause can be very hard to debug. By default, refault runs your simulation multiple times and compares event traces to detect non-determinism. It will tell you at which point the executions diverged. To speed up tests, you may opt out of this via with_determinsim_check.
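The comparison step can be pictured with the following standalone sketch. It assumes a trace is simply an ordered list of events; the `Event` struct and `first_divergence` function are hypothetical and do not reflect refault's real trace format.

```rust
/// Hypothetical event record; refault's real trace format is not shown here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub time_nanos: u64,
    pub node: u32,
    pub description: String,
}

/// Return the index of the first position where two runs differ,
/// or `None` if the traces are identical.
pub fn first_divergence(a: &[Event], b: &[Event]) -> Option<usize> {
    // Compare the common prefix event by event.
    if let Some(i) = a.iter().zip(b.iter()).position(|(x, y)| x != y) {
        return Some(i);
    }
    // Same prefix but different lengths: the runs diverge where the shorter one ends.
    if a.len() != b.len() {
        return Some(a.len().min(b.len()));
    }
    None
}
```

Reporting the first differing index, rather than just "traces differ", is what makes the check useful for debugging: the events before that index are known to be identical, so the cause must lie in whatever produced the event at that index.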
§The Simulator trait
Users can register Simulators. These are notified about various events, such as the creation, stopping, and starting of nodes. They can also be used to hold simulation-scoped global state, for example to simulate an external system that your system interacts with.
§Third Party integration
Refault integrates with various third-party crates to reduce the amount of code you need to change to get your system to run in the simulation:
- agnostic-lite: An abstraction layer over async runtimes. Refault implements RuntimeLite.
- tower: Refault comes with an RPC implementation based on the tower abstraction.
- serde: Refault types implement Serialize and Deserialize where it makes sense.
§Caveats
§Configure getrandom
Many crates, most notably rand, use getrandom to fetch random numbers from the OS.
If you are using rand (or getrandom), you should explicitly configure the getrandom backend so that refault can intercept it.
You can do so by adding this to your .cargo/config.toml:
[build]
rustflags = ['--cfg', 'getrandom_backend="linux_getrandom"']
Refault may work out of the box on your platform if getrandom chooses this backend by default, but this may break on other platforms or when getrandom updates.
§Avoid Spawning your own threads
Refault uses thread-local storage to manage its state. Many refault functions will panic if invoked from a different thread. Moreover, multithreading has the potential to introduce all sorts of non-determinism. If you have a reasonably large number of tests or are fuzz-testing, you can get plenty of parallelism by running one test per thread. If you have only a small number of tests, running each one on a single thread is probably fast enough.
§Avoid Global variables
The state of global variables at the beginning of a simulation has the potential to introduce non-determinism. Refault cannot ensure that user-defined global variables always start out in the same state when the simulation starts. If you want to keep global state, consider using Simulators.
Refault runs each simulation in a fresh thread, so using thread-local variables might be fine, depending on how you initialize them. Facilities of the standard library such as std::thread_local should work fine. However, platform-specific mechanisms may perform initialization before refault sets up the simulation context on the thread, so they might bypass the interception of OS facilities.
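The reason a fresh thread helps can be seen with plain standard-library code. The sketch below does not involve refault at all; `COUNTER`, `bump`, and `run_in_fresh_thread` are illustrative names. Each newly spawned thread gets its own lazily initialized copy of the thread-local, so state left behind by one run cannot leak into the next.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Each thread gets its own copy, lazily initialized to 0 on first access.
    static COUNTER: Cell<u32> = Cell::new(0);
}

/// Increment this thread's counter and return the new value.
pub fn bump() -> u32 {
    COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

/// Run `bump` twice on a freshly spawned thread and return the final value.
pub fn run_in_fresh_thread() -> u32 {
    thread::spawn(|| {
        bump();
        bump()
    })
    .join()
    .expect("worker thread panicked")
}
```

Calling `run_in_fresh_thread` repeatedly returns the same value every time, whereas a process-wide `static` counter would keep growing across calls. The caveat from the text still applies: this holds only if the initializer itself is deterministic.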
§Cargo features
| name | description |
|---|---|
| serde | Derive Serialize and Deserialize on refault types where applicable. |
| tower | Provide RPC functionality via Net based on tower. |
| agnostic-lite | Provide a RuntimeLite implementation. |
| emit-tracing | Emit tracing data via tracing. This is fairly noisy but sometimes helpful for debugging. |
| send-bind | Provide wrappers that ensure data is not moved inappropriately out of the simulation or between nodes, and that make types Send and Sync. |
Modules§
- executor
- Functions for handling tasks.
- id
- Unique Id generation.
- net
- A simulated packet network.
- send_bind
- Prevent values from accidentally moving between nodes or out of the simulation.
- sim_builder
- Running simulations.
- simulator
- Simulation-scoped singletons.
- time
- Waiting for the passage of simulated time.
Structs§
- NodeId
- A unique identifier for a node within a simulation.
Functions§
- is_in_simulation
- Returns true if called from within a simulation.