§Moonpool Simulation Framework
Deterministic simulation for testing distributed systems, inspired by FoundationDB’s simulation testing.
§Why Deterministic Simulation?
FoundationDB’s insight: bugs hide in error paths. Production code rarely exercises timeout handlers, retry logic, or failure recovery. Deterministic simulation with fault injection finds these bugs before production does.
Key properties:
- Reproducible: Same seed produces identical execution
- Comprehensive: Tests all failure modes (network, timing, corruption)
- Fast: Logical time skips idle periods
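The reproducibility and logical-time properties above can be illustrated with a minimal, standalone sketch. It does not use Moonpool's API: `SeededRng` and `run_simulation` are hypothetical names invented for this example, and the SplitMix64 generator is only a stand-in for whatever seeded random source the simulator uses.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Tiny deterministic PRNG (SplitMix64). Hypothetical stand-in for a
/// simulator's seeded random source.
pub struct SeededRng(u64);

impl SeededRng {
    pub fn new(seed: u64) -> Self {
        SeededRng(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Schedules `events` timers with seeded random delays and returns the
/// firing order as `(logical_time_ms, event_id)` pairs.
pub fn run_simulation(seed: u64, events: u64) -> Vec<(u64, u64)> {
    let mut rng = SeededRng::new(seed);
    // Min-heap ordered by (fire time, event id).
    let mut queue = BinaryHeap::new();
    for id in 0..events {
        // Delays of up to 10 simulated seconds.
        let delay_ms = 1 + rng.next_u64() % 10_000;
        queue.push(Reverse((delay_ms, id)));
    }

    let mut now_ms: u64 = 0;
    let mut trace = Vec::new();
    while let Some(Reverse((fire_at, id))) = queue.pop() {
        // Logical time never moves backwards.
        debug_assert!(fire_at >= now_ms);
        // Jump the clock straight to the next event instead of sleeping.
        now_ms = fire_at;
        trace.push((now_ms, id));
    }
    trace
}
```

Calling `run_simulation(42, 100)` twice yields the same trace, and the loop finishes without waiting on a wall clock even though the trace spans several simulated seconds.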
§Core Components
- SimWorld: The simulation runtime managing events and time
- SimulationBuilder: Configure and run simulations
- chaos: Fault injection (buggify, 15 assertion macros, invariants)
- storage: Storage simulation with fault injection
- Multiverse exploration via moonpool-explorer (re-exported as ExplorationConfig, AdaptiveConfig)
§Quick Start
use moonpool_sim::{SimulationBuilder, WorkloadTopology};

SimulationBuilder::new()
    .topology(WorkloadTopology::ClientServer { clients: 2, servers: 1 })
    .run(|ctx| async move {
        // Your distributed system workload
    });
§Fault Injection Overview
See the chaos module for detailed documentation.
| Mechanism | Default | What it tests |
|---|---|---|
| TCP latencies | 1-11ms connect | Async scheduling |
| Random connection close | 0.001% | Reconnection, redelivery |
| Bit flip corruption | 0.01% | Checksum validation |
| Connect failure | 50% probabilistic | Timeout handling, retries |
| Clock drift | 100ms max | Leases, heartbeats |
| Buggified delays | 25% | Race conditions |
| Partial writes | 1000 bytes max | Message fragmentation |
| Packet loss | disabled | At-least-once delivery |
| Network partitions | disabled | Split-brain handling |
| Storage corruption | configurable | Checksum validation, recovery |
| Torn writes | configurable | Write atomicity, journaling |
| Sync failures | configurable | Durability guarantees |
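To make the "Bit flip corruption" row concrete, here is a minimal, standalone sketch of probabilistic fault injection paired with checksum validation. It is not Moonpool's implementation: `FaultRng`, `checksum`, and `maybe_flip_bit` are hypothetical names, and FNV-1a is assumed here only because it is short to write out.

```rust
/// Seeded random source for fault decisions (SplitMix64, hypothetical).
pub struct FaultRng {
    state: u64,
}

impl FaultRng {
    pub fn new(seed: u64) -> Self {
        FaultRng { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Returns `true` with the given probability (0.0 = never, 1.0 = always).
    pub fn chance(&mut self, probability: f64) -> bool {
        // Uniform value in [0, 1) built from the top 53 bits.
        let unit = (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        unit < probability
    }
}

/// 32-bit FNV-1a checksum, assumed here as the receiver's integrity check.
pub fn checksum(data: &[u8]) -> u32 {
    let mut hash: u32 = 0x811C_9DC5;
    for byte in data {
        hash ^= u32::from(*byte);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Flips one random bit of `payload` with the given probability.
/// Returns `true` if a bit was flipped.
pub fn maybe_flip_bit(rng: &mut FaultRng, payload: &mut [u8], probability: f64) -> bool {
    if payload.is_empty() || !rng.chance(probability) {
        return false;
    }
    let index = (rng.next_u64() % payload.len() as u64) as usize;
    let bit = rng.next_u64() % 8;
    payload[index] ^= 1 << bit;
    true
}
```

With the table's default of 0.01%, a sender would call `maybe_flip_bit(&mut rng, &mut payload, 0.0001)` before delivery, and the receiver would compare `checksum(&payload)` against the value computed at send time. Because every decision comes from the seeded generator, the same seed corrupts the same messages on every run.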
§Multi-Seed Testing
Tests run across multiple seeds to explore the state space:
SimulationBuilder::new()
    .run_count(IterationControl::UntilAllSometimesReached(1000))
    .run(workload);
Debugging a failing seed:
SimulationBuilder::new()
    .set_seed(failing_seed)
    .run_count(IterationControl::FixedCount(1))
    .run(workload);
§Coverage-Preserving Multi-Seed Exploration
When exploration is enabled, multiple seeds share coverage context. The explored map (coverage bitmap union) and assertion watermarks are preserved between seeds so each subsequent seed focuses energy on genuinely new branches rather than re-treading already-discovered paths.
A warm start mechanism reduces wasted forks: on seeds after the first,
marks whose first probe batch finds zero new coverage bits stop after
warm_min_timelines forks instead of the full min_timelines.
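The coverage union and the warm start rule described above can be sketched as two small functions. This is an illustration of the rule as stated in the text, not the code in moonpool-explorer: `merge_coverage` and `min_forks` are hypothetical names, and the bitmap is assumed to be a slice of 64-bit words.

```rust
/// Merges `batch` into `explored` (bitwise union) and returns how many
/// bits were newly set. Words beyond the shorter slice are ignored.
pub fn merge_coverage(explored: &mut [u64], batch: &[u64]) -> u32 {
    let mut new_bits = 0;
    for (seen, fresh) in explored.iter_mut().zip(batch) {
        // Bits present in the batch but not yet in the explored map.
        new_bits += (*fresh & !*seen).count_ones();
        *seen |= *fresh;
    }
    new_bits
}

/// Minimum number of forks a mark receives before it may stop.
///
/// On a warm seed (any seed after the first), a mark whose first probe
/// batch found no new coverage bits stops after `warm_min_timelines`.
/// Every other mark gets the full `min_timelines`.
pub fn min_forks(
    is_warm_seed: bool,
    first_batch_new_bits: u32,
    min_timelines: u32,
    warm_min_timelines: Option<u32>,
) -> u32 {
    match warm_min_timelines {
        Some(warm) if is_warm_seed && first_batch_new_bits == 0 => warm,
        _ => min_timelines,
    }
}
```

With `min_timelines = 400` and `warm_min_timelines = Some(30)`, a barren mark on a warm seed gets 30 forks, while a mark on the cold first seed, or one that still finds new bits, gets the full 400.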
Seed 1 (cold start)         Seed 2 (warm start)         Seed 3 (warm start)
energy: 400K                energy: 400K                energy: 400K
root ─┬─ mark A ──> 400     root ─┬─ mark A ──> 30      root ─┬─ mark A ──> 30
      │ (new bits!) forks         │ (barren!)   skip          │ (barren!)   skip
      │                           │                           │
      ├─ mark B ──> 400           ├─ mark B ──> 30            ├─ mark B ──> 30
      │ (new bits!) forks         │ (barren!)   skip          │ (barren!)   skip
      │                           │                           │
      └─ mark C ──> 400           ├─ mark C ──> 30            ├─ mark C ──> 30
        (new bits!) forks         │ (barren!)   skip          │ (barren!)   skip
                                  │                           │
                                  └─ mark D ──> 400           └─ mark E ──> 400
                                    (NEW bits!) forks           (NEW bits!) forks
                 │                           │                           │
┌────────────────┘          ┌────────────────┘          ┌────────────────┘
v                           v                           v
┌──────────┐ ──preserved──> ┌──────────┐ ──preserved──> ┌──────────┐
│ explored │  coverage map  │ explored │  coverage map  │ explored │
│ map:     │  + watermarks  │ map:     │  + watermarks  │ map:     │
│ A,B,C    │                │ A,B,C,D  │                │ A,B,C,D,E│
└──────────┘                └──────────┘                └──────────┘
Total: each seed spends most energy on NEW discoveries.
Warm marks (A,B,C on seed 2) exit after warm_min_timelines (30)
instead of min_timelines (400), saving ~92% energy per barren mark.
SimulationBuilder::new()
    .set_iterations(3) // 3 root seeds with coverage forwarding
    .enable_exploration(ExplorationConfig {
        max_depth: 120,
        timelines_per_split: 4,
        global_energy: 400_000, // per-seed energy budget
        adaptive: Some(AdaptiveConfig {
            batch_size: 30,
            min_timelines: 400,
            max_timelines: 1_000,
            per_mark_energy: 10_000,
            warm_min_timelines: Some(30), // quick skip for barren warm marks
        }),
        parallelism: Some(Parallelism::HalfCores),
    })
    .workload(my_workload);
Re-exports§
pub use sim::ConnectionStateChange;
pub use sim::Event;
pub use sim::EventQueue;
pub use sim::NetworkOperation;
pub use sim::ScheduledEvent;
pub use sim::SimWorld;
pub use sim::SleepFuture;
pub use sim::StorageOperation;
pub use sim::WeakSimWorld;
pub use sim::clear_rng_breakpoints;
pub use sim::get_current_sim_seed;
pub use sim::get_rng_call_count;
pub use sim::reset_rng_call_count;
pub use sim::reset_sim_rng;
pub use sim::set_rng_breakpoints;
pub use sim::set_sim_seed;
pub use sim::sim_random;
pub use sim::sim_random_range;
pub use sim::sim_random_range_or_default;
pub use runner::FaultContext;
pub use runner::FaultInjector;
pub use runner::IterationControl;
pub use runner::PhaseConfig;
pub use runner::SimContext;
pub use runner::SimulationBuilder;
pub use runner::SimulationMetrics;
pub use runner::SimulationReport;
pub use runner::TokioReport;
pub use runner::TokioRunner;
pub use runner::Workload;
pub use runner::WorkloadCount;
pub use runner::WorkloadTopology;
pub use chaos::AssertionStats;
pub use chaos::Invariant;
pub use chaos::StateHandle;
pub use chaos::buggify_init;
pub use chaos::buggify_reset;
pub use chaos::get_assertion_results;
pub use chaos::has_always_violations;
pub use chaos::invariant_fn;
pub use chaos::panic_on_assertion_violations;
pub use chaos::reset_always_violations;
pub use chaos::reset_assertion_results;
pub use chaos::validate_assertion_contracts;
pub use network::ChaosConfiguration;
pub use network::ConnectFailureMode;
pub use network::NetworkConfiguration;
pub use network::SimNetworkProvider;
pub use network::sample_duration;
pub use storage::InMemoryStorage;
pub use storage::SECTOR_SIZE;
pub use storage::SectorBitSet;
pub use storage::SimStorageProvider;
pub use storage::StorageConfiguration;
pub use providers::SimProviders;
pub use providers::SimRandomProvider;
pub use providers::SimTimeProvider;
pub use runner::report::BugRecipe;
pub use runner::report::ExplorationReport;
Modules§
- chaos: Chaos testing infrastructure for deterministic fault injection.
- network: Network simulation and configuration.
- providers: Provider implementations for simulation.
- runner: Simulation runner and orchestration framework.
- sim: Core simulation engine for deterministic testing.
- storage: Storage simulation and configuration.
Macros§
- assert_always: Assert that a condition is always true.
- assert_always_greater_than: Assert that val > threshold always holds.
- assert_always_greater_than_or_equal_to: Assert that val >= threshold always holds.
- assert_always_less_than: Assert that val < threshold always holds.
- assert_always_less_than_or_equal_to: Assert that val <= threshold always holds.
- assert_always_or_unreachable: Assert that a condition is always true when reached, but the code path need not be reached. Does not panic if never evaluated.
- assert_reachable: Assert that a code path is reachable (should be reached at least once).
- assert_sometimes: Assert a condition that should sometimes be true, tracking stats and triggering exploration.
- assert_sometimes_all: Compound boolean assertion: all named bools should sometimes be true simultaneously.
- assert_sometimes_each: Per-value bucketed sometimes assertion with optional quality watermarks.
- assert_sometimes_greater_than: Assert that val > threshold sometimes holds. Forks on watermark improvement.
- assert_sometimes_greater_than_or_equal_to: Assert that val >= threshold sometimes holds. Forks on watermark improvement.
- assert_sometimes_less_than: Assert that val < threshold sometimes holds. Forks on watermark improvement.
- assert_sometimes_less_than_or_equal_to: Assert that val <= threshold sometimes holds. Forks on watermark improvement.
- assert_unreachable: Assert that a code path should never be reached.
- buggify: Buggify with 25% probability
- buggify_with_prob: Buggify with custom probability
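The difference between "always" and "sometimes" assertions, and how a multi-seed run can stop once every sometimes assertion has held, can be sketched in plain Rust. This is a simplified model, not the macros' real expansion: `AssertionTracker` and `run_until_all_sometimes` are hypothetical names, and the stopping rule is only assumed to resemble what IterationControl::UntilAllSometimesReached does.

```rust
use std::collections::BTreeMap;

/// Records assertion outcomes across simulation runs (hypothetical type).
#[derive(Default)]
pub struct AssertionTracker {
    /// name -> (times the condition held, times it was checked)
    sometimes: BTreeMap<&'static str, (u64, u64)>,
    /// Names of always assertions that failed at least once.
    always_violations: Vec<&'static str>,
}

impl AssertionTracker {
    /// A sometimes assertion passes if the condition holds in at least
    /// one check across all runs.
    pub fn sometimes(&mut self, name: &'static str, condition: bool) {
        let entry = self.sometimes.entry(name).or_insert((0, 0));
        entry.1 += 1;
        if condition {
            entry.0 += 1;
        }
    }

    /// An always assertion fails as soon as the condition is false once.
    pub fn always(&mut self, name: &'static str, condition: bool) {
        if !condition {
            self.always_violations.push(name);
        }
    }

    pub fn all_sometimes_reached(&self) -> bool {
        !self.sometimes.is_empty() && self.sometimes.values().all(|(hits, _)| *hits > 0)
    }

    pub fn has_always_violations(&self) -> bool {
        !self.always_violations.is_empty()
    }
}

/// Runs `workload` with seeds 0, 1, 2, ... until every sometimes assertion
/// has held at least once or `max_seeds` is exhausted.
/// Returns the number of seeds that were run.
pub fn run_until_all_sometimes<F>(
    max_seeds: u64,
    tracker: &mut AssertionTracker,
    mut workload: F,
) -> u64
where
    F: FnMut(u64, &mut AssertionTracker),
{
    for seed in 0..max_seeds {
        workload(seed, &mut *tracker);
        if tracker.all_sometimes_reached() {
            return seed + 1;
        }
    }
    max_seeds
}
```

A workload that only takes its retry path on some seeds keeps the loop running until one of those seeds comes up, which is how sometimes assertions turn "this path was never exercised" into a visible test outcome.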
Structs§
- AdaptiveConfig: Configuration for adaptive batch-based timeline splitting.
- Endpoint: Endpoint = Address + Token.
- ExplorationConfig: Configuration for exploration.
- JsonCodec: JSON codec using serde_json.
- NetworkAddress: Network address (IPv4/IPv6 + port + flags).
- TokioNetworkProvider: Real Tokio networking implementation.
- TokioProviders: Production providers using Tokio runtime.
- TokioTaskProvider: Tokio-based task provider using spawn_local for single-threaded execution.
- TokioTimeProvider: Real time provider using Tokio’s time facilities.
- UID: 128-bit unique identifier.
Enums§
- AssertCmp: Comparison operator for numeric assertions.
- AssertKind: The kind of assertion being tracked.
- CodecError: Error type for codec operations.
- NetworkAddressParseError: Error parsing a network address from string.
- Parallelism: Controls how many children can run in parallel during splitting.
- SimulationError: Errors that can occur during simulation operations.
- TimeError: Errors that can occur during time operations.
- WellKnownToken: Well-known endpoint tokens.
Constants§
- WELL_KNOWN_RESERVED_COUNT: Number of reserved well-known token slots.
Traits§
- MessageCodec: Pluggable message serialization format.
- NetworkProvider: Provider trait for creating network connections and listeners.
- Providers: Bundle of all provider types for a runtime environment.
- RandomProvider: Provider trait for random number generation.
- TaskProvider: Provider for spawning local tasks in single-threaded context.
- TcpListenerTrait: Trait for TCP listeners that can accept connections.
- TimeProvider: Provider trait for time operations.
Functions§
- format_timeline: Format a recipe as a human-readable timeline string.
- parse_timeline: Parse a timeline string back into a recipe.
Type Aliases§
- SimulationResult: A type alias for Result<T, SimulationError>.