Crate moonpool_sim

§Moonpool Simulation Framework

Deterministic simulation for testing distributed systems, inspired by FoundationDB’s simulation testing.

§Why Deterministic Simulation?

FoundationDB’s insight: bugs hide in error paths. Production code rarely exercises timeout handlers, retry logic, or failure recovery. Deterministic simulation with fault injection finds these bugs before production does.

Key properties:

Reproducible: Same seed produces identical execution
Comprehensive: Tests all failure modes (network, timing, corruption)
Fast: Logical time skips idle periods
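
The reproducibility property can be illustrated with a minimal sketch. It assumes that every random decision in a run is drawn from a single seeded generator; the `SplitMix64` type and `run_trace` function are hypothetical names for this illustration and are not part of moonpool's API.

```rust
/// SplitMix64, a small well-known PRNG, used here only for illustration.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical "run": each step draws a simulated latency in 1..=11 ms.
/// Because the generator is the only source of randomness, the trace is
/// a pure function of the seed.
pub fn run_trace(seed: u64, steps: usize) -> Vec<u64> {
    let mut rng = SplitMix64::new(seed);
    (0..steps).map(|_| 1 + rng.next_u64() % 11).collect()
}
```

Calling `run_trace(42, 100)` twice yields the same vector, which is what makes a failing seed replayable. Any hidden source of nondeterminism (wall-clock time, thread scheduling, hash-map iteration order) would break this guarantee.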

§Core Components

SimWorld: The simulation runtime managing events and time
SimulationBuilder: Configure and run simulations
chaos: Fault injection (buggify, 15 assertion macros, invariants)
storage: Storage simulation with fault injection
Multiverse exploration via moonpool-explorer (re-exported as ExplorationConfig, AdaptiveConfig)
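
How a runtime can manage events under logical time is sketched below. This is a generic discrete-event queue, not moonpool's `SimWorld`; the `MiniWorld` type and its methods are assumptions made for the example.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical minimal discrete-event scheduler.
pub struct MiniWorld {
    now_ms: u64,
    seq: u64,
    // Min-heap ordered by (fire time, insertion order) so that events
    // scheduled for the same instant pop in a deterministic order.
    queue: BinaryHeap<Reverse<(u64, u64, &'static str)>>,
}

impl MiniWorld {
    pub fn new() -> Self {
        Self { now_ms: 0, seq: 0, queue: BinaryHeap::new() }
    }

    /// Schedule `label` to fire `delay_ms` after the current logical time.
    pub fn schedule(&mut self, delay_ms: u64, label: &'static str) {
        self.queue.push(Reverse((self.now_ms + delay_ms, self.seq, label)));
        self.seq += 1;
    }

    /// Pop the next event and jump the clock straight to its fire time.
    /// No real time passes, so idle periods cost nothing.
    pub fn step(&mut self) -> Option<(u64, &'static str)> {
        let Reverse((at, _seq, label)) = self.queue.pop()?;
        self.now_ms = at;
        Some((at, label))
    }

    pub fn now_ms(&self) -> u64 {
        self.now_ms
    }
}
```

With this design a 60-second lease expiry is processed as soon as it is the earliest pending event, which is why logical time makes long timeouts cheap to test.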

§Quick Start

use moonpool_sim::{SimulationBuilder, WorkloadTopology};

SimulationBuilder::new()
    .topology(WorkloadTopology::ClientServer { clients: 2, servers: 1 })
    .run(|ctx| async move {
        // Your distributed system workload
    });

§Fault Injection Overview

See chaos module for detailed documentation.

Mechanism	Default	What it tests
TCP latencies	1-11ms connect	Async scheduling
Random connection close	0.001%	Reconnection, redelivery
Bit flip corruption	0.01%	Checksum validation
Connect failure	50% probabilistic	Timeout handling, retries
Clock drift	100ms max	Leases, heartbeats
Buggified delays	25%	Race conditions
Partial writes	1000 bytes max	Message fragmentation
Packet loss	disabled	At-least-once delivery
Network partitions	disabled	Split-brain handling
Storage corruption	configurable	Checksum validation, recovery
Torn writes	configurable	Write atomicity, journaling
Sync failures	configurable	Durability guarantees
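
As an illustration of one row in the table, the sketch below injects a single bit flip with a configurable probability and shows a checksum catching it. All names (`XorShift64`, `checksum`, `deliver`) are hypothetical and do not reflect how the chaos module is implemented.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// True with roughly the given probability in [0.0, 1.0].
    pub fn chance(&mut self, probability: f64) -> bool {
        // Top 53 bits give a uniform float in [0, 1).
        let unit = (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        unit < probability
    }
}

/// Simple polynomial checksum (illustration only; real systems use CRCs).
pub fn checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

/// Hypothetical fault point: maybe flip one bit of the payload in transit.
pub fn deliver(rng: &mut XorShift64, payload: &[u8], flip_probability: f64) -> Vec<u8> {
    let mut wire = payload.to_vec();
    if !wire.is_empty() && rng.chance(flip_probability) {
        let bit = (rng.next_u64() % (wire.len() as u64 * 8)) as usize;
        wire[bit / 8] ^= 1 << (bit % 8);
    }
    wire
}
```

A receiver that compares `checksum(&wire)` with the checksum sent alongside the payload rejects the corrupted message. Code that skips this check passes ordinary tests and fails only under injected corruption.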

§Multi-Seed Testing

Tests run across multiple seeds to explore the state space:

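use moonpool_sim::{IterationControl, SimulationBuilder};

// Run fresh seeds until every sometimes-assertion has fired; 1000 is assumed to be the cap.
SimulationBuilder::new()
    .run_count(IterationControl::UntilAllSometimesReached(1000))
    .run(workload);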

Debugging a failing seed:

SimulationBuilder::new()
    .set_seed(failing_seed)
    .run_count(IterationControl::FixedCount(1))
    .run(workload);
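
The search-then-replay workflow can be sketched independently of the builder API. The `workload` function below is a hypothetical stand-in whose only input is the seed; its "bug" is an arbitrary rare condition chosen for the example.

```rust
/// Hypothetical workload: fails when a rare random condition occurs.
/// All randomness derives from `seed`, so the outcome is reproducible.
pub fn workload(seed: u64) -> Result<(), String> {
    const A: u64 = 6364136223846793005;
    const C: u64 = 1442695040888963407;
    let mut state = seed.wrapping_mul(A).wrapping_add(C);
    for step in 0..100 {
        state = state.wrapping_mul(A).wrapping_add(C);
        // "Bug": the top 8 bits are all zero (about 1 in 256 per step).
        if (state >> 56) == 0 {
            return Err(format!("bug at step {step}"));
        }
    }
    Ok(())
}

/// Try seeds 0..max_seeds and return the first failure with its seed.
pub fn first_failing_seed(max_seeds: u64) -> Option<(u64, String)> {
    (0..max_seeds).find_map(|seed| workload(seed).err().map(|e| (seed, e)))
}
```

Once `first_failing_seed` reports a seed, calling `workload` with that seed again reproduces the same error at the same step, so the failure can be debugged with logging or a debugger attached.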

§Coverage-Preserving Multi-Seed Exploration

When exploration is enabled, multiple seeds share coverage context. The explored map (coverage bitmap union) and assertion watermarks are preserved between seeds so each subsequent seed focuses energy on genuinely new branches rather than re-treading already-discovered paths.

A warm start mechanism reduces wasted forks: on seeds after the first, marks whose first probe batch finds zero new coverage bits stop after warm_min_timelines forks instead of the full min_timelines.

 Seed 1 (cold start)           Seed 2 (warm start)          Seed 3 (warm start)
 energy: 400K                  energy: 400K                 energy: 400K

 root ─┬─ mark A ──> 400       root ─┬─ mark A ──> 30      root ─┬─ mark A ──> 30
       │  (new bits!) forks           │  (barren!) skip           │  (barren!) skip
       │                              │                           │
       ├─ mark B ──> 400              ├─ mark B ──> 30            ├─ mark B ──> 30
       │  (new bits!) forks           │  (barren!) skip           │  (barren!) skip
       │                              │                           │
       └─ mark C ──> 400              ├─ mark C ──> 30            ├─ mark C ──> 30
          (new bits!) forks           │  (barren!) skip           │  (barren!) skip
                                      │                           │
                                      └─ mark D ──> 400          └─ mark E ──> 400
                                         (NEW bits!) forks           (NEW bits!) forks
                       │                               │                            │
      ┌────────────────┘              ┌────────────────┘           ┌────────────────┘
      v                               v                            v
 ┌──────────┐  ──preserved──>  ┌──────────┐  ──preserved──>  ┌──────────┐
 │ explored │  coverage map    │ explored │  coverage map    │ explored │
 │ map:     │  + watermarks    │ map:     │  + watermarks    │ map:     │
 │ A,B,C    │                  │ A,B,C,D  │                  │ A,B,C,D,E│
 └──────────┘                  └──────────┘                  └──────────┘

 Total: each seed spends most energy on NEW discoveries.
 Warm marks (A,B,C on seed 2) exit after warm_min_timelines (30)
 instead of min_timelines (400), saving ~92% energy per barren mark.
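
The coverage forwarding and warm-start rule described in this section can be sketched as follows. This is a simplified model under stated assumptions: coverage is a flat bitmap, a mark is "barren" when its first probe batch adds no bits to the explored map, and `CoverageMap` and `min_forks` are hypothetical names rather than moonpool-explorer types.

```rust
/// Hypothetical coverage map: one bit per observed branch.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct CoverageMap {
    words: Vec<u64>,
}

impl CoverageMap {
    pub fn with_bits(bits: usize) -> Self {
        Self { words: vec![0; (bits + 63) / 64] }
    }

    /// Mark a branch as covered. Panics if `bit` is out of range.
    pub fn set(&mut self, bit: usize) {
        self.words[bit / 64] |= 1 << (bit % 64);
    }

    /// Union `other` into `self` and return how many bits were new.
    /// Both maps are assumed to have the same size.
    pub fn merge(&mut self, other: &CoverageMap) -> u32 {
        let mut new_bits = 0;
        for (mine, theirs) in self.words.iter_mut().zip(&other.words) {
            new_bits += (*theirs & !*mine).count_ones();
            *mine |= *theirs;
        }
        new_bits
    }
}

/// Assumed policy: on a warm seed, a mark whose first batch found nothing
/// new stops after `warm_min_timelines` forks instead of `min_timelines`.
pub fn min_forks(
    is_warm_seed: bool,
    new_bits_in_first_batch: u32,
    min_timelines: u32,
    warm_min_timelines: u32,
) -> u32 {
    if is_warm_seed && new_bits_in_first_batch == 0 {
        warm_min_timelines
    } else {
        min_timelines
    }
}
```

Keeping one `CoverageMap` alive across seeds is what lets a later seed tell re-discovered branches from new ones. With the values from the diagram, `min_forks(true, 0, 400, 30)` returns 30.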

use moonpool_sim::{AdaptiveConfig, ExplorationConfig, Parallelism, SimulationBuilder};

SimulationBuilder::new()
    .set_iterations(3)  // 3 root seeds with coverage forwarding
    .enable_exploration(ExplorationConfig {
        max_depth: 120,
        timelines_per_split: 4,
        global_energy: 400_000,  // per-seed energy budget
        adaptive: Some(AdaptiveConfig {
            batch_size: 30,
            min_timelines: 400,
            max_timelines: 1_000,
            per_mark_energy: 10_000,
            warm_min_timelines: Some(30),  // quick skip for barren warm marks
        }),
        parallelism: Some(Parallelism::HalfCores),
    })
    .workload(my_workload);

Re-exports§

pub use sim::ConnectionStateChange;
pub use sim::Event;
pub use sim::EventQueue;
pub use sim::NetworkOperation;
pub use sim::ScheduledEvent;
pub use sim::SimWorld;
pub use sim::SleepFuture;
pub use sim::StorageOperation;
pub use sim::WeakSimWorld;
pub use sim::clear_rng_breakpoints;
pub use sim::get_current_sim_seed;
pub use sim::get_rng_call_count;
pub use sim::reset_rng_call_count;
pub use sim::reset_sim_rng;
pub use sim::set_rng_breakpoints;
pub use sim::set_sim_seed;
pub use sim::sim_random;
pub use sim::sim_random_range;
pub use sim::sim_random_range_or_default;
pub use runner::FaultContext;
pub use runner::FaultInjector;
pub use runner::IterationControl;
pub use runner::PhaseConfig;
pub use runner::SimContext;
pub use runner::SimulationBuilder;
pub use runner::SimulationMetrics;
pub use runner::SimulationReport;
pub use runner::TokioReport;
pub use runner::TokioRunner;
pub use runner::Workload;
pub use runner::WorkloadCount;
pub use runner::WorkloadTopology;
pub use chaos::AssertionStats;
pub use chaos::Invariant;
pub use chaos::StateHandle;
pub use chaos::buggify_init;
pub use chaos::buggify_reset;
pub use chaos::get_assertion_results;
pub use chaos::has_always_violations;
pub use chaos::invariant_fn;
pub use chaos::panic_on_assertion_violations;
pub use chaos::reset_always_violations;
pub use chaos::reset_assertion_results;
pub use chaos::validate_assertion_contracts;
pub use network::ChaosConfiguration;
pub use network::ConnectFailureMode;
pub use network::NetworkConfiguration;
pub use network::SimNetworkProvider;
pub use network::sample_duration;
pub use storage::InMemoryStorage;
pub use storage::SECTOR_SIZE;
pub use storage::SectorBitSet;
pub use storage::SimStorageProvider;
pub use storage::StorageConfiguration;
pub use providers::SimProviders;
pub use providers::SimRandomProvider;
pub use providers::SimTimeProvider;
pub use runner::report::BugRecipe;
pub use runner::report::ExplorationReport;

Modules§

chaos: Chaos testing infrastructure for deterministic fault injection.
network: Network simulation and configuration.
providers: Provider implementations for simulation.
runner: Simulation runner and orchestration framework.
sim: Core simulation engine for deterministic testing.
storage: Storage simulation and configuration.

Macros§

assert_always: Assert that a condition is always true.
assert_always_greater_than: Assert that val > threshold always holds.
assert_always_greater_than_or_equal_to: Assert that val >= threshold always holds.
assert_always_less_than: Assert that val < threshold always holds.
assert_always_less_than_or_equal_to: Assert that val <= threshold always holds.
assert_always_or_unreachable: Assert that a condition is always true when reached, but the code path need not be reached. Does not panic if never evaluated.
assert_reachable: Assert that a code path is reachable (should be reached at least once).
assert_sometimes: Assert a condition that should sometimes be true, tracking stats and triggering exploration.
assert_sometimes_all: Compound boolean assertion: all named bools should sometimes be true simultaneously.
assert_sometimes_each: Per-value bucketed sometimes assertion with optional quality watermarks.
assert_sometimes_greater_than: Assert that val > threshold sometimes holds. Forks on watermark improvement.
assert_sometimes_greater_than_or_equal_to: Assert that val >= threshold sometimes holds. Forks on watermark improvement.
assert_sometimes_less_than: Assert that val < threshold sometimes holds. Forks on watermark improvement.
assert_sometimes_less_than_or_equal_to: Assert that val <= threshold sometimes holds. Forks on watermark improvement.
assert_unreachable: Assert that a code path should never be reached.
buggify: Buggify with 25% probability
buggify_with_prob: Buggify with custom probability
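
The difference between "always" and "sometimes" assertions can be sketched with a small tracker. This is a conceptual model only: the `Tracker` and `Stats` types are hypothetical, and the real macros additionally record source locations and drive exploration.

```rust
use std::collections::BTreeMap;

/// Per-assertion counters (hypothetical shape).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Stats {
    pub evaluations: u64,
    pub successes: u64,
}

#[derive(Default)]
pub struct Tracker {
    sometimes: BTreeMap<&'static str, Stats>,
    always_violations: Vec<&'static str>,
}

impl Tracker {
    /// A "sometimes" assertion never fails on its own; it only records
    /// whether the condition has been observed true at least once.
    pub fn sometimes(&mut self, name: &'static str, condition: bool) {
        let stats = self.sometimes.entry(name).or_default();
        stats.evaluations += 1;
        if condition {
            stats.successes += 1;
        }
    }

    /// An "always" assertion is violated by a single false evaluation.
    pub fn always(&mut self, name: &'static str, condition: bool) {
        if !condition {
            self.always_violations.push(name);
        }
    }

    /// Names of "sometimes" assertions that were never observed true.
    pub fn unreached(&self) -> Vec<&'static str> {
        self.sometimes
            .iter()
            .filter(|(_, stats)| stats.successes == 0)
            .map(|(name, _)| *name)
            .collect()
    }

    pub fn has_always_violations(&self) -> bool {
        !self.always_violations.is_empty()
    }
}
```

A multi-seed run can keep iterating while `unreached()` is non-empty, which is one plausible reading of how a "run until all sometimes-assertions are reached" mode decides when to stop.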

Structs§

AdaptiveConfig: Configuration for adaptive batch-based timeline splitting.
Endpoint: Endpoint = Address + Token.
ExplorationConfig: Configuration for exploration.
JsonCodec: JSON codec using serde_json.
NetworkAddress: Network address (IPv4/IPv6 + port + flags).
TokioNetworkProvider: Real Tokio networking implementation.
TokioProviders: Production providers using Tokio runtime.
TokioTaskProvider: Tokio-based task provider using spawn_local for single-threaded execution.
TokioTimeProvider: Real time provider using Tokio’s time facilities.
UID: 128-bit unique identifier.

Enums§

AssertCmp: Comparison operator for numeric assertions.
AssertKind: The kind of assertion being tracked.
CodecError: Error type for codec operations.
NetworkAddressParseError: Error parsing a network address from string.
Parallelism: Controls how many children can run in parallel during splitting.
SimulationError: Errors that can occur during simulation operations.
TimeError: Errors that can occur during time operations.
WellKnownToken: Well-known endpoint tokens.

Constants§

WELL_KNOWN_RESERVED_COUNT: Number of reserved well-known token slots.

Traits§

MessageCodec: Pluggable message serialization format.
NetworkProvider: Provider trait for creating network connections and listeners.
Providers: Bundle of all provider types for a runtime environment.
RandomProvider: Provider trait for random number generation.
TaskProvider: Provider for spawning local tasks in single-threaded context.
TcpListenerTrait: Trait for TCP listeners that can accept connections.
TimeProvider: Provider trait for time operations.
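
The provider traits let the same application code run against simulated or real implementations. The sketch below shows that pattern for time only, using a hypothetical `Clock` trait; the actual method signatures of `TimeProvider` are not shown on this page and are not assumed here.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Hypothetical time abstraction: elapsed time since the clock started.
pub trait Clock {
    fn now(&self) -> Duration;
}

/// Production-style clock backed by the operating system.
pub struct RealClock {
    start: Instant,
}

impl RealClock {
    pub fn new() -> Self {
        Self { start: Instant::now() }
    }
}

impl Clock for RealClock {
    fn now(&self) -> Duration {
        self.start.elapsed()
    }
}

/// Simulated clock that only moves when the test advances it.
pub struct SimClock {
    now: Cell<Duration>,
}

impl SimClock {
    pub fn new() -> Self {
        Self { now: Cell::new(Duration::ZERO) }
    }

    pub fn advance(&self, by: Duration) {
        self.now.set(self.now.get() + by);
    }
}

impl Clock for SimClock {
    fn now(&self) -> Duration {
        self.now.get()
    }
}

/// Application logic written once against the trait.
pub fn lease_expired<C: Clock>(clock: &C, granted_at: Duration, ttl: Duration) -> bool {
    clock.now() >= granted_at + ttl
}
```

Because `lease_expired` depends only on the trait, a test can advance a `SimClock` by ten seconds instantly, while production code passes a `RealClock` without any change to the logic.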

Functions§

format_timeline: Format a recipe as a human-readable timeline string.
parse_timeline: Parse a timeline string back into a recipe.

Type Aliases§

SimulationResult: A type alias for Result<T, SimulationError>.