Crate moonpool_sim

§Moonpool Simulation Framework

Deterministic simulation for testing distributed systems, inspired by FoundationDB’s simulation testing.

§Why Deterministic Simulation?

FoundationDB’s insight: bugs hide in error paths. Production code rarely exercises timeout handlers, retry logic, or failure recovery. Deterministic simulation with fault injection finds these bugs before production does.

Key properties:

  • Reproducible: Same seed produces identical execution
  • Comprehensive: Tests all failure modes (network, timing, corruption)
  • Fast: Logical time skips idle periods
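
The first and last properties can be illustrated with a toy model. The sketch below is not moonpool_sim code: `SplitMix64` and `run_toy_simulation` are hypothetical names, and the generator is a standard SplitMix64 used only as a stand-in for a seeded simulation RNG. It shows that a fixed seed yields the same event order every time, and that a logical clock can jump directly to the next event instead of waiting.

```rust
/// Tiny deterministic PRNG (SplitMix64). Illustrative stand-in, not the crate's RNG.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Schedules `events` timers with pseudo-random delays (logical milliseconds)
/// and returns the order in which they fire plus the final logical time.
pub fn run_toy_simulation(seed: u64, events: usize) -> (Vec<usize>, u64) {
    let mut rng = SplitMix64::new(seed);
    // Each entry is (fire_time_ms, event_id).
    let mut queue: Vec<(u64, usize)> = (0..events)
        .map(|id| (rng.next_u64() % 10_000, id))
        .collect();
    // Earliest first; ties are broken by id, so the order is fully determined.
    queue.sort();

    let mut now_ms: u64 = 0;
    let mut fired = Vec::with_capacity(events);
    for (fire_at, id) in queue {
        // Jump the logical clock straight to the next event: no real waiting.
        now_ms = fire_at;
        fired.push(id);
    }
    (fired, now_ms)
}
```

Calling `run_toy_simulation(42, 8)` twice returns equal results, which is the property that makes a failing seed replayable.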

§Core Components

§Quick Start

use moonpool_sim::{SimulationBuilder, WorkloadTopology};

SimulationBuilder::new()
    .topology(WorkloadTopology::ClientServer { clients: 2, servers: 1 })
    .run(|ctx| async move {
        // Your distributed system workload
    });

§Fault Injection Overview

See the chaos module for detailed documentation.

Mechanism                 Default             What it tests
TCP latencies             1-11 ms connect     Async scheduling
Random connection close   0.001%              Reconnection, redelivery
Bit flip corruption       0.01%               Checksum validation
Connect failure           50% probabilistic   Timeout handling, retries
Clock drift               100 ms max          Leases, heartbeats
Buggified delays          25%                 Race conditions
Partial writes            1000 bytes max      Message fragmentation
Packet loss               disabled            At-least-once delivery
Network partitions        disabled            Split-brain handling
Storage corruption        configurable        Checksum validation, recovery
Torn writes               configurable        Write atomicity, journaling
Sync failures             configurable        Durability guarantees
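
To make the idea of a probabilistic fault concrete, here is a minimal sketch of a seeded bit-flip injector. It is an illustration under assumptions, not the chaos module's implementation: `Rng`, `chance`, and `maybe_flip_bit` are hypothetical names, and the generator is a plain xorshift64* used in place of the crate's simulation RNG.

```rust
/// Minimal seeded generator (xorshift64*); a stand-in for a simulation RNG.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self {
        // xorshift state must be non-zero.
        Rng(seed.max(1))
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Returns true with approximately the given probability (0.0 to 1.0).
    pub fn chance(&mut self, probability: f64) -> bool {
        // Use the top 53 bits to build a uniform float in [0, 1).
        let unit = (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        unit < probability
    }
}

/// Hypothetical fault: flips one bit of `payload` with the given probability.
/// Returns true if a corruption was injected.
pub fn maybe_flip_bit(rng: &mut Rng, payload: &mut [u8], probability: f64) -> bool {
    if payload.is_empty() || !rng.chance(probability) {
        return false;
    }
    let byte = (rng.next_u64() % payload.len() as u64) as usize;
    let bit = (rng.next_u64() % 8) as u32;
    payload[byte] ^= 1u8 << bit;
    true
}
```

Because every decision is drawn from the seeded generator, the same seed corrupts the same bit of the same message on every run, so a checksum failure found this way can be replayed.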

§Multi-Seed Testing

Tests run across multiple seeds to explore the state space:

use moonpool_sim::{IterationControl, SimulationBuilder};

SimulationBuilder::new()
    .run_count(IterationControl::UntilAllSometimesReached(1000))
    .run(workload);

Debugging a failing seed:

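use moonpool_sim::{IterationControl, SimulationBuilder};

// `failing_seed` is the seed reported by the failed run.
SimulationBuilder::new()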
    .set_seed(failing_seed)
    .run_count(IterationControl::FixedCount(1))
    .run(workload);
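
The find-then-replay workflow can be sketched without the framework. The code below is a hypothetical illustration: `toy_workload` and `find_failing_seed` are not part of moonpool_sim, and the "bug" is an arbitrary arithmetic condition standing in for a rare interleaving.

```rust
/// Hypothetical workload: fails only for seeds that hit the "bug".
pub fn toy_workload(seed: u64) -> Result<(), String> {
    if seed % 97 == 13 {
        Err(format!("invariant violated under seed {seed}"))
    } else {
        Ok(())
    }
}

/// Runs `workload` over seeds `0..count` and returns the first failing
/// seed together with its error, or `None` if every seed passes.
pub fn find_failing_seed<F>(count: u64, workload: F) -> Option<(u64, String)>
where
    F: Fn(u64) -> Result<(), String>,
{
    (0..count).find_map(|seed| workload(seed).err().map(|error| (seed, error)))
}
```

Once `find_failing_seed(1000, toy_workload)` reports a seed, calling `toy_workload` with that single seed reproduces the failure, which mirrors pinning the seed and running exactly one iteration.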

§Coverage-Preserving Multi-Seed Exploration

When exploration is enabled, multiple seeds share coverage context. The explored map (coverage bitmap union) and assertion watermarks are preserved between seeds so each subsequent seed focuses energy on genuinely new branches rather than re-treading already-discovered paths.

A warm start mechanism reduces wasted forks: on seeds after the first, marks whose first probe batch finds zero new coverage bits stop after warm_min_timelines forks instead of the full min_timelines.

 Seed 1 (cold start)           Seed 2 (warm start)          Seed 3 (warm start)
 energy: 400K                  energy: 400K                 energy: 400K

 root ─┬─ mark A ──> 400       root ─┬─ mark A ──> 30      root ─┬─ mark A ──> 30
       │  (new bits!) forks           │  (barren!) skip           │  (barren!) skip
       │                              │                           │
       ├─ mark B ──> 400              ├─ mark B ──> 30            ├─ mark B ──> 30
       │  (new bits!) forks           │  (barren!) skip           │  (barren!) skip
       │                              │                           │
       └─ mark C ──> 400              ├─ mark C ──> 30            ├─ mark C ──> 30
          (new bits!) forks           │  (barren!) skip           │  (barren!) skip
                                      │                           │
                                      └─ mark D ──> 400          └─ mark E ──> 400
                                         (NEW bits!) forks           (NEW bits!) forks
                       │                               │                            │
      ┌────────────────┘              ┌────────────────┘           ┌────────────────┘
      v                               v                            v
 ┌──────────┐  ──preserved──>  ┌──────────┐  ──preserved──>  ┌──────────┐
 │ explored │  coverage map    │ explored │  coverage map    │ explored │
 │ map:     │  + watermarks    │ map:     │  + watermarks    │ map:     │
 │ A,B,C    │                  │ A,B,C,D  │                  │ A,B,C,D,E│
 └──────────┘                  └──────────┘                  └──────────┘

 Total: each seed spends most energy on NEW discoveries.
 Warm marks (A,B,C on seed 2) exit after warm_min_timelines (30)
 instead of min_timelines (400), saving ~92% energy per barren mark.

use moonpool_sim::{AdaptiveConfig, ExplorationConfig, Parallelism, SimulationBuilder};

SimulationBuilder::new()
    .set_iterations(3)  // 3 root seeds with coverage forwarding
    .enable_exploration(ExplorationConfig {
        max_depth: 120,
        timelines_per_split: 4,
        global_energy: 400_000,  // per-seed energy budget
        adaptive: Some(AdaptiveConfig {
            batch_size: 30,
            min_timelines: 400,
            max_timelines: 1_000,
            per_mark_energy: 10_000,
            warm_min_timelines: Some(30),  // quick skip for barren warm marks
        }),
        parallelism: Some(Parallelism::HalfCores),
    })
    .workload(my_workload);
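
The warm start rule described above can be sketched as a small decision function. This is an interpretation of the prose, not the crate's code: `WarmStartSettings` and `min_forks_for_mark` are hypothetical names, and only the two fields needed for the decision are modeled.

```rust
/// Hypothetical subset of the adaptive settings relevant to warm starts.
pub struct WarmStartSettings {
    pub min_timelines: u32,
    pub warm_min_timelines: Option<u32>,
}

/// Minimum number of forks to spend on a mark before giving up on it.
///
/// A mark is treated as "barren" when its first probe batch found no new
/// coverage bits. Barren marks on seeds after the first stop early.
pub fn min_forks_for_mark(
    settings: &WarmStartSettings,
    is_first_seed: bool,
    first_batch_new_bits: u32,
) -> u32 {
    match settings.warm_min_timelines {
        // Warm seed and nothing new found: use the reduced floor.
        Some(warm) if !is_first_seed && first_batch_new_bits == 0 => warm,
        // Cold start, productive mark, or warm start disabled: full floor.
        _ => settings.min_timelines,
    }
}
```

With `min_timelines: 400` and `warm_min_timelines: Some(30)`, a barren mark on the second seed gets 30 forks while a mark that found new bits keeps the full 400.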

§Re-exports

pub use sim::ConnectionStateChange;
pub use sim::Event;
pub use sim::EventQueue;
pub use sim::NetworkOperation;
pub use sim::ScheduledEvent;
pub use sim::SimWorld;
pub use sim::SleepFuture;
pub use sim::StorageOperation;
pub use sim::WeakSimWorld;
pub use sim::clear_rng_breakpoints;
pub use sim::get_current_sim_seed;
pub use sim::get_rng_call_count;
pub use sim::reset_rng_call_count;
pub use sim::reset_sim_rng;
pub use sim::set_rng_breakpoints;
pub use sim::set_sim_seed;
pub use sim::sim_random;
pub use sim::sim_random_range;
pub use sim::sim_random_range_or_default;
pub use runner::FaultContext;
pub use runner::FaultInjector;
pub use runner::IterationControl;
pub use runner::PhaseConfig;
pub use runner::SimContext;
pub use runner::SimulationBuilder;
pub use runner::SimulationMetrics;
pub use runner::SimulationReport;
pub use runner::TokioReport;
pub use runner::TokioRunner;
pub use runner::Workload;
pub use runner::WorkloadCount;
pub use runner::WorkloadTopology;
pub use chaos::AssertionStats;
pub use chaos::Invariant;
pub use chaos::StateHandle;
pub use chaos::buggify_init;
pub use chaos::buggify_reset;
pub use chaos::get_assertion_results;
pub use chaos::has_always_violations;
pub use chaos::invariant_fn;
pub use chaos::panic_on_assertion_violations;
pub use chaos::reset_always_violations;
pub use chaos::reset_assertion_results;
pub use chaos::validate_assertion_contracts;
pub use network::ChaosConfiguration;
pub use network::ConnectFailureMode;
pub use network::NetworkConfiguration;
pub use network::SimNetworkProvider;
pub use network::sample_duration;
pub use storage::InMemoryStorage;
pub use storage::SECTOR_SIZE;
pub use storage::SectorBitSet;
pub use storage::SimStorageProvider;
pub use storage::StorageConfiguration;
pub use providers::SimProviders;
pub use providers::SimRandomProvider;
pub use providers::SimTimeProvider;
pub use runner::report::BugRecipe;
pub use runner::report::ExplorationReport;

§Modules

chaos
Chaos testing infrastructure for deterministic fault injection.
network
Network simulation and configuration.
providers
Provider implementations for simulation.
runner
Simulation runner and orchestration framework.
sim
Core simulation engine for deterministic testing.
storage
Storage simulation and configuration.

§Macros

assert_always
Assert that a condition is always true.
assert_always_greater_than
Assert that val > threshold always holds.
assert_always_greater_than_or_equal_to
Assert that val >= threshold always holds.
assert_always_less_than
Assert that val < threshold always holds.
assert_always_less_than_or_equal_to
Assert that val <= threshold always holds.
assert_always_or_unreachable
Assert that a condition is always true when reached, but the code path need not be reached. Does not panic if never evaluated.
assert_reachable
Assert that a code path is reachable (should be reached at least once).
assert_sometimes
Assert a condition that should sometimes be true, tracking stats and triggering exploration.
assert_sometimes_all
Compound boolean assertion: all named bools should sometimes be true simultaneously.
assert_sometimes_each
Per-value bucketed sometimes assertion with optional quality watermarks.
assert_sometimes_greater_than
Assert that val > threshold sometimes holds. Forks on watermark improvement.
assert_sometimes_greater_than_or_equal_to
Assert that val >= threshold sometimes holds. Forks on watermark improvement.
assert_sometimes_less_than
Assert that val < threshold sometimes holds. Forks on watermark improvement.
assert_sometimes_less_than_or_equal_to
Assert that val <= threshold sometimes holds. Forks on watermark improvement.
assert_unreachable
Assert that a code path should never be reached.
buggify
Buggify with 25% probability.
buggify_with_prob
Buggify with custom probability.
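
The difference between the "always" and "sometimes" families can be shown with a small tracker. This is a sketch of the semantics only, under the assumption that an always-assertion fails on any false evaluation and a sometimes-assertion fails only if it was never true across the whole run. `AssertionTracker`, `SometimesStats`, and the method names are hypothetical and do not reflect how the macros are implemented.

```rust
use std::collections::BTreeMap;

/// Counters kept per "sometimes" assertion.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct SometimesStats {
    pub evaluations: u64,
    pub successes: u64,
}

/// Hypothetical tracker illustrating always vs. sometimes semantics.
#[derive(Debug, Default)]
pub struct AssertionTracker {
    sometimes: BTreeMap<&'static str, SometimesStats>,
    always_violations: Vec<&'static str>,
}

impl AssertionTracker {
    /// An "always" assertion is violated by a single false evaluation.
    pub fn check_always(&mut self, name: &'static str, condition: bool) {
        if !condition {
            self.always_violations.push(name);
        }
    }

    /// A "sometimes" assertion only records statistics when evaluated.
    pub fn check_sometimes(&mut self, name: &'static str, condition: bool) {
        let stats = self.sometimes.entry(name).or_default();
        stats.evaluations += 1;
        if condition {
            stats.successes += 1;
        }
    }

    /// Names of "always" assertions that were violated, in order of occurrence.
    pub fn always_violations(&self) -> &[&'static str] {
        &self.always_violations
    }

    /// Names of "sometimes" assertions that were evaluated but never true.
    pub fn never_satisfied(&self) -> Vec<&'static str> {
        self.sometimes
            .iter()
            .filter(|(_, stats)| stats.successes == 0)
            .map(|(name, _)| *name)
            .collect()
    }
}
```

A sometimes-assertion that stays in `never_satisfied()` after many seeds suggests the scenario it describes was never reached, which is a gap in test coverage and not necessarily a bug in the system under test.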

§Structs

AdaptiveConfig
Configuration for adaptive batch-based timeline splitting.
Endpoint
Endpoint = Address + Token.
ExplorationConfig
Configuration for exploration.
JsonCodec
JSON codec using serde_json.
NetworkAddress
Network address (IPv4/IPv6 + port + flags).
TokioNetworkProvider
Real Tokio networking implementation.
TokioProviders
Production providers using Tokio runtime.
TokioTaskProvider
Tokio-based task provider using spawn_local for single-threaded execution.
TokioTimeProvider
Real time provider using Tokio’s time facilities.
UID
128-bit unique identifier.

§Enums

AssertCmp
Comparison operator for numeric assertions.
AssertKind
The kind of assertion being tracked.
CodecError
Error type for codec operations.
NetworkAddressParseError
Error parsing a network address from string.
Parallelism
Controls how many children can run in parallel during splitting.
SimulationError
Errors that can occur during simulation operations.
TimeError
Errors that can occur during time operations.
WellKnownToken
Well-known endpoint tokens.

§Constants

WELL_KNOWN_RESERVED_COUNT
Number of reserved well-known token slots.

§Traits

MessageCodec
Pluggable message serialization format.
NetworkProvider
Provider trait for creating network connections and listeners.
Providers
Bundle of all provider types for a runtime environment.
RandomProvider
Provider trait for random number generation.
TaskProvider
Provider for spawning local tasks in single-threaded context.
TcpListenerTrait
Trait for TCP listeners that can accept connections.
TimeProvider
Provider trait for time operations.
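
The provider traits let the same application code run against real or simulated infrastructure. The sketch below illustrates that pattern for time only. It is an assumption-laden example: `Clock`, `SystemClock`, `SimClock`, and `lease_expired` are hypothetical names, and the crate's actual `TimeProvider` trait may have a different shape.

```rust
use std::cell::Cell;
use std::time::Instant;

/// Hypothetical clock abstraction; not the crate's TimeProvider trait.
pub trait Clock {
    /// Milliseconds elapsed since the clock was created.
    fn now_ms(&self) -> u64;
}

/// Wall-clock implementation for production-style runs.
pub struct SystemClock {
    start: Instant,
}

impl SystemClock {
    pub fn new() -> Self {
        Self { start: Instant::now() }
    }
}

impl Clock for SystemClock {
    fn now_ms(&self) -> u64 {
        self.start.elapsed().as_millis() as u64
    }
}

/// Logical clock for simulation: time moves only when advanced explicitly.
pub struct SimClock {
    now_ms: Cell<u64>,
}

impl SimClock {
    pub fn new() -> Self {
        Self { now_ms: Cell::new(0) }
    }

    pub fn advance(&self, ms: u64) {
        self.now_ms.set(self.now_ms.get() + ms);
    }
}

impl Clock for SimClock {
    fn now_ms(&self) -> u64 {
        self.now_ms.get()
    }
}

/// Application logic written once against the trait.
pub fn lease_expired<C: Clock>(clock: &C, granted_at_ms: u64, lease_ms: u64) -> bool {
    clock.now_ms() >= granted_at_ms + lease_ms
}
```

In a test, a `SimClock` can be advanced by five seconds instantly to check lease expiry, while production code passes a `SystemClock` to the same function.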

§Functions

format_timeline
Format a recipe as a human-readable timeline string.
parse_timeline
Parse a timeline string back into a recipe.

§Type Aliases

SimulationResult
A type alias for Result<T, SimulationError>.