Module chaos

Chaos testing infrastructure for deterministic fault injection.

This module implements FoundationDB’s buggify approach for finding bugs through comprehensive chaos testing.

§Philosophy

FoundationDB’s insight: bugs hide in error paths. Production code rarely exercises timeout handlers, retry logic, or failure recovery. Chaos testing finds these bugs before production does.

Key principles:

  • Deterministic: Same seed produces same faults for reproducible debugging
  • Comprehensive: Test all failure modes (network, timing, corruption)
  • Low probability: Faults rare enough for progress, frequent enough to find bugs

§Components

| Component | Purpose |
|---|---|
| buggify! | Probabilistic fault injection at code locations |
| always_assert! | Invariants that must never fail |
| sometimes_assert! | Behaviors that should occur under chaos |
| InvariantCheck | Cross-actor properties validated after events |

§The Buggify System

Whether a buggify location is active is decided randomly once per simulation run. An active location then fires probabilistically on each call; an inactive one never fires during that run.

// 25% probability when activated
if buggify!() {
    return Err(SimulatedFailure);
}

// Custom probability
if buggify_with_prob!(0.02) {
    corrupt_data();
}

§Two-Phase Activation

  1. Activation (once per location per seed): random() < activation_prob
  2. Firing (each call): If active, random() < firing_prob

This ensures consistent behavior within a run while varying which locations are active across different seeds.
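The two phases above can be sketched roughly as follows. This is a minimal illustration, not the module's implementation: `SplitMix64`, `BuggifyState`, `should_fire`, and `trace` are hypothetical names, and the hand-rolled PRNG is used only to keep the sketch free of dependencies. Locations are identified by a string key here; how the real macros identify a call site is not shown in this page.

```rust
use std::collections::HashMap;

/// Small deterministic PRNG (SplitMix64). Hypothetical stand-in for the
/// simulation's seeded random source.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical per-run state: one RNG plus the cached activation decision
/// for every location seen so far.
struct BuggifyState {
    rng: SplitMix64,
    activation_prob: f64,
    active: HashMap<&'static str, bool>,
}

impl BuggifyState {
    fn new(seed: u64, activation_prob: f64) -> Self {
        Self { rng: SplitMix64(seed), activation_prob, active: HashMap::new() }
    }

    /// Phase 1 decides activation the first time a location is reached;
    /// phase 2 rolls against `firing_prob` on every call to an active location.
    fn should_fire(&mut self, location: &'static str, firing_prob: f64) -> bool {
        let known = self.active.get(location).copied();
        let active = match known {
            Some(a) => a,
            None => {
                let a = self.rng.next_f64() < self.activation_prob;
                self.active.insert(location, a);
                a
            }
        };
        active && self.rng.next_f64() < firing_prob
    }
}

/// Records the fire/no-fire outcome of `calls` evaluations at two locations.
fn trace(seed: u64, calls: usize) -> Vec<(bool, bool)> {
    let mut state = BuggifyState::new(seed, 0.25);
    (0..calls)
        .map(|_| (state.should_fire("net.rs:10", 0.25), state.should_fire("disk.rs:42", 0.25)))
        .collect()
}
```

Calling `trace` twice with the same seed yields the same sequence, which is the property that makes a failing seed replayable.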

| Parameter | Default | FDB Reference |
|---|---|---|
| activation_prob | 25% | Buggify.h:79-88 |
| firing_prob | 25% | P_GENERAL_BUGGIFIED_SECTION_FIRES |

§Fault Injection Mechanisms

§Network Faults

| Mechanism | Default | What it tests |
|---|---|---|
| Random connection close | 0.001% | Reconnection, message redelivery |
| Bit flip corruption | 0.01% | CRC32C checksum validation |
| Connect failure | 50% probabilistic | Timeout handling, retries |
| Partial/short writes | 1000 bytes max | Message fragmentation |
| Packet loss | disabled | At-least-once delivery |
| Network partitions | disabled | Split-brain handling |
| Half-open connections | manual | Peer crash detection |
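To illustrate what bit flip corruption exercises, here is a small sketch of a receiver that validates a CRC32C trailer. The frame layout (payload followed by a little-endian checksum) and the names `encode_frame`, `decode_frame`, and `flip_bit` are assumptions made for this example; the wire format used by the simulated network is not described in this page.

```rust
/// Bitwise CRC32C (Castagnoli), using the reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Hypothetical frame layout: payload followed by a little-endian CRC32C trailer.
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = payload.to_vec();
    frame.extend_from_slice(&crc32c(payload).to_le_bytes());
    frame
}

/// Returns the payload if the trailer matches, or `None` if the frame is
/// too short or fails checksum validation.
fn decode_frame(frame: &[u8]) -> Option<&[u8]> {
    if frame.len() < 4 {
        return None;
    }
    let (payload, trailer) = frame.split_at(frame.len() - 4);
    let expected = u32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]);
    if crc32c(payload) == expected {
        Some(payload)
    } else {
        None
    }
}

/// Flips a single bit in place, as an injected corruption fault would.
fn flip_bit(frame: &mut [u8], bit_index: usize) {
    frame[bit_index / 8] ^= 1 << (bit_index % 8);
}
```

A CRC detects every single-bit error, so any one flip in the payload or the trailer makes `decode_frame` return `None`. The interesting part under chaos is what the caller does next: drop the connection, request redelivery, or surface an error.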

§Timing Faults

| Mechanism | Default | What it tests |
|---|---|---|
| TCP operation latencies | 1-11ms connect | Async scheduling |
| Clock drift | 100ms max | Leases, heartbeats, leader election |
| Buggified delays | 25% probability | Race conditions |
| Per-connection asymmetric delays | optional | Satellite links, geographic distance |
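As an example of the kind of bug clock drift is meant to surface, the sketch below compares a naive lease check with one that keeps a safety margin equal to the maximum drift. All names here (`local_clock`, `lease_valid_naive`, `lease_valid_with_margin`) are hypothetical, and times are modelled as `Duration` offsets from the start of the simulation.

```rust
use std::time::Duration;

/// Simulated node clock: true time shifted by a fixed signed drift in
/// milliseconds. Hypothetical helper, not part of the module's API.
fn local_clock(true_time: Duration, drift_ms: i64) -> Duration {
    let drift = Duration::from_millis(drift_ms.unsigned_abs());
    if drift_ms >= 0 {
        true_time + drift
    } else {
        true_time.saturating_sub(drift)
    }
}

/// Naive check: trusts the local clock completely.
fn lease_valid_naive(local_now: Duration, expiry: Duration) -> bool {
    local_now < expiry
}

/// Conservative check: treats the lease as expired `max_drift` early, so a
/// slow local clock cannot extend it past the true expiry.
fn lease_valid_with_margin(local_now: Duration, expiry: Duration, max_drift: Duration) -> bool {
    local_now + max_drift < expiry
}
```

With a lease expiring at 1000ms and a clock running 100ms slow, the naive check still reports the lease as valid at a true time of 1050ms, while the margin-based check does not.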

§Assertions

§always_assert!

Guards invariants that must never fail:

always_assert!(
    sent_count >= received_count,
    "message_ordering",
    "received more than sent: {} > {}", received_count, sent_count
);
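For readers unfamiliar with this style of macro, the following sketch shows one way a macro with the same argument shape could be written. `sketch_always_assert!` is a hypothetical stand-in that simply panics; the real `always_assert!` may record results instead, and its expansion is not shown in this page.

```rust
/// Hypothetical stand-in for `always_assert!`: panics with the assertion
/// name and a formatted message when the condition is false.
macro_rules! sketch_always_assert {
    ($cond:expr, $name:expr, $($fmt:tt)+) => {
        if !($cond) {
            panic!("always_assert '{}' violated: {}", $name, format!($($fmt)+));
        }
    };
}

/// Checks the ordering invariant from the example above.
fn check_counts(sent_count: u64, received_count: u64) {
    sketch_always_assert!(
        sent_count >= received_count,
        "message_ordering",
        "received more than sent: {} > {}", received_count, sent_count
    );
}
```

The name argument gives each invariant a stable identifier, so a violation can be reported by name independently of the formatted message.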

§sometimes_assert!

Validates that error paths do execute under chaos:

if buggify!() {
    sometimes_assert!("timeout_triggered");
    return Err(Timeout);
}

Multi-seed testing with UntilAllSometimesReached(1000) ensures all sometimes_assert! statements fire across the seed space.
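The multi-seed loop can be pictured along these lines. This sketch assumes the argument to `UntilAllSometimesReached` is an upper bound on the number of seeds tried, which this page does not confirm; `run_until_all_reached` and the closure-based runner are hypothetical.

```rust
use std::collections::HashSet;

/// Runs seeds `0..max_seeds` until every expected assertion name has been
/// reached at least once. Returns the number of seeds used, or the sorted
/// list of names that were never reached.
fn run_until_all_reached<F>(
    expected: &[&'static str],
    max_seeds: u64,
    mut run_seed: F,
) -> Result<u64, Vec<&'static str>>
where
    F: FnMut(u64) -> HashSet<&'static str>,
{
    let mut missing: HashSet<&'static str> = expected.iter().copied().collect();
    if missing.is_empty() {
        return Ok(0);
    }
    for seed in 0..max_seeds {
        // `run_seed` stands in for one full simulation run and reports
        // which sometimes-assertions fired during it.
        for name in run_seed(seed) {
            missing.remove(name);
        }
        if missing.is_empty() {
            return Ok(seed + 1);
        }
    }
    let mut left: Vec<_> = missing.into_iter().collect();
    left.sort_unstable();
    Err(left)
}
```

An `Err` result is the useful signal: it names error paths that no seed exercised, which usually means a missing injection point or a probability set too low.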

§Strategic Placement

Place buggify!() calls at:

  • Error handling paths
  • Timeout boundaries
  • Retry logic entry points
  • Resource limit checks
  • State transitions
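As a concrete case of the placements listed above, the sketch below puts an injection point at the entry of a retry loop. To stay self-contained it takes the fault decision as a closure (`inject_fault`) where real code would call `buggify!()`; `send_with_retry` and `SendError` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum SendError {
    Timeout,
}

/// Tries `send` up to `max_attempts` times and returns the attempt number
/// that succeeded. `inject_fault` stands in for `buggify!()`.
fn send_with_retry<I, S>(
    max_attempts: u32,
    mut inject_fault: I,
    mut send: S,
) -> Result<u32, SendError>
where
    I: FnMut() -> bool,
    S: FnMut() -> Result<(), SendError>,
{
    for attempt in 1..=max_attempts {
        // Injection point at the retry entry: pretend this attempt timed
        // out without calling the real operation.
        let outcome = if inject_fault() {
            Err(SendError::Timeout)
        } else {
            send()
        };
        if outcome.is_ok() {
            return Ok(attempt);
        }
    }
    Err(SendError::Timeout)
}
```

Placing the fault before the real call forces both the retry branch and the exhaustion branch to run, even when the underlying operation would always succeed in simulation.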

§Configuration

use moonpool_sim::{ChaosConfiguration, NetworkConfiguration};

// Full chaos (default)
let chaos = ChaosConfiguration::default();

// No chaos (fast tests)
let chaos = ChaosConfiguration::disabled();

// Randomized per seed
let chaos = ChaosConfiguration::random_for_seed();

§Re-exports

pub use assertions::AssertionStats;
pub use assertions::get_assertion_results;
pub use assertions::panic_on_assertion_violations;
pub use assertions::record_assertion;
pub use assertions::reset_assertion_results;
pub use assertions::validate_assertion_contracts;
pub use buggify::buggify_init;
pub use buggify::buggify_internal;
pub use buggify::buggify_reset;
pub use invariants::InvariantCheck;
pub use state_registry::StateRegistry;

§Modules

assertions
Assertion macros and result tracking for simulation testing.
buggify
Deterministic fault injection following FoundationDB’s buggify approach.
invariants
Cross-workload invariant checking for simulation testing.
state_registry
Simple state registry for simulation invariant checking.