dscale 0.7.1

# DScale

[![Crates.io](https://img.shields.io/crates/v/dscale)](https://crates.io/crates/dscale)
[![License](https://img.shields.io/badge/license-MIT-blue?)](LICENSE)
[![Documentation](https://img.shields.io/docsrs/dscale)](https://docs.rs/dscale)

A fast, deterministic simulation framework for testing and benchmarking distributed systems. It simulates network latency, bandwidth constraints, and process execution in an event-driven environment with support for both single-threaded and parallel execution modes.

## Usage

### 1. Define Messages

Messages must implement the `Message` trait, which allows defining a `virtual_size` for bandwidth simulation.

```rust
use dscale::*;

#[derive(Debug)]
struct MyMessage {
    data: u32,
}

impl Message for MyMessage {
    fn virtual_size(&self) -> usize {
        // Size in bytes used for bandwidth simulation.
        // Can be much bigger than real memory size to simulate heavy payloads.
        1000
    }
}

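// Or, if bandwidth simulation is not needed, rely on the default implementation
// instead of the impl above (a type can implement `Message` only once):
// impl Message for MyMessage {}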
```
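
As a rough illustration of why `virtual_size` matters, the sketch below estimates how long a message would occupy a bounded link. It assumes the simplest possible model, `ceil(virtual_size / bytes_per_jiffy)`, with bandwidth expressed in bytes per jiffy. `transfer_jiffies` is a hypothetical helper, not a DScale function, and the framework's actual bandwidth model may differ.

```rust
/// Hypothetical helper (not part of DScale): jiffies needed to push
/// `virtual_size` bytes through a link limited to `bytes_per_jiffy`.
/// Assumed model: ceil(virtual_size / bytes_per_jiffy).
fn transfer_jiffies(virtual_size: usize, bytes_per_jiffy: usize) -> usize {
    assert!(bytes_per_jiffy > 0, "bandwidth must be positive");
    // Ceiling division without overflow: a partially used jiffy counts as a whole one.
    virtual_size / bytes_per_jiffy + usize::from(virtual_size % bytes_per_jiffy != 0)
}
```

Under this assumption, a message with a `virtual_size` of 1000 takes 1 jiffy on a link bounded at 1000 bytes per jiffy, while a 1001-byte message takes 2.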

### 2. Implement Process Logic

Implement `Process` to define how your process reacts to initialization, messages, and timers.

```rust
use dscale::*;

#[derive(Default)]
struct MyProcess;

impl Process for MyProcess {
    fn on_start(&mut self, _seed: Seed) {
        schedule_timer_after(Jiffies(100));
    }

    fn on_message(&mut self, from: Pid, message: MessagePtr) {
        if let Some(msg) = message.try_as_type::<MyMessage>() {
            dscale_debug!("Received message from {from}: {}", msg.data);
        }
    }

    fn on_timer(&mut self, _id: TimerId) {
        broadcast(MyMessage { data: 42 });
    }
}
```

### 3. Run the Simulation

Use `SimulationBuilder` to configure the topology, network constraints, and start the simulation.

```rust
use dscale::*;

fn main() {
    let mut runner = SimulationBuilder::new()
        .add_pool::<MyClient>("Client", 1)
        .add_pool::<MyServer>("Server", 3)
        .default_latency(Distr::Uniform{low: Jiffies(1), high: Jiffies(5)})
        .between_pool_latency("Client", "Server", Distr::Normal {
            mean: Jiffies(10),
            std_dev: Jiffies(2),
            low: Jiffies(5),
            high: Jiffies(20),
        })
        .vnic_bandwidth(BandwidthConfig::Bounded{inbound: 1000, outbound: 1000})
        .time_budget(Jiffies(1_000_000))
        .name("My simulation optional name")
        .seq_sched()
        .build();

    runner.run_full_budget();
}
```

#### Parallel Execution

For large simulations, enable parallel execution to distribute process steps across multiple threads:

```rust
let mut runner = SimulationBuilder::new()
    .add_pool::<MyProcess>("Nodes", 1000)
    .within_pool_latency("Nodes", Distr::Uniform{low: Jiffies(1), high: Jiffies(10)})
    .time_budget(Jiffies(1_000_000))
    .par_sched(ThreadNumber::Specific(8)) // use 8 worker threads
    .build();

runner.run_full_budget();
```

When is the parallel scheduler efficient?

1. The simulation contains many processes (at least 200-300).
2. `on_message`/`on_timer` execution takes most of the simulation time.
3. The work inside `on_message`/`on_timer` handlers is largely independent and needs little synchronization.

#### Optimizations

For faster simulations we advise you to use these settings in your `Cargo.toml`:

```toml
[profile.release]
lto = "fat"           # Link Time Optimization: enables cross-crate optimizations
codegen-units = 1     # Reduces parallelism in code generation for better optimization
panic = "abort"       # Removes stack unwinding code, slightly smaller and faster binary
```

#### Fault Injection

DScale supports injecting network faults into simulations. Faults are scheduled as events: you specify when a fault starts and when it ends.

```rust
let mut runner = SimulationBuilder::new()
    .add_pool::<MyProcess>("Nodes", 5)
    .within_pool_latency("Nodes", Distr::Uniform{low: Jiffies(1), high: Jiffies(5)})
    .time_budget(Jiffies(1_000_000))
    // Break the link between pid 0 and pid 1 from time 100 to 500
    .break_link(Jiffies(100), Jiffies(500), 0, 1)
    // Isolate pid 2 (all links broken) from time 200 to 800
    .isolate(Jiffies(200), Jiffies(800), 2)
    .seq_sched()
    .build();

runner.run_full_budget();
```

| Method                               | Description                                                       |
| ------------------------------------ | ----------------------------------------------------------------- |
| `break_link(start, end, pid1, pid2)` | Breaks the link between two pids for the given time interval      |
| `isolate(start, end, pid)`           | Isolates a pid (breaks all its links) for the given time interval |
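
The semantics in the table can be pictured as a lookup over a list of scheduled faults. The sketch below is only an illustration: the `Fault` enum and `link_is_up` function are hypothetical, not DScale types, and it assumes that links are symmetric and that intervals are half-open (`[start, end)`), neither of which is stated above.

```rust
/// Hypothetical fault record (not DScale's internal representation).
enum Fault {
    /// The link between pids `a` and `b` is down during [start, end) (assumed half-open).
    BreakLink { start: u64, end: u64, a: usize, b: usize },
    /// Every link of `pid` is down during [start, end) (assumed half-open).
    Isolate { start: u64, end: u64, pid: usize },
}

/// Returns true when a message sent at time `now` can travel between `from` and `to`.
fn link_is_up(faults: &[Fault], now: u64, from: usize, to: usize) -> bool {
    !faults.iter().any(|fault| match *fault {
        // A broken link blocks traffic in both directions (assumption).
        Fault::BreakLink { start, end, a, b } => {
            (start..end).contains(&now) && ((a, b) == (from, to) || (a, b) == (to, from))
        }
        // An isolated pid can neither send nor receive.
        Fault::Isolate { start, end, pid } => {
            (start..end).contains(&now) && (pid == from || pid == to)
        }
    })
}
```

With the two faults from the example above, this sketch reports the link between pids 0 and 1 as down at time 100 and up again at time 500, and any link touching pid 2 as down at time 300.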

## Public API

### Simulation Control

**`SimulationBuilder`** — Configures the simulation environment.

| Method                                               | Description                                                                                                                                                               |
| ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `default`                                            | Creates a simulation with no processes and default parameters                                                                                                             |
| `seed`                                               | Sets the random seed for deterministic execution                                                                                                                          |
| `time_budget`                                        | Sets the maximum simulation duration                                                                                                                                      |
| `add_pool`                                           | Creates a named pool of processes (all processes also join `GLOBAL_POOL`)                                                                                                 |
| `default_latency(distribution)`                      | Sets the default latency distribution, used wherever no other distribution is configured explicitly                                                                       |
| `within_pool_latency(pool, distribution)`            | Configures latency between processes within a pool                                                                                                                        |
| `between_pool_latency(pool_a, pool_b, distribution)` | Configures latency between two pools (symmetric). Every pool pair must have latency configured before calling `build`                                                     |
| `vnic_bandwidth`                                     | Configures per-process network bandwidth limits for "virtual" NIC. `Bounded{usize,usize}`: limits bandwidth (bytes per jiffy). `Unbounded`: no bandwidth limits (default) |
| `seq_sched`                                          | Selects single-threaded execution (default). Mutually exclusive with `par_sched` — calling both panics                                                                    |
| `par_sched(threads)`                                 | Selects parallel execution with the given number of worker threads. Mutually exclusive with `seq_sched` — calling both panics                                             |
| `break_link(start, end, pid1, pid2)`                 | Breaks the link between two pids for the given time interval.                                                                                                             |
| `isolate(start, end, pid)`                           | Isolates a pid (breaks all its links) for the given time interval.                                                                                                        |
| `name`                                               | Assigns a name to the simulation instance                                                                                                                                 |
| `build`                                              | Finalizes configuration and returns a simulation runner                                                                                                                   |

**`SimulationRunner`**

| Method            | Description                                                                                              |
| ----------------- | -------------------------------------------------------------------------------------------------------- |
| `run_full_budget` | Runs the simulation until the time budget is exhausted                                                   |
| `run_steps`       | Runs the simulation until it performs the requested number of steps or the global budget is exhausted    |
| `run_sub_budget`  | Runs until a sub-budget, counted from the current time, or the global budget is exhausted                |
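
To show how the three run modes relate to each other, here is a minimal discrete-event loop written against the standard library only. `Runner` is a hypothetical stand-in, not DScale's `SimulationRunner`; events are reduced to `(time, id)` pairs, and the exact stopping rules of the real runner are an assumption.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical discrete-event runner (not DScale's `SimulationRunner`).
struct Runner {
    now: u64,
    budget: u64,
    // `Reverse` turns the max-heap into a min-heap ordered by (fire_time, event_id).
    queue: BinaryHeap<Reverse<(u64, u32)>>,
    fired: Vec<u32>,
}

impl Runner {
    fn new(budget: u64) -> Self {
        Runner { now: 0, budget, queue: BinaryHeap::new(), fired: Vec::new() }
    }

    fn schedule(&mut self, at: u64, id: u32) {
        self.queue.push(Reverse((at, id)));
    }

    /// Fires the earliest event if it is due no later than `limit`.
    fn step(&mut self, limit: u64) -> bool {
        match self.queue.peek() {
            Some(&Reverse((at, _))) if at <= limit => {
                let Reverse((at, id)) = self.queue.pop().unwrap();
                self.now = at;
                self.fired.push(id);
                true
            }
            _ => false,
        }
    }

    /// Runs until the global budget is exhausted.
    fn run_full_budget(&mut self) {
        while self.step(self.budget) {}
    }

    /// Runs at most `steps` events, still bounded by the global budget.
    fn run_steps(&mut self, steps: usize) {
        for _ in 0..steps {
            if !self.step(self.budget) {
                break;
            }
        }
    }

    /// Runs until `now + sub_budget` or the global budget, whichever comes first.
    fn run_sub_budget(&mut self, sub_budget: u64) {
        let limit = self.budget.min(self.now.saturating_add(sub_budget));
        while self.step(limit) {}
    }
}
```

In this sketch all three methods share one `step` function and differ only in the limit they pass, which is one straightforward way to keep repeated partial runs consistent with a single full run.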

### Network Topology

**`Constants`**

| Constant      | Description                                                        |
| ------------- | ------------------------------------------------------------------ |
| `GLOBAL_POOL` | Implicit pool containing all processes. `broadcast` uses this pool |

**`Distributions`**

| Variant                             | Description                                              |
| ----------------------------------- | -------------------------------------------------------- |
| `Uniform {low, high}`               | Uniform distribution over `[low, high]`                  |
| `Bernoulli {p, value}`              | With probability `p` the latency is `value`, otherwise 0 |
| `Normal {mean, std_dev, low, high}` | Truncated normal distribution clamped to `[low, high]`   |
| `Pareto {scale, shape}`             | Pareto distribution                                      |
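
The sketch below illustrates two of these distributions together with the idea of seeded, reproducible sampling. It is not DScale's implementation: `Rng` is a hypothetical SplitMix64 generator chosen only because it fits in a few lines, latencies are plain `u64` values rather than `Jiffies`, and `uniform` and `bernoulli` are illustrative free functions.

```rust
/// Hypothetical seeded generator (SplitMix64); DScale's internal RNG may differ.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Latency drawn from the inclusive range [low, high].
/// The modulo introduces a tiny bias, which is acceptable for a sketch.
fn uniform(rng: &mut Rng, low: u64, high: u64) -> u64 {
    assert!(low <= high, "low must not exceed high");
    low + rng.next_u64() % (high - low + 1)
}

/// With probability `p` the latency is `value`, otherwise 0.
fn bernoulli(rng: &mut Rng, p: f64, value: u64) -> u64 {
    if rng.next_f64() < p { value } else { 0 }
}
```

Two generators created with the same seed produce identical latency sequences, which is the property a deterministic simulation relies on when a `seed` is fixed.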

### Process Interaction (Context-Aware)

These functions are available globally but must be called within the context of a running process step.

| Function                | Description                                                          |
| ----------------------- | -------------------------------------------------------------------- |
| `broadcast`             | Shortcut for `broadcast_within_pool(GLOBAL_POOL)`                    |
| `broadcast_within_pool` | Sends a message to all processes within a named pool                 |
| `send_to`               | Sends a message to a specific process by pid                         |
| `send_random`           | Shortcut for `send_random_from_pool(GLOBAL_POOL)`                    |
| `send_random_from_pool` | Sends a message to a random process within a named pool              |
| `schedule_timer_after`  | Schedules a timer for the current process, returns a `TimerId`       |
| `pid`                   | Returns the pid of the currently executing process (pids start at 0) |
| `now`                   | Returns the current simulation time                                  |
| `list_pool`             | Returns a vector of the pids of all processes in a pool              |
| `choose_from_pool`      | Picks a random process pid from a named pool                         |
| `unique_id`             | Generates a globally unique monotonic ID                             |

### Key-Value Store (`dscale::services::kv`)

Thread-safe store for passing shared state, metrics, or configuration between processes or back to the host.

> [!WARNING]  
> (1) Do not call kv functions from a custom `Default` implementation for your process; call them from the `on_start` handler instead. (2) Frequent `modify` calls on the same key(s) can cause heavy contention and degrade performance under the parallel scheduler.

| Function          | Description                                                      |
| ----------------- | ---------------------------------------------------------------- |
| `set(key, value)` | Stores a value under the given key                               |
| `get(key) -> T`   | Retrieves a clone of the value (panics if missing or wrong type) |
| `modify(key, f)`  | Mutates the value in place (panics if missing or wrong type)     |
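
The following sketch mirrors the `set`/`get`/`modify` semantics from the table using only the standard library. It is an assumption-laden illustration, not DScale's code: `KvStore` is a hypothetical type with methods instead of free functions, and a single `Mutex` around the whole map is the simplest possible design rather than the one the crate necessarily uses.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical typed key-value store (not DScale's implementation).
#[derive(Default)]
struct KvStore {
    entries: Mutex<HashMap<String, Box<dyn Any + Send>>>,
}

impl KvStore {
    /// Stores a value of any `Send` type under `key`, replacing a previous one.
    fn set<T: Any + Send>(&self, key: &str, value: T) {
        self.entries.lock().unwrap().insert(key.to_string(), Box::new(value));
    }

    /// Returns a clone of the value; panics if the key is missing or the type differs.
    fn get<T: Any + Clone>(&self, key: &str) -> T {
        let entries = self.entries.lock().unwrap();
        entries
            .get(key)
            .expect("key is missing")
            .downcast_ref::<T>()
            .expect("value has a different type")
            .clone()
    }

    /// Mutates the value in place while holding the lock; panics like `get`.
    fn modify<T: Any, F: FnOnce(&mut T)>(&self, key: &str, f: F) {
        let mut entries = self.entries.lock().unwrap();
        let value = entries
            .get_mut(key)
            .expect("key is missing")
            .downcast_mut::<T>()
            .expect("value has a different type");
        f(value);
    }
}
```

Because `modify` runs the closure while the lock is held, many threads updating the same key serialize on it. That is the kind of contention the warning above describes.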

### Macros

All logging macros prefix output with the current simulation time and process pid (`[Now: ... | P...]`). Controlled by the `RUST_LOG` environment variable.

| Macro           | Description             |
| --------------- | ----------------------- |
| `dscale_trace!` | Logs at **trace** level |
| `dscale_debug!` | Logs at **debug** level |
| `dscale_info!`  | Logs at **info** level  |
| `dscale_warn!`  | Logs at **warn** level  |
| `dscale_error!` | Logs at **error** level |

### Helpers (`dscale::helpers`)

| Item       | Description                                                                                               |
| ---------- | --------------------------------------------------------------------------------------------------------- |
| `Combiner` | Collects values until a threshold is reached, then yields them all at once. Useful for quorum-based logic |
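
To illustrate the quorum pattern described above, here is a minimal collector built on a `Vec`. `QuorumCollector` and its `push` method are hypothetical names; the real `Combiner` API is not shown in this document and may differ, including in whether it resets after yielding.

```rust
/// Hypothetical quorum collector (not DScale's `Combiner`; its real API may differ).
struct QuorumCollector<T> {
    threshold: usize,
    values: Vec<T>,
}

impl<T> QuorumCollector<T> {
    fn new(threshold: usize) -> Self {
        assert!(threshold > 0, "threshold must be positive");
        QuorumCollector { threshold, values: Vec::with_capacity(threshold) }
    }

    /// Buffers `value`. Once `threshold` values are collected, returns them all
    /// and resets the buffer (an assumption); returns `None` below the threshold.
    fn push(&mut self, value: T) -> Option<Vec<T>> {
        self.values.push(value);
        if self.values.len() >= self.threshold {
            Some(std::mem::take(&mut self.values))
        } else {
            None
        }
    }
}
```

A process waiting for acknowledgements from 3 peers could call `push` in its `on_message` handler and act only when it returns `Some`.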

### Message Downcasting (`MessagePtr`)

| Method               | Description                                         |
| -------------------- | --------------------------------------------------- |
| `try_as_type::<T>()` | Attempts to downcast to `T`, returns `Option<&T>`   |
| `as_type::<T>()`     | Downcasts to `T`, panics if the type does not match |
| `is::<T>()`          | Returns `true` if the message is of type `T`        |
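
The three methods behave much like downcasting on `std::any::Any`. The sketch below reproduces those semantics with a hypothetical `AnyMessage` wrapper; it is not DScale's `MessagePtr`, and `Ping` and `Pong` are example message types invented for the illustration.

```rust
use std::any::Any;

/// Hypothetical type-erased message (a stand-in, not DScale's `MessagePtr`).
struct AnyMessage(Box<dyn Any>);

impl AnyMessage {
    fn new<T: Any>(value: T) -> Self {
        AnyMessage(Box::new(value))
    }

    /// Returns `Some(&T)` when the payload is a `T`, `None` otherwise.
    fn try_as_type<T: Any>(&self) -> Option<&T> {
        self.0.downcast_ref::<T>()
    }

    /// Returns `&T`, panicking when the payload has a different type.
    fn as_type<T: Any>(&self) -> &T {
        self.try_as_type::<T>().expect("message has a different type")
    }

    /// Checks the payload type without borrowing its contents.
    fn is<T: Any>(&self) -> bool {
        self.0.is::<T>()
    }
}

// Example message types used only for this illustration.
#[derive(Debug, PartialEq)]
struct Ping(u32);

#[derive(Debug, PartialEq)]
struct Pong(u32);
```

A handler that accepts several message types can chain `try_as_type` calls in `if let` branches, and reserve `as_type` for places where any other type indicates a bug.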

## Logging Configuration (`RUST_LOG`)

DScale output is controlled via the `RUST_LOG` environment variable.

- **`RUST_LOG=[some_level]`**: Enables output from every `dscale_[level]!` macro with `level <= some_level`.
- **`RUST_LOG=full::path::to::your::file::or::crate=[level],another::path=[level]`**: Restricts output to the listed files or crates, each with its own level.

> [!WARNING]  
> `RUST_LOG=[level > info]` only works without the `--release` flag.

## Examples

You can find usage examples [here](https://codeberg.org/kshprenger/dscale/src/branch/master/examples)

## Paper

You can find the paper describing the algorithms behind DScale [here](https://codeberg.org/kshprenger/dscale-paper/src/branch/master)

## Thanks to

- https://gitlab.com/whirl-framework
- https://github.com/jepsen-io/maelstrom
- https://github.com/systems-group/anysystem
- https://www.nsnam.org
- https://omnetpp.org
- https://peersim.sourceforge.net