calimero-node 0.10.1-rc.21

Core Calimero infrastructure and tools
# Sync Protocol Simulation Framework - Agent Guide

This guide is for AI agents working on the Calimero sync protocol implementation.

## Relationship to Production Runtime

The simulation framework replicates key aspects of the production runtime while enabling deterministic, reproducible testing without actual WASM execution or network I/O.

### What IS Replicated (Real Implementation)

| Component | Production | Simulation |
|-----------|------------|------------|
| **Merkle Tree** | `calimero-storage::Index<MainStorage>` | Same! Uses real implementation |
| **Storage Actions** | `Interface::apply_action` | Same! Real CRDT action application |
| **Hash Computation** | SHA-256 tree hashes | Same! Real hash propagation |
| **Protocol Selection** | `select_protocol()` from `calimero-node-primitives` | Same! Shared function |
| **Entity Metadata** | `Metadata { created_at, updated_at }` | Same! Real types |
| **RuntimeEnv** | Callbacks routing to RocksDB | Callbacks routing to `InMemoryDB` |

### What is NOT Replicated

| Component | Production | Simulation |
|-----------|------------|------------|
| **WASM Execution** | Full `calimero-runtime` with Wasmer | Skipped—direct state manipulation |
| **Network I/O** | libp2p gossipsub/streams | `NetworkRouter` with fault injection |
| **Time** | `SystemTime::now()` | Discrete `SimClock` |
| **Concurrency** | tokio async tasks | Sequential event processing |
| **Host Functions** | 80+ functions in `VMHostFunctions` | None—storage accessed directly |

### Why This Design?

1. **Real Merkle Tree**: The HashComparison protocol depends on accurate subtree traversal.
   Using the real `calimero-storage` implementation ensures hash propagation works identically.

2. **Shared Protocol Selection**: `SimNode` implements the `LocalSyncState` trait and uses
   `calimero_node_primitives::sync::protocol::select_protocol()` for consistency.

3. **Deterministic Testing**: A discrete clock and a seeded RNG make failures reproducible.

4. **Fault Injection**: `NetworkRouter` can simulate packet loss, latency, reordering,
   and partitions without actual network configuration.
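
To make the first point concrete, here is a minimal sketch of hash-guided subtree comparison. It is not the `calimero-storage` implementation: `Node`, `leaf`, `tree_hash`, and `diff` are hypothetical names, FNV-1a stands in for SHA-256 so the example needs only the standard library, and both trees are assumed to list their children in the same order.

```rust
// Hypothetical sketch: none of these types come from calimero-storage.
// FNV-1a stands in for SHA-256 so the example depends only on std.

#[derive(Debug, Clone)]
pub struct Node {
    pub id: u32,
    pub data: Vec<u8>,
    pub children: Vec<Node>,
}

/// Convenience constructor for a node without children.
pub fn leaf(id: u32, data: &[u8]) -> Node {
    Node { id, data: data.to_vec(), children: Vec::new() }
}

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into a running FNV-1a hash.
fn fnv1a(bytes: &[u8], mut hash: u64) -> u64 {
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hash of a subtree: own id and data, then each child's subtree hash in order.
pub fn tree_hash(node: &Node) -> u64 {
    let mut h = fnv1a(&node.id.to_le_bytes(), FNV_OFFSET);
    h = fnv1a(&node.data, h);
    for child in &node.children {
        h = fnv1a(&tree_hash(child).to_le_bytes(), h);
    }
    h
}

/// Collect ids of nodes that differ, descending only into subtrees
/// whose hashes disagree.
pub fn diff(local: &Node, remote: &Node, out: &mut Vec<u32>) {
    if tree_hash(local) == tree_hash(remote) {
        return; // identical subtree: nothing to sync below this point
    }
    if local.data != remote.data || local.children.len() != remote.children.len() {
        out.push(local.id);
    }
    for (l, r) in local.children.iter().zip(&remote.children) {
        diff(l, r, out);
    }
}
```

A traversal like this only descends where hashes disagree, so a stand-in tree whose hashes propagate differently could hide real differences or report ones that do not exist.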

### Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────────┐
│                         PRODUCTION RUNTIME                              │
├─────────────────────────────────────────────────────────────────────────┤
│  Client Request                                                         │
│       ↓                                                                 │
│  JSON-RPC Server                                                        │
│       ↓                                                                 │
│  WASM Runtime (calimero-runtime)  ←── VMHostFunctions, VMLimits         │
│       ↓                                                                 │
│  calimero-storage (Index, Interface::apply_action)                      │
│       ↓                                                                 │
│  calimero-store (RocksDB)                                               │
│       ↓                                                                 │
│  Network (libp2p gossipsub)                                             │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│                         SIMULATION RUNTIME                              │
├─────────────────────────────────────────────────────────────────────────┤
│  Test Setup (Scenario)                                                  │
│       ↓                                                                 │
│  SimRuntime (orchestrator)                                              │
│       ↓                                                                 │
│  SimNode (state machine)                                                │
│       ↓                                                                 │
│  SimStorage ─────────────────┬──────────────────────────────────────────┤
│       │                      │                                          │
│       │  calimero-storage    │  ← REAL: Index, Interface, Merkle tree   │
│       │  (same crate!)       │                                          │
│       ↓                      │                                          │
│  InMemoryDB (calimero-store) │  ← Same Store interface, memory backend  │
│       ↓                      │                                          │
│  NetworkRouter (simulated)   │  ← Fault injection, partitions           │
└─────────────────────────────────────────────────────────────────────────┘
```

### Key Code Paths

**Production**: `VMHostFunctions::persist_root_state()` → `Interface::<MainStorage>::save_raw()`

**Simulation**: `SimStorage::add_entity()` → `Interface::<MainStorage>::apply_action()`

Both use the same `calimero_storage::interface::Interface` implementation!

## Framework Location

```
crates/node/tests/
├── sync_sim/              # Simulation framework (DO NOT MODIFY without review)
│   ├── mod.rs             # Main module, prelude exports
│   ├── actions.rs         # SyncMessage, SyncActions (effects model)
│   ├── types.rs           # NodeId, MessageId, EntityId, etc.
│   ├── runtime/           # SimClock, SimRng, EventQueue
│   ├── network/           # NetworkRouter, FaultConfig, PartitionManager
│   ├── node/              # SimNode state machine
│   ├── scenarios/         # Deterministic and random scenario generators
│   ├── convergence.rs     # Convergence checking (C1-C5 properties)
│   ├── metrics.rs         # SimMetrics collection
│   └── assertions.rs      # Test assertion macros
├── sync_sim.rs            # Framework unit tests
├── sync_scenarios/        # Protocol behavior tests (ADD NEW TESTS HERE)
└── sync_compliance/       # Compliance suite for issue #1785
```

## Quick Start

```rust
// NOTE: Minimal example - see sync_scenarios/ for real test patterns

use crate::sync_sim::prelude::*;

#[test]
fn test_example() {
    // Create runtime with seed for reproducibility
    let mut rt = SimRuntime::with_seed(42);
    
    // Add nodes with a scenario
    let scenario = Scenario::n_nodes_synced(3, 10); // 3 nodes, 10 shared entities
    let nodes = rt.apply_scenario(scenario);
    
    // Run until convergence or timeout
    let result = rt.run();
    
    // Assert convergence
    assert_converged!(rt);
    assert_eq!(result, StopCondition::Converged);
}
```

## Key Concepts

### SimRuntime
The orchestrator. Manages clock, event queue, nodes, and network.

```rust
let mut rt = SimRuntime::with_seed(42);           // Basic
let mut rt = SimRuntime::with_config(config);     // With custom config

rt.add_node("alice");                              // Add empty node
rt.apply_scenario(scenario);                       // Add nodes with state
rt.run();                                          // Run to completion
rt.run_until(|rt| rt.clock().now() > 1000.into()); // Run with predicate
rt.step();                                         // Single event step
```
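
The relationship between `step()`, `run_until()`, and the discrete clock can be pictured with a small discrete-event loop. This is a sketch under assumptions, not the framework's `EventQueue`: the `Sim` type, its string payloads, and the tie-breaking by insertion order are all hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical discrete-event simulator; not the framework's SimRuntime.
pub struct Sim {
    now: u64, // discrete clock, in ticks
    seq: u64, // insertion counter, breaks ties between events due at the same tick
    queue: BinaryHeap<Reverse<(u64, u64, String)>>, // (due time, seq, payload)
}

impl Sim {
    pub fn new() -> Self {
        Sim { now: 0, seq: 0, queue: BinaryHeap::new() }
    }

    pub fn now(&self) -> u64 {
        self.now
    }

    /// Schedule `payload` to fire `delay` ticks from the current time.
    pub fn schedule(&mut self, delay: u64, payload: &str) {
        self.queue.push(Reverse((self.now + delay, self.seq, payload.to_string())));
        self.seq += 1;
    }

    /// Process one event and jump the clock to its due time.
    pub fn step(&mut self) -> Option<String> {
        let Reverse((at, _, payload)) = self.queue.pop()?;
        // The heap yields the earliest event first, so `at` is never before `now`.
        self.now = at;
        Some(payload)
    }

    /// Step until the queue drains or `stop` returns true.
    /// Returns the payloads in the order they were processed.
    pub fn run_until(&mut self, stop: impl Fn(&Sim) -> bool) -> Vec<String> {
        let mut trace = Vec::new();
        while !stop(&*self) {
            match self.step() {
                Some(payload) => trace.push(payload),
                None => break,
            }
        }
        trace
    }
}
```

Because the clock only moves when an event is popped, no wall-clock time is involved, and the processing order depends only on what was scheduled.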

### Scenarios

**Deterministic** (for specific test cases):
```rust
Scenario::n_nodes_synced(n, entities)      // All nodes have same state
Scenario::n_nodes_diverged(n, entities)    // Each node has unique state
Scenario::partial_overlap(n, shared, unique) // Mix of shared/unique
Scenario::force_snapshot()                 // Forces snapshot sync path
Scenario::force_none()                     // Empty nodes
```

**Random** (for property-based testing):
```rust
let config = RandomScenarioConfig::new(seed)
    .with_node_count(3, 5)
    .with_entity_count(10, 100)
    .with_divergence(0.3);
let scenario = RandomScenario::generate(&config);
```

### Fault Injection

```rust
let config = SimConfig::with_seed(42)
    .with_faults(FaultConfig::default()
        .with_loss(0.1)              // 10% message loss
        .with_latency(50, 10)        // 50ms base, 10ms jitter
        .with_reorder_window(100)    // 100ms reorder window
        .with_duplicate(0.05));      // 5% duplication

// Network partitions
rt.partition_bidirectional(&alice, &bob, None);           // Permanent
rt.partition_bidirectional(&alice, &bob, Some(1000.into())); // Temporary
rt.heal_partition(&alice, &bob);
```
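
As a rough illustration of how loss, latency, jitter, and duplication settings might turn into per-message decisions, consider the sketch below. Everything in it is an assumption made for illustration: the `Faults` struct, the `Rng` type (a SplitMix64 generator), the `plan_delivery` function, and the choice to model jitter as an extra delay of 0 to `jitter_ms` on top of the base latency. The real `NetworkRouter` may apply these settings differently.

```rust
/// Hypothetical fault settings; an illustration, not the framework's FaultConfig.
#[derive(Clone, Copy)]
pub struct Faults {
    pub loss_rate: f64,      // probability in [0, 1] that a message is dropped
    pub duplicate_rate: f64, // probability in [0, 1] that a message is sent twice
    pub base_latency_ms: u64,
    pub jitter_ms: u64,      // assumed: extra delay drawn from 0..=jitter_ms
}

/// Small seeded generator (SplitMix64) so the sketch needs only std.
pub struct Rng(pub u64);

impl Rng {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Delivery delays (in ms) for one sent message.
/// An empty vector means the message was dropped; two entries mean it was duplicated.
pub fn plan_delivery(faults: &Faults, rng: &mut Rng) -> Vec<u64> {
    if rng.next_f64() < faults.loss_rate {
        return Vec::new();
    }
    let copies = if rng.next_f64() < faults.duplicate_rate { 2 } else { 1 };
    (0..copies)
        .map(|_| faults.base_latency_ms + rng.next_u64() % (faults.jitter_ms + 1))
        .collect()
}
```

Since every random draw comes from the seeded generator, replaying the same seed yields the same drops, delays, and duplicates.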

### Assertions

```rust
assert_converged!(rt);                    // All nodes converged
assert_not_converged!(rt);                // Not converged
assert_entity_count!(rt, "alice", 10);    // Node has N entities
assert_has_entity!(rt, "alice", entity_id); // Node has specific entity
assert_idle!(rt, "alice");                // Node is idle (no pending work)
assert_buffer_empty!(rt, "alice");        // Delta buffer is empty
```

### Metrics

```rust
let metrics = rt.metrics();
metrics.protocol.messages_sent;
metrics.protocol.bytes_sent;
metrics.effects.messages_dropped;
metrics.convergence.converged;
metrics.convergence.time_to_converge;
```

## Writing Tests

### Where to Put Tests

| Test Type | Location | When to Use |
|-----------|----------|-------------|
| Framework tests | `sync_sim.rs` | Testing the framework itself |
| Protocol scenarios | `sync_scenarios/*.rs` | Testing sync protocol behavior |
| Compliance tests | `sync_compliance/*.rs` | Issue #1785 compliance suite |

### Test Patterns

**Basic convergence test:**
```rust
#[test]
fn test_two_nodes_converge() {
    let mut rt = SimRuntime::with_seed(42);
    let scenario = Scenario::n_nodes_diverged(2, 10);
    rt.apply_scenario(scenario);
    
    rt.run();
    
    assert_converged!(rt);
}
```

**Fault tolerance test:**
```rust
#[test]
fn test_convergence_with_packet_loss() {
    let config = SimConfig::with_seed(42)
        .with_faults(FaultConfig::default().with_loss(0.2));
    let mut rt = SimRuntime::with_config(config);
    
    // ... setup and run
    
    assert_converged!(rt);
}
```

**Partition healing test:**
```rust
#[test]
fn test_partition_healing() {
    let mut rt = SimRuntime::with_seed(42);
    let [a, b] = rt.apply_scenario(Scenario::n_nodes_diverged(2, 5));
    
    // Partition for 1000 ticks
    rt.partition_bidirectional(&a, &b, Some(1000.into()));
    
    rt.run();
    
    assert_converged!(rt); // Should converge after partition heals
}
```

**Property-based test:**
```rust
#[test]
fn test_random_scenarios_converge() {
    for seed in 0..100 {
        let config = RandomScenarioConfig::new(seed)
            .with_node_count(2, 5)
            .with_entity_count(5, 20);
        
        let mut rt = SimRuntime::with_seed(seed);
        rt.apply_scenario(RandomScenario::generate(&config));
        
        let result = rt.run();
        
        assert!(
            matches!(result, StopCondition::Converged),
            "Seed {} failed to converge", seed
        );
    }
}
```

## Invariants (DO NOT BREAK)

1. **Determinism**: Same seed MUST produce identical results
2. **No silent drops**: All message drops must be recorded in metrics
3. **Convergence properties C1-C5**: See `convergence.rs` for formal definitions
4. **Time monotonicity**: SimClock never goes backwards
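
The invariants above can be expressed as small checks. The following is a hedged sketch with hypothetical names (`Clock`, `Counters`, `simulate`); it does not reflect how `SimClock` or `SimMetrics` are implemented, only what the invariants require of them.

```rust
/// Hypothetical discrete clock that refuses to move backwards.
#[derive(Debug, Default)]
pub struct Clock {
    now: u64,
}

impl Clock {
    pub fn now(&self) -> u64 {
        self.now
    }

    /// Advance to `t`. An attempt to go backwards is reported, not applied.
    pub fn advance_to(&mut self, t: u64) -> Result<(), String> {
        if t < self.now {
            return Err(format!("time went backwards: {} -> {}", self.now, t));
        }
        self.now = t;
        Ok(())
    }
}

/// Hypothetical counters for the "no silent drops" invariant.
#[derive(Debug, Default, PartialEq)]
pub struct Counters {
    pub sent: u64,
    pub delivered: u64,
    pub dropped: u64,
}

impl Counters {
    /// Record the outcome of one send; a drop is always counted.
    pub fn record(&mut self, delivered: bool) {
        self.sent += 1;
        if delivered {
            self.delivered += 1;
        } else {
            self.dropped += 1;
        }
    }

    /// True when every sent message is accounted for as delivered or dropped.
    pub fn balanced(&self) -> bool {
        self.sent == self.delivered + self.dropped
    }
}

/// Drive `n` sends with a 1-in-4 drop chance from a seeded LCG.
/// The same seed always produces the same counters.
pub fn simulate(seed: u64, n: usize) -> Counters {
    let mut state = seed;
    let mut counters = Counters::default();
    for _ in 0..n {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        counters.record((state >> 62) != 0); // top two bits zero means drop
    }
    counters
}
```

A determinism check then reduces to running the same seed twice and comparing the results for equality.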

## Debugging Failures

1. **Get the seed**: All random tests should log their seed
2. **Reproduce locally**: `SimRuntime::with_seed(failing_seed)`
3. **Step through**: Use `rt.step()` instead of `rt.run()`
4. **Check metrics**: `rt.metrics()` shows what happened
5. **Check convergence**: `rt.check_convergence()` returns detailed status

## Common Mistakes

- **Forgetting seed**: Always use deterministic seeds for reproducibility
- **Not checking StopCondition**: `run()` returns why it stopped
- **Ignoring metrics**: Metrics reveal silent failures
- **Hardcoding time**: Use `SimTime` and `SimDuration`, not raw numbers

## Simulation vs Production Network

This section details the differences between `NetworkRouter` (simulation) and `calimero-network` (production).

### Architecture Comparison

```text
┌─────────────────────────────────────────────────────────────────────┐
│                    PRODUCTION (calimero-network)                     │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  NetworkClient                                                       │
│       │                                                              │
│       ▼ NetworkMessage                                               │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    NetworkManager (actor)                    │    │
│  │  ┌──────────────────────────────────────────────────────┐   │    │
│  │  │               Swarm<Behaviour>                        │   │    │
│  │  │  ┌──────────┐ ┌─────────┐ ┌──────────┐ ┌──────────┐  │   │    │
│  │  │  │Gossipsub │ │   Kad   │ │  Stream  │ │Rendezvous│  │   │    │
│  │  │  │ (topics) │ │  (DHT)  │ │ (direct) │ │(discover)│  │   │    │
│  │  │  └──────────┘ └─────────┘ └──────────┘ └──────────┘  │   │    │
│  │  │  ┌──────────┐ ┌─────────┐ ┌──────────┐ ┌──────────┐  │   │    │
│  │  │  │  mDNS    │ │  Relay  │ │  DCUtR   │ │ AutoNAT  │  │   │    │
│  │  │  │ (local)  │ │  (NAT)  │ │(holepun) │ │(detect)  │  │   │    │
│  │  │  └──────────┘ └─────────┘ └──────────┘ └──────────┘  │   │    │
│  │  └──────────────────────────────────────────────────────┘   │    │
│  └─────────────────────────────────────────────────────────────┘    │
│       │                                                              │
│       ▼ TCP/QUIC                                                     │
│  Real Network I/O (libp2p)                                          │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│                     SIMULATION (sync_sim)                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Test code                                                           │
│       │                                                              │
│       ▼ SyncActions                                                  │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                      SimRuntime                              │    │
│  │  ┌──────────────────────────────────────────────────────┐   │    │
│  │  │               NetworkRouter                           │   │    │
│  │  │  ┌────────────────────────────────────────────────┐  │   │    │
│  │  │  │           FaultConfig                          │  │   │    │
│  │  │  │  • message_loss_rate    • duplicate_rate       │  │   │    │
│  │  │  │  • base_latency_ms      • reorder_window_ms    │  │   │    │
│  │  │  │  • partition_probability • crash_probability   │  │   │    │
│  │  │  └────────────────────────────────────────────────┘  │   │    │
│  │  │  ┌────────────────────────────────────────────────┐  │   │    │
│  │  │  │         PartitionManager                       │  │   │    │
│  │  │  │  • Bidirectional partitions                    │  │   │    │
│  │  │  │  • Timed healing                               │  │   │    │
│  │  │  └────────────────────────────────────────────────┘  │   │    │
│  │  └──────────────────────────────────────────────────────┘   │    │
│  └─────────────────────────────────────────────────────────────┘    │
│       │                                                              │
│       ▼ SimEvent (in EventQueue)                                     │
│  Deterministic event processing (no real I/O)                       │
└─────────────────────────────────────────────────────────────────────┘
```

### Feature Comparison

| Feature | Production (`calimero-network`) | Simulation (`NetworkRouter`) |
|---------|--------------------------------|------------------------------|
| **Message Delivery** | Real TCP/QUIC | `SimEvent::DeliverMessage` in queue |
| **Latency** | Real network latency | Configurable `base_latency_ms + jitter` |
| **Message Loss** | Real packet loss | Configurable `message_loss_rate` (0.0-1.0) |
| **Message Reorder** | Possible in real network | Configurable `reorder_window_ms` |
| **Message Duplicate** | Rare in real network | Configurable `duplicate_rate` |
| **Network Partition** | Real disconnection | `PartitionManager` with timed healing |
| **Gossipsub Topics** | Per-context topics | ❌ Not simulated (single context) |
| **Peer Discovery** | mDNS, Kad, Rendezvous | ❌ Not simulated (nodes pre-connected) |
| **NAT Traversal** | AutoNAT, Relay, DCUtR | ❌ Not simulated |
| **Connection Setup** | TCP/QUIC handshake, TLS | ❌ Instant (no handshake) |
| **Bandwidth Limits** | Real throughput | ❌ Not simulated |
| **Encryption** | TLS/Noise per connection | Optional via `EncryptionState` |
| **Streams** | `libp2p_stream` protocol | `SimStream` (tokio channels) |

### When to Use Each

| Testing Goal | Use Simulation | Use Integration Tests |
|--------------|---------------|----------------------|
| Protocol correctness | ✅ Yes | Overkill |
| Convergence properties | ✅ Yes | Also useful |
| Fault tolerance | ✅ Yes (configurable faults) | Harder to control |
| Message ordering | ✅ Yes (`reorder_window_ms`) | Non-deterministic |
| Partition healing | ✅ Yes (`PartitionManager`) | Complex setup |
| Peer discovery | ❌ No | ✅ Yes |
| NAT traversal | ❌ No | ✅ Yes |
| Real latency behavior | ❌ No | ✅ Yes |
| Connection management | ❌ No | ✅ Yes |
| Multi-context | ❌ No | ✅ Yes |

### SimStream vs Production Stream

The `SimStream` type implements the same `SyncTransport` trait as production, allowing the **real sync protocol code** to run in simulation:

```rust
// Production (calimero-network)
pub struct Stream {
    inner: Framed<BufStream<Compat<P2pStream>>, MessageCodec>,
}

// Simulation (sync_sim)
pub struct SimStream {
    tx: Option<mpsc::Sender<Vec<u8>>>,
    rx: mpsc::Receiver<Vec<u8>>,
    buffer: VecDeque<Vec<u8>>,
    encryption: EncryptionState,
}

// Both types implement the same trait (signatures only):
#[async_trait]
trait SyncTransport {
    async fn send(&mut self, message: &StreamMessage<'_>) -> Result<()>;
    async fn recv(&mut self) -> Result<Option<StreamMessage<'static>>>;
    async fn recv_timeout(&mut self, budget: Duration) -> Result<Option<StreamMessage<'static>>>;
    fn set_encryption(&mut self, encryption: Option<(SharedKey, Nonce)>);
    async fn close(&mut self) -> Result<()>;
}
```

This design allows `hash_comparison_sync()` and other protocol functions to run unchanged in simulation.
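
The idea of writing protocol code once against a transport trait can be shown with a simplified, synchronous sketch. It is not `SyncTransport` or `SimStream`: the `Transport` trait, the `ChannelTransport` type, and the `ping` function are hypothetical, and `std::sync::mpsc` replaces the tokio channels so the example runs without external crates.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical, synchronous stand-in for a transport trait.
pub trait Transport {
    fn send(&mut self, msg: Vec<u8>) -> Result<(), String>;
    /// `Ok(None)` means the peer closed the stream.
    fn recv(&mut self) -> Result<Option<Vec<u8>>, String>;
}

/// In-memory transport backed by a pair of std channels.
pub struct ChannelTransport {
    tx: Option<Sender<Vec<u8>>>,
    rx: Receiver<Vec<u8>>,
}

impl ChannelTransport {
    /// Create two connected endpoints.
    pub fn pair() -> (Self, Self) {
        let (a_tx, b_rx) = channel();
        let (b_tx, a_rx) = channel();
        (
            ChannelTransport { tx: Some(a_tx), rx: a_rx },
            ChannelTransport { tx: Some(b_tx), rx: b_rx },
        )
    }

    /// Drop the sending half so the peer's `recv` returns `Ok(None)`.
    pub fn close(&mut self) {
        self.tx = None;
    }
}

impl Transport for ChannelTransport {
    fn send(&mut self, msg: Vec<u8>) -> Result<(), String> {
        let tx = self.tx.as_ref().ok_or("stream closed")?;
        tx.send(msg).map_err(|e| e.to_string())
    }

    fn recv(&mut self) -> Result<Option<Vec<u8>>, String> {
        // A disconnected channel with nothing buffered means the peer closed.
        Ok(self.rx.recv().ok())
    }
}

/// Protocol code written against the trait runs on any transport.
pub fn ping<T: Transport>(transport: &mut T) -> Result<(), String> {
    transport.send(b"ping".to_vec())
}
```

Here `ping` never names a concrete transport, which is the property that lets the same protocol function run over a network stream in production and over channels in a test.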

### Fault Injection Examples

```rust
// Light chaos (realistic network)
FaultConfig::light_chaos()
// base_latency_ms: 10, jitter: 5
// message_loss_rate: 0.01 (1%)
// reorder_window_ms: 20
// duplicate_rate: 0.01 (1%)

// Heavy chaos (stress test)
FaultConfig::heavy_chaos()
// base_latency_ms: 50, jitter: 25
// message_loss_rate: 0.1 (10%)
// reorder_window_ms: 100
// duplicate_rate: 0.05 (5%)
// partition_probability: 0.01
// crash_probability: 0.001

// Custom configuration
FaultConfig::none()
    .with_latency(100, 50)    // 100ms ± 50ms
    .with_loss(0.05)          // 5% loss
    .with_reorder(200)        // 200ms reorder window
    .with_duplicates(0.02)    // 2% duplication
    .with_partitions(0.01, 500..2000)  // 1% chance, 500-2000ms duration
```
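
Bidirectional partitions with timed healing can be modeled as a table of blocked node pairs with an optional heal time. The sketch below is an assumption about one possible design, not the framework's `PartitionManager`; the `Partitions` type, its method names, and the use of plain `u64` ticks are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical partition table; not the framework's PartitionManager.
#[derive(Default)]
pub struct Partitions {
    // Key: unordered node pair. Value: heal time in ticks, or None if permanent.
    cut: HashMap<(String, String), Option<u64>>,
}

/// Normalize a pair so (a, b) and (b, a) map to the same key.
fn key(a: &str, b: &str) -> (String, String) {
    if a <= b {
        (a.to_string(), b.to_string())
    } else {
        (b.to_string(), a.to_string())
    }
}

impl Partitions {
    /// Block traffic in both directions, optionally until `heal_at`.
    pub fn partition(&mut self, a: &str, b: &str, heal_at: Option<u64>) {
        self.cut.insert(key(a, b), heal_at);
    }

    /// Remove a partition immediately.
    pub fn heal(&mut self, a: &str, b: &str) {
        self.cut.remove(&key(a, b));
    }

    /// Can a message pass between `a` and `b` at time `now`?
    pub fn can_deliver(&self, a: &str, b: &str, now: u64) -> bool {
        match self.cut.get(&key(a, b)) {
            None => true,                           // never partitioned
            Some(None) => false,                    // permanent partition
            Some(Some(heal_at)) => now >= *heal_at, // timed partition
        }
    }
}
```

Checking the heal time against the simulated clock, rather than scheduling a wall-clock timer, keeps partition behavior reproducible for a given seed.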

### Key Differences Summary

1. **Determinism**: Simulation is fully deterministic (same seed = same results). Production is not.

2. **Time Model**: Simulation uses discrete `SimTime`. Production uses real `SystemTime`.

3. **Concurrency**: Simulation processes events sequentially. Production uses tokio async tasks.

4. **Discovery**: Simulation assumes all nodes can reach each other. Production must discover peers.

5. **Scope**: Simulation tests single-context sync. Production handles multiple contexts.

For protocol correctness and fault tolerance testing, use simulation. For discovery, NAT, and real network behavior, use integration tests with actual `calimero-network`.