Skip to main content

moonpool_explorer/
lib.rs

1//! Multiverse exploration for deterministic simulation testing.
2//!
3//! This crate discovers rare bugs by splitting simulation timelines at key
4//! moments. When your simulation reaches an interesting state for the first
5//! time (like a retry path firing, or a timeout being hit), the explorer
6//! forks the process and re-runs from that point with different randomness,
7//! creating a tree of alternate timelines.
8//!
9//! # Glossary
10//!
11//! ```text
12//! Term              What it means
13//! ────────────────  ──────────────────────────────────────────────────────
14//! Seed              A u64 that completely determines a simulation's randomness.
15//!                   Same seed = same coin flips = same execution every time.
16//!
17//! Timeline          One complete simulation run. A seed + a sequence of
18//!                   splitpoints uniquely identifies a timeline.
19//!
20//! Splitpoint        A moment where the explorer decides to branch the
21//!                   multiverse. Happens when a `sometimes` assertion
22//!                   succeeds for the first time, a numeric watermark
23//!                   improves, or a frontier advances.
24//!
25//! Multiverse        The tree of all timelines explored from one root seed.
26//!                   Each splitpoint creates new children with different seeds.
27//!
28//! Coverage bitmap   A small bitfield (8192 bits) that records which assertion
29//!                   paths a timeline touched. Used to detect whether a
30//!                   new timeline discovered anything its siblings didn't.
31//!
32//! Explored map      The union of all coverage bitmaps across all timelines.
33//!                   Lives in shared memory. "Has this ever been seen?"
34//!
35//! Energy budget     A finite pool that limits how many timelines the explorer
36//!                   can spawn. Prevents runaway forking.
37//!
38//! Watermark         The best numeric value ever observed for a given assertion.
39//!                   Improving the watermark triggers a new splitpoint.
40//!
41//! Frontier          For compound boolean assertions: the max number of
42//!                   conditions simultaneously true. Advancing it triggers
43//!                   a splitpoint.
44//!
45//! Recipe            The sequence of splitpoints that leads to a specific
46//!                   timeline. If a bug is found, its recipe lets you
47//!                   replay exactly how to reach it.
48//!
49//! Mark              An assertion site that can trigger splitpoints.
50//!                   Each mark has a name, a shared-memory slot,
51//!                   and (in adaptive mode) its own energy allowance.
52//! ```
53//!
54//! # The big idea
55//!
56//! A normal simulation picks one seed and runs one timeline:
57//!
58//! ```text
59//! Seed 42 ──────────────────────────────────────────────────▶ done
60//!   RNG call #1    #2    #3    #4   ...   #500
61//! ```
62//!
63//! But bugs hide in rare states. Maybe a retry only fires 5% of the time,
64//! and the bug only appears when two retries happen in the same run. With
65//! single-seed testing, you might run thousands of seeds and never hit it.
66//!
67//! Multiverse exploration changes the strategy. Instead of running many
68//! independent seeds, it dives deeper into promising seeds:
69//!
70//! ```text
71//! Seed 42 ─────────────┬── splitpoint! ─────────────────────▶ done
72//!   RNG call #1 ... #200│
73//!                       │
74//!                       ├── Timeline A (new seed) ──────────▶ done
75//!                       ├── Timeline B (new seed) ──┬───────▶ done
76//!                       │                           │
77//!                       │                    nested splitpoint!
78//!                       │                           │
79//!                       │                           ├── Timeline B1 ──▶ done
80//!                       │                           └── Timeline B2 ──▶ BUG!
81//!                       │
82//!                       └── Timeline C (new seed) ──────────▶ done
83//! ```
84//!
85//! The key insight: if a seed reached an interesting state (the retry fired),
86//! running from that point with different randomness is more likely to find
87//! a second interesting event than starting from scratch.
88//!
89//! # How the simulation seed works
90//!
91//! Every decision in the simulation (latency, failure, timeout) comes from
92//! a deterministic random number generator (RNG) seeded with a single `u64`.
93//! The RNG is a counter: every call to it increments a call count.
94//!
95//! ```text
96//! seed = 42
97//! RNG call #1 → 0.73  (used for: network latency)
98//! RNG call #2 → 0.12  (used for: should this connection fail?)
99//! RNG call #3 → 0.89  (used for: timer jitter)
100//! ...
101//! RNG call #N → 0.41  (used for: partition probability)
102//! ```
103//!
104//! Same seed = same sequence = same execution. This is what makes bugs
105//! reproducible.
106//!
107//! # What triggers a splitpoint?
108//!
109//! Not every assertion triggers exploration. Only *discovery* assertions do,
110//! and only when they discover something new:
111//!
112//! ```text
113//! Assertion kind          Triggers a splitpoint when...
114//! ──────────────────────  ────────────────────────────────────────────
115//! assert_sometimes!       The condition is true for the FIRST time
116//! assert_reachable!       The code path is reached for the FIRST time
117//! assert_sometimes_gt!    The observed value beats the previous watermark
118//! assert_sometimes_all!   More conditions are true simultaneously than ever
119//! assert_sometimes_each!  A new identity-key combination is seen, or
120//!                         the quality score improves for a known one
121//!
122//! assert_always!          NEVER (invariant, not a discovery)
123//! assert_unreachable!     NEVER (safety check, not a discovery)
124//! ```
125//!
126//! The "first time" check uses a CAS (compare-and-swap) on a `split_triggered`
127//! flag in shared memory. Once a mark has triggered, it won't trigger again
128//! (except for numeric watermarks and frontiers, which can trigger multiple
129//! times as they improve).
130//!
131//! # How a splitpoint works (step by step)
132//!
133//! When `assert_sometimes!(retry_fired, "server retried request")` is `true`
134//! for the first time at RNG call #200 with seed 42:
135//!
136//! ```text
137//!  Step 1: CAS split_triggered from 0 → 1        (first-time guard)
138//!  Step 2: Record splitpoint position              (RNG call count = 200)
139//!  Step 3: Check energy budget                     (can we afford to split?)
140//!  Step 4: Save parent's coverage bitmap to stack  (we'll restore it later)
141//!
142//!  For each new timeline (e.g., timelines_per_split = 3):
143//!  ┌──────────────────────────────────────────────────────────────────┐
144//!  │ Step 5: Clear child coverage bitmap                             │
145//!  │ Step 6: Compute child seed = FNV-1a(parent_seed, mark, index)   │
146//!  │ Step 7: fork()  ←── OS-level process fork (copy-on-write)       │
147//!  │                                                                  │
148//!  │   CHILD (pid=0):                                                │
149//!  │     - Reseed RNG with child_seed                                │
150//!  │     - Set is_child = true, depth += 1                           │
151//!  │     - Append (200, child_seed) to recipe                        │
152//!  │     - Return from split function                                │
153//!  │     - Continue simulation with NEW randomness ─────▶ eventually │
154//!  │       calls exit_child(0) or exit_child(42) if bug              │
155//!  │                                                                  │
156//!  │   PARENT (pid>0):                                               │
157//!  │     Sequential mode:                                            │
158//!  │       - waitpid(pid) — blocks until THIS child finishes         │
159//!  │       - Merge child's coverage bitmap into explored map         │
160//!  │       - If child exited with code 42 → record bug recipe        │
161//!  │       - Loop to next child                                      │
162//!  │     Parallel mode:                                              │
163//!  │       - Track child in active set                               │
164//!  │       - When slots are full → reap_one() via waitpid(-1)        │
165//!  │       - Continue forking until all children spawned             │
166//!  │       - Drain remaining active children at the end              │
167//!  └──────────────────────────────────────────────────────────────────┘
168//!
169//!  Step 8: Restore parent's coverage bitmap
170//!  Step 9: Parent continues its own simulation normally
171//! ```
172//!
173//! The child seed is deterministic: `FNV-1a(parent_seed + mark_name + child_index)`.
174//! So the multiverse tree is fully reproducible.
175//!
176//! # The coverage bitmap: "did this timeline find anything new?"
177//!
178//! Each timeline gets a small bitmap (1024 bytes = 8192 bits). When an
179//! assertion fires, it sets a bit at position `hash(assertion_name) % 8192`:
180//!
181//! ```text
182//! Bitmap for Timeline A:
183//! byte 0          byte 1          byte 2
184//! ┌─┬─┬─┬─┬─┬─┬─┬─┐┌─┬─┬─┬─┬─┬─┬─┬─┐┌─┬─┬─┬─┬─┬─┬─┬─┐
185//! │0│0│1│0│0│0│0│0││0│0│0│0│0│1│0│0││0│0│0│0│0│0│0│0│ ...
186//! └─┴─┴─┴─┴─┴─┴─┴─┘└─┴─┴─┴─┴─┴─┴─┴─┘└─┴─┴─┴─┴─┴─┴─┴─┘
187//!       ▲                       ▲
188//!       bit 2                   bit 13
189//!       (retry_fired)           (timeout_hit)
190//! ```
191//!
192//! The **explored map** is the union (OR) of all bitmaps across all timelines.
193//! It lives in `MAP_SHARED` memory so all forked processes can see it:
194//!
195//! ```text
196//! After Timeline A:   explored = 00100000 00000100 00000000 ...
197//! After Timeline B:   explored = 00100000 00000110 00000000 ...
198//!                                                 ▲
199//!                                      Timeline B found bit 14!
200//!                                      (that's new coverage)
201//! ```
202//!
203//! **How "has new bits" works:**
204//!
205//! For each byte: `(child_byte & !explored_byte) != 0`
206//!
207//! ```text
208//! child    = 00000110     (bits 1, 2 set)
209//! explored = 00000100     (bit 2 already known)
210//! !explored = 11111011
211//! child & !explored = 00000010  ← bit 1 is NEW!  → has_new_bits = true
212//! ```
213//!
214//! This is used in **adaptive mode** to decide whether a mark is still
215//! productive (finding new paths) or barren (just repeating known paths).
216//!
217//! # The energy budget: "how much exploring can we do?"
218//!
219//! Without a limit, the explorer could fork forever (exponential blowup).
220//! The energy budget caps the total number of timelines spawned.
221//!
222//! ## Fixed-count mode (simple)
223//!
224//! One global counter. Each timeline costs 1 energy. When it reaches 0,
225//! no more timelines are spawned:
226//!
227//! ```text
228//! global_energy: 10
229//!
230//! Split at mark A: spawn 3 timelines → energy = 7
231//! Split at mark B: spawn 3 timelines → energy = 4
232//! Split at mark C: spawn 3 timelines → energy = 1
233//! Split at mark D: spawn 1 timeline  → energy = 0
234//! Split at mark E: no energy left, skip
235//! ```
236//!
237//! Configure with:
238//! ```ignore
239//! ExplorationConfig {
240//!     max_depth: 2,
241//!     timelines_per_split: 3,
242//!     global_energy: 50,
243//!     adaptive: None,      // ← fixed-count mode
244//!     parallelism: None,   // ← sequential (or Some(Parallelism::MaxCores))
245//! }
246//! ```
247//!
248//! ## Adaptive mode (smart)
249//!
250//! A 3-level energy system that automatically gives more budget to productive
251//! marks and takes budget away from barren ones:
252//!
253//! ```text
254//! ┌──────────────────────────────────────────────────────────────┐
255//! │                    GLOBAL ENERGY (100)                       │
256//! │  Every timeline consumes 1 from here first.                 │
257//! │  When this hits 0, ALL exploration stops.                   │
258//! ├──────────────────────────────────────────────────────────────┤
259//! │                                                              │
260//! │  ┌─────────────────┐  ┌─────────────────┐  ┌────────────┐  │
261//! │  │  Mark A: 15     │  │  Mark B: 15     │  │ Mark C: 15 │  │
262//! │  │  (productive)   │  │  (barren)       │  │ (new)      │  │
263//! │  │  Used: 15/15    │  │  Used: 3/15     │  │ Used: 0/15 │  │
264//! │  │  Needs more! ───┼──┼─── Returns 12 ──┼──┤            │  │
265//! │  └────────┬────────┘  └─────────────────┘  └────────────┘  │
266//! │           │                     │                            │
267//! │           ▼                     ▼                            │
268//! │  ┌──────────────────────────────────────────────────────┐   │
269//! │  │           REALLOCATION POOL: 12                      │   │
270//! │  │  Energy returned by barren marks.                    │   │
271//! │  │  Productive marks draw from here when their          │   │
272//! │  │  per-mark budget runs out.                           │   │
273//! │  └──────────────────────────────────────────────────────┘   │
274//! └──────────────────────────────────────────────────────────────┘
275//! ```
276//!
277//! **How it decides if a mark is productive or barren:**
278//!
279//! Timelines are spawned in batches (e.g., 4 at a time). After each batch,
280//! the explorer checks the coverage bitmap:
281//!
282//! ```text
283//! Mark "retry_path":
284//!   Batch 1: spawn 4 timelines → check bitmap → 2 found new bits ✓
285//!   Batch 2: spawn 4 timelines → check bitmap → 1 found new bits ✓
286//!   Batch 3: spawn 4 timelines → check bitmap → 0 found new bits ✗
287//!     → Mark is BARREN. Return remaining per-mark energy to pool.
288//!     → Stop splitting at this mark.
289//!
290//! Mark "partition_heal":
291//!   Batch 1: spawn 4 timelines → check bitmap → 3 found new bits ✓
292//!   Batch 2: spawn 4 timelines → check bitmap → 2 found new bits ✓
293//!   ...continues until per-mark budget exhausted...
294//!   ...draws from reallocation pool for more...
295//!   Batch 6: spawn 4 timelines → hits max_timelines (20) → stop
296//! ```
297//!
298//! **The 3-level energy decrement (for each timeline):**
299//!
300//! ```text
301//! 1. Decrement global_remaining by 1
302//!    └─ If global is at 0 → STOP (hard cap, game over)
303//!
304//! 2. Decrement per_mark[slot] by 1
305//!    └─ If per_mark has budget → OK, continue
306//!    └─ If per_mark is at 0 → try step 3
307//!
308//! 3. Decrement realloc_pool by 1
309//!    └─ If pool has budget → OK, continue
310//!    └─ If pool is at 0 → UNDO global decrement, STOP for this mark
311//! ```
312//!
313//! Configure with:
314//! ```ignore
315//! ExplorationConfig {
316//!     max_depth: 3,
317//!     timelines_per_split: 4,  // ignored in adaptive mode
318//!     global_energy: 200,
319//!     adaptive: Some(AdaptiveConfig {
320//!         batch_size: 4,        // timelines per batch before checking yield
321//!         min_timelines: 4,     // minimum timelines even if barren
322//!         max_timelines: 20,    // hard cap per mark
323//!         per_mark_energy: 15,  // initial budget per mark
324//!     }),
325//!     parallelism: Some(Parallelism::MaxCores),  // use all CPU cores
326//! }
327//! ```
328//!
329//! # Complete walkthrough: from seed to bug
330//!
331//! Say you're testing a distributed lock service with 3 nodes. Your test has:
332//!
333//! ```ignore
334//! assert_sometimes!(lock_acquired_after_retry, "lock retry succeeded");
335//! assert_sometimes!(saw_split_brain, "split brain detected");
336//! assert_always!(no_double_grant, "lock never granted to two nodes");
337//! ```
338//!
339//! **Iteration 1: Seed 42, exploration enabled**
340//!
341//! ```text
342//! Root timeline (seed=42, depth=0)
343//! │
344//! │  RNG#1..#150: normal execution, no interesting events
345//! │
346//! │  RNG#151: lock_acquired_after_retry = true  ← FIRST TIME!
347//! │           CAS split_triggered 0→1 succeeds
348//! │           Splitpoint triggered at RNG call #151
349//! │
350//! ├── Timeline T0 (seed=FNV(42,"lock retry",0), depth=1)
351//! │   │  Reseed RNG, continue from same program state
352//! │   │  Different randomness → different network delays
353//! │   │  RNG#1..#80: saw_split_brain = true  ← FIRST TIME (for children)
354//! │   │  CAS split_triggered 0→1 succeeds
355//! │   │  Nested splitpoint at RNG call #80!
356//! │   │
357//! │   ├── Timeline T0-0 (depth=2)
358//! │   │     no_double_grant = false  ← BUG!
359//! │   │     exit_child(42)  ← special code for "bug found"
360//! │   │
361//! │   ├── Timeline T0-1 (depth=2)
362//! │   │     runs to completion, no bug
363//! │   │     exit_child(0)
364//! │   │
365//! │   └── Timeline T0-2 (depth=2)
366//! │         runs to completion, no bug
367//! │         exit_child(0)
368//! │   │
369//! │   │  T0 continues after children finish
370//! │   │  exit_child(0)
371//! │
372//! ├── Timeline T1 (seed=FNV(42,"lock retry",1), depth=1)
373//! │     runs to completion, nothing new
374//! │     exit_child(0)
375//! │
376//! └── Timeline T2 (seed=FNV(42,"lock retry",2), depth=1)
377//!       runs to completion, nothing new
378//!       exit_child(0)
379//!
380//! Root continues after all children finish.
381//! Bug found! Recipe saved.
382//! ```
383//!
384//! **The bug recipe** is the path from root to the buggy timeline:
385//!
386//! ```text
387//! Recipe: [(151, seed_T0), (80, seed_T0_0)]
388//! Formatted: "151@<seed_T0> -> 80@<seed_T0_0>"
389//! ```
390//!
391//! To replay: set the root seed to 42, install RNG breakpoints at
392//! call count 151 (reseed to seed_T0) and then at call count 80
393//! (reseed to seed_T0_0). The simulation will follow the exact same
394//! path to the bug every time.
395//!
396//! # How fork() makes this work
397//!
398//! The explorer uses Unix `fork()` to create timeline branches. This is
399//! cheap because of copy-on-write (COW): the child gets a copy of the
400//! parent's entire memory without actually copying it. Pages are only
401//! copied when one side writes to them.
402//!
403//! ```text
404//! Parent process memory:
405//! ┌──────────────────────────────────────────────────┐
406//! │  Simulation state (actors, network, timers)      │  ← COW pages
407//! │  RNG state (will be reseeded in child)           │  ← COW pages
408//! ├──────────────────────────────────────────────────┤
409//! │  MAP_SHARED memory:                              │  ← truly shared
410//! │    - Assertion table (128 slots)                 │
411//! │    - Coverage bitmap pool (1 slot per core)      │
412//! │    - Explored map (union of all coverage)        │
413//! │    - Energy budget                               │
414//! │    - Fork stats + bug recipe                     │
415//! └──────────────────────────────────────────────────┘
416//!                      │
417//!                    fork()
418//!                      │
419//!            ┌─────────┴─────────┐
420//!            ▼                   ▼
421//!     Parent (reaps)      Child (continues)
422//!     - waitpid()         - rng_reseed(new_seed)
423//!     - merge coverage    - run simulation
424//!     - check exit code   - exit_child(0 or 42)
425//! ```
426//!
427//! `MAP_SHARED` memory is the only communication channel between parent
428//! and child. Assertion counters, coverage bits, energy budgets, and bug
429//! recipes all live there. This is why the crate only depends on `libc`.
430//!
431//! # Parallel exploration (multi-core)
432//!
433//! By default, the fork loop is sequential: fork one child, wait for it,
434//! merge coverage, repeat. This uses only one CPU core.
435//!
436//! With `parallelism: Some(Parallelism::MaxCores)`, the explorer runs
437//! multiple children concurrently using a **sliding window** capped at
438//! the number of available CPU cores:
439//!
440//! ```text
441//! Sequential (parallelism: None):
442//!
443//!   fork child 0 --- waitpid --- fork child 1 --- waitpid --- ...
444//!   core 0 busy     core 0 idle  core 0 busy     core 0 idle
445//!
446//! Parallel (parallelism: Some(Parallelism::MaxCores)):
447//!
448//!   fork child 0 ──────────────────────── reap
449//!   fork child 1 ─────────────────── reap
450//!   fork child 2 ────────────── reap
451//!   fork child 3 ─────── reap
452//!   all cores busy until drained
453//! ```
454//!
455//! The parent uses `waitpid(-1)` to reap whichever child finishes first,
456//! then recycles that slot for the next child.
457//!
458//! ## How the bitmap pool works
459//!
460//! In sequential mode, children share one coverage bitmap (cleared between
461//! each child). In parallel mode, concurrent children would clobber each
462//! other's bitmap, so each child gets its own **bitmap pool slot**:
463//!
464//! ```text
465//! Bitmap pool (MAP_SHARED, allocated lazily on first parallel split):
466//! ┌─────────────┬─────────────┬─────────────┬─────────────┐
467//! │  Slot 0     │  Slot 1     │  Slot 2     │  Slot 3     │
468//! │  1024 bytes │  1024 bytes │  1024 bytes │  1024 bytes │
469//! │  child A    │  child B    │  child C    │  (free)     │
470//! └─────────────┴─────────────┴─────────────┴─────────────┘
471//! ```
472//!
473//! When a child finishes (`waitpid(-1)` returns its PID), the parent:
474//! 1. Looks up which slot that child used
475//! 2. Merges the slot's bitmap into the explored map
476//! 3. Recycles the slot for the next child
477//!
478//! Children that themselves become parents (nested splits) allocate their
479//! own fresh pool -- the pool pointer is reset to null in each child at
480//! fork time.
481//!
482//! ## Parallelism variants
483//!
484//! ```text
485//! Variant              Slot count
486//! ───────────────────  ─────────────────────────────────────
487//! MaxCores             All available CPU cores
488//! HalfCores            Half of available cores (rounded up, min 1)
489//! Cores(n)             Exactly n cores
490//! MaxCoresMinus(n)     All cores minus n (min 1)
491//! ```
492//!
493//! ## Concurrency safety
494//!
495//! All `MAP_SHARED` state uses atomic operations that are safe across
496//! `fork()` boundaries:
497//! - **Assertion slots**: CAS for first-time discovery, `fetch_add` for counters
498//! - **Energy budget**: `fetch_sub`/`fetch_add` with rollback on failure
499//! - **Bug recipe**: CAS first-bug-wins -- only the first bug is recorded
500//! - **Stats counters**: `fetch_add` -- concurrent increments are safe
501//! - **Explored map merge**: Done by the parent AFTER `waitpid`, never concurrent
502//!
503//! # Architecture
504//!
505//! ```text
506//! moonpool-explorer (this crate)  ── leaf, only depends on libc
507//!      ▲
508//! moonpool-sim                    ── wires RNG hooks at init
509//! ```
510//!
511//! Communication with the simulation's RNG uses two function pointers
512//! set via [`set_rng_hooks`]:
513//! - `get_count: fn() -> u64` — how many RNG calls have happened
514//! - `reseed: fn(u64)` — replace the RNG seed and reset the counter
515//!
516//! This crate has zero knowledge of moonpool internals. It doesn't know
517//! about actors, networks, or storage. It only knows about RNG call counts,
518//! seeds, and assertion slot positions.
519//!
520//! # Module overview
521//!
522//! ```text
523//! lib.rs             ── ExplorationConfig, init/cleanup, this documentation
524//! split_loop.rs      ── The fork loop: sequential + parallel sliding window,
525//!                       adaptive batching, Parallelism enum
526//! assertion_slots.rs ── Shared-memory assertion table (128 slots)
527//! each_buckets.rs    ── Per-value bucketed assertions (256 buckets)
528//! coverage.rs        ── CoverageBitmap + ExploredMap (8192-bit bitmaps)
529//! energy.rs          ── 3-level energy budget (global + per-mark + realloc pool)
530//! context.rs         ── Thread-local state, RNG hooks, bitmap pool pointers
531//! shared_stats.rs    ── Cross-process counters (timelines, fork_points, bugs)
532//! shared_mem.rs      ── mmap(MAP_SHARED|MAP_ANONYMOUS) allocation
533//! replay.rs          ── Recipe formatting and parsing ("151@seed -> 80@seed")
534//! ```
535//!
536//! # Usage
537//!
538//! ```ignore
539//! // In moonpool-sim's runner:
540//! moonpool_explorer::set_rng_hooks(get_rng_call_count, |seed| {
541//!     set_sim_seed(seed);
542//!     reset_rng_call_count();
543//! });
544//!
545//! moonpool_explorer::init(ExplorationConfig {
546//!     max_depth: 2,
547//!     timelines_per_split: 4,
548//!     global_energy: 100,
549//!     adaptive: None,
550//!     parallelism: Some(Parallelism::MaxCores),
551//! })?;
552//!
553//! // ... run simulation ...
554//!
555//! moonpool_explorer::cleanup();
556//! ```
557
558#![deny(missing_docs)]
559#![deny(clippy::unwrap_used)]
560
561pub mod assertion_slots;
562pub mod context;
563pub mod coverage;
564pub mod each_buckets;
565pub mod energy;
566pub mod replay;
567pub mod sancov;
568pub mod shared_mem;
569pub mod shared_stats;
570pub mod simulations;
571pub mod split_loop;
572
573// Re-exports for the public API
574pub use assertion_slots::{
575    ASSERTION_TABLE_MEM_SIZE, AssertCmp, AssertKind, AssertionSlot, AssertionSlotSnapshot,
576    assertion_bool, assertion_numeric, assertion_read_all, assertion_sometimes_all, msg_hash,
577};
578pub use context::{explorer_is_child, get_assertion_table_ptr, set_rng_hooks};
579pub use each_buckets::{
580    EachBucket, assertion_sometimes_each, each_bucket_read_all, unpack_quality,
581};
582pub use replay::{ParseTimelineError, format_timeline, parse_timeline};
583pub use sancov::{sancov_edge_count, sancov_edges_covered, sancov_is_available};
584pub use shared_stats::{ExplorationStats, get_bug_recipe, get_exploration_stats};
585pub use split_loop::{AdaptiveConfig, Parallelism, exit_child};
586
587use context::{
588    ASSERTION_TABLE, COVERAGE_BITMAP_PTR, EACH_BUCKET_PTR, ENERGY_BUDGET_PTR, EXPLORED_MAP_PTR,
589    SHARED_RECIPE, SHARED_STATS,
590};
591
592/// Configuration for exploration.
593#[derive(Debug, Clone)]
594pub struct ExplorationConfig {
595    /// Maximum fork depth (0 = no forking).
596    pub max_depth: u32,
597    /// Number of children to fork at each discovery point (fixed-count mode).
598    pub timelines_per_split: u32,
599    /// Global energy budget (total number of fork operations allowed).
600    pub global_energy: i64,
601    /// Optional adaptive forking configuration.
602    /// When `None`, uses fixed `timelines_per_split` (backward compatible).
603    /// When `Some`, uses coverage-yield-driven batch forking with 3-level energy.
604    pub adaptive: Option<split_loop::AdaptiveConfig>,
605    /// Optional parallelism for multi-core exploration.
606    /// When `None`, children are forked and reaped sequentially (one core).
607    /// When `Some`, a sliding window of concurrent children uses multiple cores.
608    pub parallelism: Option<split_loop::Parallelism>,
609}
610
611/// Initialize assertion table and each-bucket shared memory only.
612///
613/// This allocates the shared memory regions needed for assertion tracking
614/// without requiring a full exploration context. Idempotent (no-op if
615/// already initialized).
616///
617/// # Errors
618///
619/// Returns an error if shared memory allocation fails.
620pub fn init_assertions() -> Result<(), std::io::Error> {
621    let current = ASSERTION_TABLE.with(|c| c.get());
622    if !current.is_null() {
623        return Ok(()); // Already initialized
624    }
625
626    let table_ptr = shared_mem::alloc_shared(assertion_slots::ASSERTION_TABLE_MEM_SIZE)?;
627    let each_bucket_ptr = shared_mem::alloc_shared(each_buckets::EACH_BUCKET_MEM_SIZE)?;
628
629    ASSERTION_TABLE.with(|c| c.set(table_ptr));
630    EACH_BUCKET_PTR.with(|c| c.set(each_bucket_ptr));
631
632    Ok(())
633}
634
635/// Free assertion table and each-bucket shared memory.
636///
637/// Nulls the pointers after freeing. No-op if not initialized.
638pub fn cleanup_assertions() {
639    unsafe {
640        let table_ptr = ASSERTION_TABLE.with(|c| c.get());
641        if !table_ptr.is_null() {
642            shared_mem::free_shared(table_ptr, assertion_slots::ASSERTION_TABLE_MEM_SIZE);
643            ASSERTION_TABLE.with(|c| c.set(std::ptr::null_mut()));
644        }
645
646        let each_bucket_ptr = EACH_BUCKET_PTR.with(|c| c.get());
647        if !each_bucket_ptr.is_null() {
648            shared_mem::free_shared(each_bucket_ptr, each_buckets::EACH_BUCKET_MEM_SIZE);
649            EACH_BUCKET_PTR.with(|c| c.set(std::ptr::null_mut()));
650        }
651    }
652}
653
654/// Zero assertion table memory for between-run resets.
655///
656/// No-op if not initialized.
657pub fn reset_assertions() {
658    let table_ptr = ASSERTION_TABLE.with(|c| c.get());
659    if !table_ptr.is_null() {
660        unsafe {
661            std::ptr::write_bytes(table_ptr, 0, assertion_slots::ASSERTION_TABLE_MEM_SIZE);
662        }
663    }
664
665    let each_bucket_ptr = EACH_BUCKET_PTR.with(|c| c.get());
666    if !each_bucket_ptr.is_null() {
667        unsafe {
668            std::ptr::write_bytes(each_bucket_ptr, 0, each_buckets::EACH_BUCKET_MEM_SIZE);
669        }
670    }
671}
672
673/// Prepare the exploration framework for the next seed in multi-seed exploration.
674///
675/// Preserves the explored map (cumulative coverage) and assertion watermarks/frontiers
676/// so subsequent seeds benefit from prior discovery context. Resets per-seed transient
677/// state: energy budgets, statistics, split triggers, pass/fail counters, coverage
678/// bitmap, and bug recipe.
679///
680/// Call this between seeds instead of [`reset_assertions`] when you want
681/// coverage-preserving multi-seed exploration.
682pub fn prepare_next_seed(per_seed_energy: i64) {
683    // 1. PRESERVE explored map — do nothing
684
685    // 2. Selectively reset assertion table slots:
686    //    PRESERVE: watermark, split_watermark, frontier (quality context),
687    //              pass_count, fail_count (cumulative — needed by validate_assertion_contracts)
688    //    RESET: split_triggered (per-seed, controls re-forking)
689    let table_ptr = ASSERTION_TABLE.with(|c| c.get());
690    if !table_ptr.is_null() {
691        unsafe {
692            let count_ptr = table_ptr as *const std::sync::atomic::AtomicU32;
693            let count = (*count_ptr)
694                .load(std::sync::atomic::Ordering::Relaxed)
695                .min(assertion_slots::MAX_ASSERTION_SLOTS as u32) as usize;
696            let base = table_ptr.add(8) as *mut assertion_slots::AssertionSlot;
697            for i in 0..count {
698                let slot = &mut *base.add(i);
699                // Skip tombstones (msg_hash == 0) left by the duplicate-slot race fix.
700                if slot.msg_hash == 0 {
701                    continue;
702                }
703                slot.split_triggered = 0;
704                // pass_count/fail_count accumulate across seeds — needed by
705                // validate_assertion_contracts() to avoid false "was never reached"
706                // when a seed doesn't reach a guarded assertion (e.g. invariants
707                // behind `if let Some(model)`).  Forking uses split_triggered,
708                // not counts.
709            }
710        }
711    }
712
713    // 3. Selectively reset each-bucket slots:
714    //    PRESERVE: best_score (quality watermark), key_values, msg, pass_count
715    //    RESET: split_triggered (per-seed transient)
716    let each_ptr = EACH_BUCKET_PTR.with(|c| c.get());
717    if !each_ptr.is_null() {
718        unsafe {
719            let count_ptr = each_ptr as *const std::sync::atomic::AtomicU32;
720            let count = (*count_ptr)
721                .load(std::sync::atomic::Ordering::Relaxed)
722                .min(each_buckets::MAX_EACH_BUCKETS as u32) as usize;
723            let base = each_ptr.add(8) as *mut each_buckets::EachBucket;
724            for i in 0..count {
725                let bucket = &mut *base.add(i);
726                bucket.split_triggered = 0;
727                // pass_count accumulates across seeds (same rationale as assertion slots)
728            }
729        }
730    }
731
732    // 4. Zero coverage bitmap (per-timeline, NOT the explored map)
733    let bm_ptr = COVERAGE_BITMAP_PTR.with(|c| c.get());
734    if !bm_ptr.is_null() {
735        unsafe {
736            std::ptr::write_bytes(bm_ptr, 0, coverage::COVERAGE_MAP_SIZE);
737        }
738    }
739
740    // 4b. Clear sancov transfer buffer and reset BSS counters.
741    //     History map is preserved (cumulative, like the explored map).
742    sancov::clear_transfer_buffer();
743    sancov::reset_bss_counters();
744
745    // 5. Reset energy budget with fresh per-seed energy
746    let energy_ptr = ENERGY_BUDGET_PTR.with(|c| c.get());
747    if !energy_ptr.is_null() {
748        unsafe {
749            energy::reset_energy_budget(energy_ptr, per_seed_energy);
750        }
751    }
752
753    // 6. Reset shared stats
754    let stats_ptr = SHARED_STATS.with(|c| c.get());
755    if !stats_ptr.is_null() {
756        unsafe {
757            shared_stats::reset_shared_stats(stats_ptr, per_seed_energy);
758        }
759    }
760
761    // 7. Reset bug recipe
762    let recipe_ptr = SHARED_RECIPE.with(|c| c.get());
763    if !recipe_ptr.is_null() {
764        unsafe {
765            shared_stats::reset_shared_recipe(recipe_ptr);
766        }
767    }
768
769    // 8. Reset context for new seed, mark as warm start
770    context::with_ctx_mut(|ctx| {
771        ctx.is_child = false;
772        ctx.depth = 0;
773        ctx.current_seed = 0;
774        ctx.recipe.clear();
775        ctx.warm_start = true;
776    });
777}
778
779/// Count the number of bits set in the explored coverage map.
780///
781/// Returns `None` if the explored map is not initialized (exploration not active).
782/// The result represents how many unique assertion paths have been explored
783/// across all timelines.
784///
785/// Must be called before [`cleanup`] which frees the explored map memory.
786pub fn explored_map_bits_set() -> Option<u32> {
787    let ptr = EXPLORED_MAP_PTR.with(|c| c.get());
788    if ptr.is_null() {
789        return None;
790    }
791    // Safety: ptr was allocated during init() with COVERAGE_MAP_SIZE bytes.
792    let vm = unsafe { coverage::ExploredMap::new(ptr) };
793    Some(vm.count_bits_set())
794}
795
796/// Initialize the exploration framework.
797///
798/// Allocates shared memory for cross-process state and activates exploration.
799/// Must be called after [`set_rng_hooks`] and before the simulation loop.
800///
801/// # Errors
802///
803/// Returns an error if shared memory allocation fails.
804pub fn init(config: ExplorationConfig) -> Result<(), std::io::Error> {
805    // Initialize assertion table first (idempotent)
806    init_assertions()?;
807
808    // Initialize sancov shared memory (no-op if sancov unavailable)
809    sancov::init_sancov_shared()?;
810
811    // Allocate exploration-specific shared memory regions
812    let stats_ptr = shared_stats::init_shared_stats(config.global_energy)?;
813    let recipe_ptr = shared_stats::init_shared_recipe()?;
814    let explored_ptr = shared_mem::alloc_shared(coverage::COVERAGE_MAP_SIZE)?;
815    let bitmap_ptr = shared_mem::alloc_shared(coverage::COVERAGE_MAP_SIZE)?;
816
817    // Allocate energy budget if adaptive mode is configured
818    let energy_ptr = if let Some(ref adaptive) = config.adaptive {
819        energy::init_energy_budget(config.global_energy, adaptive.per_mark_energy)?
820    } else {
821        std::ptr::null_mut()
822    };
823
824    // Store pointers in thread-local context
825    SHARED_STATS.with(|c| c.set(stats_ptr));
826    SHARED_RECIPE.with(|c| c.set(recipe_ptr));
827    EXPLORED_MAP_PTR.with(|c| c.set(explored_ptr));
828    COVERAGE_BITMAP_PTR.with(|c| c.set(bitmap_ptr));
829    ENERGY_BUDGET_PTR.with(|c| c.set(energy_ptr));
830
831    // Activate exploration context
832    context::with_ctx_mut(|ctx| {
833        ctx.active = true;
834        ctx.is_child = false;
835        ctx.depth = 0;
836        ctx.max_depth = config.max_depth;
837        ctx.current_seed = 0;
838        ctx.recipe.clear();
839        ctx.timelines_per_split = config.timelines_per_split;
840        ctx.adaptive = config.adaptive.clone();
841        ctx.parallelism = config.parallelism.clone();
842        ctx.warm_start = false;
843    });
844
845    Ok(())
846}
847
848/// Clean up the exploration framework.
849///
850/// Frees all shared memory and deactivates exploration.
851/// Call after the simulation loop completes.
852pub fn cleanup() {
853    // Deactivate
854    context::with_ctx_mut(|ctx| {
855        ctx.active = false;
856    });
857
858    // Free exploration-specific shared memory regions
859    // Safety: these pointers were allocated by init() via alloc_shared()
860    unsafe {
861        let stats_ptr = SHARED_STATS.with(|c| c.get());
862        if !stats_ptr.is_null() {
863            shared_mem::free_shared(
864                stats_ptr as *mut u8,
865                std::mem::size_of::<shared_stats::SharedStats>(),
866            );
867            SHARED_STATS.with(|c| c.set(std::ptr::null_mut()));
868        }
869
870        let recipe_ptr = SHARED_RECIPE.with(|c| c.get());
871        if !recipe_ptr.is_null() {
872            shared_mem::free_shared(
873                recipe_ptr as *mut u8,
874                std::mem::size_of::<shared_stats::SharedRecipe>(),
875            );
876            SHARED_RECIPE.with(|c| c.set(std::ptr::null_mut()));
877        }
878
879        let explored_ptr = EXPLORED_MAP_PTR.with(|c| c.get());
880        if !explored_ptr.is_null() {
881            shared_mem::free_shared(explored_ptr, coverage::COVERAGE_MAP_SIZE);
882            EXPLORED_MAP_PTR.with(|c| c.set(std::ptr::null_mut()));
883        }
884
885        let bitmap_ptr = COVERAGE_BITMAP_PTR.with(|c| c.get());
886        if !bitmap_ptr.is_null() {
887            shared_mem::free_shared(bitmap_ptr, coverage::COVERAGE_MAP_SIZE);
888            COVERAGE_BITMAP_PTR.with(|c| c.set(std::ptr::null_mut()));
889        }
890
891        let energy_ptr = ENERGY_BUDGET_PTR.with(|c| c.get());
892        if !energy_ptr.is_null() {
893            shared_mem::free_shared(
894                energy_ptr as *mut u8,
895                std::mem::size_of::<energy::EnergyBudget>(),
896            );
897            ENERGY_BUDGET_PTR.with(|c| c.set(std::ptr::null_mut()));
898        }
899    }
900
901    // Clean up sancov shared memory
902    sancov::cleanup_sancov_shared();
903
904    // Clean up assertion table and each-buckets (shared with init_assertions)
905    cleanup_assertions();
906}
907
908#[cfg(test)]
909mod tests {
910    use super::*;
911
912    #[test]
913    fn test_init_cleanup_cycle() {
914        let config = ExplorationConfig {
915            max_depth: 2,
916            timelines_per_split: 4,
917            global_energy: 100,
918            adaptive: None,
919            parallelism: None,
920        };
921
922        init(config).expect("init failed");
923        assert!(context::explorer_is_active());
924        assert!(!context::explorer_is_child());
925
926        // Stats should be available
927        let stats = get_exploration_stats().expect("stats should be available");
928        assert_eq!(stats.global_energy, 100);
929        assert_eq!(stats.total_timelines, 0);
930        assert_eq!(stats.fork_points, 0);
931        assert_eq!(stats.bug_found, 0);
932
933        cleanup();
934        assert!(!context::explorer_is_active());
935        assert!(get_exploration_stats().is_none());
936    }
937
938    #[test]
939    fn test_inactive_by_default() {
940        assert!(!context::explorer_is_active());
941        assert!(!context::explorer_is_child());
942    }
943
944    #[test]
945    fn test_assertion_bool_noop_when_inactive() {
946        // Should not panic when assertion table is not initialized
947        assertion_bool(AssertKind::Sometimes, true, true, "test_assertion");
948    }
949
950    #[test]
951    fn test_init_assertions_standalone() {
952        init_assertions().expect("init_assertions failed");
953
954        // Table pointer should be set
955        let ptr = context::get_assertion_table_ptr();
956        assert!(!ptr.is_null());
957
958        // Idempotent — second call should be no-op
959        init_assertions().expect("init_assertions second call failed");
960
961        cleanup_assertions();
962
963        // Should be null after cleanup
964        let ptr = context::get_assertion_table_ptr();
965        assert!(ptr.is_null());
966    }
967
968    #[test]
969    fn test_backward_compat_no_adaptive() {
970        let config = ExplorationConfig {
971            max_depth: 2,
972            timelines_per_split: 4,
973            global_energy: 100,
974            adaptive: None,
975            parallelism: None,
976        };
977        init(config).expect("init failed");
978
979        // Energy budget pointer should be null (old path)
980        let has_energy = context::ENERGY_BUDGET_PTR.with(|c| !c.get().is_null());
981        assert!(!has_energy);
982
983        cleanup();
984    }
985}