Skip to main content

moonpool_explorer/
lib.rs

1//! Multiverse exploration for deterministic simulation testing.
2//!
3//! This crate discovers rare bugs by splitting simulation timelines at key
4//! moments. When your simulation reaches an interesting state for the first
5//! time (like a retry path firing, or a timeout being hit), the explorer
6//! forks the process and re-runs from that point with different randomness,
7//! creating a tree of alternate timelines.
8//!
9//! # Glossary
10//!
11//! ```text
12//! Term              What it means
13//! ────────────────  ──────────────────────────────────────────────────────
14//! Seed              A u64 that completely determines a simulation's randomness.
15//!                   Same seed = same coin flips = same execution every time.
16//!
17//! Timeline          One complete simulation run. A seed + a sequence of
18//!                   splitpoints uniquely identifies a timeline.
19//!
20//! Splitpoint        A moment where the explorer decides to branch the
21//!                   multiverse. Happens when a `sometimes` assertion
22//!                   succeeds for the first time, a numeric watermark
23//!                   improves, or a frontier advances.
24//!
25//! Multiverse        The tree of all timelines explored from one root seed.
26//!                   Each splitpoint creates new children with different seeds.
27//!
28//! Coverage bitmap   A small bitfield (8192 bits) that records which assertion
29//!                   paths a timeline touched. Used to detect whether a
30//!                   new timeline discovered anything its siblings didn't.
31//!
32//! Explored map      The union of all coverage bitmaps across all timelines.
33//!                   Lives in shared memory. "Has this ever been seen?"
34//!
35//! Energy budget     A finite pool that limits how many timelines the explorer
36//!                   can spawn. Prevents runaway forking.
37//!
38//! Watermark         The best numeric value ever observed for a given assertion.
39//!                   Improving the watermark triggers a new splitpoint.
40//!
41//! Frontier          For compound boolean assertions: the max number of
42//!                   conditions simultaneously true. Advancing it triggers
43//!                   a splitpoint.
44//!
45//! Recipe            The sequence of splitpoints that leads to a specific
46//!                   timeline. If a bug is found, its recipe lets you
47//!                   replay exactly how to reach it.
48//!
49//! Mark              An assertion site that can trigger splitpoints.
50//!                   Each mark has a name, a shared-memory slot,
51//!                   and (in adaptive mode) its own energy allowance.
52//! ```
53//!
54//! # The big idea
55//!
56//! A normal simulation picks one seed and runs one timeline:
57//!
58//! ```text
59//! Seed 42 ──────────────────────────────────────────────────▶ done
60//!   RNG call #1    #2    #3    #4   ...   #500
61//! ```
62//!
63//! But bugs hide in rare states. Maybe a retry only fires 5% of the time,
64//! and the bug only appears when two retries happen in the same run. With
65//! single-seed testing, you might run thousands of seeds and never hit it.
66//!
67//! Multiverse exploration changes the strategy. Instead of running many
68//! independent seeds, it dives deeper into promising seeds:
69//!
70//! ```text
71//! Seed 42 ─────────────┬── splitpoint! ─────────────────────▶ done
72//!   RNG call #1 ... #200│
73//!                       │
74//!                       ├── Timeline A (new seed) ──────────▶ done
75//!                       ├── Timeline B (new seed) ──┬───────▶ done
76//!                       │                           │
77//!                       │                    nested splitpoint!
78//!                       │                           │
79//!                       │                           ├── Timeline B1 ──▶ done
80//!                       │                           └── Timeline B2 ──▶ BUG!
81//!                       │
82//!                       └── Timeline C (new seed) ──────────▶ done
83//! ```
84//!
85//! The key insight: if a seed reached an interesting state (the retry fired),
86//! running from that point with different randomness is more likely to find
87//! a second interesting event than starting from scratch.
88//!
89//! # How the simulation seed works
90//!
91//! Every decision in the simulation (latency, failure, timeout) comes from
92//! a deterministic random number generator (RNG) seeded with a single `u64`.
93//! The RNG is a counter: every call to it increments a call count.
94//!
95//! ```text
96//! seed = 42
97//! RNG call #1 → 0.73  (used for: network latency)
98//! RNG call #2 → 0.12  (used for: should this connection fail?)
99//! RNG call #3 → 0.89  (used for: timer jitter)
100//! ...
101//! RNG call #N → 0.41  (used for: partition probability)
102//! ```
103//!
104//! Same seed = same sequence = same execution. This is what makes bugs
105//! reproducible.
106//!
107//! # What triggers a splitpoint?
108//!
109//! Not every assertion triggers exploration. Only *discovery* assertions do,
110//! and only when they discover something new:
111//!
112//! ```text
113//! Assertion kind          Triggers a splitpoint when...
114//! ──────────────────────  ────────────────────────────────────────────
115//! assert_sometimes!       The condition is true for the FIRST time
116//! assert_reachable!       The code path is reached for the FIRST time
117//! assert_sometimes_gt!    The observed value beats the previous watermark
118//! assert_sometimes_all!   More conditions are true simultaneously than ever
119//! assert_sometimes_each!  A new identity-key combination is seen, or
120//!                         the quality score improves for a known one
121//!
122//! assert_always!          NEVER (invariant, not a discovery)
123//! assert_unreachable!     NEVER (safety check, not a discovery)
124//! ```
125//!
126//! The "first time" check uses a CAS (compare-and-swap) on a `split_triggered`
127//! flag in shared memory. Once a mark has triggered, it won't trigger again
128//! (except for numeric watermarks and frontiers, which can trigger multiple
129//! times as they improve).
130//!
131//! # How a splitpoint works (step by step)
132//!
133//! When `assert_sometimes!(retry_fired, "server retried request")` is `true`
134//! for the first time at RNG call #200 with seed 42:
135//!
136//! ```text
137//!  Step 1: CAS split_triggered from 0 → 1        (first-time guard)
138//!  Step 2: Record splitpoint position              (RNG call count = 200)
139//!  Step 3: Check energy budget                     (can we afford to split?)
140//!  Step 4: Save parent's coverage bitmap to stack  (we'll restore it later)
141//!
142//!  For each new timeline (e.g., timelines_per_split = 3):
143//!  ┌──────────────────────────────────────────────────────────────────┐
144//!  │ Step 5: Clear child coverage bitmap                             │
145//!  │ Step 6: Compute child seed = FNV-1a(parent_seed, mark, index)   │
146//!  │ Step 7: fork()  ←── OS-level process fork (copy-on-write)       │
147//!  │                                                                  │
148//!  │   CHILD (pid=0):                                                │
149//!  │     - Reseed RNG with child_seed                                │
150//!  │     - Set is_child = true, depth += 1                           │
151//!  │     - Append (200, child_seed) to recipe                        │
152//!  │     - Return from split function                                │
153//!  │     - Continue simulation with NEW randomness ─────▶ eventually │
154//!  │       calls exit_child(0) or exit_child(42) if bug              │
155//!  │                                                                  │
156//!  │   PARENT (pid>0):                                               │
157//!  │     Sequential mode:                                            │
158//!  │       - waitpid(pid) — blocks until THIS child finishes         │
159//!  │       - Merge child's coverage bitmap into explored map         │
160//!  │       - If child exited with code 42 → record bug recipe        │
161//!  │       - Loop to next child                                      │
162//!  │     Parallel mode:                                              │
163//!  │       - Track child in active set                               │
164//!  │       - When slots are full → reap_one() via waitpid(-1)        │
165//!  │       - Continue forking until all children spawned             │
166//!  │       - Drain remaining active children at the end              │
167//!  └──────────────────────────────────────────────────────────────────┘
168//!
169//!  Step 8: Restore parent's coverage bitmap
170//!  Step 9: Parent continues its own simulation normally
171//! ```
172//!
173//! The child seed is deterministic: `FNV-1a(parent_seed + mark_name + child_index)`.
174//! So the multiverse tree is fully reproducible.
175//!
176//! # The coverage bitmap: "did this timeline find anything new?"
177//!
178//! Each timeline gets a small bitmap (1024 bytes = 8192 bits). When an
179//! assertion fires, it sets a bit at position `hash(assertion_name) % 8192`:
180//!
181//! ```text
182//! Bitmap for Timeline A:
183//! byte 0          byte 1          byte 2
184//! ┌─┬─┬─┬─┬─┬─┬─┬─┐┌─┬─┬─┬─┬─┬─┬─┬─┐┌─┬─┬─┬─┬─┬─┬─┬─┐
185//! │0│0│1│0│0│0│0│0││0│0│0│0│0│1│0│0││0│0│0│0│0│0│0│0│ ...
186//! └─┴─┴─┴─┴─┴─┴─┴─┘└─┴─┴─┴─┴─┴─┴─┴─┘└─┴─┴─┴─┴─┴─┴─┴─┘
187//!       ▲                       ▲
188//!       bit 2                   bit 13
189//!       (retry_fired)           (timeout_hit)
190//! ```
191//!
192//! The **explored map** is the union (OR) of all bitmaps across all timelines.
193//! It lives in `MAP_SHARED` memory so all forked processes can see it:
194//!
195//! ```text
196//! After Timeline A:   explored = 00100000 00000100 00000000 ...
197//! After Timeline B:   explored = 00100000 00000110 00000000 ...
198//!                                                 ▲
199//!                                      Timeline B found bit 14!
200//!                                      (that's new coverage)
201//! ```
202//!
203//! **How "has new bits" works:**
204//!
205//! For each byte: `(child_byte & !explored_byte) != 0`
206//!
207//! ```text
208//! child    = 00000110     (bits 1, 2 set)
209//! explored = 00000100     (bit 2 already known)
210//! !explored = 11111011
211//! child & !explored = 00000010  ← bit 1 is NEW!  → has_new_bits = true
212//! ```
213//!
214//! This is used in **adaptive mode** to decide whether a mark is still
215//! productive (finding new paths) or barren (just repeating known paths).
216//!
217//! # The energy budget: "how much exploring can we do?"
218//!
219//! Without a limit, the explorer could fork forever (exponential blowup).
220//! The energy budget caps the total number of timelines spawned.
221//!
222//! ## Fixed-count mode (simple)
223//!
224//! One global counter. Each timeline costs 1 energy. When it reaches 0,
225//! no more timelines are spawned:
226//!
227//! ```text
228//! global_energy: 10
229//!
230//! Split at mark A: spawn 3 timelines → energy = 7
231//! Split at mark B: spawn 3 timelines → energy = 4
232//! Split at mark C: spawn 3 timelines → energy = 1
233//! Split at mark D: spawn 1 timeline  → energy = 0
234//! Split at mark E: no energy left, skip
235//! ```
236//!
237//! Configure with:
238//! ```ignore
239//! ExplorationConfig {
240//!     max_depth: 2,
241//!     timelines_per_split: 3,
242//!     global_energy: 50,
243//!     adaptive: None,      // ← fixed-count mode
244//!     parallelism: None,   // ← sequential (or Some(Parallelism::MaxCores))
245//! }
246//! ```
247//!
248//! ## Adaptive mode (smart)
249//!
250//! A 3-level energy system that automatically gives more budget to productive
251//! marks and takes budget away from barren ones:
252//!
253//! ```text
254//! ┌──────────────────────────────────────────────────────────────┐
255//! │                    GLOBAL ENERGY (100)                       │
256//! │  Every timeline consumes 1 from here first.                 │
257//! │  When this hits 0, ALL exploration stops.                   │
258//! ├──────────────────────────────────────────────────────────────┤
259//! │                                                              │
260//! │  ┌─────────────────┐  ┌─────────────────┐  ┌────────────┐  │
261//! │  │  Mark A: 15     │  │  Mark B: 15     │  │ Mark C: 15 │  │
262//! │  │  (productive)   │  │  (barren)       │  │ (new)      │  │
263//! │  │  Used: 15/15    │  │  Used: 3/15     │  │ Used: 0/15 │  │
264//! │  │  Needs more! ───┼──┼─── Returns 12 ──┼──┤            │  │
265//! │  └────────┬────────┘  └─────────────────┘  └────────────┘  │
266//! │           │                     │                            │
267//! │           ▼                     ▼                            │
268//! │  ┌──────────────────────────────────────────────────────┐   │
269//! │  │           REALLOCATION POOL: 12                      │   │
270//! │  │  Energy returned by barren marks.                    │   │
271//! │  │  Productive marks draw from here when their          │   │
272//! │  │  per-mark budget runs out.                           │   │
273//! │  └──────────────────────────────────────────────────────┘   │
274//! └──────────────────────────────────────────────────────────────┘
275//! ```
276//!
277//! **How it decides if a mark is productive or barren:**
278//!
279//! Timelines are spawned in batches (e.g., 4 at a time). After each batch,
280//! the explorer checks the coverage bitmap:
281//!
282//! ```text
283//! Mark "retry_path":
284//!   Batch 1: spawn 4 timelines → check bitmap → 2 found new bits ✓
285//!   Batch 2: spawn 4 timelines → check bitmap → 1 found new bits ✓
286//!   Batch 3: spawn 4 timelines → check bitmap → 0 found new bits ✗
287//!     → Mark is BARREN. Return remaining per-mark energy to pool.
288//!     → Stop splitting at this mark.
289//!
290//! Mark "partition_heal":
291//!   Batch 1: spawn 4 timelines → check bitmap → 3 found new bits ✓
292//!   Batch 2: spawn 4 timelines → check bitmap → 2 found new bits ✓
293//!   ...continues until per-mark budget exhausted...
294//!   ...draws from reallocation pool for more...
295//!   Batch 6: spawn 4 timelines → hits max_timelines (20) → stop
296//! ```
297//!
298//! **The 3-level energy decrement (for each timeline):**
299//!
300//! ```text
301//! 1. Decrement global_remaining by 1
302//!    └─ If global is at 0 → STOP (hard cap, game over)
303//!
304//! 2. Decrement per_mark[slot] by 1
305//!    └─ If per_mark has budget → OK, continue
306//!    └─ If per_mark is at 0 → try step 3
307//!
308//! 3. Decrement realloc_pool by 1
309//!    └─ If pool has budget → OK, continue
310//!    └─ If pool is at 0 → UNDO global decrement, STOP for this mark
311//! ```
312//!
313//! Configure with:
314//! ```ignore
315//! ExplorationConfig {
316//!     max_depth: 3,
317//!     timelines_per_split: 4,  // ignored in adaptive mode
318//!     global_energy: 200,
319//!     adaptive: Some(AdaptiveConfig {
320//!         batch_size: 4,        // timelines per batch before checking yield
321//!         min_timelines: 4,     // minimum timelines even if barren
322//!         max_timelines: 20,    // hard cap per mark
323//!         per_mark_energy: 15,  // initial budget per mark
324//!     }),
325//!     parallelism: Some(Parallelism::MaxCores),  // use all CPU cores
326//! }
327//! ```
328//!
329//! # Complete walkthrough: from seed to bug
330//!
331//! Say you're testing a distributed lock service with 3 nodes. Your test has:
332//!
333//! ```ignore
334//! assert_sometimes!(lock_acquired_after_retry, "lock retry succeeded");
335//! assert_sometimes!(saw_split_brain, "split brain detected");
336//! assert_always!(no_double_grant, "lock never granted to two nodes");
337//! ```
338//!
339//! **Iteration 1: Seed 42, exploration enabled**
340//!
341//! ```text
342//! Root timeline (seed=42, depth=0)
343//! │
344//! │  RNG#1..#150: normal execution, no interesting events
345//! │
346//! │  RNG#151: lock_acquired_after_retry = true  ← FIRST TIME!
347//! │           CAS split_triggered 0→1 succeeds
348//! │           Splitpoint triggered at RNG call #151
349//! │
350//! ├── Timeline T0 (seed=FNV(42,"lock retry",0), depth=1)
351//! │   │  Reseed RNG, continue from same program state
352//! │   │  Different randomness → different network delays
353//! │   │  RNG#1..#80: saw_split_brain = true  ← FIRST TIME (for children)
354//! │   │  CAS split_triggered 0→1 succeeds
355//! │   │  Nested splitpoint at RNG call #80!
356//! │   │
357//! │   ├── Timeline T0-0 (depth=2)
358//! │   │     no_double_grant = false  ← BUG!
359//! │   │     exit_child(42)  ← special code for "bug found"
360//! │   │
361//! │   ├── Timeline T0-1 (depth=2)
362//! │   │     runs to completion, no bug
363//! │   │     exit_child(0)
364//! │   │
365//! │   └── Timeline T0-2 (depth=2)
366//! │         runs to completion, no bug
367//! │         exit_child(0)
368//! │   │
369//! │   │  T0 continues after children finish
370//! │   │  exit_child(0)
371//! │
372//! ├── Timeline T1 (seed=FNV(42,"lock retry",1), depth=1)
373//! │     runs to completion, nothing new
374//! │     exit_child(0)
375//! │
376//! └── Timeline T2 (seed=FNV(42,"lock retry",2), depth=1)
377//!       runs to completion, nothing new
378//!       exit_child(0)
379//!
380//! Root continues after all children finish.
381//! Bug found! Recipe saved.
382//! ```
383//!
384//! **The bug recipe** is the path from root to the buggy timeline:
385//!
386//! ```text
387//! Recipe: [(151, seed_T0), (80, seed_T0_0)]
388//! Formatted: "151@<seed_T0> -> 80@<seed_T0_0>"
389//! ```
390//!
391//! To replay: set the root seed to 42, install RNG breakpoints at
392//! call count 151 (reseed to seed_T0) and then at call count 80
393//! (reseed to seed_T0_0). The simulation will follow the exact same
394//! path to the bug every time.
395//!
396//! # How fork() makes this work
397//!
398//! The explorer uses Unix `fork()` to create timeline branches. This is
399//! cheap because of copy-on-write (COW): the child gets a copy of the
400//! parent's entire memory without actually copying it. Pages are only
401//! copied when one side writes to them.
402//!
403//! ```text
404//! Parent process memory:
405//! ┌──────────────────────────────────────────────────┐
406//! │  Simulation state (actors, network, timers)      │  ← COW pages
407//! │  RNG state (will be reseeded in child)           │  ← COW pages
408//! ├──────────────────────────────────────────────────┤
409//! │  MAP_SHARED memory:                              │  ← truly shared
410//! │    - Assertion table (128 slots)                 │
411//! │    - Coverage bitmap pool (1 slot per core)      │
412//! │    - Explored map (union of all coverage)        │
413//! │    - Energy budget                               │
414//! │    - Fork stats + bug recipe                     │
415//! └──────────────────────────────────────────────────┘
416//!                      │
417//!                    fork()
418//!                      │
419//!            ┌─────────┴─────────┐
420//!            ▼                   ▼
421//!     Parent (reaps)      Child (continues)
422//!     - waitpid()         - rng_reseed(new_seed)
423//!     - merge coverage    - run simulation
424//!     - check exit code   - exit_child(0 or 42)
425//! ```
426//!
427//! `MAP_SHARED` memory is the only communication channel between parent
428//! and child. Assertion counters, coverage bits, energy budgets, and bug
429//! recipes all live there. This is why the crate only depends on `libc`.
430//!
431//! # Parallel exploration (multi-core)
432//!
433//! By default, the fork loop is sequential: fork one child, wait for it,
434//! merge coverage, repeat. This uses only one CPU core.
435//!
436//! With `parallelism: Some(Parallelism::MaxCores)`, the explorer runs
437//! multiple children concurrently using a **sliding window** capped at
438//! the number of available CPU cores:
439//!
440//! ```text
441//! Sequential (parallelism: None):
442//!
443//!   fork child 0 --- waitpid --- fork child 1 --- waitpid --- ...
444//!   core 0 busy     core 0 idle  core 0 busy     core 0 idle
445//!
446//! Parallel (parallelism: Some(Parallelism::MaxCores)):
447//!
448//!   fork child 0 ──────────────────────── reap
449//!   fork child 1 ─────────────────── reap
450//!   fork child 2 ────────────── reap
451//!   fork child 3 ─────── reap
452//!   all cores busy until drained
453//! ```
454//!
455//! The parent uses `waitpid(-1)` to reap whichever child finishes first,
456//! then recycles that slot for the next child.
457//!
458//! ## How the bitmap pool works
459//!
460//! In sequential mode, children share one coverage bitmap (cleared between
461//! each child). In parallel mode, concurrent children would clobber each
462//! other's bitmap, so each child gets its own **bitmap pool slot**:
463//!
464//! ```text
465//! Bitmap pool (MAP_SHARED, allocated lazily on first parallel split):
466//! ┌─────────────┬─────────────┬─────────────┬─────────────┐
467//! │  Slot 0     │  Slot 1     │  Slot 2     │  Slot 3     │
468//! │  1024 bytes │  1024 bytes │  1024 bytes │  1024 bytes │
469//! │  child A    │  child B    │  child C    │  (free)     │
470//! └─────────────┴─────────────┴─────────────┴─────────────┘
471//! ```
472//!
473//! When a child finishes (`waitpid(-1)` returns its PID), the parent:
474//! 1. Looks up which slot that child used
475//! 2. Merges the slot's bitmap into the explored map
476//! 3. Recycles the slot for the next child
477//!
478//! Children that themselves become parents (nested splits) allocate their
479//! own fresh pool -- the pool pointer is reset to null in each child at
480//! fork time.
481//!
482//! ## Parallelism variants
483//!
484//! ```text
485//! Variant              Slot count
486//! ───────────────────  ─────────────────────────────────────
487//! MaxCores             All available CPU cores
488//! HalfCores            Half of available cores (rounded up, min 1)
489//! Cores(n)             Exactly n cores
490//! MaxCoresMinus(n)     All cores minus n (min 1)
491//! ```
492//!
493//! ## Concurrency safety
494//!
495//! All `MAP_SHARED` state uses atomic operations that are safe across
496//! `fork()` boundaries:
497//! - **Assertion slots**: CAS for first-time discovery, `fetch_add` for counters
498//! - **Energy budget**: `fetch_sub`/`fetch_add` with rollback on failure
499//! - **Bug recipe**: CAS first-bug-wins -- only the first bug is recorded
500//! - **Stats counters**: `fetch_add` -- concurrent increments are safe
501//! - **Explored map merge**: Done by the parent AFTER `waitpid`, never concurrent
502//!
503//! # Architecture
504//!
505//! ```text
506//! moonpool-explorer (this crate)  ── leaf, only depends on libc
507//!      ▲
508//! moonpool-sim                    ── wires RNG hooks at init
509//! ```
510//!
511//! Communication with the simulation's RNG uses two function pointers
512//! set via [`set_rng_hooks`]:
513//! - `get_count: fn() -> u64` — how many RNG calls have happened
514//! - `reseed: fn(u64)` — replace the RNG seed and reset the counter
515//!
516//! This crate has zero knowledge of moonpool internals. It doesn't know
517//! about actors, networks, or storage. It only knows about RNG call counts,
518//! seeds, and assertion slot positions.
519//!
520//! # Module overview
521//!
522//! ```text
523//! lib.rs             ── ExplorationConfig, init/cleanup, this documentation
524//! split_loop.rs      ── The fork loop: sequential + parallel sliding window,
525//!                       adaptive batching, Parallelism enum
526//! assertion_slots.rs ── Shared-memory assertion table (128 slots)
527//! each_buckets.rs    ── Per-value bucketed assertions (256 buckets)
528//! coverage.rs        ── CoverageBitmap + ExploredMap (8192-bit bitmaps)
529//! energy.rs          ── 3-level energy budget (global + per-mark + realloc pool)
530//! context.rs         ── Thread-local state, RNG hooks, bitmap pool pointers
531//! shared_stats.rs    ── Cross-process counters (timelines, fork_points, bugs)
532//! shared_mem.rs      ── mmap(MAP_SHARED|MAP_ANONYMOUS) allocation
533//! replay.rs          ── Recipe formatting and parsing ("151@seed -> 80@seed")
534//! ```
535//!
536//! # Usage
537//!
538//! ```ignore
539//! // In moonpool-sim's runner:
540//! moonpool_explorer::set_rng_hooks(get_rng_call_count, |seed| {
541//!     set_sim_seed(seed);
542//!     reset_rng_call_count();
543//! });
544//!
545//! moonpool_explorer::init(ExplorationConfig {
546//!     max_depth: 2,
547//!     timelines_per_split: 4,
548//!     global_energy: 100,
549//!     adaptive: None,
550//!     parallelism: Some(Parallelism::MaxCores),
551//! })?;
552//!
553//! // ... run simulation ...
554//!
555//! moonpool_explorer::cleanup();
556//! ```
557
558#![deny(missing_docs)]
559#![deny(clippy::unwrap_used)]
560
561pub mod assertion_slots;
562pub mod context;
563pub mod coverage;
564pub mod each_buckets;
565pub mod energy;
566pub mod replay;
567pub mod shared_mem;
568pub mod shared_stats;
569pub mod split_loop;
570
571// Re-exports for the public API
572pub use assertion_slots::{
573    ASSERTION_TABLE_MEM_SIZE, AssertCmp, AssertKind, AssertionSlot, AssertionSlotSnapshot,
574    assertion_bool, assertion_numeric, assertion_read_all, assertion_sometimes_all, msg_hash,
575};
576pub use context::{explorer_is_child, get_assertion_table_ptr, set_rng_hooks};
577pub use each_buckets::{
578    EachBucket, assertion_sometimes_each, each_bucket_read_all, unpack_quality,
579};
580pub use replay::{ParseTimelineError, format_timeline, parse_timeline};
581pub use shared_stats::{ExplorationStats, get_bug_recipe, get_exploration_stats};
582pub use split_loop::{AdaptiveConfig, Parallelism, exit_child};
583
584use context::{
585    ASSERTION_TABLE, COVERAGE_BITMAP_PTR, EACH_BUCKET_PTR, ENERGY_BUDGET_PTR, EXPLORED_MAP_PTR,
586    SHARED_RECIPE, SHARED_STATS,
587};
588
589/// Configuration for exploration.
590#[derive(Debug, Clone)]
591pub struct ExplorationConfig {
592    /// Maximum fork depth (0 = no forking).
593    pub max_depth: u32,
594    /// Number of children to fork at each discovery point (fixed-count mode).
595    pub timelines_per_split: u32,
596    /// Global energy budget (total number of fork operations allowed).
597    pub global_energy: i64,
598    /// Optional adaptive forking configuration.
599    /// When `None`, uses fixed `timelines_per_split` (backward compatible).
600    /// When `Some`, uses coverage-yield-driven batch forking with 3-level energy.
601    pub adaptive: Option<split_loop::AdaptiveConfig>,
602    /// Optional parallelism for multi-core exploration.
603    /// When `None`, children are forked and reaped sequentially (one core).
604    /// When `Some`, a sliding window of concurrent children uses multiple cores.
605    pub parallelism: Option<split_loop::Parallelism>,
606}
607
608/// Initialize assertion table and each-bucket shared memory only.
609///
610/// This allocates the shared memory regions needed for assertion tracking
611/// without requiring a full exploration context. Idempotent (no-op if
612/// already initialized).
613///
614/// # Errors
615///
616/// Returns an error if shared memory allocation fails.
617pub fn init_assertions() -> Result<(), std::io::Error> {
618    let current = ASSERTION_TABLE.with(|c| c.get());
619    if !current.is_null() {
620        return Ok(()); // Already initialized
621    }
622
623    let table_ptr = shared_mem::alloc_shared(assertion_slots::ASSERTION_TABLE_MEM_SIZE)?;
624    let each_bucket_ptr = shared_mem::alloc_shared(each_buckets::EACH_BUCKET_MEM_SIZE)?;
625
626    ASSERTION_TABLE.with(|c| c.set(table_ptr));
627    EACH_BUCKET_PTR.with(|c| c.set(each_bucket_ptr));
628
629    Ok(())
630}
631
632/// Free assertion table and each-bucket shared memory.
633///
634/// Nulls the pointers after freeing. No-op if not initialized.
635pub fn cleanup_assertions() {
636    unsafe {
637        let table_ptr = ASSERTION_TABLE.with(|c| c.get());
638        if !table_ptr.is_null() {
639            shared_mem::free_shared(table_ptr, assertion_slots::ASSERTION_TABLE_MEM_SIZE);
640            ASSERTION_TABLE.with(|c| c.set(std::ptr::null_mut()));
641        }
642
643        let each_bucket_ptr = EACH_BUCKET_PTR.with(|c| c.get());
644        if !each_bucket_ptr.is_null() {
645            shared_mem::free_shared(each_bucket_ptr, each_buckets::EACH_BUCKET_MEM_SIZE);
646            EACH_BUCKET_PTR.with(|c| c.set(std::ptr::null_mut()));
647        }
648    }
649}
650
651/// Zero assertion table memory for between-run resets.
652///
653/// No-op if not initialized.
654pub fn reset_assertions() {
655    let table_ptr = ASSERTION_TABLE.with(|c| c.get());
656    if !table_ptr.is_null() {
657        unsafe {
658            std::ptr::write_bytes(table_ptr, 0, assertion_slots::ASSERTION_TABLE_MEM_SIZE);
659        }
660    }
661
662    let each_bucket_ptr = EACH_BUCKET_PTR.with(|c| c.get());
663    if !each_bucket_ptr.is_null() {
664        unsafe {
665            std::ptr::write_bytes(each_bucket_ptr, 0, each_buckets::EACH_BUCKET_MEM_SIZE);
666        }
667    }
668}
669
670/// Prepare the exploration framework for the next seed in multi-seed exploration.
671///
672/// Preserves the explored map (cumulative coverage) and assertion watermarks/frontiers
673/// so subsequent seeds benefit from prior discovery context. Resets per-seed transient
674/// state: energy budgets, statistics, split triggers, pass/fail counters, coverage
675/// bitmap, and bug recipe.
676///
677/// Call this between seeds instead of [`reset_assertions`] when you want
678/// coverage-preserving multi-seed exploration.
679pub fn prepare_next_seed(per_seed_energy: i64) {
680    // 1. PRESERVE explored map — do nothing
681
682    // 2. Selectively reset assertion table slots:
683    //    PRESERVE: watermark, split_watermark, frontier (quality context)
684    //    RESET: split_triggered, pass_count, fail_count (per-seed transient)
685    let table_ptr = ASSERTION_TABLE.with(|c| c.get());
686    if !table_ptr.is_null() {
687        unsafe {
688            let count_ptr = table_ptr as *const std::sync::atomic::AtomicU32;
689            let count = (*count_ptr)
690                .load(std::sync::atomic::Ordering::Relaxed)
691                .min(assertion_slots::MAX_ASSERTION_SLOTS as u32) as usize;
692            let base = table_ptr.add(8) as *mut assertion_slots::AssertionSlot;
693            for i in 0..count {
694                let slot = &mut *base.add(i);
695                slot.split_triggered = 0;
696                slot.pass_count = 0;
697                slot.fail_count = 0;
698            }
699        }
700    }
701
702    // 3. Selectively reset each-bucket slots:
703    //    PRESERVE: best_score (quality watermark), key_values, msg
704    //    RESET: split_triggered, pass_count (per-seed transient)
705    let each_ptr = EACH_BUCKET_PTR.with(|c| c.get());
706    if !each_ptr.is_null() {
707        unsafe {
708            let count_ptr = each_ptr as *const std::sync::atomic::AtomicU32;
709            let count = (*count_ptr)
710                .load(std::sync::atomic::Ordering::Relaxed)
711                .min(each_buckets::MAX_EACH_BUCKETS as u32) as usize;
712            let base = each_ptr.add(8) as *mut each_buckets::EachBucket;
713            for i in 0..count {
714                let bucket = &mut *base.add(i);
715                bucket.split_triggered = 0;
716                bucket.pass_count = 0;
717            }
718        }
719    }
720
721    // 4. Zero coverage bitmap (per-timeline, NOT the explored map)
722    let bm_ptr = COVERAGE_BITMAP_PTR.with(|c| c.get());
723    if !bm_ptr.is_null() {
724        unsafe {
725            std::ptr::write_bytes(bm_ptr, 0, coverage::COVERAGE_MAP_SIZE);
726        }
727    }
728
729    // 5. Reset energy budget with fresh per-seed energy
730    let energy_ptr = ENERGY_BUDGET_PTR.with(|c| c.get());
731    if !energy_ptr.is_null() {
732        unsafe {
733            energy::reset_energy_budget(energy_ptr, per_seed_energy);
734        }
735    }
736
737    // 6. Reset shared stats
738    let stats_ptr = SHARED_STATS.with(|c| c.get());
739    if !stats_ptr.is_null() {
740        unsafe {
741            shared_stats::reset_shared_stats(stats_ptr, per_seed_energy);
742        }
743    }
744
745    // 7. Reset bug recipe
746    let recipe_ptr = SHARED_RECIPE.with(|c| c.get());
747    if !recipe_ptr.is_null() {
748        unsafe {
749            shared_stats::reset_shared_recipe(recipe_ptr);
750        }
751    }
752
753    // 8. Reset context for new seed, mark as warm start
754    context::with_ctx_mut(|ctx| {
755        ctx.is_child = false;
756        ctx.depth = 0;
757        ctx.current_seed = 0;
758        ctx.recipe.clear();
759        ctx.warm_start = true;
760    });
761}
762
763/// Count the number of bits set in the explored coverage map.
764///
765/// Returns `None` if the explored map is not initialized (exploration not active).
766/// The result represents how many unique assertion paths have been explored
767/// across all timelines.
768///
769/// Must be called before [`cleanup`] which frees the explored map memory.
770pub fn explored_map_bits_set() -> Option<u32> {
771    let ptr = EXPLORED_MAP_PTR.with(|c| c.get());
772    if ptr.is_null() {
773        return None;
774    }
775    // Safety: ptr was allocated during init() with COVERAGE_MAP_SIZE bytes.
776    let vm = unsafe { coverage::ExploredMap::new(ptr) };
777    Some(vm.count_bits_set())
778}
779
780/// Initialize the exploration framework.
781///
782/// Allocates shared memory for cross-process state and activates exploration.
783/// Must be called after [`set_rng_hooks`] and before the simulation loop.
784///
785/// # Errors
786///
787/// Returns an error if shared memory allocation fails.
788pub fn init(config: ExplorationConfig) -> Result<(), std::io::Error> {
789    // Initialize assertion table first (idempotent)
790    init_assertions()?;
791
792    // Allocate exploration-specific shared memory regions
793    let stats_ptr = shared_stats::init_shared_stats(config.global_energy)?;
794    let recipe_ptr = shared_stats::init_shared_recipe()?;
795    let explored_ptr = shared_mem::alloc_shared(coverage::COVERAGE_MAP_SIZE)?;
796    let bitmap_ptr = shared_mem::alloc_shared(coverage::COVERAGE_MAP_SIZE)?;
797
798    // Allocate energy budget if adaptive mode is configured
799    let energy_ptr = if let Some(ref adaptive) = config.adaptive {
800        energy::init_energy_budget(config.global_energy, adaptive.per_mark_energy)?
801    } else {
802        std::ptr::null_mut()
803    };
804
805    // Store pointers in thread-local context
806    SHARED_STATS.with(|c| c.set(stats_ptr));
807    SHARED_RECIPE.with(|c| c.set(recipe_ptr));
808    EXPLORED_MAP_PTR.with(|c| c.set(explored_ptr));
809    COVERAGE_BITMAP_PTR.with(|c| c.set(bitmap_ptr));
810    ENERGY_BUDGET_PTR.with(|c| c.set(energy_ptr));
811
812    // Activate exploration context
813    context::with_ctx_mut(|ctx| {
814        ctx.active = true;
815        ctx.is_child = false;
816        ctx.depth = 0;
817        ctx.max_depth = config.max_depth;
818        ctx.current_seed = 0;
819        ctx.recipe.clear();
820        ctx.timelines_per_split = config.timelines_per_split;
821        ctx.adaptive = config.adaptive.clone();
822        ctx.parallelism = config.parallelism.clone();
823        ctx.warm_start = false;
824    });
825
826    Ok(())
827}
828
829/// Clean up the exploration framework.
830///
831/// Frees all shared memory and deactivates exploration.
832/// Call after the simulation loop completes.
833pub fn cleanup() {
834    // Deactivate
835    context::with_ctx_mut(|ctx| {
836        ctx.active = false;
837    });
838
839    // Free exploration-specific shared memory regions
840    // Safety: these pointers were allocated by init() via alloc_shared()
841    unsafe {
842        let stats_ptr = SHARED_STATS.with(|c| c.get());
843        if !stats_ptr.is_null() {
844            shared_mem::free_shared(
845                stats_ptr as *mut u8,
846                std::mem::size_of::<shared_stats::SharedStats>(),
847            );
848            SHARED_STATS.with(|c| c.set(std::ptr::null_mut()));
849        }
850
851        let recipe_ptr = SHARED_RECIPE.with(|c| c.get());
852        if !recipe_ptr.is_null() {
853            shared_mem::free_shared(
854                recipe_ptr as *mut u8,
855                std::mem::size_of::<shared_stats::SharedRecipe>(),
856            );
857            SHARED_RECIPE.with(|c| c.set(std::ptr::null_mut()));
858        }
859
860        let explored_ptr = EXPLORED_MAP_PTR.with(|c| c.get());
861        if !explored_ptr.is_null() {
862            shared_mem::free_shared(explored_ptr, coverage::COVERAGE_MAP_SIZE);
863            EXPLORED_MAP_PTR.with(|c| c.set(std::ptr::null_mut()));
864        }
865
866        let bitmap_ptr = COVERAGE_BITMAP_PTR.with(|c| c.get());
867        if !bitmap_ptr.is_null() {
868            shared_mem::free_shared(bitmap_ptr, coverage::COVERAGE_MAP_SIZE);
869            COVERAGE_BITMAP_PTR.with(|c| c.set(std::ptr::null_mut()));
870        }
871
872        let energy_ptr = ENERGY_BUDGET_PTR.with(|c| c.get());
873        if !energy_ptr.is_null() {
874            shared_mem::free_shared(
875                energy_ptr as *mut u8,
876                std::mem::size_of::<energy::EnergyBudget>(),
877            );
878            ENERGY_BUDGET_PTR.with(|c| c.set(std::ptr::null_mut()));
879        }
880    }
881
882    // Clean up assertion table and each-buckets (shared with init_assertions)
883    cleanup_assertions();
884}
885
886#[cfg(test)]
887mod tests {
888    use super::*;
889
890    #[test]
891    fn test_init_cleanup_cycle() {
892        let config = ExplorationConfig {
893            max_depth: 2,
894            timelines_per_split: 4,
895            global_energy: 100,
896            adaptive: None,
897            parallelism: None,
898        };
899
900        init(config).expect("init failed");
901        assert!(context::explorer_is_active());
902        assert!(!context::explorer_is_child());
903
904        // Stats should be available
905        let stats = get_exploration_stats().expect("stats should be available");
906        assert_eq!(stats.global_energy, 100);
907        assert_eq!(stats.total_timelines, 0);
908        assert_eq!(stats.fork_points, 0);
909        assert_eq!(stats.bug_found, 0);
910
911        cleanup();
912        assert!(!context::explorer_is_active());
913        assert!(get_exploration_stats().is_none());
914    }
915
916    #[test]
917    fn test_inactive_by_default() {
918        assert!(!context::explorer_is_active());
919        assert!(!context::explorer_is_child());
920    }
921
922    #[test]
923    fn test_assertion_bool_noop_when_inactive() {
924        // Should not panic when assertion table is not initialized
925        assertion_bool(AssertKind::Sometimes, true, true, "test_assertion");
926    }
927
928    #[test]
929    fn test_init_assertions_standalone() {
930        init_assertions().expect("init_assertions failed");
931
932        // Table pointer should be set
933        let ptr = context::get_assertion_table_ptr();
934        assert!(!ptr.is_null());
935
936        // Idempotent — second call should be no-op
937        init_assertions().expect("init_assertions second call failed");
938
939        cleanup_assertions();
940
941        // Should be null after cleanup
942        let ptr = context::get_assertion_table_ptr();
943        assert!(ptr.is_null());
944    }
945
946    #[test]
947    fn test_backward_compat_no_adaptive() {
948        let config = ExplorationConfig {
949            max_depth: 2,
950            timelines_per_split: 4,
951            global_energy: 100,
952            adaptive: None,
953            parallelism: None,
954        };
955        init(config).expect("init failed");
956
957        // Energy budget pointer should be null (old path)
958        let has_energy = context::ENERGY_BUDGET_PTR.with(|c| !c.get().is_null());
959        assert!(!has_energy);
960
961        cleanup();
962    }
963}