moonpool_explorer/lib.rs
1//! Multiverse exploration for deterministic simulation testing.
2//!
3//! This crate discovers rare bugs by splitting simulation timelines at key
4//! moments. When your simulation reaches an interesting state for the first
5//! time (like a retry path firing, or a timeout being hit), the explorer
6//! forks the process and re-runs from that point with different randomness,
7//! creating a tree of alternate timelines.
8//!
9//! # Glossary
10//!
11//! ```text
12//! Term What it means
13//! ──────────────── ──────────────────────────────────────────────────────
14//! Seed A u64 that completely determines a simulation's randomness.
15//! Same seed = same coin flips = same execution every time.
16//!
17//! Timeline One complete simulation run. A seed + a sequence of
18//! splitpoints uniquely identifies a timeline.
19//!
20//! Splitpoint A moment where the explorer decides to branch the
21//! multiverse. Happens when a `sometimes` assertion
22//! succeeds for the first time, a numeric watermark
23//! improves, or a frontier advances.
24//!
25//! Multiverse The tree of all timelines explored from one root seed.
26//! Each splitpoint creates new children with different seeds.
27//!
28//! Coverage bitmap A small bitfield (8192 bits) that records which assertion
29//! paths a timeline touched. Used to detect whether a
30//! new timeline discovered anything its siblings didn't.
31//!
32//! Explored map The union of all coverage bitmaps across all timelines.
33//! Lives in shared memory. "Has this ever been seen?"
34//!
35//! Energy budget A finite pool that limits how many timelines the explorer
36//! can spawn. Prevents runaway forking.
37//!
38//! Watermark The best numeric value ever observed for a given assertion.
39//! Improving the watermark triggers a new splitpoint.
40//!
41//! Frontier For compound boolean assertions: the max number of
42//! conditions simultaneously true. Advancing it triggers
43//! a splitpoint.
44//!
45//! Recipe The sequence of splitpoints that leads to a specific
46//! timeline. If a bug is found, its recipe lets you
47//! replay exactly how to reach it.
48//!
49//! Mark An assertion site that can trigger splitpoints.
50//! Each mark has a name, a shared-memory slot,
51//! and (in adaptive mode) its own energy allowance.
52//! ```
53//!
54//! # The big idea
55//!
56//! A normal simulation picks one seed and runs one timeline:
57//!
58//! ```text
59//! Seed 42 ──────────────────────────────────────────────────▶ done
60//! RNG call #1 #2 #3 #4 ... #500
61//! ```
62//!
63//! But bugs hide in rare states. Maybe a retry only fires 5% of the time,
64//! and the bug only appears when two retries happen in the same run. With
65//! single-seed testing, you might run thousands of seeds and never hit it.
66//!
67//! Multiverse exploration changes the strategy. Instead of running many
68//! independent seeds, it dives deeper into promising seeds:
69//!
70//! ```text
71//! Seed 42 ─────────────┬── splitpoint! ─────────────────────▶ done
72//! RNG call #1 ... #200│
73//! │
74//! ├── Timeline A (new seed) ──────────▶ done
75//! ├── Timeline B (new seed) ──┬───────▶ done
76//! │ │
77//! │ nested splitpoint!
78//! │ │
79//! │ ├── Timeline B1 ──▶ done
80//! │ └── Timeline B2 ──▶ BUG!
81//! │
82//! └── Timeline C (new seed) ──────────▶ done
83//! ```
84//!
85//! The key insight: if a seed reached an interesting state (the retry fired),
86//! running from that point with different randomness is more likely to find
87//! a second interesting event than starting from scratch.
88//!
89//! # How the simulation seed works
90//!
91//! Every decision in the simulation (latency, failure, timeout) comes from
92//! a deterministic random number generator (RNG) seeded with a single `u64`.
93//! The RNG is a counter: every call to it increments a call count.
94//!
95//! ```text
96//! seed = 42
97//! RNG call #1 → 0.73 (used for: network latency)
98//! RNG call #2 → 0.12 (used for: should this connection fail?)
99//! RNG call #3 → 0.89 (used for: timer jitter)
100//! ...
101//! RNG call #N → 0.41 (used for: partition probability)
102//! ```
103//!
104//! Same seed = same sequence = same execution. This is what makes bugs
105//! reproducible.
106//!
107//! # What triggers a splitpoint?
108//!
109//! Not every assertion triggers exploration. Only *discovery* assertions do,
110//! and only when they discover something new:
111//!
112//! ```text
113//! Assertion kind Triggers a splitpoint when...
114//! ────────────────────── ────────────────────────────────────────────
115//! assert_sometimes! The condition is true for the FIRST time
116//! assert_reachable! The code path is reached for the FIRST time
117//! assert_sometimes_gt! The observed value beats the previous watermark
118//! assert_sometimes_all! More conditions are true simultaneously than ever
119//! assert_sometimes_each! A new identity-key combination is seen, or
120//! the quality score improves for a known one
121//!
122//! assert_always! NEVER (invariant, not a discovery)
123//! assert_unreachable! NEVER (safety check, not a discovery)
124//! ```
125//!
126//! The "first time" check uses a CAS (compare-and-swap) on a `split_triggered`
127//! flag in shared memory. Once a mark has triggered, it won't trigger again
128//! (except for numeric watermarks and frontiers, which can trigger multiple
129//! times as they improve).
130//!
131//! # How a splitpoint works (step by step)
132//!
133//! When `assert_sometimes!(retry_fired, "server retried request")` is `true`
134//! for the first time at RNG call #200 with seed 42:
135//!
136//! ```text
137//! Step 1: CAS split_triggered from 0 → 1 (first-time guard)
138//! Step 2: Record splitpoint position (RNG call count = 200)
139//! Step 3: Check energy budget (can we afford to split?)
140//! Step 4: Save parent's coverage bitmap to stack (we'll restore it later)
141//!
142//! For each new timeline (e.g., timelines_per_split = 3):
143//! ┌──────────────────────────────────────────────────────────────────┐
144//! │ Step 5: Clear child coverage bitmap │
145//! │ Step 6: Compute child seed = FNV-1a(parent_seed, mark, index) │
146//! │ Step 7: fork() ←── OS-level process fork (copy-on-write) │
147//! │ │
148//! │ CHILD (pid=0): │
149//! │ - Reseed RNG with child_seed │
150//! │ - Set is_child = true, depth += 1 │
151//! │ - Append (200, child_seed) to recipe │
152//! │ - Return from split function │
153//! │ - Continue simulation with NEW randomness ─────▶ eventually │
154//! │ calls exit_child(0) or exit_child(42) if bug │
155//! │ │
156//! │ PARENT (pid>0): │
157//! │ Sequential mode: │
158//! │ - waitpid(pid) — blocks until THIS child finishes │
159//! │ - Merge child's coverage bitmap into explored map │
160//! │ - If child exited with code 42 → record bug recipe │
161//! │ - Loop to next child │
162//! │ Parallel mode: │
163//! │ - Track child in active set │
164//! │ - When slots are full → reap_one() via waitpid(-1) │
165//! │ - Continue forking until all children spawned │
166//! │ - Drain remaining active children at the end │
167//! └──────────────────────────────────────────────────────────────────┘
168//!
169//! Step 8: Restore parent's coverage bitmap
170//! Step 9: Parent continues its own simulation normally
171//! ```
172//!
173//! The child seed is deterministic: `FNV-1a(parent_seed + mark_name + child_index)`.
174//! So the multiverse tree is fully reproducible.
175//!
176//! # The coverage bitmap: "did this timeline find anything new?"
177//!
178//! Each timeline gets a small bitmap (1024 bytes = 8192 bits). When an
179//! assertion fires, it sets a bit at position `hash(assertion_name) % 8192`:
180//!
181//! ```text
182//! Bitmap for Timeline A:
183//! byte 0 byte 1 byte 2
184//! ┌─┬─┬─┬─┬─┬─┬─┬─┐┌─┬─┬─┬─┬─┬─┬─┬─┐┌─┬─┬─┬─┬─┬─┬─┬─┐
185//! │0│0│1│0│0│0│0│0││0│0│0│0│0│1│0│0││0│0│0│0│0│0│0│0│ ...
186//! └─┴─┴─┴─┴─┴─┴─┴─┘└─┴─┴─┴─┴─┴─┴─┴─┘└─┴─┴─┴─┴─┴─┴─┴─┘
187//! ▲ ▲
188//! bit 2 bit 13
189//! (retry_fired) (timeout_hit)
190//! ```
191//!
192//! The **explored map** is the union (OR) of all bitmaps across all timelines.
193//! It lives in `MAP_SHARED` memory so all forked processes can see it:
194//!
195//! ```text
196//! After Timeline A: explored = 00100000 00000100 00000000 ...
197//! After Timeline B: explored = 00100000 00000110 00000000 ...
198//! ▲
199//! Timeline B found bit 14!
200//! (that's new coverage)
201//! ```
202//!
203//! **How "has new bits" works:**
204//!
205//! For each byte: `(child_byte & !explored_byte) != 0`
206//!
207//! ```text
208//! child = 00000110 (bits 1, 2 set)
209//! explored = 00000100 (bit 2 already known)
210//! !explored = 11111011
211//! child & !explored = 00000010 ← bit 1 is NEW! → has_new_bits = true
212//! ```
213//!
214//! This is used in **adaptive mode** to decide whether a mark is still
215//! productive (finding new paths) or barren (just repeating known paths).
216//!
217//! # The energy budget: "how much exploring can we do?"
218//!
219//! Without a limit, the explorer could fork forever (exponential blowup).
220//! The energy budget caps the total number of timelines spawned.
221//!
222//! ## Fixed-count mode (simple)
223//!
224//! One global counter. Each timeline costs 1 energy. When it reaches 0,
225//! no more timelines are spawned:
226//!
227//! ```text
228//! global_energy: 10
229//!
230//! Split at mark A: spawn 3 timelines → energy = 7
231//! Split at mark B: spawn 3 timelines → energy = 4
232//! Split at mark C: spawn 3 timelines → energy = 1
233//! Split at mark D: spawn 1 timeline → energy = 0
234//! Split at mark E: no energy left, skip
235//! ```
236//!
237//! Configure with:
238//! ```ignore
239//! ExplorationConfig {
240//! max_depth: 2,
241//! timelines_per_split: 3,
242//! global_energy: 50,
243//! adaptive: None, // ← fixed-count mode
244//! parallelism: None, // ← sequential (or Some(Parallelism::MaxCores))
245//! }
246//! ```
247//!
248//! ## Adaptive mode (smart)
249//!
250//! A 3-level energy system that automatically gives more budget to productive
251//! marks and takes budget away from barren ones:
252//!
253//! ```text
254//! ┌──────────────────────────────────────────────────────────────┐
255//! │ GLOBAL ENERGY (100) │
256//! │ Every timeline consumes 1 from here first. │
257//! │ When this hits 0, ALL exploration stops. │
258//! ├──────────────────────────────────────────────────────────────┤
259//! │ │
260//! │ ┌─────────────────┐ ┌─────────────────┐ ┌────────────┐ │
261//! │ │ Mark A: 15 │ │ Mark B: 15 │ │ Mark C: 15 │ │
262//! │ │ (productive) │ │ (barren) │ │ (new) │ │
263//! │ │ Used: 15/15 │ │ Used: 3/15 │ │ Used: 0/15 │ │
264//! │ │ Needs more! ───┼──┼─── Returns 12 ──┼──┤ │ │
265//! │ └────────┬────────┘ └─────────────────┘ └────────────┘ │
266//! │ │ │ │
267//! │ ▼ ▼ │
268//! │ ┌──────────────────────────────────────────────────────┐ │
269//! │ │ REALLOCATION POOL: 12 │ │
270//! │ │ Energy returned by barren marks. │ │
271//! │ │ Productive marks draw from here when their │ │
272//! │ │ per-mark budget runs out. │ │
273//! │ └──────────────────────────────────────────────────────┘ │
274//! └──────────────────────────────────────────────────────────────┘
275//! ```
276//!
277//! **How it decides if a mark is productive or barren:**
278//!
279//! Timelines are spawned in batches (e.g., 4 at a time). After each batch,
280//! the explorer checks the coverage bitmap:
281//!
282//! ```text
283//! Mark "retry_path":
284//! Batch 1: spawn 4 timelines → check bitmap → 2 found new bits ✓
285//! Batch 2: spawn 4 timelines → check bitmap → 1 found new bits ✓
286//! Batch 3: spawn 4 timelines → check bitmap → 0 found new bits ✗
287//! → Mark is BARREN. Return remaining per-mark energy to pool.
288//! → Stop splitting at this mark.
289//!
290//! Mark "partition_heal":
291//! Batch 1: spawn 4 timelines → check bitmap → 3 found new bits ✓
292//! Batch 2: spawn 4 timelines → check bitmap → 2 found new bits ✓
293//! ...continues until per-mark budget exhausted...
294//! ...draws from reallocation pool for more...
295//! Batch 6: spawn 4 timelines → hits max_timelines (20) → stop
296//! ```
297//!
298//! **The 3-level energy decrement (for each timeline):**
299//!
300//! ```text
301//! 1. Decrement global_remaining by 1
302//! └─ If global is at 0 → STOP (hard cap, game over)
303//!
304//! 2. Decrement per_mark[slot] by 1
305//! └─ If per_mark has budget → OK, continue
306//! └─ If per_mark is at 0 → try step 3
307//!
308//! 3. Decrement realloc_pool by 1
309//! └─ If pool has budget → OK, continue
310//! └─ If pool is at 0 → UNDO global decrement, STOP for this mark
311//! ```
312//!
313//! Configure with:
314//! ```ignore
315//! ExplorationConfig {
316//! max_depth: 3,
317//! timelines_per_split: 4, // ignored in adaptive mode
318//! global_energy: 200,
319//! adaptive: Some(AdaptiveConfig {
320//! batch_size: 4, // timelines per batch before checking yield
321//! min_timelines: 4, // minimum timelines even if barren
322//! max_timelines: 20, // hard cap per mark
323//! per_mark_energy: 15, // initial budget per mark
324//! }),
325//! parallelism: Some(Parallelism::MaxCores), // use all CPU cores
326//! }
327//! ```
328//!
329//! # Complete walkthrough: from seed to bug
330//!
331//! Say you're testing a distributed lock service with 3 nodes. Your test has:
332//!
333//! ```ignore
334//! assert_sometimes!(lock_acquired_after_retry, "lock retry succeeded");
335//! assert_sometimes!(saw_split_brain, "split brain detected");
336//! assert_always!(no_double_grant, "lock never granted to two nodes");
337//! ```
338//!
339//! **Iteration 1: Seed 42, exploration enabled**
340//!
341//! ```text
342//! Root timeline (seed=42, depth=0)
343//! │
344//! │ RNG#1..#150: normal execution, no interesting events
345//! │
346//! │ RNG#151: lock_acquired_after_retry = true ← FIRST TIME!
347//! │ CAS split_triggered 0→1 succeeds
348//! │ Splitpoint triggered at RNG call #151
349//! │
350//! ├── Timeline T0 (seed=FNV(42,"lock retry",0), depth=1)
351//! │ │ Reseed RNG, continue from same program state
352//! │ │ Different randomness → different network delays
353//! │ │ RNG#1..#80: saw_split_brain = true ← FIRST TIME (for children)
354//! │ │ CAS split_triggered 0→1 succeeds
355//! │ │ Nested splitpoint at RNG call #80!
356//! │ │
357//! │ ├── Timeline T0-0 (depth=2)
358//! │ │ no_double_grant = false ← BUG!
359//! │ │ exit_child(42) ← special code for "bug found"
360//! │ │
361//! │ ├── Timeline T0-1 (depth=2)
362//! │ │ runs to completion, no bug
363//! │ │ exit_child(0)
364//! │ │
365//! │ └── Timeline T0-2 (depth=2)
366//! │ runs to completion, no bug
367//! │ exit_child(0)
368//! │ │
369//! │ │ T0 continues after children finish
370//! │ │ exit_child(0)
371//! │
372//! ├── Timeline T1 (seed=FNV(42,"lock retry",1), depth=1)
373//! │ runs to completion, nothing new
374//! │ exit_child(0)
375//! │
376//! └── Timeline T2 (seed=FNV(42,"lock retry",2), depth=1)
377//! runs to completion, nothing new
378//! exit_child(0)
379//!
380//! Root continues after all children finish.
381//! Bug found! Recipe saved.
382//! ```
383//!
384//! **The bug recipe** is the path from root to the buggy timeline:
385//!
386//! ```text
387//! Recipe: [(151, seed_T0), (80, seed_T0_0)]
388//! Formatted: "151@<seed_T0> -> 80@<seed_T0_0>"
389//! ```
390//!
391//! To replay: set the root seed to 42, install RNG breakpoints at
392//! call count 151 (reseed to seed_T0) and then at call count 80
393//! (reseed to seed_T0_0). The simulation will follow the exact same
394//! path to the bug every time.
395//!
396//! # How fork() makes this work
397//!
398//! The explorer uses Unix `fork()` to create timeline branches. This is
399//! cheap because of copy-on-write (COW): the child gets a copy of the
400//! parent's entire memory without actually copying it. Pages are only
401//! copied when one side writes to them.
402//!
403//! ```text
404//! Parent process memory:
405//! ┌──────────────────────────────────────────────────┐
406//! │ Simulation state (actors, network, timers) │ ← COW pages
407//! │ RNG state (will be reseeded in child) │ ← COW pages
408//! ├──────────────────────────────────────────────────┤
409//! │ MAP_SHARED memory: │ ← truly shared
410//! │ - Assertion table (128 slots) │
411//! │ - Coverage bitmap pool (1 slot per core) │
412//! │ - Explored map (union of all coverage) │
413//! │ - Energy budget │
414//! │ - Fork stats + bug recipe │
415//! └──────────────────────────────────────────────────┘
416//! │
417//! fork()
418//! │
419//! ┌─────────┴─────────┐
420//! ▼ ▼
421//! Parent (reaps) Child (continues)
422//! - waitpid() - rng_reseed(new_seed)
423//! - merge coverage - run simulation
424//! - check exit code - exit_child(0 or 42)
425//! ```
426//!
427//! `MAP_SHARED` memory is the only communication channel between parent
428//! and child. Assertion counters, coverage bits, energy budgets, and bug
429//! recipes all live there. This is why the crate only depends on `libc`.
430//!
431//! # Parallel exploration (multi-core)
432//!
433//! By default, the fork loop is sequential: fork one child, wait for it,
434//! merge coverage, repeat. This uses only one CPU core.
435//!
436//! With `parallelism: Some(Parallelism::MaxCores)`, the explorer runs
437//! multiple children concurrently using a **sliding window** capped at
438//! the number of available CPU cores:
439//!
440//! ```text
441//! Sequential (parallelism: None):
442//!
443//! fork child 0 --- waitpid --- fork child 1 --- waitpid --- ...
444//! core 0 busy core 0 idle core 0 busy core 0 idle
445//!
446//! Parallel (parallelism: Some(Parallelism::MaxCores)):
447//!
448//! fork child 0 ──────────────────────── reap
449//! fork child 1 ─────────────────── reap
450//! fork child 2 ────────────── reap
451//! fork child 3 ─────── reap
452//! all cores busy until drained
453//! ```
454//!
455//! The parent uses `waitpid(-1)` to reap whichever child finishes first,
456//! then recycles that slot for the next child.
457//!
458//! ## How the bitmap pool works
459//!
460//! In sequential mode, children share one coverage bitmap (cleared between
461//! each child). In parallel mode, concurrent children would clobber each
462//! other's bitmap, so each child gets its own **bitmap pool slot**:
463//!
464//! ```text
465//! Bitmap pool (MAP_SHARED, allocated lazily on first parallel split):
466//! ┌─────────────┬─────────────┬─────────────┬─────────────┐
467//! │ Slot 0 │ Slot 1 │ Slot 2 │ Slot 3 │
468//! │ 1024 bytes │ 1024 bytes │ 1024 bytes │ 1024 bytes │
469//! │ child A │ child B │ child C │ (free) │
470//! └─────────────┴─────────────┴─────────────┴─────────────┘
471//! ```
472//!
473//! When a child finishes (`waitpid(-1)` returns its PID), the parent:
474//! 1. Looks up which slot that child used
475//! 2. Merges the slot's bitmap into the explored map
476//! 3. Recycles the slot for the next child
477//!
478//! Children that themselves become parents (nested splits) allocate their
479//! own fresh pool -- the pool pointer is reset to null in each child at
480//! fork time.
481//!
482//! ## Parallelism variants
483//!
484//! ```text
485//! Variant Slot count
486//! ─────────────────── ─────────────────────────────────────
487//! MaxCores All available CPU cores
488//! HalfCores Half of available cores (rounded up, min 1)
489//! Cores(n) Exactly n cores
490//! MaxCoresMinus(n) All cores minus n (min 1)
491//! ```
492//!
493//! ## Concurrency safety
494//!
495//! All `MAP_SHARED` state uses atomic operations that are safe across
496//! `fork()` boundaries:
497//! - **Assertion slots**: CAS for first-time discovery, `fetch_add` for counters
498//! - **Energy budget**: `fetch_sub`/`fetch_add` with rollback on failure
499//! - **Bug recipe**: CAS first-bug-wins -- only the first bug is recorded
500//! - **Stats counters**: `fetch_add` -- concurrent increments are safe
501//! - **Explored map merge**: Done by the parent AFTER `waitpid`, never concurrent
502//!
503//! # Architecture
504//!
505//! ```text
506//! moonpool-explorer (this crate) ── leaf, only depends on libc
507//! ▲
508//! moonpool-sim ── wires RNG hooks at init
509//! ```
510//!
511//! Communication with the simulation's RNG uses two function pointers
512//! set via [`set_rng_hooks`]:
513//! - `get_count: fn() -> u64` — how many RNG calls have happened
514//! - `reseed: fn(u64)` — replace the RNG seed and reset the counter
515//!
516//! This crate has zero knowledge of moonpool internals. It doesn't know
517//! about actors, networks, or storage. It only knows about RNG call counts,
518//! seeds, and assertion slot positions.
519//!
520//! # Module overview
521//!
522//! ```text
523//! lib.rs ── ExplorationConfig, init/cleanup, this documentation
524//! split_loop.rs ── The fork loop: sequential + parallel sliding window,
525//! adaptive batching, Parallelism enum
526//! assertion_slots.rs ── Shared-memory assertion table (128 slots)
527//! each_buckets.rs ── Per-value bucketed assertions (256 buckets)
528//! coverage.rs ── CoverageBitmap + ExploredMap (8192-bit bitmaps)
529//! energy.rs ── 3-level energy budget (global + per-mark + realloc pool)
530//! context.rs ── Thread-local state, RNG hooks, bitmap pool pointers
531//! shared_stats.rs ── Cross-process counters (timelines, fork_points, bugs)
532//! shared_mem.rs ── mmap(MAP_SHARED|MAP_ANONYMOUS) allocation
533//! replay.rs ── Recipe formatting and parsing ("151@seed -> 80@seed")
534//! ```
535//!
536//! # Usage
537//!
538//! ```ignore
539//! // In moonpool-sim's runner:
540//! moonpool_explorer::set_rng_hooks(get_rng_call_count, |seed| {
541//! set_sim_seed(seed);
542//! reset_rng_call_count();
543//! });
544//!
545//! moonpool_explorer::init(ExplorationConfig {
546//! max_depth: 2,
547//! timelines_per_split: 4,
548//! global_energy: 100,
549//! adaptive: None,
550//! parallelism: Some(Parallelism::MaxCores),
551//! })?;
552//!
553//! // ... run simulation ...
554//!
555//! moonpool_explorer::cleanup();
556//! ```
557
558#![deny(missing_docs)]
559#![deny(clippy::unwrap_used)]
560
561pub mod assertion_slots;
562pub mod context;
563pub mod coverage;
564pub mod each_buckets;
565pub mod energy;
566pub mod replay;
567pub mod sancov;
568pub mod shared_mem;
569pub mod shared_stats;
570pub mod simulations;
571pub mod split_loop;
572
573// Re-exports for the public API
574pub use assertion_slots::{
575 ASSERTION_TABLE_MEM_SIZE, AssertCmp, AssertKind, AssertionSlot, AssertionSlotSnapshot,
576 assertion_bool, assertion_numeric, assertion_read_all, assertion_sometimes_all, msg_hash,
577};
578pub use context::{explorer_is_child, get_assertion_table_ptr, set_rng_hooks};
579pub use each_buckets::{
580 EachBucket, assertion_sometimes_each, each_bucket_read_all, unpack_quality,
581};
582pub use replay::{ParseTimelineError, format_timeline, parse_timeline};
583pub use sancov::{sancov_edge_count, sancov_edges_covered, sancov_is_available};
584pub use shared_stats::{ExplorationStats, get_bug_recipe, get_exploration_stats};
585pub use split_loop::{AdaptiveConfig, Parallelism, exit_child};
586
587use context::{
588 ASSERTION_TABLE, COVERAGE_BITMAP_PTR, EACH_BUCKET_PTR, ENERGY_BUDGET_PTR, EXPLORED_MAP_PTR,
589 SHARED_RECIPE, SHARED_STATS,
590};
591
592/// Configuration for exploration.
593#[derive(Debug, Clone)]
594pub struct ExplorationConfig {
595 /// Maximum fork depth (0 = no forking).
596 pub max_depth: u32,
597 /// Number of children to fork at each discovery point (fixed-count mode).
598 pub timelines_per_split: u32,
599 /// Global energy budget (total number of fork operations allowed).
600 pub global_energy: i64,
601 /// Optional adaptive forking configuration.
602 /// When `None`, uses fixed `timelines_per_split` (backward compatible).
603 /// When `Some`, uses coverage-yield-driven batch forking with 3-level energy.
604 pub adaptive: Option<split_loop::AdaptiveConfig>,
605 /// Optional parallelism for multi-core exploration.
606 /// When `None`, children are forked and reaped sequentially (one core).
607 /// When `Some`, a sliding window of concurrent children uses multiple cores.
608 pub parallelism: Option<split_loop::Parallelism>,
609}
610
611/// Initialize assertion table and each-bucket shared memory only.
612///
613/// This allocates the shared memory regions needed for assertion tracking
614/// without requiring a full exploration context. Idempotent (no-op if
615/// already initialized).
616///
617/// # Errors
618///
619/// Returns an error if shared memory allocation fails.
620pub fn init_assertions() -> Result<(), std::io::Error> {
621 let current = ASSERTION_TABLE.with(|c| c.get());
622 if !current.is_null() {
623 return Ok(()); // Already initialized
624 }
625
626 let table_ptr = shared_mem::alloc_shared(assertion_slots::ASSERTION_TABLE_MEM_SIZE)?;
627 let each_bucket_ptr = shared_mem::alloc_shared(each_buckets::EACH_BUCKET_MEM_SIZE)?;
628
629 ASSERTION_TABLE.with(|c| c.set(table_ptr));
630 EACH_BUCKET_PTR.with(|c| c.set(each_bucket_ptr));
631
632 Ok(())
633}
634
635/// Free assertion table and each-bucket shared memory.
636///
637/// Nulls the pointers after freeing. No-op if not initialized.
638pub fn cleanup_assertions() {
639 unsafe {
640 let table_ptr = ASSERTION_TABLE.with(|c| c.get());
641 if !table_ptr.is_null() {
642 shared_mem::free_shared(table_ptr, assertion_slots::ASSERTION_TABLE_MEM_SIZE);
643 ASSERTION_TABLE.with(|c| c.set(std::ptr::null_mut()));
644 }
645
646 let each_bucket_ptr = EACH_BUCKET_PTR.with(|c| c.get());
647 if !each_bucket_ptr.is_null() {
648 shared_mem::free_shared(each_bucket_ptr, each_buckets::EACH_BUCKET_MEM_SIZE);
649 EACH_BUCKET_PTR.with(|c| c.set(std::ptr::null_mut()));
650 }
651 }
652}
653
654/// Zero assertion table memory for between-run resets.
655///
656/// No-op if not initialized.
657pub fn reset_assertions() {
658 let table_ptr = ASSERTION_TABLE.with(|c| c.get());
659 if !table_ptr.is_null() {
660 unsafe {
661 std::ptr::write_bytes(table_ptr, 0, assertion_slots::ASSERTION_TABLE_MEM_SIZE);
662 }
663 }
664
665 let each_bucket_ptr = EACH_BUCKET_PTR.with(|c| c.get());
666 if !each_bucket_ptr.is_null() {
667 unsafe {
668 std::ptr::write_bytes(each_bucket_ptr, 0, each_buckets::EACH_BUCKET_MEM_SIZE);
669 }
670 }
671}
672
673/// Prepare the exploration framework for the next seed in multi-seed exploration.
674///
675/// Preserves the explored map (cumulative coverage) and assertion watermarks/frontiers
676/// so subsequent seeds benefit from prior discovery context. Resets per-seed transient
677/// state: energy budgets, statistics, split triggers, pass/fail counters, coverage
678/// bitmap, and bug recipe.
679///
680/// Call this between seeds instead of [`reset_assertions`] when you want
681/// coverage-preserving multi-seed exploration.
682pub fn prepare_next_seed(per_seed_energy: i64) {
683 // 1. PRESERVE explored map — do nothing
684
685 // 2. Selectively reset assertion table slots:
686 // PRESERVE: watermark, split_watermark, frontier (quality context),
687 // pass_count, fail_count (cumulative — needed by validate_assertion_contracts)
688 // RESET: split_triggered (per-seed, controls re-forking)
689 let table_ptr = ASSERTION_TABLE.with(|c| c.get());
690 if !table_ptr.is_null() {
691 unsafe {
692 let count_ptr = table_ptr as *const std::sync::atomic::AtomicU32;
693 let count = (*count_ptr)
694 .load(std::sync::atomic::Ordering::Relaxed)
695 .min(assertion_slots::MAX_ASSERTION_SLOTS as u32) as usize;
696 let base = table_ptr.add(8) as *mut assertion_slots::AssertionSlot;
697 for i in 0..count {
698 let slot = &mut *base.add(i);
699 // Skip tombstones (msg_hash == 0) left by the duplicate-slot race fix.
700 if slot.msg_hash == 0 {
701 continue;
702 }
703 slot.split_triggered = 0;
704 // pass_count/fail_count accumulate across seeds — needed by
705 // validate_assertion_contracts() to avoid false "was never reached"
706 // when a seed doesn't reach a guarded assertion (e.g. invariants
707 // behind `if let Some(model)`). Forking uses split_triggered,
708 // not counts.
709 }
710 }
711 }
712
713 // 3. Selectively reset each-bucket slots:
714 // PRESERVE: best_score (quality watermark), key_values, msg, pass_count
715 // RESET: split_triggered (per-seed transient)
716 let each_ptr = EACH_BUCKET_PTR.with(|c| c.get());
717 if !each_ptr.is_null() {
718 unsafe {
719 let count_ptr = each_ptr as *const std::sync::atomic::AtomicU32;
720 let count = (*count_ptr)
721 .load(std::sync::atomic::Ordering::Relaxed)
722 .min(each_buckets::MAX_EACH_BUCKETS as u32) as usize;
723 let base = each_ptr.add(8) as *mut each_buckets::EachBucket;
724 for i in 0..count {
725 let bucket = &mut *base.add(i);
726 bucket.split_triggered = 0;
727 // pass_count accumulates across seeds (same rationale as assertion slots)
728 }
729 }
730 }
731
732 // 4. Zero coverage bitmap (per-timeline, NOT the explored map)
733 let bm_ptr = COVERAGE_BITMAP_PTR.with(|c| c.get());
734 if !bm_ptr.is_null() {
735 unsafe {
736 std::ptr::write_bytes(bm_ptr, 0, coverage::COVERAGE_MAP_SIZE);
737 }
738 }
739
740 // 4b. Clear sancov transfer buffer and reset BSS counters.
741 // History map is preserved (cumulative, like the explored map).
742 sancov::clear_transfer_buffer();
743 sancov::reset_bss_counters();
744
745 // 5. Reset energy budget with fresh per-seed energy
746 let energy_ptr = ENERGY_BUDGET_PTR.with(|c| c.get());
747 if !energy_ptr.is_null() {
748 unsafe {
749 energy::reset_energy_budget(energy_ptr, per_seed_energy);
750 }
751 }
752
753 // 6. Reset shared stats
754 let stats_ptr = SHARED_STATS.with(|c| c.get());
755 if !stats_ptr.is_null() {
756 unsafe {
757 shared_stats::reset_shared_stats(stats_ptr, per_seed_energy);
758 }
759 }
760
761 // 7. Reset bug recipe
762 let recipe_ptr = SHARED_RECIPE.with(|c| c.get());
763 if !recipe_ptr.is_null() {
764 unsafe {
765 shared_stats::reset_shared_recipe(recipe_ptr);
766 }
767 }
768
769 // 8. Reset context for new seed, mark as warm start
770 context::with_ctx_mut(|ctx| {
771 ctx.is_child = false;
772 ctx.depth = 0;
773 ctx.current_seed = 0;
774 ctx.recipe.clear();
775 ctx.warm_start = true;
776 });
777}
778
779/// Count the number of bits set in the explored coverage map.
780///
781/// Returns `None` if the explored map is not initialized (exploration not active).
782/// The result represents how many unique assertion paths have been explored
783/// across all timelines.
784///
785/// Must be called before [`cleanup`] which frees the explored map memory.
786pub fn explored_map_bits_set() -> Option<u32> {
787 let ptr = EXPLORED_MAP_PTR.with(|c| c.get());
788 if ptr.is_null() {
789 return None;
790 }
791 // Safety: ptr was allocated during init() with COVERAGE_MAP_SIZE bytes.
792 let vm = unsafe { coverage::ExploredMap::new(ptr) };
793 Some(vm.count_bits_set())
794}
795
796/// Initialize the exploration framework.
797///
798/// Allocates shared memory for cross-process state and activates exploration.
799/// Must be called after [`set_rng_hooks`] and before the simulation loop.
800///
801/// # Errors
802///
803/// Returns an error if shared memory allocation fails.
804pub fn init(config: ExplorationConfig) -> Result<(), std::io::Error> {
805 // Initialize assertion table first (idempotent)
806 init_assertions()?;
807
808 // Initialize sancov shared memory (no-op if sancov unavailable)
809 sancov::init_sancov_shared()?;
810
811 // Allocate exploration-specific shared memory regions
812 let stats_ptr = shared_stats::init_shared_stats(config.global_energy)?;
813 let recipe_ptr = shared_stats::init_shared_recipe()?;
814 let explored_ptr = shared_mem::alloc_shared(coverage::COVERAGE_MAP_SIZE)?;
815 let bitmap_ptr = shared_mem::alloc_shared(coverage::COVERAGE_MAP_SIZE)?;
816
817 // Allocate energy budget if adaptive mode is configured
818 let energy_ptr = if let Some(ref adaptive) = config.adaptive {
819 energy::init_energy_budget(config.global_energy, adaptive.per_mark_energy)?
820 } else {
821 std::ptr::null_mut()
822 };
823
824 // Store pointers in thread-local context
825 SHARED_STATS.with(|c| c.set(stats_ptr));
826 SHARED_RECIPE.with(|c| c.set(recipe_ptr));
827 EXPLORED_MAP_PTR.with(|c| c.set(explored_ptr));
828 COVERAGE_BITMAP_PTR.with(|c| c.set(bitmap_ptr));
829 ENERGY_BUDGET_PTR.with(|c| c.set(energy_ptr));
830
831 // Activate exploration context
832 context::with_ctx_mut(|ctx| {
833 ctx.active = true;
834 ctx.is_child = false;
835 ctx.depth = 0;
836 ctx.max_depth = config.max_depth;
837 ctx.current_seed = 0;
838 ctx.recipe.clear();
839 ctx.timelines_per_split = config.timelines_per_split;
840 ctx.adaptive = config.adaptive.clone();
841 ctx.parallelism = config.parallelism.clone();
842 ctx.warm_start = false;
843 });
844
845 Ok(())
846}
847
848/// Clean up the exploration framework.
849///
850/// Frees all shared memory and deactivates exploration.
851/// Call after the simulation loop completes.
852pub fn cleanup() {
853 // Deactivate
854 context::with_ctx_mut(|ctx| {
855 ctx.active = false;
856 });
857
858 // Free exploration-specific shared memory regions
859 // Safety: these pointers were allocated by init() via alloc_shared()
860 unsafe {
861 let stats_ptr = SHARED_STATS.with(|c| c.get());
862 if !stats_ptr.is_null() {
863 shared_mem::free_shared(
864 stats_ptr as *mut u8,
865 std::mem::size_of::<shared_stats::SharedStats>(),
866 );
867 SHARED_STATS.with(|c| c.set(std::ptr::null_mut()));
868 }
869
870 let recipe_ptr = SHARED_RECIPE.with(|c| c.get());
871 if !recipe_ptr.is_null() {
872 shared_mem::free_shared(
873 recipe_ptr as *mut u8,
874 std::mem::size_of::<shared_stats::SharedRecipe>(),
875 );
876 SHARED_RECIPE.with(|c| c.set(std::ptr::null_mut()));
877 }
878
879 let explored_ptr = EXPLORED_MAP_PTR.with(|c| c.get());
880 if !explored_ptr.is_null() {
881 shared_mem::free_shared(explored_ptr, coverage::COVERAGE_MAP_SIZE);
882 EXPLORED_MAP_PTR.with(|c| c.set(std::ptr::null_mut()));
883 }
884
885 let bitmap_ptr = COVERAGE_BITMAP_PTR.with(|c| c.get());
886 if !bitmap_ptr.is_null() {
887 shared_mem::free_shared(bitmap_ptr, coverage::COVERAGE_MAP_SIZE);
888 COVERAGE_BITMAP_PTR.with(|c| c.set(std::ptr::null_mut()));
889 }
890
891 let energy_ptr = ENERGY_BUDGET_PTR.with(|c| c.get());
892 if !energy_ptr.is_null() {
893 shared_mem::free_shared(
894 energy_ptr as *mut u8,
895 std::mem::size_of::<energy::EnergyBudget>(),
896 );
897 ENERGY_BUDGET_PTR.with(|c| c.set(std::ptr::null_mut()));
898 }
899 }
900
901 // Clean up sancov shared memory
902 sancov::cleanup_sancov_shared();
903
904 // Clean up assertion table and each-buckets (shared with init_assertions)
905 cleanup_assertions();
906}
907
908#[cfg(test)]
909mod tests {
910 use super::*;
911
912 #[test]
913 fn test_init_cleanup_cycle() {
914 let config = ExplorationConfig {
915 max_depth: 2,
916 timelines_per_split: 4,
917 global_energy: 100,
918 adaptive: None,
919 parallelism: None,
920 };
921
922 init(config).expect("init failed");
923 assert!(context::explorer_is_active());
924 assert!(!context::explorer_is_child());
925
926 // Stats should be available
927 let stats = get_exploration_stats().expect("stats should be available");
928 assert_eq!(stats.global_energy, 100);
929 assert_eq!(stats.total_timelines, 0);
930 assert_eq!(stats.fork_points, 0);
931 assert_eq!(stats.bug_found, 0);
932
933 cleanup();
934 assert!(!context::explorer_is_active());
935 assert!(get_exploration_stats().is_none());
936 }
937
938 #[test]
939 fn test_inactive_by_default() {
940 assert!(!context::explorer_is_active());
941 assert!(!context::explorer_is_child());
942 }
943
944 #[test]
945 fn test_assertion_bool_noop_when_inactive() {
946 // Should not panic when assertion table is not initialized
947 assertion_bool(AssertKind::Sometimes, true, true, "test_assertion");
948 }
949
950 #[test]
951 fn test_init_assertions_standalone() {
952 init_assertions().expect("init_assertions failed");
953
954 // Table pointer should be set
955 let ptr = context::get_assertion_table_ptr();
956 assert!(!ptr.is_null());
957
958 // Idempotent — second call should be no-op
959 init_assertions().expect("init_assertions second call failed");
960
961 cleanup_assertions();
962
963 // Should be null after cleanup
964 let ptr = context::get_assertion_table_ptr();
965 assert!(ptr.is_null());
966 }
967
968 #[test]
969 fn test_backward_compat_no_adaptive() {
970 let config = ExplorationConfig {
971 max_depth: 2,
972 timelines_per_split: 4,
973 global_energy: 100,
974 adaptive: None,
975 parallelism: None,
976 };
977 init(config).expect("init failed");
978
979 // Energy budget pointer should be null (old path)
980 let has_energy = context::ENERGY_BUDGET_PTR.with(|c| !c.get().is_null());
981 assert!(!has_energy);
982
983 cleanup();
984 }
985}