1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
//! Shuttle model-checked analog of `resize_leak_probe.rs`: same
//! create/drop counting oracle, but the interleaving between a table
//! resizer and concurrent writers is *searched for* by shuttle's scheduler
//! instead of hoped for from `std::thread` timing.
//!
//! # What this catches
//!
//! `HashMap::try_resize` clones every live entry into a new table, installs
//! it, then retires the old one. A writer that captured the old table
//! pointer just before the swap (and hasn't finished its own CAS against
//! that table yet) must remain protected until it is done -- otherwise the
//! resize's retirement can free the old table's chains out from under it,
//! a use-after-free that this crate's `Counted` create/drop oracle surfaces
//! as a leak (create without a matching drop) or a double-free (two drops).
//! This is exactly the bug closed by the `hashmap.rs` proxy-birth-epoch fix
//! (commit `30bc947`): see the revert-validation note in the task report.
//!
//! # Why shuttle needs the `shuttle` feature, not just the public API
//!
//! `HashMap`'s table and node pointers are `kovan::Atomic<T>`. Under the
//! `shuttle` feature (`kovan-map/shuttle`, which cascades `kovan/shuttle`),
//! that becomes shuttle-instrumented, so the scheduler gets a yield point
//! at every pointer load/store/CAS -- precisely where this race lives.
//! Two more things had to be fixed for that instrumentation to actually pay
//! off (see `kovan/src/guard.rs` and `kovan-map/src/hashmap.rs` for the
//! full reasoning in place):
//! - `HANDLE`'s `thread_local!` (`kovan/src/guard.rs`): shuttle's "threads"
//! are cooperative coroutines multiplexed onto one real OS thread, so a
//! plain `std::thread_local!` handle is shared by every kovan-based
//! "thread" a test spawns -- collapsing every per-thread epoch slot this
//! algorithm depends on into one, and making any interleaving-dependent
//! bug in it unreachable regardless of scheduler or iteration count.
//! Shadowing the macro with `shuttle::thread_local!` under the feature
//! fixes this; it was required before this test could reproduce anything.
//! - `resize_spin_hint` (`kovan-map/src/hashmap.rs`): `wait_for_resize`'s
//! spin needs an explicit yield (`shuttle::hint::spin_loop`, which calls
//! `shuttle::thread::yield_now`), not just an instrumented load, or a
//! scheduler can legitimately keep re-running the same spinning thread
//! forever and exceed shuttle's step budget. The `resizing` flag itself
//! stays a plain `AtomicBool` under every build, shuttle included --
//! swapping its type is unnecessary (the yield alone gives the resizer
//! its turn) and, empirically, actively harmful (an earlier version of
//! this test that did swap it hit an unrelated, unexplained shuttle-only
//! heap corruption, reproducible even with a resize that never triggers).
//!
//! # Reliability of the repro (read before tightening this test)
//!
//! This is a narrow race: it needs `EPOCH_FREQ` (128) `retire()` calls to
//! actually move kovan's global epoch, *and* a specific relative timing
//! across all four writer threads' churn, not just one or two well-placed
//! preemptions. `shuttle::check_random` found it far more often than
//! `shuttle::check_pct` at every depth tried (1 through 20) -- this looks
//! like a case where the bug's real "depth" (simultaneous-ordering
//! constraints) is higher than PCT's shallow-preemption bias handles well,
//! so uniform random sampling covers the space better.
//!
//! Even with `check_random`, this is a probabilistic search, not an
//! exhaustive one, and the observed hit rate at the iteration count below
//! is well under 100% per run: reverting the fix and re-running this test
//! repeatedly during development surfaced the bug (a hard allocator abort:
//! "corrupted double-linked list", "double free or corruption (!prev)",
//! "corrupted size vs. prev_size", "free(): double free detected in
//! tcache", ...) on a minority of runs, not on every one, at 20000
//! iterations -- and stayed clean through a run at 150000. Shuttle's own
//! docs are explicit that it is a sound-scheduling, *not* exhaustive,
//! checker for exactly this reason. Treat a clean run as "did not happen
//! to find it this time", not proof of absence; treat a red run here as a
//! real finding, not flakiness to retry away.
//!
//! # Replaying a failure
//!
//! On an assertion-based failure shuttle prints a line like:
//! `test panicked in task "task-0" with schedule: "910102ccdedf9592aba2afd70104"`,
//! replayable with `shuttle::replay(|| { .. }, "<the printed schedule>")`.
//! This bug tends to manifest as a hard allocator abort (SIGABRT: "double
//! free or corruption", "corrupted size vs. prev_size", ...) instead, since
//! the corruption is severe enough to trip glibc before shuttle's panic
//! hook gets a chance to print the schedule; in that case there is no
//! schedule string to replay, only the process exit code / crash log.
use Arc;
use ;
/// Per-iteration create/drop counters. Built fresh inside the shuttle
/// closure on every iteration (never a global `static`), so iterations --
/// and other `#[test]` functions running concurrently in this binary --
/// can never cross-contaminate each other's counts.
/// Number of insert/remove cycles each thread runs. Each thread churns a
/// small, thread-private key window hard enough on its own to cross
/// kovan's `EPOCH_FREQ` (128 `retire()` calls per thread advances the
/// global epoch once) several times over, so the proxy-birth-epoch gap the
/// fixed bug depended on actually opens up during the run -- not just a
/// single insert each, which never advances the epoch at all and makes the
/// bug structurally unreachable regardless of scheduling.
const OPS_PER_THREAD: u64 = 200;
/// Distinct keys each thread cycles through (insert then remove), chosen
/// so four threads collectively push the table (`MIN_CAPACITY` == 64, grow
/// past 0.75 load factor == 49 live entries) through multiple resizes.
const KEYS_PER_THREAD: u64 = 20;
const THREADS: u64 = 4;