registry-io 1.0.0

High-performance event/callback registry for Rust. Sync-first with optional async. Lock-free reads, zero-allocation hot path, sub-50ns notify target. Designed as the foundation primitive for portfolio crates needing fast in-process notification.
# registry-io — Architecture

This document walks through how `registry-io` is built internally:
the storage model, the hot and slow paths, how the async side
mirrors the sync side, the trade-offs in each design decision, and
the file-tree map so a new contributor can find anything in under
30 seconds.

If you only want the public surface, see [`API.md`](./API.md). For
measured costs, see [`PERFORMANCE.md`](./PERFORMANCE.md).

---

## Big picture

```
┌─────────────────────────── SyncRegistry<E> ─────────────────────────┐
│                                                                     │
│  ArcSwap<Vec<HandlerEntry<E>>>     ◄─── lock-free snapshot read     │
│  ┌──────────────────────────┐                                       │
│  │ HandlerEntry { id, prio, │                                       │
│  │   handler: Arc<dyn Fn> } │   ◄─── one entry per registered       │
│  │ HandlerEntry { … }       │       handler, priority-sorted        │
│  │ HandlerEntry { … }       │                                       │
│  └──────────────────────────┘                                       │
│                                                                     │
│  HandlerIdGenerator { next: AtomicU64 }   ◄─── monotonic id counter │
│  ArcSwapOption<PanicCallbackHolder>       ◄─── optional on_panic    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

  notify(&event):
    1. snapshot = handlers.load()      // atomic acquire load
    2. for entry in snapshot.iter() {
         catch_unwind(|| handler(event))
         if Err(payload) { handle_panic(id, payload) }
       }

  register(F):
    1. id = next id
    2. handlers.rcu(|current| {
         clone → insert at priority position → return new Arc<Vec>
       })
```

The async side is structurally identical, with `Arc<dyn Fn(&E) ->
BoxFuture<()>>` replacing `Arc<dyn Fn(&E)>` and the notify hot path
running each future through a panic-catching adapter (`CatchUnwind`)
plus an optional concurrent combinator (`JoinAll`).

---

## File-tree map

```
src/
├── lib.rs                 — crate root; module declarations + re-exports
├── handler_id.rs          — opaque HandlerId + monotonic generator
├── panic.rs               — PanicInfo<'a>, PanicCallbackHolder (shared)
├── future_ext.rs          — CatchUnwind, JoinAll  (feature: async)
├── sync/
│   ├── mod.rs             — SyncRegistry<E>
│   └── guard.rs           — HandlerGuard<E>  (RAII unregister)
└── async_registry/
    ├── mod.rs             — AsyncRegistry<E>   (feature: async)
    └── guard.rs           — AsyncHandlerGuard<E>

tests/
├── smoke.rs               — minimal end-to-end
├── sync_registry.rs       — sync core, 16 tests
├── priority.rs            — sync priority ordering, 6 tests
├── panic_isolation.rs     — sync panic isolation, 9 tests
├── guards.rs              — HandlerGuard RAII, 8 tests
├── concurrent.rs          — sync multi-thread, 6 tests
├── async_registry.rs      — async core, 10 tests   (feature: async)
├── async_priority.rs      — async priority, 3 tests
├── async_panic.rs         — async panic isolation, 11 tests
├── async_guards.rs        — AsyncHandlerGuard, 7 tests
├── proptest_invariants.rs — property tests, 6 properties
├── leak_check.rs          — Arc::strong_count canary, 3 scenarios
└── zero_alloc.rs          — dhat zero-allocation verification

benches/
├── sync_notify.rs           — notify by handler count, by thread count
├── register_unregister.rs   — slow path latency at N = 0/16/100/1000
├── contention.rs            — {1,4,16,64} threads × {1,4,16} handlers
└── async_notify.rs          — concurrent + sequential at N = 0/1/4/16

examples/
├── basic.rs                          — register / notify / unregister
├── priority.rs                       — priority ordering
├── guards.rs                         — HandlerGuard RAII
├── panic_isolation.rs                — panic-isolation + on_panic
├── concurrent.rs                     — 16 threads, lock-free contention
├── async_basic.rs                    — async fn handlers (feat: async)
├── async_concurrent_vs_sequential.rs — dispatch-mode comparison
├── pattern_hot_reload.rs             — config-lib-style hot reload
├── pattern_audit_fanout.rs           — audit-log fan-out
├── pattern_metric_event.rs           — metric-event collection
└── pattern_transaction_hooks.rs      — priority-ordered tx hooks

fuzz/
├── Cargo.toml
└── fuzz_targets/
    ├── handler_churn.rs       — random op sequences
    └── event_payload.rs       — adversarial event values
```

---

## Storage model

### `ArcSwap<Vec<HandlerEntry<E>>>`

The single most important design decision: handlers live in a `Vec`
held behind an
[`arc_swap::ArcSwap`](https://docs.rs/arc-swap/latest/arc_swap/struct.ArcSwap.html).

`ArcSwap` is a published primitive for **lock-free reads** (wait-free
in the common case) and **atomic copy-on-write writes** of an `Arc<T>`.
A read loads an `Arc<T>` from an atomic pointer; a write produces a new
`Arc<T>` and swaps it into the slot, using compare-and-swap when the
new value depends on the old one.

Properties this gives us:

- **Readers never block writers, writers never block readers.** A
  thread firing `notify` and a thread calling `register` proceed in
  parallel with no coordination.
- **Each `notify` sees a consistent snapshot.** The `Vec` it iterates
  over cannot be mutated mid-iteration because every write replaces
  the whole `Arc<Vec>` atomically.
- **Reader cost is one atomic acquire load** (plus a thread-local
  cache fast-path inside `arc-swap`).

### `HandlerEntry<E>`

```rust
struct HandlerEntry<E: Send + Sync + 'static> {
    id: HandlerId,                                       // 8 B
    priority: i32,                                       // 4 B + 4 B padding
    handler: Arc<dyn Fn(&E) + Send + Sync + 'static>,    // 16 B
}
// 32 bytes total, fits in half a cache line.
```

Cloning a `HandlerEntry` is one `HandlerId` copy + one `i32` copy +
one `Arc::clone` (a refcount bump). Cloning the full `Vec` during
register/unregister is therefore `O(N)` *cheap* operations, not `O(N)`
*allocations*.
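
Both claims can be checked with a stand-in struct that has the same field shapes. This is a minimal sketch: `Id` and `Entry` are hypothetical mirrors of `HandlerId` and `HandlerEntry`, not the crate's own types, and the 32-byte figure assumes a 64-bit target.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical mirror of `HandlerId`: a newtype over `u64`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Id(pub u64);

/// Hypothetical mirror of `HandlerEntry<E>` with the same field shapes.
#[derive(Clone)]
pub struct Entry<E: 'static> {
    pub id: Id,
    pub priority: i32,
    pub handler: Arc<dyn Fn(&E) + Send + Sync + 'static>,
}

/// Size of one entry in bytes (8 + 4 + 4 padding + 16 on 64-bit targets).
pub fn entry_size() -> usize {
    size_of::<Entry<u32>>()
}

/// Clones a one-entry `Vec` and reports the handler's strong count while
/// both vectors are alive: the closure is shared, not copied.
pub fn strong_count_after_clone() -> usize {
    let handler: Arc<dyn Fn(&u32) + Send + Sync> = Arc::new(|_event: &u32| {});
    let original = vec![Entry { id: Id(1), priority: 0, handler }];
    let copy = original.clone();
    Arc::strong_count(&copy[0].handler)
}
```

Cloning the `Vec` allocates one new buffer and bumps one refcount per entry; the closures themselves are never duplicated.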

### `HandlerIdGenerator`

```rust
pub(crate) struct HandlerIdGenerator { next: AtomicU64 }
```

`fetch_add(1, Relaxed)` produces a monotonic id stream starting at
`1`. `Relaxed` ordering is sufficient because the only invariant is
that *each call returns a distinct value*, which `fetch_add` provides
atomically. No happens-before relation is needed between the id
allocation and other registry state.

Each registry owns its own generator, so ids are not comparable
across registries.
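
The distinctness argument can be exercised with a small stand-in. `IdGenerator` and `collect_ids` below are hypothetical names for illustration; only the `fetch_add(1, Relaxed)` call and the starting value of `1` come from the description above.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the crate-private `HandlerIdGenerator`.
pub struct IdGenerator {
    next: AtomicU64,
}

impl IdGenerator {
    pub fn new() -> Self {
        IdGenerator { next: AtomicU64::new(1) } // first id issued is 1
    }

    /// `fetch_add` returns the previous value, so each caller gets a
    /// distinct id even under contention; `Relaxed` is enough for that.
    pub fn next_id(&self) -> u64 {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}

/// Draws `per_thread` ids on each of `threads` threads and returns the
/// set of distinct ids observed.
pub fn collect_ids(threads: usize, per_thread: usize) -> HashSet<u64> {
    let generator = Arc::new(IdGenerator::new());
    let workers: Vec<_> = (0..threads)
        .map(|_| {
            let generator = Arc::clone(&generator);
            thread::spawn(move || {
                (0..per_thread).map(|_| generator.next_id()).collect::<Vec<_>>()
            })
        })
        .collect();
    workers.into_iter().flat_map(|w| w.join().unwrap()).collect()
}
```

If any two calls had returned the same value, the set would come out smaller than `threads * per_thread`.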

### Panic callback storage

```rust
pub(crate) struct PanicCallbackHolder {
    inner: Arc<dyn Fn(&PanicInfo<'_>) + Send + Sync + 'static>,
}
```

The wrapper exists because `arc-swap`'s `RefCnt` trait requires the
inner type to be `Sized`, but `dyn Fn` is not. Wrapping in a sized
holder lets us use
`ArcSwapOption<PanicCallbackHolder>` for atomic install/replace/clear.

Reads of the panic callback happen on the **cold** path (inside
`handle_panic`, after a handler has already panicked) so the
single-load cost of `ArcSwapOption::load` is irrelevant.
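
The size difference behind the wrapper is easy to observe with `std` alone. In this sketch `Info` and `Holder` are hypothetical stand-ins for `PanicInfo<'a>` and `PanicCallbackHolder`. An `Arc<dyn Fn>` is a two-word fat pointer (data pointer plus vtable pointer), while an `Arc` of a sized holder is a single word that fits an atomic pointer slot.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical stand-in for `PanicInfo<'a>`; the real fields are not shown here.
pub struct Info<'a> {
    pub message: &'a str,
}

/// Sized wrapper around the unsized callback, mirroring `PanicCallbackHolder`.
pub struct Holder {
    pub inner: Arc<dyn Fn(&Info<'_>) + Send + Sync + 'static>,
}

/// Returns (words in `Arc<dyn Fn>`, words in `Arc<Holder>`).
pub fn pointer_words() -> (usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<Arc<dyn Fn(&Info<'_>) + Send + Sync>>() / word,
        size_of::<Arc<Holder>>() / word,
    )
}
```

The extra indirection costs one more pointer hop when the callback is invoked, which only happens on the panic path.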

---

## The hot path: `SyncRegistry::notify`

```rust
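use std::panic::{catch_unwind, AssertUnwindSafe};

#[inline]
pub fn notify(&self, event: &E) {
    // One atomic load yields an immutable snapshot of the handler list.
    let snapshot = self.handlers.load();
    for entry in snapshot.iter() {
        let handler = &entry.handler;
        // Isolate each handler: a panic here must not skip the rest.
        let result = catch_unwind(AssertUnwindSafe(|| handler(event)));
        if let Err(payload) = result {
            // Cold path: report the panic, then continue with the next entry.
            self.handle_panic(entry.id, payload);
        }
    }
}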

Cost decomposition (measured, see `PERFORMANCE.md`):

| Stage                              | Approximate cost |
|------------------------------------|------------------|
| `handlers.load()`                  | ~2 ns (one atomic acquire, thread-local cached) |
| Per-entry: deref + iter step       | ~0.5 ns          |
| Per-entry: `Arc<dyn Fn>` deref + vtable + indirect call | ~1 ns |
| Per-entry: `catch_unwind` setup/teardown (no panic) | varies by OS |
| Per-entry: total marginal          | **~1.6 ns**      |

`handle_panic` is marked `#[cold]`, which tells the compiler the call
is unlikely so it can keep the panic-reporting code out of the hot
loop. The `Err` branch is taken essentially never on a well-behaved
handler set.
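
The isolation property of the loop can be reproduced with `std` only. This is a sketch under simplifying assumptions: a plain slice stands in for the `ArcSwap` snapshot, the event type is fixed to `u32`, and `notify_all` and `demo` are hypothetical names.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

type Handler = Arc<dyn Fn(&u32) + Send + Sync + 'static>;

/// Runs every handler against `event`; returns how many of them panicked.
pub fn notify_all(handlers: &[Handler], event: &u32) -> usize {
    let mut panicked = 0;
    for handler in handlers {
        // The panic is caught per handler, so later handlers still run.
        if catch_unwind(AssertUnwindSafe(|| handler(event))).is_err() {
            panicked += 1;
        }
    }
    panicked
}

/// Three handlers, the middle one panics.
/// Returns (handlers that ran to completion, panics caught).
pub fn demo() -> (usize, usize) {
    let completed = Arc::new(AtomicUsize::new(0));
    let (a, b) = (Arc::clone(&completed), Arc::clone(&completed));
    let first: Handler = Arc::new(move |_: &u32| {
        a.fetch_add(1, Ordering::Relaxed);
    });
    let second: Handler = Arc::new(|event: &u32| {
        if *event == 7 {
            panic!("handler failure (expected in this demo)");
        }
    });
    let third: Handler = Arc::new(move |_: &u32| {
        b.fetch_add(1, Ordering::Relaxed);
    });
    let panics = notify_all(&[first, second, third], &7);
    (completed.load(Ordering::Relaxed), panics)
}
```

The default panic hook still prints the message to stderr; catching the unwind only stops it from propagating to the caller of `notify_all`.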

### Why `AssertUnwindSafe`?

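`catch_unwind` requires its closure to implement `UnwindSafe`. The
closure here captures shared references to the handler and the event,
and a `&T` is only `UnwindSafe` when `T: RefUnwindSafe`. A
`dyn Fn(&E) + Send + Sync` trait object does not carry that bound (and
a generic `E` need not either), so the closure fails the static check.
We use `AssertUnwindSafe` to bypass it.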

The safety reasoning is documented in `docs/SECURITY.md`: the
registry's own state lives behind an immutable `Arc<Vec<...>>` snapshot
during iteration, so a panicking handler cannot corrupt it.

### Why no `notify_trusted` variant?

We considered an opt-out for `catch_unwind` ("if you trust your
handlers, save the cost"). Measured numbers showed the saving is
negligible (`catch_unwind` is essentially free on the no-panic path
across all our supported targets). Maintaining two variants of the
hot path was not worth the imperceptible win.

---

## The slow path: `register` / `unregister` / `clear`

All three use `ArcSwap::rcu` — the standard read-copy-update pattern:

```rust
drop(self.handlers.rcu(|current| {
    let mut new_vec: Vec<_> = Vec::with_capacity(current.len() + 1);
    new_vec.extend(current.iter().cloned());
    // … modify new_vec …
    Arc::new(new_vec)
}));
```

`rcu` loads the current `Arc<Vec>`, runs the closure to produce a new
`Arc<Vec>`, and compare-and-swaps it into the slot. If the CAS
fails — because another writer raced — `rcu` retries from the load.

Properties:

- **Linearizable across writers**: every write either lands or is
  retried; no write is lost.
- **`O(N)` per write** in the number of handlers (one `Vec` allocation
  + N `Arc::clone`s + one CAS).
- **Reader-side has zero impact on writers**: the read-side `Guard`
  from a concurrent `notify` does not block the writer's CAS.
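
The load, rebuild, compare, retry cycle can be sketched without `arc-swap`. The `Slot` type below is a hypothetical std-only stand-in: it uses a `Mutex` where the real crate uses an atomic pointer, so it is not lock-free and only illustrates the retry logic, with `Arc::ptr_eq` playing the role of the compare step.

```rust
use std::sync::{Arc, Mutex};

/// Std-only stand-in for an atomically swappable `Arc<T>` slot.
/// Hypothetical type; the crate itself uses `arc_swap::ArcSwap`.
pub struct Slot<T> {
    current: Mutex<Arc<T>>,
}

impl<T> Slot<T> {
    pub fn new(value: T) -> Self {
        Slot { current: Mutex::new(Arc::new(value)) }
    }

    /// Snapshot read: clones the `Arc`, never the data behind it.
    pub fn load(&self) -> Arc<T> {
        Arc::clone(&*self.current.lock().unwrap())
    }

    /// Read-copy-update: build a replacement from a snapshot, install it
    /// only if the slot still holds that snapshot, otherwise retry from
    /// the load. Returns the number of attempts it took.
    pub fn rcu<F>(&self, mut update: F) -> usize
    where
        F: FnMut(&T) -> T,
    {
        let mut attempts = 0;
        loop {
            attempts += 1;
            let seen = self.load();
            // The new value is built without holding the lock.
            let next = Arc::new(update(&*seen));
            let mut slot = self.current.lock().unwrap();
            if Arc::ptr_eq(&*slot, &seen) {
                *slot = next;
                return attempts;
            }
            // Another writer replaced the value first: discard `next`, retry.
        }
    }
}
```

A snapshot taken before any write keeps pointing at the old `Vec`, which is what gives each `notify` a consistent view while writers race each other.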

### Priority-sorted insertion

```rust
let pos = new_vec.partition_point(|e| e.priority >= entry.priority);
new_vec.insert(pos, entry.clone());
```

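`partition_point` is a binary search for the first index where the
predicate flips from `true` to `false`. Because the predicate is `>=`,
that index sits after every existing entry of equal priority, so
inserting there keeps the vec sorted by descending priority with
**stable** (registration-order) ordering within each priority bucket.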

We chose `partition_point + insert` (= `O(log N + N)`) over a full
re-sort (= `O(N log N)`) because the rest of the slow path is already
`O(N)` and binary-search insertion preserves stability without a
custom comparator.
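
A small worked example makes the ordering concrete. `Entry` here is a hypothetical two-field stand-in for `HandlerEntry`, and `order_after` is a helper invented for the illustration.

```rust
/// Hypothetical minimal entry: just an id and a priority.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Entry {
    pub id: u64,
    pub priority: i32,
}

/// Inserts `entry` after every existing entry whose priority is greater
/// than or equal to its own, keeping descending order and FIFO ties.
pub fn insert_sorted(entries: &mut Vec<Entry>, entry: Entry) {
    let pos = entries.partition_point(|e| e.priority >= entry.priority);
    entries.insert(pos, entry);
}

/// Registers (id, priority) pairs in order and returns the resulting id order.
pub fn order_after(registrations: &[(u64, i32)]) -> Vec<u64> {
    let mut entries = Vec::new();
    for &(id, priority) in registrations {
        insert_sorted(&mut entries, Entry { id, priority });
    }
    entries.iter().map(|e| e.id).collect()
}
```

Registering ids 1 to 5 with priorities `0, 10, 0, 10, -5` yields the order `2, 4, 1, 3, 5`: higher priorities first, and earlier registrations first within a tie.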

---

## The async path: `AsyncRegistry`

Structurally a clone of `SyncRegistry` with the handler signature
swapped:

```rust
type StoredAsyncHandler<E> =
    Arc<dyn Fn(&E) -> BoxFuture<()> + Send + Sync + 'static>;

type BoxFuture<T> = Pin<Box<dyn Future<Output = T> + Send + 'static>>;
```

The returned future is **`'static`** — it cannot borrow from `&E`.
Handlers that need event data must `clone` it inside the closure
before the inner `async move`. This is the canonical Rust async-fn
limitation; the registry doesn't try to paper over it.
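
The clone-before-`async move` pattern looks roughly like this. It is a sketch, not the crate's API: `ConfigChanged`, `logging_handler`, and the busy-polling `block_on` are hypothetical helpers, and only the two type aliases mirror the definitions above.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture<T> = Pin<Box<dyn Future<Output = T> + Send + 'static>>;
type AsyncHandler<E> = Arc<dyn Fn(&E) -> BoxFuture<()> + Send + Sync + 'static>;

/// Hypothetical event type used only for this sketch.
pub struct ConfigChanged {
    pub key: String,
}

/// Builds a handler that appends the changed key to `log`.
pub fn logging_handler(log: Arc<Mutex<Vec<String>>>) -> AsyncHandler<ConfigChanged> {
    Arc::new(move |event: &ConfigChanged| {
        // Clone what the future needs before `async move`: the returned
        // future is `'static` and cannot hold on to `event`.
        let key = event.key.clone();
        let log = Arc::clone(&log);
        Box::pin(async move {
            log.lock().unwrap().push(key);
        }) as BoxFuture<()>
    })
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor for the sketch; any real runtime would do.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}

/// Calls the handler, drops the event, then drives the future.
pub fn demo() -> Vec<String> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let handler = logging_handler(Arc::clone(&log));
    let event = ConfigChanged { key: "timeout".to_string() };
    let future = handler(&event);
    drop(event); // the future owns its own clone of the key
    block_on(future);
    let recorded = log.lock().unwrap().clone();
    recorded
}
```

The `drop(event)` before `block_on` compiles precisely because the future borrows nothing from the event.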

### `CatchUnwind` (in `src/future_ext.rs`)

```rust
struct CatchUnwind<F: Future> {
    inner: Option<Pin<Box<F>>>,
}
```

Per-poll, `CatchUnwind` wraps `inner.as_mut().poll(cx)` in
`catch_unwind`. On panic, the inner future is consumed (set to `None`)
and the panic payload is returned as `Err`. On future completion the
inner is likewise consumed. The `Option` discriminant guards against
the otherwise-illegal "poll a Ready future" case.
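
A std-only adapter with the same shape might look like the following. It is a sketch of the technique described above rather than the crate's source: the `new` constructor, the `Result` output type, `maybe_panic`, and the busy-polling `block_on` are assumptions made for the example.

```rust
use std::any::Any;
use std::future::Future;
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

pub struct CatchUnwind<F: Future> {
    inner: Option<Pin<Box<F>>>,
}

impl<F: Future> CatchUnwind<F> {
    pub fn new(fut: F) -> Self {
        CatchUnwind { inner: Some(Box::pin(fut)) }
    }
}

impl<F: Future> Future for CatchUnwind<F> {
    type Output = Result<F::Output, Box<dyn Any + Send>>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let result = {
            let inner = self.inner.as_mut().expect("polled after completion");
            // Each poll of the inner future runs under `catch_unwind`.
            catch_unwind(AssertUnwindSafe(|| inner.as_mut().poll(cx)))
        };
        match result {
            Ok(Poll::Pending) => Poll::Pending,
            Ok(Poll::Ready(value)) => {
                self.inner = None; // consumed on completion
                Poll::Ready(Ok(value))
            }
            Err(payload) => {
                self.inner = None; // consumed on panic
                Poll::Ready(Err(payload))
            }
        }
    }
}

/// Hypothetical handler body that panics on request.
pub async fn maybe_panic(fail: bool) -> u32 {
    if fail {
        panic!("handler failure (expected in this demo)");
    }
    42
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor for the sketch; any real runtime would do.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

Because the inner future is boxed and pinned on its own, the adapter itself stays `Unpin` and can clear the slot with a plain field assignment.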

### `JoinAll` (in `src/future_ext.rs`)

```rust
struct JoinAll<F: Future> {
    slots: Vec<JoinSlot<F>>,
    remaining: usize,
}
enum JoinSlot<F: Future> { Pending(Pin<Box<F>>), Done(F::Output) }
```

Polls every still-`Pending` slot per wake; transitions slots to
`Done` as they resolve; yields a `Vec<F::Output>` once `remaining ==
0`. Order of outputs is preserved.

This is the **minimal** concurrent driver. We don't pull in
`futures-util` (`join_all`, `select_all`, etc.) because we only need
this one combinator and the dependency carries its own non-trivial
surface. ~50 lines in-tree was the right trade-off.
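
For a sense of how small such a driver can be, here is a std-only sketch built around the same two types. The constructor, the explicit `Unpin` impl, the `Countdown` test future, and `block_on` are assumptions for the example and may differ from what `src/future_ext.rs` actually does.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

enum JoinSlot<F: Future> {
    Pending(Pin<Box<F>>),
    Done(F::Output),
}

pub struct JoinAll<F: Future> {
    slots: Vec<JoinSlot<F>>,
    remaining: usize,
}

// Every pending future is boxed and pinned on its own, so moving the
// `JoinAll` value never moves a future that has already been polled.
impl<F: Future> Unpin for JoinAll<F> {}

impl<F: Future> JoinAll<F> {
    pub fn new(futures: Vec<F>) -> Self {
        let slots: Vec<_> =
            futures.into_iter().map(|f| JoinSlot::Pending(Box::pin(f))).collect();
        JoinAll { remaining: slots.len(), slots }
    }
}

impl<F: Future> Future for JoinAll<F> {
    type Output = Vec<F::Output>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        for slot in this.slots.iter_mut() {
            let ready = match slot {
                JoinSlot::Pending(fut) => fut.as_mut().poll(cx),
                JoinSlot::Done(_) => continue,
            };
            if let Poll::Ready(out) = ready {
                *slot = JoinSlot::Done(out);
                this.remaining -= 1;
            }
        }
        if this.remaining > 0 {
            return Poll::Pending;
        }
        // All slots are `Done`; hand the outputs back in slot order.
        let outputs = std::mem::take(&mut this.slots)
            .into_iter()
            .map(|slot| match slot {
                JoinSlot::Done(out) => out,
                JoinSlot::Pending(_) => unreachable!("remaining == 0"),
            })
            .collect();
        Poll::Ready(outputs)
    }
}

/// Hypothetical test future: pending for `polls_left` polls, then `value`.
pub struct Countdown {
    pub polls_left: u32,
    pub value: u32,
}

impl Future for Countdown {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.polls_left == 0 {
            return Poll::Ready(self.value);
        }
        self.polls_left -= 1;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor for the sketch; any real runtime would do.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

Futures that finish in a different order than they were submitted still produce outputs in submission order, because each result is stored in its own slot.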

### Two dispatch modes

```rust
pub async fn notify(&self, event: &E)             // concurrent
pub async fn notify_sequential(&self, event: &E)  // in priority order
```

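`notify`: builds one wrapped future per handler, drives them
concurrently through `JoinAll`. All futures are polled on the calling
task, so total wall-clock approaches the slowest handler only when
handlers spend their time awaiting; CPU-bound work between await
points still runs back to back.

`notify_sequential`: awaits each handler's future to completion
before starting the next. Total wall-clock equals the sum of
handler latencies, but each handler happens-before the next one in
priority order.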

See `docs/PATTERNS.md#choosing-between-sync-and-async` for the
decision matrix.

---

## RAII guards

```rust
pub struct HandlerGuard<E: Send + Sync + 'static> {
    id: HandlerId,
    registry: Weak<SyncRegistry<E>>,
}
```

The guard holds a `Weak<SyncRegistry<E>>`, not an `Arc`. This breaks
a potential cycle (handler closure → captures `Arc<Self>` → owns
guard → holds `Arc<Self>`) and makes dropping the registry before the
guard a no-op.

`Drop::drop` upgrades the `Weak`; if successful, calls
`registry.unregister(self.id)`. The `_ =` discard on the return
value is intentional — the handler may already have been removed by
a different code path.

`forget(self)` consumes the guard via `ManuallyDrop` so the registry
keeps the handler past the guard's scope. The caller is responsible
for unregistering manually after this.
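
The guard lifecycle can be sketched with a mutex-based stand-in. `Registry`, `Guard`, and `new_shared` below are hypothetical simplifications (the stand-in stores only ids, not handlers); the `Weak` upgrade in `Drop`, the discarded return value, and the `ManuallyDrop`-based `forget` follow the description above.

```rust
use std::mem::ManuallyDrop;
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical, mutex-based stand-in for `SyncRegistry`; only ids are stored.
pub struct Registry {
    ids: Mutex<Vec<u64>>,
}

impl Registry {
    pub fn new_shared() -> Arc<Self> {
        Arc::new(Registry { ids: Mutex::new(Vec::new()) })
    }

    /// Registers `id` and returns a guard that removes it again on drop.
    pub fn register(self: &Arc<Self>, id: u64) -> Guard {
        self.ids.lock().unwrap().push(id);
        Guard { id, registry: Arc::downgrade(self) }
    }

    /// Returns `true` if `id` was present.
    pub fn unregister(&self, id: u64) -> bool {
        let mut ids = self.ids.lock().unwrap();
        let before = ids.len();
        ids.retain(|&existing| existing != id);
        ids.len() != before
    }

    pub fn len(&self) -> usize {
        self.ids.lock().unwrap().len()
    }
}

pub struct Guard {
    id: u64,
    registry: Weak<Registry>,
}

impl Guard {
    /// Keeps the registration alive past the guard by skipping `Drop`.
    pub fn forget(self) {
        let _ = ManuallyDrop::new(self);
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        // If the registry is already gone, `upgrade` yields `None` and the
        // drop is a no-op. The `bool` result is discarded on purpose.
        if let Some(registry) = self.registry.upgrade() {
            let _ = registry.unregister(self.id);
        }
    }
}
```

Dropping a guard after its registry has been dropped simply falls through the `if let`, which is the no-op case mentioned above.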

---

## Cross-cutting design decisions

### Why `E: Send + Sync + 'static` at the struct level?

Putting the bound on the type (`pub struct SyncRegistry<E: Send + Sync
+ 'static>`) instead of on each impl block means:

- `Drop` impls can call methods (Rust requires the Drop impl's where
  clause to match the type's). Without this bound on the type,
  `HandlerGuard::drop` couldn't call `registry.unregister`.
- The user gets a clearer error at the construction site than at the
  point of `.register(...)`.
- The type is uniformly `Send + Sync` across all impls.

### Why monotonic ids instead of generational arenas?

The straightforward `slotmap`/`generational-arena` approach would
let us re-use slot indices and detect stale ids at run time through a
per-slot generation check. We chose the `u64` counter for three reasons:

1. **Simpler invariant**: "every id ever issued is unique" is easier
   to reason about than "id is a (index, generation) pair, both of
   which can wrap."
2. **`HandlerId: Copy`** for free, with cheap equality (single `u64`
   compare).
3. **No re-use means no false positives.** A stale id passed to
   `unregister` is reliably rejected, which the property tests
   check.

The downside of a plain counter is wrap-around: a 32-bit counter
would wrap after ~4 billion registrations. Using `u64` pushes that to
~1.8 × 10^19 registrations; at 1M registrations/sec that is
~580 000 years.
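
The arithmetic behind that estimate is short enough to spell out. The helper names below are hypothetical, and a year is taken as 365.25 days.

```rust
/// Seconds until a `bits`-wide counter wraps at `per_second` registrations
/// per second (integer division, so the result is rounded down).
pub fn seconds_until_wrap(bits: u32, per_second: u128) -> u128 {
    (1u128 << bits) / per_second
}

/// The same duration expressed in whole years of 365.25 days.
pub fn years_until_wrap(bits: u32, per_second: u128) -> u128 {
    const SECONDS_PER_YEAR: u128 = 31_557_600;
    seconds_until_wrap(bits, per_second) / SECONDS_PER_YEAR
}
```

At 1M registrations/sec a 32-bit counter lasts 4294 seconds (a little over an hour), while a 64-bit counter lasts roughly 584 thousand years.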

### Why panic isolation by default (no opt-out)?

A `notify` that propagates panics couples every subscriber to every
other subscriber. The "I'll be careful" argument always loses at
scale: every team that uses the registry would have to verify every
handler everywhere they touch. Catching the panic is the only
sustainable default.

The measured cost of `catch_unwind` on the no-panic path is
negligible on our supported targets (see `PERFORMANCE.md`), so the
trade-off is essentially free.

### Why no `tokio`/`async-std` runtime dependency?

`AsyncRegistry` is generic over whatever async runtime polls its
futures. The crate doesn't `spawn`, doesn't have a worker pool, and
doesn't care which executor drives `notify().await`. Pulling in a
runtime would force every consumer into the same one.

The dev-dependency `tokio` is for tests/examples/benches only and
does not propagate to downstream crates.

---

## Adding a new feature: the checklist

When adding to `registry-io`, the in-tree convention is:

1. **Public API change** → update [`STABILITY-1.0.md`](./STABILITY-1.0.md)
   if it is major-bump territory. Otherwise add the new item under
   the appropriate section.
2. **Hot path change** → add a benchmark scenario before submitting
   the change. The regression gate is `>5%` on any tracked metric.
3. **Allocation behavior change** → ensure
   `cargo test --features dhat-heap --test zero_alloc` still passes.
4. **Async surface change** → mirror against the sync surface unless
   there's an explicit reason not to.
5. **Doc** → at minimum: a one-line summary, `# Examples` with a
   runnable example, and an entry in `docs/API.md` for any new public
   item. Update `docs/PATTERNS.md` if the new item introduces a new
   integration pattern.
6. **CHANGELOG** → entry under `[Unreleased]` describing what
   changed and why. Include a fix-up line if the change is a
   correction to a prior release's behavior.
7. **Run the full gate**: `cargo fmt --all -- --check`,
   `cargo clippy --all-targets --all-features -- -D warnings`,
   `cargo test --all-features`, `RUSTDOCFLAGS="-D warnings" cargo doc
   --no-deps --all-features`, `cargo build --all-features --examples`.

---

<sub>registry-io v1.0.0 — Copyright © 2026 James Gober. Apache-2.0 OR MIT.</sub>