# registry-io — Performance
This document describes the runtime cost model for `registry-io 0.4.0`,
how to benchmark it locally, and the principles that govern the
implementation.
The performance contract for `1.0.0` is in `.dev/ROADMAP.md`. Numbers below
reflect what the code is **designed** to deliver — measured numbers on
production hardware will be recorded as the verification phase
(`0.6.0`) completes.
---
## Cost model
| `SyncRegistry::new` / `with_capacity` | one `Arc<Vec>` alloc | ✓ | — |
| `SyncRegistry::register*` | `O(N)` clone + swap | ✓ | atomic CAS |
| `SyncRegistry::unregister` | `O(N)` clone + swap | ✓ | atomic CAS |
| `SyncRegistry::clear` | one `Arc<Vec>` alloc + atomic store | ✓ | atomic store |
| `SyncRegistry::handler_count` / `is_empty` | atomic load | — | — |
| `SyncRegistry::contains` | `O(N)` scan | — | — |
| `SyncRegistry::notify` (no panics) | `O(N)` virtual calls + `catch_unwind` | — | — |
| `SyncRegistry::notify` (handler panics) | + one `Box<dyn Any>` per panic | ✓ on panic | — |
| `HandlerGuard::drop` | one `unregister` call | ✓ | atomic CAS |
`N` is the number of currently-registered handlers.
---
## Hot path: what `notify` actually does
```rust
#[inline]
pub fn notify(&self, event: &E) {
let snapshot = self.handlers.load(); // 1 atomic load
for entry in snapshot.iter() { // straight-line scan
let handler = &entry.handler; // borrow Arc<dyn Fn>
let result = catch_unwind(AssertUnwindSafe(|| handler(event)));
if let Err(payload) = result {
self.handle_panic(entry.id, payload); // cold path
}
}
}
```
The no-panic path:
- Loads an [`arc_swap::Guard`] (single atomic acquire load, no allocation).
- Iterates the snapshot's `Vec<HandlerEntry<E>>` in priority order.
- Calls each handler through dynamic dispatch.
- Wraps each call in `catch_unwind` (no allocation when no panic occurs).
There is no `Mutex`, no `RwLock`, no channel send, no per-iteration
allocation. The cost per handler is roughly:
```
load arc-swap guard: 1 atomic acquire load
per-handler: 1 Arc deref + 1 vtable lookup + 1 indirect call
+ catch_unwind setup/teardown
```
`catch_unwind` adds modest overhead per call (typically tens of cycles on
modern x86-64). This is the cost of panic isolation — handlers are isolated
from one another and from the caller. There is no opt-out in 0.4.0; a
future release may add a `notify_trusted` variant for callers that accept
panic propagation in exchange for the saved cycles.
---
## Slow path: register / unregister
`register*` and `unregister` follow the standard
read-clone-modify-CAS-swap pattern via [`arc_swap::ArcSwap::rcu`]:
1. Load the current `Arc<Vec<HandlerEntry<E>>>`.
2. Clone the `Vec` (one allocation; cloning each entry is just an `Arc`
refcount bump).
3. Push (or remove) the entry.
4. Try to atomically swap. If another writer raced, retry from step 1.
Under heavy register/unregister contention, retries may occur. The notify
hot path is **never** affected — readers always see a complete snapshot.
`register_with_priority` inserts the new entry at the correct position
using binary search (`Vec::partition_point`), so the priority ordering
invariant is maintained without a full re-sort.
---
## Memory footprint
- An **empty registry**: one `Arc<Vec<...>>` (16 bytes) + `id_generator` (8
bytes) + `panic_callback` (16 bytes) ≈ 48 bytes plus the empty `Vec`'s
metadata. Well under the 128-byte target.
- Per **registered handler**: `HandlerId` (8) + priority (4 + 4 padding) +
`Arc<dyn Fn>` (16) = **32 bytes** per slot. 100 handlers ≈ 3.2 KiB +
per-handler closure allocation. Comfortably under the 16 KiB target.
---
## Running the benchmark suite
```bash
cargo bench --bench sync_notify
cargo bench --bench register_unregister
```
The benches use [`criterion`] and produce HTML reports under
`target/criterion/`.
### Scenarios covered
`benches/sync_notify.rs`:
- `notify/0_handlers` — baseline cost of a no-op notify.
- `notify/1_handlers` through `notify/64_handlers` — dispatch with N
registered handlers, single thread.
- `notify/contended/N_threads` — `N` threads firing notify against a
4-handler registry concurrently.
`benches/register_unregister.rs`:
- `register/into_N_handlers` — adding a new handler when N are already
registered (the slow-path cost as the list grows).
- `unregister/from_N_handlers` — removing a handler.
---
## Verifying zero allocation on the hot path
The `notify` no-panic path is allocation-free **by construction**, but the
intended verification approach (planned for phase 0.6.0) is:
1. Plug the `dhat` allocator under a dedicated bench harness.
2. Register 8 handlers.
3. Call `notify` ~10⁶ times.
4. Assert `dhat::HeapStats::total_blocks` does not grow.
This adds an automated guard against regressions like an accidental
`Arc::clone` on the hot path.
---
## Concurrency characteristics
- **Many simultaneous readers**: `notify` from any number of threads in
parallel is supported with zero coordination. The `ArcSwap::load` is a
single atomic acquire; iteration is over a snapshot that no writer can
mutate.
- **Reader + writer concurrency**: a register or unregister concurrent
with notify never causes a notify to skip or duplicate handlers — both
observe consistent snapshots.
- **Many simultaneous writers**: handled correctly by `ArcSwap::rcu`
retry-on-conflict; under contention some writes may retry, but
correctness is preserved.
---
## Anti-patterns to avoid
- **Slow handlers**: handlers run inline on the caller's thread. If a
handler does network I/O, the entire notify becomes slow. Spawn slow
work onto a runtime instead of doing it in the handler.
- **Handlers that re-enter the registry**: `register` or `unregister` from
inside a handler is supported (it operates on the next snapshot), but
calling `notify` recursively is unbounded — it will see whatever the
current snapshot is and may not converge.
- **Large per-handler captures**: each registration becomes an
`Arc<dyn Fn>` heap allocation. A handler that captures a 1 MB buffer
costs 1 MB per registration. Keep captures small; share via `Arc`.
---
<sub>registry-io v0.4.0 — Copyright © 2026 James Gober. Apache-2.0 OR MIT.</sub>