<div align="center">
<img width="120px" height="auto" src="https://raw.githubusercontent.com/jamesgober/jamesgober/main/media/icons/hexagon-3.svg" alt="Triple Hexagon">
<h1>
<strong>Process Daemon</strong>
<sup>
<br>
<sub>PERFORMANCE</sub>
<br>
</sup>
</h1>
</div>
[Home](../README.md)
## Quick Summary
Typical performance characteristics, measured with Criterion in release
mode (v1.0.0 baseline, retained from the published bench-suite run):
- **Daemon creation**: ~2.14 μs per init
- **Subsystem registration** (3 tasks): ~3.76 μs per build
- **Config build**: ~80.6 ns per build
- **Shutdown coordination** (single quick task): ~2.87 μs per cycle
- **Error creation**: ~31.2 ns per error
- **Error chain construction**: ~137 ns per chain
v1.1.0 additionally lands several non-benchmarked improvements that
show up in real workloads rather than micro-benches — see
[v1.1.0 changes](#v110-changes-non-benchmarked) below.
> **Note:** Benchmarks were measured on 2026-02-23 on Windows x86_64.
> ±5–10% variance is normal between runs. Re-running on Linux x86_64
> typically shows slightly lower numbers due to lower syscall overhead.
## Benchmark Methodology
All benchmarks are run with [Criterion.rs](https://bheisler.github.io/criterion.rs/book/).
**Test Configuration:**
- **Compiler**: Rust 1.82.0+, release mode with debuginfo
- **Runtime**: Tokio (async runtime)
- **Features**: `tokio` (core) only; no metrics collection
- **Platform**: Windows x86_64 (benchmarks may vary on Linux/macOS due to syscall overhead)
- **Warmup**: 3 seconds per benchmark
- **Samples**: 100 measurements per benchmark
- **Iterations**: Auto-scaled by Criterion to achieve stable measurements
**Interpretation Guidance:**
- Results are median values; see [Version Performance History](#version-performance-history) for the per-version figures.
- ±5–10% variance is normal between runs on the same machine.
- Larger variance may occur with system thermal throttling or background load.
- Actual latency in production may differ based on workload complexity and subsystem count.
### Memory Usage
- **Base daemon**: ~1–2 MB
- **Per subsystem**: ~4–8 KB
- **Configuration**: ~1–4 KB
- **Signal handling**: ~512 B
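
As a rough worked example of how these figures combine, the sketch below sums them for a given subsystem count. It assumes binary units (1 KB = 1024 B), linear per-subsystem cost, and simple additivity, none of which the figures above guarantee. `estimate_footprint_bytes` is a hypothetical helper, not part of the crate.

```rust
/// Inclusive (low, high) estimate in bytes.
type Range = (u64, u64);

const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;

// Figures copied from the list above. Binary units and simple
// additivity are assumptions made for this illustration.
const BASE_DAEMON: Range = (MIB, 2 * MIB);
const PER_SUBSYSTEM: Range = (4 * KIB, 8 * KIB);
const CONFIGURATION: Range = (KIB, 4 * KIB);
const SIGNAL_HANDLING: u64 = 512;

/// Hypothetical helper: rough memory range for a daemon with
/// `subsystems` registered subsystems.
fn estimate_footprint_bytes(subsystems: u64) -> Range {
    let low = BASE_DAEMON.0 + subsystems * PER_SUBSYSTEM.0 + CONFIGURATION.0 + SIGNAL_HANDLING;
    let high = BASE_DAEMON.1 + subsystems * PER_SUBSYSTEM.1 + CONFIGURATION.1 + SIGNAL_HANDLING;
    (low, high)
}
```

For 10 subsystems this gives roughly 1.04 to 2.08 MiB, so under these assumptions the base daemon dominates until subsystem counts reach the low hundreds.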
## Core Performance Metrics
| Operation | v1.0.0 | Previous | Change |
|-----------|--------|----------|--------|
| Daemon creation | 2.14 μs | 9.0 μs (v0.9.0) | ≈76% faster |
| Subsystem registration (3 tasks) | 3.76 μs | 12.9 μs (v0.9.0) | ≈71% faster |
| Config build | 80.6 ns | 234 ns (v0.9.0) | ≈66% faster |
| Shutdown coordination (1 task) | 2.87 μs | 10.2 μs (v0.9.0) | ≈72% faster |
| Error creation | 31.2 ns | 71.7 ns (v0.9.0) | ≈57% faster |
| Error cause chain | 137 ns | 539 ns (v0.9.0) | ≈75% faster |

<br>
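
The percentage column can be read as the reduction in time per operation relative to v0.9.0. The following sketch shows that arithmetic; `percent_faster` and `speedup_factor` are hypothetical helper names used only for illustration, and both inputs are assumed to be in the same unit.

```rust
/// Percentage reduction in time per operation going from `old` to `new`.
/// Both values must use the same unit (for example, both in microseconds).
fn percent_faster(old: f64, new: f64) -> f64 {
    (old - new) / old * 100.0
}

/// How many times more operations fit in the same wall-clock time.
fn speedup_factor(old: f64, new: f64) -> f64 {
    old / new
}
```

For daemon creation, `percent_faster(9.0, 2.14)` is about 76.2 and `speedup_factor(9.0, 2.14)` is about 4.2, so "76% faster" here means each init takes roughly a quarter of the time it took in v0.9.0.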
## v1.1.0 changes (non-benchmarked)
v1.1.0 didn't move the headline micro-benchmark numbers above
significantly — those code paths were already optimized through the
v1.0.0 cycle. Instead it lands two real-workload improvements that
the existing bench suite doesn't cover, plus a contention-side win
that only matters under load:
### Shutdown latency: bounded by interval → bounded by signal
Previously, the daemon main loop slept the full
`monitoring.health_check_interval_ms` (default 30 s) between
shutdown-flag checks. A shutdown initiated 1 μs after the sleep
began had to wait the full interval before the loop noticed.
v1.1.0 races the sleep against a new
`ShutdownCoordinator::wait_initiated()` broadcast wait via
`tokio::select!`. **Observable shutdown latency is now bounded by
signal propagation, not by the configured interval** — typically
sub-millisecond. This change is observable in operations (kill
times, deploy times), not in Criterion.
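
The difference between the two loop shapes can be sketched without Tokio. The following is a thread-based analogue, not the crate's implementation: `std::sync::mpsc::Receiver::recv_timeout` stands in for the `tokio::select!` race, and `Wake`, `wait_interval_or_shutdown`, and `measure_shutdown_latency` are hypothetical names chosen for illustration.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Why one iteration of the main loop stopped waiting.
#[derive(Debug, PartialEq)]
enum Wake {
    IntervalElapsed,
    Shutdown,
}

/// Wait up to `interval`, but return early if a shutdown signal arrives.
/// A dropped sender is also treated as a shutdown request.
fn wait_interval_or_shutdown(shutdown: &Receiver<()>, interval: Duration) -> Wake {
    match shutdown.recv_timeout(interval) {
        Ok(()) | Err(RecvTimeoutError::Disconnected) => Wake::Shutdown,
        Err(RecvTimeoutError::Timeout) => Wake::IntervalElapsed,
    }
}

/// Run a loop with the given health-check `interval` and report how long
/// it takes to exit when shutdown is signalled after `delay`.
fn measure_shutdown_latency(interval: Duration, delay: Duration) -> Duration {
    let (tx, rx) = mpsc::channel();
    let signaller = thread::spawn(move || {
        thread::sleep(delay);
        let _ = tx.send(());
    });

    let start = Instant::now();
    while wait_interval_or_shutdown(&rx, interval) == Wake::IntervalElapsed {
        // A periodic health check would run here.
    }
    let elapsed = start.elapsed();

    signaller.join().expect("signaller thread panicked");
    elapsed
}
```

Calling `measure_shutdown_latency(Duration::from_secs(30), Duration::from_millis(10))` should return shortly after the 10 ms delay rather than after 30 s, because the wait ends as soon as the signal arrives. A loop that used a plain sleep for the full interval would only notice the signal on its next wake-up.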
### `parking_lot` migration completed
v1.0.0-RC.1's CHANGELOG advertised a `std::sync::Mutex` →
`parking_lot::Mutex` migration. That migration only covered
`pool.rs`, `shutdown.rs`, and `metrics.rs`. v1.1.0 completes it
across `subsystem.rs`, `resources.rs`, and `daemon.rs`'s rotating
log writer.
What changes:
- **Lock acquisition** is roughly 2–3× faster under contention
(parking_lot's spin-then-park design vs the OS-direct
`pthread_mutex` / `SRWLock` path).
- **No more lock poisoning**. `parking_lot::Mutex::lock()` returns
the guard directly; this eliminates an entire class of latent
failure modes inside subsystem metadata and resource history.
- **Smaller binary**: no `PoisonError` machinery for the converted
call sites.
In Criterion micro-benchmarks this swap is barely visible (most
benches don't contend on these mutexes), but **per-log-line latency**
under high logging volume and **subsystem-metadata read latency**
under many concurrent stats queries both improve measurably in real
workloads.
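
The poisoning behaviour that the migration removes can be reproduced with the standard library alone. The sketch below uses only `std::sync::Mutex`; the function names are hypothetical and the panic is simulated, so it illustrates the failure mode rather than any code path in this crate.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Build a std mutex and poison it by panicking in another thread
/// while that thread holds the guard.
fn poisoned_counter() -> Arc<Mutex<u32>> {
    let shared = Arc::new(Mutex::new(7u32));
    let worker = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let _guard = shared.lock().unwrap();
            panic!("simulated subsystem failure while holding the lock");
        })
    };
    // The worker panicked, so join() returns Err; that is expected here.
    let _ = worker.join();
    shared
}

/// Read the value even if the mutex is poisoned, by explicitly
/// recovering the guard from the `PoisonError`.
fn read_ignoring_poison(counter: &Mutex<u32>) -> u32 {
    *counter.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}
```

After `poisoned_counter()`, every plain `lock()` on that mutex returns `Err`, and a `.lock().unwrap()` call site would panic in turn. Callers have to recover the guard explicitly, as `read_ignoring_poison` does. With `parking_lot::Mutex`, `lock()` returns the guard directly, so the converted call sites have no error branch to handle.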
### Allocations: stale doc comments + `.unwrap()` calls removed
24 `.lock().unwrap()` / `.read().unwrap()` / `.write().unwrap()`
patterns were removed when `parking_lot` was adopted. Each was a
tiny branch the compiler had to keep alive; their removal slightly
shrinks the instruction footprint of subsystem and resource code.
Not enough to register on the benchmarks, but real.
<br>
## Version Performance History
Results generated with Criterion, default configuration, Tokio
runtime, no metrics feature (unless noted).

| Version | Date | Daemon creation | Subsystem registration (3 tasks) | Config build | Shutdown coordination (1 task) | Error creation | Error chain |
|---------|------|-----------------|----------------------------------|--------------|--------------------------------|----------------|-------------|
| 1.1.1 | 2026-05-19 | (unchanged) | (unchanged) | (unchanged) | (unchanged) | (unchanged) | (unchanged) |
| 1.1.0 | 2026-05-19 | (unchanged) | (unchanged) | (unchanged) | (unchanged) | (unchanged) | (unchanged) |
| 1.0.1 | 2026-05-18 | (unchanged) | (unchanged) | (unchanged) | (unchanged) | (unchanged) | (unchanged) |
| 1.0.0 | 2026-02-23 | 2.14 μs | 3.76 μs | 80.6 ns | 2.87 μs | 31.2 ns | 137 ns |
| 0.9.0 | 2025-08-26 | 9.0 μs | 12.9 μs | 234 ns | 10.2 μs | 71.7 ns | 539 ns |

v1.0.1 was a maintenance / audit patch with no perf-relevant code
changes; v1.1.0's gains land in shutdown responsiveness and
under-contention lock paths that this bench suite doesn't exercise
(see [v1.1.0 changes](#v110-changes-non-benchmarked) above).
v1.1.1 is a docs-only patch.
## Running Benchmarks Yourself
To run the benchmark suite on your own hardware:
```bash
# Run all benchmarks with Tokio
cargo bench --bench daemon_bench --features tokio
# Run benchmarks with metrics enabled (slightly slower)
cargo bench --bench daemon_bench --features "tokio metrics"
# Verbose output (per-sample measurements)
cargo bench --bench daemon_bench --features tokio -- --verbose
```
**Note on Windows:** some syscall-related benchmarks may show higher
latencies than Linux due to Windows kernel architecture differences.
This is expected and doesn't indicate a problem.
## Performance Optimization Tips
### For Your Application
1. **Enable `mimalloc` in production** if your daemon allocates heavily:
```toml
proc-daemon = { version = "1.1.2", features = ["tokio", "mimalloc"] }
```
2. **Use `high-res-timing` for sub-microsecond operations**:
```toml
proc-daemon = { version = "1.1.2", features = ["tokio", "high-res-timing"] }
```
3. **Profile your subsystems** using `--features profiling` (Unix-only;
the feature is target-gated and is a no-op on Windows). Use
`--features heap-profiling` for cross-platform heap profiling.
4. **Enable scheduler hints on Unix** for reduced context switching:
```toml
proc-daemon = { version = "1.1.2", features = ["tokio", "scheduler-hints-unix"] }
```
5. **Use lock-free coordination** for high-frequency inter-subsystem messaging:
```toml
proc-daemon = { version = "1.1.2", features = ["tokio", "lockfree-coordination"] }
```
6. **Subscribe to `SubsystemEvent` instead of polling
`SubsystemManager::get_stats()`** when you care about sub-second
reactivity. The polling path takes a lock per query; the event
stream is push-based and lock-free under
`lockfree-coordination`.
7. **`select!` on `cancelled()` in every long-running loop** — not
just the main daemon loop. v1.1.0 fixed this for the daemon
loop, but each subsystem still owns its own shutdown latency.
If a subsystem polls something every N seconds without racing
the shutdown future, expect shutdown to take up to N seconds
for that subsystem.
### For Benchmark Fairness
- **Close background applications** before running benchmarks.
- **Disable CPU frequency scaling** in BIOS if available (use the
`performance` governor on Linux).
- **Run multiple times** and average results to account for thermal
throttling.
- **Use `cargo bench -- --verbose`** to see individual sample
measurements.
<br><br><br>
<div align="center">
<br>
<h2></h2>
<sup>COPYRIGHT <small>©</small> 2026 <strong>JAMES GOBER.</strong></sup>
</div>