# Performance
## Official benchmark — 2026-05-20
### Test environment
**DNS server**
| CPU | AMD Threadripper PRO 5995WX — 64 cores / 128 threads @ up to 4 575 MHz |
| RAM | 128 GiB DDR5 |
| NUMA | 8 nodes × 8 cores |
| L3 cache | 256 MiB (8 × 32 MiB) |
| NIC | 2× Intel X540 (ixgbe) 10 GbE fibre — LACP 802.3ad, MTU 9 000 |
| OS | Debian GNU/Linux 13 (trixie) — kernel 6.17.13-1-pve |
| Storage | NVMe (0 ms rotational) |
| Architecture | bond0.10 → br-rb + veth-rb (.250) — see [docs/proxmox.md](proxmox.md) |
**Client — Dell PowerEdge T620**
| CPU | 2× Intel Xeon E5-2690 v2 — 20 cores / 40 threads @ 3.0 / 3.6 GHz |
| RAM | 256 GB DDR3 LRDIMM ECC @ 1 866 MT/s |
| NIC | Emulex OneConnect 10 GbE (be2net) — LACP bond, fibre |
| OS | Proxmox VE — Linux 6.17.2-2-pve |
| Tool | dnsmark 0.4.5 |
| AF_XDP | not available (be2net has no AF_XDP support) |
**Common configuration (all three servers)**
| Upstream resolvers | 8.8.8.8 / 8.8.4.4 / 1.1.1.1 |
| DNSSEC | disabled |
| Query file | 40 real-world domains (pre-cached after warm-up) |
| Protocol | UDP |
### Servers under test
| Runbound | 0.4.16 | 128 OS threads (SO_REUSEPORT) — note ¹ |
| BIND9 | 9.20.21 | kernel-managed multi-thread |
| Unbound | 1.22.0 | 64 threads |
> ¹ Runbound detected 128 "physical cores" on the AMD 5995WX due to a known SMT
> topology detection bug (fix in v0.4.2). Actual physical cores: 64.
> Impact: some SMT sibling contention. Results are still competitive; corrected
> numbers expected to improve slightly once the fix is deployed.
**Runbound log level: verbosity 1 (warn).** At verbosity 2 (info), Runbound logs
every query — this adds significant CPU overhead at high QPS (confirmed: p99 under
stress goes from 0.231 ms to 3.011 ms with per-query logging enabled). BIND9 and
Unbound log nothing by default, making verbosity 1 the fair comparison baseline.
---
### Benchmark methodology
Four phases, ~10 minutes per server, identical for all three:
**Phase 1 — Warm-up** (30 s, 1 000 QPS, 1 client)
Fill the DNS cache. Stabilise the process. Discard results.
**Phase 2 — Ceiling detection** (ramp: 1 000 QPS → ×2 every 5 s)
Find the maximum sustainable QPS. Stop when the server cannot burst
to 2× the current level without packet loss > 1%.
**Phase 3 — Sustained load** (60 s, 80% of ceiling, 4 clients)
Simulate realistic production load. Measure stable latency.
**Phase 4 — Stress** (60 s, 150% of ceiling, 4 clients)
Simulate DDoS conditions. Measure degradation and crash resistance.
---
### Results
| **Runbound 0.4.16** | 128 000 | 85 116 | 0.213 ms | 105 846 | **0.231 ms** | **0.00%** |
| BIND9 9.20.21 | 128 000 | 85 149 | 0.210 ms | 105 919 | 0.225 ms | 0.00% |
| Unbound 1.22.0 | 128 000 | 85 019 | **0.078 ms** | 105 781 | **0.170 ms** | 0.00% |
All three servers hit the **client NIC ceiling** (Emulex be2net at ~128 000 QPS).
At that ceiling, sustained QPS and loss rate are statistically identical.
The latency differences are measurable but sub-millisecond across the board.
---
### Analysis
**Throughput parity:** All three servers saturate the client NIC at 128 000 QPS
with 0.00% packet loss. The bottleneck is the benchmark client, not the DNS server.
**Latency:** Unbound leads with 0.170 ms p99 under stress — a result of 20+ years
of cache optimisation in C. Runbound (0.231 ms) and BIND9 (0.225 ms) are within
60 µs of each other and within 61 µs of Unbound.
**Stability under overload:** All three servers responded correctly after 60 seconds
at 150% of the detected ceiling. No crashes, no memory leaks observed.
**What this benchmark does NOT measure:**
- AF/XDP kernel-bypass performance (requires Intel NIC client — coming in next test)
- REST API throughput
- Blacklist performance at scale (100k+ entries)
- HA master/slave replication
- Performance under random/uncached queries (cache-miss rate)
---
### Context: scale reference
For scale, the same dnsmark protocol run against a consumer router used as a DNS
resolver (unnamed — any entry-level home/SMB router is representative):
| QPS ceiling | 1 000 | 128 000 | — |
| Packet loss at ceiling | 12.02% 🔴 | 0.01% | — |
| Sustained p99 | 30.495 ms | 0.078 ms | **×390** |
| Stress p99 | 24.687 ms | 0.170 ms | **×145** |
| Throughput | ×1 | **×128** | — |
A consumer router used as a DNS resolver saturates and drops packets at 1 000 QPS.
A single desktop-class PC running Runbound, BIND9, or Unbound handles 128 000 QPS
with zero packet loss. The hardware matters more than the software at this scale.
---
### Next benchmark: AF/XDP native (Intel X540 client)
The current client (Emulex be2net) does not support AF/XDP.
A follow-up benchmark with an Intel X540-T2 client will test:
- Runbound XDP native path on ixgbe (both ends)
- Expected range: 500 000 – 14 000 000 QPS
- Both BIND9 and Unbound lack XDP support — comparison becomes one-sided
Results will be published in [docs/benchmark-xdp.md](benchmark-xdp.md) when available.
---
### Reproduce these results
```bash
# Download dnsmark
curl -LO https://github.com/redlemonbe/dnsmark/releases/latest/download/dnsmark-linux-x86_64-musl
chmod +x dnsmark-linux-x86_64-musl
# Create query file
cat > /tmp/queries.txt << 'EOF'
google.com A
cloudflare.com A
github.com A
amazon.com A
youtube.com A
twitter.com A
reddit.com A
facebook.com A
EOF
# Run benchmark (adjust IP)
./dnsmark-linux-x86_64-musl -s <SERVER_IP> --ramp -l 60 --no-tui --no-xdp \
-d /tmp/queries.txt
```
Full raw data for all phases and all three servers: [docs/benchmark-2026-05-20.md](benchmark-2026-05-20.md)