runbound 0.4.4

RFC-compliant DNS resolver — drop-in Unbound with REST API, ACME auto-TLS, HMAC audit log, and master/slave HA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
# Security Architecture

This document covers the security model, defensive layers, and all audit findings
fixed across Runbound releases through v0.4.1.

---

## Defensive layers

```
Internet / LAN
┌─────────────────────────────────────────────────────┐
│  DoT / DoH / DoQ (TLS 1.2+, TLS 1.3 for DoQ)       │  ← rustls 0.23 + ring backend
│  Optional mTLS client auth (dot-client-auth-ca)     │  ← mutual TLS for DoT
├─────────────────────────────────────────────────────┤
│  ACL check (allow / deny / refuse)                  │  ← per-subnet rules, IPv4+IPv6
│  Rate limiter (token bucket)                        │  ← per-source-IP, DashMap+ahash
│  Inflight semaphore (max 4096)                      │  ← hard OOM backstop
├─────────────────────────────────────────────────────┤
│  XDP fast path (optional)                           │  ← same ACL + rate limit enforced
├─────────────────────────────────────────────────────┤
│  DNS engine (hickory-server 0.26)                   │
│  Zone lookup / forwarding                           │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│  REST API (port 8081, localhost only)               │
│  Body size check before rate limit (Content-Length) │  ← 413 before 429
│  Bearer token (timing-safe cmp)                     │  ← subtle::ConstantTimeEq
│  Entry limits (10k DNS, 100k BL)                   │
│  zones_mutex (atomic write+swap)                    │
│  HMAC-SHA256 store integrity (.mac sidecars)        │  ← RUNBOUND_STORE_KEY
└─────────────────────────────────────────────────────┘
```

---

## ACL (Access Control List)

Rules are evaluated in order; first match wins. Default if no rule matches: **REFUSE**.

```
access-control: 127.0.0.0/8    allow
access-control: 10.0.0.0/8     allow
access-control: 0.0.0.0/0      refuse   ← secure default
```

**IPv4-mapped IPv6 normalisation (SEC-03):** Clients connecting via IPv6 as
`::ffff:10.0.0.1` are normalised to `10.0.0.1` before ACL matching, ensuring
IPv4 rules apply correctly regardless of transport.

---

## Rate limiting

Token-bucket rate limiter, one bucket per source IP.

```
rate-limit: 500    # max queries per second per IP
```

- Implemented with `DashMap<IpAddr, IpBucket>` and `ahash` for low-contention
  concurrent access.
- Excess queries receive a REFUSED response — no amplification possible.
- Shared between the standard path and the XDP fast path.
- Disable with `rate-limit: 0` (not recommended for public-facing resolvers).

---

## Anti-OOM memory protection

Runbound has two independent, always-active defences against memory exhaustion:

### 1. Inflight concurrency semaphore

Hard cap of **4,096 concurrent in-flight requests**. When the semaphore is exhausted,
new requests receive REFUSED immediately without allocating any additional memory.

### 2. Memory pressure guard

A background task reads `/proc/meminfo` every **30 seconds**. When system RAM usage
reaches **80 %**, two caches are purged atomically:

- **Rate-limiter DashMap** — all token buckets cleared.
- **hickory-resolver cache** — rebuilt and atomically swapped via ArcSwap.

After the purge, the new usage level is logged. If usage is still above 50 %, a
second warning is emitted.

On non-Linux systems or containers without `/proc/meminfo`, the guard silently
skips its check.

```
WARN Memory pressure — purging DNS caches  used_pct=82.3%  avail_mb=312  total_mb=1753
WARN DNS resolver cache flushed and rate limiter cleared  freed_buckets=8241
WARN Memory after purge  used_pct=44.1%  status="below 50% target"
```

**The memory guard is always active — no configuration required.**

---

## TLS (DoT / DoH / DoQ)

Runbound supports three encrypted DNS transports:

| Transport | Port | Standard |
|---|---|---|
| DNS-over-TLS (DoT) | 853 | RFC 7858 |
| DNS-over-HTTPS (DoH) | 443 | RFC 8484 |
| DNS-over-QUIC (DoQ) | 853/UDP | RFC 9250 |

TLS is provided by **rustls 0.23** with the **ring** cryptographic backend. DoQ
requires TLS 1.3 (`ServerConfig::builder_with_protocol_versions(&[&TLS13])`).

### Mutual TLS for DoT (mTLS)

Optionally require clients to present a certificate signed by a trusted CA:

```
dot-client-auth-ca: /etc/runbound/client-ca.pem
```

When set, unauthenticated DoT connections are rejected at the TLS handshake
before any DNS message is parsed. See [configuration.md](configuration.md) for
the full setup guide including client certificate generation.

### Certificate management

Runbound supports automatic certificate provisioning via **Let's Encrypt ACME**
(HTTP-01 challenge) and includes a `--gen-cert` utility for development
self-signed certificates.

```bash
# Generate self-signed certificate for testing
runbound --gen-cert dns.example.com

# Use Let's Encrypt in production (add to unbound.conf)
acme-email: ops@example.com
acme-domain: dns.example.com
```

---

## REST API security

**Authentication:** Bearer token via `Authorization` header. Compared using
`subtle::ConstantTimeEq` — not vulnerable to timing attacks.

**API key management:**
```bash
# Set via environment variable — never write in config files
export RUNBOUND_API_KEY="$(openssl rand -hex 32)"
```

**Body size enforcement:** `Content-Length` is checked before the rate limiter
so oversized requests return HTTP 413 (not 429). The `DefaultBodyLimit` at
64 KiB prevents OOM via large payloads.

**Entry limits:** Enforced server-side to prevent authenticated DoS:
- DNS entries: max 10,000
- Blacklist entries: max 100,000
- Feed subscriptions: max 100

**Concurrent write safety:** The entire load → validate → write → ArcSwap
sequence is performed inside `zones_mutex`. Two concurrent API writes cannot
overwrite each other.

**Input validation:**
- DNS `name` and domain-type `value` fields (CNAME, MX, NS, PTR, SRV targets)
  are validated against RFC 1035 rules: max 253 chars, labels max 63 chars,
  valid label characters only, no control characters.
- TTL must be in [0, 2147483647] RFC 2181 §8.
- All JSON deserialization failures return structured JSON error bodies with
  `{"error": "INVALID_REQUEST", "details": "..."}`.

---

## Store integrity (HMAC)

Runbound optionally protects its JSON data stores against offline tampering using
HMAC-SHA256 sidecar files.

```bash
# 64-byte hex key (minimum)
export RUNBOUND_STORE_KEY="$(openssl rand -hex 32)"
```

Protected files:
- `dns_entries.json``dns_entries.json.mac`
- `blacklist.json``blacklist.json.mac`
- `feeds.json``feeds.json.mac`
- `feed_domains_<id>.txt``feed_domains_<id>.txt.mac`

| Key set | MAC file exists | Behaviour |
|---|---|---|
| No | No | OK — HMAC disabled |
| No | Yes | WARN — orphaned sidecar, load continues |
| Yes | No | WARN — file was written without MAC, load continues |
| Yes | Yes | Verify — mismatch → ERROR, load aborted |

A HMAC mismatch on startup returns an error and refuses to load the tampered file.
Startup continues with an empty store rather than serving poisoned data.

See [configuration.md](configuration.md) for the full 4-case behaviour table.

---

## Feed security

**SSRF protection — two independent layers:**

1. **Redirect policy:** HTTP→HTTPS downgrades and redirects to private/loopback
   addresses are blocked at the reqwest level before any HTTP request is issued.

2. **Connection-layer resolver (MED-03, v0.4.0):** A custom `reqwest` DNS
   resolver (`SsrfSafeDnsResolver`) filters private, loopback, and link-local
   addresses from DNS responses *before* a TCP connection is opened. This closes
   the gap where a feed URL resolves to a public IP at subscription time but a
   later DNS update returns a private IP (DNS rebinding).

**TOCTOU re-validation:** Feed URLs are re-validated on every fetch, not just
at subscription time.

**HTTPS enforcement:** HTTP feed URLs are rejected with 400 Bad Request —
only `https://` URLs are accepted.

**Credential stripping (v0.3.3):** Feed URLs with embedded credentials
(`user:pass@host`) are rejected before any network request.

**File permissions:** Serialised feed files are written with `chmod 640` —
owner and group readable only, with HMAC sidecar integrity verification.

---

## XDP path security

**ACL enforcement in XDP (SEC-02):** The AF/XDP fast path applies the full ACL
before answering any query. `Deny` → silent drop; `Refuse` → REFUSED frame
crafted directly in the XDP worker.

---

## HA master/slave sync

The sync HTTPS server (port 8082) uses **rustls 0.23** with a TOFU
(Trust-On-First-Use) certificate pinning strategy:

- Master generates a self-signed sync certificate on first start and pins its
  SHA-256 fingerprint.
- Slave connects only to a master whose certificate matches the configured
  fingerprint.
- Sync bearer token compared with `subtle::ConstantTimeEq`.
- All write operations are blocked on slave nodes (HTTP 503 `READ_ONLY`).

---

## File permissions reference

| File | Permissions | Notes |
|---|---|---|
| `/etc/runbound/runbound.conf` | `640` | Contains no secrets when using env vars |
| `/etc/runbound/api.key` | `600` | Auto-generated API key backup |
| `/etc/runbound/key.pem` | `600` | TLS private key — never world-readable |
| `/etc/runbound/cert.pem` | `644` | TLS certificate |
| `<base_dir>/dns_entries.json` | `640` | DNS store (auto-set by Runbound) |
| `<base_dir>/blacklist.json` | `640` | Blacklist store (auto-set by Runbound) |
| `<base_dir>/feeds.json` | `640` | Feed subscriptions |
| `<base_dir>/*.mac` | `640` | HMAC sidecar files |

---

## Systemd hardening

The provided unit file applies:
- `NoNewPrivileges=yes`
- `PrivateTmp=yes`
- `ProtectSystem=strict`
- `ProtectHome=yes`
- `ProtectKernelTunables=yes`
- `CapabilityBoundingSet=CAP_NET_BIND_SERVICE` (port 53 only — no root)

See [systemd.md](systemd.md) for the full unit file.

---

## Audit findings

### v0.2.0 – v0.3.x

| ID | Severity | Title | Fixed in |
|---|---|---|---|
| SEC-01 | High | Race condition on concurrent API writes | v0.2.0 |
| SEC-02 | High | XDP fast path bypassed ACL entirely | v0.2.0 |
| SEC-03 | Medium | IPv4-mapped IPv6 skipped ACL rules | v0.2.0 |
| SEC-04 | Medium | SSRF via HTTP redirect in feed fetcher | v0.2.0 |
| SEC-05 | Medium | TOCTOU on feed URL validation | v0.2.0 |
| SEC-06 | Medium | Unbounded data-store growth | v0.2.0 |
| SEC-07 | Low | Feed data files world-readable | v0.2.0 |
| SEC-08 | Low | Plaintext HTTP feeds accepted silently | v0.2.0 |
| SEC-09 | High | `POST /rotate-key` was a silent no-op | v0.3.3 |
| SEC-10 | Medium | CHAOS class queries returned NOERROR instead of NOTIMP | v0.3.3 |
| SEC-11 | Medium | Body limit dropped TCP instead of returning HTTP 413 | v0.3.3 |
| SEC-12 | Medium | Negative TTL caused panic instead of HTTP 422 | v0.3.3 |
| SEC-13 | Medium | Production `unwrap()` / `expect()` could crash the process | v0.3.3 |
| SEC-14 | Medium | Sync Bearer comparison was timing-vulnerable | v0.3.3 |
| SEC-15 | Low | Feed URLs with embedded credentials were not rejected | v0.3.3 |
| SEC-16 | Low | `rate-limit: u64::MAX` silently disabled rate limiting | v0.3.3 |

### v0.4.0

| ID | Severity | Title | Fixed in |
|---|---|---|---|
| HIGH-01 | High | Auth bypass — 7 attack vectors accepted unauthenticated | v0.4.0 |
| HIGH-02 | High | Timing oracle on API key comparison | v0.4.0 |
| HIGH-03 | High | DNS injection via unvalidated name/value fields | v0.4.0 |
| HIGH-04 | High | ANY amplification not blocked | v0.4.0 |
| HIGH-05 | High | AXFR zone transfer not refused | v0.4.0 |
| HIGH-06 | High | No integrity protection on data stores | v0.4.0 |
| MED-01 | Medium | Per-IP rate limit on API missing | v0.4.0 |
| MED-02 | Medium | `local-zone` / `local-data` count unbounded in config | v0.4.0 |
| MED-03 | Medium | SSRF via DNS rebinding not blocked at connection layer | v0.4.0 |
| MED-04 | Medium | Audit log HMAC not enforced | v0.4.0 |
| MED-05 | Medium | DoT/DoH TLS upgrade to rustls 0.23 (CVE exposure) | v0.4.0 |
| LOW-01 | Low | Client IP logged for all queries (privacy) | v0.4.0 |
| LOW-02 | Low | Log buffer unbounded growth | v0.4.0 |
| LOW-03 | Low | Config cap on local-zone / local-data directives missing | v0.4.0 |
| LOW-04 | Low | Sync certificate not pinned (TOFU gap) | v0.4.0 |
| LOW-05 | Low | Control characters in log fields not sanitised | v0.4.0 |

### v0.4.1

| ID | Severity | Title | Fixed in |
|---|---|---|---|
| BUG-01 | Blocking | Sync HTTPS server panic (CryptoProvider not installed) | v0.4.1 |
| S-10 | Medium | CNAME/MX/NS/PTR/SRV target values accepted beyond 253 chars | v0.4.1 |
| S-11 | Low | 1 MB body returned 429 instead of 413 (rate limit fired first) | v0.4.1 |
| Q-01 | Low | POST /dns invalid type → HTTP 422 non-JSON body | v0.4.1 |
| Q-02 | Low | POST /blacklist invalid action → HTTP 422 non-JSON body | v0.4.1 |
| Q-03 | Low | POST /rotate-key non-string type → HTTP 422 non-JSON body | v0.4.1 |
| Q-04 | Low | GET /logs?page=-1 → HTTP 400 non-JSON body | v0.4.1 |

See [security-audit.md](security-audit.md) for the full white-box audit report.

---

## HSM key storage (PKCS#11)

Runbound supports loading the REST API key and the JSON store HMAC key from a
Hardware Security Module via PKCS#11. When active, keys are physically non-extractable
from the hardware and never written to disk in plaintext.

**Key priority chain (highest to lowest):**

| Source | API key | Store key |
|---|:---:|:---:|
| HSM (`hsm-api-key-label`) |||
| Env var (`RUNBOUND_API_KEY` / `RUNBOUND_STORE_KEY`) |||
| Config file (`api-key:`) |||
| Auto-generated (CSPRNG) |||

When `hsm-pkcs11-lib` is set and key loading fails, Runbound exits immediately —
no silent fallback to env vars.

The HSM session is opened at startup, keys are extracted into `Zeroizing<T>` buffers
(memory is scrubbed on drop), and the session is closed. The HSM does not need to
remain connected during normal operation.

**Tested devices:** SoftHSM2 (dev), YubiHSM 2 (recommended, FIPS 140-2 L3), Nitrokey
HSM 2, AWS CloudHSM, Thales Luna.

→ Full setup guide: [docs/hsm.md](hsm.md)

---

## Supply chain & audit

Runbound enforces supply-chain security at three levels:

**1. CVE scanning (`cargo audit`)**  
Every dependency is checked against the [RustSec advisory database](https://rustsec.org/)
at each release. The gate is `--deny warnings`: any known vulnerability blocks the
release, with no exceptions.

**2. Licence and ban policy (`cargo deny`)**  
`deny.toml` enforces the licence whitelist (MIT / Apache-2.0 / BSD / ISC / Zlib)
and blocks GPL-2.0 and LGPL-without-exception, which are incompatible with
Runbound's AGPL-3.0 / commercial dual-licence model and with Rust static linking.
Wildcard version requirements are banned; only crates.io sources are allowed.

**3. SBOM (Software Bill of Materials)**  
A `sbom.cdx.json` file (CycloneDX 1.4 format) listing all transitive dependencies
with version, hash, and licence is attached to every GitHub release. Enterprise
customers and security auditors can use it to verify the full dependency tree
without rebuilding.

```bash
make audit       # cargo audit --deny warnings
make deny        # cargo deny check (licence + advisory + ban policy)
make sbom        # generate sbom.cdx.json
make audit-full  # all three + cargo outdated
```

→ Full audit process, release procedure, and manual review areas: [docs/audit.md](audit.md)

---

## Reporting a vulnerability

Send a report to **redlemonbe@codix.be** with subject line `[SECURITY] Runbound`.
Please include a description of the vulnerability, reproduction steps, and
your assessment of its impact. We aim to respond within 48 hours.

Do not open a public GitHub issue for security vulnerabilities.