# Security Architecture

This document covers the security model, defensive layers, and all audit findings
fixed across Runbound releases through v0.4.1.

---

## Defensive layers

```
Internet / LAN
┌─────────────────────────────────────────────────────┐
│  DoT / DoH / DoQ (TLS 1.2+, TLS 1.3 for DoQ)       │  ← rustls 0.23 + ring backend
│  Optional mTLS client auth (dot-client-auth-ca)     │  ← mutual TLS for DoT
├─────────────────────────────────────────────────────┤
│  ACL check (allow / deny / refuse)                  │  ← per-subnet rules, IPv4+IPv6
│  Rate limiter (token bucket)                        │  ← per-source-IP, DashMap+ahash
│  Inflight semaphore (max 4096)                      │  ← hard OOM backstop
├─────────────────────────────────────────────────────┤
│  XDP fast path (optional)                           │  ← same ACL + rate limit enforced
├─────────────────────────────────────────────────────┤
│  DNS engine (hickory-server 0.26)                   │
│  Zone lookup / forwarding                           │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│  REST API (port 8081, localhost only)               │
│  Body size check before rate limit (Content-Length) │  ← 413 before 429
│  Bearer token (timing-safe cmp)                     │  ← subtle::ConstantTimeEq
│  Entry limits (10k DNS, 100k BL)                   │
│  zones_mutex (atomic write+swap)                    │
│  HMAC-SHA256 store integrity (.mac sidecars)        │  ← RUNBOUND_STORE_KEY
└─────────────────────────────────────────────────────┘
```

---

## ACL (Access Control List)

Rules are evaluated in order; first match wins. Default if no rule matches: **REFUSE**.

```
access-control: 127.0.0.0/8    allow
access-control: 10.0.0.0/8     allow
access-control: 0.0.0.0/0      refuse   # secure default
```

**IPv4-mapped IPv6 normalisation (SEC-03):** Clients connecting via IPv6 as
`::ffff:10.0.0.1` are normalised to `10.0.0.1` before ACL matching, ensuring
IPv4 rules apply correctly regardless of transport.
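
The evaluation order and the SEC-03 normalisation can be sketched as follows. This is a minimal illustration using only the standard library, not Runbound's actual code: the `Rule`, `Action`, `normalise`, and `evaluate` names are hypothetical, and native IPv6 rules are left out for brevity.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Outcome of an ACL lookup (names mirror the config keywords).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Allow,
    Deny,
    Refuse,
}

/// One IPv4 `access-control` rule (hypothetical layout).
pub struct Rule {
    pub net: Ipv4Addr,
    pub prefix: u8,
    pub action: Action,
}

/// Collapse `::ffff:a.b.c.d` to plain IPv4 so IPv4 rules apply to it.
pub fn normalise(ip: IpAddr) -> IpAddr {
    match ip {
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => IpAddr::V4(v4),
            None => IpAddr::V6(v6),
        },
        v4 => v4,
    }
}

fn in_subnet(ip: Ipv4Addr, net: Ipv4Addr, prefix: u8) -> bool {
    // A /0 prefix matches everything; shifting a u32 by 32 would overflow.
    let mask = if prefix == 0 {
        0
    } else {
        u32::MAX << (32 - u32::from(prefix.min(32)))
    };
    (u32::from(ip) & mask) == (u32::from(net) & mask)
}

/// First matching rule wins; no match falls through to Refuse.
pub fn evaluate(rules: &[Rule], client: IpAddr) -> Action {
    if let IpAddr::V4(ip) = normalise(client) {
        for rule in rules {
            if in_subnet(ip, rule.net, rule.prefix) {
                return rule.action;
            }
        }
    }
    // Native IPv6 rules are omitted from this sketch.
    Action::Refuse
}
```

With the three rules from the example above, a client seen as `::ffff:10.0.0.1` would match the `10.0.0.0/8` rule, while an empty rule list refuses everyone.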

---

## Rate limiting

Token-bucket rate limiter, one bucket per source IP.

```
rate-limit: 500    # max queries per second per IP
```

- Implemented with `DashMap<IpAddr, IpBucket>` and `ahash` for low-contention
  concurrent access.
- Excess queries receive a REFUSED response, which is about the size of the query
  itself and so offers no useful amplification.
- Shared between the standard path and the XDP fast path.
- Disable with `rate-limit: 0` (not recommended for public-facing resolvers).
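
A token bucket of this kind can be sketched as below. This is a single-threaded stand-in built on `HashMap` rather than the `DashMap` + `ahash` combination named above, and it assumes the burst capacity equals the per-second rate, which the source does not state. The `RateLimiter` type and its methods are hypothetical; only the `IpBucket` name comes from this document. Time is passed in explicitly so the behaviour is easy to test.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::Instant;

/// Per-IP bucket: remaining tokens and the time of the last refill.
struct IpBucket {
    tokens: f64,
    last: Instant,
}

/// Single-threaded stand-in for the concurrent limiter.
pub struct RateLimiter {
    rate: f64, // tokens added per second; assumed to be the burst size too
    buckets: HashMap<IpAddr, IpBucket>,
}

impl RateLimiter {
    pub fn new(rate_per_sec: u32) -> Self {
        Self { rate: f64::from(rate_per_sec), buckets: HashMap::new() }
    }

    /// Returns true if the query may proceed, false if it should get REFUSED.
    pub fn allow(&mut self, ip: IpAddr, now: Instant) -> bool {
        if self.rate == 0.0 {
            return true; // `rate-limit: 0` disables limiting
        }
        let rate = self.rate;
        let bucket = self
            .buckets
            .entry(ip)
            .or_insert(IpBucket { tokens: rate, last: now });
        // Refill in proportion to elapsed time, capped at the bucket size.
        let elapsed = now.saturating_duration_since(bucket.last).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * rate).min(rate);
        bucket.last = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Because each source IP owns its bucket, one noisy client cannot consume another client's budget.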

---

## Anti-OOM memory protection

Runbound has two independent, always-active defences against memory exhaustion:

### 1. Inflight concurrency semaphore

Hard cap of **4,096 concurrent in-flight requests**. When the semaphore is exhausted,
new requests receive REFUSED immediately without allocating any additional memory.
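
The non-blocking behaviour described here can be illustrated with an atomic counter. The source only says "semaphore", so treat this as a sketch of the idea rather than the real implementation: the `Inflight` and `Permit` types are hypothetical, and an async runtime's semaphore would serve the same purpose.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counter-based limiter: a stand-in for an async semaphore.
pub struct Inflight {
    active: AtomicUsize,
    max: usize,
}

/// RAII permit; dropping it releases the slot.
pub struct Permit(Arc<Inflight>);

impl Inflight {
    pub fn new(max: usize) -> Arc<Self> {
        Arc::new(Self { active: AtomicUsize::new(0), max })
    }

    /// Non-blocking acquire: `None` means "answer REFUSED now".
    pub fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let max = self.max;
        self.active
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < max { Some(n + 1) } else { None }
            })
            .ok()
            .map(|_| Permit(Arc::clone(self)))
    }

    /// Number of requests currently holding a permit.
    pub fn in_use(&self) -> usize {
        self.active.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.0.active.fetch_sub(1, Ordering::AcqRel);
    }
}
```

A request handler would call `try_acquire()` on a limiter created with `Inflight::new(4096)` and hold the permit until the response is sent. The failure path only touches the counter and creates no permit.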

### 2. Memory pressure guard

A background task reads `/proc/meminfo` every **30 seconds**. When system RAM usage
reaches **80 %**, two caches are purged atomically:

- **Rate-limiter DashMap** — all token buckets cleared.
- **hickory-resolver cache** — rebuilt and atomically swapped via ArcSwap.

After the purge, the new usage level is logged. If usage is still above 50 %, a
second warning is emitted.

On non-Linux systems or containers without `/proc/meminfo`, the guard silently
skips its check.

```
WARN Memory pressure — purging DNS caches  used_pct=82.3%  avail_mb=312  total_mb=1753
WARN DNS resolver cache flushed and rate limiter cleared  freed_buckets=8241
WARN Memory after purge  used_pct=44.1%  status="below 50% target"
```
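
The check itself can be sketched as a pure function over the text of `/proc/meminfo`. This assumes usage is computed as `(MemTotal - MemAvailable) / MemTotal`, which is consistent with the sample log line above but is not confirmed by the source. The function and type names are hypothetical.

```rust
/// Extract a `kB` field such as `MemTotal` from /proc/meminfo text.
fn field_kb(meminfo: &str, key: &str) -> Option<u64> {
    meminfo.lines().find_map(|line| {
        let rest = line.strip_prefix(key)?.strip_prefix(':')?;
        rest.split_whitespace().next()?.parse().ok()
    })
}

/// Percentage of RAM in use, or None if the fields are missing.
pub fn used_pct(meminfo: &str) -> Option<f64> {
    let total = field_kb(meminfo, "MemTotal")?;
    let avail = field_kb(meminfo, "MemAvailable")?;
    if total == 0 {
        return None;
    }
    Some(100.0 * total.saturating_sub(avail) as f64 / total as f64)
}

/// What the guard should do for a given reading.
#[derive(Debug, PartialEq)]
pub enum Decision {
    Skip,   // no /proc/meminfo (non-Linux, restricted container)
    Normal, // below the 80 % threshold
    Purge,  // at or above 80 %: clear buckets and resolver cache
}

pub fn decide(meminfo: Option<&str>) -> Decision {
    match meminfo.and_then(used_pct) {
        None => Decision::Skip,
        Some(p) if p >= 80.0 => Decision::Purge,
        Some(_) => Decision::Normal,
    }
}
```

A background task would pass `std::fs::read_to_string("/proc/meminfo").ok().as_deref()` to `decide` on each tick, so a missing file maps to `Skip`.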

**The memory guard is always active — no configuration required.**

---

## TLS (DoT / DoH / DoQ)

Runbound supports three encrypted DNS transports:

| Transport | Port | Standard |
|---|---|---|
| DNS-over-TLS (DoT) | 853 | RFC 7858 |
| DNS-over-HTTPS (DoH) | 443 | RFC 8484 |
| DNS-over-QUIC (DoQ) | 853/UDP | RFC 9250 |

TLS is provided by **rustls 0.23** with the **ring** cryptographic backend. DoQ
requires TLS 1.3 (`ServerConfig::builder_with_protocol_versions(&[&TLS13])`).

### Mutual TLS for DoT (mTLS)

Optionally require clients to present a certificate signed by a trusted CA:

```
dot-client-auth-ca: /etc/runbound/client-ca.pem
```

When set, unauthenticated DoT connections are rejected at the TLS handshake
before any DNS message is parsed. See [configuration.md](configuration.md) for
the full setup guide including client certificate generation.

### Certificate management

Runbound supports automatic certificate provisioning via **Let's Encrypt ACME**
(HTTP-01 challenge) and includes a `--gen-cert` utility for development
self-signed certificates.

```bash
# Generate self-signed certificate for testing
runbound --gen-cert dns.example.com

# Use Let's Encrypt in production: these are config directives, not shell commands (add to unbound.conf)
acme-email: ops@example.com
acme-domain: dns.example.com
```

---

## REST API security

**Authentication:** Bearer token via `Authorization` header. Compared using
`subtle::ConstantTimeEq` — not vulnerable to timing attacks.
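
To show what "timing-safe" means in practice, here is a hand-written comparison that avoids an early exit. It is an illustration of the idea only. It is not the `subtle` API, and a compiler may still optimise code like this, which is one reason a vetted crate is preferable in production. All function names are hypothetical.

```rust
/// Compare two byte strings without an early exit on the first mismatch.
/// Illustration only: prefer a vetted crate such as `subtle` in real code.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // the length is not treated as secret here
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences instead of branching
    }
    diff == 0
}

/// Extract the token from an `Authorization: Bearer <token>` header value.
pub fn bearer_token(header: &str) -> Option<&str> {
    header.strip_prefix("Bearer ")
}

/// True only when a Bearer token is present and matches the expected key.
pub fn authorised(header: Option<&str>, expected: &str) -> bool {
    match header.and_then(bearer_token) {
        Some(tok) => ct_eq(tok.as_bytes(), expected.as_bytes()),
        None => false,
    }
}
```

A plain `==` on strings can return as soon as one byte differs, which leaks how many leading bytes were correct. The loop above always walks the full length.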

**API key management:**
```bash
# Set via environment variable — never write in config files
export RUNBOUND_API_KEY="$(openssl rand -hex 32)"
```

**Body size enforcement:** `Content-Length` is checked before the rate limiter
so oversized requests return HTTP 413 (not 429). The `DefaultBodyLimit` at
64 KiB prevents OOM via large payloads.
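
The ordering matters more than the checks themselves, so a tiny sketch may help. The `early_status` function and its closure parameter are hypothetical. The real server presumably does this in middleware layers.

```rust
/// 64 KiB, matching the body limit quoted above.
pub const MAX_BODY: u64 = 64 * 1024;

/// Decide the early-rejection status for a request, if any.
/// `rate_ok` stands in for the rate limiter and is only consulted
/// when the declared body size is acceptable.
pub fn early_status(
    content_length: Option<u64>,
    rate_ok: impl FnOnce() -> bool,
) -> Option<u16> {
    // Size first: an oversized body is rejected without spending a token.
    if content_length.map_or(false, |len| len > MAX_BODY) {
        return Some(413);
    }
    if !rate_ok() {
        return Some(429);
    }
    None // continue to authentication and the handler
}
```

With the checks in the opposite order, a client that is already rate limited would see 429 for a 1 MB body, which is the behaviour listed as S-11 in the audit findings.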

**Entry limits:** Enforced server-side to prevent authenticated DoS:
- DNS entries: max 10,000
- Blacklist entries: max 100,000
- Feed subscriptions: max 100

**Concurrent write safety:** The entire load → validate → write → ArcSwap
sequence is performed inside `zones_mutex`. Two concurrent API writes cannot
overwrite each other.
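
The write path and the entry limit can be sketched together. This stand-in uses `Mutex<()>` for the role of `zones_mutex` and `RwLock<Arc<_>>` in place of ArcSwap, stores entries as plain strings, and uses `unwrap()` on lock results to stay short. The `Store` type, its methods, and the `LIMIT_REACHED` error string are hypothetical.

```rust
use std::sync::{Arc, Mutex, RwLock};

pub const MAX_DNS_ENTRIES: usize = 10_000;

pub struct Store {
    write_lock: Mutex<()>,             // plays the role of zones_mutex
    current: RwLock<Arc<Vec<String>>>, // stand-in for the ArcSwap snapshot
}

impl Store {
    pub fn new() -> Self {
        Self {
            write_lock: Mutex::new(()),
            current: RwLock::new(Arc::new(Vec::new())),
        }
    }

    /// Readers take a cheap snapshot that later writes never mutate.
    pub fn snapshot(&self) -> Arc<Vec<String>> {
        self.current.read().unwrap().clone()
    }

    /// Load, validate, and swap while holding the write lock.
    pub fn add(&self, entry: &str) -> Result<usize, &'static str> {
        let _guard = self.write_lock.lock().unwrap();
        let mut next = (*self.snapshot()).clone();
        if next.len() >= MAX_DNS_ENTRIES {
            return Err("LIMIT_REACHED");
        }
        next.push(entry.to_string());
        let len = next.len();
        // Persisting to disk would happen here, before the swap.
        *self.current.write().unwrap() = Arc::new(next);
        Ok(len)
    }
}
```

Because the copy, the limit check, and the swap all happen under one lock, a second writer always starts from the first writer's result rather than from a stale copy.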

**Input validation:**
- DNS `name` and domain-type `value` fields (CNAME, MX, NS, PTR, SRV targets)
  are validated against RFC 1035 rules: max 253 chars, labels max 63 chars,
  valid label characters only, no control characters.
- TTL must be in [0, 2147483647] (RFC 2181 §8).
- All JSON deserialization failures return structured JSON error bodies with
  `{"error": "INVALID_REQUEST", "details": "..."}`.
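
The name and TTL rules above could be checked along these lines. The exact character set Runbound accepts is not spelled out here, so this sketch assumes letters, digits, hyphens, and underscores (the latter for SRV-style labels such as `_sip._tcp`). The function names are hypothetical.

```rust
/// Validate a DNS name: at most 253 chars in total, each label 1 to 63 chars.
/// Assumed character set: ASCII letters, digits, '-' and '_'.
pub fn valid_name(name: &str) -> bool {
    // Accept one trailing dot (fully qualified form).
    let name = name.strip_suffix('.').unwrap_or(name);
    if name.is_empty() || name.len() > 253 {
        return false;
    }
    name.split('.').all(|label| {
        (1..=63).contains(&label.len())
            && label
                .bytes()
                .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
    })
}

/// TTL range check on a signed input, so negative values are rejected
/// rather than wrapped into a large unsigned number.
pub fn valid_ttl(ttl: i64) -> bool {
    (0..=2_147_483_647).contains(&ttl)
}
```

Restricting labels to an ASCII allow-list also rules out control characters, since none of them are in the accepted set.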

---

## Store integrity (HMAC)

Runbound optionally protects its JSON data stores against offline tampering using
HMAC-SHA256 sidecar files.

```bash
# 32 random bytes, hex-encoded as 64 characters (minimum)
export RUNBOUND_STORE_KEY="$(openssl rand -hex 32)"
```

Protected files:
- `dns_entries.json` → `dns_entries.json.mac`
- `blacklist.json` → `blacklist.json.mac`
- `feeds.json` → `feeds.json.mac`
- `feed_domains_<id>.txt` → `feed_domains_<id>.txt.mac`

| Key set | MAC file exists | Behaviour |
|---|---|---|
| No | No | OK — HMAC disabled |
| No | Yes | WARN — orphaned sidecar, load continues |
| Yes | No | WARN — file was written without MAC, load continues |
| Yes | Yes | Verify — mismatch → ERROR, load aborted |

An HMAC mismatch at startup is logged as an error and the tampered file is not loaded.
Startup then continues with an empty store rather than serving poisoned data.
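
The four cases in the table map naturally onto a `match`. The sketch below covers only the decision logic, not the HMAC computation, and the `LoadOutcome` type and function names are hypothetical.

```rust
/// What the loader should do before reading a store file.
#[derive(Debug, PartialEq, Eq)]
pub enum LoadOutcome {
    /// HMAC disabled, nothing to check.
    Load,
    /// Load the file but emit a WARN with this reason.
    LoadWithWarning(&'static str),
    /// Compare the MAC; a mismatch aborts the load of this file.
    Verify,
}

pub fn sidecar_policy(key_set: bool, mac_exists: bool) -> LoadOutcome {
    match (key_set, mac_exists) {
        (false, false) => LoadOutcome::Load,
        (false, true) => LoadOutcome::LoadWithWarning("orphaned sidecar"),
        (true, false) => LoadOutcome::LoadWithWarning("file was written without MAC"),
        (true, true) => LoadOutcome::Verify,
    }
}

/// Sidecar naming convention: append `.mac` to the store file name.
pub fn sidecar_name(store_file: &str) -> String {
    format!("{store_file}.mac")
}
```

Matching on the tuple makes the compiler check that all four combinations are handled.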

See [configuration.md](configuration.md) for the full 4-case behaviour table.

---

## Feed security

**SSRF protection — two independent layers:**

1. **Redirect policy:** HTTPS→HTTP downgrades and redirects to private/loopback
   addresses are blocked at the reqwest level before the redirected request is issued.

2. **Connection-layer resolver (MED-03, v0.4.0):** A custom `reqwest` DNS
   resolver (`SsrfSafeDnsResolver`) filters private, loopback, and link-local
   addresses from DNS responses *before* a TCP connection is opened. This closes
   the gap where a feed URL resolves to a public IP at subscription time but a
   later DNS update returns a private IP (DNS rebinding).
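
The address filter at the heart of the second layer might look like the sketch below. It shows only the classification step, not the `reqwest` resolver integration, and the function names are hypothetical. Treating unspecified addresses and IPv4-mapped IPv6 as forbidden is an extra assumption beyond the three categories named above.

```rust
use std::net::IpAddr;

/// True if a resolved address must not be dialled by the feed fetcher.
pub fn is_forbidden(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_private() || v4.is_loopback() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => {
            // Judge IPv4-mapped addresses by their embedded IPv4 address.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_forbidden(IpAddr::V4(v4));
            }
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (first & 0xfe00) == 0xfc00 // fc00::/7 unique local
                || (first & 0xffc0) == 0xfe80 // fe80::/10 link-local
        }
    }
}

/// Keep only the addresses that are safe to connect to.
pub fn filter_resolved(addrs: Vec<IpAddr>) -> Vec<IpAddr> {
    addrs.into_iter().filter(|ip| !is_forbidden(*ip)).collect()
}
```

Filtering at resolution time means the check runs on the exact addresses the connection will use, so a hostname that flips to `169.254.169.254` between fetches yields no usable address.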

**TOCTOU re-validation:** Feed URLs are re-validated on every fetch, not just
at subscription time.

**HTTPS enforcement:** HTTP feed URLs are rejected with 400 Bad Request —
only `https://` URLs are accepted.

**Credential stripping (v0.3.3):** Feed URLs with embedded credentials
(`user:pass@host`) are rejected before any network request.
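
The HTTPS and credential checks can be sketched with plain string handling. A real implementation would more likely rely on a URL parsing crate. The `check_feed_url` function and `FeedUrlError` type are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum FeedUrlError {
    NotHttps,
    EmbeddedCredentials,
    EmptyHost,
}

/// Reject feed URLs that are not HTTPS or that carry `user:pass@`.
pub fn check_feed_url(url: &str) -> Result<(), FeedUrlError> {
    // Scheme comparison is case-insensitive.
    let rest = match url.get(..8) {
        Some(scheme) if scheme.eq_ignore_ascii_case("https://") => &url[8..],
        _ => return Err(FeedUrlError::NotHttps),
    };
    // The authority component ends at the first '/', '?' or '#'.
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let authority = &rest[..end];
    if authority.contains('@') {
        return Err(FeedUrlError::EmbeddedCredentials);
    }
    if authority.is_empty() {
        return Err(FeedUrlError::EmptyHost);
    }
    Ok(())
}
```

Only the authority part is inspected for `@`, so an `@` inside the path or query string does not trigger a false rejection.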

**File permissions:** Serialised feed files are written with mode `640`
(owner read/write, group read-only, no access for others), with HMAC sidecar
integrity verification.

---

## XDP path security

**ACL enforcement in XDP (SEC-02):** The AF_XDP fast path applies the full ACL
before answering any query. `Deny` → silent drop; `Refuse` → REFUSED frame
crafted directly in the XDP worker.
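
For the `Refuse` case, the DNS payload of such a frame can be derived from the query itself. The sketch below covers only the DNS message (not the Ethernet, IP, or UDP headers the worker would also need to rewrite) and handles only single-question queries. The `refused_reply` function is hypothetical.

```rust
/// Build a REFUSED reply for a single-question DNS query, or None if the
/// packet is malformed (in which case the caller would simply drop it).
pub fn refused_reply(query: &[u8]) -> Option<Vec<u8>> {
    const HEADER: usize = 12;
    if query.len() < HEADER || (query[2] & 0x80) != 0 {
        return None; // too short, or already a response
    }
    if u16::from_be_bytes([query[4], query[5]]) != 1 {
        return None; // only QDCOUNT == 1 is handled
    }
    // Walk the QNAME labels to find where the question ends.
    let mut pos = HEADER;
    loop {
        let len = *query.get(pos)? as usize;
        if len == 0 {
            pos += 1;
            break;
        }
        if len > 63 {
            return None; // compression pointers are not expected here
        }
        pos += 1 + len;
    }
    let end = pos.checked_add(4)?; // QTYPE + QCLASS
    if end > query.len() {
        return None;
    }
    // Echo header and question only; any additional records are cut off.
    let mut reply = query[..end].to_vec();
    reply[2] = 0x80 | (query[2] & 0x79); // QR=1, keep opcode and RD, clear AA/TC
    reply[3] = 0x05; // RA=0, Z=0, RCODE=5 (REFUSED)
    reply[6..12].fill(0); // ANCOUNT, NSCOUNT, ARCOUNT = 0
    Some(reply)
}
```

The reply is never longer than the query, since it reuses the header and question and drops everything after them.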

---

## HA master/slave sync

The sync HTTPS server (port 8082) uses **rustls 0.23** with a TOFU
(Trust-On-First-Use) certificate pinning strategy:

- Master generates a self-signed sync certificate on first start and pins its
  SHA-256 fingerprint.
- Slave connects only to a master whose certificate matches the configured
  fingerprint.
- Sync bearer token compared with `subtle::ConstantTimeEq`.
- All write operations are blocked on slave nodes (HTTP 503 `READ_ONLY`).

---

## File permissions reference

| File | Permissions | Notes |
|---|---|---|
| `/etc/runbound/runbound.conf` | `640` | Contains no secrets when using env vars |
| `/etc/runbound/api.key` | `600` | Auto-generated API key backup |
| `/etc/runbound/key.pem` | `600` | TLS private key — never world-readable |
| `/etc/runbound/cert.pem` | `644` | TLS certificate |
| `<base_dir>/dns_entries.json` | `640` | DNS store (auto-set by Runbound) |
| `<base_dir>/blacklist.json` | `640` | Blacklist store (auto-set by Runbound) |
| `<base_dir>/feeds.json` | `640` | Feed subscriptions |
| `<base_dir>/*.mac` | `640` | HMAC sidecar files |

---

## Systemd hardening

The provided unit file applies:
- `NoNewPrivileges=yes`
- `PrivateTmp=yes`
- `ProtectSystem=strict`
- `ProtectHome=yes`
- `ProtectKernelTunables=yes`
- `CapabilityBoundingSet=CAP_NET_BIND_SERVICE` (bind ports below 1024, such as 53, 443, and 853, without running as root)

See [systemd.md](systemd.md) for the full unit file.

---

## Audit findings

### v0.2.0 – v0.3.x

| ID | Severity | Title | Fixed in |
|---|---|---|---|
| SEC-01 | High | Race condition on concurrent API writes | v0.2.0 |
| SEC-02 | High | XDP fast path bypassed ACL entirely | v0.2.0 |
| SEC-03 | Medium | IPv4-mapped IPv6 skipped ACL rules | v0.2.0 |
| SEC-04 | Medium | SSRF via HTTP redirect in feed fetcher | v0.2.0 |
| SEC-05 | Medium | TOCTOU on feed URL validation | v0.2.0 |
| SEC-06 | Medium | Unbounded data-store growth | v0.2.0 |
| SEC-07 | Low | Feed data files world-readable | v0.2.0 |
| SEC-08 | Low | Plaintext HTTP feeds accepted silently | v0.2.0 |
| SEC-09 | High | `POST /rotate-key` was a silent no-op | v0.3.3 |
| SEC-10 | Medium | CHAOS class queries returned NOERROR instead of NOTIMP | v0.3.3 |
| SEC-11 | Medium | Body limit dropped TCP instead of returning HTTP 413 | v0.3.3 |
| SEC-12 | Medium | Negative TTL caused panic instead of HTTP 422 | v0.3.3 |
| SEC-13 | Medium | Production `unwrap()` / `expect()` could crash the process | v0.3.3 |
| SEC-14 | Medium | Sync Bearer comparison was timing-vulnerable | v0.3.3 |
| SEC-15 | Low | Feed URLs with embedded credentials were not rejected | v0.3.3 |
| SEC-16 | Low | `rate-limit: u64::MAX` silently disabled rate limiting | v0.3.3 |

### v0.4.0

| ID | Severity | Title | Fixed in |
|---|---|---|---|
| HIGH-01 | High | Auth bypass — 7 attack vectors accepted unauthenticated | v0.4.0 |
| HIGH-02 | High | Timing oracle on API key comparison | v0.4.0 |
| HIGH-03 | High | DNS injection via unvalidated name/value fields | v0.4.0 |
| HIGH-04 | High | ANY amplification not blocked | v0.4.0 |
| HIGH-05 | High | AXFR zone transfer not refused | v0.4.0 |
| HIGH-06 | High | No integrity protection on data stores | v0.4.0 |
| MED-01 | Medium | Per-IP rate limit on API missing | v0.4.0 |
| MED-02 | Medium | `local-zone` / `local-data` count unbounded in config | v0.4.0 |
| MED-03 | Medium | SSRF via DNS rebinding not blocked at connection layer | v0.4.0 |
| MED-04 | Medium | Audit log HMAC not enforced | v0.4.0 |
| MED-05 | Medium | DoT/DoH TLS upgrade to rustls 0.23 (CVE exposure) | v0.4.0 |
| LOW-01 | Low | Client IP logged for all queries (privacy) | v0.4.0 |
| LOW-02 | Low | Log buffer unbounded growth | v0.4.0 |
| LOW-03 | Low | Config cap on local-zone / local-data directives missing | v0.4.0 |
| LOW-04 | Low | Sync certificate not pinned (TOFU gap) | v0.4.0 |
| LOW-05 | Low | Control characters in log fields not sanitised | v0.4.0 |

### v0.4.1

| ID | Severity | Title | Fixed in |
|---|---|---|---|
| BUG-01 | Blocking | Sync HTTPS server panic (CryptoProvider not installed) | v0.4.1 |
| S-10 | Medium | CNAME/MX/NS/PTR/SRV target values accepted beyond 253 chars | v0.4.1 |
| S-11 | Low | 1 MB body returned 429 instead of 413 (rate limit fired first) | v0.4.1 |
| Q-01 | Low | POST /dns invalid type → HTTP 422 non-JSON body | v0.4.1 |
| Q-02 | Low | POST /blacklist invalid action → HTTP 422 non-JSON body | v0.4.1 |
| Q-03 | Low | POST /rotate-key non-string type → HTTP 422 non-JSON body | v0.4.1 |
| Q-04 | Low | GET /logs?page=-1 → HTTP 400 non-JSON body | v0.4.1 |

See [security-audit.md](security-audit.md) for the full white-box audit report.

---

## Reporting a vulnerability

Send a report to **redlemonbe@codix.be** with subject line `[SECURITY] Runbound`.
Please include a description of the vulnerability, reproduction steps, and
your assessment of its impact. We aim to respond within 48 hours.

Do not open a public GitHub issue for security vulnerabilities.