sozu-lib 2.0.2

sozu library to build hot reconfigurable HTTP reverse proxies
# PROXY Protocol — Session Workflow

Reference document for maintainers of `lib/src/protocol/proxy_protocol/`.
Companion to `lib/src/protocol/kawa_h1/LIFECYCLE.md` (downstream H1) and
`lib/src/protocol/mux/LIFECYCLE.md` (downstream H2).

Every claim is anchored to a concrete `file.rs:LINE`; line numbers were last
refreshed against the `docs/feat-h2-mux-audit` branch tip on 2026-04-26.

---

## 1. Wire Format Reminder

The PROXY protocol prepends a fixed metadata block to a TCP connection so the
inner protocol (H1 / H2 / TLS / TCP-passthrough) sees the original client
address even after the connection traversed an upstream load balancer.

Two wire versions exist:

- **v1** is text-only. Lines have the shape
  `PROXY TCP4 src_ip dst_ip src_port dst_port\r\n` (or `TCP6` /
  `UNKNOWN\r\n`). See `HeaderV1` (`lib/src/protocol/proxy_protocol/header.rs:56-60`)
  and the comment at `header.rs:47` flagging that **v1 is never used inside
  Sōzu** — only v2 is parsed or emitted on the wire. The v1 serializer
  (`HeaderV1::into_bytes`, `header.rs:81`) survives as dead weight pending the
  documented removal.
- **v2** is binary. The wire frame is a 12-byte signature
  (`0x0D 0x0A 0x0D 0x0A 0x00 0x0D 0x0A 0x51 0x55 0x49 0x54 0x0A`, see
  `header.rs:159-161`) followed by version+command, family, address-block
  length (big-endian `u16`), and the address block. `HeaderV2`
  (`header.rs:139`) carries the parsed shape; `HeaderV2::into_bytes`
  (`header.rs:156`) emits it.

Address families are modelled by `ProxyAddr` (`header.rs:187`): `Ipv4Addr`,
`Ipv6Addr`, `UnixAddr` (108 bytes per side per the AF_UNIX socket-path
limit), and `AfUnspec` for unknown/legacy.

The v2 parser lives in `lib/src/protocol/proxy_protocol/parser.rs`; the
public entry point is `parse_v2_header` (`parser.rs:33`). It uses `nom` and
returns either a complete `HeaderV2`, an `Incomplete` request for more
bytes, or a parse error.

---

## 2. Three Roles, Three SessionStates

`lib/src/protocol/proxy_protocol/mod.rs` exposes three sibling
`SessionState`s, one per role in the data path:

| Role     | Module         | Front interest at boot | Back interest at boot | When used                                                                                |
|----------|----------------|------------------------|-----------------------|------------------------------------------------------------------------------------------|
| `expect` | `expect.rs`    | `READABLE\|HUP\|ERROR` | n/a                   | Server-side ingress: the upstream LB sent us a v2 header; consume it, capture peer pair, transition to `Pipe`. |
| `relay`  | `relay.rs`     | `READABLE\|HUP\|ERROR` | `HUP\|ERROR`          | Forward an inbound v2 header verbatim onto a freshly opened backend socket.              |
| `send`   | `send.rs`      | `HUP\|ERROR`           | `HUP\|ERROR`          | Synthesise a v2 header describing the original client and emit it on the backend socket. |

All three carry a per-connection ULID (`request_id`) used by the log
context macros (`log_context!` in each module: `expect.rs:53`,
`relay.rs:44`, `send.rs:46`) so a session is grep-correlatable across the
PROXY phase and the downstream protocol.

### 2.1 `ExpectProxyProtocol`

- Type: `ExpectProxyProtocol<Front: SocketHandler>` (`expect.rs:79`).
- Buffer: `frontend_buffer: [u8; 232]` — the maximum legal v2 header size
  (Unix-socket family carries 2 × 108 bytes of address plus header overhead).
  Hard-bounded to defend against a malicious peer that opens TCP and never
  finishes the header.
- Entry point: `readable` (`expect.rs:117`).
  - `header_len` (`expect.rs:118-122`) tracks the expected read window;
    starts at the v4 size (28 bytes), bumps to v6 (52) and finally Unix
    (232) if `parse_v2_header` returns `Incomplete` after the prior cap.
  - 0-byte read with `index == 0` (`expect.rs:163-175`) closes the session
    immediately; this is the standard HAProxy bare-TCP healthcheck pattern
    (SYN/ACK/FIN with no `send-proxy`). Closing fast avoids zombie sessions
    sitting on `request_timeout` (default 10 s) and consuming the
    `nb_connections` quota.
  - Index of 232 with the parser still `Incomplete` (`expect.rs:203-212`)
    is the oversized-header sentinel — increment the
    `proxy_protocol.errors` metric and close.
  - Successful parse (`expect.rs:181-190`) stores the `ProxyAddr` into
    `self.addresses` and returns `SessionResult::Upgrade`; the proxy then
    swaps the session for a `Pipe` via `into_pipe` (`expect.rs:234`).

### 2.2 `RelayProxyProtocol`

- Type: `RelayProxyProtocol<Front: SocketHandler>` (`relay.rs:62`).
- Used when Sōzu sits between two PROXY-aware peers: read the inbound
  header, then write those exact bytes (and only those bytes) to the
  backend before any user-payload byte.
- Entry points: `readable` (`relay.rs:106`) feeds the parser; on a complete
  parse it flips `frontend_readiness.interest` to drop READABLE and arms
  the backend WRITABLE bit (`relay.rs:131-135`). `back_writable`
  (`relay.rs:158`) drains the captured prefix on the backend socket and
  returns `SessionResult::Upgrade` once the cursor reaches the recorded
  `header_size`.

### 2.3 `SendProxyProtocol`

- Type: `SendProxyProtocol<Front: SocketHandler>` (`send.rs:64`).
- Used when the front-end accepted a non-PROXY connection but the
  downstream backend expects PROXY-v2. Sōzu synthesises a header from the
  TCP peer pair captured on the frontend socket
  (`peer_addr` / `local_addr` at `send.rs:117-124`).
- Entry point: `back_writable` (`send.rs:109`). On first call it builds the
  header lazily (`send.rs:116-131`). The drain loop (`send.rs:135-159`)
  writes until the cursor reaches `header.len()` and returns
  `SessionResult::Upgrade`; partial writes set `WouldBlock` and yield to
  the event loop.

---

## 3. State Machine

```
     pre-protocol idle
            │  (event-loop notices READABLE on frontend)
       ─────────────────
       header parse loop
       ─────────────────
   ┌────────┴────────┐
   │                 │
parse error      Incomplete
or oversize          │
   │            (await more bytes)
   ▼                 │
SessionResult::      │   parse OK
Close                │       │
                     ▼       ▼
              header captured: peer pair
              stored, role-specific handoff
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
     expect        relay          send
   (into_pipe)   (back_writable, (back_writable,
   transitions  drains prefix to drains synth
   to Pipe)     backend)         header to backend)
        │              │              │
        ▼              ▼              ▼
            SessionResult::Upgrade
            (proxy swaps in the
             downstream protocol's
             SessionState)
```

The `Pipe` (`lib/src/protocol/pipe.rs`) is the typical downstream — it owns
the bidirectional byte-stream forwarding for TCP listeners. For HTTP(S)
listeners the downstream is built from `Http<Front, L>` + the mux
`Connection` enum, depending on the negotiated protocol.

---

## 4. Hardening Notes

The PROXY-protocol surface is the very first byte path on a new connection,
which makes it an attractive target. These rules are load-bearing.

1. **Bounded buffers, no growth.** `ExpectProxyProtocol::frontend_buffer`
   is a stack-sized `[u8; 232]` (`expect.rs:82`) — the maximum legal v2
   header size. There is no growable backing — a peer that floods bytes
   without a valid header trips the oversized-header branch
   (`expect.rs:203-212`) and is closed.
2. **TCP healthchecks bypass the protocol.** Upstream LBs probe backends
   with bare TCP (SYN/ACK/FIN) and never send `send-proxy`. The fast-close
   branch in `expect.rs:163-175` handles that gracefully — without it,
   every healthcheck would idle for the full `request_timeout`. Do not
   "fix" that branch without measuring against an HAProxy mesh.
3. **`MAX_LOOP_ITERATIONS` ceiling.** All three modules import
   `MAX_LOOP_ITERATIONS` from `sozu_command::config`
   (e.g. `expect.rs:15`); any drain or read loop must respect it so a
   misbehaving peer cannot starve the single-threaded event loop.
4. **Error counters on every reject.** Every `Close` path bumps the
   `proxy_protocol.errors` counter via `incr!` so operators can alert on a
   sudden spike (e.g. a backend that started rejecting the protocol).
5. **No panic on adversarial input.** Per the repo `CLAUDE.md`
   security-sensitive areas list, the proxy-protocol path must convert
   parse errors / partial reads / oversized headers into
   `SessionResult::Close` plus a metric and a contextual log line. New
   error paths follow the existing pattern (see `expect.rs:217-225`).
6. **`HeaderV1` is dead weight.** Per the `header.rs:47` comment the v1
   variant is never produced or consumed; tests are commented out
   (`header.rs:99` onwards). Removing it is a documented follow-up — until
   then, do not extend it.

---

## 5. Cross-References

- `lib/src/protocol/pipe.rs` — the typical downstream after `expect`.
- `lib/src/protocol/kawa_h1/LIFECYCLE.md` — H1 frontend that follows
  PROXY-v2 ingress on HTTP listeners.
- `lib/src/protocol/mux/LIFECYCLE.md` — H2 mux that follows PROXY-v2
  ingress on HTTPS listeners.
- `bin/src/command/LIFECYCLE.md` — how listener configuration (incl.
  `expect_proxy`, `send_proxy`) is delivered from the supervisor.
- HAProxy upstream spec: <https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt>