# Rayfish
P2P mesh VPN powered by [iroh](https://iroh.computer). Connects peers by cryptographic identity (EndpointId), not IP address. Dual-stack addressing: stable IPv4 in 100.64.0.0/10 (CGNAT, FNV-1a of identity) and stable IPv6 in 200::/7 (blake3 of identity, 120-bit, never rotates).
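
A minimal sketch of the IPv4 side of that derivation, assuming the identity is a 32-byte public key and that the low 22 bits of a 32-bit FNV-1a hash become the host part under 100.64.0.0/10. The function names and the exact bit mapping are illustrative assumptions; the crate's real derivation (and its collision handling) may differ.

```rust
use std::net::Ipv4Addr;

/// 32-bit FNV-1a over arbitrary bytes (standard offset basis and prime).
fn fnv1a32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Hypothetical mapping: keep the low 22 bits of the hash as the host part
/// and place them under the 100.64.0.0/10 prefix. Not the crate's actual code.
fn derive_ipv4(identity: &[u8; 32]) -> Ipv4Addr {
    const CGNAT_BASE: u32 = 0x6440_0000; // 100.64.0.0
    const HOST_MASK: u32 = (1 << 22) - 1; // a /10 leaves 22 host bits
    Ipv4Addr::from(CGNAT_BASE | (fnv1a32(identity) & HOST_MASK))
}
```

Because the input is only the identity, the same key always maps to the same address, which is what makes the address stable across restarts and networks.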
## Build
```bash
cargo -q build # add --features tor for Tor transport, --features otel for OTLP span export
cargo -q check
cargo -q test
cargo -q clippy
cargo bench # Criterion microbenchmarks of the per-packet data path (benches/forward.rs)
```
The crate splits into a library (`src/lib.rs`, daemon modules as `pub mod`) and a thin binary (`src/main.rs`, the `ray` CLI/IPC client, `use rayfish::…`). The split lets benchmarks (`benches/`) and integration tests reach the internal data path; `cargo install` builds the binary against the in-package library unchanged.
## Run
The daemon (`ray daemon`) owns the TUN device and iroh endpoint and runs as a system service. CLI commands talk to it over Unix-socket IPC.
```bash
sudo ray up # install+start the service, then activate the VPN
ray create [--open] [--name n] [--hostname h] [--tor] # closed by default; --open = public network. Prints room id (public key)
ray join <room-id-or-invite> [--name alias] [--hostname h] [--auto-accept-firewall] [--tor] # join by room id or one-time invite code; --auto-accept-firewall auto-installs suggested rules (managed node/server)
ray leave <net> | nuke <net> # nuke = publish empty record then leave
ray hostname <net> <name> # change hostname on existing network
ray status # all networks (works without daemon); per-host traffic, member count excludes self. Ends with a `pending` summary of things awaiting the user (firewall suggestions, join requests, file offers, connection requests) each with the command that clears it
ray <cmd> --json # global flag: machine-readable JSON for status/firewall show/files/invite list/requests/admin list/ping/netcheck (color + spinners off)
ray report # bundle logs+metrics, open a pre-filled GitHub issue
ray ping <peer> [-c N] [-i ms] # active mesh probe: per-probe RTT + loss + direct/relay path for a peer (hostname/mesh IP/short id). Sends live echo probes (ControlMsg::Ping/Pong over the mesh connection), unlike status's passive snapshot. -c/--count (default 3), -i/--interval ms (default 1000); --json emits the per-probe array
ray netcheck # local endpoint diagnostics: bound UDP port (+ fixed-forwardable vs ephemeral fallback), home relay + its latency, public IPv4/IPv6, UDP reachability (iroh net report). --json
ray up [--hostname h] | down # activate / standby. down takes the data plane (TUN + Magic DNS) offline but stays connected to peers (still online); --hostname sets your default name
ray invite <net> [--expires 7d] [--hostname H] [--qr] # coordinator-only: mint single-use invite; --qr prints a scannable QR; --hostname binds an authoritative name (overrides joiner choice, rejected on collision)
ray invite <net> --reusable [--expires 30d] # mint a reusable (multi-use, expiring) key for unattended fleets; rides the signed blob, no hostname binding. Servers: ray join <key> --hostname H --auto-accept-firewall
ray invite <net> list|revoke <id> # list / revoke invites (reusable keys tagged; revoke propagates via the blob)
ray requests <net> # coordinator-only: peers awaiting live approval
ray accept <net> <id> | deny <net> <id> # admit / reject a pending join request
ray connect <contact-id> [--hostname h] # request a direct 2-peer connection by the peer's contact id (no room id/invite); blocks as pending until they approve
ray connections [approve <id>] # list incoming connect requests (default) / approve one → mints a 2-peer network with the requester pre-approved
ray contact [id|rotate] # print (default) or rotate your shareable contact id (also shown at the top of `ray status`)
ray admin <net> add <id> | list # coordinator-only: grant the network key (co-coordinator) / list key-holders
ray firewall show|default|add|remove ... # per-device local firewall. Default posture: inbound TCP/UDP denied, inbound ICMP allowed, outbound allowed. `firewall default allow|deny` sets the inbound default. `--port`/`-P` takes a single port, a `start-end` range, or a comma list (`80,443`, `22,8000-9000`) that expands to one rule per item
ray firewall reject on|off # "fail fast" REJECT mode (opt-in, default off). On = a denied packet gets a TCP RST / ICMP-unreachable reply (both directions) so the initiator fails immediately ("connection refused") instead of hanging; off = silent drop (stealthy). Surfaced in `firewall show`
ray apply <spec> [--prune] [--dry-run] [--invite-missing] [--example] # declarative deploy (YAML only): create closed nets + suggest firewall + report membership gap
ray firewall suggest <net> --subject H [--allow peer:proto:ports] [--deny peer:proto:ports] # coordinator-only: suggest rules on any network (rides the signed blob). Subject/peer `*` = all hosts / any peer. `--allow`/`--deny` value is `[peer:]proto:ports` — the `peer:` prefix is optional, so a bare `tcp:22` (or `icmp`) means "any peer" (parsed by `main::parse_suggest_token`: a leading protocol keyword ⇒ peer `*`). Token grammar: `proto:ports` (tcp:22, udp:53, tcp:*, any:*) or bare proto (icmp, any, tcp). Suggestions are **additive** — each token becomes one allow/deny rule; an allow-list relies on the node's own inbound default-deny to block the rest (no catch-all is synthesized), denies-only ⇒ blacklist
ray firewall pending <net> | accept <net> | deny <net> # review/accept/discard queued suggested rules. On a TTY, `pending` is an interactive picker (↑↓ · enter accept · d deny · a all · q done); piped/`--json` falls back to a static table
ray firewall auto-accept <net> on|off # toggle this node's auto-install of suggested rules for a network (on = install current queue)
ray mdns on|off # local peer discovery (default on)
ray config [get [key] | set <key> <value> [--replace] | unset <key>] # global server overrides; keys: relay, discovery-dns, dns-upstreams. Value is a comma list of presets (rayfish/n0), URLs, or IPv4s (multiple custom relays allowed). Default augments n0; --replace swaps them out. `n0`/empty resets. Written client-side to settings.toml (like mdns); all apply on `sudo ray restart`
ray send <file> <peer> # file sharing; ray files [accept <id> [--output dir]]
ray pair [<ticket>|backup|restore <code>] # multi-device identity
ray pair backup [--1password [--vault V] [--item T]] # encrypted key backup; --1password stores the enc1 blob in 1Password (op CLI)
ray pair restore [<code>|--1password [--vault V] [--item T]] # restore from a code or from 1Password
ray completions <shell>
ray version | ray --version | ray -V # print the compiled rayfish version + git sha
ray update [--check] [--force] [--nightly] [--list] [--version V] # self-update from GitHub releases. Default = latest stable; --nightly tracks the rolling nightly pre-release (rebuilt on every commit to master); --version V pins a specific release (downgrades allowed); --list prints available releases; --check reports current vs latest without installing. Prints the release notes of every pending version (stable: each release in (current, latest]; nightly/pinned: the resolved release body) before updating, and in --check output. Replaces this binary, then (if the service is installed) restarts the daemon onto it (needs root). No persisted channel — each run picks its target from the flag
```
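
The `--port`/`-P` grammar from `ray firewall add` (a single port, a `start-end` range, or a comma list that expands to one rule per item) can be sketched as below. `PortSpec`, `parse_ports`, and `parse_item` are hypothetical names for illustration, not the crate's parser, and the error strings are made up.

```rust
/// One expanded item of a `--port` value: a single port or an inclusive range.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PortSpec {
    Single(u16),
    Range(u16, u16),
}

/// Expand a `--port` value into one `PortSpec` per comma-separated item.
/// The first malformed item aborts the whole parse.
fn parse_ports(value: &str) -> Result<Vec<PortSpec>, String> {
    value.split(',').map(|item| parse_item(item.trim())).collect()
}

/// Parse a single item: `N` or `start-end`.
fn parse_item(item: &str) -> Result<PortSpec, String> {
    let port = |s: &str| {
        s.parse::<u16>()
            .map_err(|e| format!("bad port `{}`: {}", s, e))
    };
    match item.split_once('-') {
        Some((start, end)) => {
            let (start, end) = (port(start)?, port(end)?);
            if start > end {
                return Err(format!("empty range `{}`", item));
            }
            Ok(PortSpec::Range(start, end))
        }
        None => port(item).map(PortSpec::Single),
    }
}
```

With this sketch, `22,8000-9000` yields two items (a single port and a range), matching the "one rule per item" expansion described in the command list.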
**Privilege & access (Tailscale operator model):** the always-root daemon does privileged work; clients are unprivileged. The IPC socket is mode `0666`; authority comes from a per-request `SO_PEERCRED` UID check in `DaemonState::check_authorized()`, not socket permissions. Reads (`status`, `*… show`, `files`) are open to any local user; mutating commands need root or the configured `operator_uid`; `set-operator` is root-only. Only `install`, `restart`, `uninstall`, `start`, `stop`, `set-operator`, and `daemon` need `sudo`; everything else (incl. `up`/`down`) is IPC. `ray up`/`install` auto-grant operator to `$SUDO_USER`.
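
The access rule above reduces to a small pure decision once the caller's UID is known. The sketch below assumes the UID has already been read from `SO_PEERCRED` on the accepted socket; `Access` and `is_authorized` are illustrative names, not the signature of `DaemonState::check_authorized()`.

```rust
/// Coarse command classes from the access model (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Access {
    /// Read-only requests, open to any local user.
    Read,
    /// State-changing requests: root or the configured operator.
    Mutate,
    /// Requests reserved for root, such as `set-operator`.
    RootOnly,
}

/// Decide whether a caller with `peer_uid` may perform a request of the
/// given class. `operator_uid` is `None` when no operator is configured.
fn is_authorized(peer_uid: u32, operator_uid: Option<u32>, access: Access) -> bool {
    const ROOT: u32 = 0;
    match access {
        Access::Read => true,
        Access::Mutate => peer_uid == ROOT || operator_uid == Some(peer_uid),
        Access::RootOnly => peer_uid == ROOT,
    }
}
```

Keeping the decision per-request is what lets the socket itself stay world-writable: the file mode only controls who can connect, not what they can do.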
```bash
sudo ray install | restart | uninstall # manage the service unit/plist
sudo ray start | stop # start / stop the service. stop = fully offline (closes peer connections); start = back online
sudo ray set-operator <user> # authorize a user to run ray without sudo
```
### Cross-compile & deploy
```bash
just cross # build for x86_64 Linux
just deploy <ip> # cross-build release + install + start daemon
just deploy-dev <ip> # same, debug build
```
## Architecture
```
App → TUN (100.64.x.x / 200::x) → rayfish → iroh QUIC datagrams → peer
```
One iroh Endpoint and TUN device are shared across all networks. Each network gets its own ALPN (`rayfish/net/<version>/<pubkey-prefix>`); the `ProtocolRouter` dispatches incoming connections by ALPN to per-network handlers. The leading `<version>` (`transport::MESH_PROTOCOL_VERSION`) makes the ALPN the **mesh protocol-version gate**: iroh negotiates the ALPN during the QUIC handshake, so peers on different mesh versions share no common ALPN and cannot connect — no in-band version handshake exists. Bumping the constant on a breaking mesh change severs old peers automatically (likewise the versioned `connect`/`files`/`pair` ALPNs gate their own protocols).
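
To illustrate why the version segment acts as a gate, here is a rough sketch of ALPN construction plus a toy negotiation. The 8-byte hex prefix, the `u32` version type, and the `negotiate` helper are assumptions for illustration; the real `network_alpn` and iroh's handshake are not reproduced here.

```rust
/// Build a per-network ALPN of the form `rayfish/net/<version>/<pubkey-prefix>`.
/// Assumption: the prefix is the first 8 key bytes, lowercase hex.
fn network_alpn(mesh_version: u32, network_pubkey: &[u8; 32]) -> Vec<u8> {
    const PREFIX_BYTES: usize = 8; // assumed prefix length
    let prefix: String = network_pubkey[..PREFIX_BYTES]
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect();
    format!("rayfish/net/{}/{}", mesh_version, prefix).into_bytes()
}

/// Toy stand-in for ALPN negotiation: the first offered protocol that the
/// other side also supports, or `None` when the two lists are disjoint.
fn negotiate<'a>(offered: &'a [Vec<u8>], supported: &[Vec<u8>]) -> Option<&'a [u8]> {
    offered
        .iter()
        .find(|alpn| supported.contains(*alpn))
        .map(|alpn| alpn.as_slice())
}
```

Two peers on the same network but different mesh versions produce different byte strings, so the toy negotiation returns `None`, mirroring the "no common ALPN, cannot connect" behavior described above.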
### Modules
- `src/main.rs` — thin clap CLI + IPC client; service install/start (`cmd_up`, `install_and_start_service`), `cmd_install`/`cmd_restart`/`cmd_uninstall_service` (root-gated), `cmd_set_operator`, `cmd_pair`. `ray daemon` (hidden) runs the foreground daemon loop. `build.rs` stamps the git short SHA into `RAY_GIT_SHA` (see Self-update).
- `src/daemon.rs` — daemon process. `DaemonState` (endpoint + TUN + PeerTable + ProtocolRouter); `NetworkHandle` per active network (per-network `invite_lock`); `NetworkState` (access `mode`, `suggested_firewall`, in-memory `pending` joins, `pending_suggestions` consent queue). Hosts the IPC server, accept handling (`CoordinatorAcceptState`/`MemberAcceptState` via `AcceptHandler`), reconnect loop, DHT publisher, group poller, activate/deactivate, nuke, invite/approval + firewall/file/pairing IPC handlers, `apply_suggested_firewall`, DNS updates. Admission gate: `CoordinatorAcceptState::handle_connection` → `admit_peer` (open / valid invite / pre-approved) vs queue-as-pending (closed). `register_coordinator_handler` (create, restore, admin-promotion) registers `CoordinatorAcceptState` and sets `NetworkRole::Coordinator`; `promote_to_coordinator` swaps a live member to it on `AdminGrant`. Fresh joins dial in `coordinator_dial_order` (minter, then other `is_coordinator` members); `gossip_targets` picks live coordinators for `InviteShare`/`InviteUsed`. `ray connect` handlers (`connect`/`list_connections`/`approve_connection`/`rotate_contact`) live here; `ProtocolRouter` holds the `pending_connects`/`approved_connects`/`outgoing_connects` `DashMap`s and the `CONNECT_ALPN` accept arm; `create_network_inner` takes `direct` + `pre_approve` to mint a 2-peer network. Diagnostics: `ping()` resolves a peer to its mesh IPv4 (`resolve_peer_ip`), sends `ControlMsg::Ping` over the peer's live connection, and awaits the `Pong` via `ProtocolRouter.pending_pongs` (nonce→oneshot, fired by both control readers; `respond_pong` echoes); `netcheck()` reads `endpoint.bound_sockets()` + `endpoint.net_report()` (iroh `unstable-net-report` feature) for port/relay/reachability. Both are open reads.
- `src/ipc.rs` — `IpcMessage` enum (requests + responses incl. `InviteCreate`/`InviteList`/`InviteRevoke`/`Requests`/`AcceptRequest`/`DenyRequest` + `InviteCreated`/`InviteListResponse`/`PendingRequests`; `Join` carries optional `invite` secret + `coordinator` to dial directly; `ray connect` adds `Connect`/`Connections`/`ApproveConnection`/`ContactId`/`RotateContact` + `ContactIdResponse`, `StatusResponse.contact_id`, `NetworkRole::Direct` (display-only)), `MsgpackCodec` (length-prefixed msgpack over Unix socket), socket at `/var/run/rayfish/rayfish.sock`.
- `src/identity.rs` — persistent Ed25519 keypair (`<config_dir>/secret_key`, `0600`); device certs (`create/store/load_device_cert`). Resolves its dir via `config::config_dir()` and writes via `config::write_file`.
- `src/onepassword.rs` — `op` CLI wrapper for `ray pair backup/restore --1password`: `op_available`/`store` (create-or-update an item, secret piped via stdin) / `read`. Transports the existing `enc1…` encrypted backup blob to/from a 1Password item; CLI-side only.
- `src/invite.rs` — coordinator-only **single-use** invite ledger (`<config_dir>/invites/<network>.toml`, `0600`, written atomically via `config::write_file`): `Invite { id, secret_hash, created, expires, status }`, `InviteStore` (`mint`/`redeem`/`revoke`/`list`/`restore`/`record_shared`/`burn_by_hash`; single-use + expiry; only the blake3 hash is persisted). `redeem` burns a secret at admission; `restore` un-burns it if `admit_peer` then rejects the join (hostname/IP collision) so the holder isn't locked out. `encode_invite_code`/`decode_invite_code` = `bs58(network_pubkey(32) || coordinator(32) || secret(16))`. Never in the GroupBlob. **Cross-coordinator gossip:** `record_shared` inserts a received `InviteShare` so this coordinator can validate and burn it; `burn_by_hash` marks it used on `InviteUsed`. **Reusable keys live in the signed blob** (`membership::ReusableKey`), not here; `generate_secret`/`encode_invite_code` are shared by both.
- `src/membership.rs` — `IdentityProvider`, IPv4/IPv6 derivation, `MemberList`/`ApprovedList`, `GroupBlob { members, approved, suggested_firewall, name, reusable_keys }` with canonical msgpack + blake3 hashing (`canonical_group_bytes`/`group_blob_hash`; BTreeMap keys ⇒ canonical bytes). `Member`/`ApprovedEntry` carry optional `user_identity` + `device_cert`, a boolean `is_coordinator` (set by `ray admin add`, published so joiners discover co-coordinators), and `collision_index: u32`. `assign_ip(members, identity) -> (Ipv4Addr, u32)` picks the lowest free collision index for per-member `(ip, index)` at admission; `validate_member`/`validate_approved` check the stored IP against `derive_ip_with_index`; `validate_no_duplicate_ips` rejects duplicate-IP rosters; `resolve_ip_tiebreak` re-seats contested entries in identity order (lowest keeps its index, others re-roll), run by reconverge before applying a fetched roster. `ReusableKey { id, created, expires, revoked }`, keyed by hex `blake3(secret)`: `from_secret` mints, `revoke_reusable` flips `revoked` (exact/prefix id), pure `validate_reusable_key(keys, secret, now)` is the admission decision. `SuggestedFirewall`/`HostSuggestions` live in `ray-proto` (`policy.rs`) so they cross IPC, ride in the blob, and parse from a `ray apply` spec uniformly.
- `src/transport.rs` — iroh endpoint setup, per-network ALPN (`network_alpn` → `rayfish/net/<MESH_PROTOCOL_VERSION>/<prefix>`; the version segment gates mesh-protocol compatibility at ALPN negotiation — bump `MESH_PROTOCOL_VERSION` on a breaking mesh change to sever old peers); identity-level `CONNECT_ALPN` (`rayfish/connect/1`) for the `ray connect` handshake; optional Tor transport (`tor` feature). The shared endpoint binds a **fixed UDP port** `RAYFISH_LISTEN_PORT` (41383), so the port is stable across restarts and can be manually port-forwarded for guaranteed direct reachability (iroh still does NAT traversal/UPnP/PCP, discovery, and relay fallback on top). If the port is in use the daemon warns and falls back to an ephemeral port (`0.0.0.0:0`) so it always starts. The port is hard-coded (one shared endpoint), so a manual forward benefits only one node per LAN; make `RAYFISH_LISTEN_PORT` configurable if multi-node-per-LAN forwarding is ever needed. **Custom relay/discovery overrides:** `bind_endpoint` keeps `Endpoint::builder(presets::N0)` and conditionally overrides it from the `config::ServerOverride` settings — `build_relay_mode` swaps relays (`RelayMode::custom`; replace = configured only, augment = configured + n0 defaults via `RelayMode::Default.relay_map().urls()`; supports multiple custom relays), and `apply_discovery` stacks `PkarrPublisher::builder`/`PkarrResolver::builder` per configured discovery URL (`address_lookup` is additive; replace clears the preset's first). Both are no-ops when unset, so the default bind is byte-for-byte unchanged. Set via `ray config set relay|discovery-dns ...`; applied at next daemon start.
- `src/tun.rs` — async dual-stack TUN (IPv4 /10 + IPv6 /128), split into `TunReader`/`TunWriter`. `configure_ipv6()` assigns the TUN's own IPv6 at creation (Linux netlink via rtnetlink, macOS ifconfig). `route_peer_range()` installs the `200::/7` peer-range route into the TUN and **must run after link-up** (called from `DaemonState::activate()` post-`set_link_up`) — on Linux the kernel won't install an IPv6 connected route while the link is down, so peer traffic would otherwise leak out the host's default IPv6 route (Linux: rtnetlink `RouteMessageBuilder`; macOS: `route add -inet6 -net 200::/7`). Idempotent across `up`/`down`. `route_self_loopback(v4, v6)` (also from `activate()` post-link-up) installs `-host` routes for our own dual-stack addresses via `lo0` so self-traffic (e.g. pinging our own `*.ray` hostname) is answered locally instead of sent out the point-to-point `utun` and dropped as "no peer for dst" — macOS only (a point-to-point `utun` lacks the `<own-ip> -> lo0` route a broadcast interface gets automatically; Linux's `local` route table handles it, so it's a no-op there).
- `src/forward.rs` — TUN ↔ peer forwarding via dual-stack routing lookup; firewall enforcement; labeled drop counters; resolves transport keys to user identities via `DeviceUserMap`. `run_mesh` intercepts UDP packets destined for `MAGIC_DNS_V4:53` (100.100.100.53) and answers them in-daemon via the `dns_resolver::Resolver`, so Magic DNS never binds the host's port 53. On a firewall deny, if `reject` mode is on it emits a `reject::build_reject` reply — outbound deny injects it into our own TUN (`tun_tx`), inbound deny sends it back over the peer `conn` — so the initiator fails fast.
- `src/reject.rs` — "fail fast" REJECT reply builder (opt-in via `ray firewall reject on`). `build_reject(packet, info)` synthesizes a TCP RST (RFC 793 reset rules) for denied TCP, or an ICMP/ICMPv6 destination-unreachable (UDP → port-unreachable; other → admin-prohibited) for everything else, src/dst swapped so it looks like it came back from the destination. Hand-rolled IPv4/IPv6 + TCP/ICMP checksums (no new deps, mirrors `dns_packet`). Returns `None` to keep silent-dropping for loop-risk cases: an incoming RST, an incoming ICMP error, a multicast/broadcast source, or a too-short packet.
- `src/dht.rs` — one pkarr record per network (blob hash + seed peers + `m,<mesh-version>` = the coordinator's `transport::MESH_PROTOCOL_VERSION`, via `mesh_version_from_record`/`resolve_network_packet`); only the coordinator (per-network secret key) can publish. Plus a per-user **contact record** (`_rayfish_contact`, signed by the contact key) mapping `contact_pubkey → current endpoint` for `ray connect` (`publish_contact`/`resolve_contact`); a TTL/2 active-gated publisher (`spawn_contact_publisher`) keeps it fresh. The pkarr relay URL defaults to `dns.iroh.link/pkarr` but follows the `discovery-dns` config when set: `set_discovery_override` (called once in `build_daemon`) stores the first configured discovery URL in a process `OnceLock`, and `effective_pkarr_url`/`create_pkarr_client` read it. One client = one URL, so only the first discovery URL is used for record publish/resolve (multi-URL discovery helps endpoint resolution but not DHT publishing).
- `src/control.rs` — length-prefixed msgpack control protocol over QUIC streams (`JoinRequest`, `JoinPending`, Welcome, MemberApproved, MeshHello, BlobUpdated, `AdminGrant`, `InviteShare`, `InviteUsed`, …); `DeviceCert`, `PairMsg`. A fresh joiner sends `JoinRequest { invite_secret, hostname, device_cert }` first; the coordinator replies `Welcome`, `JoinPending`, or `JoinDenied` on the same stream. `AdminGrant` carries the per-network secret key to a member over the authenticated mesh ALPN (coordinator → co-coordinator). `ConnectMsg` (`Request`/`Pending`/`Approved`/`Denied`) is a separate enum for the `ray connect` handshake over `CONNECT_ALPN`, framed with `send_framed`/`recv_framed`. `InviteShare { id, secret_hash, expires }` is gossiped by the minting coordinator when a single-use invite is minted; `InviteUsed { secret_hash }` when one is redeemed — so any coordinator can validate and burn cross-minted invites. Both are ignored if the sending peer is not `is_coordinator` in the verified roster. `Ping { nonce }` / `Pong { nonce }` back `ray ping`: the receiver echoes the nonce over a *fresh* `open_bi` stream (the control readers drop the request stream's send half, so the reply can't ride it back), and the pinging daemon correlates via the `ProtocolRouter.pending_pongs` nonce→oneshot map to measure RTT. Both control readers (coordinator + member) handle them; they pass through the same `ControlGate` rate limit. Unknown to old peers (graceful: they don't reply → 100% loss, no ALPN bump).
- `src/peers.rs` — `PeerTable` (dual v4/v6 DashMaps), `DeviceUserMap`. A peer keeps one virtual IP across every network it joins, so each `PeerEntry` holds a *set* of connections (`network → Connection`); `lookup_v4/v6` return a `PeerRoute` (a deterministically-chosen connection + all shared networks, for union reachability/firewall checks). A multi-homed peer stays reachable while it shares one live connection; `remove_peer_from_network()` drops a single network's route, `remove()` drops it everywhere.
- `src/config.rs` — config storage. **Sharded, atomic, per-network** (replaces the old single `networks.toml` whose non-atomic full-file rewrites raced under concurrent load-modify-save and silently dropped networks): globals live in `settings.toml` (`mdns_enabled`, `operator_uid`, `default_hostname`, `contact_secret_key`, and the three `ServerOverride` settings `relay`/`discovery_dns`/`dns_upstreams`), each network in `networks/<name>.toml` (per-network secret/public key, `my_hostname`, `pending_hostname`, `group_mode`, `auto_accept_firewall`, `admins`, `direct`). `pending_hostname` is the durable "deliver this rename to a coordinator" intent set by `ray hostname` on a member: unlike `my_hostname` it is **not** overwritten when a reconverge applies a stale blob, so the rename keeps being re-sent until the signed roster confirms it (cleared by `rename_satisfied`). Writes go through `write_file` (temp file + `rename`, atomic on POSIX — no torn reads) and are **targeted**: `save_network`/`load_network`/`delete_network` touch one file, `save_settings` only the globals — so a write to one network can never clobber another (in-memory `upsert_network`/`remove_network` no longer persist). `load()` assembles the in-memory `AppConfig` from `settings.toml` + `networks/*.toml`, running a one-time `migrate_legacy` (splits an existing `networks.toml`, keeping it as `networks.toml.bak`). **Location:** `config_dir()` is `/etc/rayfish` on Linux (system service), `~/.config/rayfish` on macOS; dirs `0750 root:rayfish`, secret-bearing files (`secret_key`, `networks/*.toml`, `settings.toml`, `invites/*.toml`) `0600 root:root`, non-secret (`firewall.toml`, `audit.log`, `device_cert`) `0640 root:rayfish` (perms/owner applied by `write_file`/`restrict_perms`; the `rayfish` group is created on Linux install via `main::ensure_rayfish_group`). Every module resolves its paths through `config::config_dir()`. 
  **`ServerOverride` settings** (relay/discovery-dns/dns-upstreams) carry pure resolver/validator helpers here: `relay_urls`/`discovery_urls` expand the `rayfish` preset + validate http(s) URLs, `resolve_upstreams` merges custom DNS upstreams over the system-captured set (consumed at `daemon::activate` before `set_upstreams`), and `config_set`/`config_get` back the `ray config` CLI (n0/empty resets, unknown key/bad URL/IP rejected before persist).
- `src/apply.rs` — declarative deploy spec for `ray apply`: `DeploySpec { networks: BTreeMap<network, SuggestedFirewall> }` — each network maps **directly** to its firewall subjects (no `firewall:` wrapper). **YAML only** (`load` rejects non-`.yaml`/`.yml`, parses via the `config` crate's YAML format; note the crate **lowercases keys**, so network/host names must be lowercase). `expected_hosts()` = union of subject + peer hostnames across the spec's networks, **excluding `*`**. `EXAMPLE_SPEC` (YAML, includes the wildcard Minecraft case) is printed by `--example`. The orchestrator lives in `main::ipc_apply`.
- `src/firewall.rs` — per-device firewall (direction/proto/port/peer + optional arrival-`network`), `ArcSwap` for lock-free reads, dual-stack packet parsing; `firewall.toml`. Direction-aware defaults (`default_inbound`/`default_outbound`) and the seeded `allow in icmp` rule (`default_icmp_rule()`, `origin: Local`) — see the Firewall key flow. The optional `network` field (`None` = any, back-compat) scopes a rule to one network's traffic, so a multi-homed host can restrict a peer per-network (e.g. `:8080` only from peers reached via `db`). `RuleOrigin` (`Local` | `Network(net)`) records provenance so reconvergence replaces the `Network(net)` set without touching `Local` rules. `materialize_suggestions` builds a subject's inbound rules from a `SuggestedFirewall` (token grammar `proto:ports`/bare proto), purely additively — one rule per token, no synthesized catch-all (an allow-list whitelists by relying on the node's own inbound default-deny; denies-only ⇒ blacklist; empty ⇒ nothing). `SharedFirewall::replace_network_rules` swaps one network's suggested set; `format_firewall_show` tags suggested rules `(suggested by <net>)`.
- `src/dns.rs` — Magic DNS responder for the `.ray` TLD (A/AAAA/PTR/SOA for `*.ray`); reached via the magic IP `100.100.100.53` (`MAGIC_DNS_V4`) routed through the TUN — no host-level port 53 bind. `handle_query` is called by `forward::run_mesh` when it intercepts a UDP DNS packet destined for the magic IP. `HostnameTable` + `ReverseLookupTable`; `sync_network_hostnames` rebuilds a network's forward+reverse entries from its member roster (run on every roster update so renames/joins/leaves reflect immediately).
- `src/dns_config.rs` — OS DNS config (`DnsConfigurator` trait). Points the OS at the magic resolver IP `100.100.100.53` (`dns::MAGIC_DNS_V4`) so `.ray` queries are intercepted by the TUN. macOS: SCDynamicStore. Linux detection chain: systemd-resolved D-Bus → NetworkManager D-Bus → resolvectl → resolvconf → `/etc/resolv.conf` takeover (`DirectResolvConf`, captures upstreams via `captured_upstreams()` for forwarding non-`.ray` queries). Split-DNS where available; falls back to full `/etc/resolv.conf` takeover. **Direct-mode anti-trample (Tailscale-style):** (1) `run_resolv_reassert` watches `/etc` via **inotify** and re-asserts our resolv.conf in ~ms when NetworkManager/dhclient overwrites it (30s tick + initial assert backstop the watch; threads the captured `search_domains()` through so a repair preserves them); (2) on an NM host, `apply()` installs a `dns=none` drop-in (`/etc/NetworkManager/conf.d/rayfish-dns.conf`, `nm_quiet_install`) + reloads NM so NM stops regenerating resolv.conf at all — removed + reloaded on `revert()` (`nm_quiet_remove`). Both the NM drop-in and the resolv.conf backup (`.before-rayfish`, always taken before overwrite) are **marker-guarded** so we never touch an operator's own config. **Crash safety:** the daemon panic hook calls `emergency_restore_resolv_conf()` (synchronous) to restore the backup + drop the `dns=none` snippet before `abort()`, and `restore_stale_backups()` cleans both on next start — so a crash can't leave the host pointing at the now-dead resolver with NM silenced (which would blackhole all DNS).
- `src/hostname.rs` / `src/network_name.rs` — hostname + local-alias generation and collision resolution (`resolve_collision` appends `-1`, `-2`, … on a clash, e.g. `dario` → `dario-1`).
- `src/stats.rs` — iroh-metrics `ForwardMetrics`/`PeerMetrics`, Prometheus export on `:9090`; `ForwardMetrics::snapshot()` reads counters into a serializable `MetricsSnapshot` for `ray report`.
- **CLI presentation** (dependency-light, all gated on `style::is_enabled()` = TTY + not `NO_COLOR`/`--json`): `src/style.rs` — 256-color ANSI palette + glyphs (`dot_online`/`dot_offline`/`check`/`cross`/`marker`/`latency`); `set_plain(true)` forces everything off (used by `--json`). `src/layout.rs` — ANSI-width-aware borderless column aligner (`Cell`/`columns`, via `unicode-width`); `main::table()` is the shared header+rows helper every list routes through. `src/progress.rs` — `indicatif` spinner factory (stderr, hidden when plain) for slow ops (`join`, service start, file download). `src/picker.rs` — `crossterm` inline (no alt-screen) interactive list for `ray firewall pending`; returns per-rule accept/deny `Resolution`s sent as `FirewallResolveSuggestions`. Firewall rules cross IPC as `ray_proto::ipc::FirewallRuleView` (pre-stringified, `Eq`/`Hash`) so the CLI renders/serializes and the daemon value-matches queued rules.
- `src/logdir.rs` — daemon log directory (`/var/log/rayfish` on Linux, `/Library/Logs/rayfish` on macOS). The daemon writes rolling daily files via `tracing-appender` (set up in `main::init_tracing`, which retains the 7 most recent daily files so logs older than ~a week are pruned automatically); `ray report` bundles them.
- `src/ratelimit.rs` — `ControlGate`: a per-connection token-bucket guard (the `ratelimit` crate) plus a strike counter over inbound control messages. `check()` returns `Verdict::Allow`/`Drop`/`Close`; over-budget messages are dropped and a sustained flood trips `Close`. One per control-listener task (no shared state). See the Control-plane abuse defense flow.
- `src/shutdown.rs` — SIGINT/SIGTERM via `CancellationToken`. `src/audit.rs` — append-only audit log (`<config_dir>/audit.log`, TSV `timestamp\tevent\tip\tendpoint_id`); `AuditLog` is held by `PeerTable` (`PeerTable::with_audit`), logging `connect` on a peer's first connection in a network and `disconnect` when its last one drops (or the peer is removed for identity rotation). Best-effort: the daemon runs without auditing if the log can't be opened.
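
The `ControlGate` behavior described for `src/ratelimit.rs` (token bucket plus strike counter, yielding `Allow`/`Drop`/`Close`) can be approximated with the hand-rolled sketch below. It does not use the `ratelimit` crate; the `Gate` type, its parameters, and the choice to clear strikes on an in-budget message are assumptions. Time is passed in explicitly so the logic is deterministic.

```rust
use std::time::Instant;

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Allow,
    Drop,
    Close,
}

/// Hypothetical per-connection gate: a token bucket with a strike counter.
struct Gate {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
    strikes: u32,
    max_strikes: u32,
}

impl Gate {
    fn new(capacity: u32, refill_per_sec: f64, max_strikes: u32, now: Instant) -> Self {
        Gate {
            capacity: f64::from(capacity),
            tokens: f64::from(capacity),
            refill_per_sec,
            last: now,
            strikes: 0,
            max_strikes,
        }
    }

    /// Charge one inbound control message at time `now`.
    fn check(&mut self, now: Instant) -> Verdict {
        // Refill proportionally to elapsed time, capped at the bucket size.
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.last = now;
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            self.strikes = 0; // assumption: an in-budget message clears strikes
            return Verdict::Allow;
        }
        // Over budget: drop, and close once the flood is sustained.
        self.strikes += 1;
        if self.strikes >= self.max_strikes {
            Verdict::Close
        } else {
            Verdict::Drop
        }
    }
}
```

One gate per control-listener task means no locking is needed; the state is owned by the task that reads the stream.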
### Key flows
- **Create:** generate per-network `SecretKey` → derive addresses → build initial `GroupBlob` → publish blob + signed pkarr record → persist keys + `group_mode` → print public key as the room id. Closed (`Restricted`) by default; `--open` for public.
- **Access modes & admission:** the room id (network public key) is a published discovery key, **never** an admission credential. **Open** networks auto-admit any peer that reaches a coordinator. **Closed** networks gate three ways: a one-time **invite** (coordinator-only local ledger, gossiped via `InviteShare`/`InviteUsed` so any coordinator can redeem a cross-minted one); a **reusable key** (hash rides the signed `GroupBlob.reusable_keys` — multi-use, expiring, revocation propagates via the blob; `validate_reusable_key`; admits non-authoritatively — joiner-chosen hostname, suffix on collision); or **live approval** (unknown peer queued in `NetworkState.pending`, surfaced via `ray requests`, admitted with `ray accept`). The handler is `CoordinatorAcceptState`, run by **any node holding the network key** (`register_coordinator_handler` at startup, `promote_to_coordinator` on `AdminGrant`). The admitting coordinator assigns the joiner's IPv4 via `assign_ip` (lowest free collision index).
- **Join handshake:** resolve pkarr record → fetch + verify `GroupBlob` → dial coordinators in `coordinator_dial_order` (invite-pinned minter first, then other `is_coordinator` members, skipping self) until one replies `Welcome` → on each connection the joiner speaks first, sending `JoinRequest { invite_secret? }` → coordinator replies `Welcome` (admitted), `JoinPending` (closed, awaiting `ray accept`; the joiner retries with backoff on the *same* coordinator, so `JoinPending` is not a fallback trigger), or `JoinDenied`. The secret is matched first against the local single-use ledger, then against the verified blob's `reusable_keys`; a single-use match burns the invite, a reusable match does not. `ray join <reusable-key> --hostname H --auto-accept-firewall` is the unattended-server path. After admission the joiner connects to the other members with `MeshHello` and polls pkarr for blob updates. Reconnecting/restoring members use the legacy coordinator-speaks-first handshake (`initial = false`).
- **Gatekeeper:** any coordinator (any network-key holder) can approve identities and broadcast `MemberApproved`; once approved, any peer can welcome that identity. So admitting a fresh joiner survives any single coordinator being offline — the joiner dials the full coordinator set. The coordinator need not be online for *member* reconnects at all.
- **DHT (single-record):** one pkarr record per network, signed by the per-network secret key. The pkarr address *is* the network public key, so records can't be spoofed (MITM-resistant). `spawn_group_poller()` polls the record every 60s and refetches the blob only when its hash changes.
- **Reachability model (segmentation-first):** a network is a reachability boundary — two peers exchange packets iff they share ≥1 network (a QUIC connection only exists within a shared network, so connection existence enforces it). Coarse access is the network split; the per-device firewall is the fine-grained layer (directional, port-, and network-scoped). Declarative provisioning of networks + suggested firewalls is `ray apply` (Phase B).
- **Firewall (local + coordinator suggestions):** per-device, first-match-wins, persisted in `firewall.toml`, with a stateful conntrack so return traffic for outbound flows passes under a deny default. **Secure-by-default inbound** (`default_inbound` serde-default `Deny`, `default_outbound` serde-default `Allow`): inbound TCP/UDP denied, inbound ICMP allowed, outbound allowed (conntrack lets return traffic back). ICMP-allow is the seeded, removable `allow in icmp` rule (not a special case) — deleting it makes the deny default cover ICMP. `ray firewall add` inserts at the front (newest wins) and merges by selector (`firewall::same_selector`, ignoring action), so toggling allow↔deny never accumulates dead rules. Applies to **all installs on upgrade** — an older `firewall.toml` missing the new fields deserializes into the secure posture (the seeded ICMP rule ships only with a fresh config, so an existing file keeps exactly its own rules). `ray firewall default allow|deny` flips the inbound default; neither touches outbound. On **any** network the coordinator (any network-key holder) can **suggest** rules — advisory, riding the signed `GroupBlob` (keyed by subject hostname; `*` subject = every node). Each node materializes rules for its own hostname (+ `*`), resolving peer hostnames → identities from the blob's member list (`*` peer = any), expanding each `proto:ports` token into one rule **additively** — no catch-all is synthesized, so an allow-list whitelists only by relying on the node's own inbound default-deny (denies-only = blacklist; empty subject = nothing) — see `src/firewall.rs`. Consent is **per-node, per-network**: **auto-accept** (`ray join --auto-accept-firewall` / `ray firewall auto-accept <net> on`, persisted as `config.auto_accept_firewall`) or manual `ray firewall accept|deny` (`pending_suggestions`). Hostname authority (so "allow from alice" resolves to the real alice) comes from **invite binding**, not a network flag. 
Rules re-materialize on every verified reconverge — the 60s poller, or a **payload-free** `BlobUpdated`/`MemberSync` *trigger* that reconverges from the network-key-signed pkarr record (`reconverge_and_apply`/`fetch_verified_blob`); `Local` rules are never touched. Trust model: suggestions come only from the verified blob (signed record → hash → blob → rules), never from a control message — those are triggers only. **Fail-fast (REJECT) mode** is an opt-in per-device toggle (`config.reject`, serde-default false, set via `ray firewall reject on|off`, shown in `firewall show`): when on, a denied packet is answered with a TCP RST / ICMP-unreachable (`src/reject.rs`) instead of being silently dropped, so the initiator's socket fails immediately rather than hanging to a timeout. Both deny directions reject (local outbound → injected into our TUN; remote inbound → sent back over the peer connection, where the initiator's conntrack admits the RST and the seeded `allow in icmp` rule admits the ICMP error). Default off keeps the stealthy drop posture.
- **Multiple admins = shared network key.** An admin is any machine holding the per-network secret; `ray admin add <net> <id>` grants the key to a member over the authenticated mesh ALPN (`AdminGrant`), making it a co-coordinator that can publish the signed blob, suggest firewall rules, and **admit fresh joiners**. The granter also sets `is_coordinator = true` on the grantee and republishes so the full coordinator set is visible in the blob — joiners use this for dial-fallback. The grantee persists the key and, on `AdminGrant`, calls `promote_to_coordinator` to swap from `MemberAcceptState` to `CoordinatorAcceptState`. `ray admin list <net>` shows the local node + granted identities (local record; the shared key is not attributable).
- **Declarative apply (`ray apply`):** reconcile networks against a spec — **YAML only**, a `networks:` map of `<name> → SuggestedFirewall` (`*` subject/peer = all hosts / any peer). The orchestrator (`main::ipc_apply`) fetches `Status` once, then per spec network: `Create` (closed) if absent (never joins), then publishes the firewall block as suggestions (idempotent — replaces the live set). `--prune` publishes exactly the spec's subjects, dropping out-of-band suggestions; without it, spec subjects merge over the live set. `--dry-run` echoes the normalized spec; `--example` prints a template. **Membership diff:** expected = union of subject + peer hostnames (excluding `*`); joined = this node + peers from `Status`. The gap is reported as `ray invite <net> --hostname <missing>` commands; `--invite-missing` mints them via IPC. Because an invite-bound hostname is authoritative, the spec's hostnames are exactly the names admitted nodes carry — so suggestions always resolve the peers they name. No lock file; the live signed blob is state.
- **Direct connections (`ray connect`):** a friend-request flow linking two peers with no shared room id or invite. Each node has a standing, **rotatable contact key** (`AppConfig.contact_secret_key`, distinct from transport and per-network keys), published to pkarr while active (`dht::publish_contact`, `_rayfish_contact` = `contact_pubkey → endpoint`) and advertised over `CONNECT_ALPN` (`rayfish/connect/1`). `ray connect <contact-id>` resolves → endpoint, dials, sends `ConnectMsg::Request{from_contact_id, from_endpoint, hostname}`; the recipient queues it (`pending_connects`) and replies `Pending`, so the initiator polls with backoff (`spawn_connect_retry`). `ray connections approve <id>` mints a 2-peer network via `create_network_inner(.., direct=true, pre_approve=Some((peer, hostname)))` — restricted, auto-named `me-peer`, requester pre-approved. The minter records `(room_id, coordinator)` in `approved_connects`; the initiator's next poll gets `ConnectMsg::Approved`, joins normally (→ `Welcome`), flags it `direct` (`join_direct`). A direct network is real (firewall/DNS/mesh apply) but `ray status` shows role `[direct]` (`NetworkRole::Direct`) and hides the room id. **Edge cases:** offline recipient → clean "contact offline" (publisher is active-gated); maps keyed by transport endpoint id survive contact-key rotation (old id stops resolving after its 300s TTL); duplicate requests idempotent; if both peers connect *and* approve at once, only the higher `endpoint.id()` mints (the lower defers via `outgoing_connects`), so exactly one network forms.
- **File sharing:** `ray send` adds the file to iroh-blobs and sends a `FileOffer` over `FILES_ALPN`; receiver queues it; `ray files accept` fetches the blob by hash and verifies it.
- **Pairing:** primary issues a ticket (`bs58(endpoint_id || secret)`) over `PAIR_ALPN`; secondary authenticates and receives a `DeviceCert` binding its transport key to the primary's user identity. Backup/restore encrypts the identity key (argon2 + chacha20poly1305) into an `enc1…` base58 blob (`make_backup_blob`). `--1password` (alias `--op`) on backup/restore transports that blob to/from a 1Password item (default title `Rayfish Identity`, optional `--vault`) via the `op` CLI (`src/onepassword.rs`, create-or-update, secret piped via stdin not argv). 1Password is transport only — the blob stays password-encrypted, so a vault compromise alone can't unlock the key. All `op` calls are CLI-side in the user's context, never from the root daemon.
- **Hostname change:** `ray hostname` propagates immediately and is coordinator-authoritative. The coordinator keeps a continuous per-member control reader (`spawn_coordinator_control_reader`); a member's rename re-sends `MeshHello`, the coordinator resolves collisions (`name`/`name-1`/…), updates roster + DNS, republishes the blob, broadcasts a payload-free `MemberSync` *trigger*. The member applies its name optimistically and is corrected when it reconverges from the signed record (on `MemberSync` or the 60s poller). The coordinator renaming itself runs the same republish+broadcast directly. **Reliable delivery:** a member's rename is persisted as `config.pending_hostname` (a durable intent), so it survives a flaky coordinator link or a daemon restart. The node announces the pending name fresh from config (`outgoing_hostname`) on every (re)connect — never a value captured at startup — and `drain_pending_rename` re-sends `MeshHello(pending)` to every roster coordinator after each reconverge until the blob reflects it. Because the drain *dials* coordinators, the coordinator's accept-side control reader always reads the hello regardless of which side first established the mesh link. `apply_roster_to_dns` is pending-aware: while a rename is unconfirmed it keeps showing/persisting the requested name and overrides the node's own DNS entry, instead of letting a stale blob revert it; once confirmed (`rename_satisfied`, which also accepts a coordinator-assigned `name-N` collision suffix) it clears the intent and follows the blob. Receivers rebuild DNS from the roster on every verified reconverge via `apply_roster_to_dns` → `dns::sync_network_hostnames` (the roster is the single source of truth for `*.ray`), clearing stale names. 
Admission hostname authority follows the **invite binding** (not a network flag): an invite-bound hostname (`ray invite --hostname`) is assigned exactly, and a clash with a different identity is rejected — no silent rename — so no peer can claim another's name to take its suggested firewall rules (`hostname::admission_hostname`). A joiner-chosen (free) hostname keeps collision resolution.
- **Reconnection:** per-peer reader detects drop → coordinator removes the dead peer; joiner reconnects with exponential backoff (1s–30s) then re-sends `MeshHello`.
- **Control-plane abuse defense:** `MemberSync`/`BlobUpdated` triggers (and `MeshHello`/invite gossip) are cheap to send but expensive to process and carry no per-message auth, so both control read loops (member listener in `join_mesh_shared`, `spawn_coordinator_control_reader`) gate each connection with a per-task token bucket — `ratelimit::ControlGate` (`src/ratelimit.rs`, the `ratelimit` crate + a strike counter). Over-budget messages are dropped; a peer that *sustains* a flood trips `Verdict::Close` and the connection is closed with `forward::ABUSE_CODE` (a non-intentional disconnect; the peer may reconnect — no quarantine). To stop a trigger burst from fanning into N reconverges, `MemberSync`/`BlobUpdated` now only `notify_one()` a **per-network debounced reconverge worker** (~300ms coalesce, single-in-flight) instead of awaiting `reconverge_and_apply` inline — so several coordinators broadcasting after one roster change collapse into a single pkarr resolve + reconverge, and a slow reconverge never blocks the accept loop. The pending-join queue is still unbounded (out of scope; `TODO(abuse-hardening)` in the closed-network admission path).
- **Leave:** `ray leave` gracefully closes its connections with `forward::LEAVE_CODE` before local teardown. Peers see `DisconnectEvent.intentional = true`: the coordinator prunes the member, republishes the blob, then broadcasts a payload-free `MemberSync` trigger so other members reconverge from the (already-republished) signed record and drop it immediately; the 60s poller is the backstop. A plain timeout/reset is *not* intentional, so an offline-but-not-departed peer stays a known member.
- **up/down (data plane) vs start/stop (whole daemon):** the daemon connects every saved network at startup (control plane) and keeps those connections for its whole lifetime, dropping them only on `leave`/`nuke`/shutdown. `activate()`/`deactivate()` toggle only the **data plane**: TUN link up/down, peer-range + loopback routes, Magic DNS config, and the inbound forward gate (the shared TUN writer drops packets while `active` is false). So `ray down` is standby: the node stays connected and online to peers (still receiving roster/blob/firewall updates) but carries no traffic and resolves no `.ray` names. `ray up` is near-instant (no re-dial). To go fully offline, `sudo ray stop` exits the daemon (connections close cleanly, peers see offline); `sudo ray start` brings it back with both planes on.
- **Report:** `ray report` → daemon `build_report()` gathers sysinfo + a `ForwardMetrics::snapshot()` + the *sanitized* `StatusResponse` (no secret keys) + recent log files, writes a `.tgz` to `/tmp`, and chowns it to the calling UID. The CLI prints the path and opens a pre-filled GitHub issue (`REPORT_REPO_URL`) to attach the bundle. Local-first, so the user reviews it before sharing; a managed upload service can later replace the GitHub step.
- **Self-update (`ray update`):** queries the GitHub releases API (`rayfish/rayfish`, the repo `install.sh` pulls from), maps host OS/arch to the published asset (`release_asset_name` → `ray-{os}-{arch}`), fetches the asset's `.sha256` sidecar first, decides whether a swap is needed, downloads the binary, **verifies SHA-256** before touching anything, then atomically swaps the running binary via `self-replace`. If the service is installed it goes through the full install path so the daemon comes back on the new binary. Needs root when the service is installed or the binary's dir isn't user-writable (`require_root`); `--check`, `--list`, and `ray version`/`--version` don't. Raw release binaries aren't archived, so no tar/gzip here. **Three targets, chosen per-invocation (no persisted channel):** default **stable** hits `/releases/latest` and gates on `semver` (`version_is_newer`, strictly-newer unless `--force`); **`--nightly`** hits `/releases/tags/nightly` (the rolling pre-release rebuilt on every commit to master by `.github/workflows/nightly.yml`) and — since nightlies share a `CARGO_PKG_VERSION` — decides up-to-date by comparing the published checksum against the **running binary's** SHA-256 (`sha256_hex`), not the version; **`--version X`** hits `/releases/tags/vX` and is "current" only if `X` equals the running version, so it can downgrade. `--list` enumerates `/releases` (newest first, `[pre-release]`/`(installed)` annotated). **Release notes:** before any swap (and inside `--check` when behind), `print_pending_changelog` surfaces what the update brings — stable walks `/releases?per_page=100` and prints the `body` of each non-prerelease in `(current, latest]` newest-first; nightly/pinned print the single resolved release's `body` (the `GhRelease.body` field, git-cliff output from `release.yml`). Best-effort: a fetch/parse failure prints nothing and never blocks the update. 
`build.rs` stamps the git short SHA into `RAY_GIT_SHA`; `FULL_VERSION` = `CARGO_PKG_VERSION (sha)` is what `ray version`/`--version`/`ray report` print so a nightly build is identifiable.
- **Tor (optional):** `--tor` adds `TorCustomTransport` alongside relay; onion address derived from the iroh `SecretKey`. Needs a Tor daemon (`ControlPort 9051`).
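
The firewall bullet above describes first-match-wins evaluation, front insertion with merge-by-selector, and the secure inbound default. A minimal sketch of those three rules follows. Every type and method name here (`Selector`, `Rule`, `Firewall`, `fresh`, `add`, `remove`, `decide`, `rule_count`) is hypothetical and chosen for illustration; the real definitions live in `src/firewall.rs` and may differ. Conntrack, peer and network scoping, and suggestions are deliberately left out.

```rust
// Hypothetical types for illustration only; not the crate's actual API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action { Allow, Deny }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction { In, Out }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Proto { Tcp, Udp, Icmp }

/// Everything a rule matches on, i.e. the rule minus its action.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Selector {
    pub dir: Direction,
    pub proto: Proto,
    /// `None` matches any port (the only sensible value for ICMP).
    pub port: Option<u16>,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rule {
    pub sel: Selector,
    pub action: Action,
}

pub struct Firewall {
    rules: Vec<Rule>,
    default_inbound: Action,
    default_outbound: Action,
}

impl Firewall {
    /// Fresh config: deny inbound, allow outbound, plus the seeded,
    /// removable `allow in icmp` rule.
    pub fn fresh() -> Self {
        let icmp = Selector { dir: Direction::In, proto: Proto::Icmp, port: None };
        Firewall {
            rules: vec![Rule { sel: icmp, action: Action::Allow }],
            default_inbound: Action::Deny,
            default_outbound: Action::Allow,
        }
    }

    /// Insert at the front (newest wins). Any existing rule with the same
    /// selector is dropped first, ignoring its action, so toggling
    /// allow <-> deny never leaves a dead rule behind.
    pub fn add(&mut self, rule: Rule) {
        self.rules.retain(|r| r.sel != rule.sel);
        self.rules.insert(0, rule);
    }

    /// Remove the rule with this selector, if present.
    pub fn remove(&mut self, sel: Selector) {
        self.rules.retain(|r| r.sel != sel);
    }

    /// First matching rule decides; otherwise the per-direction default applies.
    pub fn decide(&self, dir: Direction, proto: Proto, port: u16) -> Action {
        for r in &self.rules {
            let port_ok = r.sel.port.map_or(true, |p| p == port);
            if r.sel.dir == dir && r.sel.proto == proto && port_ok {
                return r.action;
            }
        }
        match dir {
            Direction::In => self.default_inbound,
            Direction::Out => self.default_outbound,
        }
    }

    pub fn rule_count(&self) -> usize {
        self.rules.len()
    }
}
```

In this sketch, adding `allow in tcp 22` and then `deny in tcp 22` leaves a single rule for that selector, and removing the seeded ICMP rule lets the inbound deny default cover ICMP, matching the behavior described above.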
## Conventions
- Use `cargo -q` for all cargo commands; `tracing` for logging. `main::init_tracing` composes the layers (console + file + optional OTLP) with **split filters**: the console (and CLI output) stays at `info`, while the rolling daily files under `src/logdir::log_dir()` capture our crate at `debug` (`info,rayfish=debug` — dependencies stay at `info` so iroh/quinn don't flood the file), so `ray report` bundles carry the verbose detail without the console getting noisy. The global registry gate is the permissive (file) filter; the console layer carries its own `info` per-layer filter. `RUST_LOG` overrides both. Returns a `LogGuard` that must stay alive for the process.
- Tracing carries spans: network lifecycle handlers (`create/join/leave/nuke_network`) use `#[tracing::instrument]`, and the per-peer reader (`forward::spawn_peer_reader`) + reconnect loop wrap their tasks in `info_span!("peer"/"reconnect", net=…, peer=…)` so report-bundle logs are correlatable per peer/network.
- `otel` feature (off by default): adds a `tracing-opentelemetry` layer exporting spans over OTLP/HTTP. Active only when `OTEL_EXPORTER_OTLP_ENDPOINT` (or `..._TRACES_ENDPOINT`) is set; flushed on shutdown via `LogGuard::drop`.
- Panics are fail-fast in the daemon: `main::install_panic_hook` (set only for `ray daemon`) records the panic via `tracing::error!`, synchronously appends it to `panic.log`, restores DNS via `dns_config::emergency_restore_resolv_conf()` (so a crash never blackholes DNS; see `dns_config`), then calls `std::process::abort()`. The service unit restarts the daemon (`Restart=on-failure` / launchd `KeepAlive`); `panic.log` is bundled by `ray report` (and flags the issue title/body). A live-but-broken daemon wouldn't trip the restart, so we crash cleanly rather than limp.
- Never share I/O resources (TUN, sockets, streams) behind a Mutex — split into read/write halves. Avoid Mutex generally: prefer channels, atomics, or `RwLock`/`ArcSwap` for fast non-async state.
- CLI subcommands carry short `visible_alias`es (clap), so help lists them and completions pick them up: `create`→`new`, `leave`→`rm`, `status`→`st`/`ls`, `version`→`ver`, `update`→`upgrade`; action verbs `list`→`ls`, `remove`→`rm`/`del`, `show`→`ls`/`list`, `add`→`a`, `revoke`→`rm`, `approve`→`ok`. Aliases must be unique within each `#[derive(Subcommand)]` enum.
- ALPN per network: `rayfish/net/<version>/<pubkey-prefix>` (`MESH_PROTOCOL_VERSION` then first 16 hex chars). File ALPN `rayfish/files/1`, pairing ALPN `rayfish/pair/1`, connect ALPN `rayfish/connect/1`. **The version segment in every ALPN is that protocol's compatibility gate.** Each protocol versions independently: bump `MESH_PROTOCOL_VERSION` for a breaking mesh change, `FILES_ALPN`'s `/1` for a breaking file-transfer change, `CONNECT_ALPN`'s `/1` for a breaking `ray connect` change, `PAIR_ALPN`'s `/1` for a breaking pairing change. Because iroh negotiates the ALPN at the QUIC handshake, peers on different versions of a protocol share no common ALPN and can't connect — the gate is transport-enforced, with no in-band version check. **Rule of thumb: when you change one of these wire protocols in a backward-incompatible way, bump its ALPN version in the same change.** **Surfacing the failure:** the ALPN gate fails opaquely (no connection forms, so no reason can be sent). Two things recover a useful message: (1) `ray join` compares the coordinator's signed `m,<mesh-version>` from the network record *before dialing* and bails with a precise "this network runs vX, this build speaks vY — run `ray update`"; (2) `transport::connect_to_peer_with_alpn` maps an ALPN-mismatch connect error (`is_alpn_mismatch`, matches the "no known protocol"/"no application protocol" handshake error) to a "peer may be running an incompatible version (run ray update)" hint — a heuristic, used on every dial path (join/connect/file/pair) as the fallback when there's no signed version to pre-check.
- TUN MTU 1280 (IPv6 minimum link MTU, RFC 8200 §5; matches WireGuard/Tailscale). Wire format (control + IPC): 4-byte BE length + msgpack body.
- Room id = per-network public key string (discovery only). On a closed network, joining needs a one-time invite, a reusable key, or operator approval; on an open network any peer that reaches a coordinator is admitted, so knowing the room id is enough. Invite code = `bs58(pubkey || coordinator || secret)`. Local aliases (adjective-noun-noun) are display-only.
- Config under `config::config_dir()` (`/etc/rayfish` on Linux, `~/.config/rayfish` on macOS): `secret_key`, `device_cert`, `settings.toml`, `networks/<name>.toml` (one per network), `firewall.toml`, `invites/<network>.toml` (coordinator-only). Pre-migration installs auto-split the old `networks.toml` on first load (kept as `networks.toml.bak`). On Linux the tree is `root:rayfish`; secret-bearing files are `0600 root:root`. CLI commands that write identity directly (e.g. `ray pair restore`) need root on Linux since the tree is under `/etc`.
- Keep commit subjects conventional (`feat`/`fix`/`docs`/`style`/`ci`/…): release notes are generated from them by git-cliff (`cliff.toml`). `release.yml` renders the tag's grouped changelog + a `prev...new` compare link; `nightly.yml` lists commits since the last stable tag.
- Always update docs (CLAUDE.md, README.md) after finishing a feature or significant change.
- Keep `CHANGELOG.md` current as part of every change, plan, or implementation (not just at release time). Add a user-facing entry under `## [Unreleased]` in the existing Keep a Changelog format (`Added`/`Changed`/`Fixed`/`Performance`), describing behavior from the user's perspective, not the commit. On release, rename `[Unreleased]` to the new version and add a fresh empty `[Unreleased]` plus the compare-link reference at the bottom. Skip pure-internal churn (refactors, test/CI/chore-only commits) that has no user-visible effect.
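
The wire format named in the conventions above (4-byte big-endian length, then the msgpack body) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's actual code: the function names `write_frame` and `read_frame` and the `MAX_FRAME_LEN` cap are hypothetical, and the body is treated as opaque bytes rather than being msgpack-encoded here.

```rust
use std::io::{self, Read, Write};

/// Upper bound on a single frame. An assumed value for this sketch; the
/// source does not state a limit.
const MAX_FRAME_LEN: usize = 1 << 20;

/// Write one frame: 4-byte big-endian length prefix, then the body bytes.
pub fn write_frame<W: Write>(w: &mut W, body: &[u8]) -> io::Result<()> {
    if body.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = body.len() as u32; // safe: bounded by MAX_FRAME_LEN above
    w.write_all(&len.to_be_bytes())?;
    w.write_all(body)
}

/// Read one frame, rejecting oversized lengths before allocating.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let len = u32::from_be_bytes(header) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    Ok(body)
}
```

Checking the declared length before allocating keeps a peer from forcing a large allocation with a four-byte header, which fits the control-plane abuse concerns described earlier; whether the real reader enforces a cap is not stated in the source.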