rayfish 0.1.2

# Rayfish

P2P mesh VPN powered by [iroh](https://iroh.computer). Connects peers by cryptographic identity (EndpointId), not IP address. Dual-stack addressing: stable IPv4 in 100.64.0.0/10 (CGNAT, FNV-1a of identity) and stable IPv6 in 200::/7 (blake3 of identity, 120-bit, never rotates).
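
As a rough illustration of the IPv4 side of this scheme, the sketch below hashes a 32-byte identity with 32-bit FNV-1a and keeps the low 22 bits as the host part of `100.64.0.0/10`. The hash width, the bit selection, and the names `fnv1a_32` and `sketch_ipv4` are assumptions made for the example; the crate's real derivation (including how the collision index is mixed in) may differ.

```rust
use std::net::Ipv4Addr;

/// 32-bit FNV-1a over a byte slice (standard offset basis and prime).
fn fnv1a_32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811C_9DC5;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Hypothetical mapping: the low 22 bits of the hash become the host part
/// of 100.64.0.0/10. This is an assumption, not rayfish's actual code.
fn sketch_ipv4(identity: &[u8; 32]) -> Ipv4Addr {
    const BASE: u32 = 0x6440_0000; // 100.64.0.0
    const HOST_MASK: u32 = (1 << 22) - 1; // a /10 leaves 22 host bits
    Ipv4Addr::from(BASE | (fnv1a_32(identity) & HOST_MASK))
}
```

Because the address is a pure function of the identity, the same key maps to the same address on every node without any coordination. With only 22 host bits, two identities can land on the same address, which is why a collision index is needed at admission.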

## Build

```bash
cargo -q build                 # add --features tor for Tor transport, --features otel for OTLP span export
cargo -q check
cargo -q test
cargo -q clippy
cargo bench                    # Criterion microbenchmarks of the per-packet data path (benches/forward.rs)
```

The crate splits into a library (`src/lib.rs`, daemon modules as `pub mod`) and a thin binary (`src/main.rs`, the `ray` CLI/IPC client, `use rayfish::…`). The split lets benchmarks (`benches/`) and integration tests reach the internal data path; `cargo install` builds the binary against the in-package library unchanged.

## Run

The daemon (`ray daemon`) owns the TUN device and iroh endpoint and runs as a system service. CLI commands talk to it over Unix-socket IPC.

```bash
sudo ray up                    # install+start the service, then activate the VPN
ray create [--open] [--name n] [--hostname h] [--tor]   # closed by default; --open = public network. Prints room id (public key)
ray join <room-id-or-invite> [--name alias] [--hostname h] [--auto-accept-firewall] [--tor]  # join by room id or one-time invite code; --auto-accept-firewall auto-installs suggested rules (managed node/server)
ray leave <net> | nuke <net>   # nuke = publish empty record then leave
ray hostname <net> <name>      # change hostname on existing network
ray status                     # all networks (works without daemon); per-host traffic, member count excludes self
ray <cmd> --json               # global flag: machine-readable JSON for status/firewall show/files/invite list/requests/admin list (color + spinners off)
ray report                     # bundle logs+metrics, open a pre-filled GitHub issue
ray up [--hostname h] | down   # activate / standby (TUN + DNS), daemon stays running; --hostname sets your default name

ray invite <net> [--expires 7d] [--hostname H] [--qr]   # coordinator-only: mint single-use invite; --qr prints a scannable QR; --hostname binds an authoritative name (overrides joiner choice, rejected on collision)
ray invite <net> --reusable [--expires 30d]          # mint a reusable (multi-use, expiring) key for unattended fleets; rides the signed blob, no hostname binding. Servers: ray join <key> --hostname H --auto-accept-firewall
ray invite <net> list|revoke <id>          # list / revoke invites (reusable keys tagged; revoke propagates via the blob)
ray requests <net>             # coordinator-only: peers awaiting live approval
ray accept <net> <id> | deny <net> <id>    # admit / reject a pending join request
ray connect <contact-id> [--hostname h]    # request a direct 2-peer connection by the peer's contact id (no room id/invite); blocks as pending until they approve
ray connections [approve <id>]             # list incoming connect requests (default) / approve one → mints a 2-peer network with the requester pre-approved
ray contact [id|rotate]        # print (default) or rotate your shareable contact id (also shown at the top of `ray status`)
ray admin <net> add <id> | list            # coordinator-only: grant the network key (co-coordinator) / list key-holders
ray firewall show|default|add|remove ...               # per-device local firewall. Default posture: inbound TCP/UDP denied, inbound ICMP allowed, outbound allowed. `firewall default allow|deny` sets the inbound default
ray apply <spec> [--prune] [--dry-run] [--invite-missing] [--example]   # declarative deploy (YAML only): create closed nets + suggest firewall + report membership gap
ray firewall suggest <net> --subject H [--allow peer:proto:ports] [--deny peer:proto:ports]  # coordinator-only: suggest rules on any network (rides the signed blob). Subject/peer `*` = all hosts / any peer. `--allow`/`--deny` value is `[peer:]proto:ports` — the `peer:` prefix is optional, so a bare `tcp:22` (or `icmp`) means "any peer" (parsed by `main::parse_suggest_token`: a leading protocol keyword ⇒ peer `*`). Token grammar: `proto:ports` (tcp:22, udp:53, tcp:*, any:*) or bare proto (icmp, any, tcp). An allow-list ⇒ whitelist (catch-all deny appended); denies-only ⇒ blacklist
ray firewall pending <net> | accept <net> | deny <net>  # review/accept/discard queued suggested rules. On a TTY, `pending` is an interactive picker (↑↓ · enter accept · d deny · a all · q done); piped/`--json` falls back to a static table
ray firewall auto-accept <net> on|off  # toggle this node's auto-install of suggested rules for a network (on = install current queue)
ray mdns on|off                # local peer discovery (default on)
ray send <file> <peer>         # file sharing; ray files [accept <id> [--output dir]]
ray pair [<ticket>|backup|restore <code>]              # multi-device identity
ray pair backup [--1password [--vault V] [--item T]]   # encrypted key backup; --1password stores the enc1 blob in 1Password (op CLI)
ray pair restore [<code>|--1password [--vault V] [--item T]]   # restore from a code or from 1Password
ray completions <shell>
ray version | ray --version | ray -V        # print the compiled rayfish version + git sha
ray update [--check] [--force] [--nightly] [--list] [--version V]   # self-update from GitHub releases. Default = latest stable; --nightly tracks the rolling nightly pre-release (rebuilt on every commit to master); --version V pins a specific release (downgrades allowed); --list prints available releases; --check reports current vs latest without installing. Replaces this binary, then (if the service is installed) restarts the daemon onto it (needs root). No persisted channel — each run picks its target from the flag
```
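
The `[peer:]proto:ports` grammar used by `ray firewall suggest` can be sketched as a small parser. This is not the code in `main::parse_suggest_token`; the struct, the function names, and the choice to represent "no ports given" as `*` are assumptions for illustration. Only the rule that a leading protocol keyword implies peer `*` comes from the description above.

```rust
/// One parsed `--allow`/`--deny` value (hypothetical shape).
#[derive(Debug, PartialEq)]
struct SuggestToken {
    peer: String,  // "*" means any peer
    proto: String, // tcp | udp | icmp | any
    ports: String, // "*" when the token names no ports (an assumption)
}

fn is_proto(word: &str) -> bool {
    matches!(word, "tcp" | "udp" | "icmp" | "any")
}

fn parse_token_sketch(token: &str) -> Result<SuggestToken, String> {
    let parts: Vec<&str> = token.split(':').collect();
    // A leading protocol keyword means the optional `peer:` prefix was omitted.
    let (peer, rest) = if is_proto(parts[0]) {
        ("*", &parts[..])
    } else {
        (parts[0], &parts[1..])
    };
    if peer.is_empty() {
        return Err(format!("empty peer in token: {token}"));
    }
    let (proto, ports) = match rest {
        // Bare protocol, e.g. `icmp` or `alice:tcp`.
        [proto] if is_proto(proto) => (*proto, "*"),
        // Protocol plus ports, e.g. `tcp:22` or `alice:udp:53`.
        [proto, ports] if is_proto(proto) && !ports.is_empty() => (*proto, *ports),
        _ => return Err(format!("invalid token: {token}")),
    };
    Ok(SuggestToken {
        peer: peer.to_string(),
        proto: proto.to_string(),
        ports: ports.to_string(),
    })
}
```

Under these assumptions `tcp:22` parses to peer `*`, `alice:udp:53` keeps `alice` as the peer, and a lone hostname such as `alice` is rejected because no protocol follows it.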

**Privilege & access (Tailscale operator model):** the always-root daemon does privileged work; clients are unprivileged. The IPC socket is mode `0666`; authority comes from a per-request `SO_PEERCRED` UID check in `DaemonState::check_authorized()`, not socket permissions. Reads (`status`, `*… show`, `files`) are open to any local user; mutating commands need root or the configured `operator_uid`; `set-operator` is root-only. Only `install`, `restart`, `uninstall`, `set-operator`, and `daemon` need `sudo`; everything else (incl. `up`/`down` once the service is installed) is IPC. When run under `sudo`, `ray up`/`install` auto-grant operator to `$SUDO_USER`.
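
The three access tiers reduce to a small decision over the caller's UID. The sketch below models only that decision; `RequestKind` and `is_authorized` are hypothetical names, and the real `DaemonState::check_authorized()` obtains the UID from `SO_PEERCRED` on the socket, which is not shown here.

```rust
/// Hypothetical classification of an IPC request.
#[derive(Clone, Copy)]
enum RequestKind {
    Read,        // e.g. status, files
    Mutate,      // e.g. join, leave, firewall add
    SetOperator, // changing who the operator is
}

/// Decide whether a caller with `peer_uid` may run a request of `kind`.
fn is_authorized(peer_uid: u32, operator_uid: Option<u32>, kind: RequestKind) -> bool {
    let is_root = peer_uid == 0;
    match kind {
        // Reads are open to any local user.
        RequestKind::Read => true,
        // Mutations need root or the configured operator.
        RequestKind::Mutate => is_root || operator_uid == Some(peer_uid),
        // Only root may change the operator.
        RequestKind::SetOperator => is_root,
    }
}
```

Keeping the check per request rather than in the socket mode means the operator can be changed at runtime without touching filesystem permissions.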

```bash
sudo ray install | restart | uninstall      # manage the service unit/plist
sudo ray set-operator <user>                 # authorize a user to run ray without sudo
```

### Cross-compile & deploy

```bash
just cross                     # build for x86_64 Linux
just deploy <ip>               # cross-build release + install + start daemon
just deploy-dev <ip>           # same, debug build
```

## Architecture

```
App → TUN (100.64.x.x / 200::x) → rayfish → iroh QUIC datagrams → peer
```

One iroh Endpoint and TUN device are shared across all networks. Each network gets its own ALPN (`rayfish/net/<version>/<pubkey-prefix>`); the `ProtocolRouter` dispatches incoming connections by ALPN to per-network handlers. The leading `<version>` (`transport::MESH_PROTOCOL_VERSION`) makes the ALPN the **mesh protocol-version gate**: iroh negotiates the ALPN during the QUIC handshake, so peers on different mesh versions share no common ALPN and cannot connect — no in-band version handshake exists. Bumping the constant on a breaking mesh change severs old peers automatically (likewise the versioned `connect`/`files`/`pair` ALPNs gate their own protocols).
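
A minimal sketch of why the versioned ALPN acts as a gate: the ALPN is an opaque byte string, and negotiation succeeds only on an exact match. The function names and the prefix length (the first 8 bytes of the network public key, hex-encoded) are assumptions; the source only fixes the `rayfish/net/<version>/<pubkey-prefix>` shape.

```rust
/// Hypothetical ALPN builder following `rayfish/net/<version>/<pubkey-prefix>`.
/// The 8-byte hex prefix is an assumption made for this example.
fn network_alpn_sketch(mesh_version: u32, network_pubkey: &[u8; 32]) -> Vec<u8> {
    let prefix: String = network_pubkey[..8]
        .iter()
        .map(|b| format!("{b:02x}"))
        .collect();
    format!("rayfish/net/{mesh_version}/{prefix}").into_bytes()
}

/// ALPN selection in miniature: pick the first offered protocol that the
/// other side also supports, comparing byte for byte.
fn first_common_alpn<'a>(offered: &'a [Vec<u8>], supported: &[Vec<u8>]) -> Option<&'a [u8]> {
    offered
        .iter()
        .find(|alpn| supported.contains(*alpn))
        .map(|alpn| alpn.as_slice())
}
```

Two peers built with different version numbers produce different byte strings for the same network, so `first_common_alpn` returns `None` and the handshake fails before any mesh traffic is exchanged.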

### Modules

- `src/main.rs` — thin clap CLI + IPC client; service install/start (`cmd_up`, `install_and_start_service`), `cmd_install`/`cmd_restart`/`cmd_uninstall_service` (root-gated), `cmd_set_operator`, `cmd_pair`. `ray daemon` (hidden) runs the foreground daemon loop. `build.rs` stamps the git short SHA into `RAY_GIT_SHA` (see Self-update).
- `src/daemon.rs` — daemon process. `DaemonState` (endpoint + TUN + PeerTable + ProtocolRouter); `NetworkHandle` per active network (per-network `invite_lock`); `NetworkState` (access `mode`, `suggested_firewall`, in-memory `pending` joins, `pending_suggestions` consent queue). Hosts the IPC server, accept handling (`CoordinatorAcceptState`/`MemberAcceptState` via `AcceptHandler`), reconnect loop, DHT publisher, group poller, activate/deactivate, nuke, invite/approval + firewall/file/pairing IPC handlers, `apply_suggested_firewall`, DNS updates. Admission gate: `CoordinatorAcceptState::handle_connection` → `admit_peer` (open / valid invite / pre-approved) vs queue-as-pending (closed). `register_coordinator_handler` (create, restore, admin-promotion) registers `CoordinatorAcceptState` and sets `NetworkRole::Coordinator`; `promote_to_coordinator` swaps a live member to it on `AdminGrant`. Fresh joins dial in `coordinator_dial_order` (minter, then other `is_coordinator` members); `gossip_targets` picks live coordinators for `InviteShare`/`InviteUsed`. `ray connect` handlers (`connect`/`list_connections`/`approve_connection`/`rotate_contact`) live here; `ProtocolRouter` holds the `pending_connects`/`approved_connects`/`outgoing_connects` `DashMap`s and the `CONNECT_ALPN` accept arm; `create_network_inner` takes `direct` + `pre_approve` to mint a 2-peer network.
- `src/ipc.rs` — `IpcMessage` enum (requests + responses incl. `InviteCreate`/`InviteList`/`InviteRevoke`/`Requests`/`AcceptRequest`/`DenyRequest` + `InviteCreated`/`InviteListResponse`/`PendingRequests`; `Join` carries optional `invite` secret + `coordinator` to dial directly; `ray connect` adds `Connect`/`Connections`/`ApproveConnection`/`ContactId`/`RotateContact` + `ContactIdResponse`, `StatusResponse.contact_id`, `NetworkRole::Direct` (display-only)), `MsgpackCodec` (length-prefixed msgpack over Unix socket), socket at `/var/run/rayfish/rayfish.sock`.
- `src/identity.rs` — persistent Ed25519 keypair (`<config_dir>/secret_key`, `0600`); device certs (`create/store/load_device_cert`). Resolves its dir via `config::config_dir()` and writes via `config::write_file`.
- `src/onepassword.rs` — `op` CLI wrapper for `ray pair backup/restore --1password`: `op_available`/`store` (create-or-update an item, secret piped via stdin) / `read`. Transports the existing `enc1…` encrypted backup blob to/from a 1Password item; CLI-side only.
- `src/invite.rs` — coordinator-only **single-use** invite ledger (`<config_dir>/invites/<network>.toml`, `0600`, written atomically via `config::write_file`): `Invite { id, secret_hash, created, expires, status }`, `InviteStore` (`mint`/`redeem`/`revoke`/`list`/`restore`/`record_shared`/`burn_by_hash`; single-use + expiry; only the blake3 hash is persisted). `redeem` burns a secret at admission; `restore` un-burns it if `admit_peer` then rejects the join (hostname/IP collision) so the holder isn't locked out. `encode_invite_code`/`decode_invite_code` = `bs58(network_pubkey(32) || coordinator(32) || secret(16))`. Never in the GroupBlob. **Cross-coordinator gossip:** `record_shared` inserts a received `InviteShare` so this coordinator can validate and burn it; `burn_by_hash` marks it used on `InviteUsed`. **Reusable keys live in the signed blob** (`membership::ReusableKey`), not here; `generate_secret`/`encode_invite_code` are shared by both.
- `src/membership.rs` — `IdentityProvider`, IPv4/IPv6 derivation, `MemberList`/`ApprovedList`, `GroupBlob { members, approved, suggested_firewall, name, reusable_keys }` with canonical msgpack + blake3 hashing (`canonical_group_bytes`/`group_blob_hash`; BTreeMap keys ⇒ canonical bytes). `Member`/`ApprovedEntry` carry optional `user_identity` + `device_cert`, a boolean `is_coordinator` (set by `ray admin add`, published so joiners discover co-coordinators), and `collision_index: u32`. `assign_ip(members, identity) -> (Ipv4Addr, u32)` picks the lowest free collision index for per-member `(ip, index)` at admission; `validate_member`/`validate_approved` check the stored IP against `derive_ip_with_index`; `validate_no_duplicate_ips` rejects duplicate-IP rosters; `resolve_ip_tiebreak` re-seats contested entries in identity order (lowest keeps its index, others re-roll), run by reconverge before applying a fetched roster. `ReusableKey { id, created, expires, revoked }`, keyed by hex `blake3(secret)`: `from_secret` mints, `revoke_reusable` flips `revoked` (exact/prefix id), pure `validate_reusable_key(keys, secret, now)` is the admission decision. `SuggestedFirewall`/`HostSuggestions` live in `ray-proto` (`policy.rs`) so they cross IPC, ride in the blob, and parse from a `ray apply` spec uniformly.
- `src/transport.rs` — iroh endpoint setup, per-network ALPN (`network_alpn` → `rayfish/net/<MESH_PROTOCOL_VERSION>/<prefix>`; the version segment gates mesh-protocol compatibility at ALPN negotiation — bump `MESH_PROTOCOL_VERSION` on a breaking mesh change to sever old peers); identity-level `CONNECT_ALPN` (`rayfish/connect/1`) for the `ray connect` handshake; optional Tor transport (`tor` feature). The shared endpoint binds a **fixed UDP port** `RAYFISH_LISTEN_PORT` (41383), so the port is stable across restarts and can be manually port-forwarded for guaranteed direct reachability (iroh still does NAT traversal/UPnP/PCP, discovery, and relay fallback on top). If the port is in use the daemon warns and falls back to an ephemeral port (`0.0.0.0:0`) so it always starts. The port is hard-coded (one shared endpoint), so a manual forward benefits only one node per LAN; make `RAYFISH_LISTEN_PORT` configurable if multi-node-per-LAN forwarding is ever needed.
- `src/tun.rs` — async dual-stack TUN (IPv4 /10 + IPv6 /128), split into `TunReader`/`TunWriter`. `configure_ipv6()` assigns the TUN's own IPv6 at creation (Linux netlink via rtnetlink, macOS ifconfig). `route_peer_range()` installs the `200::/7` peer-range route into the TUN and **must run after link-up** (called from `DaemonState::activate()` post-`set_link_up`) — on Linux the kernel won't install an IPv6 connected route while the link is down, so peer traffic would otherwise leak out the host's default IPv6 route (Linux: rtnetlink `RouteMessageBuilder`; macOS: `route add -inet6 -net 200::/7`). Idempotent across `up`/`down`. `route_self_loopback(v4, v6)` (also from `activate()` post-link-up) installs `-host` routes for our own dual-stack addresses via `lo0` so self-traffic (e.g. pinging our own `*.ray` hostname) is answered locally instead of sent out the point-to-point `utun` and dropped as "no peer for dst" — macOS only (a point-to-point `utun` lacks the `<own-ip> -> lo0` route a broadcast interface gets automatically; Linux's `local` route table handles it, so it's a no-op there).
- `src/forward.rs` — TUN ↔ peer forwarding via dual-stack routing lookup; firewall enforcement; labeled drop counters; resolves transport keys to user identities via `DeviceUserMap`. `run_mesh` intercepts UDP packets destined for `MAGIC_DNS_V4:53` (100.100.100.53) and answers them in-daemon via the `dns_resolver::Resolver`, so Magic DNS never binds the host's port 53.
- `src/dht.rs` — one pkarr record per network (blob hash + seed peers + `m,<mesh-version>` = the coordinator's `transport::MESH_PROTOCOL_VERSION`, via `mesh_version_from_record`/`resolve_network_packet`); only the coordinator (per-network secret key) can publish. Plus a per-user **contact record** (`_rayfish_contact`, signed by the contact key) mapping `contact_pubkey → current endpoint` for `ray connect` (`publish_contact`/`resolve_contact`); a TTL/2 active-gated publisher (`spawn_contact_publisher`) keeps it fresh.
- `src/control.rs` — length-prefixed msgpack control protocol over QUIC streams (`JoinRequest`, `JoinPending`, Welcome, MemberApproved, MeshHello, BlobUpdated, `AdminGrant`, `InviteShare`, `InviteUsed`, …); `DeviceCert`, `PairMsg`. A fresh joiner sends `JoinRequest { invite_secret, hostname, device_cert }` first; the coordinator replies `Welcome`, `JoinPending`, or `JoinDenied` on the same stream. `AdminGrant` carries the per-network secret key to a member over the authenticated mesh ALPN (coordinator → co-coordinator). `ConnectMsg` (`Request`/`Pending`/`Approved`/`Denied`) is a separate enum for the `ray connect` handshake over `CONNECT_ALPN`, framed with `send_framed`/`recv_framed`. `InviteShare { id, secret_hash, expires }` is gossiped by the minting coordinator when a single-use invite is minted; `InviteUsed { secret_hash }` when one is redeemed — so any coordinator can validate and burn cross-minted invites. Both are ignored if the sending peer is not `is_coordinator` in the verified roster.
- `src/peers.rs` — `PeerTable` (dual v4/v6 DashMaps), `DeviceUserMap`. A peer keeps one virtual IP across every network it joins, so each `PeerEntry` holds a *set* of connections (`network → Connection`); `lookup_v4/v6` return a `PeerRoute` (a deterministically-chosen connection + all shared networks, for union reachability/firewall checks). A multi-homed peer stays reachable while it shares one live connection; `remove_peer_from_network()` drops a single network's route, `remove()` drops it everywhere.
- `src/config.rs` — config storage. **Sharded, atomic, per-network** (replaces the old single `networks.toml` whose non-atomic full-file rewrites raced under concurrent load-modify-save and silently dropped networks): globals live in `settings.toml` (`mdns_enabled`, `operator_uid`, `default_hostname`, `contact_secret_key`), each network in `networks/<name>.toml` (per-network secret/public key, `my_hostname`, `group_mode`, `auto_accept_firewall`, `admins`, `direct`). Writes go through `write_file` (temp file + `rename`, atomic on POSIX — no torn reads) and are **targeted**: `save_network`/`load_network`/`delete_network` touch one file, `save_settings` only the globals — so a write to one network can never clobber another (in-memory `upsert_network`/`remove_network` no longer persist). `load()` assembles the in-memory `AppConfig` from `settings.toml` + `networks/*.toml`, running a one-time `migrate_legacy` (splits an existing `networks.toml`, keeping it as `networks.toml.bak`). **Location:** `config_dir()` is `/etc/rayfish` on Linux (system service), `~/.config/rayfish` on macOS; dirs `0750 root:rayfish`, secret-bearing files (`secret_key`, `networks/*.toml`, `settings.toml`, `invites/*.toml`) `0600 root:root`, non-secret (`firewall.toml`, `audit.log`, `device_cert`) `0640 root:rayfish` (perms/owner applied by `write_file`/`restrict_perms`; the `rayfish` group is created on Linux install via `main::ensure_rayfish_group`). Every module resolves its paths through `config::config_dir()`.
- `src/apply.rs` — declarative deploy spec for `ray apply`: `DeploySpec { networks: BTreeMap<network, SuggestedFirewall> }` — each network maps **directly** to its firewall subjects (no `firewall:` wrapper). **YAML only** (`load` rejects non-`.yaml`/`.yml`, parses via the `config` crate's YAML format; note the crate **lowercases keys**, so network/host names must be lowercase). `expected_hosts()` = union of subject + peer hostnames across the spec's networks, **excluding `*`**. `EXAMPLE_SPEC` (YAML, includes the wildcard Minecraft case) is printed by `--example`. The orchestrator lives in `main::ipc_apply`.
- `src/firewall.rs` — per-device firewall (direction/proto/port/peer + optional arrival-`network`), `ArcSwap` for lock-free reads, dual-stack packet parsing; `firewall.toml`. Direction-aware defaults (`default_inbound`/`default_outbound`) and the seeded `allow in icmp` rule (`default_icmp_rule()`, `origin: Local`) — see the Firewall key flow. The optional `network` field (`None` = any, back-compat) scopes a rule to one network's traffic, so a multi-homed host can restrict a peer per-network (e.g. `:8080` only from peers reached via `db`). `RuleOrigin` (`Local` | `Network(net)`) records provenance so reconvergence replaces the `Network(net)` set without touching `Local` rules. `materialize_suggestions` builds a subject's inbound rules from a `SuggestedFirewall` (token grammar `proto:ports`/bare proto; allow-list ⇒ whitelist with catch-all deny, denies-only ⇒ blacklist, empty ⇒ open). `SharedFirewall::replace_network_rules` swaps one network's suggested set; `format_firewall_show` tags suggested rules `(suggested by <net>)`.
- `src/dns.rs` — Magic DNS responder for the `.ray` TLD (A/AAAA/PTR/SOA for `*.ray`); reached via the magic IP `100.100.100.53` (`MAGIC_DNS_V4`) routed through the TUN — no host-level port 53 bind. `handle_query` is called by `forward::run_mesh` when it intercepts a UDP DNS packet destined for the magic IP. `HostnameTable` + `ReverseLookupTable`; `sync_network_hostnames` rebuilds a network's forward+reverse entries from its member roster (run on every roster update so renames/joins/leaves reflect immediately).
- `src/dns_config.rs` — OS DNS config (`DnsConfigurator` trait). Points the OS at the magic resolver IP `100.100.100.53` (`dns::MAGIC_DNS_V4`) so `.ray` queries are intercepted by the TUN. macOS: SCDynamicStore. Linux detection chain: systemd-resolved D-Bus → NetworkManager D-Bus → resolvectl → resolvconf → `/etc/resolv.conf` takeover (`DirectResolvConf`, captures upstreams via `captured_upstreams()` for forwarding non-`.ray` queries). Split-DNS where available; falls back to full `/etc/resolv.conf` takeover. **Direct-mode anti-trample (Tailscale-style):** (1) `run_resolv_reassert` watches `/etc` via **inotify** and re-asserts our resolv.conf in ~ms when NetworkManager/dhclient overwrites it (30s tick + initial assert backstop the watch; threads the captured `search_domains()` through so a repair preserves them); (2) on an NM host, `apply()` installs a `dns=none` drop-in (`/etc/NetworkManager/conf.d/rayfish-dns.conf`, `nm_quiet_install`) + reloads NM so NM stops regenerating resolv.conf at all — removed + reloaded on `revert()` (`nm_quiet_remove`). Both the NM drop-in and the resolv.conf backup (`.before-rayfish`, always taken before overwrite) are **marker-guarded** so we never touch an operator's own config. **Crash safety:** the daemon panic hook calls `emergency_restore_resolv_conf()` (synchronous) to restore the backup + drop the `dns=none` snippet before `abort()`, and `restore_stale_backups()` cleans both on next start — so a crash can't leave the host pointing at the now-dead resolver with NM silenced (which would blackhole all DNS).
- `src/hostname.rs` / `src/network_name.rs` — hostname + local-alias generation and collision resolution (`resolve_collision` appends `-1`, `-2`, … on a clash, e.g. `dario` → `dario-1`).
- `src/stats.rs` — iroh-metrics `ForwardMetrics`/`PeerMetrics`, Prometheus export on `:9090`; `ForwardMetrics::snapshot()` reads counters into a serializable `MetricsSnapshot` for `ray report`.
- **CLI presentation** (dependency-light, all gated on `style::is_enabled()` = TTY + not `NO_COLOR`/`--json`): `src/style.rs` — 256-color ANSI palette + glyphs (`dot_online`/`dot_offline`/`check`/`cross`/`marker`/`latency`); `set_plain(true)` forces everything off (used by `--json`). `src/layout.rs` — ANSI-width-aware borderless column aligner (`Cell`/`columns`, via `unicode-width`); `main::table()` is the shared header+rows helper every list routes through. `src/progress.rs` — `indicatif` spinner factory (stderr, hidden when plain) for slow ops (`join`, service start, file download). `src/picker.rs` — `crossterm` inline (no alt-screen) interactive list for `ray firewall pending`; returns per-rule accept/deny `Resolution`s sent as `FirewallResolveSuggestions`. Firewall rules cross IPC as `ray_proto::ipc::FirewallRuleView` (pre-stringified, `Eq`/`Hash`) so the CLI renders/serializes and the daemon value-matches queued rules.
- `src/logdir.rs` — daemon log directory (`/var/log/rayfish` on Linux, `/Library/Logs/rayfish` on macOS). The daemon writes rolling daily files via `tracing-appender` (set up in `main::init_tracing`, which retains the 7 most recent daily files so logs older than ~a week are pruned automatically); `ray report` bundles them.
- `src/shutdown.rs` — SIGINT/SIGTERM via `CancellationToken`.
- `src/audit.rs` — append-only audit log (`<config_dir>/audit.log`, TSV `timestamp\tevent\tip\tendpoint_id`); `AuditLog` is held by `PeerTable` (`PeerTable::with_audit`), logging `connect` on a peer's first connection in a network and `disconnect` when its last one drops (or the peer is removed for identity rotation). Best-effort: the daemon runs without auditing if the log can't be opened.
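
The temp-file-plus-`rename` pattern described for `config::write_file` can be sketched with the standard library alone. This is an illustration, not the crate's function: the temp-file naming is an assumption, and the permission and ownership handling (`0600`, `root:rayfish`) is left out.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `path` atomically: fill a sibling temp file, then
/// rename it over the target. Readers see the old or the new content,
/// never a partial write.
fn write_file_sketch(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Hypothetical temp name; a real implementation may pick a unique one.
    let tmp = path.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(bytes)?;
        file.sync_all()?; // flush data before the rename publishes it
    }
    // On POSIX, rename replaces the target atomically when both paths
    // are on the same filesystem.
    fs::rename(&tmp, path)
}
```

Because each network has its own file, two concurrent saves for different networks write different targets and cannot overwrite each other's data.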
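
The hostname collision rule noted for `src/hostname.rs` (append `-1`, `-2`, … on a clash) is small enough to sketch directly. The function name and the use of a `HashSet` for taken names are assumptions; only the suffixing behavior comes from the description.

```rust
use std::collections::HashSet;

/// Return `wanted` if it is free, otherwise the first free `wanted-N`
/// for N = 1, 2, 3, ...
fn resolve_collision_sketch(wanted: &str, taken: &HashSet<String>) -> String {
    if !taken.contains(wanted) {
        return wanted.to_string();
    }
    (1u32..)
        .map(|n| format!("{wanted}-{n}"))
        .find(|candidate| !taken.contains(candidate))
        .expect("a free suffix exists for any finite set of taken names")
}
```

With `dario` already taken the sketch yields `dario-1`, and with both taken it yields `dario-2`.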

### Key flows

- **Create:** generate per-network `SecretKey` → derive addresses → build initial `GroupBlob` → publish blob + signed pkarr record → persist keys + `group_mode` → print public key as the room id. Closed (`Restricted`) by default; `--open` for public.
- **Access modes & admission:** the room id (network public key) is a published discovery key, **never** an admission credential. **Open** networks auto-admit any peer that reaches a coordinator. **Closed** networks gate three ways: a one-time **invite** (coordinator-only local ledger, gossiped via `InviteShare`/`InviteUsed` so any coordinator can redeem a cross-minted one); a **reusable key** (hash rides the signed `GroupBlob.reusable_keys` — multi-use, expiring, revocation propagates via the blob; `validate_reusable_key`; admits non-authoritatively — joiner-chosen hostname, suffix on collision); or **live approval** (unknown peer queued in `NetworkState.pending`, surfaced via `ray requests`, admitted with `ray accept`). The handler is `CoordinatorAcceptState`, run by **any node holding the network key** (`register_coordinator_handler` at startup, `promote_to_coordinator` on `AdminGrant`). The admitting coordinator assigns the joiner's IPv4 via `assign_ip` (lowest free collision index).
- **Join handshake:** resolve pkarr record → fetch + verify `GroupBlob` → dial coordinators in `coordinator_dial_order` (invite-pinned minter first, then other `is_coordinator` members, skipping self) until one replies `Welcome`. On each dial the joiner speaks first, sending `JoinRequest { invite_secret? }`; the coordinator replies `Welcome` (admitted), `JoinPending` (closed, awaiting `ray accept`; the joiner retries with backoff on the *same* coordinator, so `JoinPending` is not a fallback trigger), or `JoinDenied`. The coordinator matches the secret first against its local single-use ledger, then against the verified blob's `reusable_keys`; a single-use match burns the invite, a reusable one does not. `ray join <reusable-key> --hostname H --auto-accept-firewall` is the unattended-server path. Once admitted, the joiner connects to other members with `MeshHello` and polls pkarr for blob updates. Reconnecting/restoring members use the legacy coordinator-speaks-first handshake (`initial = false`).
- **Gatekeeper:** any coordinator (any network-key holder) can approve identities and broadcast `MemberApproved`; once approved, any peer can welcome that identity. So admitting a fresh joiner survives any single coordinator being offline — the joiner dials the full coordinator set. The coordinator need not be online for *member* reconnects at all.
- **DHT (single-record):** one pkarr record per network, signed by the per-network secret key. The pkarr address *is* the network public key, so records can't be spoofed (MITM-resistant). `spawn_group_poller()` polls the record every 60s and refetches the blob when the hash changes.
- **Reachability model (segmentation-first):** a network is a reachability boundary — two peers exchange packets iff they share ≥1 network (a QUIC connection only exists within a shared network, so connection existence enforces it). Coarse access is the network split; the per-device firewall is the fine-grained layer (directional, port-, and network-scoped). Declarative provisioning of networks + suggested firewalls is `ray apply` (Phase B).
- **Firewall (local + coordinator suggestions):** per-device, first-match-wins, persisted in `firewall.toml`, with a stateful conntrack so return traffic for outbound flows passes under a deny default. **Secure-by-default inbound** (`default_inbound` serde-default `Deny`, `default_outbound` serde-default `Allow`): inbound TCP/UDP denied, inbound ICMP allowed, outbound allowed (conntrack lets return traffic back). ICMP-allow is the seeded, removable `allow in icmp` rule (not a special case) — deleting it makes the deny default cover ICMP. `ray firewall add` inserts at the front (newest wins) and merges by selector (`firewall::same_selector`, ignoring action), so toggling allow↔deny never accumulates dead rules. Applies to **all installs on upgrade** — an older `firewall.toml` missing the new fields deserializes into the secure posture (the seeded ICMP rule ships only with a fresh config, so an existing file keeps exactly its own rules). `ray firewall default allow|deny` flips the inbound default; neither touches outbound. On **any** network the coordinator (any network-key holder) can **suggest** rules — advisory, riding the signed `GroupBlob` (keyed by subject hostname; `*` subject = every node). Each node materializes rules for its own hostname (+ `*`), resolving peer hostnames → identities from the blob's member list (`*` peer = any), expanding each `proto:ports` token, appending a network-scoped catch-all deny for an allow-list (whitelist; denies-only = blacklist; empty subject = open) — see `src/firewall.rs`. Consent is **per-node, per-network**: **auto-accept** (`ray join --auto-accept-firewall` / `ray firewall auto-accept <net> on`, persisted as `config.auto_accept_firewall`) or manual `ray firewall accept|deny` (`pending_suggestions`). Hostname authority (so "allow from alice" resolves to the real alice) comes from **invite binding**, not a network flag. 
  Rules re-materialize on every verified reconverge — the 60s poller, or a **payload-free** `BlobUpdated`/`MemberSync` *trigger* that reconverges from the network-key-signed pkarr record (`reconverge_and_apply`/`fetch_verified_blob`); `Local` rules are never touched. Trust model: suggestions come only from the verified blob (signed record → hash → blob → rules), never from a control message — those are triggers only.
- **Multiple admins = shared network key.** An admin is any machine holding the per-network secret; `ray admin add <net> <id>` grants the key to a member over the authenticated mesh ALPN (`AdminGrant`), making it a co-coordinator that can publish the signed blob, suggest firewall rules, and **admit fresh joiners**. The granter also sets `is_coordinator = true` on the grantee and republishes so the full coordinator set is visible in the blob — joiners use this for dial-fallback. The grantee persists the key and, on `AdminGrant`, calls `promote_to_coordinator` to swap from `MemberAcceptState` to `CoordinatorAcceptState`. `ray admin list <net>` shows the local node + granted identities (local record; the shared key is not attributable).
- **Declarative apply (`ray apply`):** reconcile networks against a spec — **YAML only**, a `networks:` map of `<name> → SuggestedFirewall` (`*` subject/peer = all hosts / any peer). The orchestrator (`main::ipc_apply`) fetches `Status` once, then per spec network: `Create` (closed) if absent (never joins), then publishes the firewall block as suggestions (idempotent — replaces the live set). `--prune` publishes exactly the spec's subjects, dropping out-of-band suggestions; without it, spec subjects merge over the live set. `--dry-run` echoes the normalized spec; `--example` prints a template. **Membership diff:** expected = union of subject + peer hostnames (excluding `*`); joined = this node + peers from `Status`. The gap is reported as `ray invite <net> --hostname <missing>` commands; `--invite-missing` mints them via IPC. Because an invite-bound hostname is authoritative, the spec's hostnames are exactly the names admitted nodes carry — so suggestions always resolve the peers they name. No lock file; the live signed blob is state.
- **Direct connections (`ray connect`):** a friend-request flow linking two peers with no shared room id or invite. Each node has a standing, **rotatable contact key** (`AppConfig.contact_secret_key`, distinct from transport and per-network keys), published to pkarr while active (`dht::publish_contact`, `_rayfish_contact` = `contact_pubkey → endpoint`) and advertised over `CONNECT_ALPN` (`rayfish/connect/1`). `ray connect <contact-id>` resolves → endpoint, dials, sends `ConnectMsg::Request{from_contact_id, from_endpoint, hostname}`; the recipient queues it (`pending_connects`) and replies `Pending`, so the initiator polls with backoff (`spawn_connect_retry`). `ray connections approve <id>` mints a 2-peer network via `create_network_inner(.., direct=true, pre_approve=Some((peer, hostname)))` — restricted, auto-named `me-peer`, requester pre-approved. The minter records `(room_id, coordinator)` in `approved_connects`; the initiator's next poll gets `ConnectMsg::Approved`, joins normally (→ `Welcome`), flags it `direct` (`join_direct`). A direct network is real (firewall/DNS/mesh apply) but `ray status` shows role `[direct]` (`NetworkRole::Direct`) and hides the room id. **Edge cases:** offline recipient → clean "contact offline" (publisher is active-gated); maps keyed by transport endpoint id survive contact-key rotation (old id stops resolving after its 300s TTL); duplicate requests idempotent; if both peers connect *and* approve at once, only the higher `endpoint.id()` mints (the lower defers via `outgoing_connects`), so exactly one network forms.
- **File sharing:** `ray send` adds the file to iroh-blobs and sends a `FileOffer` over `FILES_ALPN`; receiver queues it; `ray files accept` fetches the blob by hash and verifies it.
- **Pairing:** primary issues a ticket (`bs58(endpoint_id || secret)`) over `PAIR_ALPN`; secondary authenticates and receives a `DeviceCert` binding its transport key to the primary's user identity. Backup/restore encrypts the identity key (argon2 + chacha20poly1305) into an `enc1…` base58 blob (`make_backup_blob`). `--1password` (alias `--op`) on backup/restore transports that blob to/from a 1Password item (default title `Rayfish Identity`, optional `--vault`) via the `op` CLI (`src/onepassword.rs`, create-or-update, secret piped via stdin not argv). 1Password is transport only — the blob stays password-encrypted, so a vault compromise alone can't unlock the key. All `op` calls are CLI-side in the user's context, never from the root daemon.
- **Hostname change:** `ray hostname` propagates immediately and is coordinator-authoritative. The coordinator keeps a continuous per-member control reader (`spawn_coordinator_control_reader`); a member's rename re-sends `MeshHello`, the coordinator resolves collisions (`name`/`name-1`/…), updates roster + DNS, republishes the blob, broadcasts a payload-free `MemberSync` *trigger*. The member applies its name optimistically and is corrected when it reconverges from the signed record (on `MemberSync` or the 60s poller). The coordinator renaming itself runs the same republish+broadcast directly. Receivers rebuild DNS from the roster on every verified reconverge via `apply_roster_to_dns` → `dns::sync_network_hostnames` (the roster is the single source of truth for `*.ray`), clearing stale names. Admission hostname authority follows the **invite binding** (not a network flag): an invite-bound hostname (`ray invite --hostname`) is assigned exactly, and a clash with a different identity is rejected — no silent rename — so no peer can claim another's name to take its suggested firewall rules (`hostname::admission_hostname`). A joiner-chosen (free) hostname keeps collision resolution.
- **Reconnection:** the per-peer reader detects the drop and the coordinator removes the dead peer; the joiner reconnects with exponential backoff (1s–30s), then re-sends `MeshHello`.
- **Leave:** `ray leave` gracefully closes its connections with `forward::LEAVE_CODE` before local teardown. Peers see `DisconnectEvent.intentional = true`: the coordinator prunes the member, republishes the blob, then broadcasts a payload-free `MemberSync` trigger so other members reconverge from the (already-republished) signed record and drop it immediately; the 60s poller is the backstop. A plain timeout/reset is *not* intentional, so an offline-but-not-departed peer stays a known member.
- **up/down:** the daemon (endpoint, IPC, blob store, metrics) is always-on; the active VPN state (TUN up + system DNS + connected networks) is toggled by `activate()`/`deactivate()`, tracked in `DaemonState.active`.
- **Report:** `ray report` → daemon `build_report()` gathers sysinfo + a `ForwardMetrics::snapshot()` + the *sanitized* `StatusResponse` (no secret keys) + recent log files, writes a `.tgz` to `/tmp`, and chowns it to the calling UID. The CLI prints the path and opens a pre-filled GitHub issue (`REPORT_REPO_URL`) to attach the bundle. Local-first, so the user reviews it before sharing; a managed upload service can later replace the GitHub step.
- **Self-update (`ray update`):** queries the GitHub releases API (`rayfish/rayfish`, the repo `install.sh` pulls from), maps host OS/arch to the published asset (`release_asset_name` → `ray-{os}-{arch}`), fetches the asset's `.sha256` sidecar first, decides whether a swap is needed, downloads the binary, **verifies SHA-256** before touching anything, then atomically swaps the running binary via `self-replace`. If the service is installed it goes through the full install path so the daemon comes back on the new binary. Needs root when the service is installed or the binary's dir isn't user-writable (`require_root`); `--check`, `--list`, and `ray version`/`--version` don't. Raw release binaries aren't archived, so no tar/gzip here. **Three targets, chosen per-invocation (no persisted channel):** default **stable** hits `/releases/latest` and gates on `semver` (`version_is_newer`, strictly-newer unless `--force`); **`--nightly`** hits `/releases/tags/nightly` (the rolling pre-release rebuilt on every commit to master by `.github/workflows/nightly.yml`) and — since nightlies share a `CARGO_PKG_VERSION` — decides up-to-date by comparing the published checksum against the **running binary's** SHA-256 (`sha256_hex`), not the version; **`--version X`** hits `/releases/tags/vX` and is "current" only if `X` equals the running version, so it can downgrade. `--list` enumerates `/releases` (newest first, `[pre-release]`/`(installed)` annotated). `build.rs` stamps the git short SHA into `RAY_GIT_SHA`; `FULL_VERSION` = `CARGO_PKG_VERSION (sha)` is what `ray version`/`--version`/`ray report` print so a nightly build is identifiable.
- **Tor (optional):** `--tor` adds `TorCustomTransport` alongside the relay; the onion address is derived from the iroh `SecretKey`. Needs a running Tor daemon with its control port enabled (`ControlPort 9051`).
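
The reconnection backoff described above (1s–30s) can be sketched as follows. This is a minimal illustration, not the crate's code: `reconnect_delay` is a hypothetical helper, and the doubling factor and the absence of jitter are assumptions, since the notes only give the 1s and 30s bounds.

```rust
use std::time::Duration;

/// Hypothetical helper: delay to wait before reconnect attempt `attempt` (0-based).
/// Assumption: the delay doubles on every attempt, starting at 1s and capped at 30s.
fn reconnect_delay(attempt: u32) -> Duration {
    const BASE_SECS: u64 = 1;
    const MAX_SECS: u64 = 30;
    // `checked_shl` returns None once the shift reaches 64 bits; treat that as "use the cap".
    let secs = BASE_SECS
        .checked_shl(attempt)
        .unwrap_or(MAX_SECS)
        .min(MAX_SECS);
    Duration::from_secs(secs)
}
```

Under these assumptions a joiner waits 1s, 2s, 4s, 8s, 16s, and then 30s for every later attempt, re-sending `MeshHello` once a connection succeeds.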

## Conventions

- Use `cargo -q` for all cargo commands; `tracing` for logging (INFO default, `RUST_LOG` to override). The daemon also writes rolling daily log files under `logdir::log_dir()` (console output unchanged for CLI commands). `main::init_tracing` composes the layers (console + file + optional OTLP) and returns a `LogGuard` that must stay alive for the process.
- Tracing carries spans: network lifecycle handlers (`create/join/leave/nuke_network`) use `#[tracing::instrument]`, and the per-peer reader (`forward::spawn_peer_reader`) + reconnect loop wrap their tasks in `info_span!("peer"/"reconnect", net=…, peer=…)` so report-bundle logs are correlatable per peer/network.
- `otel` feature (off by default): adds a `tracing-opentelemetry` layer exporting spans over OTLP/HTTP. Active only when `OTEL_EXPORTER_OTLP_ENDPOINT` (or `..._TRACES_ENDPOINT`) is set; flushed on shutdown via `LogGuard::drop`.
- Panics are fail-fast in the daemon: `main::install_panic_hook` (set only for `ray daemon`) records the panic via `tracing::error!`, synchronously appends it to `panic.log`, restores DNS via `dns_config::emergency_restore_resolv_conf()` (so a crash never blackholes DNS — see dns_config), then calls `std::process::abort()`. The service unit restarts it (`Restart=on-failure` / launchd `KeepAlive`); `panic.log` is bundled by `ray report` (and flags the issue title/body). A live-but-broken daemon wouldn't trip the restart, so we crash cleanly rather than limp.
- Never share I/O resources (TUN, sockets, streams) behind a Mutex — split into read/write halves. Avoid Mutex generally: prefer channels, atomics, or `RwLock`/`ArcSwap` for fast non-async state.
- ALPN per network: `rayfish/net/<version>/<pubkey-prefix>` (`MESH_PROTOCOL_VERSION` then first 16 hex chars). File ALPN `rayfish/files/1`, pairing ALPN `rayfish/pair/1`, connect ALPN `rayfish/connect/1`. **The version segment in every ALPN is that protocol's compatibility gate.** Each protocol versions independently: bump `MESH_PROTOCOL_VERSION` for a breaking mesh change, `FILES_ALPN`'s `/1` for a breaking file-transfer change, `CONNECT_ALPN`'s `/1` for a breaking `ray connect` change, `PAIR_ALPN`'s `/1` for a breaking pairing change. Because iroh negotiates the ALPN at the QUIC handshake, peers on different versions of a protocol share no common ALPN and can't connect — the gate is transport-enforced, with no in-band version check. **Rule of thumb: when you change one of these wire protocols in a backward-incompatible way, bump its ALPN version in the same change.** **Surfacing the failure:** the ALPN gate fails opaquely (no connection forms, so no reason can be sent). Two things recover a useful message: (1) `ray join` compares the coordinator's signed `m,<mesh-version>` from the network record *before dialing* and bails with a precise "this network runs vX, this build speaks vY — run `ray update`"; (2) `transport::connect_to_peer_with_alpn` maps an ALPN-mismatch connect error (`is_alpn_mismatch`, matches the "no known protocol"/"no application protocol" handshake error) to a "peer may be running an incompatible version (run ray update)" hint — a heuristic, used on every dial path (join/connect/file/pair) as the fallback when there's no signed version to pre-check.
- TUN MTU 1280 (IPv6 minimum link MTU, RFC 8200 §5; matches WireGuard/Tailscale). Wire format (control + IPC): 4-byte BE length + msgpack body.
- Room id = per-network public key string (discovery only). On a closed network, joining needs a one-time invite or operator approval; on an open network the room id alone admits. Invite code = `bs58(pubkey || coordinator || secret)`. Local aliases (adjective-noun-noun) are display-only.
- Config under `config::config_dir()` (`/etc/rayfish` on Linux, `~/.config/rayfish` on macOS): `secret_key`, `device_cert`, `settings.toml`, `networks/<name>.toml` (one per network), `firewall.toml`, `invites/<network>.toml` (coordinator-only). Pre-migration installs auto-split the old `networks.toml` on first load (kept as `networks.toml.bak`). On Linux the tree is `root:rayfish`; secret-bearing files are `0600 root:root`. CLI commands that write identity directly (e.g. `ray pair restore`) need root on Linux since the tree is under `/etc`.
- Keep commit subjects conventional (`feat`/`fix`/`docs`/`style`/`ci`/…): release notes are generated from them by git-cliff (`cliff.toml`). `release.yml` renders the tag's grouped changelog + a `prev...new` compare link; `nightly.yml` lists commits since the last stable tag.
- Always update docs (CLAUDE.md, README.md) after finishing a feature or significant change.
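
The control/IPC wire format noted above (4-byte big-endian length, then the msgpack body) can be sketched with the standard library alone. The function names `encode_frame` and `decode_frame` are hypothetical and not taken from the crate, and the msgpack body is treated here as opaque bytes.

```rust
/// Hypothetical helper: prefix `body` with its length as a 4-byte big-endian integer.
/// Returns None if the body is too large for a 32-bit length field.
fn encode_frame(body: &[u8]) -> Option<Vec<u8>> {
    if body.len() > u32::MAX as usize {
        return None;
    }
    let len = body.len() as u32;
    let mut out = Vec::with_capacity(4 + body.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(body);
    Some(out)
}

/// Hypothetical helper: split one complete frame off the front of `buf`.
/// Returns `(body, rest)`, or None if the header or body is still incomplete.
fn decode_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let (header, rest) = buf.split_at(4);
    let len = u32::from_be_bytes([header[0], header[1], header[2], header[3]]) as usize;
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

A reader would call `decode_frame` on its receive buffer after each read, hand the returned body to the msgpack decoder, and keep `rest` for the next frame. A real implementation would likely also cap the accepted length so a peer cannot force a large allocation; that limit is not specified in these notes.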