rayfish 0.1.0

P2P mesh VPN powered by iroh — connect peers by cryptographic identity, not IP address
# Rayfish Roadmap

**Thesis:** a basic P2P layer that apps build on with zero SDK — resolve a `.pi`
name, open a socket, done. Unmodified apps work over the mesh. Priority = how directly
an item serves that socket/DNS surface.

## Done

- [x] Point-to-point tunnel between two peers
- [x] Multi-peer full mesh (N peers in one network)
- [x] Multiple simultaneous networks with isolation
- [x] Persistent network config
- [x] Three-word names for easy sharing
- [x] DHT membership publishing for offline coordinator resilience
- [x] Distributed ACLs with tag-based allow rules
- [x] Systemd/launchd service integration
- [x] Daemon architecture with Unix socket IPC
- [x] Magic DNS with `.pi` domain resolution (A + AAAA)
- [x] Local device firewall with port/protocol/peer filtering
- [x] Dual-stack IPv6/IPv4 with stable addresses
- [x] Tor transport via `iroh-tor-transport` (optional `--features tor`)

---

## Tier 0 — The spine (these ARE the public interface, do first)

- [x] **Refactor to iroh ProtocolHandler for ALPN dispatch**
  - MeshProtocol implements `ProtocolHandler`, one instance per network
  - ProtocolRouter dispatches by ALPN to MeshProtocol + BlobsProtocol handlers
  - Dynamic registration/unregistration as networks are created/joined/left
- [x] **Dual-stack IPv6/IPv4 with stable addresses**
  - **IPv6 (stable, identity-bound):** derived from EndpointId into the `200::/7` range
    (`0x02` prefix byte + 15 bytes of blake3 hash → 128-bit address). Never rotates; with
    120 bits of hash, collisions are negligible in practice. TUN gets a `/128` host address
  - **IPv4 (compat):** CGNAT `100.64.0.0/10` via FNV-1a. `derive_ip_with_index()`
    ready for future collision rotation (`hash(pubkey + index)`)
  - **Dual-stack forwarding:** version nibble dispatch, PeerTable with dual DashMaps
    (v4 + v6), `parse_packet_info()` handles both IPv4 and IPv6 headers
  - **DNS:** A + AAAA queries answered from `HostnameEntry = (Ipv4Addr, Ipv6Addr)`
  - **Hot-path:** SmolStr network names, `Arc<AclData>`, ArcSwap firewall: zero heap
    allocations and zero locks on the per-packet forwarding path
  - **TUN MTU = 1280:** the IPv6 minimum (RFC 8200 §5). Below this, Linux refuses to
    enable IPv6 on the device: `configure_ipv6`/`route_peer_range` fail with netlink
    `EINVAL` and IPv6 is silently dead. This is exactly the bug that hit the Scaleway
    Linux peer (v4 ping worked, v6 timed out). 1280 is also Tailscale's default TUN MTU;
    WireGuard's own default is 1420.
- [ ] **Per-peer PMTU feedback (synthetic ICMP) — fixes the MTU conflict**
  - Two hard constraints conflict and **no single static TUN MTU satisfies both**:
    - Linux enables IPv6 only at MTU ≥ 1280 (so 1280 is the floor for v6 to work at all)
    - iroh QUIC datagrams cap the payload per-connection; on a measured *direct* path
      the cap was ≈1223 bytes (`max_datagram_size()`, quinn's `max_datagram_frame_size`),
      so any full-size TUN packet above that is dropped as `SendDatagramError::TooLarge`.
      (The old MTU 1200 fit the datagram floor but broke IPv6; 1280 fixes v6 but makes
      full-size packets to low-cap peers black-hole — small/interactive traffic is fine,
      large transfers stall.)
  - Keep TUN MTU at 1280 and give the OS path-MTU discovery instead:
    - In `run_mesh` (`src/forward.rs`), handle `SendDatagramError::TooLarge` distinctly
      from other send failures (today both hit `DropReason::SendFailure`).
    - Read `route.conn.max_datagram_size()` → the real per-peer datagram cap.
    - Synthesize **ICMPv4 Type 3 Code 4** ("fragmentation needed") or **ICMPv6 Type 2**
      ("Packet Too Big") carrying `next-hop MTU = max_datagram_size() − transport overhead`,
      addressed to the original packet's source and quoting the start of the offending
      packet, and inject it back into the TUN via `tun_tx`. The OS then clamps TCP MSS /
      shrinks future packets to that peer per-destination.
  - Wire-up: `tun_tx` must reach `run_mesh` (currently only peer readers own it); add an
    ICMP-synthesis helper (v4 + v6, with checksums); unit-test the synthesized frames.
    ~100 lines. This is the WireGuard/Tailscale model — each peer gets the largest MTU
    its path supports, nothing silently drops, and v6 stays up.
- [x] **Magic DNS**
  - Local resolver intercepts `.pi` queries → A records (IPv4) + AAAA records (IPv6)
  - Per-network names: `alice.gaming.pi`, registered on join via `--hostname`
  - Multi-platform DNS config (macOS scoped resolver, Linux systemd-resolved/resolvconf/direct)
  - Backup/restore of DNS files with crash recovery
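
The IPv4 derivation described above (FNV-1a into CGNAT `100.64.0.0/10`, with an index for future collision rotation) could look roughly like the sketch below. The function name `derive_ipv4_with_index`, the 32-bit hash width, and the big-endian encoding of the index are assumptions for illustration; the real `derive_ip_with_index()` may differ.

```rust
use std::net::Ipv4Addr;

/// Standard 32-bit FNV-1a over a byte slice.
fn fnv1a32(bytes: &[u8]) -> u32 {
    const OFFSET_BASIS: u32 = 0x811c_9dc5;
    const PRIME: u32 = 0x0100_0193;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical stand-in for `derive_ip_with_index()`:
/// hash(pubkey + index), folded into the 22 host bits of 100.64.0.0/10.
/// Assumption: the index is appended as 4 big-endian bytes.
/// This sketch does not skip the all-zeros / all-ones host addresses.
fn derive_ipv4_with_index(pubkey: &[u8; 32], index: u32) -> Ipv4Addr {
    const BASE: u32 = 0x6440_0000; // 100.64.0.0
    const HOST_MASK: u32 = (1 << 22) - 1; // a /10 leaves 22 host bits

    let mut input = Vec::with_capacity(36);
    input.extend_from_slice(pubkey);
    input.extend_from_slice(&index.to_be_bytes());

    Ipv4Addr::from(BASE | (fnv1a32(&input) & HOST_MASK))
}
```

Bumping `index` after a detected collision yields a different candidate address for the same key, while index 0 stays stable across restarts.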
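
For the per-peer PMTU item, the ICMPv4 half of the synthesis helper might be sketched as follows. This is a minimal illustration, not the planned implementation: `synthesize_frag_needed` is a hypothetical name, the reply's source address is assumed to be the original destination, and guards a real helper would need (not answering ICMP errors or non-first fragments, the ICMPv6 variant) are omitted.

```rust
/// RFC 1071 Internet checksum over `data` (odd trailing byte is zero-padded).
fn internet_checksum(data: &[u8]) -> u16 {
    let mut sum: u32 = 0;
    for chunk in data.chunks(2) {
        let lo = if chunk.len() > 1 { chunk[1] } else { 0 };
        sum += u32::from(u16::from_be_bytes([chunk[0], lo]));
    }
    while sum >> 16 != 0 {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    !(sum as u16)
}

/// Build an ICMPv4 Type 3 Code 4 packet answering `original`.
/// Returns `None` if `original` is not a plausible IPv4 packet.
fn synthesize_frag_needed(original: &[u8], next_hop_mtu: u16) -> Option<Vec<u8>> {
    if original.len() < 20 || original[0] >> 4 != 4 {
        return None;
    }
    let ihl = usize::from(original[0] & 0x0f) * 4;
    if ihl < 20 || original.len() < ihl {
        return None;
    }
    let orig_src = &original[12..16];
    let orig_dst = &original[16..20];
    // Quote the original IP header plus the first 8 payload bytes (RFC 792).
    let quoted = &original[..original.len().min(ihl + 8)];

    // ICMP message: type, code, checksum, unused, next-hop MTU (RFC 1191), quote.
    let mut icmp = Vec::with_capacity(8 + quoted.len());
    icmp.extend_from_slice(&[3, 4, 0, 0]);
    icmp.extend_from_slice(&[0, 0]);
    icmp.extend_from_slice(&next_hop_mtu.to_be_bytes());
    icmp.extend_from_slice(quoted);
    let icmp_sum = internet_checksum(&icmp);
    icmp[2..4].copy_from_slice(&icmp_sum.to_be_bytes());

    // Outer IPv4 header without options; total length is at most 20 + 8 + 68 bytes.
    let total_len = (20 + icmp.len()) as u16;
    let mut pkt = Vec::with_capacity(usize::from(total_len));
    pkt.extend_from_slice(&[0x45, 0x00]);
    pkt.extend_from_slice(&total_len.to_be_bytes());
    pkt.extend_from_slice(&[0, 0, 0, 0]); // identification, flags, fragment offset
    pkt.extend_from_slice(&[64, 1, 0, 0]); // TTL, protocol = ICMP, checksum placeholder
    pkt.extend_from_slice(orig_dst); // assumption: reply appears to come from the peer
    pkt.extend_from_slice(orig_src); // delivered to the original sender
    let header_sum = internet_checksum(&pkt[..20]);
    pkt[10..12].copy_from_slice(&header_sum.to_be_bytes());
    pkt.extend_from_slice(&icmp);
    Some(pkt)
}
```

A unit test can validate a synthesized frame by recomputing the checksum over the finished IP header and over the ICMP message; both should come out as zero.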

---

## Tier 1 — Gateway features (high-bandwidth, always-on Linux peers)

These are the features where bulk throughput matters and where the optional kernel-WireGuard (WG) fast path applies.

- [ ] **Subnet routing**
  - `ray subnet advertise 192.168.1.0/24` — expose a LAN (NAS, printer, home server)
  - Advertising peer is a gateway; routing updates propagated via control messages
  - ACL integration: which peers reach which subnets
- [ ] **Exit nodes**
  - `ray exit-node enable` / `ray exit-node use alice`
  - NAT/masquerade outbound on the exit's real interface
  - Route DNS through the exit (leak prevention) + kill switch; IPv6 from day one
  - ACL integration: who can offer / who can use
- [x] **File sharing via iroh-blobs**
  - `ray send file.zip alice` — content-addressed, so dedup + resume are free
  - TODO: lean into directory *sync*, not just one-shot send (the feature people actually want)
- [ ] **Split tunneling**
  - Route only matching traffic: `ray route add 10.0.0.0/8`
  - Mesh-only vs full-tunnel modes; important for gaming (game on mesh, streaming direct)
- [ ] **Kernel-WG fast path (NEW — optimization, only when throughput is measured)**
  - Scoped to easy-NAT, own-socket peers: public IP / port-mapped / full-cone / LAN
  - Tailscale-style: WG owns its own real UDP socket with GSO/GRO; iroh stays as
    control plane + fallback for hard-NAT peers
  - Prereq gates everything: port-mapping client (UPnP-IGD / NAT-PMP / PCP)
  - Linux/Windows only; macOS/iOS/Android stay on iroh (no kernel WG)
- [ ] **Multicast/broadcast relay (DEMOTED — needs per-network scoping design)**
  - Relay broadcast/multicast so LAN protocols work transparently across the mesh
  - Minecraft LAN, Steam LAN, mDNS/Bonjour discovery — friend's server shows up in LAN tab
  - Scoped per-network; rate-limited to prevent broadcast storms
  - Open question: same identity IP across all networks means source IP can't disambiguate
    which network a broadcast belongs to — relay to all networks the sender is in, or add
    an explicit scoping mechanism?
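
Subnet routing and split tunneling both come down to matching a destination against a set of CIDR routes. A minimal longest-prefix-match sketch is shown below; the `Route` type and `lookup` function are hypothetical names, only IPv4 is covered, and a real table would also carry the gateway peer and ACL data.

```rust
use std::net::Ipv4Addr;

/// A parsed IPv4 CIDR route such as `10.0.0.0/8` (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Route {
    network: u32,
    prefix_len: u8,
}

impl Route {
    /// Parse "a.b.c.d/len", normalizing the address to its network base.
    fn parse(cidr: &str) -> Option<Route> {
        let (addr, len) = cidr.split_once('/')?;
        let addr: Ipv4Addr = addr.parse().ok()?;
        let prefix_len: u8 = len.parse().ok()?;
        if prefix_len > 32 {
            return None;
        }
        let network = u32::from(addr) & Route::mask(prefix_len);
        Some(Route { network, prefix_len })
    }

    /// Netmask for a prefix length; /0 is special-cased to avoid a 32-bit shift.
    fn mask(prefix_len: u8) -> u32 {
        if prefix_len == 0 {
            0
        } else {
            u32::MAX << (32 - u32::from(prefix_len))
        }
    }

    fn contains(&self, ip: Ipv4Addr) -> bool {
        (u32::from(ip) & Route::mask(self.prefix_len)) == self.network
    }
}

/// Longest-prefix match: the most specific route containing `ip`, if any.
fn lookup(routes: &[Route], ip: Ipv4Addr) -> Option<Route> {
    routes
        .iter()
        .copied()
        .filter(|r| r.contains(ip))
        .max_by_key(|r| r.prefix_len)
}
```

In split-tunnel mode, a `None` result would mean the packet leaves via the host's normal default route rather than the mesh.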

---

## Tier 2 — UX / friction reduction

- [ ] **Declarative deploy (`ray apply config.yaml`)**
  - Terraform-like, idempotent: define networks, per-host membership, and firewall
    rules in one version-controlled YAML; apply composes existing IPC ops
    (Create/Join/Leave/Firewall*/Status) by diffing desired vs. current state.
  - Replaces hand-typed `ray create`/`join`/`firewall add` for fleets; the network
    split + per-host firewall carry the policy, so no ACL sprawl.
  - Hard part: per-network keys are random at create, but members join by public key.
    Resolve logical names → keys via a committed lock/state file (`ray.lock.yaml`)
    produced by the coordinator's apply. Never put secret keys in the spec or derive
    keys from a spec seed.
  - Add a YAML dep (`serde_yml` / `serde_yaml_ng`; upstream `serde_yaml` is archived).
  - Decide: per-host apply (each host runs it with shared spec+lock) vs. single-operator
    remote push (SSH, like `just deploy`).
  - Builds on multi-homing (a host in many networks) + the firewall per-network rule field.
- [ ] **Invite links**
  - `rayfish://join/<base58>` URI scheme handler, click-to-join anywhere
  - **Sign them** — unsigned handlers are a forgery/phishing surface
  - Optional expiry + single-use
- [ ] **Web dashboard**
  - `ray dashboard`, localhost only: topology, connection type, latency, per-peer stats
  - NAT-type detection, network health; add a Prometheus/OpenMetrics endpoint alongside
- [ ] **Smart relay routing (fastest-path selection)**
  - Multi-hop when faster than direct; Dijkstra/Bellman-Ford over a latency graph
  - Don't do full-mesh O(N²) pinging — gossip a sampled subset
  - Separate "opt in to relaying" from "opt in to being relayed through" (metadata privacy)
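
The core of `ray apply` is a diff between desired and current state. A heavily simplified sketch of that planning step, covering network membership only, follows. The `Op` enum and `plan` function are hypothetical; the real command would compose the existing IPC ops and also handle firewall rules and the lock file.

```rust
use std::collections::BTreeSet;

/// Illustrative subset of the operations an apply could emit.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Op {
    Join(String),
    Leave(String),
}

/// Compute the ops that move `current` membership to `desired`.
/// Joins come first, then leaves; each group is in sorted order.
fn plan(desired: &BTreeSet<String>, current: &BTreeSet<String>) -> Vec<Op> {
    let joins = desired.difference(current).cloned().map(Op::Join);
    let leaves = current.difference(desired).cloned().map(Op::Leave);
    joins.chain(leaves).collect()
}
```

Idempotence falls out of the diff: once the state matches the spec, `plan` returns an empty list and a second apply does nothing.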
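
For smart relay routing, fastest-path selection over a latency graph can be sketched with Dijkstra's algorithm. The names below are assumptions: `PeerId` is a plain `u32` stand-in, latencies are in milliseconds, and the graph is whatever sampled subset gossip has produced, not a full mesh.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

type PeerId = u32; // hypothetical stand-in for a real peer identity
type Graph = HashMap<PeerId, Vec<(PeerId, u32)>>; // peer -> [(neighbor, latency_ms)]

/// For every reachable peer, return (total latency, previous hop on the best path).
fn fastest_paths(graph: &Graph, source: PeerId) -> HashMap<PeerId, (u32, Option<PeerId>)> {
    let mut best: HashMap<PeerId, (u32, Option<PeerId>)> = HashMap::new();
    let mut heap = BinaryHeap::new();
    best.insert(source, (0, None));
    heap.push(Reverse((0u32, source)));

    while let Some(Reverse((cost, node))) = heap.pop() {
        // Skip stale heap entries that were superseded by a cheaper path.
        if cost > best[&node].0 {
            continue;
        }
        if let Some(edges) = graph.get(&node) {
            for &(next, latency) in edges {
                let candidate = cost.saturating_add(latency);
                let improved = best
                    .get(&next)
                    .map_or(true, |&(known, _)| candidate < known);
                if improved {
                    best.insert(next, (candidate, Some(node)));
                    heap.push(Reverse((candidate, next)));
                }
            }
        }
    }
    best
}
```

If peer 1 reaches peer 2 directly in 100 ms but via peer 3 in 20 + 30 ms, the entry for peer 2 is `(50, Some(3))`, so relaying through 3 wins. Whether peer 3 has opted in to relaying is a separate policy check.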

---

## Tier 3 — Protocol correctness (before public / scale)

Foundational but not blocking the MVP demo. Land before you have users who'd be hurt by bugs.

- [ ] **Identity vs node model** — user key signing device keys; affects ACLs, DNS, invites
- [ ] **Key rotation + revocation** — signed revocation lists / DHT tombstones
- [ ] **ACL merge semantics** — resolve concurrent edits (CRDT or signed monotonic log),
  not last-writer-wins
- [ ] **DHT threat model** — signed records, Sybil/eclipse/poisoning resistance, rendezvous
  fallback when the DHT degrades (this is your biggest new attack surface)

---

## Tier 4 — Hardening (DEMOTED — after the protocol stops moving)

- [ ] **Automated e2e test pyramid (CI-gating)**: currently the only e2e scenarios are
  manual Scaleway runs, absent from CI. Two new test tiers (numbered independently of the
  roadmap tiers):
  - **Tier 1 — mock-TUN data-path tests** (`cargo test`, deterministic, no root): thin
    `TunRead`/`TunWrite` trait seam in `src/tun.rs` so the real `forward.rs` path can be
    driven with injected packets + a fake `PeerTable`. Asserts routing (v4/v6), firewall
    enforcement (deny/whitelist/network-scoped), oversized drops, conntrack return traffic.
  - **Tier 2 — Docker e2e** (`ubuntu-latest`, `--cap-add NET_ADMIN,NET_RAW --device
    /dev/net/tun`): each node a container on a user-defined bridge (auto-NAT egress so real
    pkarr/relay work; mDNS direct-connect over the bridge). `docker exec` swaps in for the
    Scaleway `ssh` transport in `tests/lib/common.sh`. Scenarios: port device-cert +
    connect, plus coordinator-offline reconnect, multi-homed + per-net firewall, and
    leave/nuke/rename convergence. Reliability rule: no `sleep` asserts — poll real
    end-state (`ray status --json`); wrap the one external step (`join`'s pkarr lookup) in
    bounded retries.
  - **Rejected:** mocking the iroh transport (no loopback in iroh 1.0.0; ~2-3wk fragile
    custom transport that *hides* the real connection — the thing most worth testing).
  - Full design: `~/.claude/plans/let-s-brainstorm-how-it-playful-ladybug.md`
- [ ] **Deterministic network simulator (TigerBeetle-style VOPR)**
  - Premature as a *next* item: multi-month sink to harden a committed protocol
  - For now: targeted tests for the one thing you doubt — membership/ACL convergence
    under partition — and move on
  - Full VOPR (partitions, churn, split-brain, race conditions) once the protocol is stable
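
The mock-TUN seam from the test pyramid could take roughly the following shape. Only the trait names `TunRead`/`TunWrite` come from the plan above; the method signatures, the synchronous style (the real code is probably async), `MockTun`, and the toy `forward_once` step are all assumptions made for illustration.

```rust
use std::collections::VecDeque;
use std::io;

/// Hypothetical signatures for the seam in `src/tun.rs`.
trait TunRead {
    fn read_packet(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

trait TunWrite {
    fn write_packet(&mut self, packet: &[u8]) -> io::Result<()>;
}

/// In-memory TUN: tests inject packets into `inbound` and inspect `outbound`.
#[derive(Default)]
struct MockTun {
    inbound: VecDeque<Vec<u8>>,
    outbound: Vec<Vec<u8>>,
}

impl TunRead for MockTun {
    fn read_packet(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let packet = self
            .inbound
            .pop_front()
            .ok_or_else(|| io::Error::new(io::ErrorKind::WouldBlock, "no injected packets"))?;
        if packet.len() > buf.len() {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "buffer too small"));
        }
        buf[..packet.len()].copy_from_slice(&packet);
        Ok(packet.len())
    }
}

impl TunWrite for MockTun {
    fn write_packet(&mut self, packet: &[u8]) -> io::Result<()> {
        self.outbound.push(packet.to_vec());
        Ok(())
    }
}

/// Toy stand-in for one forwarding step: drop packets larger than `max_len`.
/// Returns Ok(true) if the packet was forwarded, Ok(false) if it was dropped.
fn forward_once<R: TunRead, W: TunWrite>(
    src: &mut R,
    dst: &mut W,
    max_len: usize,
) -> io::Result<bool> {
    let mut buf = [0u8; 2048];
    let n = src.read_packet(&mut buf)?;
    if n > max_len {
        return Ok(false);
    }
    dst.write_packet(&buf[..n])?;
    Ok(true)
}
```

Because the forwarding step is generic over the traits, the same function can run against the real device in production and against `MockTun` under `cargo test`, with no root and no timing dependence.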

---

## Tier 5 — Social product (SEPARATE PRODUCT — build ON rayfish, not IN it)

A different company with a different moat. Build at most one as a demo; defer the rest.
Discovery is centralized (Slack/Discord identity as trust anchor); once connected it's all P2P.

- [ ] Voice/calls over mesh — UDP audio + UI, as a separate binary on the public API
- [ ] Slack/Discord bot (privately hosted) — chat identity → network code, slash commands
- [ ] Open-source social connector — self-hostable generic version
- [ ] Game lobby integration — per-session networks, "click to join game night"
- [ ] Steam integration — discover networks through Steam friends/groups
- [ ] ~~SDK/API for developers~~ — mostly subsumed by sockets+DNS + the Tier 1 lifecycle API

---

## Tier 6 — Platform expansion

- [ ] macOS Network Extension (no sudo)
- [ ] BLE transport — local/offline mesh via `btleplug`, needs fragmentation layer (MTU 23-247)
- [ ] Protocol obfuscation (TCP/443, WebSocket, obfs4-style) for restrictive networks
- [ ] Windows, iOS, Android

---

## Speculative (parked)

- [ ] Post-quantum handshake (harvest-now-decrypt-later) — check iroh/noq KEM support
- [ ] Declarative signed network config ("GitOps for your mesh")
- [ ] Nym mixnet transport — high-latency privacy transport via `nym-sdk`; packets are 20KB+,
  seconds of latency, but strong metadata resistance
- [ ] I2P transport — fully decentralized alternative to Tor; `i2p-rs` ecosystem less mature
- [ ] Multipath bonding (WiFi + cellular failover) — QUIC migration gives a head start;
  a differentiator Tailscale structurally can't match