# Bairelay — Architecture
Static reference for the project's structure, dependencies, and runtime patterns. Day-to-day implementation knowledge lives in `docs/implementation.md`.
---
## Workspace structure
```
bairelay/
├── Cargo.toml # workspace root + binary crate
├── tarpaulin.toml # coverage tool defaults
├── src/ # binary: CLI, config, orchestration, lifecycle
├── crates/
│ ├── core/ # bairelay_neolink_core: Baichuan protocol (vendored)
│ ├── rtsp/ # bairelay_rtsp: RTSP server + RTP packetisation
│ ├── mqtt/ # bairelay_mqtt: MQTT bridge + HA discovery
│ └── wake-server/ # bairelay_wake_server: local BcUdp wake server
├── fuzz/ # cargo-fuzz harness (excluded from workspace)
├── docs/ # specification, architecture, build, etc.
└── tests/ # integration tests, fixtures, scripts
```
## Crate responsibilities
### `bairelay` (binary, `src/`)
Owns the application lifecycle:
- CLI parsing via `clap` — service modes (`mqtt` / `rtsp` / `mqtt-rtsp`), one-shot camera commands (`reboot`, `snapshot`, `battery`, etc.), and `check-config` (parse + validate + warn, no camera connect).
- TOML configuration loading and validation.
- Camera orchestrator: spawns per-camera task trees, manages connections.
- Wake-lock counter with dual `Notify` and Drop-based RAII guards.
- Watchdog: 30 s sweep reconciling camera state vs. active wake locks.
- `Supervisor` (`src/supervisor.rs`) — named-spawn + cancel-on-shutdown for the long-running services (RTSP plain, RTSPS, wake server, push listener, watchdog, startup-wake). MQTT lives outside the supervisor: its event loop has a distinct cancel token because per-camera teardown publishes its final `disconnected` status via MQTT, so MQTT must outlive the orchestrator.
- `MqttBackoff` (`src/mqtt_loop.rs`) — exponential backoff (1, 2, 4, 8, 16, 30 s) with log dedupe + 60 s relog window for the broker reconnect path.
- `sleep_or_cancel` (`src/run_support.rs`) — shared sleep+cancel primitive used by every retry / backoff path.
- All listener sockets bind synchronously in `main.rs` before any "started" log line; bind failures halt startup.
- RTSP `max_connections` semaphore (default 256 in the binary) caps concurrent client handlers.
- Graceful shutdown via `CancellationToken` + supervisor-orchestrated per-task join with 2 s budget.
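The `MqttBackoff` schedule listed above can be sketched as a pure function. This is a minimal illustration, not the code in `src/mqtt_loop.rs`: the name `backoff_delay` and the zero-based `attempt` parameter are assumptions, and the log dedupe / relog window is left out.

```rust
use std::time::Duration;

/// Upper bound on the reconnect delay, matching the 30 s cap in the schedule.
const MAX_DELAY_SECS: u64 = 30;

/// Hypothetical helper: delay before reconnect attempt `attempt` (0-based).
/// Doubles from 1 s and saturates at the cap: 1, 2, 4, 8, 16, 30, 30, ...
fn backoff_delay(attempt: u32) -> Duration {
    // `checked_shl` returns None once the shift reaches the bit width;
    // treat that as "already past the cap".
    let secs = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_secs(secs.min(MAX_DELAY_SECS))
}
```

Keeping the schedule pure makes it easy to unit-test separately from the sleep and cancel plumbing.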
### `crates/core/` — `bairelay_neolink_core`
Vendored Baichuan protocol implementation. Modernised to edition 2021 with updated dependencies. Public surface:
- `BcCamera` — async API for every camera operation: login, streaming, PTZ, motion detection, battery, LED, reboot, etc.
- `CameraDriver` — `dyn`-compatible trait mirroring the subset of `BcCamera` the binary's non-stream code paths call. Lets test code substitute a `FakeCamera` without a live camera session.
- `BcConnection` — TCP/UDP connection management.
- Discovery: `Discovery` struct with `CameraDiscoverer` trait (local broadcast, remote, map, relay, cellular).
- Protocol encoding/decoding via `nom` parser + `cookie-factory` serialiser.
- AES-CFB encryption, MD5 challenge-response authentication.
- `VideoStream` trait over `StreamData` so video pull loops can be tested against `MockVideoStream`.
### `crates/rtsp/` — `bairelay_rtsp`
Pure-Rust RTSP server:
- RTSP session state machine (OPTIONS, DESCRIBE, SETUP, PLAY, TEARDOWN).
- SDP generation from codec parameters.
- H.264 RTP packetisation per RFC 6184 (NAL fragmentation, SPS/PPS handling).
- H.265 RTP packetisation per RFC 7798 (FU fragmentation, VPS/SPS/PPS handling).
- AAC (RFC 3640 AAC-hbr) and G.711 µ-law (RFC 3551 PT 0) audio.
- ADPCM → G.711 transcode with 16 → 8 kHz resample.
- TCP-interleaved transport (RTP-over-RTSP) and UDP-unicast transport.
- Per-session keepalive watchdog; digest auth with 5 min nonce TTL + RFC 7616 §3.4 URI binding; basic auth on plain transport for drop-in compat.
- 30 s slow-loris timer on every fresh connection (disarmed once a complete request dispatches).
- `max_connections` semaphore (default unlimited at the crate boundary, `Some(256)` set by the binary) caps concurrent client handlers.
- `Content-Length` capped at the request buffer maximum (64 KiB) with `checked_add` arithmetic.
- `RtspServer::serve_with_listener` accepts a pre-bound `TcpListener` so `main.rs` binds synchronously at startup; `serve` is the thin wrapper that binds + delegates.
- `StreamProvider` trait — the binary implements via `CameraProvider`.
- Multi-track SETUP with per-SSRC RTP counters; RTCP Sender Reports are intentionally suppressed (mpv/ffmpeg re-anchor on every SR receipt — see `docs/implementation.md` § RTCP).
- Per-session coordinator (`session_task::run`) spawns parallel `video_dispatch_loop` + `audio_dispatch_loop`. Each holds its own `broadcast::Receiver`, so video FU bursts can't queue audio behind them; the TCP-interleaved write mutex is held for only one `$`-framed packet at a time.
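The `Content-Length` cap from the list above can be illustrated with a small overflow-safe length check. This is a sketch under assumptions: `MAX_REQUEST_BYTES` and `total_request_len` are hypothetical names, and the real parser may apply the cap at a different point.

```rust
/// Request buffer maximum (64 KiB), as stated for the RTSP crate.
const MAX_REQUEST_BYTES: usize = 64 * 1024;

/// Hypothetical helper: total bytes a request occupies (headers + body),
/// or `None` if the sum overflows or exceeds the buffer maximum.
fn total_request_len(header_len: usize, content_length: usize) -> Option<usize> {
    // `checked_add` turns an attacker-supplied huge Content-Length into
    // `None` instead of a wrapped (small) length.
    let total = header_len.checked_add(content_length)?;
    (total <= MAX_REQUEST_BYTES).then_some(total)
}
```

A caller would reject the request (and typically close the connection) when this returns `None`.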
### `crates/mqtt/` — `bairelay_mqtt`
MQTT bridge:
- `SharedMqttClient` wrapping `rumqttc::AsyncClient`.
- Status / control / query topic helpers.
- Home Assistant MQTT discovery payloads (light, camera, binary_sensor, switch, select, button, sensor).
- `test_support::mock_client()` returning a `MockHandle` capture sink for unit tests.
### `crates/wake-server/` — `bairelay_wake_server`
Local replacement for Reolink's P2P cloud. Full wire-level reference: `docs/cloud-interception.md` § Part I.
- BcUdp Discovery framing reused from `bairelay_neolink_core::bcudp` (header + CRC + XOR XML).
- Two `tokio::net::UdpSocket` listeners (`middleman` port 9999, `register` port 58200) sharing one `Arc<CameraRegistry>` plus an `Arc<SessionAnchors>` map keyed by camera UID (issued at `M2D_Q_R`, echoed in `R2D_R_R` — cameras anchor to it).
- Middleman: `C2M_Q` (clients) → `M2C_Q_R`; `D2M_Q` (cameras on boot) → `M2D_Q_R` issuing a fresh session token + ac.
- Register: `D2R_R` (camera registration) → `R2D_R_R{rsp:-4, ac}`; `D2R_HB` upserts UID → source-addr at `Instant::now()`; `C2R_C` for a fresh entry spawns 10 × `R2D_C` at 100 ms then replies `R2C_C_R` + `R2C_T`; `D2R_DISC` acked with `R2D_DC_R`.
- Lazy stale-on-lookup registry (default 80 s TTL, ≈ 4 × heartbeat). No background sweep. Long-form / short-form UID prefix-match on lookup so `C2R_C` from operator config matches `D2R_HB`'s firmware-suffixed UID.
- One public entrypoint `run(RuntimeConfig, CancellationToken) -> Result<(), WakeServerError>`. Bind IP inherited from the top-level `bind_addr`; `route::advertise_ip` derives the per-peer local IP when `bind = 0.0.0.0` so we never advertise the wildcard.
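The lazy stale-on-lookup registry described above can be sketched with the standard library alone. This is an illustrative model, not the crate's `CameraRegistry`: the `Registry` type, its method names, and the UID strings used with it are assumptions, and `now` is passed in explicitly so expiry is testable.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// Hypothetical registry: UID -> (source address, last heartbeat time).
struct Registry {
    ttl: Duration,
    entries: HashMap<String, (SocketAddr, Instant)>,
}

impl Registry {
    fn new(ttl: Duration) -> Self {
        Registry { ttl, entries: HashMap::new() }
    }

    /// Heartbeat path: insert or refresh the entry for `uid`.
    fn upsert(&mut self, uid: &str, addr: SocketAddr, now: Instant) {
        self.entries.insert(uid.to_owned(), (addr, now));
    }

    /// Lookup path: prefix-match in either direction, then evict the entry
    /// if it is older than the TTL. No background sweep is needed.
    fn lookup(&mut self, uid: &str, now: Instant) -> Option<SocketAddr> {
        let key = self
            .entries
            .keys()
            .find(|k| k.starts_with(uid) || uid.starts_with(k.as_str()))
            .cloned()?;
        let (addr, seen) = self.entries[&key];
        if now.saturating_duration_since(seen) > self.ttl {
            self.entries.remove(&key);
            return None;
        }
        Some(addr)
    }
}
```

If several stored UIDs share a prefix, this sketch returns whichever the map yields first; a real implementation would need a deterministic tie-break.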
## Dependencies
### Core stack
| Purpose | Crate | Version | Notes |
|---|---|---|---|
| Async runtime | `tokio` | 1.x | `rt-multi-thread` + `macros` |
| Async traits | `async-trait` | 0.1.x | Workspace dep |
| Logging | `tracing` | 0.1.x | Structured, span-based |
| Log output | `tracing-subscriber` | 0.3.x | `fmt` layer with `EnvFilter` |
| Errors (binary) | `anyhow` | 1.x | Top-level error chains |
| Errors (libraries) | `thiserror` | 2.x | Typed errors in crate APIs |
| CLI | `clap` | 4.x | Derive macro |
| Serde | `serde` | 1.x | Derive macro |
| Config format | `toml` | 0.8.x | TOML parsing |
| XML serialisation | `quick-xml` | 0.36.x | Serialiser + deserialiser |
| Byte buffers | `bytes` | 1.x | XML serialisation buffer |
### RTSP / RTP (`crates/rtsp/`)
| Purpose | Crate | Version | Notes |
|---|---|---|---|
| RTSP messages | `rtsp-types` | 0.1.x | RFC 7826 parse/serialise |
| RTP packets | `rtp-types` | 0.1.x | RFC 3550 packet framing |
| SDP | `sdp-types` | 0.1.x | SDP body generation |
| TLS | `tokio-rustls` | 0.26 | rustls 0.23 + aws-lc-rs; serves `rtsps://` |
H.264 NAL → RTP (RFC 6184) and H.265 NAL → RTP (RFC 7798) packetisation are implemented in-crate — no existing Rust crate covers this.
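To make the in-crate packetisation concrete, here is a minimal sketch of RFC 6184 FU-A fragmentation for a single H.264 NAL unit. It is not the code in `crates/rtsp/`: the function name `h264_payloads` and the `max_payload` parameter are assumptions, it produces RTP payloads only (no RTP header), and it ignores aggregation packets.

```rust
/// NAL unit type value that marks an FU-A fragment (RFC 6184 §5.8).
const FU_A_TYPE: u8 = 28;

/// Hypothetical helper: split one NAL unit (without start code) into RTP
/// payloads of at most `max_payload` bytes each.
fn h264_payloads(nal: &[u8], max_payload: usize) -> Vec<Vec<u8>> {
    assert!(max_payload > 2, "need room for the 2-byte FU indicator + header");
    if nal.is_empty() {
        return Vec::new();
    }
    // Small NAL: send as a single NAL unit packet, unchanged.
    if nal.len() <= max_payload {
        return vec![nal.to_vec()];
    }
    let nal_header = nal[0];
    // FU indicator keeps F + NRI bits from the NAL header, type becomes 28.
    let indicator = (nal_header & 0xE0) | FU_A_TYPE;
    let nal_type = nal_header & 0x1F;
    // The original NAL header byte is not repeated in the fragments.
    let chunks: Vec<&[u8]> = nal[1..].chunks(max_payload - 2).collect();
    let last = chunks.len() - 1;
    chunks
        .into_iter()
        .enumerate()
        .map(|(i, chunk)| {
            let mut fu_header = nal_type;
            if i == 0 {
                fu_header |= 0x80; // S bit: first fragment
            }
            if i == last {
                fu_header |= 0x40; // E bit: last fragment
            }
            let mut out = Vec::with_capacity(2 + chunk.len());
            out.push(indicator);
            out.push(fu_header);
            out.extend_from_slice(chunk);
            out
        })
        .collect()
}
```

Because fragmentation only happens when the NAL exceeds `max_payload`, there are always at least two fragments, so the S and E bits never land on the same packet.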
### MQTT (`crates/mqtt/`)
| Purpose | Crate | Version |
|---|---|---|
| MQTT client | `rumqttc` | 0.24.x |
| JSON (HA discovery) | `serde_json` | 1.x |
| Base64 (preview) | `base64` | 0.22.x |
### Protocol core (`crates/core/`)
| Purpose | Crate | Version | Notes |
|---|---|---|---|
| Encryption | `aes` + `cfb-mode` | latest | AES-CFB for Baichuan |
| Hashing | `md5` | 0.7.x | Challenge-response auth |
| Parsing | `nom` | 7.x | Binary protocol parsing |
| Serialisation | `cookie-factory` | 0.3.x | Binary protocol writing |
| XML | `quick-xml` | 0.36.x | Camera XML messages |
| CRC | `crc32fast` | 1.x | Packet checksums |
### Wake server (`crates/wake-server/`)
Minimal additional dependencies — uses `tokio` UDP, `crc32fast`, `quick-xml`, `serde` / `toml` from the workspace.
## Architecture patterns
### Per-camera task tree
Each camera gets a tokio task tree rooted in the orchestrator. All tasks for a camera share a `CancellationToken` derived from the global token:
```
global CancellationToken
├── MQTT event loop (polls broker, spawns dispatch tasks)
├── watchdog (30 s sweep, requests disconnect for idle cameras)
└── camera "frontdoor" token
└── connection loop (discover → login → keepalive → reconnect)
└── session tasks (cancelled on disconnect, aborted after 2 s)
├── motion detection listener (retry on error)
├── battery poller (configurable interval)
├── floodlight tasks poller (configurable interval)
├── floodlight event listener (real-time)
├── PIR state poller (one-shot on connect, re-publish on set)
├── preview poller (camera-lifetime — see Last-frame buffer)
└── grace period (countdown after last wake-lock release)
```
Cancelling the global token shuts down everything. Each camera has a child token. Each connected session has a session token that cancels pollers and listeners on disconnect.
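The parent/child cancellation semantics described above can be modelled with the standard library. The project itself uses `CancellationToken`; the `Token` type below is a hypothetical std-only stand-in that only mirrors the "cancelling a parent cancels its children, not the reverse" behaviour, without any async wake-up.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical model of a hierarchical cancel token. `chain` holds the
/// flags of every ancestor followed by this token's own flag.
#[derive(Clone)]
struct Token {
    chain: Vec<Arc<AtomicBool>>,
}

impl Token {
    fn new() -> Self {
        Token { chain: vec![Arc::new(AtomicBool::new(false))] }
    }

    /// Derive a child that observes this token's (and its ancestors') state.
    fn child(&self) -> Self {
        let mut chain = self.chain.clone();
        chain.push(Arc::new(AtomicBool::new(false)));
        Token { chain }
    }

    /// Cancel this token; descendants see it, ancestors do not.
    fn cancel(&self) {
        self.chain
            .last()
            .expect("chain is never empty")
            .store(true, Ordering::SeqCst);
    }

    fn is_cancelled(&self) -> bool {
        self.chain.iter().any(|flag| flag.load(Ordering::SeqCst))
    }
}
```

In this model a session token is `camera.child()` and a camera token is `global.child()`, so a disconnect cancels only the session while a global shutdown reaches every level.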
### Wake lock + grace period
```rust
struct WakeLockInner {
count: AtomicUsize,
notify_release: Notify, // signals 1 → 0
notify_acquire: Notify, // signals 0 → 1
}
```
Two separate notifications:
- `notify_acquire` (0 → 1): wakes idle-disconnect cameras waiting for work.
- `notify_release` (1 → 0): triggers the grace-period countdown.
Both use `Notify::notify_one()` (not `notify_waiters()`) so a permit is stored when no waiter is registered — late `.notified().await` calls still fire. The grace-period timer listens on `notify_release` and starts a countdown; any new `acquire()` before the countdown expires resets it.
The watchdog also monitors wake-lock state and can force-disconnect via a `disconnect_signal: Notify` on each camera.
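The counter and its Drop-based guard can be sketched without an async runtime. This is an assumption-laden model rather than the project's `WakeLockInner`: the two edge counters stand in for the points where `notify_acquire.notify_one()` and `notify_release.notify_one()` would be called, and the `WakeLock` / `WakeGuard` names are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical std-only model of the wake-lock counter.
#[derive(Default)]
struct WakeLock {
    count: AtomicUsize,
    acquire_edges: AtomicUsize, // stands in for notify_acquire (0 -> 1)
    release_edges: AtomicUsize, // stands in for notify_release (1 -> 0)
}

/// RAII guard: holding it keeps the camera awake; dropping it releases.
struct WakeGuard(Arc<WakeLock>);

impl WakeLock {
    fn acquire(self: &Arc<Self>) -> WakeGuard {
        // `fetch_add` returns the previous value, so 0 means a 0 -> 1 edge.
        if self.count.fetch_add(1, Ordering::AcqRel) == 0 {
            self.acquire_edges.fetch_add(1, Ordering::Relaxed);
        }
        WakeGuard(Arc::clone(self))
    }
}

impl Drop for WakeGuard {
    fn drop(&mut self) {
        // Previous value 1 means this drop caused the 1 -> 0 edge.
        if self.0.count.fetch_sub(1, Ordering::AcqRel) == 1 {
            self.0.release_edges.fetch_add(1, Ordering::Relaxed);
        }
    }
}
```

Only the edge transitions signal anything, so a second RTSP client joining or leaving while another still holds a guard does not restart the grace period.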
### Watchdog
A tokio task that runs every 30 seconds. For each camera with `idle_disconnect` enabled:
1. If connected with zero wake locks held, call `request_disconnect()` to cancel the session.
2. If a stream source has had no subscribers for longer than `stream_prune_grace_secs` (default 30), drop it. The default is intentionally ≤ each camera's `idle_disconnect_timeout_secs` (default 45) so a cached `StreamSource` can never outlive the Baichuan session that feeds it.
The watchdog is a safety net, not the primary lifecycle mechanism.
### RTSP session lifecycle (battery camera)
```
RTSP client connects
→ acquire wake lock
→ if camera sleeping: wake camera, serve last-frame placeholder
→ camera awake: start Baichuan video stream
→ relay frames as RTP to client
RTSP client disconnects
→ stop Baichuan video stream
→ release wake lock
→ grace period starts
→ no new clients: camera disconnects and sleeps
```
The RTSP crate exposes connect/disconnect callbacks; the binary hooks these to the wake-lock system. The RTSP crate knows nothing about cameras.
### Last-frame buffer
```rust
struct LastFrameBuffer {
frame: RwLock<Option<Bytes>>, // most recent JPEG
}
```
Owned by `CameraHandle` (one per camera, not per stream). Updated on every video frame during active streaming and by `BcCamera::get_snapshot` calls. Read by:
- The RTSP server as a placeholder while the camera wakes.
- The MQTT preview publisher (`status/preview` topic).
Cleared on service restart; repopulated by the startup-wake cycle. Never persisted to disk.
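A minimal sketch of the read and write paths on such a buffer, using only the standard library. The struct above stores `Bytes`; this sketch swaps in `Arc<[u8]>` to stay dependency-free, and the method names `store` and `latest` are assumptions.

```rust
use std::sync::{Arc, RwLock};

/// Std-only stand-in for the last-frame buffer (Arc<[u8]> instead of Bytes).
#[derive(Default)]
struct LastFrameBuffer {
    frame: RwLock<Option<Arc<[u8]>>>, // most recent JPEG
}

impl LastFrameBuffer {
    /// Writer path: replace the cached frame.
    fn store(&self, jpeg: &[u8]) {
        *self.frame.write().expect("lock poisoned") = Some(Arc::from(jpeg));
    }

    /// Reader path: cheap clone of the shared frame; `None` until the
    /// first frame arrives.
    fn latest(&self) -> Option<Arc<[u8]>> {
        self.frame.read().expect("lock poisoned").clone()
    }
}
```

Readers get a reference-counted handle, so a slow consumer keeps its frame alive without blocking the next `store`.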
### Placeholder streams during gaps
Per-`StreamSource` `GapState { Live, Bridging }`. A 200 ms ticker compares `last_live_frame_at` against `gap_threshold_secs` (default 1.0). On exceedance, the source flips to `Bridging` and re-broadcasts cached `VideoBurst::iframe_nals` with synthesised PTS so RTSP clients see continuous RTP. Audio packets are dropped on the wire while bridging but per-codec PTS counters advance via the camera's audio cadence so A/V stays aligned on Live resume.
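The ticker's comparison can be sketched as a pure function. This is illustrative only: the function name `gap_state` is an assumption, and the re-broadcast of cached NALs and PTS synthesis are omitted.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GapState {
    Live,
    Bridging,
}

/// Hypothetical helper evaluated on each ticker tick: flip to `Bridging`
/// once the time since the last live frame exceeds the threshold.
fn gap_state(last_live_frame_at: Instant, now: Instant, gap_threshold: Duration) -> GapState {
    if now.saturating_duration_since(last_live_frame_at) > gap_threshold {
        GapState::Bridging
    } else {
        GapState::Live
    }
}
```

With a 200 ms tick and the default 1.0 s threshold, the flip to `Bridging` would be observed at most one tick after the threshold is crossed.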
Per-camera `PreviewState { Live, Connecting, Sleeping }` published via `watch::Sender`. The MQTT preview publisher composites a caption (e.g. `SLEEPING`) on stale JPEGs so HA dashboards distinguish live from stale.
### Trait abstractions for testing
| Trait | Location | Purpose |
|---|---|---|
| `CameraDriver` | `bairelay_neolink_core::bc_protocol::camera_driver` | Subset of `BcCamera` the binary calls |
| `CameraDiscoverer` | `bairelay_neolink_core::bc_protocol::connection::discovery` | Discovery fallback chain (local/remote/map/relay) |
| `VideoStream` | `bairelay_neolink_core::bc_protocol::stream` | `BcMedia` pull loop over `StreamData` |
| `PacketSource` | `bairelay::stream_source` (binary) | `BcMedia` injection for translator-loop tests |
| `StreamProvider` | `bairelay_rtsp::provider` | RTSP server's view of a camera |
Production impls forward to the concrete types (`BcCamera`, `Discovery`, `StreamData`, etc.); test impls (`FakeCamera`, `ScriptedDiscoverer`, `MockVideoStream`, `FakeStreamProvider`) live alongside and let unit tests exercise the same code paths a live camera would drive.
## Error handling strategy
- **Library crates** (`core`, `rtsp`, `mqtt`, `wake-server`): use `thiserror` with typed error enums. Each crate defines its own error type.
- **Binary crate** (`bairelay`): uses `anyhow` for top-level error propagation with context.
- **Connection failures**: logged and retried with exponential backoff. Never crash the process.
- **Authentication failures**: stop retrying permanently. Don't hammer the camera with bad credentials.
- **Protocol errors**: logged at warn / error. Malformed packets from cameras are discarded, not propagated.
- **Configuration errors**: fail fast at startup with clear messages.
The CLI's coarse exit-code table for one-shot commands (see `src/oneshot/classify.rs`):
| Code | Meaning |
|---|---|
| 0 | success |
| 1 | generic failure |
| 2 | usage (bad args, unknown camera, missing config) |
| 3 | config (malformed TOML, validation failure) |
| 4 | connection / auth (login refused, transport dead, DNS) |
| 5 | protocol (malformed reply, XML parse, time-sync issue) |
| 6 | unsupported (`MissingAbility` — camera lacks the feature) |
| 130 | Ctrl+C |
Scripts can branch on the exit code without parsing stdout.
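The table above maps naturally onto an enum. The sketch below is not the contents of `src/oneshot/classify.rs`; the `Outcome` type and `exit_code` function are hypothetical names used to show the mapping.

```rust
/// Hypothetical classification of a one-shot command result.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Success,
    Failure,
    Usage,
    Config,
    Connection,
    Protocol,
    Unsupported,
    Interrupted,
}

/// Map an outcome to the process exit code from the table.
fn exit_code(outcome: Outcome) -> u8 {
    match outcome {
        Outcome::Success => 0,
        Outcome::Failure => 1,
        Outcome::Usage => 2,
        Outcome::Config => 3,
        Outcome::Connection => 4, // includes auth failures
        Outcome::Protocol => 5,
        Outcome::Unsupported => 6,
        Outcome::Interrupted => 130, // Ctrl+C
    }
}
```

A `main` returning `std::process::ExitCode` could finish with `ExitCode::from(exit_code(outcome))`.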
## Reproducible builds
Bairelay release artefacts are bit-for-bit reproducible from `(commit, target triple, rustc version)`. The properties this rests on:
- `Cargo.lock` is committed; every CI and release `cargo`/`cross` invocation passes `--locked`.
- No `git = "..."` dependencies — everything resolves through `crates.io` or workspace `path =`.
- `build.rs` is absent. The version comes from `env!("CARGO_PKG_VERSION")`, sourced from `[workspace.package].version`.
- No build-time wall-clock timestamps, hostnames, usernames, or absolute build paths are embedded. Every `SystemTime::now()` / `OffsetDateTime::now_utc()` in the tree is runtime.
- `[profile.release]` sets `strip = "symbols"`. The release workflow additionally exports `RUSTFLAGS=--remap-path-prefix=...` to rewrite the cargo registry + workspace paths that otherwise leak into panic strings. Cargo's unified `trim-paths` profile key remains unstable in Cargo 1.95 — swap in when it stabilises.
- `SOURCE_DATE_EPOCH` is not consulted because no build date is embedded; the contract is the env-var-free baseline. Any future build-time date must go through `SOURCE_DATE_EPOCH` rather than `SystemTime::now()`.
Out-of-tree caveat: `aws-lc-rs` (pulled via `rustls = "0.23"`) compiles C code whose `__DATE__`/`__TIME__` may leak into the static archive. If Debian packaging surfaces it, the documented fallback is the rustls `ring` feature flag — one call site in `crates/rtsp/src/server/tls.rs::install_*`.