cellos-server 0.5.3

HTTP control plane for CellOS — admission, projection over JetStream, WebSocket fan-out of CloudEvents. Pure event-sourced architecture.
# cellos-server

The CellOS HTTP control plane — a thin projection over JetStream CloudEvents
that admits formations, lists state, and streams events to clients.

## What it is

`cellos-server` is the operator-facing API. It exposes a small REST surface
(`POST /v1/formations`, `GET /v1/formations[/{id}]`, `DELETE
/v1/formations/{id}`, `GET /v1/cells[/{id}]`) and a single WebSocket
endpoint (`GET /ws/events`) that streams CloudEvents in real time. The
server is built with axum 0.7 + tower-http 0.5 on top of `async-nats` and
its `async_nats::jetstream` module.

It sits at L7 of the layer model — above the supervisor and the event log,
below `cellctl`. The architectural contract is from CHATROOM.md Session 16
and ADR-0011: `cellos-server` is a **pure-state-machine projection** over
the JetStream `CELLOS_EVENTS` stream. The in-memory registry
(`AppState::formations`, `AppState::cells`) is a *cache for query latency
only* — it MUST be rebuildable by replaying `cellos.events.>` from
sequence 1. HTTP is the query interface; WebSocket is the live projection
feed; NATS is the source of truth.
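The "rebuildable by replay" contract can be made concrete with a small std-only sketch: the projection is a pure fold over sequenced events, so applying events live and replaying the same log from sequence 1 produce identical state. The `Event` variants, the `Projection` type, and the use of `"PENDING"`/`"RUNNING"` strings as status values are illustrative assumptions. They are not the crate's `CloudEventV1` handling or the ADR-0014 event family.

```rust
use std::collections::BTreeMap;

/// Illustrative event kinds (hypothetical; the real stream carries
/// `CloudEventV1` envelopes defined in cellos-core).
#[derive(Debug, Clone)]
enum Event {
    FormationAdmitted { id: String },
    FormationStatusChanged { id: String, status: &'static str },
    FormationDeleted { id: String },
}

/// Query cache: formation id -> latest status, plus the last applied seq.
#[derive(Debug, Default, PartialEq)]
struct Projection {
    formations: BTreeMap<String, &'static str>,
    applied_seq: u64,
}

impl Projection {
    /// Deterministic transition: the result depends only on the current
    /// state and (seq, event), never on wall-clock time or I/O.
    fn apply(&mut self, seq: u64, event: &Event) {
        if seq <= self.applied_seq {
            return; // duplicate or already-replayed event: no-op
        }
        match event {
            Event::FormationAdmitted { id } => {
                self.formations.insert(id.clone(), "PENDING");
            }
            Event::FormationStatusChanged { id, status } => {
                if let Some(current) = self.formations.get_mut(id) {
                    *current = *status;
                }
            }
            Event::FormationDeleted { id } => {
                self.formations.remove(id);
            }
        }
        self.applied_seq = seq;
    }

    /// Rebuild from scratch by folding the whole log from sequence 1.
    fn replay(log: &[(u64, Event)]) -> Projection {
        let mut projection = Projection::default();
        for (seq, event) in log {
            projection.apply(*seq, event);
        }
        projection
    }
}
```

Skipping any `seq` at or below the last applied one makes redelivery harmless. That property is what lets a cache like this be dropped and rebuilt at will.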

What `cellos-server` deliberately does NOT do:

- It does not run cells (that is `cellos-supervisor`).
- It does not own state of its own — the registry is a derived projection.
- It does not serve a UI bundle. Per ADR-0017, the web view is served by
  `cellctl webui` as a localhost reverse proxy; the `ServeDir` fallback
  that lived here in early drafts is gone, and unmatched paths return
  404 (`src/lib.rs:30`).
- It does not authorise browser writes. ADR-0016 enforces the read-only
  browser boundary *structurally*: the CORS layer (`src/lib.rs:59`) only
  advertises `GET` and `OPTIONS`, so even a compromised in-page script
  that slipped past the `cellctl webui` proxy is refused at preflight by
  any compliant browser.

## Public API surface

The crate is mostly an axum binary; the library surface is the seam used
by integration tests and future embedders.

- `router(state: AppState) -> Router` — assemble the full axum router
  with all canonical routes mounted. `src/lib.rs:39`.
- `AppState` — clonable per-request state (NATS client, JetStream
  context, formations/cells registries, API token, applied cursor).
  `src/state.rs:26`.
- `AppState::new(nats, api_token)` — constructor used by both `main.rs`
  and the test harness. `src/state.rs:58`.
- `AppState::with_jetstream(ctx)` — attach the JetStream context after
  `ensure_stream` succeeds. `src/state.rs:72`.
- `AppState::cursor()` / `bump_cursor(seq)` — the ADR-0015 §D2 cursor.
  `src/state.rs:78`.
- `state::CellRecord` — the per-cell projection row.
  `src/state.rs:233`.
- `state::FormationRecord` — the per-formation projection row.
  `src/state.rs:222`.
- `state::FormationStatus` — the formation state-machine enum
  (`PENDING`, `LAUNCHING`, `RUNNING`, `DEGRADED`, `COMPLETED`, `FAILED`).
  `src/state.rs:210`.
- `state::ApplyOutcome` — the result of applying a CloudEvent to the
  projection. `src/state.rs:195`.
- `jetstream::STREAM_NAME` / `STREAM_SUBJECT` — the `CELLOS_EVENTS`
  stream binding (`cellos.events.>`). `src/jetstream.rs:65`.
- `jetstream::ensure_stream(&Client)` — best-effort create-or-attach of
  the durable JetStream stream. `src/jetstream.rs:94`.
- `jetstream::replay_projection(&AppState, &Context)` — replay events
  from sequence 1 to rebuild the projection cache. `src/jetstream.rs:185`.
- `jetstream::open_ws_message_stream(...)` — open the per-connection
  message stream that backs `/ws/events`. `src/jetstream.rs:272`.
- `ws::ws_events` — WebSocket handler. `src/ws.rs:73`.
- `ws::WsParams` — the `?subject=` + `?since=` query parameters.
  `src/ws.rs:60`.
- `routes::formations::*`, `routes::cells::*` — the HTTP handlers; not
  intended for direct re-use but documented here for reference.
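The list names an applied cursor (`AppState::cursor()` / `bump_cursor(seq)`, ADR-0015 §D2) without spelling out its update rule. One plausible reading, assumed here rather than taken from the source, is a monotonic high-water mark over applied stream sequences that every clone of `AppState` shares. The sketch uses a hypothetical `Cursor` newtype over `Arc<AtomicU64>`; `AtomicU64::fetch_max` provides the never-move-backwards behaviour in one atomic step.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the applied cursor inside `AppState`:
/// the highest stream sequence folded into the projection so far.
/// Cloning shares the same counter, like cloning `AppState`.
#[derive(Clone, Default)]
struct Cursor(Arc<AtomicU64>);

impl Cursor {
    /// Current high-water mark (0 = nothing applied yet).
    fn get(&self) -> u64 {
        self.0.load(Ordering::Acquire)
    }

    /// Monotonic bump: a late or duplicate `seq` never moves the cursor
    /// back. Returns true if this call advanced it.
    fn bump(&self, seq: u64) -> bool {
        self.0.fetch_max(seq, Ordering::AcqRel) < seq
    }
}
```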

The bearer-token contract (`Authorization: Bearer <api_token>`) is
enforced by `src/auth.rs`; every handler runs that check before it
touches any state.
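A minimal sketch of such a check, assuming a plain string token. `check_bearer` and `constant_time_eq` are hypothetical names, and the string errors stand in for whatever rejection type the real module returns. Comparing without an early exit is a common hardening choice; whether `src/auth.rs` does so is not stated here.

```rust
/// Hypothetical bearer check in the spirit of `src/auth.rs`.
fn check_bearer(header: Option<&str>, api_token: &str) -> Result<(), &'static str> {
    // Fail closed: an empty configured token never authorises anything.
    if api_token.is_empty() {
        return Err("server token not configured");
    }
    let presented = header
        .and_then(|h| h.strip_prefix("Bearer "))
        .ok_or("missing or malformed Authorization header")?;
    if constant_time_eq(presented.as_bytes(), api_token.as_bytes()) {
        Ok(())
    } else {
        Err("invalid token")
    }
}

/// Compare every byte instead of returning at the first mismatch, so the
/// timing does not reveal how long a matching prefix is. (Length is
/// still observable.)
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```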

## Architecture / how it works

```
        ┌──────────────┐
        │   cellctl    │ ──► HTTP/WS over loopback
        └──────┬───────┘
               │
               ▼
        ┌──────────────┐      ┌────────────────────────────┐
        │ cellos-server│◄─────│ JetStream  (CELLOS_EVENTS) │
        │   (axum)     │─────►│  subject: cellos.events.>  │
        └──────┬───────┘      └────────────────────────────┘
               │                          ▲
               │                          │
   in-memory   ▼                          │
   projection cache                cellos-supervisor publishes
   (BTreeMap, RwLock)              every lifecycle / observability /
                                   identity / policy CloudEvent here
```

Startup flow (`src/main.rs`):

1. Read `CELLOS_SERVER_BIND`, `CELLOS_NATS_URL`, and the required
   `CELLOS_SERVER_API_TOKEN` (fail-closed: empty/unset → refuse to
   start).
2. Best-effort connect to NATS. A broker outage at startup is not
   fatal — the HTTP query interface MUST serve cached state precisely
   when the event log is unhealthy, so operators can inspect the system.
   WebSocket clients see an immediate close until the broker returns.
3. Call `ensure_stream` to bind the `CELLOS_EVENTS` durable stream, then
   call `replay_projection` to rebuild `AppState.formations` and
   `AppState.cells` from sequence 1 (ADR-0011 §Consequences). The
   `CELLOS_SERVER_SKIP_REPLAY` env var bypasses this for tests.
4. Bind the listener, mount `router(state)`, run `axum::serve` with
   graceful shutdown on SIGTERM/SIGINT.
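The ordering of these steps, and which of them depend on the broker, can be sketched as a pure planning function. `Step` and `boot_plan` are hypothetical names. The assumption that `ensure_stream` and `replay_projection` are skipped when the initial NATS connection fails is inferred from step 2, not stated in the source. Config errors are out of scope: the fail-closed refusal in step 1 happens before any plan exists.

```rust
/// Hypothetical labels for the startup steps listed above.
#[derive(Debug, PartialEq)]
enum Step {
    ReadConfig,
    ConnectNats,
    EnsureStream,
    ReplayProjection,
    Serve,
}

/// Plan the boot sequence. Assumption: with no broker there is nothing to
/// bind or replay, so the server goes straight to serving cached state.
fn boot_plan(nats_up: bool, skip_replay: bool) -> Vec<Step> {
    let mut plan = vec![Step::ReadConfig, Step::ConnectNats];
    if nats_up {
        plan.push(Step::EnsureStream);
        // CELLOS_SERVER_SKIP_REPLAY bypasses only the replay, not the bind.
        if !skip_replay {
            plan.push(Step::ReplayProjection);
        }
    }
    // HTTP always comes up: a broker outage is not fatal.
    plan.push(Step::Serve);
    plan
}
```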

The WebSocket bridge (`src/ws.rs`) accepts `?since=<seq>` per ADR-0015
§D3 and emits a JSON envelope `{"seq": N, "event": {...}}` per frame
(`src/ws.rs:1`). A 25-second `Ping` heartbeat (ADR-0015 §D6) keeps the
connection alive across NAT timeouts and lets the client detect a dead
upstream within the 45s budget the web view tolerates
(`src/ws.rs:HEARTBEAT`).
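A std-only sketch of the three behaviours in that paragraph: framing an already-serialised CloudEvent as the `{"seq": N, "event": {...}}` envelope, resuming from `?since=`, and a client-side liveness check. The resume rule used here (`since` is the last sequence the client has already seen, so `?since=0` delivers everything, because JetStream sequences start at 1) is an assumption that fits the `?since=0` usage under Examples; it is not a quote of ADR-0015 §D3. `envelope`, `frames_after`, `upstream_dead`, and `DEAD_AFTER` are hypothetical names.

```rust
use std::time::Duration;

/// Server heartbeat and the client's dead-upstream budget, from the text.
const HEARTBEAT: Duration = Duration::from_secs(25);
const DEAD_AFTER: Duration = Duration::from_secs(45);

/// Wrap an already-serialised CloudEvent as one WebSocket text frame.
fn envelope(seq: u64, event_json: &str) -> String {
    format!(r#"{{"seq":{seq},"event":{event_json}}}"#)
}

/// Assumed resume semantics: deliver only events strictly after `since`.
fn frames_after(log: &[(u64, &str)], since: u64) -> Vec<String> {
    log.iter()
        .filter(|(seq, _)| *seq > since)
        .map(|(seq, event)| envelope(*seq, event))
        .collect()
}

/// Client side: any frame (data or Ping) resets the timer; silence longer
/// than the budget means the upstream should be treated as dead.
fn upstream_dead(since_last_frame: Duration) -> bool {
    since_last_frame > DEAD_AFTER
}
```

On reconnect, a client would pass the last `seq` it rendered as `?since=`, so nothing is shown twice and nothing in between is skipped.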

The CORS layer in `router()` (`src/lib.rs:59`) advertises only `GET` and
`OPTIONS`. A unit test (`cors_preflight_for_post_does_not_allow_post`,
`src/lib.rs:87`) asserts the ADR-0016 structural enforcement.

## Configuration

| Env var | Default | Effect |
|---|---|---|
| `CELLOS_SERVER_BIND` | `127.0.0.1:8080` | TCP listen address. |
| `CELLOS_NATS_URL` | `nats://127.0.0.1:4222` | Broker URL. Outage at startup is non-fatal. |
| `CELLOS_SERVER_API_TOKEN` | (required) | Bearer token for every route. Server refuses to start when unset or empty. |
| `CELLOS_SERVER_SKIP_REPLAY` | unset | When `1`/`true`, skip the ADR-0011 replay-on-boot. |
| `RUST_LOG` / `EnvFilter` | `info` | tracing-subscriber filter. JSON output is on by default. |
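The rules in this table can be expressed as a small, testable loader. This is a sketch under assumptions: `Config` and `load_config` are hypothetical names, the environment lookup is injected as a closure so it can be tested without mutating process state (a real binary would pass `|k| std::env::var(k).ok()`), and `RUST_LOG` is left to tracing-subscriber and not modelled.

```rust
/// Hypothetical mirror of the configuration table.
#[derive(Debug, PartialEq)]
struct Config {
    bind: String,
    nats_url: String,
    api_token: String,
    skip_replay: bool,
}

fn load_config(get: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    let api_token = get("CELLOS_SERVER_API_TOKEN").unwrap_or_default();
    if api_token.is_empty() {
        // Fail closed: refuse to start without a bearer token.
        return Err("CELLOS_SERVER_API_TOKEN is unset or empty".to_string());
    }
    // Only the exact values `1` and `true` skip the replay-on-boot.
    let skip_replay = matches!(
        get("CELLOS_SERVER_SKIP_REPLAY").as_deref(),
        Some("1") | Some("true")
    );
    Ok(Config {
        bind: get("CELLOS_SERVER_BIND").unwrap_or_else(|| "127.0.0.1:8080".to_string()),
        nats_url: get("CELLOS_NATS_URL")
            .unwrap_or_else(|| "nats://127.0.0.1:4222".to_string()),
        api_token,
        skip_replay,
    })
}
```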

## Examples

Mount the router into a test:

```rust
use axum::body::Body;
use axum::http::{header, Method, Request, StatusCode};
use cellos_server::{router, AppState};
use tower::ServiceExt;

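#[tokio::test]
async fn list_formations_with_valid_token() {
    // No NATS client attached: the handler answers from the in-memory
    // projection cache alone.
    let state = AppState::new(None, "test-token");
    let app = router(state);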

    let req = Request::builder()
        .method(Method::GET)
        .uri("/v1/formations")
        .header(header::AUTHORIZATION, "Bearer test-token")
        .body(Body::empty())
        .unwrap();

    let resp = app.oneshot(req).await.unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
}
```

Run the binary against a local broker:

```bash
export CELLOS_SERVER_API_TOKEN="$(openssl rand -hex 32)"  # clients reuse this value
CELLOS_SERVER_BIND=127.0.0.1:8080 \
CELLOS_NATS_URL=nats://127.0.0.1:4222 \
RUST_LOG=info \
cargo run -p cellos-server --bin cellos-server
```

Stream events:

```bash
cellctl events --follow
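# or directly (curl sends the upgrade handshake by hand and prints raw frames):
curl -i --no-buffer \
     -H "Authorization: Bearer $CELLOS_SERVER_API_TOKEN" \
     -H "Connection: Upgrade" \
     -H "Upgrade: websocket" \
     -H "Sec-WebSocket-Version: 13" \
     -H "Sec-WebSocket-Key: $(openssl rand -base64 16)" \
     "http://127.0.0.1:8080/ws/events?since=0"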
```

## Testing

```bash
cargo test -p cellos-server
```

Most tests drive the axum router via `tower::ServiceExt::oneshot` and
need no broker. The integration tests under `crates/cellos-server/tests/`
include `formation_authority_invariant.rs` (ADR-0010 §Enforcement —
exercises all four rejection paths through `POST /v1/formations`) and
`signed_envelope_round_trip.rs`. The JetStream-dependent paths
(`replay_projection`, the WS bridge with a live consumer) are exercised
in workspace-level integration tests; running them requires a local
NATS with JetStream enabled (`nats-server -js`).

## Related crates

- [`cellos-core`](../cellos-core/README.md) — owns `CloudEventV1`,
  formation/lifecycle event builders, and the spec validators consumed
  by `routes::formations`.
- [`cellos-supervisor`](../cellos-supervisor/README.md) — the producer
  of every CloudEvent this server projects.
- [`cellos-projector`](../cellos-projector/README.md) — the offline
  equivalent of `replay_projection` for audit work.
- [`cellos-ctl`](../cellos-ctl/README.md) — the operator client and the
  read-only browser proxy in front of this server.

## ADRs

- [ADR-0001](../../docs/adr/0001-rust-nats-jetstream-proprietary-host.md)
  — NATS JetStream as the proprietary host substrate.
- [ADR-0010](../../docs/adr/0010-formation-authority-invariant.md)
  — formation admission invariant; enforced in `routes::formations`.
- [ADR-0011](../../docs/adr/0011-cellos-server-http-control-plane.md)
  — this crate; defines the projection-cache + replay-on-boot contract.
- [ADR-0014](../../docs/adr/0014-formation-cloudevent-state-model.md)
  — the formation event family the server emits and projects.
- [ADR-0015](../../docs/adr/0015-websocket-cursor-and-reconnect.md)
  — WebSocket cursor (`seq`), `?since=` resume, and heartbeat.
- [ADR-0016](../../docs/adr/0016-web-view-read-only-boundary.md)
  — read-only browser boundary, enforced by the CORS allow-methods list.
- [ADR-0017](../../docs/adr/0017-cellctl-webui-localhost-proxy.md)
  — the static-bundle responsibility moved to `cellctl webui`; this
  server is API-only.