cellos-server 0.5.1

HTTP control plane for CellOS — admission, projection over JetStream, WebSocket fan-out of CloudEvents. Pure event-sourced architecture.

cellos-server

The CellOS HTTP control plane — a thin projection over JetStream CloudEvents that admits formations, lists state, and streams events to clients.

What it is

cellos-server is the operator-facing API. It exposes a small REST surface (POST /v1/formations, GET /v1/formations[/{id}], DELETE /v1/formations/{id}, GET /v1/cells[/{id}]) and a single WebSocket endpoint (GET /ws/events) that streams CloudEvents in real time. The server is built with axum 0.7 and tower-http 0.5 on top of async-nats and its jetstream module.

It sits at L7 of the layer model — above the supervisor and the event log, below cellctl. The architectural contract is from CHATROOM.md Session 16 and ADR-0011: cellos-server is a pure-state-machine projection over the JetStream CELLOS_EVENTS stream. The in-memory registry (AppState::formations, AppState::cells) is a cache for query latency only — it MUST be rebuildable by replaying cellos.events.> from sequence 1. HTTP is the query interface; WebSocket is the live projection feed; NATS is the source of truth.
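The "rebuildable by replay" contract above can be sketched as a pure fold over the event log. Everything below is a deliberate toy: the three-variant status and the string event names are illustrative stand-ins, not the crate's real six-state FormationStatus or its CloudEvent types.

```rust
// Toy model of the pure-projection contract: the cache is a fold over
// the event log, so replaying the same events from sequence 1 always
// rebuilds the same state. Hypothetical names and event kinds.
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum FormationStatus {
    Pending,
    Running,
    Failed,
}

// A drastically simplified event: (formation id, event kind).
type Event = (&'static str, &'static str);

fn apply(cache: &mut BTreeMap<&'static str, FormationStatus>, event: Event) {
    let (id, kind) = event;
    match kind {
        "formation.admitted" => {
            cache.insert(id, FormationStatus::Pending);
        }
        "formation.running" => {
            cache.insert(id, FormationStatus::Running);
        }
        "formation.failed" => {
            cache.insert(id, FormationStatus::Failed);
        }
        _ => {} // unknown event kinds are ignored, never errors
    }
}

fn replay(events: &[Event]) -> BTreeMap<&'static str, FormationStatus> {
    let mut cache = BTreeMap::new();
    for &e in events {
        apply(&mut cache, e);
    }
    cache
}
```

Because `apply` touches nothing but its arguments, dropping the cache and calling `replay` over the full log is always a safe recovery path — which is exactly why the in-memory registry may be treated as disposable.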

What cellos-server deliberately does NOT do:

  • It does not run cells (that is cellos-supervisor).
  • It does not own state of its own — the registry is a derived projection.
  • It does not serve a UI bundle. Per ADR-0017, the web view is served by cellctl webui as a localhost reverse proxy; the ServeDir fallback that lived here in early drafts is gone, and unmatched paths return 404 (src/lib.rs:30).
  • It does not authorise browser writes. ADR-0016 enforces the read-only browser boundary structurally: the CORS layer (src/lib.rs:59) only advertises GET and OPTIONS, so even a compromised in-page script that slipped past the cellctl-webui proxy is refused at preflight by any compliant browser.

Public API surface

The crate is mostly an axum binary; the library surface is the seam used by integration tests and future embedders.

  • router(state: AppState) -> Router — assemble the full axum router with all canonical routes mounted. src/lib.rs:39.
  • AppState — clonable per-request state (NATS client, JetStream context, formations/cells registries, API token, applied cursor). src/state.rs:26.
  • AppState::new(nats, api_token) — constructor used by both main.rs and the test harness. src/state.rs:58.
  • AppState::with_jetstream(ctx) — attach the JetStream context after ensure_stream succeeds. src/state.rs:72.
  • AppState::cursor() / bump_cursor(seq) — the ADR-0015 §D2 cursor. src/state.rs:78.
  • state::CellRecord — the per-cell projection row. src/state.rs:233.
  • state::FormationRecord — the per-formation projection row. src/state.rs:222.
  • state::FormationStatus — the formation state-machine enum (PENDING, LAUNCHING, RUNNING, DEGRADED, COMPLETED, FAILED). src/state.rs:210.
  • state::ApplyOutcome — the result of applying a CloudEvent to the projection. src/state.rs:195.
  • jetstream::STREAM_NAME / STREAM_SUBJECT — the CELLOS_EVENTS stream binding (cellos.events.>). src/jetstream.rs:65.
  • jetstream::ensure_stream(&Client) — best-effort create-or-attach of the durable JetStream stream. src/jetstream.rs:94.
  • jetstream::replay_projection(&AppState, &Context) — replay events from sequence 1 to rebuild the projection cache. src/jetstream.rs:185.
  • jetstream::open_ws_message_stream(...) — open the per-connection message stream that backs /ws/events. src/jetstream.rs:272.
  • ws::ws_events — WebSocket handler. src/ws.rs:73.
  • ws::WsParams — the ?subject= + ?since= query parameters. src/ws.rs:60.
  • routes::formations::*, routes::cells::* — the HTTP handlers; not intended for direct re-use but documented here for reference.
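The cursor() / bump_cursor(seq) pair above can be modeled as a monotonic watermark. This is a guess at the shape, not the crate's implementation: the key property is that the cursor never moves backwards even if a sequence is observed out of order.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical model of the ADR-0015 §D2 applied cursor: it records
// the highest JetStream sequence applied to the projection.
struct Cursor(AtomicU64);

impl Cursor {
    fn new() -> Self {
        Cursor(AtomicU64::new(0))
    }
    fn get(&self) -> u64 {
        self.0.load(Ordering::Acquire)
    }
    // fetch_max makes the bump monotonic: a stale or re-delivered
    // sequence can never rewind the watermark.
    fn bump(&self, seq: u64) {
        self.0.fetch_max(seq, Ordering::AcqRel);
    }
}
```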

The bearer-token contract (Authorization: Bearer <api_token>) is enforced in src/auth.rs and called from every handler before any state access.
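A minimal sketch of that check, assuming the fail-closed shape described above (the real enforcement lives in src/auth.rs; this helper and its error strings are hypothetical). Missing header, wrong scheme, and wrong token all collapse into the same rejection:

```rust
// Hypothetical bearer-token check: every failure mode is rejected
// before any state access, matching the fail-closed contract.
fn authorize(header: Option<&str>, expected_token: &str) -> Result<(), &'static str> {
    let value = header.ok_or("missing Authorization header")?;
    let token = value
        .strip_prefix("Bearer ")
        .ok_or("expected Bearer scheme")?;
    if token == expected_token {
        Ok(())
    } else {
        Err("invalid token")
    }
}
```

A production comparison would typically be constant-time; a plain `==` is used here only to keep the sketch dependency-free.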

Architecture / how it works

        ┌──────────────┐
        │   cellctl    │ ──► HTTP/WS over loopback
        └──────┬───────┘
               │
               ▼
        ┌──────────────┐      ┌────────────────────────────┐
        │ cellos-server│◄─────│ JetStream  (CELLOS_EVENTS) │
        │   (axum)     │─────►│  subject: cellos.events.>  │
        └──────┬───────┘      └────────────────────────────┘
               │                          ▲
               ▼                          │
   in-memory projection cache      cellos-supervisor publishes
   (BTreeMap, RwLock)              every lifecycle / observability /
                                   identity / policy CloudEvent here

Startup flow (src/main.rs):

  1. Read CELLOS_SERVER_BIND, CELLOS_NATS_URL, and the required CELLOS_SERVER_API_TOKEN (fail-closed: empty/unset → refuse to start).
  2. Best-effort connect to NATS. A broker outage at startup is not fatal — the HTTP query interface MUST serve cached state precisely when the event log is unhealthy, so operators can inspect the system. WebSocket clients see an immediate close until the broker returns.
  3. Call ensure_stream to bind the CELLOS_EVENTS durable stream, then call replay_projection to rebuild AppState.formations and AppState.cells from sequence 1 (ADR-0011 §Consequences). The CELLOS_SERVER_SKIP_REPLAY env var bypasses this for tests.
  4. Bind the listener, mount router(state), run axum::serve with graceful shutdown on SIGTERM/SIGINT.
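Step 1's fail-closed rule can be sketched as follows. Separating the check from the env read keeps the rule testable; the function name is illustrative, not the one in src/main.rs:

```rust
// Fail-closed token validation: an unset or empty (or whitespace-only)
// CELLOS_SERVER_API_TOKEN refuses startup. At boot this would be called
// as validate_token(std::env::var("CELLOS_SERVER_API_TOKEN").ok()).
fn validate_token(raw: Option<String>) -> Result<String, String> {
    match raw {
        Some(t) if !t.trim().is_empty() => Ok(t),
        _ => Err("CELLOS_SERVER_API_TOKEN must be set and non-empty".to_string()),
    }
}
```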

The WebSocket bridge (src/ws.rs) accepts ?since=<seq> per ADR-0015 §D3 and emits a JSON envelope {"seq": N, "event": {...}} per frame (src/ws.rs:1). A 25-second Ping heartbeat (ADR-0015 §D6) keeps the connection alive across NAT timeouts and lets the client detect a dead upstream within the 45s budget the web view tolerates (src/ws.rs:HEARTBEAT).
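The framing and resume rules can be sketched with two small helpers. Both are assumptions drawn from the text above, not the code in src/ws.rs: `event_json` is taken to be an already-serialized CloudEvent, and `?since=` is treated as exclusive (frames with seq strictly greater than the cursor are delivered — check ADR-0015 §D3 for the authoritative rule).

```rust
// Frame envelope per the text above: {"seq": N, "event": {...}}.
// event_json is assumed to already be valid CloudEvent JSON.
fn envelope(seq: u64, event_json: &str) -> String {
    format!("{{\"seq\":{},\"event\":{}}}", seq, event_json)
}

// ?since=<seq> resume, assumed exclusive: a client reconnecting with
// its last seen seq gets only frames it has not yet applied.
fn should_deliver(frame_seq: u64, since: Option<u64>) -> bool {
    match since {
        Some(cursor) => frame_seq > cursor,
        None => true,
    }
}
```

Under this reading, `?since=0` (as in the curl example below) streams the log from the first frame, since JetStream sequences start at 1.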

The CORS layer in router() (src/lib.rs:59) advertises only GET and OPTIONS. A unit test (cors_preflight_for_post_does_not_allow_post, src/lib.rs:87) asserts the ADR-0016 structural enforcement.
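The structural enforcement can be modeled from the browser's side. This is a simplification of the CORS protocol, not the tower-http layer itself: the server advertises a fixed method list, and a compliant browser compares the intended method against it before the real request is ever sent.

```rust
// What cellos-server advertises in Access-Control-Allow-Methods.
const ADVERTISED: &str = "GET, OPTIONS";

// Browser-side preflight decision (simplified): a cross-origin POST
// or DELETE never reaches the server because the intended method is
// absent from the advertised list.
fn browser_permits(intended_method: &str) -> bool {
    ADVERTISED
        .split(',')
        .map(str::trim)
        .any(|m| m.eq_ignore_ascii_case(intended_method))
}
```

This is why the boundary is called structural: no server-side authorization logic is involved, so there is no write path for an in-page script to exploit.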

Configuration

Env var                      Default                  Effect
CELLOS_SERVER_BIND           127.0.0.1:8080           TCP listen address.
CELLOS_NATS_URL              nats://127.0.0.1:4222    Broker URL. Outage at startup is non-fatal.
CELLOS_SERVER_API_TOKEN      (required)               Bearer token for every route. Server refuses to start when unset or empty.
CELLOS_SERVER_SKIP_REPLAY    unset                    When 1/true, skip the ADR-0011 replay-on-boot.
RUST_LOG / EnvFilter         info                     tracing-subscriber filter. JSON output is on by default.

Examples

Mount the router into a test:

use axum::body::Body;
use axum::http::{header, Method, Request, StatusCode};
use cellos_server::{router, AppState};
use tower::ServiceExt;

#[tokio::test]
async fn ping() {
    let state = AppState::new(None, "test-token");
    let app   = router(state);

    let req = Request::builder()
        .method(Method::GET)
        .uri("/v1/formations")
        .header(header::AUTHORIZATION, "Bearer test-token")
        .body(Body::empty())
        .unwrap();

    let resp = app.oneshot(req).await.unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
}

Run the binary against a local broker:

CELLOS_SERVER_BIND=127.0.0.1:8080 \
CELLOS_NATS_URL=nats://127.0.0.1:4222 \
CELLOS_SERVER_API_TOKEN=$(openssl rand -hex 32) \
RUST_LOG=info \
cargo run -p cellos-server --bin cellos-server

Stream events:

cellctl events --follow
# or directly:
curl -i --no-buffer \
     -H "Authorization: Bearer $CELLOS_SERVER_API_TOKEN" \
     -H "Connection: Upgrade" \
     -H "Upgrade: websocket" \
     -H "Sec-WebSocket-Version: 13" \
     -H "Sec-WebSocket-Key: $(openssl rand -base64 16)" \
     "http://127.0.0.1:8080/ws/events?since=0"

Testing

cargo test -p cellos-server

Most tests drive the axum router via tower::ServiceExt::oneshot and need no broker. The integration tests under crates/cellos-server/tests/ include formation_authority_invariant.rs (ADR-0010 §Enforcement — exercises all four rejection paths through POST /v1/formations) and signed_envelope_round_trip.rs. The JetStream-dependent paths (replay_projection, the WS bridge with a live consumer) are exercised in workspace-level integration tests; running them requires a local NATS with JetStream enabled (nats-server -js).

Related crates

  • cellos-core — owns CloudEventV1, formation/lifecycle event builders, and the spec validators consumed by routes::formations.
  • cellos-supervisor — the producer of every CloudEvent this server projects.
  • cellos-projector — the offline equivalent of replay_projection for audit work.
  • cellos-ctl — the operator client and the read-only browser proxy in front of this server.

ADRs

  • ADR-0001 — NATS JetStream as the proprietary host substrate.
  • ADR-0010 — formation admission invariant; enforced in routes::formations.
  • ADR-0011 — this crate; defines the projection-cache + replay-on-boot contract.
  • ADR-0014 — the formation event family the server emits and projects.
  • ADR-0015 — WebSocket cursor (seq), ?since= resume, and heartbeat.
  • ADR-0016 — read-only browser boundary, enforced by the CORS allow-methods list.
  • ADR-0017 — the static-bundle responsibility moved to cellctl webui; this server is API-only.