mlua_swarm_server/operator_ws.rs
//! # WebSocket Operator Callback Interface
//!
//! Entry path that seats an external HTTP/WS caller in the **Operator role**
//! inside the Engine. One WS connection is one session, and that session
//! co-hosts three traits (`Operator` / `SeniorBridge` / `SpawnHook`); a single
//! sid is registered in all three registries at once.
//!
//! ## Architecture overview
//!
//! ```text
//! ┌─────────────── External Operator (Human / Agent / other process) ───────┐
//! │  WS /v1/operators/:sid/ws (Bearer required)                             │
//! │    S→C: Ask{req_id,task_id,question}                                    │
//! │    S→C: HookBefore{req_id,task_id,agent,attempt}                        │
//! │    S→C: HookAfter{req_id,task_id,agent,attempt,result} (fire-and-forget)│
//! │    S→C: Spawn{req_id,task_id,agent,attempt,capability_token}            │
//! │    C→S: Answer{req_id,value}              (SeniorBridge.ask reply)      │
//! │    C→S: HookAck{req_id,ok,reason?}        (SpawnHook.before reply)      │
//! │    C→S: SpawnAck{req_id,value,ok,error?}  (Operator.execute reply)      │
//! └────────────────────────────────┬────────────────────────────────────────┘
//!                                  │ axum WebSocket
//! ┌────────────────────────────────▼────────────────────────────────────────┐
//! │  login.rs                                                               │
//! │    operators_ws_connect:   `GET /v1/operators/:sid/ws` upgrade,         │
//! │                            Bearer token check against minted sid        │
//! │    handle_operator_socket: write task / read task / disconnect path     │
//! └────────────────────────────────┬────────────────────────────────────────┘
//!                                  │
//! ┌────────────────────────────────▼────────────────────────────────────────┐
//! │  session.rs : WSOperatorSession                                         │
//! │    sid + auth_token + tx (Mutex<Option<>>) + pending (Mutex<HashMap>)   │
//! │    impl SeniorBridge { ask → send Ask + wait Answer }                   │
//! │    impl SpawnHook    { before → send HookBefore + wait HookAck /        │
//! │                        after  → send HookAfter fire-and-forget }        │
//! │    impl Operator     { execute → send Spawn + wait SpawnAck,            │
//! │                        thin-forward capability_token to MainAI }        │
//! └────────────────────────────────┬────────────────────────────────────────┘
//!                                  │ same sid → registered into 3 registries at once
//!                                  ▼
//!        engine.senior_bridges / spawn_hooks / operators (SoT)
//!                                  │ dispatch_attempt → resolve_operator_info
//!                                  │ looks up session.bridge_id / hook_id / operator_backend_id
//!                                  ▼
//!        Ctx.operator (read by SeniorEscalationMiddleware / MainAIMiddleware /
//!                      OperatorDelegateMiddleware)
//!
//! protocol.rs : ServerMsg / ClientMsg / PendingReply (wire format + internal reply IR)
//! ```
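//!
//! As a reading aid for the diagram, the message shapes can be sketched as
//! plain Rust enums. This is a minimal sketch, not the contents of
//! `protocol.rs`: the names `ServerMsgSketch` / `ClientMsgSketch`, the helper
//! methods, and every field type (`String`, `u32`, `bool`) are assumptions;
//! only the variant and field names are taken from the diagram.
//!
//! ```rust
//! /// Hypothetical mirror of the S→C messages; field types are assumed.
//! #[derive(Debug, Clone, PartialEq)]
//! pub enum ServerMsgSketch {
//!     Ask { req_id: String, task_id: String, question: String },
//!     HookBefore { req_id: String, task_id: String, agent: String, attempt: u32 },
//!     HookAfter { req_id: String, task_id: String, agent: String, attempt: u32, result: String },
//!     Spawn { req_id: String, task_id: String, agent: String, attempt: u32, capability_token: String },
//! }
//!
//! /// Hypothetical mirror of the C→S replies; field types are assumed.
//! #[derive(Debug, Clone, PartialEq)]
//! pub enum ClientMsgSketch {
//!     Answer { req_id: String, value: String },
//!     HookAck { req_id: String, ok: bool, reason: Option<String> },
//!     SpawnAck { req_id: String, value: String, ok: bool, error: Option<String> },
//! }
//!
//! impl ServerMsgSketch {
//!     pub fn req_id(&self) -> &str {
//!         match self {
//!             Self::Ask { req_id, .. }
//!             | Self::HookBefore { req_id, .. }
//!             | Self::HookAfter { req_id, .. }
//!             | Self::Spawn { req_id, .. } => req_id.as_str(),
//!         }
//!     }
//!
//!     /// `HookAfter` is fire-and-forget; every other message waits for a reply.
//!     pub fn expects_reply(&self) -> bool {
//!         !matches!(self, Self::HookAfter { .. })
//!     }
//! }
//!
//! impl ClientMsgSketch {
//!     pub fn req_id(&self) -> &str {
//!         match self {
//!             Self::Answer { req_id, .. }
//!             | Self::HookAck { req_id, .. }
//!             | Self::SpawnAck { req_id, .. } => req_id.as_str(),
//!         }
//!     }
//!
//!     /// True when this reply has the right kind for `msg` and echoes its req_id.
//!     pub fn answers(&self, msg: &ServerMsgSketch) -> bool {
//!         let kind_ok = matches!(
//!             (msg, self),
//!             (ServerMsgSketch::Ask { .. }, Self::Answer { .. })
//!                 | (ServerMsgSketch::HookBefore { .. }, Self::HookAck { .. })
//!                 | (ServerMsgSketch::Spawn { .. }, Self::SpawnAck { .. })
//!         );
//!         kind_ok && self.req_id() == msg.req_id()
//!     }
//! }
//! ```
//!
//! `answers` spells out the pairing the diagram implies: a reply matches a
//! request by kind (`Ask`/`Answer`, `HookBefore`/`HookAck`, `Spawn`/`SpawnAck`)
//! and by the echoed `req_id`.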
//!
//! ## Thin-control discipline for Spawn
//!
//! The server sends only `Spawn{capability_token}`. The MainAI (the WS client)
//! forwards the token to the SubAgent, and the SubAgent itself calls
//! `/v1/worker/prompt` and `/v1/worker/result` with
//! `Authorization: Bearer <capability_token>`. Heavy payloads therefore travel
//! over HTTP, and the WS carries thin control messages only. See
//! `protocol::ServerMsg::Spawn` and `mlua_swarm::Operator::execute` for details.
//!
//! ## Design rationale (for future re-implementers)
//!
//! - **3 traits co-hosted**: A single session holds all three faces of the
//!   Operator role (judgment = `SeniorBridge`, observation = `SpawnHook`,
//!   execution = `Operator`), so one WS connection is one Operator that
//!   answers ask/before/after/spawn. Registering the same sid in all three
//!   registries preserves the "same Operator" semantics at the registry level
//!   as well.
//! - **`Mutex<Option<Sender>>` for tx swap-in**: `None` on disconnect,
//!   `Some(new_tx)` on reconnect. The pending `HashMap` persists on the session
//!   side, so a client that was holding answer/ack values when the connection
//!   dropped can reconnect and resend them. (In v1.5, server-side sends during
//!   a disconnect fail immediately; the client is responsible for remembering
//!   its own pending replies.)
//! - **req_id naming**: `<sid>-<ask|hb|ha|spawn>-<uuid>` encodes the trait and
//!   guarantees uniqueness, so clients can identify the trait from the req_id
//!   alone.
//! - **`parent_req_id` field**: Schema for representing nesting (e.g. a hook
//!   firing inside an ask). In v1.5 the engine-side middleware does not fire
//!   nested calls, so this is always `None`; v2 will re-introduce nesting via
//!   `task_local`.
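//!
//! The req_id convention above can be sketched as follows. This is a minimal
//! sketch under stated assumptions, not the code in `session.rs`: `ReqKind`,
//! `make_req_id`, and `kind_of` are hypothetical names, the UUID text is passed
//! in by the caller, and the parser assumes the UUID is in its 36-character
//! hyphenated form. Because both the sid and the UUID may contain `-`, the
//! sketch cuts the fixed-length UUID suffix first instead of splitting on `-`.
//!
//! ```rust
//! /// Trait axis encoded in a req_id (hypothetical enum).
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! pub enum ReqKind {
//!     Ask,
//!     HookBefore,
//!     HookAfter,
//!     Spawn,
//! }
//!
//! impl ReqKind {
//!     fn tag(self) -> &'static str {
//!         match self {
//!             ReqKind::Ask => "ask",
//!             ReqKind::HookBefore => "hb",
//!             ReqKind::HookAfter => "ha",
//!             ReqKind::Spawn => "spawn",
//!         }
//!     }
//!
//!     fn from_tag(tag: &str) -> Option<Self> {
//!         match tag {
//!             "ask" => Some(ReqKind::Ask),
//!             "hb" => Some(ReqKind::HookBefore),
//!             "ha" => Some(ReqKind::HookAfter),
//!             "spawn" => Some(ReqKind::Spawn),
//!             _ => None,
//!         }
//!     }
//! }
//!
//! /// Builds `<sid>-<tag>-<uuid>`; the caller supplies the UUID text.
//! pub fn make_req_id(sid: &str, kind: ReqKind, uuid: &str) -> String {
//!     format!("{}-{}-{}", sid, kind.tag(), uuid)
//! }
//!
//! /// Recovers the kind, assuming a 36-character hyphenated UUID suffix.
//! pub fn kind_of(req_id: &str) -> Option<ReqKind> {
//!     const UUID_LEN: usize = 36;
//!     // Index of the '-' that separates the tag from the UUID.
//!     let cut = req_id.len().checked_sub(UUID_LEN + 1)?;
//!     if req_id.as_bytes()[cut] != b'-' {
//!         return None;
//!     }
//!     // The sid may itself contain '-', so take only the last segment as the tag.
//!     let (_sid, tag) = req_id[..cut].rsplit_once('-')?;
//!     ReqKind::from_tag(tag)
//! }
//! ```
//!
//! A client could use something like `kind_of` to route an incoming message to
//! its ask / hook / spawn handler without inspecting the rest of the payload.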
//!
//! ## Out of scope for v1.5 (carried forward)
//!
//! - Buffering / replay of ask/spawn/hook_before during a disconnect (sends
//!   currently just return `Err`).
//! - Automatic session-TTL cleanup (sessions leaked after a disconnect stay
//!   until the admin `DELETE` endpoint removes them).
//! - True nested ask (depends on a middleware extension; the `parent_req_id`
//!   schema is already in place).
//! - Multi-Blueprint scope separation (a single WS Operator currently serves
//!   as the Operator for all tasks).
//! - `CapToken` consistency between the Operator session and the engine attach session.
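//!
//! The "no buffering, sends just return `Err`" behaviour listed above follows
//! directly from the `Mutex<Option<Sender>>` shape. The sketch below
//! illustrates it with `std::sync::mpsc` standing in for whatever channel the
//! real session uses; `TxSlot` and its methods are hypothetical names, and
//! frames are plain `String`s for brevity.
//!
//! ```rust
//! use std::sync::mpsc::{channel, Receiver, Sender};
//! use std::sync::Mutex;
//!
//! /// Hypothetical stand-in for the session's outbound half.
//! pub struct TxSlot {
//!     tx: Mutex<Option<Sender<String>>>,
//! }
//!
//! impl TxSlot {
//!     /// Starts disconnected (`None`).
//!     pub fn new() -> Self {
//!         Self { tx: Mutex::new(None) }
//!     }
//!
//!     /// (Re)connect: swap in a fresh sender and hand back the receiver that a
//!     /// socket write task would drain.
//!     pub fn connect(&self) -> Receiver<String> {
//!         let (tx, rx) = channel();
//!         *self.tx.lock().unwrap() = Some(tx);
//!         rx
//!     }
//!
//!     /// Disconnect: drop the sender so later sends fail.
//!     pub fn disconnect(&self) {
//!         *self.tx.lock().unwrap() = None;
//!     }
//!
//!     /// Fails immediately when no socket is attached; nothing is buffered.
//!     pub fn send(&self, frame: &str) -> Result<(), String> {
//!         match self.tx.lock().unwrap().as_ref() {
//!             Some(tx) => tx.send(frame.to_owned()).map_err(|e| e.to_string()),
//!             None => Err("operator disconnected".to_owned()),
//!         }
//!     }
//! }
//! ```
//!
//! Buffering or replay would mean replacing the `None` arm with a queue that
//! `connect` drains, which is the piece deferred past v1.5.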
//!
//! ## REST-like login flow (`login.rs`): sole Operator session entry point
//!
//! `POST/GET/DELETE /v1/operators` plus `WS /v1/operators/:sid/ws` (`login.rs`)
//! is the only Operator session route. The login flow mints the sid
//! server-side, requires Bearer auth (no empty-string default), and enforces
//! role exclusivity at mint time (409 on conflict). See the `login` module doc
//! for details.

/// REST-like Operator session resource (`POST/GET/DELETE /v1/operators` + WS upgrade).
pub mod login;
/// Wire format (`ServerMsg` / `ClientMsg`) for `WS /v1/operators/:sid/ws`.
pub mod protocol;
/// `WSOperatorSession`: the 3-trait (`SeniorBridge`/`SpawnHook`/`Operator`) WS session object.
pub mod session;

pub use login::{
    operators_create, operators_delete, operators_info, operators_ws_connect, OperatorSessionEntry,
    OperatorsCreateReq, OperatorsCreateResp, OperatorsInfoResp,
};
pub use protocol::{ClientMsg, ServerMsg};
pub use session::WSOperatorSession;