//! Production deployment guide.
//!
//! This module is documentation only. It covers operational patterns for
//! running `tower-mcp` servers in production: load balancers, session
//! affinity, horizontal scaling, reverse proxies, observability, and
//! containerized deployments.
//!
//! # Overview
//!
//! MCP is session-oriented. The `mcp-session-id` header established during
//! `initialize` ties subsequent requests together, and `HttpTransport`
//! keeps live state per session (broadcast channel for SSE notifications,
//! pending sampling requests, service instance). Two deployment shapes
//! cover most cases:
//!
//! 1. **Session affinity.** A load balancer routes all requests for a
//!    given session back to the same server instance. Any in-memory
//!    session store works; no cross-instance state needed.
//! 2. **Shared stores.** Server instances plug [`session_store`] and
//!    [`event_store`] into an external backend (Redis, Postgres, etc.) so
//!    session metadata and SSE event buffers survive across instances.
//!    Required when you can't (or don't want to) pin sessions to a host.
//!
//! # Single-Instance Deployment
//!
//! The minimal production server has the same shape as the examples:
//!
//! ```rust,no_run
//! use tower_mcp::{HttpTransport, McpRouter};
//!
//! # async fn run() -> Result<(), tower_mcp::BoxError> {
//! let router = McpRouter::new().server_info("my-server", "1.0.0");
//! let transport = HttpTransport::new(router);
//! transport.serve("0.0.0.0:3000").await?;
//! # Ok(()) }
//! ```
//!
//! ## systemd unit
//!
//! Run as a service with restart-on-failure and structured logging
//! captured by the journal:
//!
//! ```ini
//! [Unit]
//! Description=MCP server
//! After=network.target
//!
//! [Service]
//! ExecStart=/usr/local/bin/my-mcp-server
//! Restart=on-failure
//! RestartSec=5s
//! Environment=RUST_LOG=info
//!
//! # Drop privileges
//! User=mcp
//! Group=mcp
//!
//! # Systemd hardening (optional but recommended)
//! ProtectSystem=strict
//! ProtectHome=true
//! PrivateTmp=true
//! NoNewPrivileges=true
//!
//! [Install]
//! WantedBy=multi-user.target
//! ```
//!
//! ## Graceful shutdown
//!
//! `axum::serve` supports graceful shutdown via its `with_graceful_shutdown`
//! method. Use `into_router()` so you own the server lifecycle:
//!
//! ```rust,no_run
//! # use tower_mcp::{HttpTransport, McpRouter};
//! # async fn run() -> Result<(), tower_mcp::BoxError> {
//! let router = McpRouter::new().server_info("my-server", "1.0.0");
//! let app = HttpTransport::new(router).into_router();
//! let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
//!
//! axum::serve(listener, app)
//!     .with_graceful_shutdown(async {
//!         tokio::signal::ctrl_c().await.ok();
//!     })
//!     .await?;
//! # Ok(()) }
//! ```
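//!
//! Process supervisors such as systemd and Kubernetes normally stop a
//! service with `SIGTERM`, which `ctrl_c()` alone does not observe. A
//! minimal sketch of a combined signal future follows; `shutdown_signal`
//! is a hypothetical helper name (not part of `tower-mcp`), and the code
//! assumes tokio's `signal` and `macros` features are enabled:
//!
//! ```rust,ignore
//! // Requires the "signal" and "macros" features on tokio.
//! async fn shutdown_signal() {
//!     // Resolves on Ctrl-C (SIGINT).
//!     let ctrl_c = async {
//!         tokio::signal::ctrl_c().await.ok();
//!     };
//!
//!     // Resolves on SIGTERM (Unix only).
//!     #[cfg(unix)]
//!     let terminate = async {
//!         use tokio::signal::unix::{signal, SignalKind};
//!         match signal(SignalKind::terminate()) {
//!             Ok(mut stream) => {
//!                 stream.recv().await;
//!             }
//!             // If the handler cannot be installed, rely on Ctrl-C only.
//!             Err(_) => std::future::pending::<()>().await,
//!         }
//!     };
//!
//!     // Other platforms have no SIGTERM, so this branch never resolves.
//!     #[cfg(not(unix))]
//!     let terminate = std::future::pending::<()>();
//!
//!     // Whichever signal arrives first triggers shutdown.
//!     tokio::select! {
//!         _ = ctrl_c => {}
//!         _ = terminate => {}
//!     }
//! }
//! ```
//!
//! Pass it in place of the inline block:
//! `.with_graceful_shutdown(shutdown_signal())`.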
//!
//! # Health Checks
//!
//! [`HttpTransport`] exposes `GET /health` returning `200 OK` with a JSON
//! body. Use it as the liveness/readiness probe for your load balancer or
//! orchestrator:
//!
//! ```yaml
//! # Kubernetes
//! livenessProbe:
//!   httpGet:
//!     path: /health
//!     port: 3000
//!   initialDelaySeconds: 5
//!   periodSeconds: 10
//!
//! readinessProbe:
//!   httpGet:
//!     path: /health
//!     port: 3000
//!   periodSeconds: 5
//! ```
//!
//! The health endpoint does not require a session; it's safe to probe
//! without establishing MCP state.
//!
//! # Load Balancer Patterns
//!
//! ## Session affinity (simplest)
//!
//! Configure the load balancer to consistently route the same session to
//! the same backend. Two common strategies:
//!
//! - **Hash on `mcp-session-id` header.** Gives exact session affinity.
//!   Requires a Layer 7 load balancer that can read application headers
//!   (nginx, HAProxy, Envoy, AWS ALB, Azure Application Gateway).
//! - **Source-IP hash.** Works with L4 load balancers but breaks when
//!   clients share NAT egress or reconnect from different networks.
//!
//! Session affinity is the right default: simpler to reason about, no
//! external dependencies, and matches how SSE streams behave naturally.
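//!
//! Conceptually, header hashing maps each session ID to a backend index.
//! The std-only sketch below illustrates the idea; `pick_backend` is a
//! hypothetical helper for illustration, not an API of `tower-mcp` or of
//! any particular load balancer:
//!
//! ```rust
//! use std::collections::hash_map::DefaultHasher;
//! use std::hash::{Hash, Hasher};
//!
//! /// Maps a session ID to one of `backends` by hashing it.
//! /// Panics if `backends` is empty.
//! fn pick_backend<'a>(session_id: &str, backends: &[&'a str]) -> &'a str {
//!     assert!(!backends.is_empty(), "at least one backend is required");
//!     let mut hasher = DefaultHasher::new();
//!     session_id.hash(&mut hasher);
//!     // Plain modulo hashing: the index depends on the backend count.
//!     let index = (hasher.finish() % backends.len() as u64) as usize;
//!     backends[index]
//! }
//! ```
//!
//! The same ID always lands on the same backend while the backend list is
//! unchanged. With plain modulo hashing, adding or removing a backend
//! remaps many sessions, which is why production balancers usually offer
//! consistent hashing.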
//!
//! ## Azure Load Balancer idle timeout (gotcha)
//!
//! Azure Standard Load Balancer defaults to a 4-minute TCP idle timeout.
//! Long-running SSE streams without traffic will be dropped. Raise the
//! idle timeout (`idle_timeout_in_minutes` in Terraform, up to 30 min) or
//! emit periodic SSE keepalives to keep the connection warm.
//!
//! AWS NLB and GCP NLB have similar idle-timeout semantics. Application
//! load balancers (AWS ALB, Azure Application Gateway) enforce idle
//! timeouts as well (AWS ALB defaults to 60 seconds), so check and raise
//! those settings too.
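//!
//! For the keepalive option: an SSE comment line (one that starts with
//! `:`) is ignored by clients, so it is a common keepalive payload. A
//! small sketch of choosing the interval, using hypothetical names
//! (`KEEPALIVE_FRAME`, `keepalive_interval`) that are not part of
//! `tower-mcp`; whether your transport already emits keepalives is
//! something to verify separately:
//!
//! ```rust
//! use std::time::Duration;
//!
//! /// SSE comment frame: a `:`-prefixed line followed by a blank line.
//! const KEEPALIVE_FRAME: &str = ": keepalive\n\n";
//!
//! /// Sends keepalives at half the idle timeout, leaving a margin for
//! /// scheduling delays and slow networks.
//! fn keepalive_interval(idle_timeout: Duration) -> Duration {
//!     idle_timeout / 2
//! }
//! ```
//!
//! For the 4-minute Azure default this gives a 2-minute interval.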
//!
//! ## No affinity: use shared stores
//!
//! When you can't pin sessions (e.g. scaling events remap hashes, clients
//! cross VPC boundaries, stateless API gateways), externalize both the
//! session store and the event store:
//!
//! ```rust,no_run
//! # use std::sync::Arc;
//! # use tower_mcp::{HttpTransport, McpRouter};
//! # use tower_mcp::session_store::{MemorySessionStore, SessionStore};
//! # use tower_mcp::event_store::{EventStore, MemoryEventStore};
//! # async fn run() -> Result<(), tower_mcp::BoxError> {
//! # let router = McpRouter::new();
//! // Replace MemorySessionStore/MemoryEventStore with Redis-backed
//! // implementations in production.
//! let session_store: Arc<dyn SessionStore> = Arc::new(MemorySessionStore::new());
//! let event_store: Arc<dyn EventStore> = Arc::new(MemoryEventStore::new());
//!
//! HttpTransport::new(router)
//!     .session_store(session_store)
//!     .event_store(event_store)
//!     .serve("0.0.0.0:3000")
//!     .await?;
//! # Ok(()) }
//! ```
//!
//! See [`session_store`] and [`event_store`] for the trait definitions and
//! the included in-memory / caching implementations.
//!
//! # Reverse Proxies
//!
//! SSE streams need response buffering disabled at the proxy and read
//! timeouts long enough to outlast quiet periods. Configurations for three
//! common proxies follow.
//!
//! ## nginx
//!
//! ```nginx
//! location / {
//!     proxy_pass http://mcp_backend;
//!     proxy_http_version 1.1;
//!
//!     # Disable buffering for SSE
//!     proxy_buffering off;
//!     proxy_cache off;
//!
//!     # Preserve connection for SSE
//!     proxy_set_header Connection "";
//!     chunked_transfer_encoding on;
//!
//!     # Long-lived streams: must exceed session TTL (default 30 min)
//!     proxy_read_timeout 3600s;
//!     proxy_send_timeout 3600s;
//!
//!     # Forward the original host and client address to the backend
//!     proxy_set_header Host $host;
//!     proxy_set_header X-Real-IP $remote_addr;
//! }
//! ```
//!
//! ## Caddy
//!
//! ```caddyfile
//! mcp.example.com {
//!     reverse_proxy mcp-backend:3000 {
//!         flush_interval -1
//!         transport http {
//!             read_timeout 1h
//!             write_timeout 1h
//!         }
//!     }
//! }
//! ```
//!
//! `flush_interval -1` disables response buffering, which is required for
//! SSE to stream events as they happen.
//!
//! ## Traefik
//!
//! ```yaml
//! http:
//!   middlewares:
//!     sse-no-buffer:
//!       buffering:
//!         maxResponseBodyBytes: 0
//!   services:
//!     mcp:
//!       loadBalancer:
//!         servers:
//!           - url: "http://mcp-backend:3000"
//!         responseForwarding:
//!           flushInterval: "-1ms"
//! ```
//!
//! # Observability
//!
//! ## Structured logging
//!
//! Use `tracing-subscriber` with JSON output for machine-parseable logs.
//! The transport emits spans keyed by `session_id`, letting you correlate
//! every log line for a session:
//!
//! ```rust,ignore
//! // Requires the "json" feature on tracing-subscriber.
//! use tracing_subscriber::{EnvFilter, fmt};
//!
//! fmt()
//!     .json()
//!     .with_env_filter(EnvFilter::from_default_env())
//!     .init();
//! ```
//!
//! ## Tower middleware for request-level tracing
//!
//! [`HttpTransport::layer`](crate::HttpTransport::layer) stacks tower
//! middleware on every MCP request. `tower-http`'s `TraceLayer` is a good
//! starting point. The example applies it at the axum level instead, on
//! the router returned by `into_router()`:
//!
//! ```rust,no_run
//! # use tower_mcp::{HttpTransport, McpRouter};
//! # use tower::ServiceBuilder;
//! # async fn run() -> Result<(), tower_mcp::BoxError> {
//! # let router = McpRouter::new();
//! let transport = HttpTransport::new(router);
//! // Convert to an axum Router so that axum-level middleware, such as
//! // tower_http::trace::TraceLayer, can be applied with .layer().
//! let app = transport.into_router();
//! // let app = app.layer(tower_http::trace::TraceLayer::new_for_http());
//! # Ok(()) }
//! ```
//!
//! ## Session metrics
//!
//! [`HttpTransport::into_router_with_handle`](crate::HttpTransport::into_router_with_handle)
//! returns a [`SessionHandle`](crate::SessionHandle) you can use from an
//! admin endpoint or metrics exporter:
//!
//! ```rust,no_run
//! # use tower_mcp::{HttpTransport, McpRouter};
//! # async fn run() -> Result<(), tower_mcp::BoxError> {
//! # let router = McpRouter::new();
//! let (app, handle) = HttpTransport::new(router).into_router_with_handle();
//!
//! // Somewhere else:
//! let count = handle.session_count().await;
//! for info in handle.list_sessions().await {
//!     tracing::info!(session_id = %info.id, created_at = ?info.created_at, "active");
//! }
//! # Ok(()) }
//! ```
//!
//! # Containerized and Sidecar Patterns
//!
//! For pod-local or sidecar deployments where only processes on the same
//! host need to talk to the server, use the Unix socket transport. It
//! avoids TCP overhead and removes the need to pick a port:
//!
//! ```rust,no_run
//! # #[cfg(unix)]
//! # async fn run() -> Result<(), tower_mcp::BoxError> {
//! # use tower_mcp::{McpRouter, UnixSocketTransport};
//! let router = McpRouter::new().server_info("sidecar", "1.0.0");
//! UnixSocketTransport::new(router)
//!     .serve("/run/mcp/server.sock")
//!     .await?;
//! # Ok(()) }
//! ```
//!
//! Pair this with a volume mount in Kubernetes to expose the socket to
//! other containers in the same pod, or to the host filesystem for
//! systemd socket activation.
//!
//! # Capacity Planning
//!
//! - **Per-session memory.** Each live session holds a broadcast channel
//!   (100 messages) and an event buffer (default 1000 events). Budget
//!   about 100 KB per session as a rough ceiling.
//! - **Cleanup interval.** Default 1 minute. Expired sessions are removed
//!   from the registry and purged from the session/event stores on the
//!   cleanup pass.
//! - **Session TTL.** Default 30 minutes. Tune via
//!   [`HttpTransport::session_ttl`](crate::HttpTransport::session_ttl).
//!   A shorter TTL relieves memory pressure but forces idle clients to
//!   re-initialize sooner.
//! - **Max sessions.** Cap with
//!   [`HttpTransport::max_sessions`](crate::HttpTransport::max_sessions)
//!   to apply backpressure to clients when the server is saturated.
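//!
//! To turn the per-session estimate into a budget, multiply it by the
//! session cap. The sketch below assumes the 100 KB ceiling from the list
//! above; the constant and helper names are hypothetical:
//!
//! ```rust
//! /// Assumed per-session ceiling (100 KB), taken from the estimate above.
//! const BYTES_PER_SESSION: u64 = 100 * 1024;
//!
//! /// Worst-case session memory, in bytes, for a given session cap.
//! fn session_memory_ceiling(max_sessions: u64) -> u64 {
//!     max_sessions * BYTES_PER_SESSION
//! }
//!
//! /// Largest session cap that fits within `budget_bytes`.
//! fn max_sessions_for_budget(budget_bytes: u64) -> u64 {
//!     budget_bytes / BYTES_PER_SESSION
//! }
//! ```
//!
//! Under this assumption, 10,000 sessions need about 1 GB, and a 512 MiB
//! budget fits roughly 5,200 sessions.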
//!
//! # See Also
//!
//! - [`session_store`] for the `SessionStore` trait and implementations.
//! - [`event_store`] for the `EventStore` trait and SSE event persistence.
//! - The `session_store` and `event_store` examples in the repo show how
//!   to wrap stores with logging and caching layers.
//!
//! [`session_store`]: crate::session_store
//! [`event_store`]: crate::event_store
//! [`HttpTransport`]: crate::HttpTransport