pmcp-code-mode 0.2.0

Code Mode validation and execution framework for MCP servers
# pmcp-code-mode

Code Mode validation and execution framework for MCP servers built on the PMCP SDK.

Enables LLM-generated code (GraphQL, JavaScript, SQL, MCP compositions) to be **validated, explained, and executed** with HMAC-signed approval tokens that cryptographically bind code to its validation result.

> **Status:** v0.1.0 — migrated from `pmcp-run/built-in/shared/pmcp-code-mode` into the SDK workspace in Phase 67.1. The public API is stabilizing; feedback is welcome before the 1.0 contract is locked.

## How It Works

```text
                       ┌──────────────┐
                       │   LLM Client │
                       └──────┬───────┘
               1. describe_schema()  <- schema exposed per exposure policy
               2. LLM generates code (GraphQL, JS, SQL, MCP composition)
               3. validate_code(code) ──────────────────────┐
                              │                              │
                    ┌─────────▼──────────┐                   │
                    │ ValidationPipeline │                   │
                    │  ┌───────────────┐ │     ┌────────────▼────────────┐
                    │  │ Parse         │ │     │ PolicyEvaluator (Cedar, │
                    │  │ Security scan │ │────>│ AVP, or custom)         │
                    │  │ Explain       │ │     └─────────────────────────┘
                    │  │ HMAC sign     │ │
                    │  └───────────────┘ │
                    └─────────┬──────────┘
                    approval_token (HMAC-SHA256 signed)
               4. User reviews explanation, approves
               5. execute_code(code, token) ────────────────┐
                              │                              │
                    ┌─────────▼──────────┐     ┌────────────▼──────┐
                    │ Token verification │     │ CodeExecutor impl │
                    │ (hash, expiry, sig)│────>│ (your backend)    │
                    └────────────────────┘     └───────────────────┘
                    execution result (JSON)
```

The token ensures that the **exact code** the user approved is what gets executed — any modification after validation invalidates the token.
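
To make the binding concrete, here is a minimal, dependency-free sketch of the idea. It is not the crate's implementation: `SketchToken`, `issue`, and `verify` are hypothetical names, only a subset of the bound fields is shown, and `DefaultHasher` stands in for HMAC-SHA256 purely so the example compiles with the standard library. It is not cryptographically secure.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical token shape: an expiry plus a keyed tag over the bound fields.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchToken {
    pub expires_at: u64, // seconds since an arbitrary epoch
    pub tag: u64,
}

/// Stand-in for HMAC-SHA256. NOT secure; for illustration only.
fn keyed_tag(secret: &[u8], code: &str, user_id: &str, expires_at: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    secret.hash(&mut hasher);
    code.hash(&mut hasher);
    user_id.hash(&mut hasher);
    expires_at.hash(&mut hasher);
    hasher.finish()
}

/// Issue a token that binds `code` and `user_id` to an expiry time.
pub fn issue(secret: &[u8], code: &str, user_id: &str, now: u64, ttl: u64) -> SketchToken {
    let expires_at = now + ttl;
    SketchToken { expires_at, tag: keyed_tag(secret, code, user_id, expires_at) }
}

/// Accept the token only if it is unexpired and the tag matches the given code.
pub fn verify(secret: &[u8], code: &str, user_id: &str, now: u64, token: &SketchToken) -> bool {
    now < token.expires_at && keyed_tag(secret, code, user_id, token.expires_at) == token.tag
}
```

In this sketch, verifying with the same code before expiry succeeds, while changing a single field in the query or presenting the token after `expires_at` fails.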

## Supported Languages

The `language` attribute on `#[derive(CodeMode)]` selects the validation path at compile time. Each language maps to a feature-gated validation method on `ValidationPipeline`:

| Language | Derive Attribute | Validation Method | Feature Required |
|----------|-----------------|-------------------|------------------|
| GraphQL | `"graphql"` (default) | `validate_graphql_query_async` | *(none)* |
| JavaScript | `"javascript"` or `"js"` | `validate_javascript_code` | `openapi-code-mode` |
| SQL | `"sql"` | `validate_sql_query` | `sql-code-mode` |
| MCP | `"mcp"` | `validate_mcp_composition` | `mcp-code-mode` |

The `CodeLanguage` enum in `pmcp_code_mode::types` is the runtime representation of these values. Unknown language strings produce a compile error at macro expansion time.
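
The table can be read as a simple mapping from attribute string to validation method and feature flag. The sketch below is a runtime analogue of that mapping, assuming a hypothetical `SketchLanguage` enum; the real check happens inside the derive macro at expansion time, and the real `CodeLanguage` type may differ.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the languages listed in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchLanguage {
    GraphQL,
    JavaScript,
    Sql,
    Mcp,
}

impl FromStr for SketchLanguage {
    type Err = String;

    /// Accepts the same attribute strings as the table, including the `js` alias.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "graphql" => Ok(Self::GraphQL),
            "javascript" | "js" => Ok(Self::JavaScript),
            "sql" => Ok(Self::Sql),
            "mcp" => Ok(Self::Mcp),
            other => Err(format!("unknown language: {other}")),
        }
    }
}

impl SketchLanguage {
    /// Validation method name from the table.
    pub fn validation_method(self) -> &'static str {
        match self {
            Self::GraphQL => "validate_graphql_query_async",
            Self::JavaScript => "validate_javascript_code",
            Self::Sql => "validate_sql_query",
            Self::Mcp => "validate_mcp_composition",
        }
    }

    /// Feature flag from the table; `None` means default features suffice.
    pub fn required_feature(self) -> Option<&'static str> {
        match self {
            Self::GraphQL => None,
            Self::JavaScript => Some("openapi-code-mode"),
            Self::Sql => Some("sql-code-mode"),
            Self::Mcp => Some("mcp-code-mode"),
        }
    }
}
```

Here an unknown string such as `"python"` yields an `Err`, which corresponds to the compile error the macro reports.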

## Quick Start

### Minimal: Direct Pipeline Usage

All pipeline constructors return `Result`, so invalid configuration (such as an HMAC secret shorter than 16 bytes) surfaces as an error at startup instead of a panic later on.

```rust
use pmcp_code_mode::{
    CodeModeConfig, TokenSecret, ValidationPipeline, ValidationContext,
};

let config = CodeModeConfig::enabled();
let secret = TokenSecret::new(b"my-secret-key-at-least-16-bytes!".to_vec());
let pipeline = ValidationPipeline::from_token_secret(config, &secret)?;

let ctx = ValidationContext::new("user-123", "session-456", "schema-hash", "perms-hash");
let result = pipeline.validate_graphql_query("query { users { id name } }", &ctx)?;

assert!(result.is_valid);
assert!(result.approval_token.is_some()); // HMAC-signed token
```

### With Policy Evaluator

Wire a policy evaluator (Cedar, AWS Verified Permissions, or custom) into the pipeline for authorization checks between parsing and token signing:

```rust
use pmcp_code_mode::{
    CodeModeConfig, TokenSecret, ValidationPipeline, NoopPolicyEvaluator,
};
use std::sync::Arc;

let config = CodeModeConfig::enabled();
let secret = TokenSecret::new(b"my-secret-key-at-least-16-bytes!".to_vec());
let evaluator = Arc::new(NoopPolicyEvaluator::new()); // Use a real evaluator in production

let pipeline = ValidationPipeline::with_policy_evaluator(
    config, secret.expose_secret().to_vec(), evaluator
)?;
```

The policy evaluator is stored as `Arc<dyn PolicyEvaluator>`, enabling shared ownership across handlers and async tasks.

### With `#[derive(CodeMode)]` (Recommended)

The derive macro eliminates ~80 lines of boilerplate per server and supports all four languages. See the [pmcp-code-mode-derive README](../pmcp-code-mode-derive/README.md) for the full derive guide.

**GraphQL server (default):**

```rust
use pmcp_code_mode::{CodeModeConfig, TokenSecret, NoopPolicyEvaluator, CodeExecutor};
use pmcp_code_mode_derive::CodeMode;
use std::sync::Arc;

#[derive(CodeMode)]
#[code_mode(context_from = "get_context")]
struct MyGraphQLServer {
    code_mode_config: CodeModeConfig,
    token_secret: TokenSecret,
    policy_evaluator: Arc<NoopPolicyEvaluator>,
    code_executor: Arc<MyGraphQLExecutor>,
}
```

**JavaScript/OpenAPI server (Cost Coach, etc.):**

```rust
#[derive(CodeMode)]
#[code_mode(context_from = "get_context", language = "javascript")]
struct MyCostCoachServer {
    code_mode_config: CodeModeConfig,
    token_secret: TokenSecret,
    policy_evaluator: Arc<NoopPolicyEvaluator>,
    code_executor: Arc<MyJsExecutor>,
}
```

**SQL server:**

```rust
#[derive(CodeMode)]
#[code_mode(context_from = "get_context", language = "sql")]
struct MySqlServer {
    code_mode_config: CodeModeConfig,
    token_secret: TokenSecret,
    policy_evaluator: Arc<NoopPolicyEvaluator>,
    code_executor: Arc<MySqlExecutor>,
}
```

All derive-generated servers share the same pattern: the `language` attribute selects the parser, the `context_from` method binds tokens to real user identity, and `CodeExecutor` handles your backend-specific execution.

**Field name convention:** The derive macro identifies required fields by fixed names. Missing any field produces a compile error listing all absent fields.

| Field Name | Type | Purpose |
|------------|------|---------|
| `code_mode_config` | `CodeModeConfig` | Validation pipeline config |
| `token_secret` | `TokenSecret` | HMAC signing secret |
| `policy_evaluator` | `Arc<impl PolicyEvaluator>` | Authorization backend |
| `code_executor` | `Arc<impl CodeExecutor>` | Your execution backend |

### Implementing `CodeExecutor`

This is the only trait you need to implement. The executor holds its own configuration (timeouts, limits, etc.) — `CodeExecutor::execute()` is intentionally kept simple:

```rust
use pmcp_code_mode::{CodeExecutor, ExecutionError, async_trait};
use serde_json::Value;

struct MyGraphQLExecutor { pool: PgPool }

#[async_trait]
impl CodeExecutor for MyGraphQLExecutor {
    async fn execute(
        &self,
        code: &str,          // Validated code (already token-verified)
        variables: Option<&Value>,
    ) -> Result<Value, ExecutionError> {
        // Execute against your backend. The framework has already verified
        // the HMAC token — do NOT re-verify here.
        let result = self.pool.execute_graphql(code, variables).await?;
        Ok(serde_json::to_value(result)?)
    }
}
```

For **GraphQL** and **SQL** servers, you implement `CodeExecutor` directly — your executor calls your database or GraphQL backend.

For **JavaScript/OpenAPI**, **SDK**, and **MCP** servers, use the standard adapters instead of implementing `CodeExecutor` manually.

### Standard Adapters (JS/SDK/MCP)

These adapters bridge the low-level execution traits to `CodeExecutor`, eliminating ~75 lines of manual handler boilerplate per server. Each compiles JavaScript code via `PlanCompiler`, executes via `PlanExecutor`, and logs execution metadata automatically.

**`JsCodeExecutor<H>`** — JavaScript + HTTP calls (Pattern B). Requires `js-runtime` feature.

```rust
use pmcp_code_mode::{JsCodeExecutor, ExecutionConfig};

// Your HttpExecutor implementation (e.g., CostExplorerHttpExecutor)
let http = CostExplorerHttpExecutor::new(clients.clone());
let config = ExecutionConfig::default()
    .with_blocked_fields(["password", "ssn"]);
let code_executor = Arc::new(JsCodeExecutor::new(http, config));
// Pass as code_executor field in your #[derive(CodeMode)] struct
```

**`SdkCodeExecutor<S>`** — JavaScript + SDK operations (Pattern C). Requires `js-runtime` feature.

```rust
use pmcp_code_mode::{SdkCodeExecutor, ExecutionConfig};

let sdk = MyCostExplorerSdk::new(credentials);
let config = ExecutionConfig::default();
let code_executor = Arc::new(SdkCodeExecutor::new(sdk, config));
```

**`McpCodeExecutor<M>`** — JavaScript + MCP tool composition (Pattern D). Requires `mcp-code-mode` feature.

```rust
use pmcp_code_mode::{McpCodeExecutor, ExecutionConfig};

let mcp = MyMcpRouter::new(foundation_servers);
let config = ExecutionConfig::default();
let code_executor = Arc::new(McpCodeExecutor::new(mcp, config));
```

All three adapters:
- Create a fresh `PlanCompiler` + `PlanExecutor` per call (cheap — your `HttpExecutor`/`SdkExecutor`/`McpExecutor` holds `Arc`'d state)
- Forward `variables` into the execution plan as `args` (available in JS code as the `args` variable)
- Log `api_calls` count and `execution_time_ms` via `tracing::debug!`

### End-to-End: Cost Coach with Derive Macro

Before (manual handlers, ~75 lines):
```rust,ignore
struct ValidateState { pipeline: Arc<ValidationPipeline>, config: CodeModeConfig }
struct ExecuteState { pipeline: Arc<ValidationPipeline>, http: CostExplorerHttpExecutor, config: ExecutionConfig }
// ... implement ToolHandler for both, wire manually ...
```

After (derive macro + adapter, 8 lines):
```rust,ignore
#[derive(CodeMode)]
#[code_mode(context_from = "get_context", language = "javascript")]
struct CostCoachServer {
    code_mode_config: CodeModeConfig,
    token_secret: TokenSecret,
    policy_evaluator: Arc<NoopPolicyEvaluator>,
    code_executor: Arc<JsCodeExecutor<CostExplorerHttpExecutor>>,
}

let http = CostExplorerHttpExecutor::new(clients.clone());
let code_executor = Arc::new(JsCodeExecutor::new(http, ExecutionConfig::default()));
let server = Arc::new(CostCoachServer { /* ... */ });
let builder = server.register_code_mode_tools(pmcp::Server::builder())?;
```

## Key Types

| Type | What It Does |
|------|-------------|
| `ValidationPipeline` | Orchestrates: parse -> policy check -> security analysis -> explanation -> token |
| `CodeModeConfig` | Controls what's allowed: mutations, introspection, blocked fields, max depth, TTL |
| `CodeLanguage` | Enum of supported languages: `GraphQL`, `JavaScript`, `Sql`, `Mcp` |
| `PolicyEvaluator` | Trait for pluggable authorization (Cedar, AWS Verified Permissions, custom) |
| `CodeExecutor` | Trait for executing validated code against your backend |
| `JsCodeExecutor<H>` | Standard adapter: `HttpExecutor` -> `CodeExecutor` (JS+HTTP, `js-runtime` feature) |
| `SdkCodeExecutor<S>` | Standard adapter: `SdkExecutor` -> `CodeExecutor` (JS+SDK, `js-runtime` feature) |
| `McpCodeExecutor<M>` | Standard adapter: `McpExecutor` -> `CodeExecutor` (JS+MCP, `mcp-code-mode` feature) |
| `ExecutionConfig` | JS execution limits: `max_api_calls`, `timeout_seconds`, `max_loop_iterations`, blocked fields |
| `TokenSecret` | Zeroizing HMAC secret — backed by `secrecy::SecretBox<[u8]>`, no Debug/Clone/Serialize |
| `HmacTokenGenerator` | Creates HMAC-SHA256 tokens binding code hash + context to approval |
| `TokenError` | Error type for constructor failures (e.g. HMAC secret too short) |
| `ApprovalToken` | Signed token: code hash, user ID, session ID, expiry, risk level, context hash |
| `NoopPolicyEvaluator` | **Test-only** evaluator that allows everything — NOT for production |
| `ValidationResponse` | Handler-level response wrapping `ValidationResult` + auto-approval, action, code hash |
| `CodeModeHandler` | Server-side handler trait with tool builder, pre-handle hooks, soft-disable |

## Configuration

`CodeModeConfig` controls the validation pipeline behavior:

```rust
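use std::collections::HashSet;
use pmcp_code_mode::CodeModeConfig; // `RiskLevel` must also be in scope

let config = CodeModeConfig {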
    enabled: true,
    allow_mutations: false,          // Block mutations by default
    blocked_mutations: HashSet::from(["deleteAll".into()]),
    allowed_mutations: HashSet::new(), // Empty = all non-blocked mutations allowed
    blocked_queries: HashSet::new(),
    allowed_queries: HashSet::new(),
    allow_introspection: false,      // Block schema introspection
    blocked_fields: HashSet::from(["User.ssn".into(), "User.password".into()]),
    max_query_depth: 10,
    max_query_fields: 100,
    token_ttl_seconds: 300,          // 5-minute token expiry
    auto_approve_threshold: Some(RiskLevel::Low), // Auto-approve low-risk queries
    ..CodeModeConfig::enabled()
};
```

### Query and Mutation Authorization

The pipeline enforces config-level authorization checks before policy evaluation:

- **Mutation control:** `allow_mutations` (global toggle), `blocked_mutations` (blocklist), `allowed_mutations` (allowlist). If `allowed_mutations` is non-empty, only listed mutations pass.
- **Query control:** `blocked_queries` (blocklist), `allowed_queries` (allowlist). Same allowlist-takes-precedence semantics as mutations.
- **Policy evaluation:** After config checks pass, `PolicyEvaluator::evaluate_operation()` runs (if configured) for fine-grained authorization.
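
The mutation rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the pipeline's code: `MutationRules` and `permits` are hypothetical names, and the sketch assumes that a non-empty allowlist alone decides the outcome, with the blocklist consulted only when the allowlist is empty.

```rust
use std::collections::HashSet;

/// Hypothetical subset of the mutation settings described above.
pub struct MutationRules {
    pub allow_mutations: bool,
    pub blocked: HashSet<String>,
    pub allowed: HashSet<String>,
}

impl MutationRules {
    /// Returns true if the named mutation passes the config-level checks.
    pub fn permits(&self, name: &str) -> bool {
        // Global toggle: nothing passes when mutations are disabled.
        if !self.allow_mutations {
            return false;
        }
        // Assumption: a non-empty allowlist takes precedence over the blocklist.
        if !self.allowed.is_empty() {
            return self.allowed.contains(name);
        }
        // Empty allowlist: every mutation not on the blocklist passes.
        !self.blocked.contains(name)
    }
}
```

With `deleteAll` blocked and an empty allowlist, `createUser` passes and `deleteAll` does not; once `updateUser` is added to the allowlist, only `updateUser` passes.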

## Feature Flags

| Feature | Default | What It Adds |
|---------|---------|-------------|
| *(none)* | yes | GraphQL validation via `graphql-parser` |
| `openapi-code-mode` | no | JavaScript/OpenAPI validation via SWC parser |
| `js-runtime` | no | JavaScript AST-based execution in pure Rust (implies `openapi-code-mode`) |
| `sql-code-mode` | no | SQL query validation and parameterization |
| `mcp-code-mode` | no | MCP-to-MCP tool composition (implies `js-runtime`) |
| `cedar` | no | Local Cedar policy evaluation via `cedar-policy 4.9` |

**Dependency chain:** `mcp-code-mode` -> `js-runtime` -> `openapi-code-mode`

## Security Design

See [SECURITY.md](./SECURITY.md) for the full threat model.

**Token security:**
- HMAC-SHA256 binds: code hash + user ID + session ID + server ID + context hash + risk level + expiry
- Token TTL default: 5 minutes
- Code canonicalization prevents whitespace-based bypass
- Any code modification after validation invalidates the token
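
One plausible reading of the canonicalization step is that whitespace-only differences are normalized before the code is hashed. The sketch below shows that idea with a hypothetical `canonicalize` function; the crate's actual rules are not described here, and a real implementation would need to be language-aware, since this naive version also rewrites whitespace inside string literals.

```rust
/// Collapse every run of whitespace to a single space and trim both ends.
/// Naive on purpose: it does not understand string literals or comments.
pub fn canonicalize(code: &str) -> String {
    code.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Under this sketch, a query reformatted across several lines canonicalizes to the same string as its single-line form, while adding a field produces a different string and therefore a different hash.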

**Secret handling:**
- `TokenSecret` backed by `secrecy::SecretBox<[u8]>`, zeroed on drop
- Explicitly **does not implement**: `Debug`, `Display`, `Clone`, `Serialize`, `Deserialize`, `PartialEq`
- Minimum 16-byte secret enforced at construction — `HmacTokenGenerator::new` returns `Result<Self, TokenError>` (no panic)
- Access only via `expose_secret()` — framework-internal, never needed by server code
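
The construction-time length check and zero-on-drop behavior can be sketched with the standard library alone. `SketchGenerator` and `SketchTokenError` are hypothetical stand-ins for the real types, and the volatile writes are a best-effort substitute for what `zeroize` does in the crate.

```rust
use std::fmt;

/// Minimum secret length stated in this README.
pub const MIN_SECRET_LEN: usize = 16;

#[derive(Debug, PartialEq, Eq)]
pub enum SketchTokenError {
    SecretTooShort { min: usize, actual: usize },
}

impl fmt::Display for SketchTokenError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::SecretTooShort { min, actual } => {
                write!(f, "secret is {actual} bytes, need at least {min}")
            }
        }
    }
}

impl std::error::Error for SketchTokenError {}

/// Hypothetical generator that owns its key; no `Debug` or `Clone` on purpose.
pub struct SketchGenerator {
    key: Vec<u8>,
}

impl SketchGenerator {
    /// Reject short keys with an error instead of panicking.
    pub fn new(key: Vec<u8>) -> Result<Self, SketchTokenError> {
        if key.len() < MIN_SECRET_LEN {
            return Err(SketchTokenError::SecretTooShort {
                min: MIN_SECRET_LEN,
                actual: key.len(),
            });
        }
        Ok(Self { key })
    }
}

impl Drop for SketchGenerator {
    fn drop(&mut self) {
        for byte in self.key.iter_mut() {
            // SAFETY: `byte` is a valid, aligned, exclusive reference to a u8.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
    }
}
```

Because the constructor returns `Result`, a caller can propagate the error with `?` during startup, in the same way as the pipeline constructors shown in the Quick Start.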

**Policy evaluation:**
- Default-deny: without a configured `PolicyEvaluator`, only basic config checks run
- Policy evaluator stored as `Arc<dyn PolicyEvaluator>` — shared safely across async handlers
- Cedar support via `cedar` feature flag (local evaluation, no network)
- `NoopPolicyEvaluator` for tests only — prominently documented with warnings

## Schema Exposure Architecture

The three-layer schema model controls what the LLM sees:

```text
Full Schema -> Exposure Policy -> Derived Schema -> LLM
              (filter/redact)   (what the LLM sees)
```

- `ExposureMode::Full` — expose everything
- `ExposureMode::ReadOnly` — expose reads, hide mutations
- `ExposureMode::Allowlist` — only specified operations
- `ExposureMode::Custom` — per-operation overrides via `ToolOverride`
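
As a rough illustration of how an exposure policy turns the full schema into the derived schema, the sketch below filters a list of operations by mode. All names (`SketchExposure`, `Operation`, `OpKind`, `derive_schema`) are hypothetical, and the `Custom` mode is left out because the shape of `ToolOverride` is not described in this README.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum OpKind {
    Query,
    Mutation,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Operation {
    pub name: String,
    pub kind: OpKind,
}

/// Hypothetical simplification of the exposure modes listed above.
pub enum SketchExposure {
    Full,
    ReadOnly,
    Allowlist(HashSet<String>),
}

/// Filter the full schema down to the operations the LLM is allowed to see.
pub fn derive_schema(full: &[Operation], mode: &SketchExposure) -> Vec<Operation> {
    full.iter()
        .filter(|op| match mode {
            SketchExposure::Full => true,
            SketchExposure::ReadOnly => op.kind == OpKind::Query,
            SketchExposure::Allowlist(names) => names.contains(&op.name),
        })
        .cloned()
        .collect()
}
```

For a schema with one query and one mutation, `Full` keeps both, `ReadOnly` keeps only the query, and `Allowlist` keeps exactly the named operations.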

## Breaking Changes in v0.1.0

### Constructors now return `Result`

All `ValidationPipeline` constructors and `HmacTokenGenerator::new` return `Result` instead of panicking on invalid input. This catches misconfiguration at startup.

```rust
// Before (v0.0.x):
let pipeline = ValidationPipeline::new(config, secret);

// After (v0.1.0):
let pipeline = ValidationPipeline::new(config, secret)?;
```

### Policy evaluator uses `Arc` (not `Box`)

`with_policy_evaluator` and `set_policy_evaluator` now accept `Arc<dyn PolicyEvaluator>` instead of `Box<dyn PolicyEvaluator>`. This enables shared ownership needed by the derive macro's generated handlers.

```rust
// Before:
pipeline.set_policy_evaluator(Box::new(my_evaluator));

// After:
pipeline.set_policy_evaluator(Arc::new(my_evaluator));
```

### `language` attribute selects validation path

`#[code_mode(language = "...")]` now dispatches to the correct language-specific validation method at compile time, not just tool metadata. Servers using JavaScript, SQL, or MCP can now use `#[derive(CodeMode)]` instead of manual handler structs.

See [CHANGELOG.md](./CHANGELOG.md) for the full list of changes.

## Known Limitations (v0.1.0)

1. **`TokenSecret::new` does not zeroize the source `Vec`.** The bytes are copied into `SecretBox` but the original `Vec` is not zeroed. Use `TokenSecret::from_env()` in production for maximum security.

2. **GraphQL only in default features.** JavaScript/OpenAPI validation requires the `openapi-code-mode` feature flag and pulls in SWC (~25MB compile artifact).

3. **No server-side token revocation.** Tokens are stateless (verified by HMAC). Once issued, a token is valid until it expires. Short TTL (5 min default) mitigates this.

4. **JavaScript validation is sync only.** `validate_javascript_code` is synchronous (no async variant). The derive macro handles this transparently — the generated async handler calls the sync method without `.await`.

5. **SQL and MCP validators are stubs.** The `validate_sql_query` and `validate_mcp_composition` methods require their respective feature flags. These validators are still being implemented; the derive macro dispatch is already in place.

## Crate Dependencies

Minimal in the default feature set:

```text
graphql-parser 0.4    — GraphQL parsing (pure Rust, no proc macros)
hmac 0.13 + sha2 0.11 — HMAC-SHA256 token signing
secrecy 0.10          — Secret memory management
zeroize 1.8           — Memory zeroing on drop
chrono 0.4            — Token timestamps
hex 0.4               — Hash encoding
base64 0.22           — Token encoding
serde + serde_json    — Serialization
thiserror             — Error types
async-trait           — Async trait support
```

The `cedar` feature adds `cedar-policy 4.9` (~3MB). The `openapi-code-mode` feature adds SWC.

## Running the Example

```bash
cargo run --example s41_code_mode_graphql --features full
```

This demonstrates the full validate -> approve -> execute round trip, including a rejection path for blocked mutations.

## Feedback Welcome

This is a pre-1.0 API. Key areas where we'd like team input:

- **Standard adapters** — does `JsCodeExecutor`/`SdkCodeExecutor`/`McpCodeExecutor` cover your execution pattern, or do you need a different adapter shape?
- **Variables forwarding** — the adapters pass `variables` as the `args` variable in JS plans. Does your server need a different variable binding strategy?
- **Derive macro ergonomics** — are the fixed field names (`code_mode_config`, `token_secret`, etc.) workable, or do you need attribute-based field mapping?
- **`context_from` pattern** — does returning `ValidationContext` from a sync method work for your auth integration, or do you need an async version?
- **SQL validation** — what SQL dialects do you need? Parameterized queries, prepared statements, or raw SQL only?
- **MCP composition** — what should `validate_mcp_composition` check? Schema compatibility, tool existence, or structural validation?
- **Policy evaluation** — any use cases beyond Cedar and AVP?

File issues or discuss in the `#pmcp-sdk` channel.

## License

MIT