pmcp-code-mode 0.5.0

Code Mode validation and execution framework for MCP servers built on the PMCP SDK.

Enables LLM-generated code (GraphQL, JavaScript, SQL, MCP compositions) to be validated, explained, and executed with HMAC-signed approval tokens that cryptographically bind code to its validation result.

Status: v0.3.0 — multi-language validation with policy enforcement, standard adapters, and deploy-time configuration. The public API is stabilizing; feedback is welcome before the 1.0 contract is locked.

How It Works

                       ┌──────────────┐
                       │   LLM Client │
                       └──────┬───────┘
                              │
               1. describe_schema()  <- schema exposed per exposure policy
                              │
               2. LLM generates code (GraphQL, JS, SQL, MCP composition)
                              │
               3. validate_code(code) ──────────────────────┐
                              │                              │
                    ┌─────────▼──────────┐                   │
                    │ ValidationPipeline │                   │
                    │  ┌───────────────┐ │     ┌────────────▼────────────┐
                    │  │ Parse         │ │     │ PolicyEvaluator (Cedar, │
                    │  │ Security scan │ │────>│ AVP, or custom)         │
                    │  │ Explain       │ │     └─────────────────────────┘
                    │  │ HMAC sign     │ │
                    │  └───────────────┘ │
                    └─────────┬──────────┘
                              │
                    approval_token (HMAC-SHA256 signed)
                              │
               4. User reviews explanation, approves
                              │
               5. execute_code(code, token) ────────────────┐
                              │                              │
                    ┌─────────▼──────────┐     ┌────────────▼──────┐
                    │ Token verification │     │ CodeExecutor impl │
                    │ (hash, expiry, sig)│────>│ (your backend)    │
                    └────────────────────┘     └───────────────────┘
                              │
                    execution result (JSON)

The token ensures that the exact code the user approved is what gets executed — any modification after validation invalidates the token.
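The binding can be illustrated with a short sketch. Note that the hasher below is Rust's non-cryptographic DefaultHasher, used purely as a stand-in for illustration; the real framework signs with HMAC-SHA256 over the canonicalized code hash plus context fields, keyed by a server-held TokenSecret.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustration only: a stand-in "token" that binds code to its context.
// The real framework uses HMAC-SHA256 with a server-side secret.
fn sign(code: &str, user: &str, session: &str) -> u64 {
    let mut h = DefaultHasher::new();
    code.hash(&mut h);
    user.hash(&mut h);
    session.hash(&mut h);
    h.finish()
}

fn verify(code: &str, user: &str, session: &str, token: u64) -> bool {
    sign(code, user, session) == token
}
```

Re-signing the modified code yields a different value, so a token issued for the approved code cannot be replayed against edited code.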

Supported Languages

The language attribute on #[derive(CodeMode)] selects the validation path at compile time. Each language maps to a feature-gated validation method on ValidationPipeline:

Language     Derive Attribute        Validation Method               Feature Required
GraphQL      "graphql" (default)     validate_graphql_query_async    (none)
JavaScript   "javascript" or "js"    validate_javascript_code        openapi-code-mode
SQL          "sql"                   validate_sql_query              sql-code-mode
MCP          "mcp"                   validate_mcp_composition        mcp-code-mode

The CodeLanguage enum in pmcp_code_mode::types is the runtime representation of these values. Unknown language strings produce a compile error at macro expansion time.
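A rough sketch of that mapping (from_attr is a hypothetical helper for illustration; in the real crate the dispatch happens at macro expansion time, and an unknown string is a compile error rather than a runtime None):

```rust
// Sketch of the language-string mapping from the table above. The enum
// variant names mirror pmcp_code_mode::types::CodeLanguage; `from_attr`
// itself is hypothetical.
#[derive(Debug, PartialEq)]
enum CodeLanguage {
    GraphQL,
    JavaScript,
    Sql,
    Mcp,
}

fn from_attr(s: &str) -> Option<CodeLanguage> {
    match s {
        "graphql" => Some(CodeLanguage::GraphQL),
        "javascript" | "js" => Some(CodeLanguage::JavaScript),
        "sql" => Some(CodeLanguage::Sql),
        "mcp" => Some(CodeLanguage::Mcp),
        _ => None, // the derive macro rejects this at compile time instead
    }
}
```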

Quick Start

Minimal: Direct Pipeline Usage

All pipeline constructors return Result — invalid configuration (such as an HMAC secret shorter than 16 bytes) is caught at startup rather than surfacing mid-request.

use pmcp_code_mode::{
    CodeModeConfig, TokenSecret, ValidationPipeline, ValidationContext,
};

let config = CodeModeConfig::enabled();
let secret = TokenSecret::new(b"my-secret-key-at-least-16-bytes!".to_vec());
let pipeline = ValidationPipeline::from_token_secret(config, &secret)?;

let ctx = ValidationContext::new("user-123", "session-456", "schema-hash", "perms-hash");
let result = pipeline.validate_graphql_query("query { users { id name } }", &ctx)?;

assert!(result.is_valid);
assert!(result.approval_token.is_some()); // HMAC-signed token

With Policy Evaluator

Wire a policy evaluator (Cedar, AWS Verified Permissions, or custom) into the pipeline for authorization checks between parsing and token signing:

use pmcp_code_mode::{
    CodeModeConfig, TokenSecret, ValidationPipeline, NoopPolicyEvaluator,
};
use std::sync::Arc;

let config = CodeModeConfig::enabled();
let secret = TokenSecret::new(b"my-secret-key-at-least-16-bytes!".to_vec());
let evaluator = Arc::new(NoopPolicyEvaluator::new()); // Use a real evaluator in production

let pipeline = ValidationPipeline::with_policy_evaluator(
    config, secret.expose_secret().to_vec(), evaluator
)?;

The policy evaluator is stored as Arc<dyn PolicyEvaluator>, enabling shared ownership across handlers and async tasks.

With #[derive(CodeMode)] (Recommended)

The derive macro eliminates ~80 lines of boilerplate per server and supports all four languages. See the pmcp-code-mode-derive README for the full derive guide.

GraphQL server (default):

use pmcp_code_mode::{CodeModeConfig, TokenSecret, NoopPolicyEvaluator, CodeExecutor};
use pmcp_code_mode_derive::CodeMode;
use std::sync::Arc;

#[derive(CodeMode)]
#[code_mode(context_from = "get_context")]
struct MyGraphQLServer {
    code_mode_config: CodeModeConfig,
    token_secret: TokenSecret,
    policy_evaluator: Arc<NoopPolicyEvaluator>,
    code_executor: Arc<MyGraphQLExecutor>,
}

JavaScript/OpenAPI server (Cost Coach, etc.):

#[derive(CodeMode)]
#[code_mode(context_from = "get_context", language = "javascript")]
struct MyCostCoachServer {
    code_mode_config: CodeModeConfig,
    token_secret: TokenSecret,
    policy_evaluator: Arc<NoopPolicyEvaluator>,
    code_executor: Arc<MyJsExecutor>,
}

SQL server:

#[derive(CodeMode)]
#[code_mode(context_from = "get_context", language = "sql")]
struct MySqlServer {
    code_mode_config: CodeModeConfig,
    token_secret: TokenSecret,
    policy_evaluator: Arc<NoopPolicyEvaluator>,
    code_executor: Arc<MySqlExecutor>,
}

All derive-generated servers share the same pattern: the language attribute selects the parser, the context_from method binds tokens to real user identity, and CodeExecutor handles your backend-specific execution.

Field name convention: The derive macro identifies required fields by fixed names. Missing any field produces a compile error listing all absent fields.

Field Name         Type                       Purpose
code_mode_config   CodeModeConfig             Validation pipeline config
token_secret       TokenSecret                HMAC signing secret
policy_evaluator   Arc<impl PolicyEvaluator>  Authorization backend
code_executor      Arc<impl CodeExecutor>     Your execution backend

Implementing CodeExecutor

This is the only trait you need to implement. The executor holds its own configuration (timeouts, limits, etc.) — CodeExecutor::execute() is intentionally kept simple:

use pmcp_code_mode::{CodeExecutor, ExecutionError, async_trait};
use serde_json::Value;

struct MyGraphQLExecutor { pool: PgPool }

#[async_trait]
impl CodeExecutor for MyGraphQLExecutor {
    async fn execute(
        &self,
        code: &str,          // Validated code (already token-verified)
        variables: Option<&Value>,
    ) -> Result<Value, ExecutionError> {
        // Execute against your backend. The framework has already verified
        // the HMAC token — do NOT re-verify here.
        let result = self.pool.execute_graphql(code, variables).await?;
        Ok(serde_json::to_value(result)?)
    }
}

For GraphQL and SQL servers, you implement CodeExecutor directly — your executor calls your database or GraphQL backend.

For JavaScript/OpenAPI, SDK, and MCP servers, use the standard adapters instead of implementing CodeExecutor manually.

Standard Adapters (JS/SDK/MCP)

These adapters bridge the low-level execution traits to CodeExecutor, eliminating ~75 lines of manual handler boilerplate per server. Each compiles JavaScript code via PlanCompiler, executes via PlanExecutor, and logs execution metadata automatically.

JsCodeExecutor<H> — JavaScript + HTTP calls (Pattern B). Requires js-runtime feature.

use pmcp_code_mode::{JsCodeExecutor, ExecutionConfig};

// Your HttpExecutor implementation (e.g., CostExplorerHttpExecutor)
let http = CostExplorerHttpExecutor::new(clients.clone());
let config = ExecutionConfig::default()
    .with_blocked_fields(["password", "ssn"]);
let code_executor = Arc::new(JsCodeExecutor::new(http, config));
// Pass as code_executor field in your #[derive(CodeMode)] struct

SdkCodeExecutor<S> — JavaScript + SDK operations (Pattern C). Requires js-runtime feature.

use pmcp_code_mode::{SdkCodeExecutor, ExecutionConfig};

let sdk = MyCostExplorerSdk::new(credentials);
let config = ExecutionConfig::default();
let code_executor = Arc::new(SdkCodeExecutor::new(sdk, config));

McpCodeExecutor<M> — JavaScript + MCP tool composition (Pattern D). Requires mcp-code-mode feature.

use pmcp_code_mode::{McpCodeExecutor, ExecutionConfig};

let mcp = MyMcpRouter::new(foundation_servers);
let config = ExecutionConfig::default();
let code_executor = Arc::new(McpCodeExecutor::new(mcp, config));

All three adapters:

  • Create a fresh PlanCompiler + PlanExecutor per call (cheap — your HttpExecutor/SdkExecutor/McpExecutor holds Arc'd state)
  • Forward variables into the execution plan as args (available in JS code as the args variable)
  • Log api_calls count and execution_time_ms via tracing::debug!

End-to-End: Cost Coach with Derive Macro

Before (manual handlers, ~75 lines):

struct ValidateState { pipeline: Arc<ValidationPipeline>, config: CodeModeConfig }
struct ExecuteState { pipeline: Arc<ValidationPipeline>, http: CostExplorerHttpExecutor, config: ExecutionConfig }
// ... implement ToolHandler for both, wire manually ...

After (derive macro + adapter, 8 lines):

#[derive(CodeMode)]
#[code_mode(context_from = "get_context", language = "javascript")]
struct CostCoachServer {
    code_mode_config: CodeModeConfig,
    token_secret: TokenSecret,
    policy_evaluator: Arc<NoopPolicyEvaluator>,
    code_executor: Arc<JsCodeExecutor<CostExplorerHttpExecutor>>,
}

let http = CostExplorerHttpExecutor::new(clients.clone());
let code_executor = Arc::new(JsCodeExecutor::new(http, ExecutionConfig::default()));
let server = Arc::new(CostCoachServer { /* ... */ });
let builder = server.register_code_mode_tools(pmcp::Server::builder())?;

Key Types

Type                   What It Does
ValidationPipeline     Orchestrates: parse -> policy check -> security analysis -> explanation -> token
CodeModeConfig         Controls what's allowed: mutations, introspection, blocked fields, max depth, TTL
CodeLanguage           Enum of supported languages: GraphQL, JavaScript, Sql, Mcp
PolicyEvaluator        Trait for pluggable authorization (Cedar, AWS Verified Permissions, custom)
CodeExecutor           Trait for executing validated code against your backend
JsCodeExecutor<H>      Standard adapter: HttpExecutor -> CodeExecutor (JS+HTTP, js-runtime feature)
SdkCodeExecutor<S>     Standard adapter: SdkExecutor -> CodeExecutor (JS+SDK, js-runtime feature)
McpCodeExecutor<M>     Standard adapter: McpExecutor -> CodeExecutor (JS+MCP, mcp-code-mode feature)
ExecutionConfig        JS execution limits: max_api_calls, timeout_seconds, max_loop_iterations, blocked fields
TokenSecret            Zeroizing HMAC secret — backed by secrecy::SecretBox<[u8]>, no Debug/Clone/Serialize
HmacTokenGenerator     Creates HMAC-SHA256 tokens binding code hash + context to approval
TokenError             Error type for constructor failures (e.g. HMAC secret too short)
ApprovalToken          Signed token: code hash, user ID, session ID, expiry, risk level, context hash
NoopPolicyEvaluator    Test-only evaluator that allows everything — NOT for production
ValidationResponse     Handler-level response wrapping ValidationResult + auto-approval, action, code hash
CodeModeHandler        Server-side handler trait with tool builder, pre-handle hooks, soft-disable

Configuration

CodeModeConfig controls the validation pipeline behavior:

let config = CodeModeConfig {
    enabled: true,
    allow_mutations: false,          // Block mutations by default
    blocked_mutations: HashSet::from(["deleteAll".into()]),
    allowed_mutations: HashSet::new(), // Empty = all non-blocked mutations allowed
    blocked_queries: HashSet::new(),
    allowed_queries: HashSet::new(),
    allow_introspection: false,      // Block schema introspection
    blocked_fields: HashSet::from(["User.ssn".into(), "User.password".into()]),
    max_query_depth: 10,
    max_query_fields: 100,
    token_ttl_seconds: 300,          // 5-minute token expiry
    auto_approve_threshold: Some(RiskLevel::Low), // Auto-approve low-risk queries
    ..CodeModeConfig::enabled()
};

Query and Mutation Authorization

The pipeline enforces config-level authorization checks before policy evaluation:

  • Mutation control: allow_mutations (global toggle), blocked_mutations (blocklist), allowed_mutations (allowlist). If allowed_mutations is non-empty, only listed mutations pass.
  • Query control: blocked_queries (blocklist), allowed_queries (allowlist). Same allowlist-takes-precedence semantics as mutations.
  • Policy evaluation: After config checks pass, PolicyEvaluator::evaluate_operation() runs (if configured) for fine-grained authorization.
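The precedence described above can be sketched as a single predicate (hypothetical helper; the real checks live inside ValidationPipeline):

```rust
use std::collections::HashSet;

// Hypothetical sketch of the config-level mutation check: global toggle
// first, then blocklist, then (only if non-empty) allowlist.
fn mutation_allowed(
    name: &str,
    allow_mutations: bool,
    blocked: &HashSet<String>,
    allowed: &HashSet<String>,
) -> bool {
    if !allow_mutations {
        return false; // global toggle wins
    }
    if blocked.contains(name) {
        return false; // blocklist always denies
    }
    if !allowed.is_empty() && !allowed.contains(name) {
        return false; // non-empty allowlist: only listed mutations pass
    }
    true
}
```

Query authorization follows the same shape with blocked_queries and allowed_queries.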

Deployment Configuration (config.toml)

When deploying with cargo pmcp deploy, the server's config.toml is automatically included in the deploy ZIP. The pmcp.run platform extracts operation metadata from this file to populate the Code Mode policy page in the admin UI — administrators can then enable/disable individual operations by category.

config.toml Schema

The [[code_mode.operations]] section declares available operations. When present, it takes priority over [[tools]] for policy categorization.

OpenAPI server:

[server]
name = "cost-coach"
type = "openapi-api"

[code_mode]
allow_writes = false
allow_deletes = false

[[code_mode.operations]]
name = "getCostAndUsage"
description = "Retrieve AWS cost and usage data"
path = "/ce/GetCostAndUsage"
method = "POST"

[[code_mode.operations]]
name = "getRecommendations"
description = "Get cost optimization recommendations"
path = "/ce/GetRightsizingRecommendation"
method = "POST"

[[code_mode.operations]]
name = "deleteBudget"
description = "Delete a budget"
path = "/budgets/DeleteBudget"
method = "POST"
destructive_hint = true

GraphQL server:

[server]
name = "open-images"
type = "graphql-api"

[code_mode]
allow_writes = false

[[code_mode.operations]]
name = "searchImages"
operation_type = "query"
description = "Search the image catalog"

[[code_mode.operations]]
name = "createCollection"
operation_type = "mutation"
description = "Create a new image collection"

[[code_mode.operations]]
name = "deleteImage"
operation_type = "mutation"
description = "Permanently delete an image"
destructive_hint = true

SQL server:

[server]
name = "analytics"
type = "sql"

[code_mode]
allow_writes = true
allow_deletes = false
blocked_tables = ["audit_log", "credentials"]

[database]
[[database.tables]]
name = "orders"
description = "Customer order history"

[[database.tables]]
name = "products"
description = "Product catalog"

MCP composition server:

[server]
name = "orchestrator"
type = "mcp-api"

[[code_mode.operations]]
name = "analyze_costs"
description = "Multi-step cost analysis workflow"
operation_category = "read"

[[code_mode.operations]]
name = "provision_resources"
description = "Provision cloud resources"
operation_category = "admin"

Categorization Rules

Operations are automatically categorized based on server type:

  • OpenAPI: GET, HEAD, and OPTIONS are read; POST, PUT, and PATCH are write; DELETE is delete; operation_category = "admin" marks admin.
  • GraphQL: query is read; mutation is write; a mutation with destructive_hint or a delete/remove/destroy name prefix is delete; operation_category = "admin" marks admin.
  • SQL: SELECT on non-blocked tables is read; INSERT and UPDATE are write (if allow_writes); DELETE is delete (if allow_deletes).
  • MCP-API: read_only_hint (the default) is read; create/add/update/set name patterns are write; delete/remove/destroy patterns are delete; operation_category = "admin" marks admin.

The operation_category field overrides automatic categorization when set explicitly.
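For the OpenAPI row, the heuristic might look like this sketch (hypothetical helper; the actual categorizer runs on the pmcp.run platform and may differ in detail):

```rust
// Hypothetical sketch of OpenAPI operation categorization: an explicit
// operation_category overrides the HTTP-method heuristic.
fn categorize_openapi(method: &str, operation_category: Option<&str>) -> &'static str {
    if let Some(cat) = operation_category {
        return match cat {
            "admin" => "admin",
            "write" => "write",
            "delete" => "delete",
            _ => "read",
        };
    }
    match method {
        "GET" | "HEAD" | "OPTIONS" => "read",
        "POST" | "PUT" | "PATCH" => "write",
        "DELETE" => "delete",
        _ => "read",
    }
}
```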

Config File Resolution

cargo pmcp deploy finds the config file using this resolution order:

  1. config.toml in the server crate root
  2. Single .toml file in instances/ directory

This is the same file the server embeds via include_str!() in main.rs.
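The resolution order can be sketched over an in-memory candidate list (hypothetical helper; the real implementation walks the crate directory on disk):

```rust
use std::path::{Path, PathBuf};

// Sketch of the config resolution order: prefer config.toml in the crate
// root, else fall back to a single .toml file under instances/.
// Hypothetical helper operating on a file list for clarity.
fn resolve_config(files: &[&str]) -> Option<PathBuf> {
    if files.iter().any(|f| *f == "config.toml") {
        return Some(PathBuf::from("config.toml"));
    }
    let instance_tomls: Vec<&&str> = files
        .iter()
        .filter(|f| {
            let p = Path::new(**f);
            p.starts_with("instances") && p.extension().map_or(false, |e| e == "toml")
        })
        .collect();
    match instance_tomls.as_slice() {
        [single] => Some(PathBuf::from(**single)),
        _ => None, // zero or multiple candidates: no unambiguous config
    }
}
```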

Feature Flags

Feature             Default   What It Adds
(none)              yes       GraphQL validation via graphql-parser
openapi-code-mode   no        JavaScript/OpenAPI validation via SWC parser
js-runtime          no        JavaScript AST-based execution in pure Rust (implies openapi-code-mode)
sql-code-mode       no        SQL query validation and parameterization
mcp-code-mode       no        MCP-to-MCP tool composition (implies js-runtime)
cedar               no        Local Cedar policy evaluation via cedar-policy 4.9

Dependency chain: mcp-code-mode -> js-runtime -> openapi-code-mode

Security Design

See SECURITY.md for the full threat model.

Token security:

  • HMAC-SHA256 binds: code hash + user ID + session ID + server ID + context hash + risk level + expiry
  • Token TTL default: 5 minutes
  • Code canonicalization prevents whitespace-based bypass
  • Any code modification after validation invalidates the token
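Canonicalization might be as simple as collapsing whitespace runs before hashing (hypothetical sketch; the framework's internal canonicalizer may normalize more than this):

```rust
// Hypothetical sketch: collapse whitespace so that reformatting an
// approved query cannot produce a different hash than the one signed
// into the token.
fn canonicalize(code: &str) -> String {
    code.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Two queries that differ only in indentation or line breaks canonicalize to the same string, so the whitespace variant verifies against the same token while any semantic edit does not.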

Secret handling:

  • TokenSecret backed by secrecy::SecretBox<[u8]>, zeroed on drop
  • Explicitly does not implement: Debug, Display, Clone, Serialize, Deserialize, PartialEq
  • Minimum 16-byte secret enforced at construction — HmacTokenGenerator::new returns Result<Self, TokenError> (no panic)
  • Access only via expose_secret() — framework-internal, never needed by server code

Policy evaluation:

  • Default-deny: without a configured PolicyEvaluator, only basic config checks run
  • Policy evaluator stored as Arc<dyn PolicyEvaluator> — shared safely across async handlers
  • Both GraphQL and JavaScript validation call their respective policy evaluation methods (evaluate_operation / evaluate_script) — fail-closed on policy errors
  • Cedar support via cedar feature flag (local evaluation, no network)
  • AVP (AWS Verified Permissions) support via external evaluator — policies configured in pmcp.run admin UI
  • NoopPolicyEvaluator for tests only — prominently documented with warnings

Schema Exposure Architecture

The three-layer schema model controls what the LLM sees:

Full Schema -> Exposure Policy -> Derived Schema -> LLM
               (filter/redact)    (what the LLM sees)

  • ExposureMode::Full — expose everything
  • ExposureMode::ReadOnly — expose reads, hide mutations
  • ExposureMode::Allowlist — only specified operations
  • ExposureMode::Custom — per-operation overrides via ToolOverride
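A sketch of how the exposure policy derives the schema the LLM sees (the mode names mirror the list above, but the Op struct and derive_schema helper are hypothetical):

```rust
use std::collections::HashSet;

// Hypothetical sketch of schema exposure filtering. `Op` stands in for a
// schema operation; the real crate derives this from the full schema, and
// ExposureMode::Custom (per-operation ToolOverride) is omitted here.
enum ExposureMode {
    Full,
    ReadOnly,
    Allowlist(HashSet<String>),
}

struct Op {
    name: String,
    is_mutation: bool,
}

fn derive_schema(full: Vec<Op>, mode: &ExposureMode) -> Vec<Op> {
    full.into_iter()
        .filter(|op| match mode {
            ExposureMode::Full => true,
            ExposureMode::ReadOnly => !op.is_mutation,
            ExposureMode::Allowlist(names) => names.contains(&op.name),
        })
        .collect()
}
```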

Breaking Changes in v0.1.0

Constructors now return Result

All ValidationPipeline constructors and HmacTokenGenerator::new return Result instead of panicking on invalid input. This catches misconfiguration at startup.

// Before (v0.0.x):
let pipeline = ValidationPipeline::new(config, secret);

// After (v0.1.0):
let pipeline = ValidationPipeline::new(config, secret)?;

Policy evaluator uses Arc (not Box)

with_policy_evaluator and set_policy_evaluator now accept Arc<dyn PolicyEvaluator> instead of Box<dyn PolicyEvaluator>. This enables shared ownership needed by the derive macro's generated handlers.

// Before:
pipeline.set_policy_evaluator(Box::new(my_evaluator));

// After:
pipeline.set_policy_evaluator(Arc::new(my_evaluator));

language attribute selects validation path

#[code_mode(language = "...")] now dispatches to the correct language-specific validation method at compile time, not just tool metadata. Servers using JavaScript, SQL, or MCP can now use #[derive(CodeMode)] instead of manual handler structs.

Breaking Changes in v0.3.0

JavaScript derive macro now calls async validation with policy enforcement

#[derive(CodeMode)] with language = "javascript" now calls validate_javascript_code_async instead of the sync validate_javascript_code. This means:

  • Cedar policies are now enforced for JavaScript servers using the derive macro
  • AVP policies are now enforced when deployed with POLICY_STORE_ID on pmcp.run
  • Policy evaluation failures are fail-closed (same as GraphQL) — a policy backend outage blocks requests rather than silently allowing them

If your JavaScript server was relying on the absence of policy enforcement (e.g., using a custom PolicyEvaluator that only implemented evaluate_operation but not evaluate_script), the default evaluate_script implementation denies all scripts. Override evaluate_script in your evaluator to allow scripts, or use NoopPolicyEvaluator for testing.
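The fail-closed default can be pictured with a trait sketch (hypothetical signatures; consult the crate docs for the real PolicyEvaluator trait, which is async and returns richer results):

```rust
// Hypothetical sketch of the fail-closed default described above: an
// evaluator that only implements evaluate_operation inherits a default
// evaluate_script that denies every script.
trait PolicyEvaluatorSketch {
    fn evaluate_operation(&self, op: &str) -> bool;

    fn evaluate_script(&self, _script: &str) -> bool {
        false // default: deny all scripts (fail-closed)
    }
}

struct OpOnlyEvaluator;
impl PolicyEvaluatorSketch for OpOnlyEvaluator {
    fn evaluate_operation(&self, _op: &str) -> bool { true }
    // evaluate_script not overridden -> all scripts are denied
}

struct ScriptAwareEvaluator;
impl PolicyEvaluatorSketch for ScriptAwareEvaluator {
    fn evaluate_operation(&self, _op: &str) -> bool { true }
    fn evaluate_script(&self, _script: &str) -> bool { true } // explicit opt-in
}
```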

Standard adapters added

JsCodeExecutor, SdkCodeExecutor, and McpCodeExecutor are new. They don't break existing code, but if you were manually implementing CodeExecutor for JS plan execution, you can now replace ~75 lines of boilerplate with:

let code_executor = Arc::new(JsCodeExecutor::new(http_client, ExecutionConfig::default()));

See CHANGELOG.md for the full list of changes.

Known Limitations (v0.1.0)

  1. TokenSecret::new does not zeroize the source Vec. The bytes are copied into SecretBox but the original Vec is not zeroed. Use TokenSecret::from_env() in production for maximum security.

  2. GraphQL only in default features. JavaScript/OpenAPI validation requires the openapi-code-mode feature flag and pulls in SWC (~25MB compile artifact).

  3. No server-side token revocation. Tokens are stateless (verified by HMAC). Once issued, a token is valid until it expires. Short TTL (5 min default) mitigates this.

  4. SQL and MCP validators are stubs. The validate_sql_query and validate_mcp_composition methods require their respective feature flags but are still being implemented — the derive macro dispatch is already in place.

Crate Dependencies

Minimal in the default feature set:

graphql-parser 0.4    — GraphQL parsing (pure Rust, no proc macros)
hmac 0.13 + sha2 0.11 — HMAC-SHA256 token signing
secrecy 0.10          — Secret memory management
zeroize 1.8           — Memory zeroing on drop
chrono 0.4            — Token timestamps
hex 0.4               — Hash encoding
base64 0.22           — Token encoding
serde + serde_json    — Serialization
thiserror             — Error types
async-trait           — Async trait support

The cedar feature adds cedar-policy 4.9 (~3MB). The openapi-code-mode feature adds SWC.

Running the Example

cargo run --example s41_code_mode_graphql --features full

This demonstrates the full validate -> approve -> execute round trip, including a rejection path for blocked mutations.

Feedback Welcome

This is a pre-1.0 API. Key areas where we'd like team input:

  • Standard adapters — do JsCodeExecutor, SdkCodeExecutor, and McpCodeExecutor cover your execution pattern, or do you need a different adapter shape?
  • Variables forwarding — the adapters pass variables as the args variable in JS plans. Does your server need a different variable binding strategy?
  • Derive macro ergonomics — are the fixed field names (code_mode_config, token_secret, etc.) workable, or do you need attribute-based field mapping?
  • context_from pattern — does returning ValidationContext from a sync method work for your auth integration, or do you need an async version?
  • SQL validation — what SQL dialects do you need? Parameterized queries, prepared statements, or raw SQL only?
  • MCP composition — what should validate_mcp_composition check? Schema compatibility, tool existence, or structural validation?
  • Policy evaluation — any use cases beyond Cedar and AVP?

File issues or discuss in the #pmcp-sdk channel.

License

MIT