Memory MCP Integration
MCP (Model Context Protocol) server integration for the self-learning memory system with secure code execution capabilities.
Features
- MCP Server: Standard MCP protocol implementation with 19 tools
- Episode Lifecycle Management: Programmatic episode creation, tracking, and completion (NEW in v0.1.13)
- Secure Code Sandbox: WASM-based code execution with comprehensive security
- Memory Integration: Query episodic memory and analyze learned patterns
- Pattern Analysis: Advanced pattern extraction and recommendations
- Embeddings Support: Multiple providers (OpenAI, Ollama, local models)
- Progressive Tool Disclosure: Tools prioritized based on usage patterns
- Execution Monitoring: Detailed statistics and performance tracking
Implementation Status
Phase 2A: Wasmtime WASM Sandbox ✅ COMPLETE
Status: Production-ready POC eliminating rquickjs GC crashes
- ✅ wasmtime 24.0.5 integration
- ✅ Concurrent execution without SIGABRT crashes
- ✅ 100-parallel stress test passing
- ✅ Semaphore-based pooling (max 20 concurrent)
- ✅ Comprehensive metrics and health monitoring
- ✅ All tests passing (5/5)
Key Achievement: Zero GC crashes under high concurrency (100 parallel executions)
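The pooling idea above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: the `Semaphore` type and `run_pooled` helper are hypothetical names, and the real server may well use an async semaphore instead of threads. The sketch runs many simulated executions while a counting semaphore caps how many are in flight at once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore built on Mutex + Condvar (std only, hypothetical).
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Block until a permit is free, then take it.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Return a permit and wake one waiter.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Run `jobs` simulated executions with at most `max_concurrent` in flight.
/// Returns (completed jobs, peak observed concurrency).
fn run_pooled(jobs: usize, max_concurrent: usize) -> (usize, usize) {
    let sem = Arc::new(Semaphore::new(max_concurrent));
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let completed = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let sem = Arc::clone(&sem);
            let in_flight = Arc::clone(&in_flight);
            let peak = Arc::clone(&peak);
            let completed = Arc::clone(&completed);
            thread::spawn(move || {
                sem.acquire();
                // Count ourselves only while holding a permit.
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(2)); // stand-in for a sandboxed run
                in_flight.fetch_sub(1, Ordering::SeqCst);
                completed.fetch_add(1, Ordering::SeqCst);
                sem.release();
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    (completed.load(Ordering::SeqCst), peak.load(Ordering::SeqCst))
}
```

Calling `run_pooled(100, 20)` mirrors the 100-parallel stress test with the 20-permit cap: all jobs complete, and the observed peak never exceeds the permit count because the in-flight counter is only touched between `acquire` and `release`.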
Phase 2B: JavaScript Support via Javy (Next)
Goal: Enable JavaScript/TypeScript execution through Javy compiler
- ⏳ Javy v8.0.0 integration (JavaScript→WASM)
- ⏳ WASI preview1 (stdout/stderr capture)
- ⏳ Fuel-based timeout enforcement
- ⏳ Performance benchmarking vs baseline
Note: The javy backend requires either a bundled javy-plugin.wasm plugin (set via JAVY_PLUGIN) or the javy CLI available on PATH. CI will attempt to install the CLI when running the javy-backend feature; if neither is present, Javy tests will be skipped gracefully.
Phase 1: rquickjs Migration ✅ COMPLETE
Problem Solved: rquickjs v0.6.2 had critical GC race conditions causing SIGABRT crashes under concurrent test execution.
Solution: Disabled the WASM sandbox in all tests (via MCP_USE_WASM=false) until the wasmtime replacement was complete.
Security Architecture
The sandbox implements defense-in-depth security with multiple layers:
1. Input Validation
- Code length limits (100KB max)
- Malicious pattern detection
- Syntax validation
2. Process Isolation
- Separate Node.js process per execution
- Restricted global access
- No require/import capabilities (by default)
3. Resource Limits
- Configurable timeout (default: 5 seconds)
- Memory limits (default: 128MB)
- CPU usage constraints (default: 50%)
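A wall-clock limit such as the 5-second default can be sketched as follows. This is an assumption-laden illustration using only the standard library; `Timed` and `run_with_timeout` are hypothetical names, and the sandbox itself may enforce its timeout differently. The work runs on a helper thread while the caller waits on a channel for at most the configured duration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Outcome of a time-boxed run (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Timed<T> {
    Finished(T),
    TimedOut,
}

/// Run `work` on a helper thread and wait at most `limit` for its result.
/// On timeout the helper thread is abandoned, not killed, so the work may
/// keep running briefly after the caller has moved on.
fn run_with_timeout<T, F>(limit: Duration, work: F) -> Timed<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the send error: the receiver may already have given up.
        let _ = tx.send(work());
    });
    match rx.recv_timeout(limit) {
        Ok(value) => Timed::Finished(value),
        // Either the limit elapsed or the worker panicked and dropped the sender.
        Err(_) => Timed::TimedOut,
    }
}
```

Because the caller only stops waiting and the work itself is not interrupted, a limit implemented this way is a deadline on the caller's side; code can overrun it unless the runtime also interrupts execution.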
4. Access Controls
- File System: Denied by default, whitelist approach when enabled
- Network: Denied by default, no external connections
- Subprocesses: Denied, no command execution
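The deny-by-default whitelist for file access can be illustrated with a small path check. This is a sketch under stated assumptions: `normalize` and `is_path_allowed` are hypothetical helpers, the check is purely lexical (it does not resolve symlinks), and the crate's actual logic may differ.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalize a path: drop `.` and resolve `..` without touching the disk.
/// Returns None if `..` would climb above the start of the path.
fn normalize(path: &Path) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for part in path.components() {
        match part {
            Component::CurDir => {}
            Component::ParentDir => {
                if !out.pop() {
                    return None;
                }
            }
            other => out.push(other.as_os_str()),
        }
    }
    Some(out)
}

/// Deny by default: allow only absolute paths that sit under a whitelisted directory.
fn is_path_allowed(requested: &str, allowed_dirs: &[&str]) -> bool {
    let path = Path::new(requested);
    if !path.is_absolute() {
        return false;
    }
    match normalize(path) {
        // `Path::starts_with` compares whole components, so `/tmp/sandboxed`
        // is not treated as being inside `/tmp/sandbox`.
        Some(clean) => allowed_dirs.iter().any(|dir| clean.starts_with(dir)),
        None => false,
    }
}
```

With a whitelist of `/tmp/sandbox`, the path `/tmp/sandbox/data.txt` is accepted while `/tmp/sandbox/../../etc/passwd` is rejected, since normalization happens before the prefix comparison. An empty whitelist rejects everything.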
5. Pattern Detection
Automatically blocks:
- require('fs'), require('http'), require('https')
- require('child_process'), exec(), spawn()
- eval(), new Function()
- while(true), for(;;) infinite loops
- fetch(), WebSocket, XMLHttpRequest
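The size limit and pattern blocklist described above can be sketched as a pre-execution check. This is a hedged illustration only: `Rejection`, `MAX_CODE_BYTES`, and `validate_code` are hypothetical names, 100KB is assumed to mean 100 * 1024 bytes, and plain substring matching is a stand-in for whatever detection the crate actually performs.

```rust
/// Reasons a submission is rejected before it reaches the sandbox (hypothetical).
#[derive(Debug, PartialEq)]
enum Rejection {
    TooLarge { bytes: usize },
    BlockedPattern(&'static str),
}

/// Assumed interpretation of the 100KB limit.
const MAX_CODE_BYTES: usize = 100 * 1024;

/// Patterns taken from the list above, written the way they appear there.
const BLOCKED_PATTERNS: &[&str] = &[
    "require('fs')", "require('http')", "require('https')", "require('child_process')",
    "exec(", "spawn(", "eval(", "new Function(",
    "while(true)", "for(;;)",
    "fetch(", "WebSocket", "XMLHttpRequest",
];

/// Remove whitespace and unify quote style so trivial variations still match.
fn compact(text: &str) -> String {
    text.chars()
        .filter(|c| !c.is_whitespace())
        .map(|c| if c == '"' { '\'' } else { c })
        .collect()
}

/// Check length first, then scan for any blocked pattern.
fn validate_code(code: &str) -> Result<(), Rejection> {
    if code.len() > MAX_CODE_BYTES {
        return Err(Rejection::TooLarge { bytes: code.len() });
    }
    let haystack = compact(code);
    for &pattern in BLOCKED_PATTERNS {
        if haystack.contains(compact(pattern).as_str()) {
            return Err(Rejection::BlockedPattern(pattern));
        }
    }
    Ok(())
}
```

Substring matching of this kind is easy to evade (for example by building a module name from concatenated strings) and can also flag harmless code such as a regex `.exec(` call, which is why it works best as one layer among several.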
Usage
Basic Example
use ;
use serde_json::json;
async
Sandbox Configurations
Restrictive (Recommended for Untrusted Code)
let config = SandboxConfig::restrictive();
// - 3 second timeout
// - 64MB memory limit
// - 30% CPU limit
// - No network, no filesystem, no subprocesses
Default (Balanced)
let config = SandboxConfig::default();
// - 5 second timeout
// - 128MB memory limit
// - 50% CPU limit
// - No network, no filesystem, no subprocesses
Permissive (For Trusted Code)
let config = SandboxConfig::permissive();
// - 10 second timeout
// - 256MB memory limit
// - 80% CPU limit
// - Filesystem access to whitelisted paths
Custom Configuration
let config = SandboxConfig ;
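To make the three presets and the custom option concrete, here is a self-contained sketch. `SandboxSettings` is a hypothetical stand-in for the crate's config type, and its field names are assumptions; only the numeric values come from the presets listed above. Network access for the permissive preset is not stated in this document, so the sketch leaves it disabled.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the crate's config type; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct SandboxSettings {
    timeout: Duration,
    memory_limit_mb: u64,
    cpu_limit_percent: u8,
    allow_network: bool,
    allow_filesystem: bool,
    allow_subprocesses: bool,
}

impl SandboxSettings {
    /// 3 s, 64 MB, 30% CPU, everything denied.
    fn restrictive() -> Self {
        SandboxSettings {
            timeout: Duration::from_secs(3),
            memory_limit_mb: 64,
            cpu_limit_percent: 30,
            allow_network: false,
            allow_filesystem: false,
            allow_subprocesses: false,
        }
    }

    /// 10 s, 256 MB, 80% CPU, filesystem access enabled (whitelist handled elsewhere).
    fn permissive() -> Self {
        SandboxSettings {
            timeout: Duration::from_secs(10),
            memory_limit_mb: 256,
            cpu_limit_percent: 80,
            allow_network: false,
            allow_filesystem: true,
            allow_subprocesses: false,
        }
    }

    /// Custom config: start from the restrictive preset and override one field.
    fn restrictive_with_timeout(timeout: Duration) -> Self {
        SandboxSettings { timeout, ..Self::restrictive() }
    }
}

impl Default for SandboxSettings {
    /// 5 s, 128 MB, 50% CPU, everything denied.
    fn default() -> Self {
        SandboxSettings {
            timeout: Duration::from_secs(5),
            memory_limit_mb: 128,
            cpu_limit_percent: 50,
            allow_network: false,
            allow_filesystem: false,
            allow_subprocesses: false,
        }
    }
}
```

Building a custom configuration with struct update syntax on top of the restrictive preset keeps every unspecified setting at its most conservative value, so a forgotten field fails closed.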
Available Tools
The MCP server provides 19 tools organized into categories:
Episode Lifecycle Management (NEW in v0.1.13)
Programmatically manage episodes through the MCP interface:
- create_episode - Start tracking a new task with metadata
- add_episode_step - Log execution steps to track progress
- complete_episode - Finalize episode and trigger learning cycle
- get_episode - Retrieve complete episode details
- get_episode_timeline - Visualize chronological task progression
- delete_episode - Remove episodes permanently (with safeguards)
📖 Complete Episode Lifecycle Documentation
Batch Operations Contract Status
The MCP JSON-RPC endpoint supports batch/execute (multi-operation transport).
However, tool-level batch analytics names are currently deferred and not advertised:
- batch_query_episodes
- batch_pattern_analysis
- batch_compare_episodes
These names intentionally return Tool not found until dedicated handlers are implemented.
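The contract above amounts to a lookup against the advertised tool names, with anything else rejected. A minimal sketch, assuming a hypothetical `lookup_tool` function and an error string of the form "Tool not found: <name>" (the server's actual error payload is not shown in this document):

```rust
/// Tool names advertised in this README, grouped as in the sections that follow.
const ADVERTISED_TOOLS: &[&str] = &[
    // Episode lifecycle
    "create_episode", "add_episode_step", "complete_episode",
    "get_episode", "get_episode_timeline", "delete_episode",
    // Memory and query
    "query_memory", "query_semantic_memory", "bulk_episodes",
    // Code execution
    "execute_agent_code",
    // Pattern analysis
    "analyze_patterns", "advanced_pattern_analysis", "search_patterns", "recommend_patterns",
    // Embeddings and configuration
    "configure_embeddings", "test_embeddings",
    // Monitoring and health
    "health_check", "get_metrics", "quality_metrics",
];

/// Resolve a requested tool name, rejecting anything that is not advertised.
/// The error text is an assumed format for illustration.
fn lookup_tool(name: &str) -> Result<&'static str, String> {
    ADVERTISED_TOOLS
        .iter()
        .copied()
        .find(|tool| *tool == name)
        .ok_or_else(|| format!("Tool not found: {}", name))
}
```

Under this sketch, `lookup_tool("query_memory")` succeeds, while each deferred batch name falls through to the not-found error until a handler is registered for it.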
Memory & Query Tools
- query_memory - Query episodic memory for relevant past experiences
- query_semantic_memory - Semantic search using embeddings
- bulk_episodes - Retrieve multiple episodes efficiently
Code Execution
- execute_agent_code - Execute TypeScript/JavaScript in secure WASM sandbox
Pattern Analysis
- analyze_patterns - Analyze patterns from past episodes
- advanced_pattern_analysis - Deep pattern analysis with statistical methods
- search_patterns - Search for specific patterns
- recommend_patterns - Get pattern recommendations for tasks
Embeddings & Configuration
- configure_embeddings - Configure embedding providers (OpenAI, Ollama, local)
- test_embeddings - Test embedding generation
Monitoring & Health
- health_check - Server health and status
- get_metrics - Performance metrics and statistics
- quality_metrics - Episode quality assessment
Quick Reference
1. query_memory
2. execute_agent_code
3. analyze_patterns
Security Testing
The crate includes comprehensive security tests:
# Run all tests
# Run only security tests
# Run integration tests
Security Test Coverage
- File system access blocking (12 tests)
- Network access blocking (4 tests)
- Process execution blocking (3 tests)
- Infinite loop detection (2 tests)
- Code injection blocking (2 tests)
- Resource exhaustion (2 tests)
- Path traversal attacks (3 tests)
- Legitimate code execution (4 tests)
Execution Results
The sandbox returns detailed execution results:
Performance
- Average execution time: ~50-200ms for simple code
- Timeout overhead: <10ms
- Memory footprint: ~5MB per execution
- Concurrent executions: Supported via async runtime
Limitations
- Node.js Required: The sandbox requires Node.js to be installed
- Pattern-Based Detection: Some obfuscated attacks may bypass detection
- Resource Monitoring: CPU/memory limits are advisory, not enforced
- Async Timeout: Async code may run slightly beyond timeout
Best Practices
For Untrusted Code
// Use restrictive config
let config = SandboxConfig::restrictive();
let server = new.await?;
// Always check result type
match server.execute_agent_code.await?
For Trusted Code
// Use permissive config with specific whitelist
let mut config = SandboxConfig::permissive();
config.allowed_paths = vec!;
config.allowed_network = vec!;
let server = new.await?;
Error Handling
use ;
let result = server.execute_agent_code.await?;
match result
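A minimal sketch of exhaustive result handling, assuming a hypothetical `Outcome` enum. The crate's real result type, variant names, and fields may differ; the point is only the pattern of matching every case explicitly instead of treating any non-error return as success.

```rust
use std::time::Duration;

/// Hypothetical outcome type for illustration; not the crate's actual API.
#[derive(Debug)]
enum Outcome {
    Success { output: String, elapsed: Duration },
    Error { message: String },
    Timeout { limit: Duration },
    SecurityViolation { reason: String },
}

/// Map every outcome to a log line. There is no wildcard arm, so adding a
/// variant later forces this function to be updated.
fn describe(outcome: &Outcome) -> String {
    match outcome {
        Outcome::Success { output, elapsed } => {
            format!("ok in {} ms: {}", elapsed.as_millis(), output)
        }
        Outcome::Error { message } => format!("runtime error: {}", message),
        Outcome::Timeout { limit } => format!("timed out after {} s", limit.as_secs()),
        Outcome::SecurityViolation { reason } => format!("blocked: {}", reason),
    }
}
```

Keeping security violations and timeouts as distinct cases lets callers log and alert on them separately from ordinary runtime errors in the submitted code.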
Contributing
When adding new features:
- Security First: Always consider security implications
- Test Coverage: Add tests for both success and failure cases
- Documentation: Update README and inline docs
- Performance: Profile code execution paths
License
MIT License - See LICENSE file for details