do-memory-mcp 0.1.31

Model Context Protocol (MCP) server for AI agents

Memory MCP Integration

MCP (Model Context Protocol) server integration for the self-learning memory system with secure code execution capabilities.

Features

  • MCP Server: Standard MCP protocol implementation with 19 tools
  • Episode Lifecycle Management: Programmatic episode creation, tracking, and completion (NEW in v0.1.13)
  • Secure Code Sandbox: WASM-based code execution with comprehensive security
  • Memory Integration: Query episodic memory and analyze learned patterns
  • Pattern Analysis: Advanced pattern extraction and recommendations
  • Embeddings Support: Multiple providers (OpenAI, Ollama, local models)
  • Progressive Tool Disclosure: Tools prioritized based on usage patterns
  • Execution Monitoring: Detailed statistics and performance tracking

Implementation Status

Phase 2A: Wasmtime WASM Sandbox ✅ COMPLETE

Status: Production-ready proof of concept that eliminates the rquickjs GC crashes

  • ✅ wasmtime 24.0.5 integration
  • ✅ Concurrent execution without SIGABRT crashes
  • ✅ 100-parallel stress test passing
  • ✅ Semaphore-based pooling (max 20 concurrent)
  • ✅ Comprehensive metrics and health monitoring
  • ✅ All tests passing (5/5)

Key Achievement: Zero GC crashes under high concurrency (100 parallel executions)
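
The semaphore-based pooling listed above can be illustrated with a small, standard-library-only sketch. This is not the crate's implementation: the `Semaphore` type and the `run_pooled` helper are hypothetical names, and threads with a short sleep stand in for wasmtime executions. It only shows how a fixed number of permits caps concurrency when many jobs arrive at once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore built on Mutex + Condvar (hypothetical helper).
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Block until a permit is free, then take it.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Return a permit and wake one waiting thread.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Run `jobs` tasks in parallel with at most `max_concurrent` inside the
/// guarded region at once. Returns the peak concurrency that was observed.
fn run_pooled(jobs: usize, max_concurrent: usize) -> usize {
    let sem = Arc::new(Semaphore::new(max_concurrent));
    let active = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let sem = Arc::clone(&sem);
            let active = Arc::clone(&active);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                sem.acquire();
                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                // Stand-in for one sandboxed execution.
                thread::sleep(Duration::from_millis(5));
                active.fetch_sub(1, Ordering::SeqCst);
                sem.release();
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

Calling `run_pooled(100, 20)` mirrors the 100-parallel stress test with a pool of 20: all 100 jobs finish, but the returned peak never exceeds 20.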

Phase 2B: JavaScript Support via Javy (Next)

Goal: Enable JavaScript/TypeScript execution through Javy compiler

  • ⏳ Javy v8.0.0 integration (JavaScript→WASM)
  • ⏳ WASI preview1 (stdout/stderr capture)
  • ⏳ Fuel-based timeout enforcement
  • ⏳ Performance benchmarking vs baseline

Note: The javy backend requires either a bundled javy-plugin.wasm plugin (set via JAVY_PLUGIN) or the javy CLI available on PATH. CI will attempt to install the CLI when running the javy-backend feature; if neither is present, Javy tests will be skipped gracefully.
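
The lookup order described in the note can be sketched as follows. This is an illustration only: `JavySource`, `resolve_javy`, and `resolve_javy_from_env` are hypothetical names, and the sketch assumes that JAVY_PLUGIN holds a filesystem path to the plugin file, which the note implies but does not state.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Where a usable Javy was found (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum JavySource {
    Plugin(PathBuf),
    Cli(PathBuf),
    Unavailable,
}

/// Prefer the plugin file, then fall back to a `javy` binary on the search path.
/// The inputs are passed in explicitly so the logic can be tested without
/// modifying the process environment.
fn resolve_javy(plugin: Option<&OsStr>, path_var: Option<&OsStr>) -> JavySource {
    if let Some(plugin) = plugin {
        let plugin = Path::new(plugin);
        if plugin.is_file() {
            return JavySource::Plugin(plugin.to_path_buf());
        }
    }
    if let Some(path_var) = path_var {
        let exe = if cfg!(windows) { "javy.exe" } else { "javy" };
        for dir in env::split_paths(path_var) {
            let candidate = dir.join(exe);
            if candidate.is_file() {
                return JavySource::Cli(candidate);
            }
        }
    }
    // Neither source is present: callers can skip Javy-dependent work.
    JavySource::Unavailable
}

/// Convenience wrapper that reads JAVY_PLUGIN and PATH from the environment.
fn resolve_javy_from_env() -> JavySource {
    resolve_javy(
        env::var_os("JAVY_PLUGIN").as_deref(),
        env::var_os("PATH").as_deref(),
    )
}
```

A test harness could call `resolve_javy_from_env()` once at startup and skip the Javy tests when it returns `JavySource::Unavailable`.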

Phase 1: rquickjs Migration ✅ COMPLETE

Problem Solved: rquickjs v0.6.2 had critical GC race conditions causing SIGABRT crashes under concurrent test execution.

Solution: Disabled the WASM sandbox in all tests (via MCP_USE_WASM=false) until the wasmtime replacement was complete.

Security Architecture

The sandbox implements defense-in-depth security with multiple layers:

1. Input Validation

  • Code length limits (100KB max)
  • Malicious pattern detection
  • Syntax validation

2. Process Isolation

  • Separate Node.js process per execution
  • Restricted global access
  • No require/import capabilities (by default)

3. Resource Limits

  • Configurable timeout (default: 5 seconds)
  • Memory limits (default: 128MB)
  • CPU usage constraints (default: 50%)
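
A wall-clock timeout like the one listed above can be sketched with a worker thread and a channel. This is a standard-library illustration, not the sandbox's mechanism; `Outcome` and `run_with_timeout` are hypothetical names. Note that a thread cannot be killed from outside, so the work keeps running after the deadline. A process-based sandbox can terminate the child instead.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Result of a time-limited run (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Outcome<T> {
    Finished(T),
    TimedOut { elapsed_ms: u64 },
    Crashed,
}

/// Run `work` on a separate thread and wait at most `timeout` for its result.
fn run_with_timeout<T, F>(timeout: Duration, work: F) -> Outcome<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let started = Instant::now();

    thread::spawn(move || {
        // The send fails if the caller already gave up; that is fine.
        let _ = tx.send(work());
    });

    match rx.recv_timeout(timeout) {
        Ok(value) => Outcome::Finished(value),
        Err(RecvTimeoutError::Timeout) => Outcome::TimedOut {
            elapsed_ms: started.elapsed().as_millis() as u64,
        },
        // The worker dropped the sender without sending, i.e. it panicked.
        Err(RecvTimeoutError::Disconnected) => Outcome::Crashed,
    }
}
```

With the default configuration the caller would pass `Duration::from_secs(5)`; the restrictive preset corresponds to `Duration::from_secs(3)`.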

4. Access Controls

  • File System: Denied by default, whitelist approach when enabled
  • Network: Denied by default, no external connections
  • Subprocesses: Denied, no command execution
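
The deny-by-default whitelist for file system access can be sketched as below. The function names are hypothetical; the `allow_filesystem` and `allowed_paths` parameters mirror the configuration fields shown later in this README, but the check itself is an assumption about how such a policy might work, not the crate's code. The sketch normalizes paths lexically, so it does not follow symlinks; resolving them with `std::fs::canonicalize` would be a stricter option.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve `.` and `..` lexically, without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                // Popping at the root is a no-op, so `/..` stays `/`.
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Deny by default; allow only absolute paths under a whitelisted root.
fn is_path_allowed(requested: &str, allow_filesystem: bool, allowed_paths: &[String]) -> bool {
    if !allow_filesystem {
        return false;
    }
    let requested = Path::new(requested);
    if !requested.is_absolute() {
        return false;
    }
    let requested = normalize(requested);
    // `Path::starts_with` compares whole components, so `/tmp/safe-dir-evil`
    // does not match the root `/tmp/safe-dir`.
    allowed_paths
        .iter()
        .any(|root| requested.starts_with(normalize(Path::new(root))))
}
```

Normalizing before the prefix check is what defeats traversal attempts such as `/tmp/safe-dir/../secret`, which resolves to `/tmp/secret` and is rejected.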

5. Pattern Detection

Automatically blocks:

  • require('fs'), require('http'), require('https')
  • require('child_process'), exec(), spawn()
  • eval(), new Function()
  • while(true), for(;;) infinite loops
  • fetch(), WebSocket, XMLHttpRequest
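
A deliberately naive version of the length check and pattern scan might look like the sketch below. It is an assumption about the general approach, not the crate's detector: `MAX_CODE_BYTES`, `BLOCKED_PATTERNS`, `find_violation`, and `validate` are hypothetical names, and 100KB is taken here to mean 100 * 1024 bytes. Plain substring matching of this kind produces false positives (for example `regex.exec(`) and can be bypassed by obfuscation.

```rust
/// Assumed interpretation of the 100KB limit.
const MAX_CODE_BYTES: usize = 100 * 1024;

/// Patterns from the list above, written without whitespace and with
/// single quotes so they match the normalized source text.
const BLOCKED_PATTERNS: &[&str] = &[
    "require('fs')",
    "require('http')",
    "require('https')",
    "require('child_process')",
    "exec(",
    "spawn(",
    "eval(",
    "newFunction(",
    "while(true)",
    "for(;;)",
    "fetch(",
    "WebSocket",
    "XMLHttpRequest",
];

/// Return the first blocked pattern found in `code`, if any.
fn find_violation(code: &str) -> Option<&'static str> {
    // Drop whitespace and unify quote styles so `while ( true )` and
    // `require("fs")` are caught as well.
    let normalized: String = code
        .chars()
        .filter(|c| !c.is_whitespace())
        .map(|c| if c == '"' || c == '`' { '\'' } else { c })
        .collect();
    BLOCKED_PATTERNS
        .iter()
        .copied()
        .find(|pattern| normalized.contains(*pattern))
}

/// Length check first, then the pattern scan.
fn validate(code: &str) -> Result<(), String> {
    if code.len() > MAX_CODE_BYTES {
        return Err(format!(
            "code is {} bytes; limit is {} bytes",
            code.len(),
            MAX_CODE_BYTES
        ));
    }
    match find_violation(code) {
        Some(pattern) => Err(format!("blocked pattern: {pattern}")),
        None => Ok(()),
    }
}
```

For example, `validate("const x = 1 + 1;")` returns `Ok(())`, while source containing `require("fs")` is rejected before it reaches the sandbox.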

Usage

Basic Example

use memory_mcp::{MemoryMCPServer, SandboxConfig, ExecutionContext};
use serde_json::json;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create server with restrictive sandbox
    let server = MemoryMCPServer::new(SandboxConfig::restrictive()).await?;

    // Execute code securely
    let code = r#"
        const result = {
            sum: 1 + 1,
            message: "Hello from sandbox"
        };
        console.log("Calculating sum...");
        return result;
    "#;

    let context = ExecutionContext::new(
        "Calculate sum".to_string(),
        json!({"a": 1, "b": 1}),
    );

    let result = server.execute_agent_code(code.to_string(), context).await?;
    println!("Result: {:?}", result);

    Ok(())
}

Sandbox Configurations

Restrictive (Recommended for Untrusted Code)

let config = SandboxConfig::restrictive();
// - 3 second timeout
// - 64MB memory limit
// - 30% CPU limit
// - No network, no filesystem, no subprocesses

Default (Balanced)

let config = SandboxConfig::default();
// - 5 second timeout
// - 128MB memory limit
// - 50% CPU limit
// - No network, no filesystem, no subprocesses

Permissive (For Trusted Code)

let config = SandboxConfig::permissive();
// - 10 second timeout
// - 256MB memory limit
// - 80% CPU limit
// - Filesystem access to whitelisted paths

Custom Configuration

let config = SandboxConfig {
    max_execution_time_ms: 3000,
    max_memory_mb: 64,
    max_cpu_percent: 30,
    allowed_paths: vec!["/tmp/safe-dir".to_string()],
    allowed_network: vec![],
    allow_network: false,
    allow_filesystem: false,
    allow_subprocesses: false,
};

Available Tools

The MCP server provides 19 tools organized into categories:

Episode Lifecycle Management (NEW in v0.1.13)

Programmatically manage episodes through the MCP interface:

  • create_episode - Start tracking a new task with metadata
  • add_episode_step - Log execution steps to track progress
  • complete_episode - Finalize episode and trigger learning cycle
  • get_episode - Retrieve complete episode details
  • get_episode_timeline - Visualize chronological task progression
  • delete_episode - Remove episodes permanently (with safeguards)

📖 Complete Episode Lifecycle Documentation

Batch Operations Contract Status

The MCP JSON-RPC endpoint supports batch/execute (multi-operation transport). However, tool-level batch analytics names are currently deferred and not advertised:

  • batch_query_episodes
  • batch_pattern_analysis
  • batch_compare_episodes

These names intentionally return Tool not found until dedicated handlers are implemented.

📖 Batch Tool Status (WG-053)

Memory & Query Tools

  • query_memory - Query episodic memory for relevant past experiences
  • query_semantic_memory - Semantic search using embeddings
  • bulk_episodes - Retrieve multiple episodes efficiently

Code Execution

  • execute_agent_code - Execute TypeScript/JavaScript in a secure WASM sandbox

Pattern Analysis

  • analyze_patterns - Analyze patterns from past episodes
  • advanced_pattern_analysis - Deep pattern analysis with statistical methods
  • search_patterns - Search for specific patterns
  • recommend_patterns - Get pattern recommendations for tasks

Embeddings & Configuration

  • configure_embeddings - Configure embedding providers (OpenAI, Ollama, local)
  • test_embeddings - Test embedding generation

Monitoring & Health

  • health_check - Server health and status
  • get_metrics - Performance metrics and statistics
  • quality_metrics - Episode quality assessment

Quick Reference

1. query_memory

{
  "query": "Search query describing task",
  "domain": "Task domain (e.g., 'web-api')",
  "task_type": "code_generation | debugging | refactoring | testing | analysis | documentation",
  "limit": 10
}

2. execute_agent_code

{
  "code": "TypeScript/JavaScript code to execute",
  "context": {
    "task": "Task description",
    "input": { "data": "as JSON" }
  }
}

3. analyze_patterns

{
  "task_type": "Type of task to analyze",
  "min_success_rate": 0.7,
  "limit": 20
}
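
The argument objects above are sent to the server inside a standard MCP `tools/call` JSON-RPC request. The sketch below builds such a request for `query_memory` using only the standard library; the helper names are hypothetical, and a real client would more likely use `serde_json` than hand-built strings.

```rust
/// Escape a string and wrap it in quotes for embedding in JSON.
fn json_string(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('"');
    for c in value.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build the argument object for `query_memory` from the fields shown above.
fn query_memory_args(query: &str, domain: &str, task_type: &str, limit: u32) -> String {
    format!(
        r#"{{"query":{},"domain":{},"task_type":{},"limit":{}}}"#,
        json_string(query),
        json_string(domain),
        json_string(task_type),
        limit
    )
}

/// Wrap a tool name and an already-serialized argument object in a
/// JSON-RPC 2.0 `tools/call` request.
fn tools_call_request(id: u64, tool: &str, arguments_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":{},"arguments":{}}}}}"#,
        id,
        json_string(tool),
        arguments_json
    )
}
```

The same wrapper applies to `execute_agent_code` and `analyze_patterns`; only the tool name and the argument object change.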

Security Testing

The crate includes comprehensive security tests:

# Run all tests
cargo test --package do-memory-mcp

# Run only security tests
cargo test --package do-memory-mcp --test security_test

# Run integration tests
cargo test --package do-memory-mcp --test integration_test

Security Test Coverage

  • File system access blocking (12 tests)
  • Network access blocking (4 tests)
  • Process execution blocking (3 tests)
  • Infinite loop detection (2 tests)
  • Code injection blocking (2 tests)
  • Resource exhaustion (2 tests)
  • Path traversal attacks (3 tests)
  • Legitimate code execution (4 tests)

Execution Results

The sandbox returns detailed execution results:

pub enum ExecutionResult {
    Success {
        output: String,
        stdout: String,
        stderr: String,
        execution_time_ms: u64,
    },
    Error {
        message: String,
        error_type: ErrorType,
        stdout: String,
        stderr: String,
    },
    Timeout {
        elapsed_ms: u64,
        partial_output: Option<String>,
    },
    SecurityViolation {
        reason: String,
        violation_type: SecurityViolationType,
    },
}

Performance

  • Average execution time: ~50-200ms for simple code
  • Timeout overhead: <10ms
  • Memory footprint: ~5MB per execution
  • Concurrent executions: Supported via async runtime

Limitations

  1. Node.js Required: The sandbox requires Node.js to be installed
  2. Pattern-Based Detection: Some obfuscated attacks may bypass detection
  3. Resource Monitoring: CPU/memory limits are advisory, not enforced
  4. Async Timeout: Async code may run slightly beyond the configured timeout

Best Practices

For Untrusted Code

// Use restrictive config
let config = SandboxConfig::restrictive();
let server = MemoryMCPServer::new(config).await?;

// Always check result type
match server.execute_agent_code(code, context).await? {
    ExecutionResult::Success { .. } => { /* handle success */ },
    ExecutionResult::SecurityViolation { reason, .. } => {
        eprintln!("Security violation: {}", reason);
    },
    _ => { /* handle other cases */ }
}

For Trusted Code

// Use permissive config with specific whitelist
let mut config = SandboxConfig::permissive();
config.allowed_paths = vec!["/app/data".to_string()];
config.allowed_network = vec!["api.example.com".to_string()];

let server = MemoryMCPServer::new(config).await?;

Error Handling

use memory_mcp::{ExecutionResult, ErrorType};

let result = server.execute_agent_code(code, context).await?;

match result {
    ExecutionResult::Success { output, .. } => {
        println!("Success: {}", output);
    },
    ExecutionResult::Error { error_type: ErrorType::Syntax, message, .. } => {
        eprintln!("Syntax error: {}", message);
    },
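    ExecutionResult::Error { error_type: ErrorType::Runtime, message, .. } => {
        eprintln!("Runtime error: {}", message);
    },
    // Fallback for any other error type, so the match stays exhaustive
    ExecutionResult::Error { message, .. } => {
        eprintln!("Error: {}", message);
    },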
    ExecutionResult::Timeout { elapsed_ms, .. } => {
        eprintln!("Timeout after {}ms", elapsed_ms);
    },
    ExecutionResult::SecurityViolation { reason, violation_type, .. } => {
        eprintln!("Security violation ({:?}): {}", violation_type, reason);
    },
}

Contributing

When adding new features:

  1. Security First: Always consider security implications
  2. Test Coverage: Add tests for both success and failure cases
  3. Documentation: Update README and inline docs
  4. Performance: Profile code execution paths

License

MIT License - See LICENSE file for details