do-memory-mcp 0.1.29

# Memory MCP Integration

MCP (Model Context Protocol) server integration for the self-learning memory system with secure code execution capabilities.

## Features

- **MCP Server**: Standard MCP protocol implementation with 19 tools
- **Episode Lifecycle Management**: Programmatic episode creation, tracking, and completion (NEW in v0.1.13)
- **Secure Code Sandbox**: WASM-based code execution with comprehensive security
- **Memory Integration**: Query episodic memory and analyze learned patterns
- **Pattern Analysis**: Advanced pattern extraction and recommendations
- **Embeddings Support**: Multiple providers (OpenAI, Ollama, local models)
- **Progressive Tool Disclosure**: Tools prioritized based on usage patterns
- **Execution Monitoring**: Detailed statistics and performance tracking

## Implementation Status

### Phase 2A: Wasmtime WASM Sandbox ✅ **COMPLETE**

**Status**: Production-ready proof of concept (POC) that eliminates the rquickjs GC crashes

- ✅ wasmtime 24.0.5 integration
- ✅ Concurrent execution without SIGABRT crashes
- ✅ 100-parallel stress test passing
- ✅ Semaphore-based pooling (max 20 concurrent)
- ✅ Comprehensive metrics and health monitoring
- ✅ All tests passing (5/5)

**Key Achievement**: Zero GC crashes under high concurrency (100 parallel executions)
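The semaphore-based pooling listed above caps how many executions run at once, so that a burst of 100 requests never has more than 20 sandboxes in flight. Below is a minimal std-only sketch of that idea. It assumes a counting semaphore built from `Mutex` and `Condvar`. The `Pool` type and `peak_concurrency` helper are hypothetical, and the crate itself runs on tokio, so its real pool may look quite different.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical counting semaphore limiting concurrent sandbox executions.
struct Pool {
    available: Mutex<usize>,
    cv: Condvar,
}

impl Pool {
    fn new(max_concurrent: usize) -> Self {
        Pool { available: Mutex::new(max_concurrent), cv: Condvar::new() }
    }

    /// Blocks until a slot is free, runs `f`, then releases the slot.
    fn run<T>(&self, f: impl FnOnce() -> T) -> T {
        {
            let mut slots = self.available.lock().unwrap();
            while *slots == 0 {
                slots = self.cv.wait(slots).unwrap();
            }
            *slots -= 1;
        }
        let out = f();
        *self.available.lock().unwrap() += 1;
        self.cv.notify_one();
        out
    }
}

/// Runs `jobs` simulated executions on separate threads and returns the
/// highest number observed running at the same time.
fn peak_concurrency(max_concurrent: usize, jobs: usize) -> usize {
    let pool = Arc::new(Pool::new(max_concurrent));
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let pool = Arc::clone(&pool);
            let running = Arc::clone(&running);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                pool.run(|| {
                    let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                    peak.fetch_max(now, Ordering::SeqCst);
                    thread::sleep(Duration::from_millis(5)); // stand-in for real work
                    running.fetch_sub(1, Ordering::SeqCst);
                })
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}

fn main() {
    let peak = peak_concurrency(20, 100);
    println!("peak concurrent executions: {}", peak);
}
```

Waiting on the `Condvar` inside a `while` loop, rather than an `if`, guards against spurious wakeups.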

<!-- NOTE: Phase 2A documentation has been archived. See plans/archive/ for historical documents. -->

### Phase 2B: JavaScript Support via Javy (Next)

**Goal**: Enable JavaScript/TypeScript execution through Javy compiler

- ⏳ Javy v8.0.0 integration (JavaScript→WASM)
- ⏳ WASI preview1 (stdout/stderr capture)
- ⏳ Fuel-based timeout enforcement
- ⏳ Performance benchmarking vs baseline

> **Note:** The `javy` backend requires either a bundled `javy-plugin.wasm` plugin (set via `JAVY_PLUGIN`) or the `javy` CLI available on PATH. CI will attempt to install the CLI when running the `javy-backend` feature; if neither is present, Javy tests will be skipped gracefully.
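Fuel-based timeout enforcement bounds execution by work performed instead of wall-clock time. The runtime receives a fuel budget, each unit of work consumes fuel, and execution traps when the budget reaches zero. wasmtime supports this kind of fuel metering. The toy interpreter below is a hypothetical, std-only illustration of the principle. It is not the wasmtime API, and it is not how the planned integration will be written.

```rust
/// Hypothetical instruction set for a toy interpreter.
#[derive(Clone, Copy)]
enum Op {
    Add(i64),
    JumpTo(usize), // unconditional jump; enough to build an infinite loop
    Halt,
}

#[derive(Debug, PartialEq)]
enum RunOutcome {
    Finished(i64),
    OutOfFuel { acc: i64 },
}

/// Executes `program`, charging one unit of fuel per instruction.
fn run_with_fuel(program: &[Op], mut fuel: u64) -> RunOutcome {
    let (mut pc, mut acc) = (0usize, 0i64);
    loop {
        if fuel == 0 {
            // Budget exhausted: stop deterministically, even inside a loop.
            return RunOutcome::OutOfFuel { acc };
        }
        fuel -= 1;
        match program.get(pc).copied().unwrap_or(Op::Halt) {
            Op::Add(n) => {
                acc += n;
                pc += 1;
            }
            Op::JumpTo(target) => pc = target,
            Op::Halt => return RunOutcome::Finished(acc),
        }
    }
}

fn main() {
    let finite = [Op::Add(2), Op::Add(3), Op::Halt];
    let infinite = [Op::Add(1), Op::JumpTo(0)];
    println!("{:?}", run_with_fuel(&finite, 100));
    println!("{:?}", run_with_fuel(&infinite, 100));
}
```

Unlike a wall-clock timer, a fuel budget produces the same cutoff point on every run, which makes timeout behavior reproducible in tests.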

### Phase 1: rquickjs Migration ✅ **COMPLETE**

**Problem Solved**: rquickjs v0.6.2 had critical GC race conditions causing SIGABRT crashes under concurrent test execution.

**Solution**: Disabled the WASM sandbox in all tests (via `MCP_USE_WASM=false`) until the wasmtime replacement was complete.

## Security Architecture

The sandbox implements **defense-in-depth** security with multiple layers:

### 1. Input Validation
- Code length limits (100KB max)
- Malicious pattern detection
- Syntax validation

### 2. Process Isolation
- Separate Node.js process per execution
- Restricted global access
- No require/import capabilities (by default)

### 3. Resource Limits
- Configurable timeout (default: 5 seconds)
- Memory limits (default: 128MB)
- CPU usage constraints (default: 50%)

### 4. Access Controls
- **File System**: Denied by default, whitelist approach when enabled
- **Network**: Denied by default, no external connections
- **Subprocesses**: Denied, no command execution

### 5. Pattern Detection
Automatically blocks:
- `require('fs')`, `require('http')`, `require('https')`
- `require('child_process')`, `exec()`, `spawn()`
- `eval()`, `new Function()`
- `while(true)`, `for(;;)` infinite loops
- `fetch()`, `WebSocket`, `XMLHttpRequest`
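The denylist above can be approximated with plain substring matching. Stripping whitespace and unifying quote styles first means `while (true)` and `while(true)`, or `require("fs")` and `require('fs')`, are caught alike. This is a minimal sketch, and `BLOCKED_PATTERNS` and `find_blocked_pattern` are hypothetical names. Substring matching also produces false positives (for example, a function named `retrieval(` contains `eval(`). As the Limitations section notes, detection of this kind can be bypassed by obfuscated code.

```rust
/// Hypothetical denylist mirroring the patterns listed above, written without
/// whitespace and with single quotes because input is normalized first.
const BLOCKED_PATTERNS: &[&str] = &[
    "require('fs')",
    "require('http')",
    "require('https')",
    "require('child_process')",
    "exec(",
    "spawn(",
    "eval(",
    "newFunction(",
    "while(true)",
    "for(;;)",
    "fetch(",
    "WebSocket",
    "XMLHttpRequest",
];

/// Returns the first blocked pattern found in `code`, if any.
fn find_blocked_pattern(code: &str) -> Option<&'static str> {
    // Drop whitespace and map " and ` to ' so trivial formatting changes
    // do not evade the check.
    let compact: String = code
        .chars()
        .filter(|c| !c.is_whitespace())
        .map(|c| if c == '"' || c == '`' { '\'' } else { c })
        .collect();
    BLOCKED_PATTERNS.iter().copied().find(|p| compact.contains(*p))
}

fn main() {
    for code in ["const fs = require(\"fs\");", "while (true) {}", "return 1 + 1;"] {
        match find_blocked_pattern(code) {
            Some(p) => println!("blocked ({}): {}", p, code),
            None => println!("allowed: {}", code),
        }
    }
}
```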

## Usage

### Basic Example

```rust
use memory_mcp::{MemoryMCPServer, SandboxConfig, ExecutionContext};
use serde_json::json;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create server with restrictive sandbox
    let server = MemoryMCPServer::new(SandboxConfig::restrictive()).await?;

    // Execute code securely
    let code = r#"
        const result = {
            sum: 1 + 1,
            message: "Hello from sandbox"
        };
        console.log("Calculating sum...");
        return result;
    "#;

    let context = ExecutionContext::new(
        "Calculate sum".to_string(),
        json!({"a": 1, "b": 1}),
    );

    let result = server.execute_agent_code(code.to_string(), context).await?;
    println!("Result: {:?}", result);

    Ok(())
}
```

### Sandbox Configurations

#### Restrictive (Recommended for Untrusted Code)

```rust
let config = SandboxConfig::restrictive();
// - 3 second timeout
// - 64MB memory limit
// - 30% CPU limit
// - No network, no filesystem, no subprocesses
```

#### Default (Balanced)

```rust
let config = SandboxConfig::default();
// - 5 second timeout
// - 128MB memory limit
// - 50% CPU limit
// - No network, no filesystem, no subprocesses
```

#### Permissive (For Trusted Code)

```rust
let config = SandboxConfig::permissive();
// - 10 second timeout
// - 256MB memory limit
// - 80% CPU limit
// - Filesystem access to whitelisted paths
```

### Custom Configuration

```rust
let config = SandboxConfig {
    max_execution_time_ms: 3000,
    max_memory_mb: 64,
    max_cpu_percent: 30,
    // The path allowlist applies only when filesystem access is enabled
    allowed_paths: vec!["/tmp/safe-dir".to_string()],
    allowed_network: vec![],
    allow_network: false,
    allow_filesystem: true,
};
```
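Assume the allowlists take effect only when the matching `allow_*` flag is set. Under that assumption, a configuration that lists `allowed_paths` while `allow_filesystem` is `false` silently grants nothing, and it can be worth rejecting that case early. Below is a minimal sketch that uses a hypothetical local mirror of a subset of the fields shown above. It is not the crate's `SandboxConfig`, its field types are guesses, and the crate may not perform this check itself.

```rust
/// Hypothetical local mirror of some SandboxConfig fields shown above;
/// the real field types in memory_mcp may differ.
struct SandboxSettings {
    max_execution_time_ms: u64,
    max_memory_mb: u64,
    max_cpu_percent: u8,
    allowed_paths: Vec<String>,
    allowed_network: Vec<String>,
    allow_network: bool,
    allow_filesystem: bool,
}

/// Returns every problem found, so callers can report them all at once.
fn validate(c: &SandboxSettings) -> Vec<String> {
    let mut problems = Vec::new();
    if c.max_execution_time_ms == 0 {
        problems.push("max_execution_time_ms must be greater than 0".to_string());
    }
    if c.max_memory_mb == 0 {
        problems.push("max_memory_mb must be greater than 0".to_string());
    }
    if !(1..=100).contains(&c.max_cpu_percent) {
        problems.push("max_cpu_percent must be between 1 and 100".to_string());
    }
    // Assumption: allowlists only take effect when the matching flag is on.
    if !c.allowed_paths.is_empty() && !c.allow_filesystem {
        problems.push("allowed_paths is set but allow_filesystem is false".to_string());
    }
    if !c.allowed_network.is_empty() && !c.allow_network {
        problems.push("allowed_network is set but allow_network is false".to_string());
    }
    problems
}

fn main() {
    let contradictory = SandboxSettings {
        max_execution_time_ms: 3000,
        max_memory_mb: 64,
        max_cpu_percent: 30,
        allowed_paths: vec!["/tmp/safe-dir".to_string()],
        allowed_network: vec![],
        allow_network: false,
        allow_filesystem: false,
    };
    for problem in validate(&contradictory) {
        eprintln!("config problem: {}", problem);
    }
}
```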

## Available Tools

The MCP server advertises **19 tools**, organized into the following categories:

### Episode Lifecycle Management (NEW in v0.1.13)

Programmatically manage episodes through the MCP interface:

- **`create_episode`** - Start tracking a new task with metadata
- **`add_episode_step`** - Log execution steps to track progress
- **`complete_episode`** - Finalize episode and trigger learning cycle
- **`get_episode`** - Retrieve complete episode details
- **`get_episode_timeline`** - Visualize chronological task progression
- **`delete_episode`** - Remove episodes permanently (with safeguards)

📖 **[Complete Episode Lifecycle Documentation](EPISODE_LIFECYCLE_TOOLS.md)**
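The lifecycle above implies a simple state machine. Steps can only be added to an open episode, and completion closes it, which is the point where the learning cycle is triggered. Below is a minimal sketch under that assumption. `Episode`, `EpisodeOutcome`, and the error strings are hypothetical. They do not reflect the crate's actual types or the tools' argument schemas, which are described in the linked documentation.

```rust
#[derive(Debug, Clone, PartialEq)]
enum EpisodeOutcome {
    Success,
    Failure,
}

#[derive(Debug)]
struct Episode {
    task: String,
    steps: Vec<String>,
    outcome: Option<EpisodeOutcome>, // None while the episode is still open
}

impl Episode {
    /// Mirrors `create_episode`: starts an open episode for a task.
    fn create(task: &str) -> Self {
        Episode { task: task.to_string(), steps: Vec::new(), outcome: None }
    }

    /// Mirrors `add_episode_step`: only allowed while the episode is open.
    fn add_step(&mut self, step: &str) -> Result<(), String> {
        if self.outcome.is_some() {
            return Err("episode already completed".to_string());
        }
        self.steps.push(step.to_string());
        Ok(())
    }

    /// Mirrors `complete_episode`: closes the episode exactly once and
    /// returns the number of recorded steps. The real server would run its
    /// learning cycle at this point.
    fn complete(&mut self, outcome: EpisodeOutcome) -> Result<usize, String> {
        if self.outcome.is_some() {
            return Err("episode already completed".to_string());
        }
        self.outcome = Some(outcome);
        Ok(self.steps.len())
    }
}

fn main() {
    let mut ep = Episode::create("Implement /users endpoint");
    ep.add_step("read existing routes").unwrap();
    ep.add_step("write handler").unwrap();
    let steps = ep.complete(EpisodeOutcome::Success).unwrap();
    println!("'{}' completed with {} steps", ep.task, steps);
    // Further steps are rejected once the episode is closed.
    println!("late step rejected: {}", ep.add_step("too late").is_err());
}
```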

### Batch Operations Contract Status

The MCP JSON-RPC endpoint supports `batch/execute` (multi-operation transport).
However, the following tool-level batch analytics tools are currently **deferred and not advertised**:

- `batch_query_episodes`
- `batch_pattern_analysis`
- `batch_compare_episodes`

Calls to these names intentionally return `Tool not found` until dedicated handlers are implemented.

📖 **[Batch Tool Status (WG-053)](BATCH_OPERATIONS_TOOLS.md)**
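The contract above behaves like a dispatch table in which the deferred batch names are simply absent. They therefore fall through to the same `Tool not found` error as any unknown name. Below is a minimal sketch of that behavior. `dispatch` and `ADVERTISED_TOOLS` are hypothetical, the table contains only the tool names documented in this README, and the server's exact error text may differ.

```rust
/// Tool names documented in this README; the deferred batch tool names
/// are deliberately not included.
const ADVERTISED_TOOLS: &[&str] = &[
    "create_episode", "add_episode_step", "complete_episode",
    "get_episode", "get_episode_timeline", "delete_episode",
    "query_memory", "query_semantic_memory", "bulk_episodes",
    "execute_agent_code",
    "analyze_patterns", "advanced_pattern_analysis", "search_patterns", "recommend_patterns",
    "configure_embeddings", "test_embeddings",
    "health_check", "get_metrics", "quality_metrics",
];

/// Hypothetical dispatcher: known names are accepted, everything else,
/// including the deferred batch tools, yields "Tool not found".
fn dispatch(name: &str) -> Result<&'static str, String> {
    ADVERTISED_TOOLS
        .iter()
        .copied()
        .find(|t| *t == name)
        .ok_or_else(|| format!("Tool not found: {}", name))
}

fn main() {
    for name in ["query_memory", "batch_query_episodes"] {
        match dispatch(name) {
            Ok(tool) => println!("dispatching {}", tool),
            Err(e) => println!("error: {}", e),
        }
    }
}
```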

### Memory & Query Tools

- **`query_memory`** - Query episodic memory for relevant past experiences
- **`query_semantic_memory`** - Semantic search using embeddings
- **`bulk_episodes`** - Retrieve multiple episodes efficiently
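Embedding-based semantic search of the kind `query_semantic_memory` describes is commonly implemented by embedding the query and ranking stored embeddings by cosine similarity. Below is a minimal sketch under that assumption. The crate's actual scoring, indexing, and thresholds are not documented here. `cosine` and `rank_by_similarity` are hypothetical, and the 3-dimensional vectors stand in for real embeddings.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns up to `limit` (episode id, score) pairs, most similar first.
fn rank_by_similarity<'a>(
    query: &[f32],
    episodes: &'a [(&'a str, Vec<f32>)],
    limit: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = episodes
        .iter()
        .map(|(id, emb)| (*id, cosine(query, emb)))
        .collect();
    // Sort descending by score; total_cmp gives a total order even for NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}

fn main() {
    let episodes = vec![
        ("ep-api", vec![0.9, 0.1, 0.0]),
        ("ep-cli", vec![0.0, 1.0, 0.0]),
        ("ep-db", vec![0.7, 0.7, 0.0]),
    ];
    for (id, score) in rank_by_similarity(&[1.0, 0.0, 0.0], &episodes, 2) {
        println!("{}: {:.3}", id, score);
    }
}
```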

### Code Execution

- **`execute_agent_code`** - Execute TypeScript/JavaScript in secure WASM sandbox

### Pattern Analysis

- **`analyze_patterns`** - Analyze patterns from past episodes
- **`advanced_pattern_analysis`** - Deep pattern analysis with statistical methods
- **`search_patterns`** - Search for specific patterns
- **`recommend_patterns`** - Get pattern recommendations for tasks

### Embeddings & Configuration

- **`configure_embeddings`** - Configure embedding providers (OpenAI, Ollama, local)
- **`test_embeddings`** - Test embedding generation

### Monitoring & Health

- **`health_check`** - Server health and status
- **`get_metrics`** - Performance metrics and statistics
- **`quality_metrics`** - Episode quality assessment

### Quick Reference

#### 1. `query_memory`

```json
{
  "query": "Search query describing task",
  "domain": "Task domain (e.g., 'web-api')",
  "task_type": "code_generation | debugging | refactoring | testing | analysis | documentation",
  "limit": 10
}
```

#### 2. `execute_agent_code`

```json
{
  "code": "TypeScript/JavaScript code to execute",
  "context": {
    "task": "Task description",
    "input": { "data": "as JSON" }
  }
}
```

#### 3. `analyze_patterns`

```json
{
  "task_type": "Type of task to analyze",
  "min_success_rate": 0.7,
  "limit": 20
}
```
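The `min_success_rate` and `limit` arguments suggest a filter-then-rank step. Below is a sketch of how those two parameters could combine. It assumes each pattern carries a task type plus success and attempt counts. `Pattern`, `pattern`, and `select_patterns` are hypothetical, and the server's actual scoring may weigh other signals.

```rust
/// Hypothetical pattern record with raw outcome counts.
#[derive(Debug, Clone)]
struct Pattern {
    name: String,
    task_type: String,
    successes: u32,
    attempts: u32,
}

impl Pattern {
    fn success_rate(&self) -> f64 {
        if self.attempts == 0 {
            0.0
        } else {
            f64::from(self.successes) / f64::from(self.attempts)
        }
    }
}

/// Convenience constructor used by the example below.
fn pattern(name: &str, task_type: &str, successes: u32, attempts: u32) -> Pattern {
    Pattern { name: name.to_string(), task_type: task_type.to_string(), successes, attempts }
}

/// Keeps patterns for `task_type` whose success rate is at least
/// `min_success_rate`, best first, truncated to `limit`.
fn select_patterns(
    patterns: &[Pattern],
    task_type: &str,
    min_success_rate: f64,
    limit: usize,
) -> Vec<Pattern> {
    let mut selected: Vec<Pattern> = patterns
        .iter()
        .filter(|p| p.task_type == task_type && p.success_rate() >= min_success_rate)
        .cloned()
        .collect();
    selected.sort_by(|a, b| b.success_rate().total_cmp(&a.success_rate()));
    selected.truncate(limit);
    selected
}

fn main() {
    let patterns = vec![
        pattern("retry-with-backoff", "debugging", 9, 10),
        pattern("print-debugging", "debugging", 5, 10),
        pattern("bisect-commits", "debugging", 8, 10),
        pattern("extract-function", "refactoring", 10, 10),
    ];
    for p in select_patterns(&patterns, "debugging", 0.7, 20) {
        println!("{} ({:.0}% success)", p.name, p.success_rate() * 100.0);
    }
}
```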

## Security Testing

The crate includes comprehensive security tests:

```bash
# Run all tests
cargo test --package do-memory-mcp

# Run only security tests
cargo test --package do-memory-mcp --test security_test

# Run integration tests
cargo test --package do-memory-mcp --test integration_test
```

### Security Test Coverage

- File system access blocking (12 tests)
- Network access blocking (4 tests)
- Process execution blocking (3 tests)
- Infinite loop detection (2 tests)
- Code injection blocking (2 tests)
- Resource exhaustion (2 tests)
- Path traversal attacks (3 tests)
- Legitimate code execution (4 tests)

## Execution Results

The sandbox returns detailed execution results:

```rust
pub enum ExecutionResult {
    Success {
        output: String,
        stdout: String,
        stderr: String,
        execution_time_ms: u64,
    },
    Error {
        message: String,
        error_type: ErrorType,
        stdout: String,
        stderr: String,
    },
    Timeout {
        elapsed_ms: u64,
        partial_output: Option<String>,
    },
    SecurityViolation {
        reason: String,
        violation_type: SecurityViolationType,
    },
}
```

## Performance

- **Average execution time**: ~50-200ms for simple code
- **Timeout overhead**: <10ms
- **Memory footprint**: ~5MB per execution
- **Concurrent executions**: Supported via async runtime

## Limitations

1. **Node.js Required**: The sandbox requires Node.js to be installed
2. **Pattern-Based Detection**: Some obfuscated attacks may bypass detection
3. **Resource Monitoring**: CPU/memory limits are advisory, not enforced
4. **Async Timeout**: Async code may run slightly beyond timeout

## Best Practices

### For Untrusted Code

```rust
use memory_mcp::{ExecutionResult, MemoryMCPServer, SandboxConfig};

// Use restrictive config
let config = SandboxConfig::restrictive();
let server = MemoryMCPServer::new(config).await?;

// Always check result type
match server.execute_agent_code(code, context).await? {
    ExecutionResult::Success { .. } => { /* handle success */ },
    ExecutionResult::SecurityViolation { reason, .. } => {
        eprintln!("Security violation: {}", reason);
    },
    _ => { /* handle other cases */ }
}
```

### For Trusted Code

```rust
// Use permissive config with specific whitelist
let mut config = SandboxConfig::permissive();
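config.allowed_paths = vec!["/app/data".to_string()];
// A network allowlist only makes sense with network access switched on
config.allow_network = true;
config.allowed_network = vec!["api.example.com".to_string()];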

let server = MemoryMCPServer::new(config).await?;
```

### Error Handling

```rust
use memory_mcp::{ExecutionResult, ErrorType};

let result = server.execute_agent_code(code, context).await?;

match result {
    ExecutionResult::Success { output, .. } => {
        println!("Success: {}", output);
    },
    ExecutionResult::Error { error_type: ErrorType::Syntax, message, .. } => {
        eprintln!("Syntax error: {}", message);
    },
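    ExecutionResult::Error { error_type: ErrorType::Runtime, message, .. } => {
        eprintln!("Runtime error: {}", message);
    },
    // Catch-all keeps the match exhaustive if ErrorType has other variants
    ExecutionResult::Error { error_type, message, .. } => {
        eprintln!("{:?} error: {}", error_type, message);
    },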
    ExecutionResult::Timeout { elapsed_ms, .. } => {
        eprintln!("Timeout after {}ms", elapsed_ms);
    },
    ExecutionResult::SecurityViolation { reason, violation_type, .. } => {
        eprintln!("Security violation ({:?}): {}", violation_type, reason);
    },
}
```

## Contributing

When adding new features:

1. **Security First**: Always consider security implications
2. **Test Coverage**: Add tests for both success and failure cases
3. **Documentation**: Update README and inline docs
4. **Performance**: Profile code execution paths

## License

MIT License - See LICENSE file for details