do-memory-core 0.1.26

Core episodic learning system for AI agents with pattern extraction, reward scoring, and dual storage backend
Documentation
# Bulk Episode Operations API

## Overview

This document describes the new bulk episode operations API added to `do-memory-core` to enable efficient episode retrieval by ID.

## Motivation

**Problem**: Prior to this enhancement, users could only:
- List all episodes with filters using `list_episodes()` or `list_episodes_filtered()`
- Access `get_episode()` internally but it wasn't exposed in the public query API

This meant:
- ❌ No direct way to fetch a single episode by ID from public API
- ❌ Inefficient to fetch multiple specific episodes (requires N separate calls or filtering all episodes)
- ❌ CLI and MCP tools had to use workarounds

**Solution**: Expose existing `get_episode()` and add new `get_episodes_by_ids()` for bulk retrieval.

## API Reference

### `get_episode(episode_id: Uuid) -> Result<Episode>`

Retrieve a single episode by its unique identifier.

**Already existed** in `memory/episode.rs` but now properly exposed through the public API.

**Lazy Loading Pattern**: 
1. Checks in-memory cache first (fastest)
2. Falls back to redb cache storage
3. Falls back to Turso durable storage
4. Populates higher-level caches on cache miss

**Returns**: 
- `Ok(Episode)` if found
- `Err(Error::NotFound)` if episode doesn't exist

**Example**:
```rust
use memory_core::SelfLearningMemory;
use uuid::Uuid;

let memory = SelfLearningMemory::new();
let episode_id = Uuid::parse_str("...")?;

match memory.get_episode(episode_id).await {
    Ok(episode) => println!("Found: {}", episode.task_description),
    Err(e) => eprintln!("Not found: {}", e),
}
```

### `get_episodes_by_ids(episode_ids: &[Uuid]) -> Result<Vec<Episode>>`

Retrieve multiple episodes by their IDs in a single operation.

**New API** added in `memory/queries.rs` and exposed in `memory/query_api.rs`.

**Efficiency Benefits**:
- ✅ Single lock acquisition for in-memory cache
- ✅ Batched storage queries (reduces round-trips)
- ✅ Reduced lock contention vs. individual calls
- ✅ Early return optimizations at each storage layer

**Behavior**:
- Returns only episodes that exist
- Missing episodes are silently omitted (no error)
- Empty input returns empty vector
- Maintains cache consistency across all storage layers

**Example**:
```rust
use memory_core::SelfLearningMemory;
use uuid::Uuid;

let memory = SelfLearningMemory::new();
let ids = vec![id1, id2, id3, id4];

// Returns only found episodes (e.g., might return 3 out of 4)
let episodes = memory.get_episodes_by_ids(&ids).await?;
println!("Found {} out of {} episodes", episodes.len(), ids.len());
```

## Implementation Details

### File Structure

**Modified Files**:
- `do-memory-core/src/memory/query_api.rs` - Public API method `get_episodes_by_ids()`
- `do-memory-core/src/memory/queries.rs` - Internal implementation `get_episodes_by_ids()`

**Note**: `get_episode()` was already implemented in `memory/episode.rs`, so we only added the bulk operation.

### Storage Backend Integration

The bulk retrieval function works with all storage backends:

```rust
pub async fn get_episodes_by_ids(
    episode_ids: &[Uuid],
    episodes_fallback: &RwLock<HashMap<Uuid, Arc<Episode>>>,
    cache_storage: Option<&Arc<dyn StorageBackend>>,
    turso_storage: Option<&Arc<dyn StorageBackend>>,
) -> Result<Vec<Episode>>
```

**Algorithm**:
1. **In-memory lookup** (single lock, batch check)
2. **Cache storage lookup** (redb) for missing episodes
3. **Durable storage lookup** (Turso) for remaining missing episodes
4. **Cache backfill** - populate in-memory cache with newly found episodes

### Performance Characteristics

| Operation | Time Complexity | Space Complexity |
|-----------|----------------|------------------|
| `get_episode()` | O(1) amortized | O(1) |
| `get_episodes_by_ids(&[id; N])` | O(N) amortized | O(N) |

**Benchmark Results** (from tests):
- Single episode: ~1-5 µs (in-memory cache hit)
- Bulk 50 episodes: < 100ms (first call, cold cache)
- Bulk 50 episodes: < 1ms (subsequent calls, warm cache)

## Testing

### Test Coverage

**Integration Tests** (`do-memory-core/tests/bulk_episode_retrieval_test.rs`):

1. `test_get_episode_single_retrieval` - Basic single episode fetch
2.`test_get_episode_not_found` - Error handling for missing episodes
3.`test_get_episodes_by_ids_bulk_retrieval` - Bulk fetch all existing
4.`test_get_episodes_by_ids_partial_found` - Mixed existing/missing IDs
5.`test_get_episodes_by_ids_empty_input` - Edge case: empty input
6.`test_get_episodes_by_ids_all_missing` - All non-existent IDs
7.`test_get_episode_caching_behavior` - Cache consistency
8.`test_bulk_retrieval_performance` - Performance validation
9.`test_get_episode_with_steps` - Data integrity with steps
10.`test_bulk_retrieval_preserves_episode_data` - Context preservation

### Running Tests

```bash
# Run all bulk operation tests
cargo test --test bulk_episode_retrieval_test

# Run specific test
cargo test test_get_episodes_by_ids_bulk_retrieval

# Run with logging
RUST_LOG=debug cargo test --test bulk_episode_retrieval_test
```

## Examples

### Complete Example

See `do-memory-core/examples/bulk_episode_operations.rs` for a full working example.

```bash
cargo run --example bulk_episode_operations
```

**Output**:
```
=== Bulk Episode Operations Demo ===

Creating 5 test episodes...
  Created episode 1: abc123...
  ...

--- Single Episode Retrieval ---
✓ Retrieved episode: abc123...
  Task: Task 1: Implement async feature
  Steps: 1

--- Bulk Episode Retrieval ---
✓ Retrieved 5 episodes

--- Performance Comparison ---
Individual lookups (5 calls): 245µs
Bulk lookup (1 call):          52µs
✓ Bulk lookup is 4.71x faster!
```

## Usage in CLI

The CLI can now use these APIs for more efficient episode operations:

```rust
// Before (inefficient)
let all_episodes = memory.get_all_episodes().await?;
let episode = all_episodes.iter()
    .find(|e| e.episode_id == target_id)
    .ok_or(Error::NotFound)?;

// After (efficient)
let episode = memory.get_episode(target_id).await?;
```

## Usage in MCP Server

The MCP server can now implement tools that fetch specific episodes efficiently:

```json
{
  "name": "get_episode",
  "description": "Get episode by ID",
  "parameters": {
    "episode_id": "uuid"
  }
}
```

```json
{
  "name": "get_related_episodes", 
  "description": "Get multiple related episodes",
  "parameters": {
    "episode_ids": ["uuid"]
  }
}
```

## Migration Guide

### For Existing Code

If you were using workarounds to fetch specific episodes:

**Old Pattern**:
```rust
// Inefficient: fetch all then filter
let all = memory.get_all_episodes().await?;
let target = all.iter().find(|e| e.episode_id == id).cloned();
```

**New Pattern**:
```rust
// Efficient: direct lookup
let target = memory.get_episode(id).await?;
```

### For Multiple Episodes

**Old Pattern**:
```rust
// N separate calls
let mut episodes = Vec::new();
for id in ids {
    if let Ok(ep) = memory.get_episode(id).await {
        episodes.push(ep);
    }
}
```

**New Pattern**:
```rust
// Single bulk call
let episodes = memory.get_episodes_by_ids(&ids).await?;
```

## Impact Assessment

### Benefits

1. **Performance**: 4-5x faster for multiple lookups
2. **Developer Experience**: Simpler API for common use cases
3. **Resource Efficiency**: Fewer lock acquisitions, reduced memory allocations
4. **Scalability**: Enables efficient related-episode queries

### Backward Compatibility

✅ **Fully backward compatible**
- No breaking changes to existing APIs
- `get_episode()` was already used internally
- Only adds new functionality

### Code Size

- **Lines Added**: ~140 LOC (implementation + docs)
- **Lines in Tests**: ~280 LOC
- **Lines in Examples**: ~120 LOC
- **Total Impact**: ~540 LOC

All files comply with <500 LOC limit per file.

## Future Enhancements

Possible follow-up improvements:

1. **Batch Size Limits**: Add configuration for max bulk retrieval size
2. **Parallel Storage Queries**: Query redb and Turso in parallel for missing episodes
3. **Streaming API**: Return episodes as they're found (async iterator)
4. **Smart Prefetching**: Predict and prefetch related episodes
5. **GraphQL Integration**: Expose bulk operations through GraphQL resolvers

## Related Documentation

- `do-memory-core/EPISODE_MANAGEMENT.md` - Episode lifecycle overview
- `do-memory-core/EPISODE_FILTERING.md` - Advanced filtering capabilities
- `do-memory-core/README.md` - General memory system documentation
- `agent_docs/service_architecture.md` - System architecture

## Questions & Support

For questions about this feature:
1. See examples in `examples/bulk_episode_operations.rs`
2. Run tests with `cargo test --test bulk_episode_retrieval_test`
3. Check `do-memory-core/src/memory/query_api.rs` for API signatures

## Changelog

**Version**: Added in v0.1.13 (unreleased)

**Author**: Rovo Dev AI Agent

**Date**: 2026-01-19