rmcp-memex
RAG/Memory MCP Server with LanceDB vector storage for AI agents.
Overview
rmcp-memex is an MCP (Model Context Protocol) server providing:
- RAG (Retrieval-Augmented Generation) - document indexing and semantic search
- Hybrid Search - BM25 keyword + semantic vector search (Tantivy-based)
- Vector Memory - semantic storage and retrieval of text chunks
- Namespace Isolation - each namespace's data is kept separate from the others
- Security - token-based access control for protected namespaces
- Onion Slice Architecture - hierarchical embeddings (OUTER→MIDDLE→INNER→CORE)
- Preprocessing - automatic noise filtering from conversation exports (~36-40% reduction)
- Exact-Match Deduplication - SHA256-based dedup for overlapping exports
Architecture
┌─────────────────────────────────────────────────────────────┐
│ rmcp-memex │
├─────────────────────────────────────────────────────────────┤
│ MCP Server (JSON-RPC over stdio) │
│ ├── handlers/mod.rs - Request routing & validation │
│ ├── security/mod.rs - Namespace access control │
│ └── rag/mod.rs - RAG pipeline │
├─────────────────────────────────────────────────────────────┤
│ Storage Layer │
│ ├── LanceDB - Vector embeddings │
│ ├── Tantivy - BM25 keyword index │
│ └── moka - In-memory cache │
├─────────────────────────────────────────────────────────────┤
│ Embeddings (External Providers) │
│ ├── Ollama - Local models (recommended) │
│ ├── MLX Bridge - Apple Silicon acceleration │
│ └── OpenAI-compatible - Any compatible endpoint │
└─────────────────────────────────────────────────────────────┘
Features
RAG Tools
| Tool | Description |
|---|---|
| `rag_index` | Index document from file |
| `rag_index_text` | Index raw text |
| `rag_search` | Search documents semantically (supports auto_route) |
Memory Tools
| Tool | Description |
|---|---|
| `memory_upsert` | Add/update chunk in namespace |
| `memory_get` | Get chunk by ID |
| `memory_search` | Search semantically in namespace (supports auto_route) |
| `memory_delete` | Delete chunk |
| `memory_purge_namespace` | Delete all chunks in namespace |
| `dive` | Deep exploration with all onion layers (outer/middle/inner/core) |
Security Tools
| Tool | Description |
|---|---|
| `namespace_create_token` | Create access token for namespace |
| `namespace_revoke_token` | Revoke token (namespace becomes public) |
| `namespace_list_protected` | List protected namespaces |
| `namespace_security_status` | Security system status |
Library Usage
rmcp-memex can be used as a library in your Rust applications. It provides a high-level MemexEngine API for vector storage operations.
Add to Cargo.toml
# Full library with CLI
= "0.3"
# Library only (no CLI dependencies)
= { = "0.3", = false }
Basic Usage
use ;
use json;
async
Vista Integration
For Vista PIMS, use the optimized constructor:
use MemexEngine;
// Vista-optimized: 1024 dims, qwen3-embedding:0.6b model
let engine = for_vista.await?;
// Store visit notes
engine.store.await?;
Batch Operations
use ;
use json;
let engine = for_app.await?;
let items = vec!;
let result = engine.store_batch.await?;
println!;
GDPR-Compliant Deletion
use ;
let engine = for_app.await?;
// Delete all documents for a specific patient
let filter = for_patient;
let deleted = engine.delete_by_filter.await?;
println!;
Hybrid Search (BM25 + Vector)
use ;
let engine = for_app.await?;
// Hybrid search with BM25 + vector fusion (recommended)
let results = engine.search_hybrid.await?;
for r in &results
// Explicit mode selection
let results = engine.search_with_mode.await?;
let results = engine.search_with_mode.await?;
let results = engine.search_with_mode.await?;
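The README does not say how the BM25 and vector rankings are fused. One common choice is reciprocal rank fusion (RRF), sketched below with only the standard library. The function name `rrf_fuse`, the constant `k`, and the use of RRF itself are assumptions for illustration, not the crate's confirmed behavior.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion over two ranked lists of document IDs.
/// `k` dampens the weight of top ranks; 60.0 is a commonly used value.
/// Illustrative only: rmcp-memex may fuse scores differently.
fn rrf_fuse(bm25: &[&str], vector: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in [bm25, vector].iter() {
        for (rank, id) in list.iter().enumerate() {
            // Ranks are 1-based in the usual RRF formula.
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ID for deterministic output.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

A document that appears in both lists accumulates two contributions, so it tends to outrank documents found by only one of the searches.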
Agent Tools API
For MCP-compatible AI agents:
use ;
use json;
let engine = for_app.await?;
// Get tool definitions for MCP registration
let tools = tool_definitions;
for tool in &tools
// Use tool functions
let result = memory_store.await?;
assert!;
let results = memory_search.await?;
Feature Flags
| Feature | Description | Default |
|---|---|---|
| `cli` | CLI binary, TUI wizard, progress bars | Yes |
| `provider-cascade` | Ollama/OpenAI-compatible embeddings | Yes |
# Build library only (no CLI)
# Build with CLI
Configuration Guide
Complete guide for integrating rmcp-memex as a library in any Rust project.
Prerequisites
Ollama (recommended) or any OpenAI-compatible embedding API:
# Install Ollama
|
# Pull an embedding model (choose based on your needs)
# Verify it's running
Environment Variables
Configure via .env or environment:
# =============================================================================
# EMBEDDING PROVIDER CONFIGURATION
# =============================================================================
# Ollama (default, recommended)
OLLAMA_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=qwen3-embedding:0.6b
EMBEDDING_DIMENSION=1024
# Database storage (auto-created)
MEMEX_DB_PATH=/.rmcp-servers/myapp/lancedb
# Optional: BM25 keyword search index
MEMEX_BM25_PATH=/.rmcp-servers/myapp/bm25
# =============================================================================
# ADVANCED: Multiple providers (fallback cascade)
# =============================================================================
# Remote embedding server fallback
# DRAGON_BASE_URL=http://your-server.local
# DRAGON_EMBEDDER_PORT=12345
# MLX embedder for Apple Silicon
# EMBEDDER_PORT=12300
# MLX_MAX_BATCH_CHARS=32000
# MLX_MAX_BATCH_ITEMS=16
# DISABLE_MLX=1 # Set to disable MLX fallback
Quick Start (Auto-config)
use MemexEngine;
use json;
// Auto-configures from defaults + environment
let engine = for_app.await?;
engine.store.await?;
let results = engine.search.await?;
Custom Configuration
use ;
use ;
// Read from your app's environment
let ollama_url = var
.unwrap_or_else;
let model = var
.unwrap_or_else;
let dimension: usize = var
.unwrap_or_else
.parse
.unwrap_or;
let db_path = var
.unwrap_or_else;
let config = MemexConfig ;
let engine = new.await?;
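The environment lookups with fallbacks can be sketched using only the standard library. The variable names and defaults come from the Environment Variables section; the `EmbeddingSettings` struct and both function names are hypothetical and are not the crate's `MemexConfig`.

```rust
use std::env;

/// Hypothetical settings holder; the crate's real config type may differ.
#[derive(Debug, PartialEq)]
struct EmbeddingSettings {
    base_url: String,
    model: String,
    dimension: usize,
}

/// Builds settings from any key lookup, falling back to the documented defaults.
fn settings_from(lookup: impl Fn(&str) -> Option<String>) -> EmbeddingSettings {
    EmbeddingSettings {
        base_url: lookup("OLLAMA_BASE_URL")
            .unwrap_or_else(|| "http://localhost:11434".to_string()),
        model: lookup("EMBEDDING_MODEL")
            .unwrap_or_else(|| "qwen3-embedding:0.6b".to_string()),
        // An unset or unparsable dimension falls back to 1024.
        dimension: lookup("EMBEDDING_DIMENSION")
            .and_then(|v| v.parse().ok())
            .unwrap_or(1024),
    }
}

/// Reads the same keys from the process environment.
fn settings_from_env() -> EmbeddingSettings {
    settings_from(|key| env::var(key).ok())
}
```

Taking the lookup as a closure keeps the fallback logic testable without mutating the process environment.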
Provider Cascade (Multiple Fallbacks)
Configure multiple providers; the library tries them in priority order:
use ;
let config = EmbeddingConfig ;
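The priority-order fallback can be pictured with a minimal standard-library sketch. The `Provider` struct, its fields, and `pick_provider` are hypothetical; the `available` flag stands in for a real health check, and treating a lower `priority` value as "tried first" is an assumption.

```rust
/// Hypothetical provider description; not the crate's actual type.
struct Provider {
    name: &'static str,
    priority: u32,   // lower value = tried first (assumption)
    available: bool, // stand-in for a real health check
}

/// Returns the name of the first available provider in priority order.
fn pick_provider(providers: &mut [Provider]) -> Result<&'static str, String> {
    providers.sort_by_key(|p| p.priority);
    providers
        .iter()
        .find(|p| p.available)
        .map(|p| p.name)
        // Same message as the error listed under Troubleshooting.
        .ok_or_else(|| "No embedding providers available".to_string())
}
```

With Ollama unavailable and an MLX provider next in line, the sketch returns the MLX provider; with nothing reachable it returns the error.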
Namespace Strategy
Recommended: use one namespace per application and filter by metadata:
// ✅ CORRECT: Single namespace, filter by user_id/entity_id in metadata
let engine = for_app.await?;
// Store with entity IDs in metadata
engine.store.await?;
// Search within user context
let filter = default.with_custom;
let results = engine.search_filtered.await?;
// GDPR deletion: remove all user data
let deleted = engine.delete_by_filter.await?;
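To make the metadata-based deletion concrete, here is an in-memory sketch that removes every record whose metadata matches a key/value pair and reports how many were removed. `Record`, `record`, and `delete_by_metadata` are hypothetical names; the real engine runs this against LanceDB rather than a `Vec`.

```rust
use std::collections::HashMap;

/// Hypothetical stored record with free-form string metadata.
struct Record {
    id: String,
    metadata: HashMap<String, String>,
}

/// Convenience constructor that tags a record with a `user_id`.
fn record(id: &str, user_id: &str) -> Record {
    let mut metadata = HashMap::new();
    metadata.insert("user_id".to_string(), user_id.to_string());
    Record { id: id.to_string(), metadata }
}

/// Removes all records whose metadata[key] == value; returns the count removed.
fn delete_by_metadata(records: &mut Vec<Record>, key: &str, value: &str) -> usize {
    let before = records.len();
    records.retain(|r| r.metadata.get(key).map(String::as_str) != Some(value));
    before - records.len()
}
```

Because all users share one namespace, deleting one user's data is a single filtered pass rather than a purge of a per-user namespace.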
Embedding Models Reference
| Model | Dimensions | Size | Use Case |
|---|---|---|---|
| `qwen3-embedding:0.6b` | 1024 | ~600MB | Fast, good quality (recommended) |
| `qwen3-embedding:8b` | 4096 | ~4GB | Best quality, slower |
| `nomic-embed-text` | 768 | ~274MB | Lightweight, fast |
| `mxbai-embed-large` | 1024 | ~670MB | Good multilingual |
| `all-minilm` | 384 | ~46MB | Very fast, lower quality |
Troubleshooting
Error: "No embedding providers available"
# Check if Ollama is running
# Start Ollama
# Pull model if missing
Error: "Dimension mismatch"
- LanceDB dimension is fixed per table after creation
- Use different `db_path` for different dimensions
- Delete old database to change dimensions
Error: "Connection refused"
# Linux
# macOS
# Or run manually
Performance tuning:
# Larger batches (requires more VRAM)
MLX_MAX_BATCH_CHARS=64000
MLX_MAX_BATCH_ITEMS=32
Quick Start
Installation
Quick install (recommended):
|
From source:
Running
# Default mode (all features)
# Memory-only mode (no filesystem access)
# With security enabled
# With HTTP/SSE server for multi-agent access
# HTTP-only daemon mode (no MCP stdio)
HTTP/SSE Server (Multi-Agent Access)
LanceDB uses exclusive file locks - only one process can access the database at a time. The HTTP/SSE server solves this by providing a central access point for multiple agents.
Architecture
┌─────────────────────────────────────────────────────────────┐
│ rmcp-memex daemon │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ MCP Server │ │ HTTP/SSE │ │
│ │ (stdio) │ │ (port 6660) │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ └──────────┬───────────┘ │
│ ▼ │
│ ┌─────────────┐ │
│ │ RAGPipeline │ ← Single lock holder │
│ └──────┬──────┘ │
│ ▼ │
│ ┌─────────────┐ │
│ │ LanceDB │ │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────┘
▲ ▲
│ │
Claude Desktop HTTP Agents
(MCP stdio) (curl, fetch)
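As an in-process analogy for the single-lock-holder design, the sketch below lets several "agents" (threads) write through one shared, mutex-guarded store. It is not the daemon's code: `Store` and `run_agents` are hypothetical, and the real server serializes access across processes via HTTP/SSE rather than threads.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the component that owns the database handle.
#[derive(Default)]
struct Store {
    docs: HashMap<String, String>,
}

/// Spawns `agent_count` writers that all go through the same owner.
/// Returns the number of documents stored once every writer has finished.
fn run_agents(agent_count: usize) -> usize {
    let store = Arc::new(Mutex::new(Store::default()));
    let handles: Vec<_> = (0..agent_count)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                // Only one writer holds the lock at any moment.
                let mut guard = store.lock().unwrap();
                guard.docs.insert(format!("doc-{}", i), format!("written by agent {}", i));
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let count = store.lock().unwrap().docs.len();
    count
}
```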
HTTP Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check (status, db_path, embedding_provider) |
| `/search` | POST | Vector search with optional layer filter |
| `/sse/search` | GET | SSE streaming search (real-time results) |
| `/upsert` | POST | Add/update document |
| `/index` | POST | Full pipeline indexing with onion slices |
| `/expand/{ns}/{id}` | GET | Expand onion slice (get children) |
| `/parent/{ns}/{id}` | GET | Drill up to parent slice |
| `/get/{ns}/{id}` | GET | Get document by ID |
| `/delete/{ns}/{id}` | POST | Delete document |
| `/ns/{namespace}` | DELETE | Purge entire namespace |
MCP-over-SSE Endpoints (Claude Code compatibility)
| Endpoint | Method | Description |
|---|---|---|
| `/sse/` | GET | SSE stream - sends endpoint event with messages URL |
| `/messages/` | POST | JSON-RPC messages with `?session_id=xxx` |
Configure in ~/.claude.json:
Usage Examples
# Start daemon
&
# Health check
# Store document
# Search
# SSE streaming search
Multi-Host Database Paths
For setups with multiple machines (e.g., dragon, mgbook16), use per-host database paths:
# Per-host paths (each machine gets own database)
# Or use the wizard for machine-agnostic configuration
The TUI wizard auto-detects hostname and offers:
- Shared mode: `~/.ai-memories/lancedb` (same path everywhere)
- Per-host mode: `~/.ai-memories/lancedb.dragon`, `~/.ai-memories/lancedb.mgbook16`, etc.
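The two path modes reduce to a small naming rule, sketched below. `PathMode` and `db_path` are hypothetical names; the hostname is passed in explicitly because how the wizard detects it is not described here.

```rust
/// How the database path is chosen (names are illustrative).
enum PathMode {
    Shared,
    PerHost,
}

/// Builds the database path: unchanged in shared mode,
/// suffixed with `.<hostname>` in per-host mode.
fn db_path(base: &str, hostname: &str, mode: PathMode) -> String {
    match mode {
        PathMode::Shared => base.to_string(),
        PathMode::PerHost => format!("{}.{}", base, hostname),
    }
}
```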
Configuration (TOML)
# ~/.rmcp-servers/config/rmcp-memex.toml
= "full"
= "~/.rmcp-servers/rmcp-memex/lancedb"
= 4096
= "info"
# Whitelist of allowed paths
= [
"~",
"/Volumes/ExternalDrive/data"
]
# Security
= true
= "~/.rmcp-servers/rmcp-memex/tokens.json"
Documentation
- 01_security.md - Security system (namespace tokens)
- 02_configuration.md - Configuration and CLI options
Onion Slice Architecture
Instead of traditional flat chunking, rmcp-memex offers hierarchical "onion slices":
┌─────────────────────────────────────────┐
│ OUTER (~100 chars) │ ← Minimum context, maximum navigation
│ Keywords + ultra-compression │
├─────────────────────────────────────────┤
│ MIDDLE (~300 chars) │ ← Key sentences + context
├─────────────────────────────────────────┤
│ INNER (~600 chars) │ ← Expanded content
├─────────────────────────────────────────┤
│ CORE (full text) │ ← Complete document
└─────────────────────────────────────────┘
Philosophy: "Minimum info → Maximum navigation paths"
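The layer budgets can be illustrated with a deliberately naive sketch that produces each layer by truncating the document to the character count from the diagram. Plain truncation is a stand-in: the diagram describes keyword extraction and key-sentence selection, which this sketch does not attempt. `Layer`, `Slice`, and `onion_slices` are hypothetical names.

```rust
/// The four layers named in the diagram.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Layer {
    Outer,
    Middle,
    Inner,
    Core,
}

#[derive(Debug)]
struct Slice {
    layer: Layer,
    text: String,
}

/// Keeps at most `max_chars` characters (not bytes), so multi-byte UTF-8 is safe.
fn truncate_chars(text: &str, max_chars: usize) -> String {
    text.chars().take(max_chars).collect()
}

/// Naive slicing: each layer is a longer prefix of the same document.
fn onion_slices(doc: &str) -> Vec<Slice> {
    vec![
        Slice { layer: Layer::Outer, text: truncate_chars(doc, 100) },
        Slice { layer: Layer::Middle, text: truncate_chars(doc, 300) },
        Slice { layer: Layer::Inner, text: truncate_chars(doc, 600) },
        Slice { layer: Layer::Core, text: doc.to_string() },
    ]
}
```

A search can then match against the short outer slice and expand toward the core only when more context is needed.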
QueryRouter & Auto-Route
Intelligent query intent detection for automatic search mode selection:
# Auto-detect query intent and select optimal mode
# Output: Query intent: temporal (confidence: 0.70)
# Selects: hybrid mode with date boosting
# Structural queries suggest loctree
# Output: Query intent: structural (confidence: 0.80)
# Consider: loctree query --kind who-imports --target main.rs
# Deep exploration with all onion layers
Intent Types:
| Intent | Trigger Keywords | Recommended Mode |
|---|---|---|
| Temporal | when, date, yesterday, ago, 2024 | Hybrid (date boost) |
| Structural | import, depends, module, who uses | BM25 + loctree suggestion |
| Semantic | similar, related, explain | Vector |
| Exact | "quoted strings" | BM25 |
| Hybrid | (default) | Vector + BM25 fusion |
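A keyword-based classifier along the lines of the table might look like the sketch below. The order in which intents are checked, the whole-word matching, and the names `Intent` and `detect_intent` are assumptions; the real QueryRouter also reports a confidence score, which is omitted here.

```rust
#[derive(Debug, PartialEq)]
enum Intent {
    Temporal,
    Structural,
    Semantic,
    Exact,
    Hybrid,
}

/// True if any keyword appears as a whole word in the query.
fn has_word(words: &[&str], keys: &[&str]) -> bool {
    keys.iter().any(|k| words.iter().any(|w| w == k))
}

/// Classifies a query using the trigger keywords from the table.
fn detect_intent(query: &str) -> Intent {
    let q = query.to_lowercase();
    // A pair of double quotes marks an exact-match query.
    if q.matches('"').count() >= 2 {
        return Intent::Exact;
    }
    // Whole-word matching avoids false hits such as "ago" inside "Chicago".
    let words: Vec<&str> = q.split(|c: char| !c.is_alphanumeric()).collect();
    if has_word(&words, &["when", "date", "yesterday", "ago", "2024"]) {
        Intent::Temporal
    } else if has_word(&words, &["import", "depends", "module"]) || q.contains("who uses") {
        Intent::Structural
    } else if has_word(&words, &["similar", "related", "explain"]) {
        Intent::Semantic
    } else {
        Intent::Hybrid
    }
}
```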
CLI Commands
# Index with onion slicing (default)
# Index with progress bar and ETA
# Index with flat chunking (backward compatible)
# Search in namespace
# Search only in specific layer
# Drill down in hierarchy (expand children)
# Get chunk by ID
# RAG search (cross-namespace)
# List namespaces with stats
# Export namespace to JSON
Preprocessing (Noise Filtering)
Automatic removal of ~36-40% noise from conversation exports:
- MCP tool artifacts (`<function_calls>`, `<invoke>`, etc.)
- CLI output (git status, cargo build, npm install)
- Metadata (UUIDs, timestamps → placeholders)
- Empty/boilerplate content
# Index with preprocessing
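A simplified version of this filtering can be sketched with the standard library alone: drop empty lines, drop lines containing caller-supplied artifact markers, and replace UUID-shaped tokens with a placeholder. The `<UUID>` placeholder text, the function names, and the line-based approach are assumptions; CLI-output and timestamp handling are left out.

```rust
/// True for tokens shaped like 8-4-4-4-12 hexadecimal UUIDs.
fn is_uuid(token: &str) -> bool {
    let bytes = token.as_bytes();
    bytes.len() == 36
        && bytes.iter().enumerate().all(|(i, b)| match i {
            8 | 13 | 18 | 23 => *b == b'-',
            _ => b.is_ascii_hexdigit(),
        })
}

/// Removes empty lines and lines containing any artifact marker,
/// then swaps UUID tokens for a placeholder. Whitespace runs collapse to one space.
fn preprocess(input: &str, artifact_markers: &[&str]) -> String {
    input
        .lines()
        .filter(|line| !line.trim().is_empty())
        .filter(|line| !artifact_markers.iter().any(|m| line.contains(*m)))
        .map(|line| {
            line.split_whitespace()
                .map(|tok| if is_uuid(tok) { "<UUID>" } else { tok })
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Passing the markers as a parameter keeps the sketch independent of the exact tag list the crate filters.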
Exact-Match Deduplication
SHA256-based dedup for overlapping exports (e.g., quarterly exports containing 6 months of data):
# Dedup enabled (default)
# Disable dedup
Output with statistics:
Indexing complete:
New chunks: 234
Files indexed: 67
Skipped (duplicate): 33
Deduplication: enabled
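The dedup bookkeeping can be sketched as a set of content hashes that persists across exports. Rust's standard library has no SHA-256, so this sketch substitutes `DefaultHasher` (a 64-bit, non-cryptographic hash) purely to stay dependency-free; a real implementation would use a SHA-256 digest as described above. `DedupStats`, `content_hash`, and `index_chunks` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Counters mirroring the "New chunks" / "Skipped (duplicate)" statistics.
#[derive(Debug, Default, PartialEq)]
struct DedupStats {
    new_chunks: usize,
    skipped: usize,
}

/// Stand-in for SHA-256: a 64-bit hash of the exact chunk text.
fn content_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// Indexes chunks, skipping any whose hash was already seen.
fn index_chunks(chunks: &[&str], seen: &mut HashSet<u64>) -> DedupStats {
    let mut stats = DedupStats::default();
    for chunk in chunks {
        // `insert` returns false when the hash was already present.
        if seen.insert(content_hash(chunk)) {
            stats.new_chunks += 1;
        } else {
            stats.skipped += 1;
        }
    }
    stats
}
```

Feeding two overlapping exports through the same `seen` set means the second run counts only the chunks that did not appear in the first.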
Code Structure
rmcp-memex/
├── src/
│ ├── lib.rs # Public API & ServerConfig
│ ├── bin/
│ │ └── rmcp-memex.rs # CLI binary (serve, index, search, get, expand, etc.)
│ ├── handlers/
│ │ └── mod.rs # MCP request handlers
│ ├── security/
│ │ └── mod.rs # Namespace access control
│ ├── rag/
│ │ └── mod.rs # RAG pipeline + OnionSlice architecture
│ ├── preprocessing/
│ │ └── mod.rs # Noise filtering for conversation exports
│ ├── storage/
│ │ └── mod.rs # LanceDB + Tantivy (schema v3 with content_hash)
│ ├── embeddings/
│ │ └── mod.rs # MLX/FastEmbed bridge
│ └── tui/
│ └── mod.rs # Configuration wizard
└── Cargo.toml
Claude/MCP Integration
Add to ~/.claude.json:
Created by M&K (c)2025 The LibraxisAI Team Co-Authored-By: Maciej & Klaudiusz