# rust-memex
rust-memex is a custom Rust MCP kernel providing RAG and long-term memory capabilities to AI agents via LanceDB.
It exposes two explicit transport modes from a single canonical surface:
- **stdio** (Standard MCP): native MCP integration for local agents (e.g., Claude Desktop).
- **HTTP/SSE** (Multi-Agent Daemon): a central daemon mode allowing concurrent AI agents to access the same memory pool over the network, resolving LanceDB's exclusive lock constraints.
**Binary name:** `rust-memex` is the only supported binary name. The GitHub installer also creates `rust_memex` as a legacy compatibility symlink for older scripts.

**MCP contract:** the current MCP surface is intentionally tools-only. `initialize` advertises `tools`, while `resources/*` is not implemented yet.
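On the wire, tools-only means the `initialize` result advertises only the `tools` capability. An illustrative response (field values are examples, not the server's exact output):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2024-11-05",
    "capabilities": { "tools": {} },
    "serverInfo": { "name": "rust-memex", "version": "0.5.0" }
  }
}
```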
## Release Surface
- Quick install: `curl -LsSf https://raw.githubusercontent.com/Loctree/rust-memex/main/install.sh | sh`
- Prebuilt binary bundles: GitHub Releases uploaded from locally built and signed artifacts
- Release runbook: docs/RELEASE.md
- Configuration guide: docs/02_configuration.md
- HTTP/SSE reference: docs/HTTP_API.md
- Static launch page source: `docs/index.html`, published by `.github/workflows/pages.yml`
## Quick Start

```bash
# Install from the latest GitHub Release
curl -LsSf https://raw.githubusercontent.com/Loctree/rust-memex/main/install.sh | sh

# Start the MCP server
rust-memex serve

# Open the local dashboard in your browser (default HTTP port 8997)

# Or run the multi-agent HTTP/SSE daemon (see "Running" below)
```
## Overview
As an MCP (Model Context Protocol) server, rust-memex provides:
- RAG (Retrieval-Augmented Generation) - document indexing and semantic search
- Hybrid Search - BM25 keyword + semantic vector search (Tantivy-based)
- Vector Memory - semantic storage and retrieval of text chunks
- Namespace Isolation - data isolation in namespaces
- Security - token-based access control for protected namespaces
- Onion Slice Architecture - hierarchical embeddings (OUTER→MIDDLE→INNER→CORE)
- Preprocessing - automatic noise filtering from conversation exports (~36-40% reduction)
- Exact-Match Deduplication - SHA256-based dedup for overlapping exports
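Exact-match dedup is simple to reason about: hash the chunk text and skip anything already seen. A self-contained sketch of that logic, using the stdlib `DefaultHasher` as a stand-in for the SHA-256 rust-memex actually uses:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in for sha256(text): any stable content hash illustrates the idea.
fn content_hash(text: &str) -> u64 {
    let mut h = DefaultHasher::new();
    text.hash(&mut h);
    h.finish()
}

/// Keep only the first occurrence of each exact chunk text.
fn dedup_exact<'a>(chunks: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    chunks
        .iter()
        .copied()
        .filter(|c| seen.insert(content_hash(c)))
        .collect()
}

fn main() {
    let chunks = ["alpha", "beta", "alpha", "gamma"];
    println!("{:?}", dedup_exact(&chunks)); // ["alpha", "beta", "gamma"]
}
```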
## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                        rust-memex                           │
├─────────────────────────────────────────────────────────────┤
│ MCP Server (JSON-RPC over stdio)                            │
│ ├── handlers/mod.rs  - Request routing & validation         │
│ ├── security/mod.rs  - Namespace access control             │
│ └── rag/mod.rs       - RAG pipeline                         │
├─────────────────────────────────────────────────────────────┤
│ Storage Layer                                               │
│ ├── LanceDB  - Vector embeddings                            │
│ ├── Tantivy  - BM25 keyword index                           │
│ └── moka     - In-memory cache                              │
├─────────────────────────────────────────────────────────────┤
│ Embeddings (External Providers)                             │
│ ├── Ollama            - Local models (recommended)          │
│ ├── MLX Bridge        - Apple Silicon acceleration          │
│ └── OpenAI-compatible - Any compatible endpoint             │
└─────────────────────────────────────────────────────────────┘
```
## Features

### RAG Tools

| Tool | Description |
|---|---|
| `rag_index` | Index a document from a file |
| `rag_index_text` | Index raw text |
| `rag_search` | Search documents semantically (supports `auto_route`) |
### Memory Tools

| Tool | Description |
|---|---|
| `memory_upsert` | Add or update a chunk in a namespace |
| `memory_get` | Get a chunk by ID |
| `memory_search` | Search semantically within a namespace (supports `auto_route`) |
| `memory_delete` | Delete a chunk |
| `memory_purge_namespace` | Delete all chunks in a namespace |
| `dive` | Deep exploration across all onion layers (outer/middle/inner/core) |
### Security Tools

| Tool | Description |
|---|---|
| `namespace_create_token` | Create an access token for a namespace |
| `namespace_revoke_token` | Revoke a token (the namespace becomes public) |
| `namespace_list_protected` | List protected namespaces |
| `namespace_security_status` | Report security system status |
## Library Usage

rust-memex can be used as a library in your Rust applications. It provides a high-level `MemexEngine` API for vector storage operations.
### Add to Cargo.toml

```toml
[dependencies]
# Full library with CLI
rust-memex = "0.5"

# Library only (no CLI dependencies)
rust-memex = { version = "0.5", default-features = false }
```
### Basic Usage

A minimal sketch; method names follow this README, and argument lists are illustrative:

```rust
use rust_memex::MemexEngine;
use serde_json::json;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let engine = MemexEngine::for_app("myapp").await?;
    engine.store("doc-1", "rust-memex stores text as vectors", json!({"kind": "note"})).await?;
    let results = engine.search("vectors", 5).await?;
    println!("{} results", results.len());
    Ok(())
}
```
### Vista Integration

For Vista PIMS, use the optimized constructor (argument lists illustrative):

```rust
use rust_memex::MemexEngine;
use serde_json::json;

// Vista-optimized: 1024 dims, qwen3-embedding:0.6b model
let engine = MemexEngine::for_vista().await?;

// Store visit notes
engine.store("visit-001", "Annual checkup: patient stable.", json!({"patient_id": "p-42"})).await?;
```
### Batch Operations

```rust
// Sketch: item and result shapes are illustrative.
use rust_memex::MemexEngine;
use serde_json::json;

let engine = MemexEngine::for_app("myapp").await?;

let items = vec![
    ("doc-1", "First document", json!({})),
    ("doc-2", "Second document", json!({})),
];

let result = engine.store_batch(items).await?;
println!("stored {} items", result.stored);
```
### GDPR-Compliant Deletion

```rust
// Sketch: the filter type name is illustrative.
use rust_memex::MemexEngine;

let engine = MemexEngine::for_app("myapp").await?;

// Delete all documents for a specific patient
let filter = Filter::for_patient("p-42");
let deleted = engine.delete_by_filter(filter).await?;
println!("deleted {deleted} chunks");
```
### Hybrid Search (BM25 + Vector)

```rust
// Sketch: signatures and mode variants are illustrative.
use rust_memex::MemexEngine;

let engine = MemexEngine::for_app("myapp").await?;

// Hybrid search with BM25 + vector fusion (recommended)
let results = engine.search_hybrid("error handling", 10).await?;
for r in &results {
    println!("{}  {:.3}", r.id, r.score);
}

// Explicit mode selection
let results = engine.search_with_mode("query", 10, SearchMode::Vector).await?;
let results = engine.search_with_mode("query", 10, SearchMode::Bm25).await?;
let results = engine.search_with_mode("query", 10, SearchMode::Hybrid).await?;
```
### MCP Contract and In-Process Helpers

Use `tool_definitions()` when you need the exact MCP tool metadata exposed by the stdio and HTTP/SSE transports. The helper functions below are for in-process Rust callers; they are a local convenience layer, not a second public MCP contract. Their names intentionally differ from MCP tool names so the two surfaces do not drift together by accident.

```rust
// Sketch: helper signatures are illustrative.
use rust_memex::{tool_definitions, store_document, search_documents, MemexEngine};
use serde_json::json;

let engine = MemexEngine::for_app("myapp").await?;

// Canonical MCP tool surface exposed by rust-memex transports
let tools = tool_definitions();
assert!(tools.iter().any(|t| t.name == "rag_search"));

// Local Rust helpers for in-process callers
let result = store_document(&engine, "doc-1", "text", json!({})).await?;
assert!(result.stored > 0);

let results = search_documents(&engine, "text", 5).await?;
```
### Feature Flags

| Feature | Description | Default |
|---|---|---|
| `cli` | CLI binary, TUI wizard, progress bars | Yes |
| `provider-cascade` | Ollama/OpenAI-compatible embeddings | Yes |
```bash
# Build library only (no CLI)
cargo build --no-default-features

# Build with CLI (default feature set)
cargo build
```
## Configuration Guide

Complete guide for integrating rust-memex as a library in any Rust project.
### Prerequisites

Ollama (recommended) or any OpenAI-compatible embedding API:

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull an embedding model (choose based on your needs)
ollama pull qwen3-embedding:0.6b

# Verify it's running
curl http://localhost:11434/api/tags
```
### Environment Variables

Configure via `.env` or environment:

```bash
# =============================================================================
# EMBEDDING PROVIDER CONFIGURATION
# =============================================================================

# Ollama (default, recommended)
OLLAMA_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=qwen3-embedding:0.6b
EMBEDDING_DIMENSION=1024

# Database storage (auto-created)
MEMEX_DB_PATH=~/.rmcp-servers/myapp/lancedb

# Optional: BM25 keyword search index
MEMEX_BM25_PATH=~/.rmcp-servers/myapp/bm25

# =============================================================================
# ADVANCED: Multiple providers (fallback cascade)
# =============================================================================

# Remote embedding server fallback
# DRAGON_BASE_URL=http://your-server.local
# DRAGON_EMBEDDER_PORT=12345

# MLX embedder for Apple Silicon
# EMBEDDER_PORT=12300
# MLX_MAX_BATCH_CHARS=32000
# MLX_MAX_BATCH_ITEMS=16
# DISABLE_MLX=1  # Set to disable MLX fallback
```
### Quick Start (Auto-config)

```rust
// Sketch: argument lists are illustrative.
use rust_memex::MemexEngine;
use serde_json::json;

// Auto-configures from defaults + environment
let engine = MemexEngine::for_app("myapp").await?;

engine.store("doc-1", "configuration note", json!({})).await?;
let results = engine.search("configuration", 5).await?;
```
### Custom Configuration

```rust
// Sketch: MemexConfig field names are illustrative.
use rust_memex::{MemexConfig, MemexEngine};
use std::env::var;

// Read from your app's environment
let ollama_url = var("OLLAMA_BASE_URL")
    .unwrap_or_else(|_| "http://localhost:11434".to_string());
let model = var("EMBEDDING_MODEL")
    .unwrap_or_else(|_| "qwen3-embedding:0.6b".to_string());
let dimension: usize = var("EMBEDDING_DIMENSION")
    .unwrap_or_else(|_| "1024".to_string())
    .parse()
    .unwrap_or(1024);
let db_path = var("MEMEX_DB_PATH")
    .unwrap_or_else(|_| "~/.rmcp-servers/myapp/lancedb".to_string());

let config = MemexConfig { ollama_url, model, dimension, db_path };
let engine = MemexEngine::new(config).await?;
```
### Provider Cascade (Multiple Fallbacks)

Configure multiple providers; the library tries them in priority order:

```rust
// Sketch: config field names are illustrative; see docs/02_configuration.md.
use rust_memex::EmbeddingConfig;

let config = EmbeddingConfig {
    // 1. Ollama (local, preferred)
    // 2. Remote OpenAI-compatible endpoint (fallback)
    // 3. MLX bridge on Apple Silicon (fallback)
    ..Default::default()
};
```
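Whatever the exact config fields, the cascade behavior itself is "try each provider in priority order, return the first success". A self-contained sketch of that control flow (the trait and provider types here are illustrative, not the crate's API):

```rust
/// Illustrative provider interface; rust-memex's real trait differs.
trait EmbeddingProvider {
    fn name(&self) -> &str;
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;
}

struct Unreachable; // e.g. Ollama not running
impl EmbeddingProvider for Unreachable {
    fn name(&self) -> &str { "ollama" }
    fn embed(&self, _: &str) -> Result<Vec<f32>, String> {
        Err("connection refused".into())
    }
}

struct Dummy; // e.g. remote OpenAI-compatible endpoint
impl EmbeddingProvider for Dummy {
    fn name(&self) -> &str { "openai-compatible" }
    fn embed(&self, text: &str) -> Result<Vec<f32>, String> {
        Ok(vec![text.len() as f32]) // dummy embedding for the sketch
    }
}

/// Try providers in priority order; the first success wins.
fn embed_with_cascade(
    providers: &[Box<dyn EmbeddingProvider>],
    text: &str,
) -> Result<(String, Vec<f32>), String> {
    for p in providers {
        match p.embed(text) {
            Ok(v) => return Ok((p.name().to_string(), v)),
            Err(e) => eprintln!("{} failed: {e}, trying next", p.name()),
        }
    }
    Err("no embedding providers available".into())
}

fn main() {
    let providers: Vec<Box<dyn EmbeddingProvider>> =
        vec![Box::new(Unreachable), Box::new(Dummy)];
    let (name, v) = embed_with_cascade(&providers, "hello").unwrap();
    println!("embedded via {name}: {v:?}");
}
```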
### Namespace Strategy

Recommended: one namespace per application; use metadata for filtering:

```rust
// Sketch: filter builder names are illustrative.
use rust_memex::MemexEngine;
use serde_json::json;

// ✅ CORRECT: single namespace, filter by user_id/entity_id in metadata
let engine = MemexEngine::for_app("myapp").await?;

// Store with entity IDs in metadata
engine.store("doc-1", "user note", json!({"user_id": "u-1"})).await?;

// Search within user context
let filter = Filter::default().with_custom("user_id", "u-1");
let results = engine.search_filtered("note", 10, filter.clone()).await?;

// GDPR deletion: remove all user data
let deleted = engine.delete_by_filter(filter).await?;
```
### Embedding Models Reference

| Model | Dimensions | Size | Use Case |
|---|---|---|---|
| `qwen3-embedding:0.6b` | 1024 | ~600MB | Fast, good quality (recommended) |
| `qwen3-embedding:8b` | 4096 | ~4GB | Best quality, slower |
| `nomic-embed-text` | 768 | ~274MB | Lightweight, fast |
| `mxbai-embed-large` | 1024 | ~670MB | Good multilingual |
| `all-minilm` | 384 | ~46MB | Very fast, lower quality |
### Troubleshooting

**Error: "No embedding providers available"**

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama
ollama serve

# Pull model if missing
ollama pull qwen3-embedding:0.6b
```

**Error: "Dimension mismatch"**

- LanceDB dimension is fixed per table after creation
- Use a different `db_path` for different dimensions
- Delete the old database to change dimensions

**Error: "Connection refused"**

```bash
# Linux
systemctl status ollama

# macOS (if installed via Homebrew)
brew services start ollama

# Or run manually
ollama serve
```

**Performance tuning:**

```bash
# Larger batches (requires more VRAM)
MLX_MAX_BATCH_CHARS=64000
MLX_MAX_BATCH_ITEMS=32
```
## Quick Start

### Installation

Quick install (recommended):

```bash
curl -LsSf https://raw.githubusercontent.com/Loctree/rust-memex/main/install.sh | sh
```

Prebuilt GitHub Release bundles are the canonical install path and avoid compiling the full LanceDB-heavy Rust dependency graph on the target machine.

From source (development only):

```bash
git clone https://github.com/Loctree/rust-memex
cd rust-memex
cargo build --release
```
### Running

```bash
# Canonical MCP surface
rust-memex serve

# With security enabled
# With HTTP/SSE server for multi-agent access
# HTTP-only daemon mode (no MCP stdio)
```

rust-memex always exposes one canonical MCP tool surface. To narrow runtime access, use `--allowed-paths`, HTTP auth, or namespace security rather than a separate "memory/full" mode switch.
## HTTP/SSE Server (Multi-Agent Access)

LanceDB uses exclusive file locks: only one process can access the database at a time. The HTTP/SSE server solves this by providing a central access point for multiple agents.

### Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                     rust-memex daemon                       │
│  ┌─────────────────┐          ┌─────────────────┐           │
│  │   MCP Server    │          │    HTTP/SSE     │           │
│  │    (stdio)      │          │   (port 8997)   │           │
│  └────────┬────────┘          └────────┬────────┘           │
│           │                            │                    │
│           └──────────┬─────────────────┘                    │
│                      ▼                                      │
│               ┌─────────────┐                               │
│               │ RAGPipeline │ ← Single lock holder          │
│               └──────┬──────┘                               │
│                      ▼                                      │
│               ┌─────────────┐                               │
│               │   LanceDB   │                               │
│               └─────────────┘                               │
└─────────────────────────────────────────────────────────────┘
        ▲                              ▲
        │                              │
  Claude Desktop                  HTTP Agents
   (MCP stdio)                   (curl, fetch)
```
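The single-lock-holder shape boils down to one owned pipeline behind a shared, synchronized handle that every transport goes through. A self-contained sketch of the pattern using stdlib primitives (the real daemon uses async tasks, not threads):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for RAGPipeline: the single owner of the exclusive-lock database.
struct Pipeline {
    docs: Vec<String>,
}

impl Pipeline {
    fn upsert(&mut self, text: &str) {
        self.docs.push(text.to_string());
    }
}

fn main() {
    // One lock holder, shared by every transport (stdio handler, HTTP handler, ...).
    let pipeline = Arc::new(Mutex::new(Pipeline { docs: Vec::new() }));

    let handles: Vec<_> = (0..4)
        .map(|agent| {
            let p = Arc::clone(&pipeline);
            thread::spawn(move || {
                // Each "agent" goes through the shared handle instead of
                // opening the database (and taking its exclusive lock) itself.
                p.lock().unwrap().upsert(&format!("doc from agent {agent}"));
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("{} docs stored", pipeline.lock().unwrap().docs.len()); // 4 docs stored
}
```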
### HTTP Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check (status, db_path, embedding_provider) |
| `/search` | POST | Search with optional project, layer, and deep filters (`k` alias supported); response includes collapsed clusters and `duplicate_count` |
| `/sse/search` | GET | SSE streaming search with optional project, layer, and deep filters |
| `/api/context-pack` | POST | Build a markdown context pack from a query or explicit chunk IDs, with grouped evidence and rebuilt indexed source chunks |
| `/upsert` | POST | Add/update a document |
| `/index` | POST | Full pipeline indexing with onion slices |
| `/expand/{ns}/{id}` | GET | Expand an onion slice (get children) |
| `/parent/{ns}/{id}` | GET | Drill up to the parent slice |
| `/get/{ns}/{id}` | GET | Get a document by ID |
| `/delete/{ns}/{id}` | POST | Delete a document |
| `/ns/{namespace}` | DELETE | Purge an entire namespace |
Diagnostic & lifecycle endpoints (require Bearer `auth_token`):

| Endpoint | Method | Description |
|---|---|---|
| `/api/audit` | GET | Per-namespace quality audit (chunk completeness, hash coverage, score) |
| `/api/stats` / `/api/stats/{ns}` | GET | Database / namespace statistics |
| `/api/timeline` | GET | Indexing timeline aggregates |
| `/api/purge-quality` | POST | Purge low-quality chunks under a threshold (gated by an approval key) |
| `/api/dedup` | POST | Run post-index deduplication (`group-by`, `keep`, `dry_run` body fields) |
| `/api/backfill-hashes` | POST | Spec P0 backfill: populate per-chunk `content_hash` + `source_hash` for pre-v4 namespaces |
### MCP-over-SSE Endpoints (Claude Code compatibility)

| Endpoint | Method | Description |
|---|---|---|
| `/sse/` | GET | SSE stream; sends an `endpoint` event with the messages URL |
| `/messages/` | POST | JSON-RPC messages with `?session_id=xxx` |

Configure in `~/.claude.json`:
### Usage Examples

Request shapes below are illustrative; see docs/HTTP_API.md for the authoritative schema.

```bash
# Open the local dashboard
open http://localhost:8997/

# Start daemon
rust-memex serve &

# Health check
curl http://localhost:8997/health

# Store document
curl -X POST http://localhost:8997/upsert \
  -H 'Content-Type: application/json' \
  -d '{"namespace": "notes", "id": "doc-1", "text": "hello memex"}'

# Search
curl -X POST http://localhost:8997/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "hello", "k": 5}'

# SSE streaming search
curl -N 'http://localhost:8997/sse/search?query=hello'
```
### Multi-Host Database Paths

For setups with multiple machines (e.g., dragon, mgbook16), use per-host database paths:

```bash
# Per-host paths (each machine gets its own database; pattern illustrative)
MEMEX_DB_PATH=~/.ai-memories/lancedb.$(hostname -s)

# Or use the wizard for machine-agnostic configuration
```

The TUI wizard auto-detects the hostname and offers:

- Shared mode: `~/.ai-memories/lancedb` (same path everywhere)
- Per-host mode: `~/.ai-memories/lancedb.dragon`, `~/.ai-memories/lancedb.mgbook16`, etc.
## Configuration (TOML)

```toml
# ~/.rmcp-servers/rust-memex/config.toml
# NOTE: key names below are illustrative; see docs/02_configuration.md for the
# authoritative schema.
db_path = "~/.rmcp-servers/rust-memex/lancedb"
embedding_dimension = 4096
log_level = "info"

# Whitelist of allowed paths
allowed_paths = [
  "~",
  "/Volumes/ExternalDrive/data"
]

# Security
security_enabled = true
tokens_path = "~/.rmcp-servers/rust-memex/tokens.json"

# Optional: dashboard-only OIDC for browser users.
# API / SSE / MCP still stay Bearer-authenticated via auth_token.
auth_token = "replace-me"

[oidc]
issuer = "https://issuer.example"
client_id = "rust-memex-dashboard"
client_secret = "optional-confidential-client-secret"
redirect_base_url = "https://memex.example.com"
scopes = ["openid", "profile", "email"]
```
## Documentation

- docs/01_security.md - Security system (namespace tokens)
- docs/02_configuration.md - Configuration and CLI options
## Onion Slice Architecture

Instead of traditional flat chunking, rust-memex offers hierarchical "onion slices":

```
┌─────────────────────────────────────────┐
│ OUTER (~100 chars)                      │ ← Minimum context, maximum navigation
│ Keywords + ultra-compression            │
├─────────────────────────────────────────┤
│ MIDDLE (~300 chars)                     │ ← Key sentences + context
├─────────────────────────────────────────┤
│ INNER (~600 chars)                      │ ← Expanded content
├─────────────────────────────────────────┤
│ CORE (full text)                        │ ← Complete document
└─────────────────────────────────────────┘
```

Philosophy: "Minimum info → Maximum navigation paths"
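The size contract of the layers can be sketched as progressive character budgets. The real slicer compresses (keyword extraction, key-sentence selection) rather than truncating; this stand-alone sketch only illustrates the shape:

```rust
/// Illustrative layer budgets from the diagram above.
const BUDGETS: [(&str, usize); 3] = [("outer", 100), ("middle", 300), ("inner", 600)];

/// Cheap stand-in for the real slicer: truncate on a char budget.
fn slice(text: &str, budget: usize) -> String {
    text.chars().take(budget).collect()
}

/// Build all four layers; CORE always keeps the full document.
fn onion_layers(text: &str) -> Vec<(String, String)> {
    let mut layers: Vec<(String, String)> = BUDGETS
        .iter()
        .map(|(name, budget)| (name.to_string(), slice(text, *budget)))
        .collect();
    layers.push(("core".to_string(), text.to_string()));
    layers
}

fn main() {
    let doc = "x".repeat(1000);
    for (name, body) in onion_layers(&doc) {
        println!("{name}: {} chars", body.len());
    }
}
```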
## QueryRouter & Auto-Route

Intelligent query intent detection for automatic search mode selection:

```bash
# Auto-detect query intent and select optimal mode
# Output: Query intent: temporal (confidence: 0.70)
# Selects: hybrid mode with date boosting

# Structural queries suggest loctree
# Output: Query intent: structural (confidence: 0.80)
# Consider: loctree query --kind who-imports --target main.rs

# Deep exploration with all onion layers
```
Intent Types:
| Intent | Trigger Keywords | Recommended Mode |
|---|---|---|
| Temporal | when, date, yesterday, ago, 2024 | Hybrid (date boost) |
| Structural | import, depends, module, who uses | BM25 + loctree suggestion |
| Semantic | similar, related, explain | Vector |
| Exact | "quoted strings" | BM25 |
| Hybrid | (default) | Vector + BM25 fusion |
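The table above maps almost directly onto a keyword-dispatch sketch. Trigger lists are copied from the table; the real QueryRouter additionally produces a confidence score:

```rust
#[derive(Debug, PartialEq)]
enum Intent { Temporal, Structural, Semantic, Exact, Hybrid }

/// Keyword dispatch matching the intent table above.
fn classify(query: &str) -> Intent {
    let q = query.to_lowercase();
    if q.contains('"') {
        Intent::Exact // "quoted strings"
    } else if ["when", "date", "yesterday", "ago", "2024"].iter().any(|k| q.contains(*k)) {
        Intent::Temporal
    } else if ["import", "depends", "module", "who uses"].iter().any(|k| q.contains(*k)) {
        Intent::Structural
    } else if ["similar", "related", "explain"].iter().any(|k| q.contains(*k)) {
        Intent::Semantic
    } else {
        Intent::Hybrid // default: vector + BM25 fusion
    }
}

fn main() {
    println!("{:?}", classify("when did we ship v4"));      // Temporal
    println!("{:?}", classify("who uses handlers/mod.rs")); // Structural
    println!("{:?}", classify("explain onion slices"));     // Semantic
}
```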
## CLI Commands

```bash
# Index with onion slicing (default)
# Index with progress bar and ETA
# Index with flat chunking (backward compatible)

# Search in namespace
# Search only in specific layer

# Drill down in hierarchy (expand children)
# Get chunk by ID

# RAG search (cross-namespace)
# List namespaces with stats
# Export namespace to JSON
```
## Preprocessing (Noise Filtering)

Automatic removal of ~36-40% noise from conversation exports:

- MCP tool artifacts (`<function_calls>`, `<invoke>`, etc.)
- CLI output (git status, cargo build, npm install)
- Metadata (UUIDs, timestamps → placeholders)
- Empty/boilerplate content

```bash
# Index with preprocessing
```
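The noise classes above can be approximated with simple line filters. The real preprocessor is more thorough (UUID/timestamp placeholders, boilerplate detection); the patterns here are illustrative:

```rust
/// Rough line-level noise check mirroring the classes listed above:
/// MCP tool artifacts, common CLI output, and empty lines.
fn is_noise(line: &str) -> bool {
    let t = line.trim();
    t.is_empty()
        || t.contains("<function_calls>")
        || t.contains("<invoke")
        || t.starts_with("git status")
        || t.starts_with("cargo build")
        || t.starts_with("npm install")
}

/// Drop noise lines, keep the conversational substance.
fn preprocess(text: &str) -> String {
    text.lines()
        .filter(|l| !is_noise(l))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let raw = "real note\n<function_calls>...\ncargo build --release\nanother note";
    println!("{}", preprocess(raw)); // keeps only the two note lines
}
```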
## Deduplication & Hash Hygiene

Two layers of dedup work together:

**1. Pre-index source dedup (during index)**

The pipeline computes `sha256(file_text)` and skips files whose `source_hash` already exists in the namespace (with a fallback to `content_hash` for pre-v4 namespaces). The skip line is logged at `info!` so it shows up in the default operator run log:

```
Skip duplicate source: /path/to/file.md (source_hash 8ee43c1e7393b432)
```

```bash
# Dedup enabled (default)
# Disable dedup for this run
# Spec P4 escape hatch: force re-index a known-duplicate source
```

**2. Post-index dedup CLI (standalone command)**

```bash
# Default grouping: source-hash + layer (preserves onion structure,
# removes only true source repeats while keeping outer/middle/inner/core)

# Collapse all layers per source (legacy aggressive grouping)
# Per-chunk content_hash grouping (legacy pre-v4 behavior)
# Cross-namespace dedup pool
# Keep newest duplicates instead of oldest
```

**3. Hash backfill (spec P0)** fills `content_hash` (per-chunk) and `source_hash` (per-source) for namespaces indexed before v4. Without backfill, dedup reports `Without hash: N (cannot deduplicate)` and is blind to legacy chunks:

```bash
# Dry-run backfill across all namespaces
# Backfill one namespace, then commit
# Machine-readable JSON for scripts / CI
```
Output with statistics:

```
Indexing complete:
  New chunks: 234
  Files indexed: 67
  Skipped (duplicate): 33
  Deduplication: enabled
```
## LLM-Synthesized Outer Layer (Spec P3)

The default outer layer is a TF-based keyword extract (`--outer-synthesis keyword`). For transcript-heavy namespaces where keyword splat is noise (CLI animation gerunds, structural markdown, file-path tokens), the outer layer can be replaced with a 1-3 sentence summary from a local Ollama model:

```bash
# Wire the outer layer through Ollama (requires --pipeline mode)
```

Failure modes (network, non-2xx, malformed JSON, empty completion) silently fall back to the keyword outer so the pipeline never stalls. This path is reachable only through `--pipeline`: passing `--outer-synthesis llm` without `--pipeline` is rejected up-front by clap.
## Code Structure

```
rust-memex/
├── src/
│   ├── lib.rs               # Public API & ServerConfig
│   ├── bin/
│   │   └── rust-memex.rs    # CLI binary (serve, index, search, get, expand, etc.)
│   ├── handlers/
│   │   └── mod.rs           # MCP request handlers
│   ├── security/
│   │   └── mod.rs           # Namespace access control
│   ├── rag/
│   │   └── mod.rs           # RAG pipeline + OnionSlice architecture
│   ├── preprocessing/
│   │   └── mod.rs           # Noise filtering for conversation exports
│   ├── storage/
│   │   └── mod.rs           # LanceDB + Tantivy (schema v4: source_hash + per-chunk content_hash)
│   ├── embeddings/
│   │   └── mod.rs           # MLX/FastEmbed bridge
│   └── tui/
│       └── mod.rs           # Configuration wizard
└── Cargo.toml
```
## Claude/MCP Integration

Add to `~/.claude.json`:
Vibecrafted with AI Agents by Loctree (c)2025 The LibraxisAI Team Co-Authored-By: Maciej & Klaudiusz