Codemem
A standalone Rust memory engine for AI coding assistants. Single binary, zero runtime deps.
Codemem stores what your AI assistant discovers -- files read, symbols searched, edits made -- so repositories don't need re-exploring across sessions. It supports cross-repo structural relationships for monorepo intelligence and wires into any MCP-compatible tool.
Quick Start
1. Install
Or download a prebuilt binary from GitHub Releases (macOS, Linux -- x86_64 and ARM64).
2. Initialize
This downloads the local embedding model (~440MB, one-time), registers 4 lifecycle hooks, and configures the MCP server (33 tools) for your AI assistant.
3. Done
Codemem now automatically:
- Injects context at the start of each session (recent sessions, key decisions, file hotspots)
- Captures what your AI reads, searches, and edits via PostToolUse hooks
- Generates summaries when conversations end
- Provides 33 MCP tools for active recall, graph traversal, and code search
Optional: remote embeddings
By default, Codemem runs a local BERT model -- no API key needed. To use a remote provider instead:
# Ollama (local server)
# OpenAI
# Any OpenAI-compatible API (Voyage AI, Together, Azure, etc.)
Optional: observation compression
Compress raw tool observations into concise summaries via LLM before storage:
# or openai, anthropic
See Observation Compression for full configuration.
How It Works
Codemem does three things:
- Passively captures what your AI assistant reads, searches, and edits via lifecycle hooks
- Actively recalls relevant context via MCP tools when your assistant needs it
- Injects context at session start so your assistant picks up where it left off
graph LR
A[AI Assistant] -->|SessionStart hook| B[codemem context]
A -->|PostToolUse hooks| C[codemem ingest]
A -->|Stop hook| E[codemem summarize]
A -->|MCP tools| D[codemem serve]
B -->|Inject context| A
C --> F[Storage + Vector + Graph]
D --> F
F -->|Recall| A
At a Glance
| Metric | Value |
|---|---|
| Codebase | ~23,500 LOC Rust across 12 crates |
| Tests | 352 (all passing) |
| MCP Tools | 33 |
| CLI Commands | 15 |
| Lifecycle Hooks | 4 (SessionStart, UserPromptSubmit, PostToolUse, Stop) |
| Language Extractors | 6 (Rust, TypeScript, Python, Go, C/C++, Java) |
| Embeddings | 768-dim, pluggable via CODEMEM_EMBED_PROVIDER: Candle (local, default), Ollama, OpenAI-compatible |
| Graph Algorithms | 25 (PageRank, Louvain, betweenness centrality, BFS/DFS, etc.) |
| Memory Types | 7 |
| Relationship Types | 23 |
| Consolidation Cycles | 4 (Decay, Creative/REM, Cluster, Forget) |
| Scoring | BM25 token scoring + 9-component hybrid |
| Observation Compression | Pluggable: Ollama (local), OpenAI, Anthropic |
| File Watching | Real-time via notify (<50ms debounce) |
| Session Tracking | Cross-session continuity with pattern detection |
| Storage | System-wide at ~/.codemem/ with namespace scoping |
Embeddings are contextual -- metadata and graph context are enriched before embedding for higher recall quality.
Hybrid Scoring
Recall uses a 9-component hybrid scoring formula with BM25 for token overlap:
| Component | Weight |
|---|---|
| Vector similarity | 25% |
| Graph strength (PageRank + betweenness + degree + cluster) | 25% |
| BM25 token overlap | 15% |
| Temporal | 10% |
| Tags | 10% |
| Importance | 5% |
| Confidence | 5% |
| Recency | 5% |
Lifecycle Hooks
Codemem registers 4 hooks during codemem init for full session lifecycle coverage:
| Hook | Command | What It Does |
|---|---|---|
| SessionStart | codemem context |
Queries recent sessions, key memories, file hotspots, and detected patterns; injects compact context via additionalContext |
| UserPromptSubmit | codemem prompt |
Stores user prompts as Context memories for session tracking |
| PostToolUse | codemem ingest |
Captures Read/Grep/Glob/Edit/Write observations with optional LLM compression |
| Stop | codemem summarize |
Builds structured session summary (files read/edited, decisions, searches) and stores as Insight memory |
Context injection at SessionStart uses progressive disclosure -- compact markdown tables with enough to orient the AI, with full details available via MCP tools.
Observation Compression
Raw tool observations can be compressed into concise structural summaries via an LLM before storage. This improves both memory density and embedding quality.
Raw: "File read: src/main.rs\n\nuse clap::{Parser, Subcommand};\nuse codemem_core..." (2000 chars)
->
Compressed: "CLI entry point with 15 clap subcommands dispatching to handler functions.
Key commands: init (model download + hook registration), serve (MCP JSON-RPC),
ingest (PostToolUse capture with optional compression). Uses codemem-storage
for SQLite persistence and codemem-embeddings for local BERT inference." (300 chars)
Configure via environment variables:
| Variable | Values | Default |
|---|---|---|
CODEMEM_COMPRESS_PROVIDER |
ollama, openai, anthropic |
disabled |
CODEMEM_COMPRESS_MODEL |
any model name | llama3.2 / gpt-4o-mini / claude-haiku-4-5-20251001 |
CODEMEM_COMPRESS_URL |
base URL override | provider default |
CODEMEM_API_KEY |
API key | also reads OPENAI_API_KEY / ANTHROPIC_API_KEY |
Compression is disabled by default -- no external dependencies unless you opt in. On failure, falls back to raw content silently.
Embedding Providers
Embeddings power vector search, contextual recall, and code search. Codemem selects the provider at runtime via environment variables, defaulting to local Candle inference (no API key needed).
| Provider | Model Default | Use Case |
|---|---|---|
candle (default) |
BAAI/bge-base-en-v1.5 (768-dim) | Fully offline, zero deps, Metal/CUDA GPU |
ollama |
nomic-embed-text (768-dim) | Local server, swap models freely |
openai |
text-embedding-3-small (768-dim) | OpenAI, Voyage AI, Together, Azure, any compatible API |
Configure via environment variables:
| Variable | Values | Default |
|---|---|---|
CODEMEM_EMBED_PROVIDER |
candle, ollama, openai |
candle |
CODEMEM_EMBED_MODEL |
any model name | provider default |
CODEMEM_EMBED_URL |
base URL override | provider default |
CODEMEM_EMBED_API_KEY |
API key | also reads OPENAI_API_KEY |
CODEMEM_EMBED_DIMENSIONS |
integer | 768 |
All providers include an LRU cache (10K entries). Embeddings are contextual -- metadata and graph context are enriched before embedding for higher recall quality.
Memory Types
| Type | Source | Example |
|---|---|---|
| Decision | Explicit choices captured during coding | "Chose serde over manual JSON parsing for config" |
| Pattern | Recurring structures detected across sessions | "All service modules follow init/run/shutdown lifecycle" |
| Preference | User or project-level conventions | "Prefer thiserror over anyhow in library crates" |
| Style | Code style and formatting observations | "Project uses 4-space indentation, trailing commas" |
| Habit | Repeated workflows and tool usage patterns | "Always runs cargo clippy before committing" |
| Insight | Inferred understanding from exploration | "The graph module is the most interconnected component" |
| Context | Ambient project and environment context | "Workspace root is a Cargo workspace with 12 crates" |
MCP Tools
33 tools organized by category. See MCP Tools Reference for full parameters and usage.
| Category | Tools |
|---|---|
| Core Memory (8) | store_memory, recall_memory, update_memory, delete_memory, associate_memories, graph_traverse, codemem_stats, codemem_health |
| Structural Index (10) | index_codebase, search_symbols, get_symbol_info, get_dependencies, get_impact, get_clusters, get_cross_repo, get_pagerank, search_code, set_scoring_weights |
| Export/Import (2) | export_memories, import_memories |
| Graph-Expanded Recall & Namespace (4) | recall_with_expansion, list_namespaces, namespace_stats, delete_namespace |
| Consolidation (5) | consolidate_decay, consolidate_creative, consolidate_cluster, consolidate_forget, consolidation_status |
| Impact & Patterns (4) | recall_with_impact, get_decision_chain, detect_patterns, pattern_insights |
CLI Commands
| Command | Description |
|---|---|
init |
Initialize project -- downloads model, registers hooks and MCP |
search |
Search memories by query |
stats |
Show database and index statistics |
serve |
Start the MCP server (JSON-RPC over stdio) |
ingest |
Ingest PostToolUse hook data with optional compression |
consolidate |
Run memory consolidation cycles |
viz |
Visualize memory graph |
index |
Index codebase for structural analysis |
export |
Export memories to file |
import |
Import memories from file |
watch |
Watch directory for file changes and re-index in real-time |
sessions |
Manage memory sessions (list, start, end) |
context |
SessionStart hook -- inject prior context into new sessions |
prompt |
UserPromptSubmit hook -- record user prompts |
summarize |
Stop hook -- generate and store session summary |
Performance Targets
| Operation | Target |
|---|---|
| HNSW search k=10 (100K vectors) | < 2ms |
| Embedding (single sentence) | < 50ms |
| Embedding (cache hit) | < 0.01ms |
| Graph BFS depth=2 | < 1ms |
| Hook ingest (Read) | < 200ms |
| Warm recall | < 10ms |
| Namespace-filtered search | < 10ms |
Documentation
- Architecture -- System design, data flow, Mermaid diagrams
- MCP Tools Reference -- All 33 tools with parameters
- CLI Reference -- All 15 commands
- Comparison -- vs claude-mem, AgentDB, AutoMem, and more
Project Status
Active development. All phases complete:
- Foundation/MVP -- core types, storage, vector index, CLI
- Embeddings + MCP -- candle embeddings, 33 MCP tools
- Hooks + Graph -- passive capture via PostToolUse hooks, graph engine
- Cross-repo -- structural indexing, language extractors, namespace scoping
- Consolidation -- decay, creative/REM, cluster, forget cycles
- Intelligence -- BM25 scoring, impact-aware recall, cross-session pattern detection, pluggable embeddings, real-time file watching, session continuity, diff-aware memory
- Lifecycle Hooks -- SessionStart context injection, UserPromptSubmit capture, Stop session summaries, observation compression
Research and Inspirations
Codemem builds on ideas from several research papers, blog posts, and open-source projects.
Papers
| Paper | Authors | Key Contribution |
|---|---|---|
| HippoRAG (NeurIPS 2024) | Gutierrez et al. | Neurobiologically-inspired long-term memory using LLMs + knowledge graphs + Personalized PageRank. Up to 20% improvement on multi-hop QA. |
| From RAG to Memory (HippoRAG 2, ICML 2025) | Gutierrez et al. | Non-parametric continual learning for LLMs. 7% improvement in associative memory tasks. |
| A-MEM (2025) | Xu et al. | Zettelkasten-inspired agentic memory with dynamic indexing, linking, and memory evolution. |
| MemGPT (ICLR 2024) | Packer et al. | OS-inspired hierarchical memory tiers for LLMs -- self-editing memory via function calls. |
| MELODI (Google DeepMind, 2024) | Chen et al. | Hierarchical short-term + long-term memory compression. 8x memory footprint reduction. |
| ReadAgent (Google DeepMind, 2024) | Lee et al. | Human-inspired reading agent with episodic gist memories for 20x context extension. |
| LoCoMo (ACL 2024) | Maharana et al. | Benchmark for evaluating very long-term conversational memory (300-turn, 9K-token conversations). |
| Mem0 (2025) | Mem0 team | Production-ready AI agents with scalable long-term memory. 26% accuracy improvement over OpenAI Memory. |
| Zep (2025) | Rasmussen | Temporal knowledge graph architecture for agent memory with bi-temporal data model. |
| Memory in the Age of AI Agents (Survey, 2024) | Hu, Liu et al. | Comprehensive taxonomy of agent memory: factual, experiential, working memory. |
| AriGraph (2024) | Episodic + semantic memory in knowledge graphs for LLM agent exploration. |
Blog Posts and Techniques
- Contextual Retrieval (Anthropic, 2024) -- Prepending chunk-specific context before embedding reduces failed retrievals by 49%. Codemem adapts this as template-based contextual enrichment using metadata + graph relationships.
- Contextual Embeddings Cookbook (Anthropic) -- Implementation guide for contextual embeddings with prompt caching.
Open-Source Projects
- AutoMem -- Graph-vector hybrid memory achieving 90.53% on LoCoMo. Direct inspiration for Codemem's hybrid scoring and consolidation cycles.
- claude-mem -- Persistent memory compression via Claude Agent SDK. Inspired Codemem's lifecycle hooks and observation compression architecture.
- Mem0 -- Production memory layer for AI (47K+ stars). Informed Codemem's memory type design.
- Zep/Graphiti -- Temporal knowledge graph engine. Inspired Codemem's graph persistence model.
- Letta (MemGPT) -- Stateful AI agents with self-editing memory.
- Cognee -- Knowledge graph memory via triplet extraction.
- claude-context -- AST-aware code search via MCP (by Zilliz).
See docs/comparison.md for detailed feature comparisons.
License
Apache 2.0