Durable memory for AI agents — temporal knowledge graph, hybrid retrieval, SQLite-backed.
java-ai-memory.dev · Main repo · JamJet docs · Discord
Engram is a durable memory layer for AI agents. It extracts facts from conversations, stores them in a temporal knowledge graph, and retrieves them with hybrid semantic + keyword search — all backed by a single SQLite file.
It ships in two shapes:
- `jamjet-engram`: a Rust library you embed in your own application.
- `jamjet-engram-server` (this crate): a standalone binary that speaks MCP over stdio and REST over HTTP, so Claude Desktop, Cursor, and any HTTP client can use it with no code.
Engram is provider-agnostic. Five LLM backends are wired in out of the box, and a sixth — command — lets you shell out to any external script, so you can plug in a provider Engram does not ship natively without touching Rust code:
| `ENGRAM_LLM_PROVIDER=` | What it does |
|---|---|
| `ollama` (default) | Local Ollama via `/api/chat`. Free, no API keys, runs on your laptop. |
| `openai-compatible` | Any endpoint that speaks OpenAI's chat-completions protocol; see the long list below. |
| `anthropic` | Anthropic Claude via the Messages API. |
| `google` | Google Gemini via `generateContent` with native JSON mode. |
| `command` | Shell out to a user-supplied script. Infinite extensibility, zero recompile. |
| `mock` | Deterministic, tests-only backend that returns empty facts. |
Pick one with ENGRAM_LLM_PROVIDER=… — the same binary handles all of them.
State of the project, April 2026. Engram is new — v0.3.2, small community, no public LongMemEval / DMR numbers yet. The architecture below works, the tests pass, the Docker image runs. If you need production-scale memory today, Mem0 Cloud and Zep Cloud are more mature. If you need a tryable, self-hostable, single-binary memory layer that doesn't require Python, Postgres, Qdrant, or Neo4j, Engram is built for you.
Why Engram?
| Problem | Engram's answer |
|---|---|
| Every agent memory library is Python-first | Rust core with native Python, Java, and MCP clients — no sidecar required |
| Needs Postgres + Qdrant + Neo4j just to try | Single SQLite file, zero infra |
| Conversation history is not knowledge memory | Fact extraction pipeline — pulls structured facts out of messages |
| Old facts drift and contradict each other | Conflict detection + consolidation — decay, promote, dedup, summarize, reflect |
| Memory recall is either semantic OR keyword | Hybrid retrieval — vector search + SQLite FTS5 in one query |
| Agents lose memory across processes | Durable by default — one SQLite file, crash-safe, portable |
| MCP support is an afterthought | MCP-native — 7 tools exposed by a single binary |
| No time-travel over what the agent knew | Temporal knowledge graph — every fact is scoped and timestamped |
| Can't isolate memory per user or tenant | First-class scopes — org / user / session built into every query |
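The hybrid-retrieval row above pairs a vector ranking with an FTS5 keyword ranking. This README does not say how Engram merges the two, so the sketch below uses reciprocal rank fusion (RRF), a common general-purpose technique, purely as an illustration. The function name, the constant `k=60`, and the fact IDs are all assumptions and not part of Engram's API.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not an Engram setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, fact_id in enumerate(ranking, start=1):
            scores[fact_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda fact_id: (-scores[fact_id], fact_id))

# Hypothetical fact IDs from each retriever, best match first.
vector_hits = ["f2", "f1", "f3"]
keyword_hits = ["f1", "f4", "f2"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Facts that appear in both lists rise to the top, which is the behaviour a hybrid query is after: a fact that is both semantically close and a keyword match outranks one that only satisfies a single retriever.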
Quickstart — 30 seconds
Engram works out of the box with Ollama (the default: free and local), any OpenAI-compatible endpoint, Anthropic Claude, and Google Gemini, and the command backend covers everything else. Pick one of the options below.
Option A — Ollama (free, local, zero API keys)
Requirements: Ollama running locally, with the llama3.2 and nomic-embed-text models pulled (`ollama pull llama3.2` and `ollama pull nomic-embed-text`).
Docker (easiest):
The container defaults to Ollama at host.docker.internal:11434 — works out of the box on Docker Desktop for Mac and Windows. Linux users need --add-host=host.docker.internal:host-gateway.
Option B — Anthropic Claude (hosted, highest quality)
Defaults to claude-haiku-4-5-20251001. Override with -e ENGRAM_ANTHROPIC_MODEL=claude-sonnet-4-6.
Note: Anthropic has no native JSON mode, so Engram parses JSON from text responses. The `llm_util::extract_json_payload` helper strips markdown fences that Claude occasionally emits.
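To make the fence-stripping idea concrete, here is a minimal Python sketch of the same kind of cleanup. It is not the Rust helper's implementation, which this README does not show; the function name `strip_fence_and_parse` and the regular expression are assumptions chosen for illustration.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the listing readable

# A reply that is entirely one fenced block, with an optional language tag.
_FENCED_REPLY = re.compile(
    r"^" + FENCE + r"[A-Za-z0-9_-]*[ \t]*\n(.*?)\n?[ \t]*" + FENCE + r"$",
    re.DOTALL,
)

def strip_fence_and_parse(text):
    """Parse JSON from a model reply, tolerating a surrounding markdown fence."""
    stripped = text.strip()
    match = _FENCED_REPLY.match(stripped)
    if match:
        stripped = match.group(1).strip()
    # Raises json.JSONDecodeError if what remains is still not valid JSON.
    return json.loads(stripped)

fenced_reply = FENCE + 'json\n{"facts": []}\n' + FENCE
parsed = strip_fence_and_parse(fenced_reply)
```

Unfenced replies pass straight through to `json.loads`, so the same call works whether or not the model wrapped its answer.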
Option C — Google Gemini
Defaults to gemini-flash-latest. Override with -e ENGRAM_GOOGLE_MODEL=gemini-2.5-flash.
Option D — Any OpenAI-compatible endpoint
Defaults to OpenAI itself (https://api.openai.com/v1, gpt-4o-mini). Change one env var to point at any of these providers without recompiling:
| Provider | `ENGRAM_OPENAI_BASE_URL=` | Note |
|---|---|---|
| OpenAI | `https://api.openai.com/v1` | default |
| Azure OpenAI | `https://<resource>.openai.azure.com/openai/deployments/<deployment>` | |
| Groq | `https://api.groq.com/openai/v1` | very fast inference |
| Together.ai | `https://api.together.xyz/v1` | |
| Mistral | `https://api.mistral.ai/v1` | |
| DeepSeek | `https://api.deepseek.com/v1` | |
| Perplexity | `https://api.perplexity.ai` | |
| OpenRouter | `https://openrouter.ai/api/v1` | one key, many models |
| Fireworks | `https://api.fireworks.ai/inference/v1` | |
| vLLM (self-hosted) | `http://your-host:8000/v1` | |
| LM Studio | `http://localhost:1234/v1` | local desktop app |
| LocalAI | `http://localhost:8080/v1` | self-hosted |
| Ollama `/v1` compat layer | `http://localhost:11434/v1` | alternate path for Ollama |
| Your corporate LLM gateway | whatever URL your infra team exposes | |
openai is accepted as a backwards-compatible alias for openai-compatible.
Option E — command (shell out to anything)
For providers Engram does not ship natively — an internal model behind a custom RPC, a raw SOAP endpoint, a local inference binary, a quick wrapper over a Python SDK, or tomorrow's new provider — use the command backend. Engram will spawn your script per extraction call, pipe a JSON request to its stdin, and read a JSON response from its stdout.
The contract. Your script reads one JSON object from stdin:
Then writes one JSON value to stdout. Either a bare value:
Or an envelope (useful for surfacing errors):
Exit 0 on success, non-zero (with stderr) on failure.
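Example Python wrapper (a skeleton; your SDK call goes in the middle):

```python
#!/usr/bin/env python3
# my-llm.py: wraps any Python SDK for Engram.
import json
import sys

request = json.load(sys.stdin)  # the one JSON object Engram pipes in
# Call your SDK here; the empty list is only a placeholder result.
result = []
json.dump(result, sys.stdout)  # bare JSON value; exit code 0 means success
```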
Point Engram at it:
Timeout is 120 seconds by default — override with ENGRAM_LLM_COMMAND_TIMEOUT=30.
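Before pointing Engram at a wrapper, it can help to play Engram's side of the contract yourself. The sketch below spawns a command through the shell, pipes one JSON object to its stdin, enforces a timeout, and parses stdout as JSON. It is a test harness under stated assumptions, not Engram's own code: the function name `run_command_backend` and the `{"messages": []}` payload are hypothetical, since the real request fields are not listed here.

```python
import json
import subprocess
import sys

def run_command_backend(command, request, timeout_s=120.0):
    """Run `command` via the shell, send `request` as JSON on stdin, return parsed stdout.

    Raises RuntimeError on a non-zero exit and subprocess.TimeoutExpired on timeout.
    """
    proc = subprocess.run(
        command,
        shell=True,                 # on POSIX this runs the command through sh -c
        input=json.dumps(request),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"command exited {proc.returncode}: {proc.stderr.strip()}")
    # Returned as-is: either a bare value or an envelope, whichever the script wrote.
    return json.loads(proc.stdout)

# Usage: a stand-in "script" that reads its input and answers with an empty list.
stub_cmd = f'"{sys.executable}" -c "import sys, json; json.load(sys.stdin); print(json.dumps([]))"'
result = run_command_backend(stub_cmd, {"messages": []}, timeout_s=30)
```

Swapping `stub_cmd` for the same string you plan to put in `ENGRAM_LLM_COMMAND` exercises the exit-code, stderr, and timeout behaviour without starting the server.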
Security: the `command` provider runs arbitrary commands as the Engram process user. Never use it in a multi-tenant deployment where untrusted users can set `ENGRAM_LLM_COMMAND`. It is a local, single-tenant feature.
Option F — cargo install (all providers same binary)
ENGRAM_LLM_PROVIDER=anthropic ANTHROPIC_API_KEY=...
ENGRAM_LLM_PROVIDER=command ENGRAM_LLM_COMMAND="python /path/to/wrapper.py"
Option G — REST mode for testing
No server. No config. No Python sidecar. One binary.
MCP client configuration
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json)
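Claude Desktop reads MCP servers from an `mcpServers` map in that file, where each entry names a command, its arguments, and optional environment variables. The sketch below generates one such entry in Python. It assumes an `engram` binary on your `PATH` with a `serve` subcommand and passes settings through the environment variables listed under Configuration; the server key `engram` and the database path are placeholders.

```python
import json

def claude_desktop_entry(db_path, provider="ollama"):
    """Build an mcpServers entry that launches a locally installed Engram over stdio."""
    return {
        "mcpServers": {
            "engram": {                      # placeholder server name
                "command": "engram",         # assumes the binary is on PATH
                "args": ["serve"],
                "env": {
                    "ENGRAM_DB_PATH": db_path,
                    "ENGRAM_MODE": "mcp",
                    "ENGRAM_LLM_PROVIDER": provider,
                },
            }
        }
    }

config_text = json.dumps(claude_desktop_entry("/path/to/engram.db"), indent=2)
```

If the config file already lists other servers, merge the `engram` entry into the existing `mcpServers` map rather than overwriting the file.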
Cursor / any MCP-aware IDE
Point it at the same docker run command, or at a locally-installed engram serve binary. After restart, seven memory_* tools are available to the model.
The seven MCP tools
| Tool | What it does |
|---|---|
| `memory_add` | Extract and store facts from conversation messages (calls the LLM for extraction) |
| `memory_recall` | Semantic search over stored facts, scoped by `user_id` / `org_id` |
| `memory_context` | Assemble a token-budgeted context block for an LLM prompt, with tier-aware selection |
| `memory_search` | Keyword search over facts (SQLite FTS5) |
| `memory_forget` | Soft-delete a fact by ID with an optional reason |
| `memory_stats` | Aggregate counts: total facts, valid facts, entities, relationships |
| `memory_consolidate` | Run a consolidation cycle: decay stale facts, promote high-value ones, dedup near-duplicates |
All scoped by (org_id, user_id, session_id) — org is the coarsest, session the finest.
REST API
Nine endpoints: a `/health` liveness probe plus eight rooted at `/v1/memory`. The full surface:
| Method | Path | Handler |
|---|---|---|
| GET | `/health` | Liveness probe |
| POST | `/v1/memory` | Add messages (fact extraction) |
| GET | `/v1/memory/recall?q=…&user_id=…` | Semantic recall |
| POST | `/v1/memory/context` | Token-budgeted context assembly |
| GET | `/v1/memory/search?q=…&user_id=…` | Keyword search |
| GET | `/v1/memory/stats` | Aggregate statistics |
| POST | `/v1/memory/consolidate` | Trigger consolidation |
| DELETE | `/v1/memory/facts/:id` | Forget a fact |
| DELETE | `/v1/memory/users/:id` | GDPR user-data delete |
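As a quick way to poke at the GET endpoints, the sketch below builds recall and search URLs with the standard library and fetches them as JSON. The helper names, the `localhost` host, and the `alice` user ID are assumptions; port 9090 is the documented REST default. Response bodies are returned as parsed JSON without interpretation, because their shape is not described here.

```python
import json
import urllib.parse
import urllib.request

def memory_url(base_url, endpoint, **params):
    """Build a URL under /v1/memory, for example endpoint='recall' with q and user_id."""
    url = f"{base_url.rstrip('/')}/v1/memory/{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url

def get_json(url, timeout_s=10):
    """GET a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.load(response)

recall = memory_url("http://localhost:9090", "recall", q="favorite editor", user_id="alice")
search = memory_url("http://localhost:9090", "search", q="editor", user_id="alice")
```

With a server running in REST mode, `get_json(recall)` performs the semantic recall and `get_json(search)` the keyword search; `urlencode` takes care of escaping spaces and other special characters in the query.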
Configuration
All settings flow through CLI flags or environment variables. Env vars are the recommended way to configure the Docker image.
Core
| CLI flag | Env var | Default | Notes |
|---|---|---|---|
| `--db` | `ENGRAM_DB_PATH` | `engram.db` | SQLite file path |
| `--mode` | `ENGRAM_MODE` | `mcp` | `mcp` (stdio) or `rest` (HTTP) |
| `--port` | `ENGRAM_PORT` | `9090` | HTTP port in REST mode |
| `--llm-provider` | `ENGRAM_LLM_PROVIDER` | `ollama` | `ollama`, `openai-compatible` (alias `openai`), `anthropic`, `google`, `command`, `mock` |
| `--embedding-provider` | `ENGRAM_EMBEDDING_PROVIDER` | `ollama` | `ollama` or `mock` (cloud embedding providers are planned) |
| `--embedding-model` | `ENGRAM_EMBEDDING_MODEL` | `nomic-embed-text` | Ollama embedding model |
| `--embedding-dims` | `ENGRAM_EMBEDDING_DIMS` | `768` | Must match the embedding model's output |
Ollama
| CLI flag | Env var | Default |
|---|---|---|
| `--ollama-url` | `ENGRAM_OLLAMA_URL` | `http://localhost:11434` (Docker image: `http://host.docker.internal:11434`) |
| `--ollama-llm-model` | `ENGRAM_OLLAMA_LLM_MODEL` | `llama3.2` |
OpenAI-compatible (OpenAI, Groq, Together, Mistral, DeepSeek, Azure, vLLM, …)
| CLI flag | Env var | Default |
|---|---|---|
| `--openai-api-key` | `OPENAI_API_KEY` | (required when provider is `openai-compatible`) |
| `--openai-base-url` | `ENGRAM_OPENAI_BASE_URL` | `https://api.openai.com/v1` (change this to switch provider; see the big table above) |
| `--openai-model` | `ENGRAM_OPENAI_MODEL` | `gpt-4o-mini` |
Anthropic
| CLI flag | Env var | Default |
|---|---|---|
| `--anthropic-api-key` | `ANTHROPIC_API_KEY` | (required when provider is `anthropic`) |
| `--anthropic-base-url` | `ENGRAM_ANTHROPIC_BASE_URL` | `https://api.anthropic.com` |
| `--anthropic-model` | `ENGRAM_ANTHROPIC_MODEL` | `claude-haiku-4-5-20251001` |
Google Gemini
| CLI flag | Env var | Default |
|---|---|---|
| `--google-api-key` | `GOOGLE_API_KEY` | (required when provider is `google`) |
| `--google-base-url` | `ENGRAM_GOOGLE_BASE_URL` | `https://generativelanguage.googleapis.com/v1beta` |
| `--google-model` | `ENGRAM_GOOGLE_MODEL` | `gemini-flash-latest` |
Command (shell-out)
| CLI flag | Env var | Default |
|---|---|---|
| `--llm-command` | `ENGRAM_LLM_COMMAND` | (required when provider is `command`), run via `sh -c` |
| `--llm-command-timeout` | `ENGRAM_LLM_COMMAND_TIMEOUT` | `120`, in seconds, before the child is killed |
The `mock` backend returns empty facts and deterministic byte-cycled vectors. It exists for tests and CI. The server prints a clear warning on startup if it detects mock mode in use. API keys and command contents are never echoed to stdout or logs; the server only describes `provider model at base_url` or `command <first 60 chars>`.
Embedding the library
If you don't want a separate process, depend on jamjet-engram directly:
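```toml
[dependencies]
jamjet-engram = "0.3.2"
```

```rust
// Each `…` marks a path or argument not shown here; see the crate docs for exact signatures.
use jamjet_engram::…;

let embedding = Box::new(…);
let memory = …::open(…).await?;
let scope = …::user(…);
let messages = vec![…];
let llm = Box::new(…);
let fact_ids = memory
    .add_messages(…)
    .await?;

// Later, recall
let facts = memory.recall(…).await?;
```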
Other languages
Engram is part of the JamJet ecosystem. These clients speak to engram-server over REST:
| Language | Package | Install |
|---|---|---|
| Python | `jamjet` (includes `EngramClient`) | `pip install jamjet` |
| Java | `dev.jamjet:jamjet-sdk` (includes `EngramClient`) | Maven Central |
| Spring Boot | `dev.jamjet:engram-spring-boot-starter` | Maven Central, zero-config `@Bean EngramMemory` |
| Rust | `jamjet-engram` (embed directly) | `cargo add jamjet-engram` |
See java-ai-memory.dev for how Engram compares to LangChain4j ChatMemory, Spring AI ChatMemory, Koog, Embabel, Google ADK Memory Bank, the Mem0 community wrapper, and Zep — with honest notes on where Engram fits and where the alternatives are more mature.
Architecture
┌──────────────────────────────────────────────────────────┐
│ Clients: Claude Desktop, Cursor, JamJet, │
│ Python SDK, Java SDK, Spring Boot, cURL │
├─────────────────┬────────────────────────────────────────┤
│ MCP (stdio) │ REST (HTTP + JSON) │
├─────────────────┴────────────────────────────────────────┤
│ engram-server │
│ (jamjet-engram-server) │
├──────────────────────────────────────────────────────────┤
│ engram (lib) │
│ Extraction pipeline │ Hybrid retrieval │ Scopes │
│ Conflict detection │ Consolidation │ Context │
├──────────────────────────────────────────────────────────┤
│ LlmClient trait │ EmbeddingProvider trait │
│ Ollama · Mock │ Ollama · Mock │
├──────────────────────────────────────────────────────────┤
│ Storage │
│ SQLite (facts, entities, FTS5, vectors) │
└──────────────────────────────────────────────────────────┘
What Engram is not
- Not a chat-history store. If all you need is the last N messages of a conversation, use LangChain4j `ChatMemory`, Spring AI `ChatMemory`, or any framework's built-in window.
- Not a state checkpointer. If you need to snapshot agent execution state for resume and replay, that's what LangGraph, Koog persistence, or JamJet's own durable runtime does. Pair them with Engram; they solve different problems.
- Not a managed service. No hosted plane, no auth layer beyond scopes, no SLA. Bring your own process manager.
- Not benchmarked yet. LongMemEval and DMR numbers are on the roadmap. Until they exist, treat comparative claims with the skepticism they deserve.
Roadmap
| Status | Item |
|---|---|
| ✅ | Fact extraction pipeline + SQLite storage |
| ✅ | Hybrid retrieval (vector + FTS5 with prefix matching) |
| ✅ | Consolidation engine (decay, promote, dedup, summarize, reflect) |
| ✅ | MCP stdio server (7 tools) |
| ✅ | REST API (9 endpoints) |
| ✅ | Multi-provider LLM backends: Ollama, OpenAI-compatible (OpenAI/Azure/Groq/Together/Mistral/vLLM/LM Studio/…), Anthropic, Google, command shell-out |
| ✅ | Python, Java, Spring Boot clients |
| ✅ | Docker image, MCP Registry publish |
| 🔄 | Postgres backend (in parallel to SQLite) |
| 🔄 | Spring AI ChatMemoryRepository implementation |
| 📋 | LongMemEval + DMR benchmark scores |
| 📋 | Quarkus extension |
| 📋 | Cloud embedding providers (OpenAI text-embedding-3, Google text-embedding-004) |
License
Apache 2.0 — see LICENSE.