Durable memory for AI agents — temporal knowledge graph, hybrid retrieval, SQLite or PostgreSQL.
java-ai-memory.dev · Main repo · JamJet docs · Discord
Engram is a durable memory layer for AI agents. It extracts facts from conversations, stores them in a temporal knowledge graph, and retrieves them with hybrid semantic + keyword search — backed by a single SQLite file or a PostgreSQL database.
It ships in two shapes:
jamjet-engram— a Rust library you embed in your own application.jamjet-engram-server(this crate) — a standalone binary that speaks MCP over stdio and REST over HTTP, so Claude Desktop, Cursor, and any HTTP client can use it with no code.
Engram is provider-agnostic. Five LLM backends are wired in out of the box, and a sixth — command — lets you shell out to any external script, so you can plug in a provider Engram does not ship natively without touching Rust code:
ENGRAM_LLM_PROVIDER= |
What it does |
|---|---|
ollama (default) |
Local Ollama via /api/chat. Free, no API keys, runs on your laptop. |
openai-compatible |
Any endpoint that speaks OpenAI's chat-completions protocol — see the long list below. |
anthropic |
Anthropic Claude via the Messages API. |
google |
Google Gemini via generateContent with native JSON mode. |
command |
Shell out to a user-supplied script. Infinite extensibility, zero recompile. |
mock |
Deterministic tests-only backend — returns empty facts. |
Pick one with ENGRAM_LLM_PROVIDER=… — the same binary handles all of them.
State of the project, April 2026. Engram is new — v0.3.2, small community, no public LongMemEval / DMR numbers yet. The architecture below works, the tests pass, the Docker image runs. If you need production-scale memory today, Mem0 Cloud and Zep Cloud are more mature. If you need a tryable, self-hostable, single-binary memory layer that doesn't require Python, Postgres, Qdrant, or Neo4j, Engram is built for you.
Why Engram?
| Problem | Engram's answer |
|---|---|
| Every agent memory library is Python-first | Rust core with native Python, Java, and MCP clients — no sidecar required |
| Needs Postgres + Qdrant + Neo4j just to try | Single SQLite file (zero infra) or Postgres when you need it |
| Conversation history is not knowledge memory | Fact extraction pipeline — pulls structured facts out of messages |
| Old facts drift and contradict each other | Conflict detection + consolidation — decay, promote, dedup, summarize, reflect |
| Memory recall is either semantic OR keyword | Hybrid retrieval — vector search + SQLite FTS5 in one query |
| Agents lose memory across processes | Durable by default — SQLite or Postgres, crash-safe, portable |
| MCP support is an afterthought | MCP-native — 7 tools exposed by a single binary |
| No time-travel over what the agent knew | Temporal knowledge graph — every fact is scoped and timestamped |
| Can't isolate memory per user or tenant | First-class scopes — org / user / session built into every query |
Quickstart — 30 seconds
Engram speaks to four LLM providers out of the box: Ollama (default — free, local), OpenAI, Anthropic Claude, and Google Gemini. Pick one.
Option A — Ollama (free, local, zero API keys)
Requirements: Ollama with llama3.2 and nomic-embed-text pulled:
Docker (easiest):
The container defaults to Ollama at host.docker.internal:11434 — works out of the box on Docker Desktop for Mac and Windows. Linux users need --add-host=host.docker.internal:host-gateway.
Option B — Anthropic Claude (hosted, highest quality)
Defaults to claude-haiku-4-5-20251001. Override with -e ENGRAM_ANTHROPIC_MODEL=claude-sonnet-4-6.
Note: Anthropic has no native JSON mode, so Engram parses JSON from text responses. The
llm_util::extract_json_payloadhelper strips markdown fences that Claude occasionally emits.
Option C — Google Gemini
Defaults to gemini-flash-latest. Override with -e ENGRAM_GOOGLE_MODEL=gemini-2.5-flash.
Option D — Any OpenAI-compatible endpoint
Defaults to OpenAI itself (https://api.openai.com/v1, gpt-4o-mini). Change one env var to point at any of these providers without recompiling:
| Provider | ENGRAM_OPENAI_BASE_URL= |
Note |
|---|---|---|
| OpenAI | https://api.openai.com/v1 |
default |
| Azure OpenAI | https://<resource>.openai.azure.com/openai/deployments/<deployment> |
|
| Groq | https://api.groq.com/openai/v1 |
very fast inference |
| Together.ai | https://api.together.xyz/v1 |
|
| Mistral | https://api.mistral.ai/v1 |
|
| DeepSeek | https://api.deepseek.com/v1 |
|
| Perplexity | https://api.perplexity.ai |
|
| OpenRouter | https://openrouter.ai/api/v1 |
one key, many models |
| Fireworks | https://api.fireworks.ai/inference/v1 |
|
| vLLM (self-hosted) | http://your-host:8000/v1 |
|
| LM Studio | http://localhost:1234/v1 |
local desktop app |
| LocalAI | http://localhost:8080/v1 |
self-hosted |
Ollama /v1 compat layer |
http://localhost:11434/v1 |
alternate path for Ollama |
| Your corporate LLM gateway | whatever URL your infra team exposes |
openai is accepted as a backwards-compatible alias for openai-compatible.
Option E — command (shell out to anything)
For providers Engram does not ship natively — an internal model behind a custom RPC, a raw SOAP endpoint, a local inference binary, a quick wrapper over a Python SDK, or tomorrow's new provider — use the command backend. Engram will spawn your script per extraction call, pipe a JSON request to its stdin, and read a JSON response from its stdout.
The contract. Your script reads one JSON object from stdin:
Then writes one JSON value to stdout. Either a bare value:
Or an envelope (useful for surfacing errors):
Exit 0 on success, non-zero (with stderr) on failure.
Example Python wrapper (~15 lines):
#!/usr/bin/env python3
# my-llm.py — wraps any Python SDK for Engram.
# your SDK
=
=
Point Engram at it:
Timeout is 120 seconds by default — override with ENGRAM_LLM_COMMAND_TIMEOUT=30.
Security: the
commandprovider runs arbitrary commands as the Engram process user. Never use it in a multi-tenant deployment where untrusted users can setENGRAM_LLM_COMMAND. It is a local and single-tenant feature.
Option F — cargo install (all providers same binary)
ENGRAM_LLM_PROVIDER=anthropic ANTHROPIC_API_KEY=...
ENGRAM_LLM_PROVIDER=command ENGRAM_LLM_COMMAND="python /path/to/wrapper.py"
Option G — REST mode for testing
No server. No config. No Python sidecar. One binary.
MCP client configuration
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json)
Cursor / any MCP-aware IDE
Point it at the same docker run command, or at a locally-installed engram serve binary. After restart, eleven tools are available to the model (seven memory_* + four messages_*).
MCP tools
Memory tools (7)
| Tool | What it does |
|---|---|
memory_add |
Extract and store facts from conversation messages (calls the LLM for extraction) |
memory_recall |
Semantic search over stored facts, scoped by user_id / org_id |
memory_context |
Assemble a token-budgeted context block for an LLM prompt, with tier-aware selection |
memory_search |
Keyword search over facts (SQLite FTS5 / Postgres full-text) |
memory_forget |
Soft-delete a fact by ID with an optional reason |
memory_stats |
Aggregate counts: total facts, valid facts, entities, relationships |
memory_consolidate |
Run a consolidation cycle — decay stale facts, promote high-value ones, dedup near-duplicates |
All scoped by (org_id, user_id, session_id) — org is the coarsest, session the finest.
Message store tools (4)
| Tool | What it does |
|---|---|
messages_save |
Save chat messages for a conversation (optionally triggers fact extraction) |
messages_get |
Get all messages for a conversation by ID |
messages_list |
List all conversation IDs |
messages_delete |
Delete all messages for a conversation |
REST API
Thirteen endpoints, rooted at /v1/memory. Full OpenAPI-style surface:
| Method | Path | Handler |
|---|---|---|
| GET | /health |
Liveness probe |
| POST | /v1/memory |
Add messages (fact extraction) |
| GET | /v1/memory/recall?q=…&user_id=… |
Semantic recall |
| POST | /v1/memory/context |
Token-budgeted context assembly |
| GET | /v1/memory/search?q=…&user_id=… |
Keyword search |
| GET | /v1/memory/stats |
Aggregate statistics |
| POST | /v1/memory/consolidate |
Trigger consolidation |
| DELETE | /v1/memory/facts/:id |
Forget a fact |
| DELETE | /v1/memory/users/:id |
GDPR user-data delete |
| POST | /v1/memory/messages |
Save messages for a conversation |
| GET | /v1/memory/messages |
List conversation IDs |
| GET | /v1/memory/messages/:id |
Get messages for a conversation |
| DELETE | /v1/memory/messages/:id |
Delete a conversation |
Configuration
All settings flow through CLI flags or environment variables. Env vars are the recommended way to configure the Docker image.
Core
| CLI flag | Env var | Default | Notes |
|---|---|---|---|
--db |
ENGRAM_DB_PATH |
engram.db |
SQLite file path or postgres://… connection URL |
--extract-on-save |
ENGRAM_EXTRACT_ON_SAVE |
true |
Enable fact extraction when saving chat messages |
--mode |
ENGRAM_MODE |
mcp |
mcp (stdio) or rest (HTTP) |
--port |
ENGRAM_PORT |
9090 |
HTTP port in REST mode |
--llm-provider |
ENGRAM_LLM_PROVIDER |
ollama |
ollama, openai-compatible (alias openai), anthropic, google, command, mock |
--embedding-provider |
ENGRAM_EMBEDDING_PROVIDER |
ollama |
ollama or mock (cloud embedding providers are planned) |
--embedding-model |
ENGRAM_EMBEDDING_MODEL |
nomic-embed-text |
Ollama embedding model |
--embedding-dims |
ENGRAM_EMBEDDING_DIMS |
768 |
Must match the embedding model's output |
Ollama
| CLI flag | Env var | Default |
|---|---|---|
--ollama-url |
ENGRAM_OLLAMA_URL |
http://localhost:11434 (Docker image: http://host.docker.internal:11434) |
--ollama-llm-model |
ENGRAM_OLLAMA_LLM_MODEL |
llama3.2 |
OpenAI-compatible (OpenAI, Groq, Together, Mistral, DeepSeek, Azure, vLLM, …)
| CLI flag | Env var | Default |
|---|---|---|
--openai-api-key |
OPENAI_API_KEY |
(required when provider is openai-compatible) |
--openai-base-url |
ENGRAM_OPENAI_BASE_URL |
https://api.openai.com/v1 — change this to switch provider, see the big table above |
--openai-model |
ENGRAM_OPENAI_MODEL |
gpt-4o-mini |
Anthropic
| CLI flag | Env var | Default |
|---|---|---|
--anthropic-api-key |
ANTHROPIC_API_KEY |
(required when provider is anthropic) |
--anthropic-base-url |
ENGRAM_ANTHROPIC_BASE_URL |
https://api.anthropic.com |
--anthropic-model |
ENGRAM_ANTHROPIC_MODEL |
claude-haiku-4-5-20251001 |
Google Gemini
| CLI flag | Env var | Default |
|---|---|---|
--google-api-key |
GOOGLE_API_KEY |
(required when provider is google) |
--google-base-url |
ENGRAM_GOOGLE_BASE_URL |
https://generativelanguage.googleapis.com/v1beta |
--google-model |
ENGRAM_GOOGLE_MODEL |
gemini-flash-latest |
Command (shell-out)
| CLI flag | Env var | Default |
|---|---|---|
--llm-command |
ENGRAM_LLM_COMMAND |
(required when provider is command) — run via sh -c |
--llm-command-timeout |
ENGRAM_LLM_COMMAND_TIMEOUT |
120 — seconds before the child is killed |
The
mockbackend returns empty facts and deterministic byte-cycled vectors. It exists for tests and CI. The server prints a clear warning on startup if it detects mock mode in use. API keys and command contents are never echoed to stdout or logs — the server only describesprovider model at base_urlorcommand \<first 60 chars>``.
Database Backend
Engram supports SQLite (default) and PostgreSQL backends. The backend is selected by the --db URL:
# SQLite (default)
# PostgreSQL
When using PostgreSQL, tables are created automatically on first startup.
Chat Message Store
Engram can store raw chat messages alongside extracted facts. This enables Spring AI ChatMemoryRepository integration.
REST Endpoints:
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/memory/messages |
Save messages for a conversation |
| GET | /v1/memory/messages |
List conversation IDs |
| GET | /v1/memory/messages/{id} |
Get messages for a conversation |
| DELETE | /v1/memory/messages/{id} |
Delete a conversation |
MCP Tools: messages_save, messages_get, messages_list, messages_delete
Fact extraction on message save is controlled by --extract-on-save (default: true).
Embedding the library
If you don't want a separate process, depend on jamjet-engram directly:
[]
= "0.3.2"
use ;
let embedding = Boxnew;
let memory = open.await?;
let scope = user;
let messages = vec!;
let llm = Boxnew;
let fact_ids = memory
.add_messages
.await?;
// Later, recall
let facts = memory.recall.await?;
Other languages
Engram is part of the JamJet ecosystem. These clients speak to engram-server over REST:
| Language | Package | Install |
|---|---|---|
| Python | jamjet (includes EngramClient) |
pip install jamjet |
| Java | dev.jamjet:jamjet-sdk (includes EngramClient) |
Maven Central |
| Spring Boot | dev.jamjet:engram-spring-boot-starter |
Maven Central — zero-config @Bean EngramMemory |
| Rust | jamjet-engram (embed directly) |
cargo add jamjet-engram |
See java-ai-memory.dev for how Engram compares to LangChain4j ChatMemory, Spring AI ChatMemory, Koog, Embabel, Google ADK Memory Bank, the Mem0 community wrapper, and Zep — with honest notes on where Engram fits and where the alternatives are more mature.
Architecture
┌──────────────────────────────────────────────────────────┐
│ Clients: Claude Desktop, Cursor, JamJet, │
│ Python SDK, Java SDK, Spring Boot, cURL │
├─────────────────┬────────────────────────────────────────┤
│ MCP (stdio) │ REST (HTTP + JSON) │
├─────────────────┴────────────────────────────────────────┤
│ engram-server │
│ (jamjet-engram-server) │
├──────────────────────────────────────────────────────────┤
│ engram (lib) │
│ Extraction pipeline │ Hybrid retrieval │ Scopes │
│ Conflict detection │ Consolidation │ Context │
├──────────────────────────────────────────────────────────┤
│ LlmClient trait │ EmbeddingProvider trait │
│ Ollama · Mock │ Ollama · Mock │
├──────────────────────────────────────────────────────────┤
│ Storage │
│ SQLite or PostgreSQL (facts, entities, FTS, vectors) │
└──────────────────────────────────────────────────────────┘
What Engram is not
- Not just a chat-history store. Engram now includes a message store (see Chat Message Store), but its primary value is the fact extraction and knowledge graph layer on top. If all you need is the last N messages of a conversation, a framework's built-in window may be simpler.
- Not a state checkpointer. If you need to snapshot agent execution state for resume and replay, that's what LangGraph, Koog persistence, or JamJet's own durable runtime does. Pair them with Engram — they solve different problems.
- Not a managed service. No hosted plane, no auth layer beyond scopes, no SLA. Bring your own process manager.
- Not benchmarked yet. LongMemEval and DMR numbers are on the roadmap. Until they exist, treat comparative claims with the skepticism they deserve.
Roadmap
| Status | Item |
|---|---|
| ✅ | Fact extraction pipeline + SQLite storage |
| ✅ | Hybrid retrieval (vector + FTS5 with prefix matching) |
| ✅ | Consolidation engine (decay, promote, dedup, summarize, reflect) |
| ✅ | MCP stdio server (7 tools) |
| ✅ | REST API (9 endpoints) |
| ✅ | Multi-provider LLM backends: Ollama, OpenAI-compatible (OpenAI/Azure/Groq/Together/Mistral/vLLM/LM Studio/…), Anthropic, Google, command shell-out |
| ✅ | Python, Java, Spring Boot clients |
| ✅ | Docker image, MCP Registry publish |
| ✅ | Postgres backend (in parallel to SQLite) |
| ✅ | Chat message store + Spring AI ChatMemoryRepository integration |
| 📋 | LongMemEval + DMR benchmark scores |
| 📋 | Quarkus extension |
| 📋 | Cloud embedding providers (OpenAI text-embedding-3, Google text-embedding-004) |
License
Apache 2.0 — see LICENSE.