<div align="center">
<h1>⚡ Engram</h1>
**Durable memory for AI agents — temporal knowledge graph, hybrid retrieval, SQLite-backed.**
[crates.io: jamjet-engram](https://crates.io/crates/jamjet-engram) · [crates.io: jamjet-engram-server](https://crates.io/crates/jamjet-engram-server) · [Docker image](https://github.com/jamjet-labs/jamjet/pkgs/container/engram-server) · [MCP Registry](https://registry.modelcontextprotocol.io/servers/io.github.jamjet-labs/engram-server) · [License](../../LICENSE)
[java-ai-memory.dev](https://java-ai-memory.dev) · [Main repo](https://github.com/jamjet-labs/jamjet) · [JamJet docs](https://docs.jamjet.dev) · [Discord](https://discord.gg/SAYnEj86fr)
</div>
---
Engram is a **durable memory layer for AI agents**. It extracts facts from conversations, stores them in a temporal knowledge graph, and retrieves them with hybrid semantic + keyword search — all backed by a single SQLite file.
It ships in two shapes:
- **`jamjet-engram`** — a Rust library you embed in your own application.
- **`jamjet-engram-server`** (this crate) — a standalone binary that speaks **MCP over stdio** and **REST over HTTP**, so Claude Desktop, Cursor, and any HTTP client can use it with no code.
Engram is **provider-agnostic**. Five LLM backends are wired in out of the box, and a sixth — `command` — lets you shell out to any external script, so you can plug in a provider Engram does not ship natively without touching Rust code:
| Provider | Description |
|---|---|
| `ollama` (default) | Local Ollama via `/api/chat`. Free, no API keys, runs on your laptop. |
| `openai-compatible` | **Any endpoint that speaks OpenAI's chat-completions protocol** — see the long list below. |
| `anthropic` | Anthropic Claude via the Messages API. |
| `google` | Google Gemini via `generateContent` with native JSON mode. |
| `command` | Shell out to a user-supplied script. Infinite extensibility, zero recompile. |
| `mock` | Deterministic tests-only backend — returns empty facts. |
Pick one with `ENGRAM_LLM_PROVIDER=…` — the same binary handles all of them.
> **State of the project, April 2026.** Engram is new — v0.3.2, small community, no public LongMemEval / DMR numbers yet. The architecture below works, the tests pass, the Docker image runs. If you need production-scale memory today, Mem0 Cloud and Zep Cloud are more mature. If you need a tryable, self-hostable, single-binary memory layer that doesn't require Python, Postgres, Qdrant, or Neo4j, Engram is built for you.
## Why Engram?
| The problem | How Engram addresses it |
|---|---|
| Every agent memory library is Python-first | **Rust core** with native Python, Java, and MCP clients — no sidecar required |
| Needs Postgres + Qdrant + Neo4j just to try | **Single SQLite file**, zero infra |
| Conversation history is not knowledge memory | **Fact extraction pipeline** — pulls structured facts out of messages |
| Old facts drift and contradict each other | **Conflict detection + consolidation** — decay, promote, dedup, summarize, reflect |
| Memory recall is either semantic OR keyword | **Hybrid retrieval** — vector search + SQLite FTS5 in one query |
| Agents lose memory across processes | **Durable by default** — one SQLite file, crash-safe, portable |
| MCP support is an afterthought | **MCP-native** — 7 tools exposed by a single binary |
| No time-travel over what the agent knew | **Temporal knowledge graph** — every fact is scoped and timestamped |
| Can't isolate memory per user or tenant | **First-class scopes** — org / user / session built into every query |
---
## Quickstart — 30 seconds
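Engram speaks to four LLM providers out of the box: **Ollama** (default; free, local), **OpenAI** and other OpenAI-compatible endpoints, **Anthropic Claude**, and **Google Gemini**. Pick one of the options below; Option E covers anything else via the `command` backend.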
### Option A — Ollama (free, local, zero API keys)
**Requirements:** [Ollama](https://ollama.com/) with `llama3.2` and `nomic-embed-text` pulled:
```bash
ollama pull llama3.2
ollama pull nomic-embed-text
```
**Docker (easiest):**
```bash
docker run --rm -i \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2
```
The container defaults to Ollama at `host.docker.internal:11434` — works out of the box on Docker Desktop for Mac and Windows. **Linux users** need `--add-host=host.docker.internal:host-gateway`.
### Option B — Anthropic Claude (hosted, highest quality)
```bash
docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=anthropic \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2
```
Defaults to `claude-haiku-4-5-20251001`. Override with `-e ENGRAM_ANTHROPIC_MODEL=claude-sonnet-4-6`.
> Note: Anthropic has no native JSON mode, so Engram parses JSON from text responses. The `llm_util::extract_json_payload` helper strips markdown fences that Claude occasionally emits.
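To illustrate the kind of cleanup this involves, here is a minimal Python sketch of stripping a markdown fence before parsing JSON. It is not Engram's `extract_json_payload` implementation (that lives in the Rust crate); the function name `strip_fences_and_parse` and its exact behavior are assumptions made for illustration.

```python
import json
import re

# Matches an optional ```lang fence wrapped around the whole payload.
_FENCE = re.compile(r"^\s*```[a-zA-Z0-9_-]*\s*\n(.*?)\n?\s*```\s*$", re.DOTALL)


def strip_fences_and_parse(text: str):
    """Parse JSON from a model reply, tolerating a surrounding markdown fence.

    Hypothetical helper for illustration; not Engram's actual code.
    """
    match = _FENCE.match(text)
    # Fall back to the raw text when no fence wraps the reply.
    payload = match.group(1) if match else text
    return json.loads(payload.strip())
```

A reply that is already bare JSON passes through unchanged, so the same call works for providers with and without a native JSON mode.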
### Option C — Google Gemini
```bash
docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=google \
  -e GOOGLE_API_KEY=AIza... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2
```
Defaults to `gemini-flash-latest`. Override with `-e ENGRAM_GOOGLE_MODEL=gemini-2.5-flash`.
### Option D — Any OpenAI-compatible endpoint
```bash
docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=openai-compatible \
  -e OPENAI_API_KEY=sk-... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2
```
Defaults to OpenAI itself (`https://api.openai.com/v1`, `gpt-4o-mini`). **Set `ENGRAM_OPENAI_BASE_URL` to point at any of these providers without recompiling** (you will usually also want a matching `ENGRAM_OPENAI_MODEL`):
| Provider | Base URL | Notes |
|---|---|---|
| OpenAI | `https://api.openai.com/v1` | default |
| Azure OpenAI | `https://<resource>.openai.azure.com/openai/deployments/<deployment>` | |
| Groq | `https://api.groq.com/openai/v1` | very fast inference |
| Together.ai | `https://api.together.xyz/v1` | |
| Mistral | `https://api.mistral.ai/v1` | |
| DeepSeek | `https://api.deepseek.com/v1` | |
| Perplexity | `https://api.perplexity.ai` | |
| OpenRouter | `https://openrouter.ai/api/v1` | one key, many models |
| Fireworks | `https://api.fireworks.ai/inference/v1` | |
| vLLM (self-hosted) | `http://your-host:8000/v1` | |
| LM Studio | `http://localhost:1234/v1` | local desktop app |
| LocalAI | `http://localhost:8080/v1` | self-hosted |
| Ollama `/v1` compat layer | `http://localhost:11434/v1` | alternate path for Ollama |
| Your corporate LLM gateway | whatever URL your infra team exposes | |
`openai` is accepted as a backwards-compatible alias for `openai-compatible`.
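Before pointing Engram at a new endpoint, it can help to confirm that the base URL really answers chat-completions requests. The sketch below builds such a request with the Python standard library. The `/chat/completions` path and Bearer auth follow the OpenAI protocol; the helper name `build_chat_request` is hypothetical, and some gateways may expect different auth headers.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str) -> urllib.request.Request:
    """Build a minimal chat-completions request against an OpenAI-style base URL."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": "Reply with the word ok."}],
    }
    return urllib.request.Request(
        # Tolerate a trailing slash on the configured base URL.
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=30)` sends it; an HTTP 200 with a JSON body suggests the same base URL should work as `ENGRAM_OPENAI_BASE_URL`.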
### Option E — `command` (shell out to anything)
For providers Engram does not ship natively — an internal model behind a custom RPC, a raw SOAP endpoint, a local inference binary, a quick wrapper over a Python SDK, or tomorrow's new provider — use the `command` backend. Engram will spawn your script per extraction call, pipe a JSON request to its stdin, and read a JSON response from its stdout.
**The contract.** Your script reads one JSON object from stdin:
```json
{"system": "<extraction prompt>", "user": "<conversation text>", "structured": true}
```
It then writes one JSON value to stdout. This can be a bare value:
```json
{"facts": [{"text": "...", "entities": [...], "confidence": 0.95, "category": "..."}]}
```
Or an envelope carrying either `content` or `error` (useful for surfacing errors):
```json
{"content": {"facts": [...]}}
{"error": "rate limited, try again"}
```
Exit 0 on success, non-zero (with stderr) on failure.
**Example Python wrapper (~15 lines):**
```python
#!/usr/bin/env python3
# my-llm.py: wraps any Python SDK for Engram.
import json
import sys

from my_llm_sdk import chat  # your SDK

# Engram pipes one JSON request object to stdin.
req = json.loads(sys.stdin.read())
resp = chat(
    system=req["system"],
    user=req["user"],
    json_mode=req.get("structured", False),
)
# Write exactly one JSON value to stdout; an uncaught exception exits non-zero.
sys.stdout.write(json.dumps(resp))
```
**Point Engram at it:**
```bash
docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=command \
  -e ENGRAM_LLM_COMMAND="python /scripts/my-llm.py" \
  -v "$(pwd)/scripts:/scripts" \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2
```
Timeout is 120 seconds by default — override with `ENGRAM_LLM_COMMAND_TIMEOUT=30`.
> **Security:** the `command` provider runs **arbitrary commands** as the Engram process user. Never use it in a multi-tenant deployment where untrusted users can set `ENGRAM_LLM_COMMAND`. It is a local and single-tenant feature.
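A script can be exercised against this contract before Engram ever spawns it. The harness below is a sketch, not Engram's own runner: it mirrors the documented behavior (run via `sh -c`, one JSON request on stdin, one JSON value on stdout, a timeout), while the function name `run_llm_command` and the way it tells an envelope from a bare value are assumptions.

```python
import json
import subprocess


def run_llm_command(command: str, system: str, user: str, timeout: float = 120.0):
    """Run `command` via `sh -c`, send one JSON request, return the parsed payload.

    Raises RuntimeError on a non-zero exit or an {"error": ...} envelope.
    """
    request = json.dumps({"system": system, "user": user, "structured": True})
    proc = subprocess.run(
        ["sh", "-c", command],
        input=request,
        capture_output=True,
        text=True,
        timeout=timeout,  # subprocess.TimeoutExpired is raised if the child runs too long
    )
    if proc.returncode != 0:
        raise RuntimeError(f"exit {proc.returncode}: {proc.stderr.strip()}")
    value = json.loads(proc.stdout)
    # Assumed envelope detection: a dict carrying "error" or "content".
    if isinstance(value, dict) and "error" in value:
        raise RuntimeError(value["error"])
    if isinstance(value, dict) and "content" in value:
        return value["content"]
    return value  # bare value
```

For example, `run_llm_command("python /path/to/wrapper.py", "extract facts", "I love sourdough.", timeout=30)` returns whatever facts payload the wrapper produced, or raises with the script's stderr.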
### Option F — cargo install (all providers, same binary)
```bash
cargo install jamjet-engram-server
engram serve # ollama (default)
ENGRAM_LLM_PROVIDER=anthropic ANTHROPIC_API_KEY=... engram serve
ENGRAM_LLM_PROVIDER=command ENGRAM_LLM_COMMAND="python /path/to/wrapper.py" engram serve
```
### Option G — REST mode for testing
```bash
engram serve --mode rest --port 9090
```
```bash
curl -X POST http://localhost:9090/v1/memory \
  -H 'content-type: application/json' \
  -d '{
    "user_id": "alice",
    "messages": [
      {"role": "user", "content": "I am allergic to peanuts and I love sourdough."}
    ]
  }'
curl 'http://localhost:9090/v1/memory/recall?q=food%20allergies&user_id=alice'
```
No server. No config. No Python sidecar. One binary.
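The two `curl` calls above translate directly to Python's standard library. This is a sketch that assumes the server is running in REST mode on port 9090 and that responses are JSON; the helper names are hypothetical, and the response schema is not specified here, so `send` simply returns the decoded body.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9090"  # matches `engram serve --mode rest --port 9090`


def add_request(user_id: str, content: str) -> urllib.request.Request:
    """POST /v1/memory with a single user message."""
    body = {"user_id": user_id, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        BASE_URL + "/v1/memory",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recall_request(user_id: str, query: str) -> urllib.request.Request:
    """GET /v1/memory/recall with a URL-encoded query string."""
    params = urllib.parse.urlencode(
        {"q": query, "user_id": user_id}, quote_via=urllib.parse.quote
    )
    return urllib.request.Request(BASE_URL + "/v1/memory/recall?" + params, method="GET")


def send(req: urllib.request.Request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Typical use would be `send(add_request("alice", "I am allergic to peanuts."))` followed by `send(recall_request("alice", "food allergies"))`.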
---
## MCP client configuration
### Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_config.json`)
```json
{
  "mcpServers": {
    "engram": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-v", "engram-data:/data",
        "ghcr.io/jamjet-labs/engram-server:0.3.2"
      ]
    }
  }
}
```
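If the config file already lists other MCP servers, the `engram` entry has to be merged in rather than pasted over them. The following sketch shows one way to do that; the helper names are hypothetical and the entry mirrors the JSON above.

```python
import json
from pathlib import Path

# Same entry as the JSON snippet above.
ENGRAM_ENTRY = {
    "command": "docker",
    "args": ["run", "--rm", "-i", "-v", "engram-data:/data",
             "ghcr.io/jamjet-labs/engram-server:0.3.2"],
}


def add_engram_server(config: dict) -> dict:
    """Return a copy of `config` with the engram entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["engram"] = ENGRAM_ENTRY
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the config (if present), add the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_engram_server(config), indent=2) + "\n")
```

Existing servers and unrelated settings are left untouched; only the `engram` key is added or overwritten.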
### Cursor / any MCP-aware IDE
Point it at the same `docker run` command, or at a locally-installed `engram serve` binary. After restart, seven `memory_*` tools are available to the model.
---
## The seven MCP tools
| Tool | Description |
|---|---|
| `memory_add` | Extract and store facts from conversation messages (calls the LLM for extraction) |
| `memory_recall` | Semantic search over stored facts, scoped by `user_id` / `org_id` |
| `memory_context` | Assemble a token-budgeted context block for an LLM prompt, with tier-aware selection |
| `memory_search` | Keyword search over facts (SQLite FTS5) |
| `memory_forget` | Soft-delete a fact by ID with an optional reason |
| `memory_stats` | Aggregate counts: total facts, valid facts, entities, relationships |
| `memory_consolidate` | Run a consolidation cycle — decay stale facts, promote high-value ones, dedup near-duplicates |
All seven tools are scoped by `(org_id, user_id, session_id)`: org is the coarsest scope, session the finest.
---
## REST API
Nine endpoints: a `/health` liveness probe plus eight rooted at `/v1/memory`. The full surface:
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Liveness probe |
| POST | `/v1/memory` | Add messages (fact extraction) |
| GET | `/v1/memory/recall?q=…&user_id=…` | Semantic recall |
| POST | `/v1/memory/context` | Token-budgeted context assembly |
| GET | `/v1/memory/search?q=…&user_id=…` | Keyword search |
| GET | `/v1/memory/stats` | Aggregate statistics |
| POST | `/v1/memory/consolidate` | Trigger consolidation |
| DELETE | `/v1/memory/facts/:id` | Forget a fact |
| DELETE | `/v1/memory/users/:id` | GDPR user-data delete |
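The body-less GET and DELETE routes in the table can be driven with a few lines of standard-library Python. This is a sketch under assumptions: the helper names are hypothetical, IDs are treated as opaque strings, and the POST routes are left out because their request bodies are not described here.

```python
import urllib.parse
import urllib.request


def endpoint(base_url: str, method: str, path: str, **query) -> urllib.request.Request:
    """Build a body-less request for one of the GET/DELETE endpoints in the table."""
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(url, method=method)


def forget_fact(base_url: str, fact_id: str) -> urllib.request.Request:
    # DELETE /v1/memory/facts/:id
    return endpoint(base_url, "DELETE", "/v1/memory/facts/" + urllib.parse.quote(fact_id, safe=""))


def delete_user(base_url: str, user_id: str) -> urllib.request.Request:
    # DELETE /v1/memory/users/:id
    return endpoint(base_url, "DELETE", "/v1/memory/users/" + urllib.parse.quote(user_id, safe=""))


def keyword_search(base_url: str, query: str, user_id: str) -> urllib.request.Request:
    # GET /v1/memory/search?q=...&user_id=...
    return endpoint(base_url, "GET", "/v1/memory/search", q=query, user_id=user_id)
```

Each helper only builds the request; pass it to `urllib.request.urlopen` to send it.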
---
## Configuration
All settings flow through CLI flags or environment variables. Env vars are the recommended way to configure the Docker image.
### Core
| Flag | Env var | Default | Description |
|---|---|---|---|
| `--db` | `ENGRAM_DB_PATH` | `engram.db` | SQLite file path |
| `--mode` | `ENGRAM_MODE` | `mcp` | `mcp` (stdio) or `rest` (HTTP) |
| `--port` | `ENGRAM_PORT` | `9090` | HTTP port in REST mode |
| `--llm-provider` | `ENGRAM_LLM_PROVIDER` | `ollama` | `ollama`, `openai-compatible` (alias `openai`), `anthropic`, `google`, `command`, `mock` |
| `--embedding-provider` | `ENGRAM_EMBEDDING_PROVIDER` | `ollama` | `ollama` or `mock` (cloud embedding providers are planned) |
| `--embedding-model` | `ENGRAM_EMBEDDING_MODEL` | `nomic-embed-text` | Ollama embedding model |
| `--embedding-dims` | `ENGRAM_EMBEDDING_DIMS` | `768` | Must match the embedding model's output |
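Because `ENGRAM_EMBEDDING_DIMS` must match what the embedding model actually returns, a quick probe can catch a mismatch before any facts are stored. The sketch below assumes Ollama's `/api/embeddings` endpoint (request fields `model` and `prompt`, response field `embedding`); the helper names are hypothetical.

```python
import json
import os
import urllib.request


def fetch_embedding(ollama_url: str, model: str, text: str = "dimension probe") -> list:
    """Ask Ollama for one embedding (assumes its /api/embeddings endpoint)."""
    req = urllib.request.Request(
        ollama_url.rstrip("/") + "/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["embedding"]


def dims_match(embedding: list, configured_dims: int) -> bool:
    """True when the vector length equals the configured ENGRAM_EMBEDDING_DIMS."""
    return len(embedding) == configured_dims


def check_from_env(env=os.environ) -> bool:
    """Compare the live model output against the configured dimension count."""
    vector = fetch_embedding(
        env.get("ENGRAM_OLLAMA_URL", "http://localhost:11434"),
        env.get("ENGRAM_EMBEDDING_MODEL", "nomic-embed-text"),
    )
    return dims_match(vector, int(env.get("ENGRAM_EMBEDDING_DIMS", "768")))
```

Calling `check_from_env()` with Ollama running returns `False` when the model and the configured dimension count disagree, which is a cue to fix one of the two settings.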
### Ollama
| Flag | Env var | Default |
|---|---|---|
| `--ollama-url` | `ENGRAM_OLLAMA_URL` | `http://localhost:11434` (Docker image: `http://host.docker.internal:11434`) |
| `--ollama-llm-model` | `ENGRAM_OLLAMA_LLM_MODEL` | `llama3.2` |
### OpenAI-compatible (OpenAI, Groq, Together, Mistral, DeepSeek, Azure, vLLM, …)
| Flag | Env var | Default |
|---|---|---|
| `--openai-api-key` | `OPENAI_API_KEY` | *(required when provider is `openai-compatible`)* |
| `--openai-base-url` | `ENGRAM_OPENAI_BASE_URL` | `https://api.openai.com/v1` — **change this to switch provider**, see the provider table under Option D |
| `--openai-model` | `ENGRAM_OPENAI_MODEL` | `gpt-4o-mini` |
### Anthropic
| Flag | Env var | Default |
|---|---|---|
| `--anthropic-api-key` | `ANTHROPIC_API_KEY` | *(required when provider is `anthropic`)* |
| `--anthropic-base-url` | `ENGRAM_ANTHROPIC_BASE_URL` | `https://api.anthropic.com` |
| `--anthropic-model` | `ENGRAM_ANTHROPIC_MODEL` | `claude-haiku-4-5-20251001` |
### Google Gemini
| Flag | Env var | Default |
|---|---|---|
| `--google-api-key` | `GOOGLE_API_KEY` | *(required when provider is `google`)* |
| `--google-base-url` | `ENGRAM_GOOGLE_BASE_URL` | `https://generativelanguage.googleapis.com/v1beta` |
| `--google-model` | `ENGRAM_GOOGLE_MODEL` | `gemini-flash-latest` |
### Command (shell-out)
| Flag | Env var | Default |
|---|---|---|
| `--llm-command` | `ENGRAM_LLM_COMMAND` | *(required when provider is `command`)* — run via `sh -c` |
| `--llm-command-timeout` | `ENGRAM_LLM_COMMAND_TIMEOUT` | `120` — seconds before the child is killed |
> The `mock` backend returns empty facts and deterministic byte-cycled vectors. It exists for tests and CI, and the server prints a clear warning on startup if it detects mock mode in use. API keys and full command lines are never echoed to stdout or logs: the server only describes `provider model at base_url` or ``command `<first 60 chars>` ``.
---
## Embedding the library
If you don't want a separate process, depend on `jamjet-engram` directly:
```toml
[dependencies]
jamjet-engram = "0.3.2"
```
```rust
use engram::{ExtractionConfig, Memory, Message, OllamaEmbeddingProvider, OllamaLlmClient, Scope};

// Assumes a Tokio runtime (`tokio` with the `macros` and `rt-multi-thread`
// features); the snippet only needs some async context where `?` works.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open (or create) the SQLite file with an Ollama-backed embedder.
    let embedding = Box::new(OllamaEmbeddingProvider::new());
    let memory = Memory::open("sqlite:engram.db?mode=rwc", embedding).await?;

    let scope = Scope::user("default", "alice");
    let messages = vec![Message {
        role: "user".into(),
        content: "I am allergic to peanuts.".into(),
    }];

    // Extract facts from the messages and store them.
    let llm = Box::new(OllamaLlmClient::new());
    let fact_ids = memory
        .add_messages(&messages, scope.clone(), llm, ExtractionConfig::default())
        .await?;

    // Later, recall
    let facts = memory.recall(&engram::memory::RecallQuery {
        query: "food allergies".into(),
        scope: Some(scope),
        max_results: 5,
        as_of: None,
        min_score: None,
    }).await?;

    Ok(())
}
```
---
## Other languages
Engram is part of the JamJet ecosystem. These clients speak to `engram-server` over REST:
| Language | Package | Install |
|---|---|---|
| Python | `jamjet` (includes `EngramClient`) | `pip install jamjet` |
| Java | `dev.jamjet:jamjet-sdk` (includes `EngramClient`) | Maven Central |
| Spring Boot | `dev.jamjet:engram-spring-boot-starter` | Maven Central — zero-config `@Bean EngramMemory` |
| Rust | `jamjet-engram` (embed directly) | `cargo add jamjet-engram` |
See [java-ai-memory.dev](https://java-ai-memory.dev) for how Engram compares to LangChain4j ChatMemory, Spring AI ChatMemory, Koog, Embabel, Google ADK Memory Bank, the Mem0 community wrapper, and Zep — with honest notes on where Engram fits and where the alternatives are more mature.
---
## Architecture
```
┌──────────────────────────────────────────────────────────┐
│ Clients: Claude Desktop, Cursor, JamJet, │
│ Python SDK, Java SDK, Spring Boot, cURL │
├─────────────────┬────────────────────────────────────────┤
│ MCP (stdio) │ REST (HTTP + JSON) │
├─────────────────┴────────────────────────────────────────┤
│ engram-server │
│ (jamjet-engram-server) │
├──────────────────────────────────────────────────────────┤
│ engram (lib) │
│ Extraction pipeline │ Hybrid retrieval │ Scopes │
│ Conflict detection │ Consolidation │ Context │
├──────────────────────────────────────────────────────────┤
│ LlmClient trait │ EmbeddingProvider trait │
│ Ollama · Mock │ Ollama · Mock │
├──────────────────────────────────────────────────────────┤
│ Storage │
│ SQLite (facts, entities, FTS5, vectors) │
└──────────────────────────────────────────────────────────┘
```
---
## What Engram is **not**
- **Not a chat-history store.** If all you need is the last N messages of a conversation, use LangChain4j `ChatMemory`, Spring AI `ChatMemory`, or any framework's built-in window.
- **Not a state checkpointer.** If you need to snapshot agent execution state for resume and replay, that's what LangGraph, Koog persistence, or JamJet's own durable runtime does. Pair them with Engram — they solve different problems.
- **Not a managed service.** No hosted plane, no auth layer beyond scopes, no SLA. Bring your own process manager.
- **Not benchmarked yet.** LongMemEval and DMR numbers are on the roadmap. Until they exist, treat comparative claims with the skepticism they deserve.
---
## Roadmap
| Status | Item |
|---|---|
| ✅ | Fact extraction pipeline + SQLite storage |
| ✅ | Hybrid retrieval (vector + FTS5 with prefix matching) |
| ✅ | Consolidation engine (decay, promote, dedup, summarize, reflect) |
| ✅ | MCP stdio server (7 tools) |
| ✅ | REST API (9 endpoints) |
| ✅ | Multi-provider LLM backends: Ollama, OpenAI-compatible (OpenAI/Azure/Groq/Together/Mistral/vLLM/LM Studio/…), Anthropic, Google, `command` shell-out |
| ✅ | Python, Java, Spring Boot clients |
| ✅ | Docker image, MCP Registry publish |
| 🔄 | Postgres backend (in parallel to SQLite) |
| 🔄 | Spring AI `ChatMemoryRepository` implementation |
| 📋 | LongMemEval + DMR benchmark scores |
| 📋 | Quarkus extension |
| 📋 | Cloud embedding providers (OpenAI `text-embedding-3`, Google `text-embedding-004`) |
---
## License
Apache 2.0 — see [LICENSE](../../LICENSE).
---
<div align="center">
<sub>Part of <a href="https://jamjet.dev">JamJet</a> · Built by <a href="https://github.com/sunilp">Sunil Prakash</a> · © 2026 JamJet Labs</sub>
</div>