jamjet-engram-server 0.5.0

Durable memory for AI agents — temporal knowledge graph, hybrid retrieval, SQLite or PostgreSQL.

java-ai-memory.dev · Main repo · JamJet docs · Discord

Engram is a durable memory layer for AI agents. It extracts facts from conversations, stores them in a temporal knowledge graph, and retrieves them with hybrid semantic + keyword search — backed by a single SQLite file or a PostgreSQL database.

It ships in two shapes:

jamjet-engram — a Rust library you embed in your own application.
jamjet-engram-server (this crate) — a standalone binary that speaks MCP over stdio and REST over HTTP, so Claude Desktop, Cursor, and any HTTP client can use it with no code.

Engram is provider-agnostic. Five LLM backends are wired in out of the box, and a sixth — command — lets you shell out to any external script, so you can plug in a provider Engram does not ship natively without touching Rust code:

`ENGRAM_LLM_PROVIDER=`	What it does
`ollama` (default)	Local Ollama via `/api/chat`. Free, no API keys, runs on your laptop.
`openai-compatible`	Any endpoint that speaks OpenAI's chat-completions protocol — see the long list below.
`anthropic`	Anthropic Claude via the Messages API.
`google`	Google Gemini via `generateContent` with native JSON mode.
`command`	Shell out to a user-supplied script. Infinite extensibility, zero recompile.
`mock`	Deterministic tests-only backend — returns empty facts.

Pick one with ENGRAM_LLM_PROVIDER=… — the same binary handles all of them.

State of the project, April 2026. Engram is new — v0.3.2, small community, no public LongMemEval / DMR numbers yet. The architecture below works, the tests pass, the Docker image runs. If you need production-scale memory today, Mem0 Cloud and Zep Cloud are more mature. If you need a tryable, self-hostable, single-binary memory layer that doesn't require Python, Postgres, Qdrant, or Neo4j, Engram is built for you.

Why Engram?

Problem	Engram's answer
Every agent memory library is Python-first	Rust core with native Python, Java, and MCP clients — no sidecar required
Needs Postgres + Qdrant + Neo4j just to try	Single SQLite file (zero infra) or Postgres when you need it
Conversation history is not knowledge memory	Fact extraction pipeline — pulls structured facts out of messages
Old facts drift and contradict each other	Conflict detection + consolidation — decay, promote, dedup, summarize, reflect
Memory recall is either semantic OR keyword	Hybrid retrieval — vector search + SQLite FTS5 in one query
Agents lose memory across processes	Durable by default — SQLite or Postgres, crash-safe, portable
MCP support is an afterthought	MCP-native — 7 tools exposed by a single binary
No time-travel over what the agent knew	Temporal knowledge graph — every fact is scoped and timestamped
Can't isolate memory per user or tenant	First-class scopes — org / user / session built into every query

Quickstart — 30 seconds

Engram speaks to four LLM providers out of the box: Ollama (default — free, local), OpenAI, Anthropic Claude, and Google Gemini. Pick one.

Option A — Ollama (free, local, zero API keys)

Requirements: Ollama with llama3.2 and nomic-embed-text pulled:

ollama pull llama3.2
ollama pull nomic-embed-text

Docker (easiest):

docker run --rm -i \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

The container defaults to Ollama at host.docker.internal:11434 — works out of the box on Docker Desktop for Mac and Windows. Linux users need --add-host=host.docker.internal:host-gateway.

Option B — Anthropic Claude (hosted, highest quality)

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=anthropic \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Defaults to claude-haiku-4-5-20251001. Override with -e ENGRAM_ANTHROPIC_MODEL=claude-sonnet-4-6.

Note: Anthropic has no native JSON mode, so Engram parses JSON from text responses. The llm_util::extract_json_payload helper strips markdown fences that Claude occasionally emits.

Option C — Google Gemini

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=google \
  -e GOOGLE_API_KEY=AIza... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Defaults to gemini-flash-latest. Override with -e ENGRAM_GOOGLE_MODEL=gemini-2.5-flash.

Option D — Any OpenAI-compatible endpoint

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=openai-compatible \
  -e OPENAI_API_KEY=sk-... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Defaults to OpenAI itself (https://api.openai.com/v1, gpt-4o-mini). Change one env var to point at any of these providers without recompiling:

Provider	`ENGRAM_OPENAI_BASE_URL=`	Note
OpenAI	`https://api.openai.com/v1`	default
Azure OpenAI	`https://<resource>.openai.azure.com/openai/deployments/<deployment>`
Groq	`https://api.groq.com/openai/v1`	very fast inference
Together.ai	`https://api.together.xyz/v1`
Mistral	`https://api.mistral.ai/v1`
DeepSeek	`https://api.deepseek.com/v1`
Perplexity	`https://api.perplexity.ai`
OpenRouter	`https://openrouter.ai/api/v1`	one key, many models
Fireworks	`https://api.fireworks.ai/inference/v1`
vLLM (self-hosted)	`http://your-host:8000/v1`
LM Studio	`http://localhost:1234/v1`	local desktop app
LocalAI	`http://localhost:8080/v1`	self-hosted
Ollama `/v1` compat layer	`http://localhost:11434/v1`	alternate path for Ollama
Your corporate LLM gateway	whatever URL your infra team exposes

openai is accepted as a backwards-compatible alias for openai-compatible.

Option E — `command` (shell out to anything)

For providers Engram does not ship natively — an internal model behind a custom RPC, a raw SOAP endpoint, a local inference binary, a quick wrapper over a Python SDK, or tomorrow's new provider — use the command backend. Engram will spawn your script per extraction call, pipe a JSON request to its stdin, and read a JSON response from its stdout.

The contract. Your script reads one JSON object from stdin:

{"system": "<extraction prompt>", "user": "<conversation text>", "structured": true}

Then writes one JSON value to stdout. Either a bare value:

{"facts": [{"text": "...", "entities": [...], "confidence": 0.95, "category": "..."}]}

Or an envelope (useful for surfacing errors):

{"content": {"facts": [...]}}
{"error": "rate limited, try again"}

Exit 0 on success, non-zero (with stderr) on failure.

Example Python wrapper (~15 lines):

#!/usr/bin/env python3
# my-llm.py — wraps any Python SDK for Engram.
import json, sys
from my_llm_sdk import chat  # your SDK

req = json.loads(sys.stdin.read())
resp = chat(
    system=req["system"],
    user=req["user"],
    json_mode=req.get("structured", False),
)
sys.stdout.write(json.dumps(resp))

Point Engram at it:

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=command \
  -e ENGRAM_LLM_COMMAND="python /scripts/my-llm.py" \
  -v $(pwd)/scripts:/scripts \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Timeout is 120 seconds by default — override with ENGRAM_LLM_COMMAND_TIMEOUT=30.

Security: the command provider runs arbitrary commands as the Engram process user. Never use it in a multi-tenant deployment where untrusted users can set ENGRAM_LLM_COMMAND. It is a local and single-tenant feature.

Option F — cargo install (all providers same binary)

cargo install jamjet-engram-server
engram serve                                            # ollama (default)
ENGRAM_LLM_PROVIDER=anthropic ANTHROPIC_API_KEY=... engram serve
ENGRAM_LLM_PROVIDER=command ENGRAM_LLM_COMMAND="python /path/to/wrapper.py" engram serve

Option G — REST mode for testing

engram serve --mode rest --port 9090

curl -X POST http://localhost:9090/v1/memory \
  -H 'content-type: application/json' \
  -d '{
    "user_id": "alice",
    "messages": [
      {"role": "user", "content": "I am allergic to peanuts and I love sourdough."}
    ]
  }'

curl 'http://localhost:9090/v1/memory/recall?q=food%20allergies&user_id=alice'

No server. No config. No Python sidecar. One binary.

MCP client configuration

Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_config.json`)

{
  "mcpServers": {
    "engram": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-v", "engram-data:/data",
        "ghcr.io/jamjet-labs/engram-server:0.3.2"
      ]
    }
  }
}

Cursor / any MCP-aware IDE

Point it at the same docker run command, or at a locally-installed engram serve binary. After restart, eleven tools are available to the model (seven memory_* + four messages_*).

MCP tools

Memory tools (7)

Tool	What it does
`memory_add`	Extract and store facts from conversation messages (calls the LLM for extraction)
`memory_recall`	Semantic search over stored facts, scoped by `user_id` / `org_id`
`memory_context`	Assemble a token-budgeted context block for an LLM prompt, with tier-aware selection
`memory_search`	Keyword search over facts (SQLite FTS5 / Postgres full-text)
`memory_forget`	Soft-delete a fact by ID with an optional reason
`memory_stats`	Aggregate counts: total facts, valid facts, entities, relationships
`memory_consolidate`	Run a consolidation cycle — decay stale facts, promote high-value ones, dedup near-duplicates

All scoped by (org_id, user_id, session_id) — org is the coarsest, session the finest.

Message store tools (4)

Tool	What it does
`messages_save`	Save chat messages for a conversation (optionally triggers fact extraction)
`messages_get`	Get all messages for a conversation by ID
`messages_list`	List all conversation IDs
`messages_delete`	Delete all messages for a conversation

REST API

Thirteen endpoints, rooted at /v1/memory. Full OpenAPI-style surface:

Method	Path	Handler
GET	`/health`	Liveness probe
POST	`/v1/memory`	Add messages (fact extraction)
GET	`/v1/memory/recall?q=…&user_id=…`	Semantic recall
POST	`/v1/memory/context`	Token-budgeted context assembly
GET	`/v1/memory/search?q=…&user_id=…`	Keyword search
GET	`/v1/memory/stats`	Aggregate statistics
POST	`/v1/memory/consolidate`	Trigger consolidation
DELETE	`/v1/memory/facts/:id`	Forget a fact
DELETE	`/v1/memory/users/:id`	GDPR user-data delete
POST	`/v1/memory/messages`	Save messages for a conversation
GET	`/v1/memory/messages`	List conversation IDs
GET	`/v1/memory/messages/:id`	Get messages for a conversation
DELETE	`/v1/memory/messages/:id`	Delete a conversation

Configuration

All settings flow through CLI flags or environment variables. Env vars are the recommended way to configure the Docker image.

Core

CLI flag	Env var	Default	Notes
`--db`	`ENGRAM_DB_PATH`	`engram.db`	SQLite file path or `postgres://…` connection URL
`--extract-on-save`	`ENGRAM_EXTRACT_ON_SAVE`	`true`	Enable fact extraction when saving chat messages
`--mode`	`ENGRAM_MODE`	`mcp`	`mcp` (stdio) or `rest` (HTTP)
`--port`	`ENGRAM_PORT`	`9090`	HTTP port in REST mode
`--llm-provider`	`ENGRAM_LLM_PROVIDER`	`ollama`	`ollama`, `openai-compatible` (alias `openai`), `anthropic`, `google`, `command`, `mock`
`--embedding-provider`	`ENGRAM_EMBEDDING_PROVIDER`	`ollama`	`ollama` or `mock` (cloud embedding providers are planned)
`--embedding-model`	`ENGRAM_EMBEDDING_MODEL`	`nomic-embed-text`	Ollama embedding model
`--embedding-dims`	`ENGRAM_EMBEDDING_DIMS`	`768`	Must match the embedding model's output

Ollama

CLI flag	Env var	Default
`--ollama-url`	`ENGRAM_OLLAMA_URL`	`http://localhost:11434` (Docker image: `http://host.docker.internal:11434`)
`--ollama-llm-model`	`ENGRAM_OLLAMA_LLM_MODEL`	`llama3.2`

OpenAI-compatible (OpenAI, Groq, Together, Mistral, DeepSeek, Azure, vLLM, …)

CLI flag	Env var	Default
`--openai-api-key`	`OPENAI_API_KEY`	(required when provider is `openai-compatible`)
`--openai-base-url`	`ENGRAM_OPENAI_BASE_URL`	`https://api.openai.com/v1` — change this to switch provider, see the big table above
`--openai-model`	`ENGRAM_OPENAI_MODEL`	`gpt-4o-mini`

Anthropic

CLI flag	Env var	Default
`--anthropic-api-key`	`ANTHROPIC_API_KEY`	(required when provider is `anthropic`)
`--anthropic-base-url`	`ENGRAM_ANTHROPIC_BASE_URL`	`https://api.anthropic.com`
`--anthropic-model`	`ENGRAM_ANTHROPIC_MODEL`	`claude-haiku-4-5-20251001`

Google Gemini

CLI flag	Env var	Default
`--google-api-key`	`GOOGLE_API_KEY`	(required when provider is `google`)
`--google-base-url`	`ENGRAM_GOOGLE_BASE_URL`	`https://generativelanguage.googleapis.com/v1beta`
`--google-model`	`ENGRAM_GOOGLE_MODEL`	`gemini-flash-latest`

Command (shell-out)

CLI flag	Env var	Default
`--llm-command`	`ENGRAM_LLM_COMMAND`	(required when provider is `command`) — run via `sh -c`
`--llm-command-timeout`	`ENGRAM_LLM_COMMAND_TIMEOUT`	`120` — seconds before the child is killed

The mock backend returns empty facts and deterministic byte-cycled vectors. It exists for tests and CI. The server prints a clear warning on startup if it detects mock mode in use. API keys and command contents are never echoed to stdout or logs — the server only describes provider model at base_url or command \<first 60 chars>``.

Database Backend

Engram supports SQLite (default) and PostgreSQL backends. The backend is selected by the --db URL:

# SQLite (default)
engram serve --mode rest --db engram.db

# PostgreSQL
engram serve --mode rest --db postgres://user:pass@localhost:5432/engram

When using PostgreSQL, tables are created automatically on first startup.

Chat Message Store

Engram can store raw chat messages alongside extracted facts. This enables Spring AI ChatMemoryRepository integration.

REST Endpoints:

Method	Endpoint	Description
POST	`/v1/memory/messages`	Save messages for a conversation
GET	`/v1/memory/messages`	List conversation IDs
GET	`/v1/memory/messages/{id}`	Get messages for a conversation
DELETE	`/v1/memory/messages/{id}`	Delete a conversation

MCP Tools: messages_save, messages_get, messages_list, messages_delete

Fact extraction on message save is controlled by --extract-on-save (default: true).

Embedding the library

If you don't want a separate process, depend on jamjet-engram directly:

[dependencies]
jamjet-engram = "0.3.2"

use engram::{Memory, OllamaEmbeddingProvider, OllamaLlmClient, Scope, ExtractionConfig, Message};

let embedding = Box::new(OllamaEmbeddingProvider::new());
let memory = Memory::open("sqlite:engram.db?mode=rwc", embedding).await?;

let scope = Scope::user("default", "alice");
let messages = vec![Message {
    role: "user".into(),
    content: "I am allergic to peanuts.".into(),
}];

let llm = Box::new(OllamaLlmClient::new());
let fact_ids = memory
    .add_messages(&messages, scope.clone(), llm, ExtractionConfig::default())
    .await?;

// Later, recall
let facts = memory.recall(&engram::memory::RecallQuery {
    query: "food allergies".into(),
    scope: Some(scope),
    max_results: 5,
    as_of: None,
    min_score: None,
}).await?;

Other languages

Engram is part of the JamJet ecosystem. These clients speak to engram-server over REST:

Language	Package	Install
Python	`jamjet` (includes `EngramClient`)	`pip install jamjet`
Java	`dev.jamjet:jamjet-sdk` (includes `EngramClient`)	Maven Central
Spring Boot	`dev.jamjet:engram-spring-boot-starter`	Maven Central — zero-config `@Bean EngramMemory`
Rust	`jamjet-engram` (embed directly)	`cargo add jamjet-engram`

See java-ai-memory.dev for how Engram compares to LangChain4j ChatMemory, Spring AI ChatMemory, Koog, Embabel, Google ADK Memory Bank, the Mem0 community wrapper, and Zep — with honest notes on where Engram fits and where the alternatives are more mature.

Architecture

┌──────────────────────────────────────────────────────────┐
│           Clients: Claude Desktop, Cursor, JamJet,        │
│           Python SDK, Java SDK, Spring Boot, cURL         │
├─────────────────┬────────────────────────────────────────┤
│   MCP (stdio)   │           REST (HTTP + JSON)            │
├─────────────────┴────────────────────────────────────────┤
│                     engram-server                         │
│                 (jamjet-engram-server)                    │
├──────────────────────────────────────────────────────────┤
│                       engram (lib)                        │
│   Extraction pipeline  │  Hybrid retrieval  │  Scopes     │
│   Conflict detection   │  Consolidation     │  Context    │
├──────────────────────────────────────────────────────────┤
│    LlmClient trait        │    EmbeddingProvider trait    │
│    Ollama · Mock          │    Ollama · Mock              │
├──────────────────────────────────────────────────────────┤
│                       Storage                             │
│   SQLite or PostgreSQL (facts, entities, FTS, vectors)   │
└──────────────────────────────────────────────────────────┘

What Engram is not

Not just a chat-history store. Engram now includes a message store (see Chat Message Store), but its primary value is the fact extraction and knowledge graph layer on top. If all you need is the last N messages of a conversation, a framework's built-in window may be simpler.
Not a state checkpointer. If you need to snapshot agent execution state for resume and replay, that's what LangGraph, Koog persistence, or JamJet's own durable runtime does. Pair them with Engram — they solve different problems.
Not a managed service. No hosted plane, no auth layer beyond scopes, no SLA. Bring your own process manager.
Not benchmarked yet. LongMemEval and DMR numbers are on the roadmap. Until they exist, treat comparative claims with the skepticism they deserve.

Roadmap

Status	Item
✅	Fact extraction pipeline + SQLite storage
✅	Hybrid retrieval (vector + FTS5 with prefix matching)
✅	Consolidation engine (decay, promote, dedup, summarize, reflect)
✅	MCP stdio server (7 tools)
✅	REST API (9 endpoints)
✅	Multi-provider LLM backends: Ollama, OpenAI-compatible (OpenAI/Azure/Groq/Together/Mistral/vLLM/LM Studio/…), Anthropic, Google, `command` shell-out
✅	Python, Java, Spring Boot clients
✅	Docker image, MCP Registry publish
✅	Postgres backend (in parallel to SQLite)
✅	Chat message store + Spring AI `ChatMemoryRepository` integration
📋	LongMemEval + DMR benchmark scores
📋	Quarkus extension
📋	Cloud embedding providers (OpenAI `text-embedding-3`, Google `text-embedding-004`)

License

Apache 2.0 — see LICENSE.