jamjet-engram-server 0.5.0

Engram MCP server — memory layer for AI agents. MCP stdio + REST API.

Durable memory for AI agents — temporal knowledge graph, hybrid retrieval, SQLite or PostgreSQL.

java-ai-memory.dev · Main repo · JamJet docs · Discord


Engram is a durable memory layer for AI agents. It extracts facts from conversations, stores them in a temporal knowledge graph, and retrieves them with hybrid semantic + keyword search — backed by a single SQLite file or a PostgreSQL database.

It ships in two shapes:

  • jamjet-engram — a Rust library you embed in your own application.
  • jamjet-engram-server (this crate) — a standalone binary that speaks MCP over stdio and REST over HTTP, so Claude Desktop, Cursor, and any HTTP client can use it with no code.

Engram is provider-agnostic. Five LLM backends are wired in out of the box, and a sixth — command — lets you shell out to any external script, so you can plug in a provider Engram does not ship natively without touching Rust code:

ENGRAM_LLM_PROVIDER= | What it does
ollama (default) | Local Ollama via /api/chat. Free, no API keys, runs on your laptop.
openai-compatible | Any endpoint that speaks OpenAI's chat-completions protocol (see the long list below).
anthropic | Anthropic Claude via the Messages API.
google | Google Gemini via generateContent with native JSON mode.
command | Shell out to a user-supplied script. Infinite extensibility, zero recompile.
mock | Deterministic tests-only backend; returns empty facts.

Pick one with ENGRAM_LLM_PROVIDER=… — the same binary handles all of them.

State of the project, April 2026. Engram is new — v0.3.2, small community, no public LongMemEval / DMR numbers yet. The architecture below works, the tests pass, the Docker image runs. If you need production-scale memory today, Mem0 Cloud and Zep Cloud are more mature. If you need a tryable, self-hostable, single-binary memory layer that doesn't require Python, Postgres, Qdrant, or Neo4j, Engram is built for you.

Why Engram?

Problem | Engram's answer
Every agent memory library is Python-first | Rust core with native Python, Java, and MCP clients; no sidecar required
Needs Postgres + Qdrant + Neo4j just to try | Single SQLite file (zero infra) or Postgres when you need it
Conversation history is not knowledge memory | Fact extraction pipeline: pulls structured facts out of messages
Old facts drift and contradict each other | Conflict detection + consolidation: decay, promote, dedup, summarize, reflect
Memory recall is either semantic OR keyword | Hybrid retrieval: vector search + SQLite FTS5 in one query
Agents lose memory across processes | Durable by default: SQLite or Postgres, crash-safe, portable
MCP support is an afterthought | MCP-native: 11 tools (7 memory_* + 4 messages_*) exposed by a single binary
No time-travel over what the agent knew | Temporal knowledge graph: every fact is scoped and timestamped
Can't isolate memory per user or tenant | First-class scopes: org / user / session built into every query

Quickstart — 30 seconds

Engram talks to four LLM providers natively: Ollama (the default; free and local), OpenAI and other OpenAI-compatible endpoints, Anthropic Claude, and Google Gemini. The command backend (Option E) covers everything else. Pick one.

Option A — Ollama (free, local, zero API keys)

Requirements: Ollama with llama3.2 and nomic-embed-text pulled:

ollama pull llama3.2
ollama pull nomic-embed-text

Docker (easiest):

docker run --rm -i \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

The container defaults to Ollama at host.docker.internal:11434 — works out of the box on Docker Desktop for Mac and Windows. Linux users need --add-host=host.docker.internal:host-gateway.
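If you launch the container from a script, the Linux special case can be handled in one place. The sketch below is illustrative only: docker_run_args is a hypothetical helper, and it simply adds the --add-host flag mentioned above when the platform string starts with "linux".

```python
import sys

IMAGE = "ghcr.io/jamjet-labs/engram-server:0.3.2"


def docker_run_args(platform: str = sys.platform) -> list[str]:
    """Build the `docker run` argv for the Ollama quickstart (hypothetical helper)."""
    args = ["docker", "run", "--rm", "-i", "-v", "engram-data:/data"]
    if platform.startswith("linux"):
        # Docker Engine on Linux does not define host.docker.internal by default,
        # so map it to the host gateway explicitly.
        args.append("--add-host=host.docker.internal:host-gateway")
    args.append(IMAGE)
    return args


if __name__ == "__main__":
    # Print the command for the current platform so it can be inspected first.
    print(" ".join(docker_run_args()))
```

Pass the returned list to subprocess.run to start the container.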

Option B — Anthropic Claude (hosted, highest quality)

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=anthropic \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Defaults to claude-haiku-4-5-20251001. Override with -e ENGRAM_ANTHROPIC_MODEL=claude-sonnet-4-6.

Note: Anthropic has no native JSON mode, so Engram parses JSON from text responses. The llm_util::extract_json_payload helper strips markdown fences that Claude occasionally emits.

Option C — Google Gemini

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=google \
  -e GOOGLE_API_KEY=AIza... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Defaults to gemini-flash-latest. Override with -e ENGRAM_GOOGLE_MODEL=gemini-2.5-flash.

Option D — Any OpenAI-compatible endpoint

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=openai-compatible \
  -e OPENAI_API_KEY=sk-... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Defaults to OpenAI itself (https://api.openai.com/v1, gpt-4o-mini). Change one env var to point at any of these providers without recompiling:

Provider | ENGRAM_OPENAI_BASE_URL= | Note
OpenAI | https://api.openai.com/v1 | default
Azure OpenAI | https://<resource>.openai.azure.com/openai/deployments/<deployment> |
Groq | https://api.groq.com/openai/v1 | very fast inference
Together.ai | https://api.together.xyz/v1 |
Mistral | https://api.mistral.ai/v1 |
DeepSeek | https://api.deepseek.com/v1 |
Perplexity | https://api.perplexity.ai |
OpenRouter | https://openrouter.ai/api/v1 | one key, many models
Fireworks | https://api.fireworks.ai/inference/v1 |
vLLM (self-hosted) | http://your-host:8000/v1 |
LM Studio | http://localhost:1234/v1 | local desktop app
LocalAI | http://localhost:8080/v1 | self-hosted
Ollama /v1 compat layer | http://localhost:11434/v1 | alternate path for Ollama
Your corporate LLM gateway | whatever URL your infra team exposes |

openai is accepted as a backwards-compatible alias for openai-compatible.

Option E — command (shell out to anything)

For providers Engram does not ship natively — an internal model behind a custom RPC, a raw SOAP endpoint, a local inference binary, a quick wrapper over a Python SDK, or tomorrow's new provider — use the command backend. Engram will spawn your script per extraction call, pipe a JSON request to its stdin, and read a JSON response from its stdout.

The contract. Your script reads one JSON object from stdin:

{"system": "<extraction prompt>", "user": "<conversation text>", "structured": true}

Then writes one JSON value to stdout. Either a bare value:

{"facts": [{"text": "...", "entities": [...], "confidence": 0.95, "category": "..."}]}

Or an envelope (useful for surfacing errors):

{"content": {"facts": [...]}}
{"error": "rate limited, try again"}

Exit 0 on success, non-zero (with stderr) on failure.

Example Python wrapper (~15 lines):

#!/usr/bin/env python3
# my-llm.py — wraps any Python SDK for Engram.
import json, sys
from my_llm_sdk import chat  # your SDK

req = json.loads(sys.stdin.read())
resp = chat(
    system=req["system"],
    user=req["user"],
    json_mode=req.get("structured", False),
)
sys.stdout.write(json.dumps(resp))
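The wrapper above lets any SDK exception escape as a Python traceback. A slightly more defensive variant can report failures through the error envelope and a non-zero exit code. This is a sketch only: fake_chat is a hypothetical stand-in for your SDK, and pairing the {"error": ...} envelope with exit code 1 is an assumption about how you might combine the two failure signals the contract describes.

```python
#!/usr/bin/env python3
import json
import sys


def fake_chat(system: str, user: str, json_mode: bool) -> dict:
    """Hypothetical stand-in for a real SDK call; replace with your own."""
    return {"facts": []}


def handle(raw: str, chat) -> tuple[str, int]:
    """Turn one stdin payload into (stdout text, exit code)."""
    try:
        req = json.loads(raw)
        content = chat(
            system=req["system"],
            user=req["user"],
            json_mode=req.get("structured", False),
        )
        # Success: wrap the model output in the "content" envelope.
        return json.dumps({"content": content}), 0
    except Exception as exc:  # malformed request, SDK failure, rate limit, ...
        # Failure: surface the message through the "error" envelope.
        return json.dumps({"error": str(exc)}), 1


def main() -> int:
    out, code = handle(sys.stdin.read(), fake_chat)
    sys.stdout.write(out)
    if code != 0:
        sys.stderr.write(out + "\n")  # the contract asks for stderr on failure
    return code
```

To use it as the real script, swap fake_chat for your SDK call and end the file with raise SystemExit(main()).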

Point Engram at it:

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=command \
  -e ENGRAM_LLM_COMMAND="python /scripts/my-llm.py" \
  -v "$(pwd)/scripts:/scripts" \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Timeout is 120 seconds by default — override with ENGRAM_LLM_COMMAND_TIMEOUT=30.
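Before wiring a script into Engram, it can help to exercise it the way the contract describes. The harness below is a sketch under stated assumptions, not Engram's implementation: run_wrapper is a hypothetical helper that runs the command through sh -c, pipes one JSON request to stdin, enforces a timeout, and unwraps either reply form.

```python
import json
import subprocess


def run_wrapper(command: str, request: dict, timeout: float = 120.0):
    """Run `command` via `sh -c`, send one JSON request, return the parsed reply."""
    proc = subprocess.run(
        ["sh", "-c", command],
        input=json.dumps(request),
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired after killing the child
    )
    if proc.returncode != 0:
        raise RuntimeError(f"wrapper exited {proc.returncode}: {proc.stderr.strip()}")
    reply = json.loads(proc.stdout)
    if isinstance(reply, dict) and "error" in reply:
        raise RuntimeError(reply["error"])  # error envelope
    if isinstance(reply, dict) and "content" in reply:
        return reply["content"]  # content envelope
    return reply  # bare value
```

For example, run_wrapper("python /scripts/my-llm.py", {"system": "...", "user": "...", "structured": True}, timeout=30) mirrors a run with ENGRAM_LLM_COMMAND_TIMEOUT=30.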

Security: the command provider runs arbitrary commands as the Engram process user. Never use it in a multi-tenant deployment where untrusted users can set ENGRAM_LLM_COMMAND. It is a local and single-tenant feature.

Option F — cargo install (all providers, same binary)

cargo install jamjet-engram-server
engram serve                                            # ollama (default)
ENGRAM_LLM_PROVIDER=anthropic ANTHROPIC_API_KEY=... engram serve
ENGRAM_LLM_PROVIDER=command ENGRAM_LLM_COMMAND="python /path/to/wrapper.py" engram serve

Option G — REST mode for testing

engram serve --mode rest --port 9090
curl -X POST http://localhost:9090/v1/memory \
  -H 'content-type: application/json' \
  -d '{
    "user_id": "alice",
    "messages": [
      {"role": "user", "content": "I am allergic to peanuts and I love sourdough."}
    ]
  }'

curl 'http://localhost:9090/v1/memory/recall?q=food%20allergies&user_id=alice'
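The same two calls can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl commands above; the helper names are hypothetical, and because the response schema is not documented here, send simply returns whatever JSON the server replies with.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9090"  # matches `engram serve --mode rest --port 9090`


def add_memory_request(user_id: str, messages: list[dict], base_url: str = BASE_URL):
    """Build the POST /v1/memory request from the curl example."""
    body = json.dumps({"user_id": user_id, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/memory",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def recall_request(query: str, user_id: str, base_url: str = BASE_URL):
    """Build the GET /v1/memory/recall request with URL-encoded parameters."""
    params = urllib.parse.urlencode(
        {"q": query, "user_id": user_id}, quote_via=urllib.parse.quote
    )
    return urllib.request.Request(f"{base_url}/v1/memory/recall?{params}", method="GET")


def send(request: urllib.request.Request):
    """Send a request and decode the JSON reply (shape not documented here)."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, send(add_memory_request("alice", [{"role": "user", "content": "I am allergic to peanuts."}])) stores the message and send(recall_request("food allergies", "alice")) queries it.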

No server. No config. No Python sidecar. One binary.


MCP client configuration

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json)

{
  "mcpServers": {
    "engram": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-v", "engram-data:/data",
        "ghcr.io/jamjet-labs/engram-server:0.3.2"
      ]
    }
  }
}
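If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over them. The following sketch shows one way to do that; with_engram and update_file are hypothetical helpers, and the entry is copied from the JSON above.

```python
import json
from pathlib import Path

ENGRAM_ENTRY = {
    "command": "docker",
    "args": [
        "run", "--rm", "-i",
        "-v", "engram-data:/data",
        "ghcr.io/jamjet-labs/engram-server:0.3.2",
    ],
}


def with_engram(config: dict) -> dict:
    """Return a copy of the config with the engram server registered."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers["engram"] = ENGRAM_ENTRY
    merged["mcpServers"] = servers
    return merged


def update_file(path: Path) -> None:
    """Rewrite the config file in place, creating it if it does not exist."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(with_engram(config), indent=2) + "\n")
```

On macOS, update_file(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json") targets the path named above; restart Claude Desktop afterwards.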

Cursor / any MCP-aware IDE

Point it at the same docker run command, or at a locally-installed engram serve binary. After restart, eleven tools are available to the model (seven memory_* + four messages_*).
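For clients that speak MCP directly, each request is one JSON-RPC 2.0 message per line on the server's stdin. The sketch below builds a tools/list request and a tools/call request for memory_recall. The method names and the name/arguments layout come from the MCP specification; the argument names query and user_id are assumptions (check the schema that tools/list returns), and the initialize handshake that must precede these calls is omitted.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def rpc(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line for the stdio transport."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"


def call_tool(name: str, arguments: dict) -> str:
    """Build a tools/call request for the named tool."""
    return rpc("tools/call", {"name": name, "arguments": arguments})


list_line = rpc("tools/list")
# Argument names below are assumed, not taken from Engram's tool schema.
recall_line = call_tool("memory_recall", {"query": "food allergies", "user_id": "alice"})
```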


MCP tools

Memory tools (7)

Tool | What it does
memory_add | Extract and store facts from conversation messages (calls the LLM for extraction)
memory_recall | Semantic search over stored facts, scoped by user_id / org_id
memory_context | Assemble a token-budgeted context block for an LLM prompt, with tier-aware selection
memory_search | Keyword search over facts (SQLite FTS5 / Postgres full-text)
memory_forget | Soft-delete a fact by ID with an optional reason
memory_stats | Aggregate counts: total facts, valid facts, entities, relationships
memory_consolidate | Run a consolidation cycle: decay stale facts, promote high-value ones, dedup near-duplicates

All scoped by (org_id, user_id, session_id) — org is the coarsest, session the finest.
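To make "token-budgeted" concrete, here is a generic greedy packing sketch. It is not Engram's algorithm: the real memory_context selection is tier-aware and is not described in this README. The four-characters-per-token estimate and the pack_context name are both assumptions made for illustration.

```python
def pack_context(facts: list[tuple[str, float]], budget_tokens: int) -> list[str]:
    """Greedily keep the highest-scoring facts that fit within the token budget."""

    def estimate_tokens(text: str) -> int:
        # Rough heuristic (about 4 characters per token), not a real tokenizer.
        return max(1, len(text) // 4)

    chosen: list[str] = []
    used = 0
    # Visit facts from most to least relevant; skip any that would overflow.
    for text, _score in sorted(facts, key=lambda fact: fact[1], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen
```

A lower-scoring but shorter fact can still be included after a longer one is skipped, which is the main difference from simply truncating a ranked list.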

Message store tools (4)

Tool | What it does
messages_save | Save chat messages for a conversation (optionally triggers fact extraction)
messages_get | Get all messages for a conversation by ID
messages_list | List all conversation IDs
messages_delete | Delete all messages for a conversation

REST API

Thirteen endpoints: a /health liveness probe plus twelve rooted at /v1/memory. Full surface:

Method | Path | Description
GET | /health | Liveness probe
POST | /v1/memory | Add messages (fact extraction)
GET | /v1/memory/recall?q=…&user_id=… | Semantic recall
POST | /v1/memory/context | Token-budgeted context assembly
GET | /v1/memory/search?q=…&user_id=… | Keyword search
GET | /v1/memory/stats | Aggregate statistics
POST | /v1/memory/consolidate | Trigger consolidation
DELETE | /v1/memory/facts/:id | Forget a fact
DELETE | /v1/memory/users/:id | GDPR user-data delete
POST | /v1/memory/messages | Save messages for a conversation
GET | /v1/memory/messages | List conversation IDs
GET | /v1/memory/messages/:id | Get messages for a conversation
DELETE | /v1/memory/messages/:id | Delete a conversation
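The three DELETE routes share one shape, so a client can build them with a single helper. This is a small sketch using the standard library; delete_request is a hypothetical name, and percent-encoding the ID is a defensive assumption rather than a documented requirement.

```python
import urllib.parse
import urllib.request

DELETABLE = {"facts", "users", "messages"}  # the :id routes listed in the table


def delete_request(base_url: str, kind: str, item_id: str) -> urllib.request.Request:
    """Build DELETE /v1/memory/<kind>/<id> for a fact, a user, or a conversation."""
    if kind not in DELETABLE:
        raise ValueError(f"unknown resource kind: {kind}")
    # Encode the ID so characters such as "/" cannot change the path.
    safe_id = urllib.parse.quote(item_id, safe="")
    return urllib.request.Request(
        f"{base_url}/v1/memory/{kind}/{safe_id}", method="DELETE"
    )
```

Pass the result to urllib.request.urlopen to execute it, for example delete_request("http://localhost:9090", "users", "alice") for the GDPR user-data delete.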

Configuration

All settings flow through CLI flags or environment variables. Env vars are the recommended way to configure the Docker image.

Core

CLI flag | Env var | Default | Notes
--db | ENGRAM_DB_PATH | engram.db | SQLite file path or postgres://… connection URL
--extract-on-save | ENGRAM_EXTRACT_ON_SAVE | true | Enable fact extraction when saving chat messages
--mode | ENGRAM_MODE | mcp | mcp (stdio) or rest (HTTP)
--port | ENGRAM_PORT | 9090 | HTTP port in REST mode
--llm-provider | ENGRAM_LLM_PROVIDER | ollama | ollama, openai-compatible (alias openai), anthropic, google, command, mock
--embedding-provider | ENGRAM_EMBEDDING_PROVIDER | ollama | ollama or mock (cloud embedding providers are planned)
--embedding-model | ENGRAM_EMBEDDING_MODEL | nomic-embed-text | Ollama embedding model
--embedding-dims | ENGRAM_EMBEDDING_DIMS | 768 | Must match the embedding model's output
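Because --embedding-dims must match the model's output, it is worth checking the real vector length before switching embedding models. The sketch below asks a local Ollama for one embedding through its /api/embeddings endpoint and compares the length; fetch_embedding and check_dims are hypothetical helpers, and the default URL and model are the ones listed in this README.

```python
import json
import urllib.request


def fetch_embedding(text: str, model: str = "nomic-embed-text",
                    ollama_url: str = "http://localhost:11434") -> list[float]:
    """Request one embedding from Ollama's /api/embeddings endpoint."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{ollama_url}/api/embeddings",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["embedding"]


def check_dims(vector: list[float], expected: int = 768) -> None:
    """Raise if the vector length differs from the configured dimension count."""
    if len(vector) != expected:
        raise ValueError(
            f"model returned {len(vector)} dims, ENGRAM_EMBEDDING_DIMS is {expected}"
        )
```

With Ollama running, check_dims(fetch_embedding("hello")) passes silently when the default model and default dimension agree.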

Ollama

CLI flag | Env var | Default
--ollama-url | ENGRAM_OLLAMA_URL | http://localhost:11434 (Docker image: http://host.docker.internal:11434)
--ollama-llm-model | ENGRAM_OLLAMA_LLM_MODEL | llama3.2

OpenAI-compatible (OpenAI, Groq, Together, Mistral, DeepSeek, Azure, vLLM, …)

CLI flag | Env var | Default
--openai-api-key | OPENAI_API_KEY | (required when provider is openai-compatible)
--openai-base-url | ENGRAM_OPENAI_BASE_URL | https://api.openai.com/v1 (change this to switch provider; see the big table above)
--openai-model | ENGRAM_OPENAI_MODEL | gpt-4o-mini

Anthropic

CLI flag | Env var | Default
--anthropic-api-key | ANTHROPIC_API_KEY | (required when provider is anthropic)
--anthropic-base-url | ENGRAM_ANTHROPIC_BASE_URL | https://api.anthropic.com
--anthropic-model | ENGRAM_ANTHROPIC_MODEL | claude-haiku-4-5-20251001

Google Gemini

CLI flag | Env var | Default
--google-api-key | GOOGLE_API_KEY | (required when provider is google)
--google-base-url | ENGRAM_GOOGLE_BASE_URL | https://generativelanguage.googleapis.com/v1beta
--google-model | ENGRAM_GOOGLE_MODEL | gemini-flash-latest

Command (shell-out)

CLI flag | Env var | Default
--llm-command | ENGRAM_LLM_COMMAND | (required when provider is command); run via sh -c
--llm-command-timeout | ENGRAM_LLM_COMMAND_TIMEOUT | 120 (seconds before the child is killed)

The mock backend returns empty facts and deterministic byte-cycled vectors. It exists for tests and CI. The server prints a clear warning on startup if it detects mock mode in use. API keys and command contents are never echoed to stdout or logs; the server only describes the active backend as "provider model at base_url" or "command <first 60 chars>".


Database Backend

Engram supports SQLite (default) and PostgreSQL backends. The backend is selected by the --db URL:

# SQLite (default)
engram serve --mode rest --db engram.db

# PostgreSQL
engram serve --mode rest --db postgres://user:pass@localhost:5432/engram

When using PostgreSQL, tables are created automatically on first startup.
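Deployment scripts sometimes need to know which backend a given --db value will select, for instance to decide whether a volume mount is needed. The sketch below illustrates the rule described above. It is an assumption about the matching logic, not Engram's code: only postgres:// appears in this README, and accepting postgresql:// as well is a guess.

```python
def backend_for(db: str) -> str:
    """Guess which storage backend a --db / ENGRAM_DB_PATH value selects."""
    if not db:
        raise ValueError("empty --db value")
    # postgres:// is documented above; postgresql:// is an assumed alias.
    if db.startswith(("postgres://", "postgresql://")):
        return "postgres"
    # Anything else is treated as a SQLite file path.
    return "sqlite"
```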


Chat Message Store

Engram can store raw chat messages alongside extracted facts. This enables Spring AI ChatMemoryRepository integration.

REST Endpoints:

Method | Endpoint | Description
POST | /v1/memory/messages | Save messages for a conversation
GET | /v1/memory/messages | List conversation IDs
GET | /v1/memory/messages/:id | Get messages for a conversation
DELETE | /v1/memory/messages/:id | Delete a conversation

MCP Tools: messages_save, messages_get, messages_list, messages_delete

Fact extraction on message save is controlled by --extract-on-save (default: true).


Embedding the library

If you don't want a separate process, depend on jamjet-engram directly:

[dependencies]
jamjet-engram = "0.3.2"

// Run this inside an async function that returns a Result: the calls below use .await and ?.
use engram::{Memory, OllamaEmbeddingProvider, OllamaLlmClient, Scope, ExtractionConfig, Message};

let embedding = Box::new(OllamaEmbeddingProvider::new());
let memory = Memory::open("sqlite:engram.db?mode=rwc", embedding).await?;

let scope = Scope::user("default", "alice");
let messages = vec![Message {
    role: "user".into(),
    content: "I am allergic to peanuts.".into(),
}];

let llm = Box::new(OllamaLlmClient::new());
let fact_ids = memory
    .add_messages(&messages, scope.clone(), llm, ExtractionConfig::default())
    .await?;

// Later, recall
let facts = memory.recall(&engram::memory::RecallQuery {
    query: "food allergies".into(),
    scope: Some(scope),
    max_results: 5,
    as_of: None,
    min_score: None,
}).await?;

Other languages

Engram is part of the JamJet ecosystem. These clients speak to engram-server over REST:

Language | Package | Install
Python | jamjet (includes EngramClient) | pip install jamjet
Java | dev.jamjet:jamjet-sdk (includes EngramClient) | Maven Central
Spring Boot | dev.jamjet:engram-spring-boot-starter | Maven Central (zero-config @Bean EngramMemory)
Rust | jamjet-engram (embed directly) | cargo add jamjet-engram

See java-ai-memory.dev for how Engram compares to LangChain4j ChatMemory, Spring AI ChatMemory, Koog, Embabel, Google ADK Memory Bank, the Mem0 community wrapper, and Zep — with honest notes on where Engram fits and where the alternatives are more mature.


Architecture

┌──────────────────────────────────────────────────────────┐
│           Clients: Claude Desktop, Cursor, JamJet,        │
│           Python SDK, Java SDK, Spring Boot, cURL         │
├─────────────────┬────────────────────────────────────────┤
│   MCP (stdio)   │           REST (HTTP + JSON)            │
├─────────────────┴────────────────────────────────────────┤
│                     engram-server                         │
│                 (jamjet-engram-server)                    │
├──────────────────────────────────────────────────────────┤
│                       engram (lib)                        │
│   Extraction pipeline  │  Hybrid retrieval  │  Scopes     │
│   Conflict detection   │  Consolidation     │  Context    │
├──────────────────────────────────────────────────────────┤
│    LlmClient trait        │    EmbeddingProvider trait    │
│    Ollama · OpenAI-compat │    Ollama · Mock              │
│    Anthropic · Google     │                               │
│    Command · Mock         │                               │
├──────────────────────────────────────────────────────────┤
│                       Storage                             │
│   SQLite or PostgreSQL (facts, entities, FTS, vectors)   │
└──────────────────────────────────────────────────────────┘

What Engram is not

  • Not just a chat-history store. Engram now includes a message store (see Chat Message Store), but its primary value is the fact extraction and knowledge graph layer on top. If all you need is the last N messages of a conversation, a framework's built-in window may be simpler.
  • Not a state checkpointer. If you need to snapshot agent execution state for resume and replay, that's what LangGraph, Koog persistence, or JamJet's own durable runtime does. Pair them with Engram — they solve different problems.
  • Not a managed service. No hosted plane, no auth layer beyond scopes, no SLA. Bring your own process manager.
  • Not benchmarked yet. LongMemEval and DMR numbers are on the roadmap. Until they exist, treat comparative claims with the skepticism they deserve.

Roadmap

Status | Item
Shipped | Fact extraction pipeline + SQLite storage
Shipped | Hybrid retrieval (vector + FTS5 with prefix matching)
Shipped | Consolidation engine (decay, promote, dedup, summarize, reflect)
Shipped | MCP stdio server (7 tools)
Shipped | REST API (9 endpoints)
Shipped | Multi-provider LLM backends: Ollama, OpenAI-compatible (OpenAI/Azure/Groq/Together/Mistral/vLLM/LM Studio/…), Anthropic, Google, command shell-out
Shipped | Python, Java, Spring Boot clients
Shipped | Docker image, MCP Registry publish
Shipped | Postgres backend (in parallel to SQLite)
Shipped | Chat message store + Spring AI ChatMemoryRepository integration
📋 Planned | LongMemEval + DMR benchmark scores
📋 Planned | Quarkus extension
📋 Planned | Cloud embedding providers (OpenAI text-embedding-3, Google text-embedding-004)

License

Apache 2.0 — see LICENSE.