jamjet-engram-server 0.3.2

Engram MCP server — memory layer for AI agents. MCP stdio + REST API.

Durable memory for AI agents — temporal knowledge graph, hybrid retrieval, SQLite-backed.

java-ai-memory.dev · Main repo · JamJet docs · Discord


Engram is a durable memory layer for AI agents. It extracts facts from conversations, stores them in a temporal knowledge graph, and retrieves them with hybrid semantic + keyword search — all backed by a single SQLite file.

It ships in two shapes:

  • jamjet-engram — a Rust library you embed in your own application.
  • jamjet-engram-server (this crate) — a standalone binary that speaks MCP over stdio and REST over HTTP, so Claude Desktop, Cursor, and any HTTP client can use it with no code.

Engram is provider-agnostic. Five LLM backends are wired in out of the box, and a sixth — command — lets you shell out to any external script, so you can plug in a provider Engram does not ship natively without touching Rust code:

| ENGRAM_LLM_PROVIDER= | What it does |
|---|---|
| ollama (default) | Local Ollama via /api/chat. Free, no API keys, runs on your laptop. |
| openai-compatible | Any endpoint that speaks OpenAI's chat-completions protocol (see the long list below). |
| anthropic | Anthropic Claude via the Messages API. |
| google | Google Gemini via generateContent with native JSON mode. |
| command | Shell out to a user-supplied script. Infinite extensibility, zero recompile. |
| mock | Deterministic tests-only backend; returns empty facts. |

Pick one with ENGRAM_LLM_PROVIDER=… — the same binary handles all of them.

State of the project, April 2026. Engram is new — v0.3.2, small community, no public LongMemEval / DMR numbers yet. The architecture below works, the tests pass, the Docker image runs. If you need production-scale memory today, Mem0 Cloud and Zep Cloud are more mature. If you need a tryable, self-hostable, single-binary memory layer that doesn't require Python, Postgres, Qdrant, or Neo4j, Engram is built for you.

Why Engram?

| Problem | Engram's answer |
|---|---|
| Every agent memory library is Python-first | Rust core with native Python, Java, and MCP clients; no sidecar required |
| Needs Postgres + Qdrant + Neo4j just to try | Single SQLite file, zero infra |
| Conversation history is not knowledge memory | Fact extraction pipeline: pulls structured facts out of messages |
| Old facts drift and contradict each other | Conflict detection + consolidation: decay, promote, dedup, summarize, reflect |
| Memory recall is either semantic OR keyword | Hybrid retrieval: vector search + SQLite FTS5 in one query |
| Agents lose memory across processes | Durable by default: one SQLite file, crash-safe, portable |
| MCP support is an afterthought | MCP-native: 7 tools exposed by a single binary |
| No time-travel over what the agent knew | Temporal knowledge graph: every fact is scoped and timestamped |
| Can't isolate memory per user or tenant | First-class scopes: org / user / session built into every query |

Quickstart — 30 seconds

Engram talks to four kinds of LLM provider natively: Ollama (the default; free and local), OpenAI and any OpenAI-compatible endpoint, Anthropic Claude, and Google Gemini. The command backend (Option E) covers everything else. Pick one.

Option A — Ollama (free, local, zero API keys)

Requirements: Ollama with llama3.2 and nomic-embed-text pulled:

ollama pull llama3.2
ollama pull nomic-embed-text

Docker (easiest):

docker run --rm -i \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

The container defaults to Ollama at host.docker.internal:11434 — works out of the box on Docker Desktop for Mac and Windows. Linux users need --add-host=host.docker.internal:host-gateway.

Option B — Anthropic Claude (hosted, highest quality)

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=anthropic \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Defaults to claude-haiku-4-5-20251001. Override with -e ENGRAM_ANTHROPIC_MODEL=claude-sonnet-4-6.

Note: Anthropic has no native JSON mode, so Engram parses JSON from text responses. The llm_util::extract_json_payload helper strips markdown fences that Claude occasionally emits.
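
The fence-stripping idea can be illustrated in a few lines of Python. This is a sketch of the technique only, not a port of the Rust helper: the function name strip_fences_and_parse and the regular expression are assumptions made for illustration.

```python
import json
import re

# Matches a ``` or ```json fenced block and captures what is inside it.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)


def strip_fences_and_parse(text):
    """Parse JSON from a model reply, tolerating a surrounding markdown fence."""
    match = _FENCE.search(text)
    candidate = match.group(1) if match else text.strip()
    return json.loads(candidate)
```

A reply that wraps its JSON in a fence and a reply that returns bare JSON both parse to the same value; anything that is still not valid JSON raises json.JSONDecodeError.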

Option C — Google Gemini

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=google \
  -e GOOGLE_API_KEY=AIza... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Defaults to gemini-flash-latest. Override with -e ENGRAM_GOOGLE_MODEL=gemini-2.5-flash.

Option D — Any OpenAI-compatible endpoint

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=openai-compatible \
  -e OPENAI_API_KEY=sk-... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Defaults to OpenAI itself (https://api.openai.com/v1, gpt-4o-mini). Change one env var to point at any of these providers without recompiling:

| Provider | ENGRAM_OPENAI_BASE_URL= | Note |
|---|---|---|
| OpenAI | https://api.openai.com/v1 | default |
| Azure OpenAI | https://<resource>.openai.azure.com/openai/deployments/<deployment> | |
| Groq | https://api.groq.com/openai/v1 | very fast inference |
| Together.ai | https://api.together.xyz/v1 | |
| Mistral | https://api.mistral.ai/v1 | |
| DeepSeek | https://api.deepseek.com/v1 | |
| Perplexity | https://api.perplexity.ai | |
| OpenRouter | https://openrouter.ai/api/v1 | one key, many models |
| Fireworks | https://api.fireworks.ai/inference/v1 | |
| vLLM (self-hosted) | http://your-host:8000/v1 | |
| LM Studio | http://localhost:1234/v1 | local desktop app |
| LocalAI | http://localhost:8080/v1 | self-hosted |
| Ollama /v1 compat layer | http://localhost:11434/v1 | alternate path for Ollama |
| Your corporate LLM gateway | whatever URL your infra team exposes | |

openai is accepted as a backwards-compatible alias for openai-compatible.

Option E — command (shell out to anything)

For providers Engram does not ship natively — an internal model behind a custom RPC, a raw SOAP endpoint, a local inference binary, a quick wrapper over a Python SDK, or tomorrow's new provider — use the command backend. Engram will spawn your script per extraction call, pipe a JSON request to its stdin, and read a JSON response from its stdout.

The contract. Your script reads one JSON object from stdin:

{"system": "<extraction prompt>", "user": "<conversation text>", "structured": true}

Then writes one JSON value to stdout. Either a bare value:

{"facts": [{"text": "...", "entities": [...], "confidence": 0.95, "category": "..."}]}

Or an envelope (useful for surfacing errors):

{"content": {"facts": [...]}}
{"error": "rate limited, try again"}

Exit 0 on success, non-zero (with stderr) on failure.

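Example Python wrapper (about 20 lines):

#!/usr/bin/env python3
# my-llm.py: wraps any Python SDK for Engram.
import json
import sys

from my_llm_sdk import chat  # placeholder: swap in your own SDK

# Engram pipes one JSON request object to stdin.
req = json.load(sys.stdin)

try:
    # chat() is expected to return a JSON-serializable value such as {"facts": [...]}.
    resp = chat(
        system=req["system"],
        user=req["user"],
        json_mode=req.get("structured", False),
    )
except Exception as exc:
    # Contract: report failures on stderr and exit non-zero.
    print(f"my-llm.py: {exc}", file=sys.stderr)
    sys.exit(1)

json.dump(resp, sys.stdout)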

Point Engram at it:

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=command \
  -e ENGRAM_LLM_COMMAND="python /scripts/my-llm.py" \
  -v "$(pwd)/scripts:/scripts" \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Timeout is 120 seconds by default — override with ENGRAM_LLM_COMMAND_TIMEOUT=30.
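
Before pointing Engram at a script, it can help to exercise the script against the contract by hand. The harness below is a hypothetical local test aid, not Engram's own code: the names call_wrapper and unwrap_response are invented here, and it assumes that a top-level "error" or "content" key marks an envelope, as in the examples above.

```python
import json
import subprocess


def unwrap_response(stdout_text):
    """Parse a wrapper's reply, accepting either a bare value or an envelope."""
    value = json.loads(stdout_text)
    if isinstance(value, dict):
        if "error" in value:
            raise RuntimeError(value["error"])
        if "content" in value:
            return value["content"]
    return value


def call_wrapper(command, system, user, timeout=120):
    """Run `command` via sh -c and feed it one JSON request on stdin."""
    request = json.dumps({"system": system, "user": user, "structured": True})
    proc = subprocess.run(
        ["sh", "-c", command],
        input=request,
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
    )
    if proc.returncode != 0:
        raise RuntimeError(f"exit {proc.returncode}: {proc.stderr.strip()}")
    return unwrap_response(proc.stdout)
```

For example, call_wrapper("python my-llm.py", "<extraction prompt>", "I love sourdough.") should return the facts object your script produced, or raise if the script exited non-zero or reported an error envelope.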

Security: the command provider runs arbitrary commands as the Engram process user. Never use it in a multi-tenant deployment where untrusted users can set ENGRAM_LLM_COMMAND. It is a local and single-tenant feature.

Option F — cargo install (all providers same binary)

cargo install jamjet-engram-server
engram serve                                            # ollama (default)
ENGRAM_LLM_PROVIDER=anthropic ANTHROPIC_API_KEY=... engram serve
ENGRAM_LLM_PROVIDER=command ENGRAM_LLM_COMMAND="python /path/to/wrapper.py" engram serve

Option G — REST mode for testing

engram serve --mode rest --port 9090
curl -X POST http://localhost:9090/v1/memory \
  -H 'content-type: application/json' \
  -d '{
    "user_id": "alice",
    "messages": [
      {"role": "user", "content": "I am allergic to peanuts and I love sourdough."}
    ]
  }'

curl 'http://localhost:9090/v1/memory/recall?q=food%20allergies&user_id=alice'
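
The same two calls can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl commands above; the helper names are made up for illustration, and the reply is decoded as JSON without assuming anything about its fields.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9090"  # matches `engram serve --mode rest --port 9090`


def add_request(user_id, messages, base_url=BASE_URL):
    """Build the POST /v1/memory request that submits messages for extraction."""
    body = json.dumps({"user_id": user_id, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/memory",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def recall_url(query, user_id, base_url=BASE_URL):
    """Build the GET /v1/memory/recall URL with percent-encoded parameters."""
    params = urllib.parse.urlencode(
        {"q": query, "user_id": user_id}, quote_via=urllib.parse.quote
    )
    return f"{base_url}/v1/memory/recall?{params}"


def send(request_or_url):
    """Send the request and decode the reply, assuming the body is JSON."""
    with urllib.request.urlopen(request_or_url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, send(add_request("alice", [{"role": "user", "content": "I am allergic to peanuts."}])) stores the facts and send(recall_url("food allergies", "alice")) retrieves them.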

No separate database server. No config. No Python sidecar. One binary.


MCP client configuration

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json)

{
  "mcpServers": {
    "engram": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-v", "engram-data:/data",
        "ghcr.io/jamjet-labs/engram-server:0.3.2"
      ]
    }
  }
}

Cursor / any MCP-aware IDE

Point it at the same docker run command, or at a locally-installed engram serve binary. After restart, seven memory_* tools are available to the model.


The seven MCP tools

| Tool | What it does |
|---|---|
| memory_add | Extract and store facts from conversation messages (calls the LLM for extraction) |
| memory_recall | Semantic search over stored facts, scoped by user_id / org_id |
| memory_context | Assemble a token-budgeted context block for an LLM prompt, with tier-aware selection |
| memory_search | Keyword search over facts (SQLite FTS5) |
| memory_forget | Soft-delete a fact by ID with an optional reason |
| memory_stats | Aggregate counts: total facts, valid facts, entities, relationships |
| memory_consolidate | Run a consolidation cycle: decay stale facts, promote high-value ones, dedup near-duplicates |

Every tool is scoped by (org_id, user_id, session_id); org is the coarsest scope and session the finest.
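
On the wire, an MCP client invokes a tool by sending a JSON-RPC 2.0 tools/call request as one newline-terminated line over stdio, after the usual initialize handshake. The sketch below only builds such a line. The argument names (query, user_id) are illustrative guesses rather than Engram's published input schema; a tools/list request returns the real one.

```python
import json


def tool_call(request_id, tool_name, arguments):
    """Serialize one MCP `tools/call` request as a single JSON-RPC 2.0 line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


# Argument names here are assumptions; check the schema from tools/list.
line = tool_call(1, "memory_recall", {"query": "food allergies", "user_id": "alice"})
```

In practice Claude Desktop, Cursor, or an MCP SDK produces these messages for you; the sketch is only meant to show what travels over the pipe.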


REST API

Nine endpoints: a /health liveness probe plus eight routes under /v1/memory.

| Method | Path | Purpose |
|---|---|---|
| GET | /health | Liveness probe |
| POST | /v1/memory | Add messages (fact extraction) |
| GET | /v1/memory/recall?q=…&user_id=… | Semantic recall |
| POST | /v1/memory/context | Token-budgeted context assembly |
| GET | /v1/memory/search?q=…&user_id=… | Keyword search |
| GET | /v1/memory/stats | Aggregate statistics |
| POST | /v1/memory/consolidate | Trigger consolidation |
| DELETE | /v1/memory/facts/:id | Forget a fact |
| DELETE | /v1/memory/users/:id | GDPR user-data delete |
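
The two DELETE routes take their identifier in the path, so the identifier should be percent-encoded before it is spliced in. A minimal sketch with the standard library follows; the helper names and the localhost base URL are assumptions for illustration.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9090"  # assumed local REST-mode server on the default port


def forget_fact_request(fact_id, base_url=BASE_URL):
    """Build DELETE /v1/memory/facts/:id for a single fact."""
    path_id = urllib.parse.quote(str(fact_id), safe="")
    return urllib.request.Request(
        f"{base_url}/v1/memory/facts/{path_id}", method="DELETE"
    )


def delete_user_request(user_id, base_url=BASE_URL):
    """Build DELETE /v1/memory/users/:id, which removes a user's data."""
    path_id = urllib.parse.quote(str(user_id), safe="")
    return urllib.request.Request(
        f"{base_url}/v1/memory/users/{path_id}", method="DELETE"
    )
```

Pass either request to urllib.request.urlopen to execute it against a running server.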

Configuration

All settings flow through CLI flags or environment variables. Env vars are the recommended way to configure the Docker image.

Core

| CLI flag | Env var | Default | Notes |
|---|---|---|---|
| --db | ENGRAM_DB_PATH | engram.db | SQLite file path |
| --mode | ENGRAM_MODE | mcp | mcp (stdio) or rest (HTTP) |
| --port | ENGRAM_PORT | 9090 | HTTP port in REST mode |
| --llm-provider | ENGRAM_LLM_PROVIDER | ollama | ollama, openai-compatible (alias openai), anthropic, google, command, mock |
| --embedding-provider | ENGRAM_EMBEDDING_PROVIDER | ollama | ollama or mock (cloud embedding providers are planned) |
| --embedding-model | ENGRAM_EMBEDDING_MODEL | nomic-embed-text | Ollama embedding model |
| --embedding-dims | ENGRAM_EMBEDDING_DIMS | 768 | Must match the embedding model's output |

Ollama

| CLI flag | Env var | Default |
|---|---|---|
| --ollama-url | ENGRAM_OLLAMA_URL | http://localhost:11434 (Docker image: http://host.docker.internal:11434) |
| --ollama-llm-model | ENGRAM_OLLAMA_LLM_MODEL | llama3.2 |

OpenAI-compatible (OpenAI, Groq, Together, Mistral, DeepSeek, Azure, vLLM, …)

| CLI flag | Env var | Default |
|---|---|---|
| --openai-api-key | OPENAI_API_KEY | (required when provider is openai-compatible) |
| --openai-base-url | ENGRAM_OPENAI_BASE_URL | https://api.openai.com/v1 (change this to switch provider; see the big table above) |
| --openai-model | ENGRAM_OPENAI_MODEL | gpt-4o-mini |

Anthropic

| CLI flag | Env var | Default |
|---|---|---|
| --anthropic-api-key | ANTHROPIC_API_KEY | (required when provider is anthropic) |
| --anthropic-base-url | ENGRAM_ANTHROPIC_BASE_URL | https://api.anthropic.com |
| --anthropic-model | ENGRAM_ANTHROPIC_MODEL | claude-haiku-4-5-20251001 |

Google Gemini

| CLI flag | Env var | Default |
|---|---|---|
| --google-api-key | GOOGLE_API_KEY | (required when provider is google) |
| --google-base-url | ENGRAM_GOOGLE_BASE_URL | https://generativelanguage.googleapis.com/v1beta |
| --google-model | ENGRAM_GOOGLE_MODEL | gemini-flash-latest |

Command (shell-out)

| CLI flag | Env var | Default |
|---|---|---|
| --llm-command | ENGRAM_LLM_COMMAND | (required when provider is command); run via sh -c |
| --llm-command-timeout | ENGRAM_LLM_COMMAND_TIMEOUT | 120 (seconds before the child is killed) |

The mock backend returns empty facts and deterministic byte-cycled vectors. It exists for tests and CI, and the server prints a clear warning on startup if it detects mock mode in use. API keys and full command contents are never echoed to stdout or logs; the server only describes the backend as "provider model at base_url" or "command <first 60 chars>".
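
Because each provider needs a different variable, a small pre-flight check can catch a missing key before the container starts. The sketch below encodes the "required when provider is ..." rows from the tables above. It is a hypothetical helper, not part of Engram, and it looks only at environment variables, not CLI flags.

```python
import os

# Required variables per provider, taken from the configuration tables above.
REQUIRED_ENV = {
    "ollama": [],
    "openai-compatible": ["OPENAI_API_KEY"],
    "openai": ["OPENAI_API_KEY"],  # backwards-compatible alias
    "anthropic": ["ANTHROPIC_API_KEY"],
    "google": ["GOOGLE_API_KEY"],
    "command": ["ENGRAM_LLM_COMMAND"],
    "mock": [],
}


def missing_env(env=None):
    """Return the required variables that are unset for the selected provider."""
    env = os.environ if env is None else env
    provider = env.get("ENGRAM_LLM_PROVIDER", "ollama")  # ollama is the default
    if provider not in REQUIRED_ENV:
        raise ValueError(f"unknown ENGRAM_LLM_PROVIDER: {provider!r}")
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]
```

Calling missing_env() with no arguments inspects the current process environment; an empty list means the selected provider has what it needs.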


Embedding the library

If you don't want a separate process, depend on jamjet-engram directly:

[dependencies]
jamjet-engram = "0.3.2"

Then, inside an async function that returns a Result:

use engram::{Memory, OllamaEmbeddingProvider, OllamaLlmClient, Scope, ExtractionConfig, Message};

let embedding = Box::new(OllamaEmbeddingProvider::new());
let memory = Memory::open("sqlite:engram.db?mode=rwc", embedding).await?;

let scope = Scope::user("default", "alice");
let messages = vec![Message {
    role: "user".into(),
    content: "I am allergic to peanuts.".into(),
}];

let llm = Box::new(OllamaLlmClient::new());
let fact_ids = memory
    .add_messages(&messages, scope.clone(), llm, ExtractionConfig::default())
    .await?;

// Later, recall
let facts = memory.recall(&engram::memory::RecallQuery {
    query: "food allergies".into(),
    scope: Some(scope),
    max_results: 5,
    as_of: None,
    min_score: None,
}).await?;

Other languages

Engram is part of the JamJet ecosystem. The Python and Java clients speak to engram-server over REST; Rust applications embed the library directly:

| Language | Package | Install |
|---|---|---|
| Python | jamjet (includes EngramClient) | pip install jamjet |
| Java | dev.jamjet:jamjet-sdk (includes EngramClient) | Maven Central |
| Spring Boot | dev.jamjet:engram-spring-boot-starter | Maven Central (zero-config @Bean EngramMemory) |
| Rust | jamjet-engram (embed directly) | cargo add jamjet-engram |

See java-ai-memory.dev for how Engram compares to LangChain4j ChatMemory, Spring AI ChatMemory, Koog, Embabel, Google ADK Memory Bank, the Mem0 community wrapper, and Zep — with honest notes on where Engram fits and where the alternatives are more mature.


Architecture

┌──────────────────────────────────────────────────────────┐
│           Clients: Claude Desktop, Cursor, JamJet,        │
│           Python SDK, Java SDK, Spring Boot, cURL         │
├─────────────────┬────────────────────────────────────────┤
│   MCP (stdio)   │           REST (HTTP + JSON)            │
├─────────────────┴────────────────────────────────────────┤
│                     engram-server                         │
│                 (jamjet-engram-server)                    │
├──────────────────────────────────────────────────────────┤
│                       engram (lib)                        │
│   Extraction pipeline  │  Hybrid retrieval  │  Scopes     │
│   Conflict detection   │  Consolidation     │  Context    │
├──────────────────────────────────────────────────────────┤
│    LlmClient trait        │    EmbeddingProvider trait    │
│    Ollama · Mock          │    Ollama · Mock              │
├──────────────────────────────────────────────────────────┤
│                       Storage                             │
│       SQLite (facts, entities, FTS5, vectors)            │
└──────────────────────────────────────────────────────────┘

What Engram is not

  • Not a chat-history store. If all you need is the last N messages of a conversation, use LangChain4j ChatMemory, Spring AI ChatMemory, or any framework's built-in window.
  • Not a state checkpointer. If you need to snapshot agent execution state for resume and replay, that's what LangGraph, Koog persistence, or JamJet's own durable runtime does. Pair them with Engram — they solve different problems.
  • Not a managed service. No hosted plane, no auth layer beyond scopes, no SLA. Bring your own process manager.
  • Not benchmarked yet. LongMemEval and DMR numbers are on the roadmap. Until they exist, treat comparative claims with the skepticism they deserve.

Roadmap

| Status | Item |
|---|---|
| ✅ | Fact extraction pipeline + SQLite storage |
| ✅ | Hybrid retrieval (vector + FTS5 with prefix matching) |
| ✅ | Consolidation engine (decay, promote, dedup, summarize, reflect) |
| ✅ | MCP stdio server (7 tools) |
| ✅ | REST API (9 endpoints) |
| ✅ | Multi-provider LLM backends: Ollama, OpenAI-compatible (OpenAI/Azure/Groq/Together/Mistral/vLLM/LM Studio/…), Anthropic, Google, command shell-out |
| ✅ | Python, Java, Spring Boot clients |
| ✅ | Docker image, MCP Registry publish |
| 🔄 | Postgres backend (in parallel to SQLite) |
| 🔄 | Spring AI ChatMemoryRepository implementation |
| 📋 | LongMemEval + DMR benchmark scores |
| 📋 | Quarkus extension |
| 📋 | Cloud embedding providers (OpenAI text-embedding-3, Google text-embedding-004) |

License

Apache 2.0 — see LICENSE.