jamjet-engram-server 0.3.2

Durable memory for AI agents — temporal knowledge graph, hybrid retrieval, SQLite-backed.

java-ai-memory.dev · Main repo · JamJet docs · Discord

Engram is a durable memory layer for AI agents. It extracts facts from conversations, stores them in a temporal knowledge graph, and retrieves them with hybrid semantic + keyword search — all backed by a single SQLite file.

It ships in two shapes:

jamjet-engram — a Rust library you embed in your own application.
jamjet-engram-server (this crate) — a standalone binary that speaks MCP over stdio and REST over HTTP, so Claude Desktop, Cursor, and any HTTP client can use it with no code.

Engram is provider-agnostic. Five LLM backends are wired in out of the box, and a sixth — command — lets you shell out to any external script, so you can plug in a provider Engram does not ship natively without touching Rust code:

`ENGRAM_LLM_PROVIDER=`	What it does
`ollama` (default)	Local Ollama via `/api/chat`. Free, no API keys, runs on your laptop.
`openai-compatible`	Any endpoint that speaks OpenAI's chat-completions protocol — see the long list below.
`anthropic`	Anthropic Claude via the Messages API.
`google`	Google Gemini via `generateContent` with native JSON mode.
`command`	Shell out to a user-supplied script. Infinite extensibility, zero recompile.
`mock`	Deterministic tests-only backend — returns empty facts.

Pick one with ENGRAM_LLM_PROVIDER=… — the same binary handles all of them.

State of the project, April 2026. Engram is new — v0.3.2, small community, no public LongMemEval / DMR numbers yet. The architecture below works, the tests pass, the Docker image runs. If you need production-scale memory today, Mem0 Cloud and Zep Cloud are more mature. If you need a tryable, self-hostable, single-binary memory layer that doesn't require Python, Postgres, Qdrant, or Neo4j, Engram is built for you.

Why Engram?

Problem	Engram's answer
Every agent memory library is Python-first	Rust core with native Python, Java, and MCP clients — no sidecar required
Needs Postgres + Qdrant + Neo4j just to try	Single SQLite file, zero infra
Conversation history is not knowledge memory	Fact extraction pipeline — pulls structured facts out of messages
Old facts drift and contradict each other	Conflict detection + consolidation — decay, promote, dedup, summarize, reflect
Memory recall is either semantic OR keyword	Hybrid retrieval — vector search + SQLite FTS5 in one query
Agents lose memory across processes	Durable by default — one SQLite file, crash-safe, portable
MCP support is an afterthought	MCP-native — 7 tools exposed by a single binary
No time-travel over what the agent knew	Temporal knowledge graph — every fact is scoped and timestamped
Can't isolate memory per user or tenant	First-class scopes — org / user / session built into every query

Quickstart — 30 seconds

Engram speaks to four LLM providers out of the box: Ollama (default — free, local), OpenAI, Anthropic Claude, and Google Gemini. Pick one.

Option A — Ollama (free, local, zero API keys)

Requirements: Ollama with llama3.2 and nomic-embed-text pulled:

ollama pull llama3.2
ollama pull nomic-embed-text

Docker (easiest):

docker run --rm -i \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

The container defaults to Ollama at host.docker.internal:11434 — works out of the box on Docker Desktop for Mac and Windows. Linux users need --add-host=host.docker.internal:host-gateway.

Option B — Anthropic Claude (hosted, highest quality)

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=anthropic \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Defaults to claude-haiku-4-5-20251001. Override with -e ENGRAM_ANTHROPIC_MODEL=claude-sonnet-4-6.

Note: Anthropic has no native JSON mode, so Engram parses JSON from text responses. The llm_util::extract_json_payload helper strips markdown fences that Claude occasionally emits.

Option C — Google Gemini

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=google \
  -e GOOGLE_API_KEY=AIza... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Defaults to gemini-flash-latest. Override with -e ENGRAM_GOOGLE_MODEL=gemini-2.5-flash.

Option D — Any OpenAI-compatible endpoint

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=openai-compatible \
  -e OPENAI_API_KEY=sk-... \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Defaults to OpenAI itself (https://api.openai.com/v1, gpt-4o-mini). Change one env var to point at any of these providers without recompiling:

Provider	`ENGRAM_OPENAI_BASE_URL=`	Note
OpenAI	`https://api.openai.com/v1`	default
Azure OpenAI	`https://<resource>.openai.azure.com/openai/deployments/<deployment>`
Groq	`https://api.groq.com/openai/v1`	very fast inference
Together.ai	`https://api.together.xyz/v1`
Mistral	`https://api.mistral.ai/v1`
DeepSeek	`https://api.deepseek.com/v1`
Perplexity	`https://api.perplexity.ai`
OpenRouter	`https://openrouter.ai/api/v1`	one key, many models
Fireworks	`https://api.fireworks.ai/inference/v1`
vLLM (self-hosted)	`http://your-host:8000/v1`
LM Studio	`http://localhost:1234/v1`	local desktop app
LocalAI	`http://localhost:8080/v1`	self-hosted
Ollama `/v1` compat layer	`http://localhost:11434/v1`	alternate path for Ollama
Your corporate LLM gateway	whatever URL your infra team exposes

openai is accepted as a backwards-compatible alias for openai-compatible.

Option E — `command` (shell out to anything)

For providers Engram does not ship natively — an internal model behind a custom RPC, a raw SOAP endpoint, a local inference binary, a quick wrapper over a Python SDK, or tomorrow's new provider — use the command backend. Engram will spawn your script per extraction call, pipe a JSON request to its stdin, and read a JSON response from its stdout.

The contract. Your script reads one JSON object from stdin:

{"system": "<extraction prompt>", "user": "<conversation text>", "structured": true}

Then writes one JSON value to stdout. Either a bare value:

{"facts": [{"text": "...", "entities": [...], "confidence": 0.95, "category": "..."}]}

Or an envelope (useful for surfacing errors):

{"content": {"facts": [...]}}
{"error": "rate limited, try again"}

Exit 0 on success, non-zero (with stderr) on failure.

Example Python wrapper (~15 lines):

#!/usr/bin/env python3
# my-llm.py — wraps any Python SDK for Engram.
import json, sys
from my_llm_sdk import chat  # your SDK

req = json.loads(sys.stdin.read())
resp = chat(
    system=req["system"],
    user=req["user"],
    json_mode=req.get("structured", False),
)
sys.stdout.write(json.dumps(resp))

Point Engram at it:

docker run --rm -i \
  -e ENGRAM_LLM_PROVIDER=command \
  -e ENGRAM_LLM_COMMAND="python /scripts/my-llm.py" \
  -v $(pwd)/scripts:/scripts \
  -v engram-data:/data \
  ghcr.io/jamjet-labs/engram-server:0.3.2

Timeout is 120 seconds by default — override with ENGRAM_LLM_COMMAND_TIMEOUT=30.

Security: the command provider runs arbitrary commands as the Engram process user. Never use it in a multi-tenant deployment where untrusted users can set ENGRAM_LLM_COMMAND. It is a local and single-tenant feature.

Option F — cargo install (all providers same binary)

cargo install jamjet-engram-server
engram serve                                            # ollama (default)
ENGRAM_LLM_PROVIDER=anthropic ANTHROPIC_API_KEY=... engram serve
ENGRAM_LLM_PROVIDER=command ENGRAM_LLM_COMMAND="python /path/to/wrapper.py" engram serve

Option G — REST mode for testing

engram serve --mode rest --port 9090

curl -X POST http://localhost:9090/v1/memory \
  -H 'content-type: application/json' \
  -d '{
    "user_id": "alice",
    "messages": [
      {"role": "user", "content": "I am allergic to peanuts and I love sourdough."}
    ]
  }'

curl 'http://localhost:9090/v1/memory/recall?q=food%20allergies&user_id=alice'

No server. No config. No Python sidecar. One binary.

MCP client configuration

Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_config.json`)

{
  "mcpServers": {
    "engram": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-v", "engram-data:/data",
        "ghcr.io/jamjet-labs/engram-server:0.3.2"
      ]
    }
  }
}

Cursor / any MCP-aware IDE

Point it at the same docker run command, or at a locally-installed engram serve binary. After restart, seven memory_* tools are available to the model.

The seven MCP tools

Tool	What it does
`memory_add`	Extract and store facts from conversation messages (calls the LLM for extraction)
`memory_recall`	Semantic search over stored facts, scoped by `user_id` / `org_id`
`memory_context`	Assemble a token-budgeted context block for an LLM prompt, with tier-aware selection
`memory_search`	Keyword search over facts (SQLite FTS5)
`memory_forget`	Soft-delete a fact by ID with an optional reason
`memory_stats`	Aggregate counts: total facts, valid facts, entities, relationships
`memory_consolidate`	Run a consolidation cycle — decay stale facts, promote high-value ones, dedup near-duplicates

All scoped by (org_id, user_id, session_id) — org is the coarsest, session the finest.

REST API

Nine endpoints, rooted at /v1/memory. Full OpenAPI-style surface:

Method	Path	Handler
GET	`/health`	Liveness probe
POST	`/v1/memory`	Add messages (fact extraction)
GET	`/v1/memory/recall?q=…&user_id=…`	Semantic recall
POST	`/v1/memory/context`	Token-budgeted context assembly
GET	`/v1/memory/search?q=…&user_id=…`	Keyword search
GET	`/v1/memory/stats`	Aggregate statistics
POST	`/v1/memory/consolidate`	Trigger consolidation
DELETE	`/v1/memory/facts/:id`	Forget a fact
DELETE	`/v1/memory/users/:id`	GDPR user-data delete

Configuration

All settings flow through CLI flags or environment variables. Env vars are the recommended way to configure the Docker image.

Core

CLI flag	Env var	Default	Notes
`--db`	`ENGRAM_DB_PATH`	`engram.db`	SQLite file path
`--mode`	`ENGRAM_MODE`	`mcp`	`mcp` (stdio) or `rest` (HTTP)
`--port`	`ENGRAM_PORT`	`9090`	HTTP port in REST mode
`--llm-provider`	`ENGRAM_LLM_PROVIDER`	`ollama`	`ollama`, `openai-compatible` (alias `openai`), `anthropic`, `google`, `command`, `mock`
`--embedding-provider`	`ENGRAM_EMBEDDING_PROVIDER`	`ollama`	`ollama` or `mock` (cloud embedding providers are planned)
`--embedding-model`	`ENGRAM_EMBEDDING_MODEL`	`nomic-embed-text`	Ollama embedding model
`--embedding-dims`	`ENGRAM_EMBEDDING_DIMS`	`768`	Must match the embedding model's output

Ollama

CLI flag	Env var	Default
`--ollama-url`	`ENGRAM_OLLAMA_URL`	`http://localhost:11434` (Docker image: `http://host.docker.internal:11434`)
`--ollama-llm-model`	`ENGRAM_OLLAMA_LLM_MODEL`	`llama3.2`

OpenAI-compatible (OpenAI, Groq, Together, Mistral, DeepSeek, Azure, vLLM, …)

CLI flag	Env var	Default
`--openai-api-key`	`OPENAI_API_KEY`	(required when provider is `openai-compatible`)
`--openai-base-url`	`ENGRAM_OPENAI_BASE_URL`	`https://api.openai.com/v1` — change this to switch provider, see the big table above
`--openai-model`	`ENGRAM_OPENAI_MODEL`	`gpt-4o-mini`

Anthropic

CLI flag	Env var	Default
`--anthropic-api-key`	`ANTHROPIC_API_KEY`	(required when provider is `anthropic`)
`--anthropic-base-url`	`ENGRAM_ANTHROPIC_BASE_URL`	`https://api.anthropic.com`
`--anthropic-model`	`ENGRAM_ANTHROPIC_MODEL`	`claude-haiku-4-5-20251001`

Google Gemini

CLI flag	Env var	Default
`--google-api-key`	`GOOGLE_API_KEY`	(required when provider is `google`)
`--google-base-url`	`ENGRAM_GOOGLE_BASE_URL`	`https://generativelanguage.googleapis.com/v1beta`
`--google-model`	`ENGRAM_GOOGLE_MODEL`	`gemini-flash-latest`

Command (shell-out)

CLI flag	Env var	Default
`--llm-command`	`ENGRAM_LLM_COMMAND`	(required when provider is `command`) — run via `sh -c`
`--llm-command-timeout`	`ENGRAM_LLM_COMMAND_TIMEOUT`	`120` — seconds before the child is killed

The mock backend returns empty facts and deterministic byte-cycled vectors. It exists for tests and CI. The server prints a clear warning on startup if it detects mock mode in use. API keys and command contents are never echoed to stdout or logs — the server only describes provider model at base_url or command \<first 60 chars>``.

Embedding the library

If you don't want a separate process, depend on jamjet-engram directly:

[dependencies]
jamjet-engram = "0.3.2"

use engram::{Memory, OllamaEmbeddingProvider, OllamaLlmClient, Scope, ExtractionConfig, Message};

let embedding = Box::new(OllamaEmbeddingProvider::new());
let memory = Memory::open("sqlite:engram.db?mode=rwc", embedding).await?;

let scope = Scope::user("default", "alice");
let messages = vec![Message {
    role: "user".into(),
    content: "I am allergic to peanuts.".into(),
}];

let llm = Box::new(OllamaLlmClient::new());
let fact_ids = memory
    .add_messages(&messages, scope.clone(), llm, ExtractionConfig::default())
    .await?;

// Later, recall
let facts = memory.recall(&engram::memory::RecallQuery {
    query: "food allergies".into(),
    scope: Some(scope),
    max_results: 5,
    as_of: None,
    min_score: None,
}).await?;

Other languages

Engram is part of the JamJet ecosystem. These clients speak to engram-server over REST:

Language	Package	Install
Python	`jamjet` (includes `EngramClient`)	`pip install jamjet`
Java	`dev.jamjet:jamjet-sdk` (includes `EngramClient`)	Maven Central
Spring Boot	`dev.jamjet:engram-spring-boot-starter`	Maven Central — zero-config `@Bean EngramMemory`
Rust	`jamjet-engram` (embed directly)	`cargo add jamjet-engram`

See java-ai-memory.dev for how Engram compares to LangChain4j ChatMemory, Spring AI ChatMemory, Koog, Embabel, Google ADK Memory Bank, the Mem0 community wrapper, and Zep — with honest notes on where Engram fits and where the alternatives are more mature.

Architecture

┌──────────────────────────────────────────────────────────┐
│           Clients: Claude Desktop, Cursor, JamJet,        │
│           Python SDK, Java SDK, Spring Boot, cURL         │
├─────────────────┬────────────────────────────────────────┤
│   MCP (stdio)   │           REST (HTTP + JSON)            │
├─────────────────┴────────────────────────────────────────┤
│                     engram-server                         │
│                 (jamjet-engram-server)                    │
├──────────────────────────────────────────────────────────┤
│                       engram (lib)                        │
│   Extraction pipeline  │  Hybrid retrieval  │  Scopes     │
│   Conflict detection   │  Consolidation     │  Context    │
├──────────────────────────────────────────────────────────┤
│    LlmClient trait        │    EmbeddingProvider trait    │
│    Ollama · Mock          │    Ollama · Mock              │
├──────────────────────────────────────────────────────────┤
│                       Storage                             │
│       SQLite (facts, entities, FTS5, vectors)            │
└──────────────────────────────────────────────────────────┘

What Engram is not

Not a chat-history store. If all you need is the last N messages of a conversation, use LangChain4j ChatMemory, Spring AI ChatMemory, or any framework's built-in window.
Not a state checkpointer. If you need to snapshot agent execution state for resume and replay, that's what LangGraph, Koog persistence, or JamJet's own durable runtime does. Pair them with Engram — they solve different problems.
Not a managed service. No hosted plane, no auth layer beyond scopes, no SLA. Bring your own process manager.
Not benchmarked yet. LongMemEval and DMR numbers are on the roadmap. Until they exist, treat comparative claims with the skepticism they deserve.

Roadmap

Status	Item
✅	Fact extraction pipeline + SQLite storage
✅	Hybrid retrieval (vector + FTS5 with prefix matching)
✅	Consolidation engine (decay, promote, dedup, summarize, reflect)
✅	MCP stdio server (7 tools)
✅	REST API (9 endpoints)
✅	Multi-provider LLM backends: Ollama, OpenAI-compatible (OpenAI/Azure/Groq/Together/Mistral/vLLM/LM Studio/…), Anthropic, Google, `command` shell-out
✅	Python, Java, Spring Boot clients
✅	Docker image, MCP Registry publish
🔄	Postgres backend (in parallel to SQLite)
🔄	Spring AI `ChatMemoryRepository` implementation
📋	LongMemEval + DMR benchmark scores
📋	Quarkus extension
📋	Cloud embedding providers (OpenAI `text-embedding-3`, Google `text-embedding-004`)

License

Apache 2.0 — see LICENSE.