# sqlrite-agent — a Python LLM agent with persistent memory in SQLRite
A small CLI chat agent whose entire long-term memory lives in **one
`.sqlrite` file on disk**. Every turn the agent:
1. Embeds the user's message.
2. Hybrid-searches its memory: top-k vector KNN (HNSW) over past
messages and summaries, plus keyword recall over the structured
`facts` table.
3. Injects the recalled context into the system prompt.
4. Sends prompt + recent turns to an LLM and prints the reply.
5. Writes the new turn back to the same SQLRite database.
Close the process, reopen it days later — your assistant still knows
your dog's name. No Postgres, no Pinecone, no Redis. One file.
> **Why this example?** Single-file embedded storage is the *right*
> architecture for a local agent, and SQLRite's HNSW vector index +
> structured SQL gives you semantic *and* deterministic recall from one
> store. This is a place where the database genuinely fits the
> workload, not just a demo.
## Architecture
```mermaid
flowchart LR
User[/"User input"/] --> Embed["Embedder<br/>(hash / OpenAI / local)"]
Embed --> Recall["Memory.recall()"]
Recall -->|vector KNN| Msgs[("messages<br/>HNSW(embedding)")]
Recall -->|vector KNN| Sums[("summaries<br/>HNSW(embedding)")]
Recall -->|keyword| Facts[("facts<br/>SQL")]
Recall --> Prompt["Prompt assembly<br/>(system + recent turns)"]
Prompt --> LLM["LLM<br/>(Anthropic / echo)"]
LLM --> Reply[/"Assistant reply"/]
Reply --> Writeback["Memory.log_message()"]
Writeback --> Msgs
User -.-> Writeback
```
Three tables, one file:
| `messages` | Every user / assistant turn, plus its 384-dim embedding. | HNSW on `embedding` |
| `summaries` | Periodic rollups for old context that's too long to inline. | HNSW on `embedding` |
| `facts` | Structured `(subject, predicate, object)` triples extracted from user turns. | Plain SQL — keyword recall |
## Install
```bash
# 1. Clone the rust_sqlite repo (this example ships inside it).
git clone https://github.com/joaoh82/rust_sqlite
cd rust_sqlite/examples/python-agent
# 2. Create a virtualenv and install the example.
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
# 3. (Optional) Install an LLM provider extra. Without one you get
# the offline "echo" agent — the recall pipeline still runs.
pip install 'sqlrite-agent[anthropic]' # default LLM
# pip install 'sqlrite-agent[openai]' # use OpenAI embeddings too
# pip install 'sqlrite-agent[local-embeddings]' # 384-dim sentence-transformer
```
The `sqlrite` Python wheel comes from PyPI automatically (pinned to the
0.10.x release that introduced `VECTOR(N)` + HNSW indexes).
## Run
```bash
# Zero config — runs against a fresh in-memory hash embedder and the
# offline echo "LLM". You see the recall pipeline work end-to-end
# without an API key.
python -m sqlrite_agent
# With Anthropic — set ANTHROPIC_API_KEY and run.
export ANTHROPIC_API_KEY=sk-ant-...
python -m sqlrite_agent
# Pick where the DB lives (default: ~/.sqlrite-agent.sqlrite).
python -m sqlrite_agent --db ./my-agent.sqlrite
# Multiple parallel conversations in one DB.
python -m sqlrite_agent --conversation work
python -m sqlrite_agent --conversation personal
# Force a specific embedder.
python -m sqlrite_agent --embedder local # sentence-transformers
python -m sqlrite_agent --embedder openai # text-embedding-3-small
```
## CLI commands
While the REPL is running, anything starting with `/` is a command:
| `/help` | Show all commands. |
| `/stats` | Counts of messages, summaries, facts. |
| `/facts` | List every extracted fact. |
| `/recent` | Last 10 turns in chronological order. |
| `/recall <query>` | Show what *would* be recalled for a query, without replying. |
| `/summarize` | Summarize the last 20 turns into a single `summaries` row. |
| `/quit` | Exit. `Ctrl-D` also works. |
## 60-second demo script
Run this top-to-bottom to see persistent memory survive a process
restart. Uses the zero-key default (`hash` embedder + `echo` chat).
```bash
# Session 1 — drop some facts, then quit.
$ python -m sqlrite_agent --db agent.sqlrite
sqlrite-agent 0.1.0 — db=agent.sqlrite, ...
loaded memory: 0 messages, 0 summaries, 0 facts.
you> My dog's name is Mochi.
agent> [echo agent ...]
you> Mochi loves carrots more than treats.
agent> [echo agent ...]
you> I live in Lisbon, Portugal.
agent> [echo agent ...]
you> /facts
user.dog.name = Mochi
user.location = Lisbon, Portugal
you> /quit
# Session 2 — fresh process, same DB.
$ python -m sqlrite_agent --db agent.sqlrite
sqlrite-agent 0.1.0 — db=agent.sqlrite, ...
loaded memory: 6 messages, 0 summaries, 2 facts.
you> What's my dog's name?
[recalled: 1 facts, 0 summaries, 4 messages]
agent> [echo agent ... — but the recall block above includes
user.dog.name = Mochi]
```
With `ANTHROPIC_API_KEY` set, the second turn answers "Mochi" instead
of the canned echo because the LLM sees the recalled fact in its
system prompt.
## Open the DB yourself with the SQLRite REPL
The memory file is plain SQLRite — open it from anywhere:
```bash
$ cargo install sqlrite-engine # or grab a binary from GitHub Releases
$ sqlrite agent.sqlrite
SQLRite v0.10.0
sqlrite> SELECT role, content FROM messages ORDER BY id LIMIT 5;
sqlrite> SELECT subject, predicate, object FROM facts;
sqlrite> SELECT id, content
...> FROM messages
...> ORDER BY vec_distance_cosine(embedding, (SELECT embedding FROM messages WHERE id = 1))
...> LIMIT 3;
```
This is the demo's whole point: **the agent's memory is just SQL**.
You can query it, back it up, copy it between machines, or load it
into the Node / Go / WASM SDK without converting anything.
## How recall works
`Memory.recall(query)` runs three searches in parallel and merges the
results. Pseudocode:
```python
embedding = embedder.embed(query)
keywords = query_keywords(query) # filtered to content words
vector_messages = SELECT ... FROM messages
ORDER BY vec_distance_cosine(embedding, ?)
LIMIT k_messages
vector_summaries = SELECT ... FROM summaries
ORDER BY vec_distance_cosine(embedding, ?)
LIMIT k_summaries
lexical_messages = SELECT ... FROM messages
WHERE fts_match(content, ?)
ORDER BY bm25_score(content, ?) DESC
LIMIT k_messages -- Phase 8, BM25 over the inverted index
facts = SELECT * FROM facts
WHERE subject LIKE ...
OR predicate LIKE ...
OR object LIKE ...
LIMIT k_facts
```
The vector and lexical message lists are merged in Python (dedupe by
`id`, vector ranking primary) — that's the simplest correct shape for
hybrid retrieval: vector finds conceptual neighbors even with zero
lexical overlap, and BM25 surfaces exact-term matches the vector
embedding might rank too low. See [`docs/fts.md`](../../docs/fts.md)
for the BM25 surface and [`examples/hybrid-retrieval/`](../hybrid-retrieval/)
for an example that fuses both into a single `ORDER BY` arithmetic.
## Embedding-provider tradeoffs
| `hash` (default) | None — stdlib only | No | Bag-of-words approximation only. Good enough for the demo, mediocre for real RAG. | Zero. |
| `openai` | `openai` package | `OPENAI_API_KEY` | Excellent. `text-embedding-3-small` constrained to 384 dims. | ~30s install. |
| `local` | `sentence-transformers` (~500 MB with torch) | No | Excellent. `all-MiniLM-L6-v2`, fully offline. | ~5 min install. |
Swap with `--embedder hash | openai | local`. The dimension is fixed
at 384 to match `VECTOR(384)` in the schema; if you need a different
dim, change `DEFAULT_DIM` in `sqlrite_agent/db.py` and start with a
fresh DB.
## Known simplifications
This is an *example*, not a production agent. Things v1 punts on:
- **Memory eviction.** No automatic rolling-window or summarize-and-evict
loop yet — run `/summarize` manually when the conversation grows
unwieldy.
- **Fact extraction.** Six hand-written regex patterns. A real agent
would call the LLM to extract facts so it catches phrasings the
regex misses. Easy upgrade: wrap an LLM call in `facts.extract_facts`.
- **Single-query hybrid.** The agent merges vector hits and BM25 hits
in Python. The engine also supports a single SQL query that fuses
both into one `ORDER BY` arithmetic (`0.5 * bm25_score(...) + 0.5 *
(1.0 - vec_distance_cosine(...))`) — see [`examples/hybrid-retrieval/`](../hybrid-retrieval/).
The merge approach handles conceptual queries with no token overlap;
the single-query approach is tighter when you always want BM25 to
pre-filter. Pick per workload.
- **Concurrency.** The agent assumes single-user single-process. The
engine supports concurrent reads + a single writer via fs2 advisory
locks; running two `sqlrite-agent` processes against the same DB
works, but they won't see each other's in-flight writes until commit.
## Development
```bash
# Tests run fully offline with the hash embedder and echo chat — no
# API keys, no network.
pip install -e '.[dev]'
pytest
```
## Layout
```
examples/python-agent/
├── pyproject.toml # package metadata + pinned sqlrite dep
├── README.md # this file
├── sqlrite_agent/
│ ├── __init__.py
│ ├── __main__.py # python -m sqlrite_agent → cli.main()
│ ├── agent.py # turn loop, prompt assembly, summarization
│ ├── chat.py # LLM provider abstraction (Anthropic / Echo)
│ ├── cli.py # interactive REPL + slash commands
│ ├── db.py # schema, migrations, all SQL
│ ├── embeddings.py # Embedder abstraction (hash / OpenAI / local)
│ ├── facts.py # regex-based fact extractor
│ ├── memory.py # hybrid recall over messages + summaries + facts
│ └── sqlutil.py # safe SQL-literal inlining
└── tests/ # offline pytest suite (31 tests)
```
The agent binds only to the SQLRite Python SDK's documented public
surface (`sqlrite.connect`, `Connection`, `Cursor`). It does not reach
into internals.
## License
MIT — same as the rest of the rust_sqlite repo.