# Lantern
Lantern is a local-first memory engine for agent activity.
It ingests text an agent has touched, chunks it deterministically, and keeps a
full provenance trail — source URI, content hash, byte ranges, ingest time —
alongside a BM25 keyword index. Everything lives in a single SQLite file under
`./.lantern/` so it is easy to inspect, back up, or wipe by hand.
## Thesis
Most memory tools are either chat-memory products, document search tools, or
heavyweight agent frameworks. Lantern is narrower and more durable:
> **a local memory engine for agent activity with provenance-aware search**
Provenance comes first. Every stored chunk can answer *where* it came from,
*when* it was ingested, *what* exact byte range it covers, and *why* a search
result surfaced it.
## Install
### Prebuilt Binaries
Static binaries are available from the [GitLab Releases](https://git.skylantix.com/diogenes/lantern/-/releases):
| Platform | Binary |
|---|---|
| Linux x86_64 | `lantern-linux-amd64` |
| Linux aarch64 | `lantern-linux-arm64` |
| macOS aarch64 | `lantern-macos-arm64` |
```bash
# Download and install (Linux amd64 example)
curl -fL -o lantern https://git.skylantix.com/diogenes/lantern/-/releases/v0.3.0/downloads/lantern-linux-amd64
chmod +x lantern
sudo mv lantern /usr/local/bin/
```
All Linux binaries are statically linked against musl, so they have no runtime dependency on the system libc or OpenSSL.
SHA256 checksums are attached to each release.
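To check a download against its published checksum, something like the following should work. It is a minimal sketch: `verify_sha256` is a hypothetical helper (not part of Lantern), the expected digest has to be copied from the release page, and `sha256sum` is assumed to be available (on macOS, `shasum -a 256` prints the same layout).

```bash
# Hypothetical helper: compare a file's SHA-256 with an expected hex digest.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <path>"; keep only the digest.
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

Running `verify_sha256 lantern "<digest from the release page>"` before `chmod +x` returns non-zero on a mismatch, so it can gate the install step in a script.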
### Build from Source
Requires a Rust toolchain with 2024 edition support (Rust 1.85 or newer):
```bash
cargo build --release
./target/release/lantern --help
```
## Commands
| Command | Description |
|---|---|
| `lantern init` | Create a local store at `./.lantern/lantern.db` |
| `lantern ingest <path>` | Ingest supported files from a path; respects `.lantern-ignore` (use `--no-ignore` to bypass) |
| `lantern ingest <path> --follow` | Poll `<path>` on an interval (`--follow-interval-secs`, default 5) and re-ingest new or modified files until interrupted |
| `lantern ingest --stdin --uri <label>` | Ingest piped content under an explicit label |
| `lantern ingest <fifo>` | Auto-detect a named pipe and read it to EOF as a streamed batch (append mode, `fifo://` URI) |
| `lantern embed` | Generate embeddings for chunks via Ollama (`--model`, `--ollama-url`, `--limit`) |
| `lantern mcp` | Run the MCP server over stdio or TCP (`--port`) |
| `lantern search <query>` | BM25 keyword search with `--kind`, `--path`, `--limit` filters |
| `lantern search --semantic <q>` | Semantic search via Ollama embeddings (cosine similarity; auto-uses sqlite-vec when eligible) |
| `lantern search --vec-semantic <q>` | Force the sqlite-vec-backed semantic path for the default model |
| `lantern search --hybrid <q>` | Hybrid keyword + semantic search via Reciprocal Rank Fusion |
| `lantern query <q>` | Alias for `search` tuned for broader exploration (limit 20, summary format) |
| `lantern show <id>` | Full provenance, chunk text, confidence breakdown, and entity evidence for one source (id prefix ok) |
| `lantern inspect` | Store status: schema version, counts, confidence signals, decay checkpoints, recent sources |
| `lantern export` | JSON dump of sources + chunks, filterable by `--path` / `--query` |
| `lantern diff [<path>]` | Compare indexed `file://` sources against the filesystem |
| `lantern forget <pattern>` | Preview matching sources; pass `--apply` to actually delete |
| `lantern reindex` | Rebuild the full-text index from the canonical chunk rows |
| `lantern compact` | Decay stale access metadata so old reads stop dominating confidence |
| `lantern memory add\|list\|archive` | Create, list, and archive first-class memory records |
| `lantern feedback <chunk>` | Record thumbs-up / thumbs-down feedback for a chunk |
| `lantern query-success <chunk>` | Record an observed query-success signal for confidence scoring |
| `lantern sessions` | List sessions grouped from chunk `session_id` metadata |
| `lantern related-sessions <id>` | Sessions that share at least one entity with the given session, ranked by shared entities |
| `lantern temporal-sessions <id>` | Sessions whose timestamp ranges sit closest to the given session |
| `lantern entities` | List entities (URLs, repos, domains, emails, paths, `@mentions`, `#hashtags`) ranked by chunk refs |
| `lantern entity-neighbors <id>` | Entities that co-occur in the same chunks, with typed edges and shared chunk refs |
| `lantern entity-session-neighbors <id>` | Entities that co-occur in the same sessions, ranked by shared sessions |
| `lantern stash` | Write a timestamped `tar.gz` snapshot under `<store>/stashes/` |
| `lantern version` / `--version` | Print the build version |
Every command that produces structured output accepts `--format text` or
`--format json`; `search` additionally defaults to a compact `summary` mode.
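The `--hybrid` mode in the table above merges the keyword and semantic rankings with Reciprocal Rank Fusion. The sketch below shows the general technique, not Lantern's code: each input file is one ranked list of ids (best first), and an id's fused score is the sum of `1 / (k + rank)` over the lists that contain it. The `rrf_fuse` name, the one-id-per-line file layout, and `k = 60` (a commonly used default) are assumptions; the constant Lantern uses is not stated here.

```bash
# Hypothetical helper: fuse ranked id lists (one id per line, best first).
# Usage: rrf_fuse keyword.txt semantic.txt
rrf_fuse() {
  local k="${RRF_K:-60}"
  # FNR is the line number within the current file, i.e. the 1-based rank.
  awk -v k="$k" '
    NF { score[$1] += 1 / (k + FNR) }
    END { for (id in score) printf "%.6f %s\n", score[id], id }
  ' "$@" | LC_ALL=C sort -k1,1nr -k2,2
}
```

Each output line is `<score> <id>`, highest score first. An id that ranks well in both lists ends up above one that tops only a single list, which is the point of fusing by rank rather than by raw score.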
## Examples
### Index a notes tree and search it
```bash
lantern init
lantern ingest notes/
lantern search "lantern bug tracker"
```
### Capture an agent session transcript from stdin
```bash
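cat session.jsonl | lantern ingest --stdin --uri <label>  # session.jsonl and <label> are placeholders
lantern search haystack --kind application/jsonl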
```
### Stream session transcripts through a named pipe
```bash
mkfifo /tmp/lantern.pipe
# In one shell: the agent writes its transcript to the pipe between turns.
# In another: Lantern reads to EOF, ingests the batch, and is ready for the next.
lantern ingest /tmp/lantern.pipe
```
Lantern auto-detects the FIFO, reads until the writer closes, and routes the
bytes through the stdin-append path. Each reader session lands as its own
source under a `fifo://<abs_path>#<suffix>` URI, so repeated batches
accumulate instead of overwriting. A `.jsonl` FIFO name still triggers the
transcript extractor, preserving role / session / turn / tool metadata.
### Watch a transcript directory for new sessions
```bash
lantern ingest ~/agent-sessions/ --follow --follow-interval-secs 5
```
Polling-based: Lantern re-scans the directory every interval and ingests any
file whose content hash has changed. Unchanged files are a no-op, so this is
cheap to leave running. Stop with Ctrl-C.
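The content-hash check is what makes repeated scans cheap. As a rough illustration of the idea (not Lantern's implementation), the hypothetical `scan_changed` helper below hashes every file under a directory, prints only the paths whose hash is new since the previous scan, and then saves the listing for next time. It assumes GNU `find`, `sort`, `xargs`, `grep`, and `sha256sum`, and a state file kept outside the scanned directory.

```bash
# Hypothetical helper: scan_changed DIR STATE
# Prints files under DIR that are new or modified since the last scan.
# STATE stores the previous "<sha256>  <path>" listing. Deleted files are not reported.
scan_changed() {
  local dir="$1" state="$2" new
  new="$(mktemp)"
  find "$dir" -type f -print0 | sort -z | xargs -0 -r sha256sum > "$new"
  touch "$state"
  # Lines in the new listing that are absent from the old one changed or appeared.
  grep -Fxv -f "$state" "$new" | sed 's/^[0-9a-f]*  //'
  mv "$new" "$state"
}
```

Wrapping the call in `while true; do scan_changed ~/agent-sessions /tmp/scan.state; sleep 5; done` gives the same poll-and-skip behavior in miniature: a scan with nothing changed prints nothing.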
### Drill into a single source
```bash
lantern inspect # copy a source id from the recent list
lantern show fd7e8e # short prefix is enough
```
### See what drifted since the last ingest
```bash
lantern diff notes/ # missing / changed / unchanged / unindexed
```
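The four states can be pictured as a comparison between an index of recorded hashes and the files currently on disk. The sketch below is an interpretation of the state names, not Lantern's logic: it assumes `missing` means indexed but no longer on disk, `changed` means the hash differs, and `unindexed` means present on disk but never ingested. `classify_drift` and the `<sha256>  <path>` index layout are hypothetical, and GNU `sha256sum` is assumed.

```bash
# Hypothetical helper: classify_drift INDEX DIR
# INDEX holds "<sha256>  <path>" lines, the layout sha256sum prints.
classify_drift() {
  local index="$1" dir="$2" hash path actual
  # Pass 1: each indexed path is missing, changed, or unchanged.
  while read -r hash path; do
    if [ ! -f "$path" ]; then
      echo "missing $path"
      continue
    fi
    actual="$(sha256sum "$path" | awk '{print $1}')"
    if [ "$actual" = "$hash" ]; then
      echo "unchanged $path"
    else
      echo "changed $path"
    fi
  done < "$index"
  # Pass 2: files on disk that the index has never seen.
  find "$dir" -type f | sort | while read -r path; do
    cut -d' ' -f3- "$index" | grep -Fxq -- "$path" || echo "unindexed $path"
  done
}
```

Keeping the index file outside `DIR` avoids it being reported as `unindexed` itself.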
### Snapshot the store before a risky change
```bash
lantern stash # writes .lantern/stashes/lantern-<ts>.tar.gz
```
## .lantern-ignore
Lantern respects `.lantern-ignore` files for excluding paths from ingestion,
similar to `.gitignore`. Place a `.lantern-ignore` file in the directory being
ingested:
```
# Ignore build artifacts
target/
dist/
build/
# Ignore dependencies
node_modules/
.venv/
vendor/
# Un-ignore a directory (negation)
!important-logs/
# Ignore specific extensions
*.log
*.tmp
```
**Pattern syntax:**
- `#` — comments
- `*`, `?`, `**` — glob wildcards
- `/` suffix — match directories only
- `!` prefix — negate (un-ignore)
**Default ignores** (applied when no `.lantern-ignore` exists):
`.git/`, `target/`, `node_modules/`, `.hermes/`, `__pycache__/`, `.venv/`, `vendor/`
Use `--no-ignore` to bypass all ignore rules:
```bash
lantern ingest . --no-ignore
```
## Data model
Two tables carry the indexed state; both are visible from `sqlite3`:
- `sources` — one row per ingested artifact. Keeps `uri`, optional filesystem
`path`, `kind` (`text/markdown`, `text/plain`, `application/jsonl`), total
`bytes`, `content_sha256`, and timestamps.
- `chunks` — one row per deterministic slice of a source. Keeps the parent
`source_id`, `ordinal`, `byte_start`/`byte_end`, `char_count`, chunk text,
and chunk `sha256`.
A shadow FTS5 virtual table (`chunks_fts`) is kept in sync by triggers and
supplies BM25 ranking and snippet highlighting to `search`.
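Because the store is plain SQLite, the provenance trail can be read directly. A minimal sketch, assuming the `sqlite3` CLI is installed and that `sources` has an `id` primary key that `chunks.source_id` points at (the key column name is a guess; the other columns are the ones listed above). `list_chunk_ranges` is a hypothetical helper name.

```bash
# Hypothetical helper: list each chunk's byte range next to its source URI.
# Usage: list_chunk_ranges .lantern/lantern.db
list_chunk_ranges() {
  local db="$1"
  # -readonly keeps the inspection from modifying the store.
  sqlite3 -readonly "$db" "
    SELECT s.uri, c.ordinal, c.byte_start, c.byte_end
    FROM chunks AS c
    JOIN sources AS s ON s.id = c.source_id   -- 's.id' is an assumed column name
    ORDER BY s.uri, c.ordinal
    LIMIT 20;"
}
```

With the default `sqlite3` output mode, each row prints as `uri|ordinal|byte_start|byte_end`, which is enough to map a search hit back to an exact slice of its source.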
## Development
```bash
cargo fmt
cargo check --all-targets
cargo test --all-targets
cargo run -- --help
```
## Status
Early but usable. The CLI is stable, the schema versions its migrations (now through v18),
and every command has integration test coverage.
The core retrieval surfaces are:
- keyword search (FTS5 BM25)
- semantic search (Ollama embeddings with cosine similarity, auto-accelerated with
sqlite-vec for the default unfiltered path and backfilled on upgrade)
- hybrid search
- an opt-in `--vec-semantic` path
- an MCP server
Beyond those, Lantern now exposes:
- a typed entity graph (URLs, repos, domains, emails, file paths, `@mentions`, `#hashtags`)
with neighbor and session-neighbor traversal
- session-scoped retrieval with related and temporal session views
- first-class memory records
- confidence signals (feedback, query-success, access decay) surfaced in `show` / `inspect` / `export`
- JSONL ingest that preserves tool-call lineage across Anthropic, OpenAI, and Responses transcripts
Ingestion supports `.lantern-ignore` for excluding build artifacts and dependencies.
## License
Lantern is licensed under the **GNU Affero General Public License v3.0 only
(AGPL-3.0-only)**.
Copyright (C) 2026 Raphael Bitton
See [`LICENSE`](./LICENSE).