lantern 0.3.0

Local-first, provenance-aware semantic search for agent activity
# Lantern

Lantern is a local-first memory engine for agent activity.

It ingests text an agent has touched, chunks it deterministically, and keeps a
full provenance trail — source URI, content hash, byte ranges, ingest time —
alongside a BM25 keyword index. Everything lives in a single SQLite file under
`./.lantern/` so it is easy to inspect, back up, or wipe by hand.

## Thesis

Most memory tools are chat-memory products, document search tools, or
heavyweight agent frameworks. Lantern is narrower and more durable:

> **a local memory engine for agent activity with provenance-aware search**

Provenance comes first. Every stored chunk can answer *where* it came from,
*when* it was ingested, *what* exact byte range it covers, and *why* a search
result surfaced it.
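
To make the provenance claim concrete, here is a minimal Python sketch of deterministic chunking that records a byte range and hash for every slice. It is an illustration only: the `Chunk` type, the `chunk_bytes` helper, the fixed `max_bytes` window, and the rule of never splitting a UTF-8 code point are all assumptions, not a description of Lantern's actual chunker.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source_uri: str   # where the bytes came from
    ordinal: int      # position of the chunk within its source
    byte_start: int   # inclusive offset into the source bytes
    byte_end: int     # exclusive offset into the source bytes
    char_count: int
    text: str
    sha256: str       # hash of exactly the bytes in [byte_start, byte_end)


def chunk_bytes(source_uri: str, data: bytes, max_bytes: int = 1024) -> list[Chunk]:
    """Split UTF-8 bytes into slices of at most max_bytes (hypothetical policy)."""
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4 (the longest UTF-8 sequence)")
    data.decode("utf-8")  # reject invalid input before slicing
    chunks: list[Chunk] = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # Back up while `end` points at a continuation byte (0b10xxxxxx),
        # so a multi-byte character is never cut in half.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        piece = data[start:end]
        text = piece.decode("utf-8")
        chunks.append(
            Chunk(source_uri, len(chunks), start, end, len(text), text,
                  hashlib.sha256(piece).hexdigest())
        )
        start = end
    return chunks
```

Because the split depends only on the input bytes and `max_bytes`, running it twice on the same content yields identical chunks, and concatenating the byte ranges in `ordinal` order reproduces the source exactly.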

## Install

### Prebuilt Binaries

Static binaries are available from the [GitLab Releases](https://git.skylantix.com/diogenes/lantern/-/releases) page:

| Platform | File |
|----------|------|
| Linux x86_64 | `lantern-linux-amd64` |
| Linux aarch64 | `lantern-linux-arm64` |
| macOS aarch64 | `lantern-macos-arm64` |

```bash
# Download and install (Linux amd64 example)
curl -L -o lantern https://git.skylantix.com/diogenes/lantern/-/releases/v0.3.0/downloads/lantern-linux-amd64
chmod +x lantern
sudo mv lantern /usr/local/bin/
```

All Linux binaries are fully static (musl) — no libc or OpenSSL dependency.
SHA256 checksums are attached to each release.
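
As a rough illustration of checking a download against its published checksum, the Python sketch below hashes a file and compares the digest to an expected value. The helper names `sha256_file` and `verify` are illustrative and not part of Lantern. How the release publishes its checksums is not specified here, so the expected digest is simply passed in as a hex string.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in blocks so large binaries are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_file(path) == expected_hex.strip().lower()
```

Call `verify(Path("lantern"), "<digest copied from the release>")` before moving the binary onto your `PATH`.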

### Build from Source

Requires a Rust toolchain with 2024 edition support (Rust 1.85 or newer):

```bash
cargo build --release
./target/release/lantern --help
```

## Commands

| Command                           | Purpose                                                             |
|-----------------------------------|---------------------------------------------------------------------|
| `lantern init`                    | Create a local store at `./.lantern/lantern.db`                     |
| `lantern ingest <path>`           | Ingest supported files from a path; respects `.lantern-ignore` (use `--no-ignore` to bypass) |
| `lantern ingest <path> --follow`  | Poll `<path>` on an interval (`--follow-interval-secs`, default 5) and re-ingest new or modified files until interrupted |
| `lantern ingest --stdin --uri L`  | Ingest piped content under an explicit label                        |
| `lantern ingest <fifo>`           | Auto-detect a named pipe and read it to EOF as a streamed batch (append mode, `fifo://` URI) |
| `lantern embed`                   | Generate embeddings for chunks via Ollama (`--model`, `--ollama-url`, `--limit`) |
| `lantern mcp`                     | Run the MCP server over stdio or TCP (`--port`)                     |
| `lantern search <query>`          | BM25 keyword search with `--kind`, `--path`, `--limit` filters      |
| `lantern search --semantic <q>`   | Semantic search via Ollama embeddings (cosine similarity; auto-uses sqlite-vec when eligible) |
| `lantern search --vec-semantic <q>` | Force the sqlite-vec-backed semantic path for the default model     |
| `lantern search --hybrid <q>`     | Hybrid keyword + semantic search via Reciprocal Rank Fusion         |
| `lantern query <q>`               | Alias for `search` tuned for broader exploration (limit 20, summary format) |
| `lantern show <id>`               | Full provenance, chunk text, confidence breakdown, and entity evidence for one source (id prefix ok) |
| `lantern inspect`                 | Store status: schema version, counts, confidence signals, decay checkpoints, recent sources |
| `lantern export`                  | JSON dump of sources + chunks, filterable by `--path` / `--query`   |
| `lantern diff [<path>]`           | Compare indexed `file://` sources against the filesystem            |
| `lantern forget <pattern>`        | Preview matching sources; pass `--apply` to actually delete         |
| `lantern reindex`                 | Rebuild the full-text index from the canonical chunk rows           |
| `lantern compact`                 | Decay stale access metadata so old reads stop dominating confidence |
| `lantern memory add\|list\|archive` | Create, list, and archive first-class memory records              |
| `lantern feedback <chunk>`        | Record thumbs-up / thumbs-down feedback for a chunk                 |
| `lantern query-success <chunk>`   | Record an observed query-success signal for confidence scoring      |
| `lantern sessions`                | List sessions grouped from chunk `session_id` metadata              |
| `lantern related-sessions <id>`   | Sessions that share at least one entity with the given session, ranked by shared entities |
| `lantern temporal-sessions <id>`  | Sessions whose timestamp ranges sit closest to the given session    |
| `lantern entities`                | List entities (URLs, repos, domains, emails, paths, `@mentions`, `#hashtags`) ranked by chunk refs |
| `lantern entity-neighbors <id>`   | Entities that co-occur in the same chunks, with typed edges and shared chunk refs |
| `lantern entity-session-neighbors <id>` | Entities that co-occur in the same sessions, ranked by shared sessions |
| `lantern stash`                   | Write a timestamped `tar.gz` snapshot under `<store>/stashes/`      |
| `lantern version` / `--version`   | Print the build version                                             |

Every command that produces structured output accepts `--format text` or
`--format json`; `search` also supports a compact `summary` format and uses it
by default.
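
The hybrid mode in the table above is described as Reciprocal Rank Fusion (RRF). The Python sketch below shows the general technique: each ranked list contributes `1 / (k + rank)` to every item it contains, and the summed scores decide the final order. The function name, the `k = 60` constant (a commonly used default for RRF), and the tie-break by id are assumptions; Lantern's actual parameters are not stated here.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings of chunk ids into one ordering."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; break ties by id so the output is stable.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


keyword_hits = ["a", "b", "c"]    # e.g. BM25 order
semantic_hits = ["c", "a", "d"]   # e.g. cosine-similarity order
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
# fused == ["a", "c", "b", "d"]
```

Items that appear in both lists (`a` and `c`) rise above items found by only one retriever, which is the main appeal of RRF: it needs only ranks, so BM25 scores and cosine similarities never have to be put on a common scale.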

## Examples

### Index a notes tree and search it

```bash
lantern init
lantern ingest notes/
lantern search "lantern bug tracker"
```

### Capture an agent session transcript from stdin

```bash
cat session.jsonl | lantern ingest --stdin \
    --uri "session://2026-04-18-foo" --kind application/jsonl
lantern search haystack --kind application/jsonl
```

### Stream session transcripts through a named pipe

```bash
mkfifo /tmp/lantern.pipe
# In one shell: the agent writes its transcript to the pipe between turns.
# In another: Lantern reads to EOF, ingests the batch, and is ready for the next.
lantern ingest /tmp/lantern.pipe
```

Lantern auto-detects the FIFO, reads until the writer closes, and routes the
bytes through the stdin-append path. Each reader session lands as its own
source under a `fifo://<abs_path>#<suffix>` URI, so repeated batches
accumulate instead of overwriting. A `.jsonl` FIFO name still triggers the
transcript extractor, preserving role / session / turn / tool metadata.

### Watch a transcript directory for new sessions

```bash
lantern ingest ~/agent-sessions/ --follow --follow-interval-secs 5
```

Polling-based: Lantern re-scans the directory every interval and ingests any
file whose content hash has changed. Unchanged files are a no-op, so this is
cheap to leave running. Stop with Ctrl-C.
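
The polling behavior described above can be sketched in Python as follows. This is not Lantern's implementation: `scan_changes`, `follow`, the in-memory `seen` map, and the `max_rounds` parameter (added so the loop can terminate in a test) are all hypothetical.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Optional


def scan_changes(root: Path, seen: dict[str, str]) -> list[Path]:
    """Return files whose content hash is new or different, updating `seen`."""
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if seen.get(str(path)) != digest:
            seen[str(path)] = digest
            changed.append(path)
    return changed


def follow(root: Path, ingest: Callable[[Path], None],
           interval_secs: float = 5.0, max_rounds: Optional[int] = None) -> None:
    """Re-scan `root` every interval and hand changed files to `ingest`."""
    seen: dict[str, str] = {}
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for path in scan_changes(root, seen):
            ingest(path)
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            time.sleep(interval_secs)
```

Comparing content hashes rather than modification times means a file that is touched but not edited triggers no work, which matches the "unchanged files are a no-op" behavior described above.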

### Drill into a single source

```bash
lantern inspect                 # copy a source id from the recent list
lantern show fd7e8e             # short prefix is enough
```

### See what drifted since the last ingest

```bash
lantern diff notes/             # missing / changed / unchanged / unindexed
```
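
One plausible reading of the four states, sketched in Python: `missing` means indexed but no longer on disk, `changed` means the content hash differs, `unchanged` means it matches, and `unindexed` means the file exists on disk but was never ingested. These definitions, the `diff_sources` helper, and the shape of the `indexed` mapping (path to SHA256 hex digest) are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def diff_sources(indexed: dict[str, str], root: Path) -> dict[str, list[str]]:
    """Classify paths by comparing stored hashes against the files under `root`."""
    on_disk = {
        str(p): sha256_hex(p.read_bytes())
        for p in root.rglob("*") if p.is_file()
    }
    report: dict[str, list[str]] = {
        "missing": [], "changed": [], "unchanged": [], "unindexed": [],
    }
    for path, digest in sorted(indexed.items()):
        if path not in on_disk:
            report["missing"].append(path)
        elif on_disk[path] != digest:
            report["changed"].append(path)
        else:
            report["unchanged"].append(path)
    report["unindexed"] = sorted(set(on_disk) - set(indexed))
    return report
```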

### Snapshot the store before a risky change

```bash
lantern stash                   # writes .lantern/stashes/lantern-<ts>.tar.gz
```
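
For readers curious what such a snapshot amounts to, here is a small Python sketch that archives a store directory into `<store>/stashes/lantern-<ts>.tar.gz`. The `stash` function, the UTC timestamp format, and the choice to archive everything except the `stashes/` directory are assumptions, not Lantern's documented behavior.

```python
import tarfile
import time
from pathlib import Path


def stash(store_dir: Path) -> Path:
    """Write a timestamped tar.gz of the store and return the archive path."""
    stashes = store_dir / "stashes"
    stashes.mkdir(parents=True, exist_ok=True)
    ts = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())  # hypothetical format
    archive = stashes / f"lantern-{ts}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for entry in sorted(store_dir.iterdir()):
            if entry.name != "stashes":  # do not nest earlier snapshots
                tar.add(entry, arcname=entry.name)
    return archive
```

A plain file copy like this is only safe when nothing is writing to the database. A more careful tool would likely take a consistent copy first, for example through SQLite's online backup API.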

## .lantern-ignore

Lantern respects `.lantern-ignore` files for excluding paths from ingestion,
similar to `.gitignore`. Place a `.lantern-ignore` file in the directory being
ingested:

```
# Ignore build artifacts
target/
dist/
build/

# Ignore dependencies
node_modules/
.venv/
vendor/

# Re-include one directory that would otherwise be ignored
!important-logs/

# Ignore specific extensions
*.log
*.tmp
```

**Pattern syntax:**
- `#` — comments
- `*`, `?`, `**` — glob wildcards
- `/` suffix — match directories only
- `!` prefix — negate (un-ignore)
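
The pattern rules above can be approximated with a short Python sketch built on `fnmatch`. It is deliberately simplified: the last matching rule wins, patterns without a `/` are tested against every path component, and `*` is allowed to cross directory separators. Real gitignore-style matchers have subtler rules, and Lantern's matcher may differ. `parse_rules` and `is_ignored` are hypothetical names.

```python
from fnmatch import fnmatchcase

Rule = tuple[str, bool, bool]  # (pattern, negate, dir_only)


def parse_rules(text: str) -> list[Rule]:
    """Parse ignore-file text into rules, skipping blanks and comments."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        negate = line.startswith("!")
        if negate:
            line = line[1:]
        dir_only = line.endswith("/")
        rules.append((line.rstrip("/"), negate, dir_only))
    return rules


def is_ignored(rel_path: str, is_dir: bool, rules: list[Rule]) -> bool:
    """Decide whether a '/'-separated relative path is ignored."""
    parts = rel_path.split("/")
    ignored = False
    for pattern, negate, dir_only in rules:
        if "/" in pattern:
            # Anchored-style pattern: compare against the whole path.
            hit = fnmatchcase(rel_path, pattern) and (is_dir or not dir_only)
        else:
            # Every component except the last is necessarily a directory.
            hit = any(
                fnmatchcase(part, pattern)
                and (not dir_only or i < len(parts) - 1 or is_dir)
                for i, part in enumerate(parts)
            )
        if hit:
            ignored = not negate  # later rules override earlier ones
    return ignored
```

With the rules `target/`, `*.log`, and `!keep.log`, this sketch ignores `target/debug/app` and `notes/run.log`, but keeps `notes/todo.md`, `keep.log`, and a regular file that happens to be named `target`.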

**Default ignores** (applied when no `.lantern-ignore` exists):
`.git/`, `target/`, `node_modules/`, `.hermes/`, `__pycache__/`, `.venv/`, `vendor/`

Use `--no-ignore` to bypass all ignore rules:

```bash
lantern ingest . --no-ignore
```

## Data model

Two tables carry the indexed state; both are visible from `sqlite3`:

- `sources` — one row per ingested artifact. Keeps `uri`, optional filesystem
  `path`, `kind` (`text/markdown`, `text/plain`, `application/jsonl`), total
  `bytes`, `content_sha256`, and timestamps.
- `chunks` — one row per deterministic slice of a source. Keeps the parent
  `source_id`, `ordinal`, `byte_start`/`byte_end`, `char_count`, chunk text,
  and chunk `sha256`.

A shadow FTS5 virtual table (`chunks_fts`) is kept in sync by triggers and
supplies BM25 ranking and snippet highlighting to `search`.
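
To show how these pieces can fit together, the Python sketch below builds a comparable schema in an in-memory SQLite database and runs a BM25-ranked query through an external-content FTS5 table. Column types, the `ingested_at` column name, the trigger names, and the `open_store` / `search` helpers are assumptions; the real schema has gone through many migrations and will differ. It requires an SQLite build with FTS5 enabled.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sources (
    id TEXT PRIMARY KEY,
    uri TEXT NOT NULL UNIQUE,
    path TEXT,
    kind TEXT NOT NULL,
    bytes INTEGER NOT NULL,
    content_sha256 TEXT NOT NULL,
    ingested_at TEXT NOT NULL          -- hypothetical timestamp column
);
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    source_id TEXT NOT NULL REFERENCES sources(id),
    ordinal INTEGER NOT NULL,
    byte_start INTEGER NOT NULL,
    byte_end INTEGER NOT NULL,
    char_count INTEGER NOT NULL,
    text TEXT NOT NULL,
    sha256 TEXT NOT NULL
);
-- External-content FTS5 table: the index points back at `chunks` rows.
CREATE VIRTUAL TABLE chunks_fts USING fts5(text, content='chunks', content_rowid='id');
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(rowid, text) VALUES (new.id, new.text);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, text) VALUES ('delete', old.id, old.text);
END;
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[tuple]:
    """Return (chunk_id, uri, byte_start, byte_end, snippet, score), best first."""
    return conn.execute(
        """
        SELECT c.id, s.uri, c.byte_start, c.byte_end,
               snippet(chunks_fts, 0, '[', ']', '...', 8),
               bm25(chunks_fts)
        FROM chunks_fts
        JOIN chunks c ON c.id = chunks_fts.rowid
        JOIN sources s ON s.id = c.source_id
        WHERE chunks_fts MATCH ?
        ORDER BY bm25(chunks_fts)   -- FTS5 bm25() is lower for better matches
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
```

Each hit carries the source URI and byte range alongside the snippet, so a result can always be traced back to the exact slice of the original artifact.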

## Development

```bash
cargo fmt
cargo check --all-targets
cargo test --all-targets
cargo run -- --help
```

## Status

Early but usable. The CLI is stable, the schema versions its migrations (now through v18),
and every command has integration test coverage.

Core retrieval surfaces:

- keyword search (FTS5 BM25)
- semantic search (Ollama embeddings with cosine similarity, auto-accelerated with
  sqlite-vec for the default unfiltered path and backfilled on upgrade)
- hybrid search
- an opt-in `--vec-semantic` path
- an MCP server

Beyond those, Lantern now exposes:

- a typed entity graph (URLs, repos, domains, emails, file paths, `@mentions`,
  `#hashtags`) with neighbor and session-neighbor traversal
- session-scoped retrieval with related and temporal session views
- first-class memory records
- confidence signals (feedback, query-success, access decay) surfaced in
  `show` / `inspect` / `export`
- JSONL ingest that preserves tool-call lineage across Anthropic, OpenAI, and
  Responses transcripts

Ingestion supports `.lantern-ignore` for excluding build artifacts and dependencies.

## License

Lantern is licensed under the **GNU Affero General Public License v3.0 only
(AGPL-3.0-only)**.

Copyright (C) 2026 Raphael Bitton

See [`LICENSE`](./LICENSE).