lantern 0.2.3

Local-first, provenance-aware semantic search for agent activity
# Lantern

Lantern is a local-first memory engine for agent activity.

It ingests text an agent has touched, chunks it deterministically, and keeps a
full provenance trail — source URI, content hash, byte ranges, ingest time —
alongside a BM25 keyword index. Everything lives in a single SQLite file under
`./.lantern/` so it is easy to inspect, back up, or wipe by hand.
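
To make "deterministic chunking with provenance" concrete, here is a minimal sketch in Python. It is not Lantern's chunker: the fixed-size byte window, the `Chunk` type, and the `chunk_bytes` name are assumptions made for illustration. The point it shows is that the same bytes always produce the same chunks, each carrying its source URI, byte range, and hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:  # hypothetical record, loosely modelled on the fields named above
    source_uri: str
    ordinal: int
    byte_start: int
    byte_end: int  # exclusive
    sha256: str
    text: str


def chunk_bytes(source_uri: str, data: bytes, max_bytes: int = 1024) -> list[Chunk]:
    """Split valid UTF-8 `data` into windows of at most `max_bytes` bytes.

    A window never ends in the middle of a multi-byte character, so every
    chunk decodes on its own and its byte range maps back to the source.
    """
    chunks: list[Chunk] = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # Back up while `end` points at a UTF-8 continuation byte (0b10xxxxxx).
        while start < end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        if end == start:
            # Window is smaller than one character: take that whole character.
            end = start + 1
            while end < len(data) and (data[end] & 0xC0) == 0x80:
                end += 1
        piece = data[start:end]
        chunks.append(
            Chunk(
                source_uri=source_uri,
                ordinal=len(chunks),
                byte_start=start,
                byte_end=end,
                sha256=hashlib.sha256(piece).hexdigest(),
                text=piece.decode("utf-8"),
            )
        )
        start = end
    return chunks
```

Because the split depends only on the input bytes and `max_bytes`, re-running it on unchanged content yields identical ordinals, ranges, and hashes.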

## Thesis

Most memory tools are chat-memory products, document search tools, or
heavyweight agent frameworks. Lantern is narrower and more durable:

> **a local memory engine for agent activity with provenance-aware search**

Provenance comes first. Every stored chunk can answer *where* it came from,
*when* it was ingested, *what* exact byte range it covers, and *why* a search
result surfaced it.

## Install

### Prebuilt Binaries

Static binaries are available from the [GitLab Releases](https://git.skylantix.com/diogenes/lantern/-/releases) page:

| Platform | File |
|----------|------|
| Linux x86_64 | `lantern-linux-amd64` |
| Linux aarch64 | `lantern-linux-arm64` |
| macOS aarch64 | `lantern-macos-arm64` |

```bash
# Download and install (Linux amd64 example; substitute the release tag you need)
curl -L -o lantern https://git.skylantix.com/diogenes/lantern/-/releases/v0.2.2/downloads/lantern-linux-amd64
chmod +x lantern
sudo mv lantern /usr/local/bin/
```

All Linux binaries are fully static (musl), with no runtime dependency on a system libc or OpenSSL.
SHA256 checksums are attached to each release.
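
Since checksums are published, a download can be checked before it is installed. A minimal sketch in Python, assuming you copy the expected digest from the release page by hand; the helper names `sha256_of` and `matches_checksum` are made up for this example.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and padding."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

Call `matches_checksum(Path("lantern"), "<digest from the release page>")` and only run `chmod +x` if it returns `True`.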

### Build from Source

Requires a Rust toolchain that supports the 2024 edition (Rust 1.85 or newer):

```bash
cargo build --release
./target/release/lantern --help
```

## Commands

| Command                           | Purpose                                                             |
|-----------------------------------|---------------------------------------------------------------------|
| `lantern init`                    | Create a local store at `./.lantern/lantern.db`                     |
| `lantern ingest <path>`           | Ingest supported files from a path; respects `.lantern-ignore` (use `--no-ignore` to bypass) |
| `lantern ingest <path> --follow`  | Poll `<path>` on an interval (`--follow-interval-secs`, default 5) and re-ingest new or modified files until interrupted |
| `lantern ingest --stdin --uri L`  | Ingest piped content under an explicit label                        |
| `lantern ingest <fifo>`           | Auto-detect a named pipe and read it to EOF as a streamed batch (append mode, `fifo://` URI) |
| `lantern embed`                   | Generate embeddings for chunks via Ollama (`--model`, `--ollama-url`, `--limit`) |
| `lantern mcp`                     | Run the MCP server over stdio or TCP (`--port`)                     |
| `lantern search <query>`          | BM25 keyword search with `--kind`, `--path`, `--limit` filters      |
| `lantern search --semantic <q>`   | Semantic search via Ollama embeddings (cosine similarity; auto-uses sqlite-vec when eligible) |
| `lantern search --vec-semantic <q>` | Force the sqlite-vec-backed semantic path for the default model     |
| `lantern search --hybrid <q>`     | Hybrid keyword + semantic search via Reciprocal Rank Fusion         |
| `lantern show <id>`               | Full provenance + all chunks for one source (id prefix ok)          |
| `lantern inspect`                 | Store status: schema version, counts, recent sources                |
| `lantern export`                  | JSON dump of sources + chunks, filterable by `--path` / `--query`   |
| `lantern diff [<path>]`           | Compare indexed `file://` sources against the filesystem            |
| `lantern forget <pattern>`        | Preview matching sources; pass `--apply` to actually delete         |
| `lantern reindex`                 | Rebuild the full-text index from the canonical chunk rows           |
| `lantern stash`                   | Write a timestamped `tar.gz` snapshot under `<store>/stashes/`      |
| `lantern version` / `--version`   | Print the build version                                             |

Every command that produces structured output accepts `--format text` or
`--format json`. `search` also has a compact `summary` mode, which is its default.
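
For readers unfamiliar with Reciprocal Rank Fusion, the idea behind `--hybrid` can be sketched in a few lines of Python. This is the textbook formula, not Lantern's code: the constant `k = 60` is the value commonly used in the literature and is an assumption here, as are the function name and the chunk ids.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked id lists: each list adds 1 / (k + rank) per id."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


keyword_hits = ["c1", "c2", "c3"]   # hypothetical BM25 order
semantic_hits = ["c3", "c1", "c4"]  # hypothetical cosine-similarity order
fused = [cid for cid, _ in reciprocal_rank_fusion([keyword_hits, semantic_hits])]
# fused == ["c1", "c3", "c2", "c4"]
```

Only ranks are combined, so BM25 scores and cosine similarities never need to be put on a common scale.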

## Examples

### Index a notes tree and search it

```bash
lantern init
lantern ingest notes/
lantern search "lantern bug tracker"
```

### Capture an agent session transcript from stdin

```bash
cat session.jsonl | lantern ingest --stdin \
    --uri "session://2026-04-18-foo" --kind application/jsonl
lantern search haystack --kind application/jsonl
```

### Stream session transcripts through a named pipe

```bash
mkfifo /tmp/lantern.pipe
# In one shell: the agent writes its transcript to the pipe between turns.
# In another: Lantern reads to EOF, ingests the batch, and is ready for the next.
lantern ingest /tmp/lantern.pipe
```

Lantern auto-detects the FIFO, reads until the writer closes, and routes the
bytes through the stdin-append path. Each reader session lands as its own
source under a `fifo://<abs_path>#<suffix>` URI, so repeated batches
accumulate instead of overwriting. A `.jsonl` FIFO name still triggers the
transcript extractor, preserving role / session / turn / tool metadata.
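
On the writing side, an agent only needs to open the pipe, write its lines, and close it. A minimal Python sketch, assuming a JSON Lines transcript; the `write_batch` helper and the record fields (`role`, `content`) are hypothetical and not Lantern's documented transcript schema.

```python
import json


def write_batch(pipe_path: str, records: list[dict]) -> int:
    """Write one batch of JSON Lines to a FIFO and return the record count.

    open() blocks until a reader (for example `lantern ingest <fifo>`) has the
    pipe open. Leaving the `with` block closes it, which the reader sees as EOF.
    """
    with open(pipe_path, "w", encoding="utf-8") as pipe:
        for record in records:
            pipe.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

Each call produces one reader session, so calling `write_batch` once per turn would give one source per turn.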

### Watch a transcript directory for new sessions

```bash
lantern ingest ~/agent-sessions/ --follow --follow-interval-secs 5
```

Polling-based: Lantern re-scans the directory every interval and ingests any
file whose content hash has changed. Unchanged files are a no-op, so this is
cheap to leave running. Stop with Ctrl-C.
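
The polling behaviour described above can be sketched as follows. This is a Python illustration of hash-based change detection in general; `scan_changes`, `follow`, and the in-memory `seen` map are assumptions, and Lantern itself keeps its hashes in the store.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable


def scan_changes(root: Path, seen: dict[str, str]) -> list[Path]:
    """Return files under `root` whose SHA-256 differs from `seen`.

    `seen` maps path -> last observed digest and is updated in place, so a
    second scan over unchanged files returns an empty list.
    """
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if seen.get(str(path)) != digest:
            seen[str(path)] = digest
            changed.append(path)
    return changed


def follow(root: Path, ingest: Callable[[Path], None], interval_secs: float = 5.0) -> None:
    """Poll until interrupted (Ctrl-C), handing new or modified files to `ingest`."""
    seen: dict[str, str] = {}
    try:
        while True:
            for path in scan_changes(root, seen):
                ingest(path)
            time.sleep(interval_secs)
    except KeyboardInterrupt:
        pass
```

Comparing content hashes rather than modification times means a file that is touched but not edited is never re-ingested.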

### Drill into a single source

```bash
lantern inspect                 # copy a source id from the recent list
lantern show fd7e8e             # short prefix is enough
```

### See what drifted since the last ingest

```bash
lantern diff notes/             # missing / changed / unchanged / unindexed
```
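
One plausible reading of the four states, sketched in Python: `missing` is indexed but gone from disk, `changed` has a different content hash, `unchanged` matches, and `unindexed` is on disk but not in the store. These definitions are inferred from the names, and `classify` is a hypothetical helper, not Lantern's implementation.

```python
import hashlib
from pathlib import Path


def classify(indexed: dict[str, str], root: Path) -> dict[str, list[str]]:
    """Compare `indexed` (relative path -> sha256 at ingest) with files under `root`."""
    on_disk = {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }
    report: dict[str, list[str]] = {
        "missing": [], "changed": [], "unchanged": [], "unindexed": []
    }
    for path, digest in indexed.items():
        if path not in on_disk:
            report["missing"].append(path)
        elif on_disk[path] != digest:
            report["changed"].append(path)
        else:
            report["unchanged"].append(path)
    report["unindexed"] = [path for path in on_disk if path not in indexed]
    return {state: sorted(paths) for state, paths in report.items()}
```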

### Snapshot the store before a risky change

```bash
lantern stash                   # writes .lantern/stashes/lantern-<ts>.tar.gz
```

## .lantern-ignore

Lantern respects `.lantern-ignore` files for excluding paths from ingestion,
similar to `.gitignore`. Place a `.lantern-ignore` file in the directory being
ingested:

```
# Ignore build artifacts
target/
dist/
build/

# Ignore dependencies
node_modules/
.venv/
vendor/

# Re-include a directory that an earlier rule would ignore
!important-logs/

# Ignore specific extensions
*.log
*.tmp
```

**Pattern syntax:**
- `#` — comments
- `*`, `?`, `**` — glob wildcards
- `/` suffix — match directories only
- `!` prefix — negate (un-ignore)
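
How these rules might combine can be sketched in Python. The sketch assumes gitignore-style evaluation (rules checked in order, last match wins, an ignored directory hides everything beneath it); that is an assumption about Lantern, not documented behaviour. It also leans on `fnmatch`, whose `*` crosses `/`, so `*` and `**` behave alike here. All function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

Rule = tuple[str, bool, bool]  # (pattern, negated, dir_only)


def parse_rules(text: str) -> list[Rule]:
    """Parse ignore-file text, skipping blank lines and `#` comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        negated = line.startswith("!")
        if negated:
            line = line[1:]
        dir_only = line.endswith("/")
        rules.append((line.rstrip("/"), negated, dir_only))
    return rules


def _entry_ignored(rel: str, is_dir: bool, rules: list[Rule]) -> bool:
    """Apply every rule to one entry; the last matching rule decides."""
    ignored = False
    name = PurePosixPath(rel).name
    for pattern, negated, dir_only in rules:
        if dir_only and not is_dir:
            continue
        # Patterns without a slash match the entry name, others the whole path.
        target = rel if "/" in pattern else name
        if fnmatchcase(target, pattern):
            ignored = not negated
    return ignored


def is_ignored(rel_file: str, rules: list[Rule]) -> bool:
    """A file is ignored if it, or any directory above it, is ignored."""
    parts = PurePosixPath(rel_file).parts
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        if _entry_ignored(prefix, is_dir=i < len(parts), rules=rules):
            return True
    return False
```

With the rules `target/`, `*.log`, `!keep.log`, the path `target/debug/app` is ignored through its directory, `notes/run.log` is ignored by extension, and `notes/keep.log` is re-included by the negation.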

**Default ignores** (applied when no `.lantern-ignore` exists):
`.git/`, `target/`, `node_modules/`, `.hermes/`, `__pycache__/`, `.venv/`, `vendor/`

Use `--no-ignore` to bypass all ignore rules:

```bash
lantern ingest . --no-ignore
```

## Data model

Two tables carry the indexed state; both are visible from `sqlite3`:

- `sources` — one row per ingested artifact. Keeps `uri`, optional filesystem
  `path`, `kind` (`text/markdown`, `text/plain`, `application/jsonl`), total
  `bytes`, `content_sha256`, and timestamps.
- `chunks` — one row per deterministic slice of a source. Keeps the parent
  `source_id`, `ordinal`, `byte_start`/`byte_end`, `char_count`, chunk text,
  and chunk `sha256`.

A shadow FTS5 virtual table (`chunks_fts`) is kept in sync by triggers and
supplies BM25 ranking and snippet highlighting to `search`.
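
To show how the two tables answer a provenance question, here is a self-contained Python sketch against an in-memory database. The DDL is a guess built from the column names listed above: the `id` and `text` column names and all column types are assumptions, timestamps are left out because their names are not given, and the real schema lives in Lantern's migrations.

```python
import sqlite3

# Hypothetical schema mirroring the documented columns; not Lantern's actual DDL.
SCHEMA = """
CREATE TABLE sources (
    id             TEXT PRIMARY KEY,
    uri            TEXT NOT NULL,
    path           TEXT,
    kind           TEXT NOT NULL,
    bytes          INTEGER NOT NULL,
    content_sha256 TEXT NOT NULL
);
CREATE TABLE chunks (
    source_id  TEXT NOT NULL REFERENCES sources(id),
    ordinal    INTEGER NOT NULL,
    byte_start INTEGER NOT NULL,
    byte_end   INTEGER NOT NULL,
    char_count INTEGER NOT NULL,
    text       TEXT NOT NULL,
    sha256     TEXT NOT NULL
);
"""


def provenance_for(conn: sqlite3.Connection, needle: str) -> list[tuple]:
    """Return (uri, ordinal, byte_start, byte_end) for chunks containing `needle`."""
    return conn.execute(
        """
        SELECT s.uri, c.ordinal, c.byte_start, c.byte_end
        FROM chunks AS c
        JOIN sources AS s ON s.id = c.source_id
        WHERE c.text LIKE '%' || ? || '%'
        ORDER BY s.uri, c.ordinal
        """,
        (needle,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO sources VALUES (?, ?, ?, ?, ?, ?)",
    ("s1", "file:///notes/a.md", "/notes/a.md", "text/markdown", 21, "placeholder-hash"),
)
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("s1", 0, 0, 10, 10, "alpha beta", "placeholder-hash"),
        ("s1", 1, 10, 21, 11, "gamma delta", "placeholder-hash"),
    ],
)
rows = provenance_for(conn, "gamma")
# rows == [("file:///notes/a.md", 1, 10, 21)]
```

The plain `LIKE` filter stands in for the FTS5 match so the sketch runs on any SQLite build; the join from chunk to source is the part that carries over.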

## Development

```bash
cargo fmt
cargo check --all-targets
cargo test --all-targets
cargo run -- --help
```

## Status

Early but usable. The CLI is stable, schema migrations are versioned (currently
through v7), and every command has integration test coverage. Implemented so far:

- keyword search (FTS5 BM25)
- semantic search (Ollama embeddings ranked by cosine similarity), auto-accelerated
  with sqlite-vec on the default unfiltered path and backfilled on upgrade
- hybrid search
- an opt-in `--vec-semantic` path
- an MCP server

Ingestion supports `.lantern-ignore` for excluding build artifacts and dependencies.

## License

Lantern is licensed under the **GNU Affero General Public License v3.0 only
(AGPL-3.0-only)**.

Copyright (C) 2026 Raphael Bitton

See [`LICENSE`](./LICENSE).