lantern 0.2.2

Local-first, provenance-aware semantic search for agent activity

Lantern

Lantern is a local-first memory engine for agent activity.

It ingests text an agent has touched, chunks it deterministically, and keeps a full provenance trail — source URI, content hash, byte ranges, ingest time — alongside a BM25 keyword index. Everything lives in a single SQLite file under ./.lantern/ so it is easy to inspect, back up, or wipe by hand.
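The deterministic chunking and per-chunk provenance can be sketched in a few lines. This is an illustrative model, not Lantern's actual splitter — fixed-size byte windows are an assumption; the point is that identical input always yields identical ordinals, byte ranges, and hashes:

```python
import hashlib

def chunk(data: bytes, size: int = 64) -> list[dict]:
    """Split bytes into fixed-size windows, recording provenance per chunk.

    Fixed-size windows are an assumption for illustration; Lantern's real
    splitting rule may differ. What matters is determinism: the same input
    always produces the same chunks, byte ranges, and hashes.
    """
    chunks = []
    for ordinal, start in enumerate(range(0, len(data), size)):
        piece = data[start:start + size]
        chunks.append({
            "ordinal": ordinal,
            "byte_start": start,
            "byte_end": start + len(piece),
            "sha256": hashlib.sha256(piece).hexdigest(),
            "text": piece.decode("utf-8", errors="replace"),
        })
    return chunks

doc = b"Lantern keeps a provenance trail for every chunk it stores."
assert chunk(doc) == chunk(doc)  # deterministic: same input, same chunks
```

Because the byte ranges are recorded, any chunk can be traced back to the exact slice of its source it covers.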

Thesis

Most memory tools are either chat-memory products, document search tools, or heavyweight agent frameworks. Lantern is narrower and more durable:

a local memory engine for agent activity with provenance-aware search

Provenance comes first. Every stored chunk can answer where it came from, when it was ingested, what exact byte range it covers, and why a search result surfaced it.

Install

Prebuilt Binaries

Static binaries are available from the project's GitLab Releases page:

Platform        File
Linux x86_64    lantern-linux-amd64
Linux aarch64   lantern-linux-arm64
macOS aarch64   lantern-macos-arm64
# Download and install (Linux amd64 example)
curl -L -o lantern https://git.skylantix.com/diogenes/lantern/-/releases/v0.2.2/downloads/lantern-linux-amd64
chmod +x lantern
sudo mv lantern /usr/local/bin/

All Linux binaries are fully static (musl) — no libc or OpenSSL dependency. SHA256 checksums are attached to each release.
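Verifying a downloaded binary against its published digest is a one-liner with any SHA-256 tool; a small Python sketch (the expected digest comes from the release page — the placeholder below is not a real value):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 64 KiB blocks so large binaries don't load into memory at once.
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

# Compare against the digest attached to the release:
# assert sha256_of("lantern-linux-amd64") == "<digest from the release page>"
```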

Build from Source

Requires a recent Rust toolchain (2024 edition):

cargo build --release
./target/release/lantern --help

Commands

lantern init
    Create a local store at ./.lantern/lantern.db
lantern ingest <path>
    Ingest supported files from a path; respects .lantern-ignore (use --no-ignore to bypass)
lantern ingest <path> --follow
    Poll <path> on an interval (--follow-interval-secs, default 5) and re-ingest new or modified files until interrupted
lantern ingest --stdin --uri <uri>
    Ingest piped content under an explicit label
lantern ingest <fifo>
    Auto-detect a named pipe and read it to EOF as a streamed batch (append mode, fifo:// URI)
lantern embed
    Generate embeddings for chunks via Ollama (--model, --ollama-url, --limit)
lantern mcp
    Run the MCP server over stdio or TCP (--port)
lantern search <query>
    BM25 keyword search with --kind, --path, --limit filters
lantern search --semantic <q>
    Semantic search via Ollama embeddings (cosine similarity; auto-uses sqlite-vec when eligible)
lantern search --vec-semantic <q>
    Force the sqlite-vec-backed semantic path for the default model
lantern search --hybrid <q>
    Hybrid keyword + semantic search via Reciprocal Rank Fusion
lantern show <id>
    Full provenance + all chunks for one source (id prefix ok)
lantern inspect
    Store status: schema version, counts, recent sources
lantern export
    JSON dump of sources + chunks, filterable by --path / --query
lantern diff [<path>]
    Compare indexed file:// sources against the filesystem
lantern forget <pattern>
    Preview matching sources; pass --apply to actually delete
lantern reindex
    Rebuild the full-text index from the canonical chunk rows
lantern stash
    Write a timestamped tar.gz snapshot under <store>/stashes/
lantern version / --version
    Print the build version

Every command that produces structured output accepts --format text or --format json; search additionally defaults to a compact summary mode.
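The Reciprocal Rank Fusion behind --hybrid merges the keyword and semantic rankings by summing 1/(k + rank) for each list a result appears in. A minimal sketch — k = 60 is the conventional constant from the RRF literature, assumed here rather than taken from Lantern's implementation:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists via Reciprocal Rank Fusion.

    Each hit contributes 1 / (k + rank) per list it appears in; hits
    ranked well by multiple lists rise to the top. k = 60 is assumed.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword = ["c1", "c2", "c3"]      # BM25 order
semantic = ["c3", "c1", "c4"]     # cosine-similarity order
print(rrf([keyword, semantic]))   # c1 first: ranked highly by both lists
```

Note that a chunk appearing in both lists ("c1", "c3") outranks one that tops only a single list — that is the property hybrid search relies on.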

Examples

Index a notes tree and search it

lantern init
lantern ingest notes/
lantern search "lantern bug tracker"

Capture an agent session transcript from stdin

cat session.jsonl | lantern ingest --stdin \
    --uri "session://2026-04-18-foo" --kind application/jsonl
lantern search haystack --kind application/jsonl

Stream session transcripts through a named pipe

mkfifo /tmp/lantern.pipe
# In one shell: the agent writes its transcript to the pipe between turns.
# In another: Lantern reads to EOF, ingests the batch, and is ready for the next.
lantern ingest /tmp/lantern.pipe

Lantern auto-detects the FIFO, reads until the writer closes, and routes the bytes through the stdin-append path. Each reader session lands as its own source under a fifo://<abs_path>#<suffix> URI, so repeated batches accumulate instead of overwriting. A .jsonl FIFO name still triggers the transcript extractor, preserving role / session / turn / tool metadata.
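The read-to-EOF batch behavior can be reproduced with os.mkfifo (POSIX only); this sketch models the reader side blocking until the writer closes the pipe, which is what ends a batch:

```python
import os
import tempfile
import threading

def read_batch(fifo_path: str) -> bytes:
    """Open a FIFO and read until the writer closes it; EOF ends the batch."""
    with open(fifo_path, "rb") as fifo:
        return fifo.read()

pipe = os.path.join(tempfile.mkdtemp(), "lantern.pipe")
os.mkfifo(pipe)

# Simulate an agent writing one transcript batch, then closing the pipe.
def writer():
    with open(pipe, "wb") as f:
        f.write(b'{"role": "user", "content": "hello"}\n')

t = threading.Thread(target=writer)
t.start()
batch = read_batch(pipe)  # blocks until the writer side closes
t.join()
# `batch` now holds one complete batch, ready to ingest as its own source.
```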

Watch a transcript directory for new sessions

lantern ingest ~/agent-sessions/ --follow --follow-interval-secs 5

Polling-based: Lantern re-scans the directory every interval and ingests any file whose content hash has changed. Unchanged files are a no-op, so this is cheap to leave running. Stop with Ctrl-C.
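A simplified model of one polling pass — hash every file under the root and report only those whose digest changed since the last pass (the function below is illustrative, not Lantern's code):

```python
import hashlib
from pathlib import Path

def changed_files(root: str, seen: dict[str, str]) -> list[str]:
    """One polling pass: return files whose content hash differs from the
    last pass, updating the `seen` map in place. Unchanged files hash to
    the same digest and are skipped, so repeated passes are cheap."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if seen.get(str(path)) != digest:
            seen[str(path)] = digest
            changed.append(str(path))
    return changed

# In --follow mode a pass like this would run every --follow-interval-secs.
```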

Drill into a single source

lantern inspect                 # copy a source id from the recent list
lantern show fd7e8e             # short prefix is enough

See what drifted since the last ingest

lantern diff notes/             # missing / changed / unchanged / unindexed

Snapshot the store before a risky change

lantern stash                   # writes .lantern/stashes/lantern-<ts>.tar.gz

.lantern-ignore

Lantern respects .lantern-ignore files for excluding paths from ingestion, similar to .gitignore. Place a .lantern-ignore file in the directory being ingested:

# Ignore build artifacts
target/
dist/
build/

# Ignore dependencies
node_modules/
.venv/
vendor/

# Re-include (un-ignore) this directory
!important-logs/

# Ignore specific extensions
*.log
*.tmp

Pattern syntax:

  • # — comments
  • *, ?, ** — glob wildcards
  • / suffix — match directories only
  • ! prefix — negate (un-ignore)

Default ignores (applied when no .lantern-ignore exists): .git/, target/, node_modules/, .hermes/, __pycache__/, .venv/, vendor/

Use --no-ignore to bypass all ignore rules:

lantern ingest . --no-ignore
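A rough model of how the patterns behave — scan top to bottom, last match wins, a leading `!` re-includes. Real gitignore-style matching has more corner cases than this fnmatch sketch covers:

```python
from fnmatch import fnmatch

def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Decide ignore status: scan patterns top to bottom, last match wins,
    and a leading '!' re-includes. A simplified model, not Lantern's matcher."""
    ignored = False
    for raw in patterns:
        pat = raw.strip()
        if not pat or pat.startswith("#"):
            continue                        # blank lines and comments
        negate = pat.startswith("!")
        if negate:
            pat = pat[1:]
        if pat.endswith("/"):               # directory pattern: match contents
            pat = pat.rstrip("/") + "/*"
        if fnmatch(rel_path, pat) or fnmatch(rel_path.rsplit("/", 1)[-1], pat):
            ignored = not negate
    return ignored

rules = ["target/", "*.log", "!important-logs/"]
print(is_ignored("target/debug/app", rules))          # True
print(is_ignored("important-logs/session.log", rules))  # False: re-included
```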

Data model

Two tables carry the indexed state; both are visible from sqlite3:

  • sources — one row per ingested artifact. Keeps uri, optional filesystem path, kind (text/markdown, text/plain, application/jsonl), total bytes, content_sha256, and timestamps.
  • chunks — one row per deterministic slice of a source. Keeps the parent source_id, ordinal, byte_start/byte_end, char_count, chunk text, and chunk sha256.

A shadow FTS5 virtual table (chunks_fts) is kept in sync by triggers and supplies BM25 ranking and snippet highlighting to search.
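The shadow-table arrangement can be reproduced in miniature with Python's sqlite3 module. The table and trigger names mirror the description above, but the exact DDL is an assumption — inspect the store with `sqlite3 .lantern/lantern.db .schema` for the real schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE chunks (id INTEGER PRIMARY KEY, source_id TEXT, text TEXT);
    -- External-content FTS5 table shadowing `chunks`
    CREATE VIRTUAL TABLE chunks_fts USING fts5(
        text, content=chunks, content_rowid=id);
    -- Triggers keep the full-text index in sync with the canonical rows
    CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
        INSERT INTO chunks_fts(rowid, text) VALUES (new.id, new.text);
    END;
    CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
        INSERT INTO chunks_fts(chunks_fts, rowid, text)
        VALUES ('delete', old.id, old.text);
    END;
""")
db.execute("INSERT INTO chunks(source_id, text) VALUES ('s1', 'lantern bug tracker notes')")
db.execute("INSERT INTO chunks(source_id, text) VALUES ('s1', 'unrelated grocery list')")

# FTS5 exposes BM25 ordering through the implicit `rank` column.
rows = db.execute(
    "SELECT c.text FROM chunks_fts JOIN chunks c ON c.id = chunks_fts.rowid "
    "WHERE chunks_fts MATCH 'lantern' ORDER BY rank"
).fetchall()
print(rows)
```

Because `chunks_fts` is an external-content table, the chunk text is stored once in `chunks`; the FTS table holds only the index, which is also why `lantern reindex` can rebuild it from the canonical rows.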

Development

cargo fmt
cargo check --all-targets
cargo test --all-targets
cargo run -- --help

Status

Early but usable. The CLI is stable, the schema versions its migrations (now through v7), and every command has integration test coverage. Keyword search (FTS5 BM25), semantic search (Ollama embeddings with cosine similarity, auto-accelerated by sqlite-vec for the default unfiltered path and backfilled on upgrade), hybrid search via Reciprocal Rank Fusion, the opt-in --vec-semantic path, and an MCP server are all implemented. Ingestion respects .lantern-ignore for excluding build artifacts and dependencies.

License

Lantern is licensed under the GNU Affero General Public License v3.0 only (AGPL-3.0-only).

Copyright (C) 2026 Raphael Bitton

See LICENSE.