lantern 0.3.0

Local-first, provenance-aware semantic search for agent activity

Lantern

Lantern is a local-first memory engine for agent activity.

It ingests text an agent has touched, chunks it deterministically, and keeps a full provenance trail — source URI, content hash, byte ranges, ingest time — alongside a BM25 keyword index. Everything lives in a single SQLite file under ./.lantern/ so it is easy to inspect, back up, or wipe by hand.
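Because the store is an ordinary SQLite file, any SQLite client can look inside it. The sketch below is one hedged way to do that from Python. It assumes the store path `./.lantern/lantern.db` created by `lantern init` and reads only SQLite's own `sqlite_master` catalog, so it makes no assumption about Lantern's table layout. The helper name `list_tables` is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return (type, name) pairs for every object in the SQLite schema."""
    # Open read-only so inspection can never modify the store.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT type, name FROM sqlite_master ORDER BY type, name"
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    store = Path(".lantern") / "lantern.db"  # path assumed from `lantern init`
    if store.exists():
        for kind, name in list_tables(store):
            print(f"{kind:8} {name}")
    else:
        print("no store found; run `lantern init` first")
```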

Thesis

Most memory tools are chat-memory products, document search tools, or heavyweight agent frameworks. Lantern is narrower and more durable:

a local memory engine for agent activity with provenance-aware search

Provenance comes first. Every stored chunk can answer where it came from, when it was ingested, what exact byte range it covers, and why a search result surfaced it.

Install

Prebuilt Binaries

Prebuilt binaries are available from the GitLab Releases page:

Platform         File
Linux x86_64     lantern-linux-amd64
Linux aarch64    lantern-linux-arm64
macOS aarch64    lantern-macos-arm64

# Download and install (Linux amd64 example)
curl -L -o lantern https://git.skylantix.com/diogenes/lantern/-/releases/v0.3.0/downloads/lantern-linux-amd64
chmod +x lantern
sudo mv lantern /usr/local/bin/

All Linux binaries are fully static (musl), with no runtime dependency on a system libc or OpenSSL. SHA256 checksums are attached to each release.
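A downloaded binary can be checked against its published checksum before it is installed. The following is a minimal sketch, not part of Lantern itself: `sha256_of` and `matches` are hypothetical helpers, and the expected digest has to be copied from the release page, since the name and layout of the checksum file are not stated here.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large binaries need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare against the published digest, ignoring case and surrounding whitespace."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

Calling `matches("lantern", "<digest from the release page>")` returns `True` only when the file on disk is byte-for-byte what was published.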

Build from Source

Requires a Rust toolchain that supports the 2024 edition (Rust 1.85 or newer):

cargo build --release
./target/release/lantern --help

Commands

Command                                Purpose
lantern init                           Create a local store at ./.lantern/lantern.db
lantern ingest <path>                  Ingest supported files from a path; respects .lantern-ignore (use --no-ignore to bypass)
lantern ingest <path> --follow         Poll <path> on an interval (--follow-interval-secs, default 5) and re-ingest new or modified files until interrupted
lantern ingest --stdin --uri L         Ingest piped content under an explicit label
lantern ingest <fifo>                  Auto-detect a named pipe and read it to EOF as a streamed batch (append mode, fifo:// URI)
lantern embed                          Generate embeddings for chunks via Ollama (--model, --ollama-url, --limit)
lantern mcp                            Run the MCP server over stdio or TCP (--port)
lantern search <query>                 BM25 keyword search with --kind, --path, --limit filters
lantern search --semantic <q>          Semantic search via Ollama embeddings (cosine similarity; auto-uses sqlite-vec when eligible)
lantern search --vec-semantic <q>      Force the sqlite-vec-backed semantic path for the default model
lantern search --hybrid <q>            Hybrid keyword + semantic search via Reciprocal Rank Fusion
lantern query <q>                      Alias for search tuned for broader exploration (limit 20, summary format)
lantern show <id>                      Full provenance, chunk text, confidence breakdown, and entity evidence for one source (id prefix ok)
lantern inspect                        Store status: schema version, counts, confidence signals, decay checkpoints, recent sources
lantern export                         JSON dump of sources + chunks, filterable by --path / --query
lantern diff [<path>]                  Compare indexed file:// sources against the filesystem
lantern forget <pattern>               Preview matching sources; pass --apply to actually delete
lantern reindex                        Rebuild the full-text index from the canonical chunk rows
lantern compact                        Decay stale access metadata so old reads stop dominating confidence
lantern memory add|list|archive        Create, list, and archive first-class memory records
lantern feedback <chunk>               Record thumbs-up / thumbs-down feedback for a chunk
lantern query-success <chunk>          Record an observed query-success signal for confidence scoring
lantern sessions                       List sessions grouped from chunk session_id metadata
lantern related-sessions <id>          Sessions that share at least one entity with the given session, ranked by shared entities
lantern temporal-sessions <id>         Sessions whose timestamp ranges sit closest to the given session
lantern entities                       List entities (URLs, repos, domains, emails, paths, @mentions, #hashtags) ranked by chunk refs
lantern entity-neighbors <id>          Entities that co-occur in the same chunks, with typed edges and shared chunk refs
lantern entity-session-neighbors <id>  Entities that co-occur in the same sessions, ranked by shared sessions
lantern stash                          Write a timestamped tar.gz snapshot under <store>/stashes/
lantern version / --version            Print the build version

Every command that produces structured output accepts --format text or --format json; search additionally defaults to a compact summary mode.
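The `--semantic` and `--hybrid` rows in the command table name two standard techniques: ranking by cosine similarity and merging rankings with Reciprocal Rank Fusion. The sketch below illustrates both on toy data. It is not Lantern's implementation: the function names are hypothetical, the vectors are made up, and `k=60` is the constant commonly used for RRF, which Lantern may or may not share.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_cosine(query_vec, chunk_vecs):
    """chunk_vecs maps chunk id -> embedding. Returns ids, most similar first."""
    return sorted(
        chunk_vecs,
        key=lambda cid: cosine(query_vec, chunk_vecs[cid]),
        reverse=True,
    )


def rrf(rankings, k=60):
    """Fuse several best-first rankings: each id scores sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, cid in enumerate(ranking, start=1):
            scores[cid] = scores.get(cid, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is stable.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


keyword_ranking = ["c1", "c2", "c3"]  # e.g. a BM25 result list
embeddings = {"c1": [1.0, 0.0], "c2": [0.6, 0.8], "c4": [0.0, 1.0]}
semantic_ranking = rank_by_cosine([0.0, 1.0], embeddings)
fused = rrf([keyword_ranking, semantic_ranking])
```

RRF works on ranks alone, so BM25 scores and cosine similarities never need to be put on a common scale. An item that places well in both lists (here `c1` and `c2`) outranks one that appears in only one of them.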

Examples

Index a notes tree and search it

lantern init
lantern ingest notes/
lantern search "lantern bug tracker"

Capture an agent session transcript from stdin

cat session.jsonl | lantern ingest --stdin \
    --uri "session://2026-04-18-foo" --kind application/jsonl
lantern search haystack --kind application/jsonl

Stream session transcripts through a named pipe

mkfifo /tmp/lantern.pipe
# In one shell: the agent writes its transcript to the pipe between turns.
# In another: Lantern reads to EOF, ingests the batch, and is ready for the next.
lantern ingest /tmp/lantern.pipe

Lantern auto-detects the FIFO, reads until the writer closes, and routes the bytes through the stdin-append path. Each reader session lands as its own source under a fifo://<abs_path>#<suffix> URI, so repeated batches accumulate instead of overwriting. A .jsonl FIFO name still triggers the transcript extractor, preserving role / session / turn / tool metadata.
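The read-to-EOF behavior described above follows from ordinary POSIX FIFO semantics, which a short sketch can show. The helper names are hypothetical, and the `suffix` argument only stands in for whatever Lantern appends after `#`, since that scheme is not specified here.

```python
import os
import stat


def is_fifo(path):
    """True when path is a named pipe rather than a regular file."""
    return stat.S_ISFIFO(os.stat(path).st_mode)


def read_batch(path):
    """Block until a writer opens the pipe, then read until the writer closes it (EOF)."""
    with open(path, "rb") as pipe:
        return pipe.read()


def fifo_uri(path, suffix):
    """Build a fifo://<abs_path>#<suffix> label; the suffix scheme is an assumption."""
    return f"fifo://{os.path.abspath(path)}#{suffix}"
```

Each call to `read_batch` returns one writer session's bytes, which is why repeated batches can be stored as separate sources rather than overwriting one another.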

Watch a transcript directory for new sessions

lantern ingest ~/agent-sessions/ --follow --follow-interval-secs 5

Polling-based: Lantern re-scans the directory every interval and ingests any file whose content hash has changed. Unchanged files are a no-op, so this is cheap to leave running. Stop with Ctrl-C.
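The polling loop described above can be sketched as a scan that hashes every file and reports only those whose digest changed since the previous pass. This is an illustration under stated assumptions, not Lantern's code: `scan_once`, `follow`, and the `ingest` callback are hypothetical names, and a real implementation would also apply ignore rules and file-type filters.

```python
import hashlib
import os
import time


def file_sha256(path):
    """Hex SHA-256 of a file's full contents."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def scan_once(root, seen):
    """Return paths whose content hash differs from `seen`, updating `seen` in place."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = file_sha256(path)
            if seen.get(path) != digest:
                seen[path] = digest
                changed.append(path)
    return sorted(changed)


def follow(root, ingest, interval_secs=5.0):
    """Call ingest(path) for new or modified files until interrupted with Ctrl-C."""
    seen = {}
    try:
        while True:
            for path in scan_once(root, seen):
                ingest(path)
            time.sleep(interval_secs)
    except KeyboardInterrupt:
        pass
```

Keying on the content hash rather than the modification time means a file that is touched but not changed is skipped.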

Drill into a single source

lantern inspect                 # copy a source id from the recent list
lantern show fd7e8e             # short prefix is enough
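Accepting a short id prefix usually means resolving it to exactly one full id. The sketch below shows that idea only. `resolve_prefix` is a hypothetical helper and the ids in the usage note are invented; how Lantern reports an unknown or ambiguous prefix is not documented here.

```python
def resolve_prefix(prefix, ids):
    """Return the single id starting with `prefix`, or raise if none or several match."""
    matches = sorted(i for i in ids if i.startswith(prefix))
    if not matches:
        raise KeyError(f"no id starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"prefix {prefix!r} is ambiguous: {matches}")
    return matches[0]
```

With the made-up ids `["fd7e8e01", "fd12aa90"]`, the prefix `fd7e8e` resolves to the first id, while `fd` alone is ambiguous and raises.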

See what drifted since the last ingest

lantern diff notes/             # missing / changed / unchanged / unindexed
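The four labels in the comment above suggest a comparison between indexed content hashes and the hashes of files currently on disk. The sketch below encodes one plausible reading, inferred from the label names rather than from Lantern's source. `classify` is a hypothetical function over two `{path: sha256}` mappings.

```python
def classify(indexed, on_disk):
    """Map each path to missing / changed / unchanged / unindexed."""
    status = {}
    for path, digest in indexed.items():
        if path not in on_disk:
            status[path] = "missing"      # indexed, but the file is gone
        elif on_disk[path] != digest:
            status[path] = "changed"      # file content differs from the index
        else:
            status[path] = "unchanged"
    for path in on_disk:
        if path not in indexed:
            status[path] = "unindexed"    # on disk, never ingested
    return status
```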

Snapshot the store before a risky change

lantern stash                   # writes .lantern/stashes/lantern-<ts>.tar.gz
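A snapshot of this kind amounts to a gzip-compressed tar archive written next to the store. The sketch below shows the general shape using Python's `tarfile`. The timestamp format, the choice to skip the `stashes/` directory, and the function name `stash` are all assumptions made for illustration.

```python
import tarfile
import time
from pathlib import Path


def stash(store_dir, now=None):
    """Write <store>/stashes/lantern-<ts>.tar.gz and return its path."""
    store = Path(store_dir)
    # UTC timestamp; the exact format Lantern uses is not specified here.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    out_dir = store / "stashes"
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"lantern-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        for item in sorted(store.iterdir()):
            if item.name != "stashes":  # do not nest earlier snapshots
                tar.add(str(item), arcname=item.name)
    return archive
```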

.lantern-ignore

Lantern respects .lantern-ignore files for excluding paths from ingestion, similar to .gitignore. Place a .lantern-ignore file in the directory being ingested:

# Ignore build artifacts
target/
dist/
build/

# Ignore dependencies
node_modules/
.venv/
vendor/

# Re-include a directory that another rule would exclude
!important-logs/

# Ignore specific extensions
*.log
*.tmp

Pattern syntax:

  • # — comments
  • *, ?, ** — glob wildcards
  • / suffix — match directories only
  • ! prefix — negate (un-ignore)
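The pattern rules listed above can be sketched as a small matcher. This is a deliberately simplified illustration, not Lantern's matcher: it handles only single-component patterns (no `**` and no patterns containing `/` in the middle), and it assumes gitignore-style "last matching rule wins" evaluation. `parse` and `is_ignored` are hypothetical names.

```python
import fnmatch


def parse(lines):
    """Turn ignore-file lines into (pattern, negate, dir_only) rules."""
    rules = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        negate = line.startswith("!")
        if negate:
            line = line[1:]
        dir_only = line.endswith("/")
        rules.append((line.rstrip("/"), negate, dir_only))
    return rules


def is_ignored(rel_path, is_dir, rules):
    """Apply rules in order; the last rule that matches decides the outcome."""
    parts = rel_path.split("/")
    ignored = False
    for pattern, negate, dir_only in rules:
        for index, part in enumerate(parts):
            is_leaf = index == len(parts) - 1
            if dir_only and is_leaf and not is_dir:
                continue  # a trailing "/" rule never matches a plain file
            if fnmatch.fnmatchcase(part, pattern):
                ignored = not negate
                break
    return ignored
```

Under these assumptions, `target/` excludes everything beneath a `target` directory but not a regular file that happens to be named `target`, and a later `!keep.log` overrides an earlier `*.log`.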

Default ignores (applied when no .lantern-ignore exists): .git/, target/, node_modules/, .hermes/, __pycache__/, .venv/, vendor/

Use --no-ignore to bypass all ignore rules:

lantern ingest . --no-ignore

Data model

Two tables carry the indexed state; both are visible from sqlite3:

  • sources — one row per ingested artifact. Keeps uri, optional filesystem path, kind (text/markdown, text/plain, application/jsonl), total bytes, content_sha256, and timestamps.
  • chunks — one row per deterministic slice of a source. Keeps the parent source_id, ordinal, byte_start/byte_end, char_count, chunk text, and chunk sha256.
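The chunk fields listed above can be made concrete with a small deterministic chunker. The splitting rule here (pack whole lines up to `max_bytes`) is an assumption chosen for illustration, as is treating `byte_end` as exclusive; Lantern's actual chunking strategy is not described in this document.

```python
import hashlib


def chunk_source(data, max_bytes=1024):
    """Split UTF-8 bytes on line boundaries into chunks of roughly max_bytes each."""
    spans = []
    start = pos = 0
    for line in data.splitlines(keepends=True):
        # Close the current chunk if this line would push it past the limit.
        if pos > start and pos + len(line) - start > max_bytes:
            spans.append((start, pos))
            start = pos
        pos += len(line)
    if pos > start:
        spans.append((start, pos))

    chunks = []
    for ordinal, (byte_start, byte_end) in enumerate(spans):
        piece = data[byte_start:byte_end]
        text = piece.decode("utf-8")
        chunks.append({
            "ordinal": ordinal,
            "byte_start": byte_start,
            "byte_end": byte_end,  # exclusive, by assumption
            "char_count": len(text),
            "text": text,
            "sha256": hashlib.sha256(piece).hexdigest(),
        })
    return chunks
```

The same input always yields the same spans and hashes, and `data[byte_start:byte_end]` recovers each chunk's exact bytes, which is the property a provenance trail relies on. A single line longer than `max_bytes` becomes its own oversized chunk in this sketch.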

A shadow FTS5 virtual table (chunks_fts) is kept in sync by triggers and supplies BM25 ranking and snippet highlighting to search.
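The trigger-synced FTS5 arrangement described above follows SQLite's external-content table pattern, which the sketch below reproduces in an in-memory database. Only the name `chunks_fts` comes from this document. The reduced `chunks` schema, its `text` column, and the trigger names are assumptions, and the sketch needs a Python build whose SQLite includes FTS5.

```python
import sqlite3

SCHEMA = """
CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT NOT NULL);

-- External-content FTS5 table: the index refers back to rows in chunks.
CREATE VIRTUAL TABLE chunks_fts USING fts5(text, content='chunks', content_rowid='id');

CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
  INSERT INTO chunks_fts(rowid, text) VALUES (new.id, new.text);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
  INSERT INTO chunks_fts(chunks_fts, rowid, text) VALUES ('delete', old.id, old.text);
END;
CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
  INSERT INTO chunks_fts(chunks_fts, rowid, text) VALUES ('delete', old.id, old.text);
  INSERT INTO chunks_fts(rowid, text) VALUES (new.id, new.text);
END;
"""


def open_demo():
    """Create an in-memory database with the demo schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def search(conn, query, limit=5):
    """Return (chunk id, highlighted snippet) pairs, best BM25 match first."""
    return conn.execute(
        "SELECT rowid, snippet(chunks_fts, 0, '[', ']', '...', 8) "
        "FROM chunks_fts WHERE chunks_fts MATCH ? "
        "ORDER BY bm25(chunks_fts) LIMIT ?",
        (query, limit),
    ).fetchall()
```

FTS5's `bm25()` returns smaller values for better matches, so ascending order puts the best hit first. Because the triggers mirror every write, the index can also be rebuilt from the canonical rows at any time, which is the role `lantern reindex` plays.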

Development

cargo fmt
cargo check --all-targets
cargo test --all-targets
cargo run -- --help

Status

Early but usable. The CLI is stable, schema migrations are versioned (currently through v18), and every command has integration test coverage.

The core retrieval surfaces are keyword search (FTS5 BM25), semantic search (Ollama embeddings with cosine similarity, auto-accelerated with sqlite-vec for the default unfiltered path and backfilled on upgrade), hybrid search, an opt-in --vec-semantic path, and an MCP server. Beyond those, Lantern now exposes:

  • a typed entity graph (URLs, repos, domains, emails, file paths, @mentions, #hashtags) with neighbor and session-neighbor traversal
  • session-scoped retrieval with related and temporal session views
  • first-class memory records
  • confidence signals (feedback, query-success, access decay) surfaced in show / inspect / export
  • JSONL ingest that preserves tool-call lineage across Anthropic, OpenAI, and Responses transcripts

Ingestion supports .lantern-ignore for excluding build artifacts and dependencies.
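To make the entity graph less abstract, the sketch below extracts three of the entity types named above from chunk text and ranks an entity's neighbors by how many chunks they share. The regular expressions, the `(type, value)` tuple representation, and the function names are illustrative assumptions; Lantern's extractor and edge typing are likely more thorough.

```python
import re
from collections import Counter

URL = re.compile(r"https?://[^\s<>\"']+")
MENTION = re.compile(r"(?<![\w@])@([A-Za-z0-9_]+)")
HASHTAG = re.compile(r"(?<!\w)#([A-Za-z][\w-]*)")


def extract(text):
    """Return a set of (type, value) entities found in one chunk."""
    found = {("url", u.rstrip(".,;")) for u in URL.findall(text)}
    # Blank out URLs first so "#fragment" parts are not read as hashtags.
    rest = URL.sub(" ", text)
    found |= {("mention", m) for m in MENTION.findall(rest)}
    found |= {("hashtag", h) for h in HASHTAG.findall(rest)}
    return found


def neighbors(chunks, entity):
    """Entities co-occurring with `entity`, ranked by number of shared chunks."""
    counts = Counter()
    for text in chunks:
        entities = extract(text)
        if entity in entities:
            counts.update(entities - {entity})
    # Most shared chunks first; ties broken by entity so output is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Swapping chunks for sessions as the co-occurrence unit gives the session-neighbor view under the same counting scheme.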

License

Lantern is licensed under the GNU Affero General Public License v3.0 only (AGPL-3.0-only).

Copyright (C) 2026 Raphael Bitton

See LICENSE.