pond

Lossless storage and search for AI agent sessions, across every agentic client.

Quickstart. Install, run guided setup, and ingest your local sessions:

brew install tenequm/tap/pond
pond init   # guided setup: storage, adapters, MCP registration, optional schedule
pond sync   # ingest, embed, update indexes - every enabled adapter

pond init registers pond as an MCP server for detected clients; to add it by hand:

claude mcp add -s user pond -- pond mcp   # Claude Code
codex mcp add pond -- pond mcp            # Codex

Pond keeps every AI conversation you've ever had intact and searchable, and lets you continue any of them in any supported tool - your history, your search, your sessions, independent of the agent vendor that made them. It is one Rust binary that ingests sessions from registered agentic-client adapters into a canonical Session / Message / Part interlingua, stores them in Lance on object storage, and serves search over them via HTTP+JSON and MCP. Two deployments: a personal pond on your laptop, or a multi-tenant backend for hosted agent infrastructure. No extra database, no wrapper around Lance.

Current automatically synced agent clients:

Claude Code CLI
Claude desktop app (local agent mode)
Codex CLI
opencode CLI
pi-coding-agent CLI

You can also import a Claude.ai data export with the claude-ai-export adapter. The export is a manual download, so it is not auto-discovered; pass its location explicitly: pond sync claude-ai-export --path <path>.

Status: pre-v1. Schemas, wire shapes, and config keys are subject to breaking changes until v1. Full documentation lives at pond.locker; the contract is in docs/spec.md.

Background

Every agentic CLI ships its own session format and its own search surface. Switching tools means losing history. Replaying a Claude Code session in another provider's tooling means re-translating the wire shape by hand. Hosted multi-tenant deployments rebuild the same storage layer from scratch.

Pond is the storage and retrieval layer that sits underneath. Every adapter is a bidirectional codec between a client format and one canonical schema, so any session can be restored by any adapter - it need not return to the client that produced it. Storage, search (vector or BM25 full-text, one arm per query), and provider-agnostic replay all sit on a single Lance-on-object-storage foundation.
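As a rough illustration of the codec idea, the sketch below models an adapter as a decode/encode pair around one shared canonical type. Everything in it is hypothetical: the Adapter trait, the ColonLines and TabLines toy formats, and the simplified Session and Message structs are stand-ins for this example, not pond's actual types or API.

```rust
// Hypothetical sketch: none of these names come from pond's source.

/// Simplified stand-in for the canonical model.
#[derive(Debug, Clone, PartialEq)]
pub struct Session {
    pub id: String,
    pub messages: Vec<Message>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub text: String,
}

/// An adapter converts in both directions: client format <-> canonical.
pub trait Adapter {
    fn decode(&self, raw: &str) -> Result<Session, String>;
    fn encode(&self, session: &Session) -> String;
}

/// Toy format A: first line is the session id, then "role: text" lines.
pub struct ColonLines;
/// Toy format B: first line is the session id, then "role<TAB>text" lines.
pub struct TabLines;

fn decode_with(raw: &str, sep: &str) -> Result<Session, String> {
    let mut lines = raw.lines();
    let id = lines.next().ok_or("missing session id line")?.to_string();
    let mut messages = Vec::new();
    for line in lines {
        // Refuse malformed input instead of inventing a role or text.
        let (role, text) = line
            .split_once(sep)
            .ok_or_else(|| format!("malformed line: {line}"))?;
        messages.push(Message { role: role.to_string(), text: text.to_string() });
    }
    Ok(Session { id, messages })
}

fn encode_with(session: &Session, sep: &str) -> String {
    let mut out = session.id.clone();
    for m in &session.messages {
        out.push('\n');
        out.push_str(&m.role);
        out.push_str(sep);
        out.push_str(&m.text);
    }
    out
}

impl Adapter for ColonLines {
    fn decode(&self, raw: &str) -> Result<Session, String> { decode_with(raw, ": ") }
    fn encode(&self, session: &Session) -> String { encode_with(session, ": ") }
}

impl Adapter for TabLines {
    fn decode(&self, raw: &str) -> Result<Session, String> { decode_with(raw, "\t") }
    fn encode(&self, session: &Session) -> String { encode_with(session, "\t") }
}
```

Decoding with one adapter and encoding with another (ColonLines.decode followed by TabLines.encode) corresponds to restoring a session into a client other than the one that produced it.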

The v1 surface includes: full CLI, HTTP+JSON and MCP transports, search over three Lance datasets, intfloat/multilingual-e5-small embeddings at FP16 weights (Metal on macOS, CUDA opt-in, CPU fallback), and local-FS / S3 / GCS / Azure backends through Lance's object_store integration.

Install

Linux and macOS are supported; Windows is not in v1 scope.

Package Managers (macOS and Linux):

brew install tenequm/tap/pond                       # Homebrew
nix profile add github:tenequm/pond-nix#pond        # Nix
cargo install pond-db                               # crates.io (installs the `pond` command)

Build from source:

git clone https://github.com/tenequm/pond.git
cd pond
cargo install --path .

For CUDA acceleration on Linux:

cargo install --path . --features cuda

On macOS the Metal backend is selected automatically; on other systems the CPU fallback runs without extra features.
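The selection rule can be pictured as a small decision function. This is only a sketch of the behavior described above: the Backend enum and select_backend function are hypothetical names, and the choice of Metal on a macOS build that also enables the cuda feature is an assumption the text does not cover.

```rust
// Hypothetical sketch: pond's real backend selection code is not shown in this README.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Metal,
    Cuda,
    Cpu,
}

/// Pick an embedding backend from the target OS name and whether the
/// `cuda` cargo feature was enabled at build time.
pub fn select_backend(target_os: &str, cuda_feature: bool) -> Backend {
    match (target_os, cuda_feature) {
        // Selected automatically on macOS (assumed to win even if cuda is set).
        ("macos", _) => Backend::Metal,
        // Opt-in on Linux via `--features cuda`.
        ("linux", true) => Backend::Cuda,
        // Everything else runs the CPU fallback without extra features.
        _ => Backend::Cpu,
    }
}
```

In a real build the two inputs would typically come from std::env::consts::OS and a compile-time feature check rather than being passed in by hand.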

Usage

Set up storage, adapters, MCP registration, and an optional sync schedule in one pass (idempotent - re-run it any time to repair or update):

pond init

Then import sessions from local adapters, embed them, update indexes, and search:

pond sync
pond search "how did we wire up the OCC retry loop"

Run a server:

pond serve                         # HTTP on 127.0.0.1:9797
pond serve --transport stdio       # MCP over stdio
pond mcp                           # alias for stdio MCP

Fetch a single session or message, or move a whole corpus:

pond get --session-id <id>
pond copy --from local --to snapshot.pond
pond copy --from snapshot.pond --to local

Ask structured questions with read-only SQL (the same surface as the pond_sql_query MCP tool):

pond sql "SELECT project, count(*) FROM messages GROUP BY project ORDER BY 2 DESC"

Run maintenance on demand (sync already embeds inline and folds indexes every run):

pond optimize --only embed
pond optimize --only index

Keep pond current automatically (launchd on macOS, systemd user timers or cron on Linux):

pond schedule start                # every 5m by default (--every 15m|1h|6h|1d)
pond schedule status
pond schedule logs
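For orientation, the interval values listed above map to durations as in the sketch below. The schedule_interval function is a hypothetical helper; it accepts only the five values this README mentions, and whether pond accepts other intervals is not stated here.

```rust
use std::time::Duration;

// Hypothetical helper: maps the `--every` values named in this README to durations.
// `None` as input models running `pond schedule start` without `--every`.
pub fn schedule_interval(every: Option<&str>) -> Option<Duration> {
    let minutes: u64 = match every.unwrap_or("5m") {
        "5m" => 5,
        "15m" => 15,
        "1h" => 60,
        "6h" => 6 * 60,
        "1d" => 24 * 60,
        // Values outside the documented set are rejected in this sketch.
        _ => return None,
    };
    Some(Duration::from_secs(minutes * 60))
}
```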

pond status prints a per-table storage table, then indexes (text/semantic readiness), stored (sessions + searchable messages), and adapters (configured adapter count). Pass --adapters for per-project tables and per-intent index detail. pond search --explain returns Lance's analyze_plan output for each retrieval arm.

Remote storage

By default pond stores data locally under $XDG_DATA_HOME/pond. To use an object store, add credentials and switch the destination:

pond creds add                                                    # interactive: name, access key, hidden secret
pond storage use s3+https://nbg1.your-objectstorage.com/my-pond   # probe end-to-end, then flip [storage].path
pond storage check                                                # verify: parse, creds, conditional-put (OCC), write/read/delete

pond init --storage-path <url> configures a remote destination during setup and prompts for credentials inline when the destination is remote, so setting up a bucket takes one command. The s3+https://host/bucket form works for any S3-compatible store (Hetzner, R2, B2, MinIO); s3://, gs://, and az:// use the standard cloud SDK credential chain when no [creds.*] set matches.

pond copy --from <local> --to <url> carries existing local data into the bucket. The copy is idempotent and never deletes the source. On completion it rebuilds the destination indexes and verifies that every row landed, exiting with code 6 if any rows are missing or duplicated, so you never reconcile by hand. pond copy --verify-only --from <local> --to <url> runs the same check read-only, without copying. Full walkthrough: pond.locker.
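The post-copy check amounts to comparing row identities between source and destination. The sketch below shows one way such a check could work; verify_rows, VerifyReport, and the use of plain string ids are assumptions for illustration, and only the exit code 6 on missing or duplicated rows comes from the text above (exit code 0 on success is assumed).

```rust
use std::collections::{BTreeMap, BTreeSet};

pub const EXIT_OK: i32 = 0; // assumed success code
pub const EXIT_VERIFY_FAILED: i32 = 6; // from the README: rows missing or duplicated

/// Hypothetical result of comparing source row ids with destination row ids.
#[derive(Debug, PartialEq)]
pub struct VerifyReport {
    pub missing: Vec<String>,
    pub duplicated: Vec<String>,
}

pub fn verify_rows(source: &[&str], dest: &[&str]) -> VerifyReport {
    // Count how many times each id appears in the destination.
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for &id in dest {
        *counts.entry(id).or_insert(0) += 1;
    }

    // Every distinct source id must be present in the destination.
    let wanted: BTreeSet<&str> = source.iter().copied().collect();
    let mut missing = Vec::new();
    for id in &wanted {
        if !counts.contains_key(*id) {
            missing.push((*id).to_string());
        }
    }

    // Any id that landed more than once is a duplicate.
    let mut duplicated = Vec::new();
    for (id, n) in &counts {
        if *n > 1 {
            duplicated.push((*id).to_string());
        }
    }

    VerifyReport { missing, duplicated }
}

pub fn exit_code(report: &VerifyReport) -> i32 {
    if report.missing.is_empty() && report.duplicated.is_empty() {
        EXIT_OK
    } else {
        EXIT_VERIFY_FAILED
    }
}
```

Because the check only reads ids from both sides, the same function would serve a read-only mode like --verify-only without any copying step.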

Configuration

pond init walks through everything below interactively and enables the adapters it finds. pond sync only ingests already-enabled adapters - enabling one is an explicit step (pond adapters enable / pond adapters discover / pond init), never a side effect of sync. Config lives under $XDG_CONFIG_HOME/pond/. Every [adapters.<name>] block needs enabled = true to be active; sections without it (or with enabled = false) are skipped.

[adapters.claude-code]
enabled = true
path = "~/.claude/projects"

[adapters.codex-cli]
enabled = false                    # kept in config, skipped on `pond sync`
path = "~/.codex/sessions"
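The enabled rule is strict: only an explicit enabled = true activates a section. A minimal sketch of that filter follows, with a hypothetical AdapterSection struct standing in for a parsed [adapters.<name>] block (no TOML parsing is shown, and pond's real config types may differ).

```rust
/// Hypothetical stand-in for one parsed `[adapters.<name>]` block.
pub struct AdapterSection {
    pub name: &'static str,
    /// `None` models a section that has no `enabled` key at all.
    pub enabled: Option<bool>,
    pub path: &'static str,
}

/// Return the names of sections a sync would ingest: only those with
/// an explicit `enabled = true`. Missing or false both mean skipped.
pub fn active_adapters(sections: &[AdapterSection]) -> Vec<&'static str> {
    sections
        .iter()
        .filter(|s| s.enabled == Some(true))
        .map(|s| s.name)
        .collect()
}
```

Applied to the config above, this yields only claude-code; codex-cli stays in the file but is skipped.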

Verbosity

Root-level -v / -vv / -vvv raise the tracing level (info / debug / trace); -q / -qq lower it. The default surfaces warnings only. RUST_LOG overrides the CLI flag when set; POND_LOG is no longer honored.
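The precedence can be sketched as a single function from flag counts and the environment to a filter string. The resolve_filter name is hypothetical, and two details are assumptions not stated above: -q maps to error and -qq to off, and mixed -v/-q flags cancel each other out.

```rust
/// Hypothetical sketch of the verbosity rules described above.
/// `verbose` and `quiet` are the number of -v and -q flags given;
/// `rust_log` is the value of RUST_LOG if it is set.
pub fn resolve_filter(verbose: u8, quiet: u8, rust_log: Option<&str>) -> String {
    // RUST_LOG overrides the CLI flags when set.
    if let Some(filter) = rust_log {
        return filter.to_string();
    }
    // Assumption: -v and -q offset each other; the result is clamped to the known range.
    let step = (i32::from(verbose) - i32::from(quiet)).clamp(-2, 3);
    let level = match step {
        -2 => "off",   // assumed meaning of -qq
        -1 => "error", // assumed meaning of -q
        0 => "warn",   // default: warnings only
        1 => "info",   // -v
        2 => "debug",  // -vv
        _ => "trace",  // -vvv
    };
    level.to_string()
}
```

A caller would typically pass std::env::var("RUST_LOG").ok().as_deref() as the third argument.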

Design

The full contract is in docs/spec.md. Key choices:

Lance direct, no wrapper. The lance-format/lance crates are the only storage and search engine. No lancedb, no parallel abstraction. Storage, indexing, OCC, schema evolution, blob columns, versioning, and time-travel are all Lance. The read-only pond sql surface is DataFusion planning over the same Lance datasets - a query escape hatch, not a second engine.
Canonical Session / Message / Part interlingua. Owned in pond, in the shape of Effect v4's Prompt-side Part union. This schema is pond's product; everything else is machinery around it.
Three Lance datasets (sessions, messages, parts). messages carries the nullable embedding (vector + embedding_model) alongside denormalized filter columns (source_agent / project / role / timestamp) for single-stage filter pushdown.
No-synthesis adapter seam. Adapters parse source records through extractor helpers that make "invent a value" a compile error - model-no-synthesis, model-schema-honesty, and adapter-provenance-required are structural, not review rules.
Index lifecycle decoupled from writes. Writes commit data (embeddings included, computed inline at ingest) without folding the search indexes. pond sync runs index maintenance by default, and pond optimize --only index runs it on demand; Lance merges index results with a flat scan over unindexed fragments, so reads stay correct.
Single-arm retrieval. Each query runs one retriever - vector (cosine, with a gentle recency tiebreaker) or fts (BM25) - chosen per query; no server-side fusion. The vector arm falls back to full-text when the store has no embeddings, and --sort-by recency returns newest-first. Results group to one summary per session, keyed on session_root.
Language-neutral full-text. Word-level simple tokenizer with English stemming (ascii-folding on); tokens the stemmer does not recognize pass through unchanged and stay exact-matchable, so pond indexes sessions the same way regardless of language.
Two transports, one handler set. HTTP+JSON (axum) and MCP (rmcp) both dispatch into the same handlers. Wire ops: pond_search, pond_get, pond_ingest. MCP additionally exposes the read-only pond_sql_query tool and the schema://pond, schema://pond-sql, and stats://pond resources.
Opaque-string multi-tenancy. Each tenant is a namespace string the integrator supplies; pond does not authenticate, authorize, or model identity. The object store's IAM is the storage boundary.
Encryption is operational. Bucket SSE plus filesystem encryption; pond holds no keys and adds no application-level crypto.
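To make the single-arm retrieval choice concrete, the sketch below shows the fallback rule and the per-session grouping as plain functions. The Arm and Hit types and both function names are hypothetical, and two details are assumptions: a higher score is better, and a session's summary is its best-scoring hit. Pond's actual scoring, recency tiebreaker, and summary shape are not reproduced here.

```rust
// Hypothetical sketch: illustrates the described behavior, not pond's code.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Arm {
    Vector,
    Fts,
}

/// One retriever runs per query. A vector query falls back to full-text
/// when the store has no embeddings; a full-text query is left as is.
pub fn effective_arm(requested: Arm, store_has_embeddings: bool) -> Arm {
    match requested {
        Arm::Vector if !store_has_embeddings => Arm::Fts,
        other => other,
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub session_root: String,
    pub message_id: String,
    pub score: f32,
}

/// Collapse message-level hits to one entry per session_root, keeping the
/// best-scoring hit for each session and ordering sessions by that score.
pub fn group_by_session(hits: &[Hit]) -> Vec<Hit> {
    let mut best: Vec<Hit> = Vec::new();
    for hit in hits {
        match best.iter_mut().find(|b| b.session_root == hit.session_root) {
            Some(existing) => {
                if hit.score > existing.score {
                    *existing = hit.clone();
                }
            }
            None => best.push(hit.clone()),
        }
    }
    // Highest score first; total_cmp gives a total order over f32 values.
    best.sort_by(|a, b| b.score.total_cmp(&a.score));
    best
}
```

Keeping the arm decision separate from grouping mirrors the design point above: no fusion happens between retrievers, so grouping only ever sees results from one arm.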

References

The upstream schemas that shaped pond's canonical model are documented in docs/references/ (source URLs + why each matters; the vendored code itself is not redistributed). Real session captures live under tests/fixtures/adapter/.

Source	Why it matters
Effect-TS/effect	Effect v4 Prompt/Response Part unions. Pond's canonical types copy this shape.
sst/opencode	Effect Schema canonical Part union; SDK types; storage schema.
kilo-org/kilocode	OpenCode fork. Adds `editorContext`, plan-followup, kilocode-specific events.
badlogic/pi-mono	pi-coding-agent leaf-cursor branching and cross-provider conformance test matrix.
open-telemetry/semantic-conventions-genai	GenAI semantic conventions. Inspiration for shape overlap; pond does not derive from OTel.
`tests/fixtures/adapter/`	Real session captures for nine source harnesses (claude_ai_export, claude_code, claude_desktop_app, claude_managed_agents, codex_cli, nanoclaw, openclaw, opencode, pi-coding-agent). Drives adapter design and serves as adapter test fixtures.

Contributing

Issues and pull requests are welcome. The most useful contributions right now:

Spec feedback on docs/spec.md.
Pointers to additional reference schemas or session samples worth documenting under docs/references/.
Bug reports against the v1 surface (CLI verbs, wire ops, schema mismatches, OCC behavior, object-store backends).

For larger changes, open an issue first to discuss the direction. For security issues, see SECURITY.md.

pond-db 0.11.0