pond
Lossless storage and hybrid search for AI agent sessions, across every agentic client.
Quickstart. Install, ingest your local sessions, and add pond as an MCP server in any app:
# add pond as an MCP server (pick your client):
Pond keeps every AI conversation you've ever had intact and searchable, and lets you continue any of them in any supported tool - your history, your search, your sessions, independent of the agent vendor that made them. It is one Rust binary that ingests sessions from registered agentic-client adapters into a canonical Session / Message / Part interlingua, stores them in Lance on object storage, and serves hybrid search over them via HTTP+JSON and MCP. Two deployments: a personal pond on your laptop, or a multi-tenant backend for hosted agent infrastructure. No extra database, no wrapper around Lance.
Current automatically synced agent clients:
- Claude Code CLI
- Codex CLI
- opencode CLI
- pi-coding-agent CLI
Status: pre-v1. Schemas, wire shapes, and config keys are subject to breaking change until v1. Full documentation lives at pond.cascade.fyi; the contract is docs/spec.md.
Background
Every agentic CLI ships its own session format and its own search surface. Switching tools means losing history. Replaying a Claude Code session in another provider's tooling means re-translating the wire shape by hand. Hosted multi-tenant deployments rebuild the same storage layer from scratch.
Pond is the storage and retrieval layer that sits underneath. Every adapter is a bidirectional codec between a client format and one canonical schema, so any session can be restored by any adapter - it need not return to the client that produced it. Storage, hybrid search (BM25 + vector, score-normalized fusion), and provider-agnostic replay all sit on a single Lance-on-object-storage foundation.
The v1 surface includes: full CLI, HTTP+JSON and MCP transports, hybrid search over three Lance datasets, intfloat/multilingual-e5-small embeddings at FP16 weights (Metal on macOS, CUDA opt-in, CPU fallback), and local-FS / S3 / GCS / Azure backends through Lance's object_store integration.
Install
Linux and macOS are supported; Windows is not in v1 scope.
Package Managers (macOS and Linux):
Build from source:
For CUDA acceleration on Linux:
On macOS the Metal backend is selected automatically; on other systems the CPU fallback runs without extra features.
Usage
Set up storage, sources, MCP registration, and an optional sync schedule in one pass (idempotent - re-run it any time to repair or update):
Then import sessions from local sources, embed them, update indexes, and search:
Run a server:
Fetch a single session or message, or move a whole corpus:
Ask structured questions with read-only SQL (the same surface as the pond_sql_query MCP tool):
Stages can be run independently when needed:
Keep pond current automatically (launchd on macOS, systemd user timers or cron on Linux):
pond status prints a per-table storage table, then indexes (text/semantic readiness), stored (sessions + searchable messages), and sources (configured adapter count). Pass --adapters for per-project tables and per-intent index detail. pond search --explain returns Lance's analyze_plan output for each retrieval arm.
Configuration
pond init walks through everything below interactively; pond sync also discovers sources on first run and writes them to config.toml (under $XDG_CONFIG_HOME/pond/). Every [sources.<name>] block needs enabled = true to be active; sections without it (or with enabled = false) are skipped. Re-enable interactively with pond sync <name>.
[]
= true
= "~/.claude/projects"
[]
= false # kept in config, skipped on `pond sync`
= "~/.codex/sessions"
Verbosity
Root-level -v / -vv / -vvv raise the tracing level (info / debug / trace); -q / -qq lower it. The default surfaces warnings only. RUST_LOG overrides the CLI flag when set; POND_LOG is no longer honored.
Design
The full contract is in docs/spec.md. Key choices:
- Lance direct, no wrapper. The
lance-format/lancecrates are the only storage and search engine. Nolancedb, no parallel abstraction. Storage, indexing, OCC, schema evolution, blob columns, versioning, and time-travel are all Lance. The read-onlypond sqlsurface is DataFusion planning over the same Lance datasets - a query escape hatch, not a second engine. - Canonical Session / Message / Part interlingua. Owned in pond, in the shape of Effect v4's
Prompt-side Part union. This schema is pond's product; everything else is machinery around it. - Three Lance datasets (
sessions,messages,parts).messagescarries the nullable embedding (vector+embedding_model) alongside denormalized filter columns (source_agent/project/role/timestamp) for single-stage filter pushdown. - No-synthesis adapter seam. Adapters parse source records through extractor helpers that make "invent a value" a compile error -
model-no-synthesis,model-schema-honesty, andadapter-provenance-requiredare structural, not review rules. - Index lifecycle decoupled from writes. Writes commit data without folding indexes.
pond syncruns index maintenance by default, andpond sync --only update-indexesruns it on demand; Lance merges index results with a flat scan over unindexed fragments, so reads stay correct. - Score-normalized hybrid fusion. Per-arm shaping (max-norm BM25 for FTS, rank-norm for vector), min-max to [0, 1], then weighted sum. Session-root-keyed dedup so cross-arm agreement compounds at the conversation level.
- Language-neutral full-text. Character
ngramtokenizer (3-5), no monolingual stemmer - pond indexes sessions in any language alike. - Two transports, one handler set. HTTP+JSON (axum) and MCP (rmcp) both dispatch into the same handlers. Wire ops:
pond_search,pond_get,pond_ingest. MCP additionally exposes the read-onlypond_sql_querytool and theschema://pond,schema://pond-sql, andstats://pondresources. - Opaque-string multi-tenancy. Each tenant is a
namespacestring the integrator supplies; pond does not authenticate, authorize, or model identity. The object store's IAM is the storage boundary. - Encryption is operational. Bucket SSE plus filesystem encryption; pond holds no keys and adds no application-level crypto.
References
The upstream schemas that shaped pond's canonical model are documented in docs/references/ (source URLs + why each matters; the vendored code itself is not redistributed). Real session captures live under tests/fixtures/adapter/.
| Source | Why it matters |
|---|---|
| Effect-TS/effect | Effect v4 Prompt/Response Part unions. Pond's canonical types copy this shape. |
| sst/opencode | Effect Schema canonical Part union; SDK types; storage schema. |
| kilo-org/kilocode | OpenCode fork. Adds editorContext, plan-followup, kilocode-specific events. |
| badlogic/pi-mono | pi-coding-agent leaf-cursor branching and cross-provider conformance test matrix. |
| open-telemetry/semantic-conventions-genai | GenAI semantic conventions. Inspiration for shape overlap; pond does not derive from OTel. |
tests/fixtures/adapter/ |
Real session captures for nine source harnesses (claude_ai_export, claude_code, claude_desktop_app, claude_managed_agents, codex_cli, nanoclaw, openclaw, opencode, pi-coding-agent). Drives adapter design and serves as adapter test fixtures. |
Contributing
Issues and pull requests are welcome. The most useful contributions right now:
- Spec feedback on
docs/spec.md. - Pointers to additional reference schemas or session samples worth documenting under
docs/references/. - Bug reports against the v1 surface (CLI verbs, wire ops, schema mismatches, OCC behavior, object-store backends).
For larger changes, open an issue first to discuss the direction. For security issues, see SECURITY.md.
License
Apache-2.0 (c) 2026 tenequm