The storage substrate (spec.md#substrate): pond’s one seam to Lance, generic over consumers.
Structs§
- ConflictExhausted - Anyhow-chain sentinel pond attaches when `retry_lance` exhausts attempts against an OCC commit-conflict failure (spec.md#protocol). The wire layer downcasts to this type to classify the outcome as `conflict` rather than the generic `storage_unavailable`.
- DataLiveness - `data/` bytes on disk vs bytes the latest manifest references; the gap is superseded versions awaiting the cleanup retention window.
- Handle
- IndexIntent - Declarative description of one index pond keeps on a table. Created when its trigger fires; folded forward by `pond index optimize`.
- IndexStatus
- MaintenancePolicy - Resolved per-call inputs to the storage-maintenance pass. Built from `[maintenance]` (and any per-invocation CLI override) at the entry point; threaded down to `optimize_table_compact` so the substrate never re-reads `Config` itself.
- ResolvedStorage - A storage address with its options assembled and secrets materialized - everything `Store::open_with_options` needs, plus the binding for display.
- RuntimeCaps - Lance cache caps in bytes. `None` lets the substrate pick the backend-aware default (local FS gets a tighter cap; object stores stay near Lance’s defaults). Wired through `Store::open_with_options` from `[runtime]`.
- ScanOpts - Read-side options for `Handle::scan`: optional prefilter predicate and optional projection. Default = no filter, all columns.
- StorageUrl - A parsed pond storage address. The fat-URL grammar (`s3+https://host/bucket/prefix`) folds the endpoint into the address so it can never desync from the bucket (the litestream out-of-band-endpoint failure class); parsing splits it back into the URL Lance opens plus the `object_store` options the endpoint implies.
- TableOptimizeOutcome - What `Handle::optimize_table` did for one table.
- TableSizes - On-disk byte totals for the three session datasets, plus everything else under the data-dir root. Sized by listing through Lance’s object-store layer (spec.md#lance-chokepoints-storage) so `file://` and `s3://` behave alike.
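The fat-URL split described for the storage address type can be illustrated with a minimal sketch. It is not pond's parser: `ParsedAddress`, `parse_fat_url`, and the single `endpoint` string are hypothetical names chosen for illustration, and the real type yields `object_store` options whose keys are not shown in this page.

```rust
/// Hypothetical result of splitting a fat URL; field names are illustrative only.
#[derive(Debug, PartialEq)]
pub struct ParsedAddress {
    /// URL handed to the storage engine, e.g. `s3://bucket/prefix`.
    pub open_url: String,
    /// Endpoint implied by the address, e.g. `https://host`; `None` for plain URLs.
    pub endpoint: Option<String>,
}

/// Split `<store>+<transport>://host/bucket/prefix` into the URL to open
/// plus the endpoint folded into it. Plain URLs pass through unchanged.
pub fn parse_fat_url(input: &str) -> Result<ParsedAddress, String> {
    let (scheme, rest) = input
        .split_once("://")
        .ok_or_else(|| format!("missing scheme in {input:?}"))?;

    // A `+` in the scheme marks the fat form; anything else is a plain URL.
    let Some((store, transport)) = scheme.split_once('+') else {
        return Ok(ParsedAddress {
            open_url: input.to_string(),
            endpoint: None,
        });
    };
    if store.is_empty() || transport.is_empty() {
        return Err(format!("malformed scheme in {input:?}"));
    }

    // The host belongs to the endpoint; everything after it is bucket/prefix.
    let (host, path) = rest
        .split_once('/')
        .ok_or_else(|| format!("missing bucket in {input:?}"))?;
    if host.is_empty() || path.is_empty() {
        return Err(format!("missing host or bucket in {input:?}"));
    }

    Ok(ParsedAddress {
        open_url: format!("{store}://{path}"),
        endpoint: Some(format!("{transport}://{host}")),
    })
}
```

Because the endpoint is derived from the same string as the bucket, there is no second configuration value that could drift out of sync with it, which is the property the grammar is described as protecting.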
Enums§
- BindVia - How a creds set got bound to a URL - surfaced in binding lines so a wrong match is visible before any auth error.
- CheckFailure - `pond storage check` failure classes, each with its own exit code at the CLI so cron and CI can branch on them. Display carries only the fix-naming lead; the underlying error is exposed separately through `CheckFailure::concise_cause` so surfaces stay one readable line instead of trailing the upstream chain (Lance flattens its inner errors into each level’s Display, so the raw chain prints the same failure several times over).
- CredsBinding
- IndexParamsKind - The lance-native shape of an `IndexIntent`’s params, dispatched to the right `IndexParams` at create time.
- IndexTrigger - When an `IndexIntent` should exist on disk.
- OptimizeEvent - Boundary event during one `Handle::optimize_table` pass. The CLI binds a progress callback to render a live spinner; library callers pass `None`.
- OptimizePhase
- PhaseOutcome - Per-phase result for one table’s pass through `Handle::optimize_table`. spec.md#substrate 3.7 (lance-index-maintenance): the indices phase and the compaction phase get independent retry budgets and independent commits, so a hot writer that starves the Rewrite cannot abort the index Update.
- Predicate
- ScalarValue
- Table
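The per-phase independence described for the optimize pass can be sketched as two retry loops that never share a budget. This is an illustration under assumptions, not pond's implementation: `Conflict`, `SketchOutcome`, `run_phase`, and `optimize_pass` are hypothetical names, the indices-then-compaction ordering is assumed, and each closure stands in for one attempted commit.

```rust
/// Hypothetical marker for a commit conflict reported by one attempt.
#[derive(Debug)]
pub struct Conflict;

/// Hypothetical per-phase result; not the crate's own outcome type.
#[derive(Debug, PartialEq)]
pub enum SketchOutcome {
    Committed { attempts: u32 },
    Exhausted { attempts: u32 },
}

/// Run one phase with its own attempt budget.
/// `op` receives the 1-based attempt number and returns Err on conflict.
pub fn run_phase<F>(budget: u32, mut op: F) -> SketchOutcome
where
    F: FnMut(u32) -> Result<(), Conflict>,
{
    for attempt in 1..=budget {
        if op(attempt).is_ok() {
            return SketchOutcome::Committed { attempts: attempt };
        }
    }
    SketchOutcome::Exhausted { attempts: budget }
}

/// Run both phases back to back. Each gets a fresh budget and reports its
/// own outcome, so exhausting one never cancels or fails the other.
pub fn optimize_pass<I, C>(budget: u32, indices: I, compaction: C) -> (SketchOutcome, SketchOutcome)
where
    I: FnMut(u32) -> Result<(), Conflict>,
    C: FnMut(u32) -> Result<(), Conflict>,
{
    let index_outcome = run_phase(budget, indices);
    let compaction_outcome = run_phase(budget, compaction);
    (index_outcome, compaction_outcome)
}
```

With a compaction closure that always conflicts, the pass still returns a committed index outcome alongside an exhausted compaction outcome, which is the behavior the "cannot abort the index Update" clause describes.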
Constants§
- COMPACTION_ABSORB_FACTOR - Keep a task only when the merged-in remainder is >= largest/this: size-tiered amortization, O(log n) lifetime rewrites per row.
- DEFAULT_COMPACTION_FRAGMENT_CAP - Per-task fragment-count backstop: tasks this wide always run, bounding manifest growth even when the amplification veto would skip them. As policy cap, 0 disables the veto (tests).
- DEFAULT_INDEX_LAG_THRESHOLD - Default minimum unindexed-fragment count required before a per-intent append/rebuild step is admitted into `optimize_table_indices`. Lower values make each commit smaller and more frequent (bad on remote stores); higher values let fragments accumulate behind the brute-force fallback. 4 is the floor of the documented 4-8 band.
- TARGET_FRAGMENT_BYTES - Fragments are sized by bytes, not Lance’s 1M-row default: kilobyte-average rows make a row target tolerate multi-GiB fragments that compaction re-rewrites wholesale to absorb tiny appends (~190 GiB/day of churn).
- VECTOR_INDEX_ACTIVATION_ROWS - Embedded-row count at which pond builds the IVF_PQ vector index on `messages.vector` (spec.md#search). Below it, vector search runs a brute-force flat scan - exact and fast at small and medium scale, and IVF_PQ cannot train well on fewer vectors anyway.
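One possible reading of how the absorb factor and the fragment cap combine is sketched below. The function name `keep_task`, its signature, and the factor and cap values used with it are hypothetical; the page does not state the actual constants or how the compaction planner represents a task.

```rust
/// Decide whether a compaction task is worth running.
///
/// `fragment_bytes` lists the sizes of the fragments the task would merge.
/// `absorb_factor` is assumed to be >= 1. Both parameters are stand-ins for
/// the documented constants, whose real values are not given here.
pub fn keep_task(fragment_bytes: &[u64], absorb_factor: u64, fragment_cap: usize) -> bool {
    // Nothing to merge, nothing to keep.
    let Some(&largest) = fragment_bytes.iter().max() else {
        return false;
    };

    // A cap of 0 disables the amplification veto entirely.
    if fragment_cap == 0 {
        return true;
    }

    // Backstop: tasks at least this wide always run, bounding manifest growth.
    if fragment_bytes.len() >= fragment_cap {
        return true;
    }

    // Veto: rewriting `largest` only pays off when the rest of the task
    // brings in at least largest / absorb_factor bytes. The comparison is
    // written as a multiplication to avoid integer-division truncation.
    let total: u64 = fragment_bytes.iter().sum();
    let remainder = total - largest;
    remainder.saturating_mul(absorb_factor) >= largest
}
```

Under this reading, a large fragment is rewritten only when it grows by a constant fraction, which is what keeps the number of lifetime rewrites per row logarithmic rather than linear in the number of appends.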
Functions§
- default_cleanup_older_than - Default manifest-retention window for the safe cleanup pass. Matches LanceDB’s recommended OSS-operator practice (lancedb docs: performance.mdx, tables/update.mdx). With `delete_unverified=false`, Lance’s 7-day in-progress guard still protects unverified files regardless of this value (`UNVERIFIED_THRESHOLD_DAYS` in lance/dataset/cleanup.rs).
- index_lag_threshold
- init_index_lag_threshold - Seed the process-wide index-lag threshold from `[maintenance].index_lag_threshold`. First call wins (mirrors `embed::init_model_id` / `sessions::init_embedding_dim`).
- is_commit_conflict - True when the chain root is one of Lance’s commit-conflict variants (`CommitConflict`, `RetryableCommitConflict`, `TooMuchWriteContention`). Everything else (timeouts, IAM denials, disk errors) is not a conflict.
- storage_check - Probe a resolved storage destination end-to-end (spec.md#substrate): a conditional `PutMode::Create` pair proving the `If-None-Match` -> 412 OCC primitive Lance’s commit handler relies on, then read-back and delete of the synthetic key.
- unmatched_creds_sets - Names of defined creds sets that bound to none of this invocation’s URLs (spec.md#creds-scope-match: misbinding must never be silent). Empty when the invocation touched no credential-taking URL - a local-only command must not nag about sets kept for remote work.
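The "first call wins" seeding described for the index-lag threshold is commonly written with `std::sync::OnceLock`. The sketch below reuses the documented function names for readability, but the signatures, the `usize` type, the `bool` return, and the fallback to a default of 4 when nothing was seeded are assumptions rather than the crate's actual API.

```rust
use std::sync::OnceLock;

/// Assumed default, taken from the "floor of the documented 4-8 band" remark.
const DEFAULT_INDEX_LAG_THRESHOLD: usize = 4;

/// Process-wide slot; it can be written at most once.
static INDEX_LAG_THRESHOLD: OnceLock<usize> = OnceLock::new();

/// Seed the threshold. Returns true if this call set the value and false
/// if an earlier call already won (the later value is discarded).
pub fn init_index_lag_threshold(value: usize) -> bool {
    INDEX_LAG_THRESHOLD.set(value).is_ok()
}

/// Read the threshold, falling back to the default when never seeded.
pub fn index_lag_threshold() -> usize {
    INDEX_LAG_THRESHOLD
        .get()
        .copied()
        .unwrap_or(DEFAULT_INDEX_LAG_THRESHOLD)
}
```

`OnceLock::set` is safe to call from several threads at once, so only one seed can ever take effect regardless of which entry point runs first.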