
Module embed



The embedding stage: the candle XLM-RoBERTa FP16 backend (CandleEmbedder) plus the batch-oriented EmbedWorker that fills messages.vector / messages.embedding_model (spec.md#search). Each message produces exactly one vector; there is no chunking.

LazyEmbedder caches a loaded backend for pond mcp / pond serve and drops it after DEFAULT_IDLE_EVICTION without use. The drop is clean as measured by macOS phys_footprint (after the drop, the footprint falls to ~107 MiB regardless of backend), so time-weighted RSS over an interactive MCP session stays well under the per-instance budget, even though the macOS Metal buffer pool retains iokit_mapped memory during active queries.

The worker accumulates messages and calls the model once per fixed-size batch, never once per message, and writes each batch’s vectors to messages in one column-update commit.
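The batching contract above (one model call per fixed-size batch, one write per batch) can be sketched as follows. This is a minimal illustration, not pond's EmbedWorker: the names Message, Backend, Store and embed_backlog are hypothetical, and the Store trait merely stands in for the single column-update commit.

```rust
// A minimal sketch of fixed-size batching; all names here are hypothetical.

/// One row of the un-embedded backlog (assumed shape).
pub struct Message {
    pub id: i64,
    pub search_text: String,
}

/// Text in, one vector per text out.
pub trait Backend {
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>>;
}

/// Stands in for the one column-update commit per batch.
pub trait Store {
    fn commit(&mut self, rows: Vec<(i64, Vec<f32>)>);
}

/// Embeds `backlog` in chunks of `batch_size`; returns the number of model calls.
pub fn embed_backlog(
    backlog: &[Message],
    batch_size: usize,
    backend: &dyn Backend,
    store: &mut dyn Store,
) -> usize {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut calls = 0;
    for batch in backlog.chunks(batch_size) {
        let texts: Vec<&str> = batch.iter().map(|m| m.search_text.as_str()).collect();
        // One model call per batch, never one per message.
        let vectors = backend.embed(&texts);
        calls += 1;
        assert_eq!(vectors.len(), batch.len(), "one vector per message");
        // Pair each vector with its row's primary key and write the batch at once.
        let rows: Vec<(i64, Vec<f32>)> = batch.iter().map(|m| m.id).zip(vectors).collect();
        store.commit(rows);
    }
    calls
}
```

With a 70-message backlog and a batch size of 32, this makes three model calls and three commits (32, 32 and 6 rows); the last chunk is simply shorter.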

Structs

BatchProgress
Per-batch stats handed to a progress callback. Lets pond embed drive an indicatif bar without leaking the crate into this module’s API.
CandleEmbedder
The candle e5 backend: XLM-RoBERTa FP16 weights on the GPU (Metal on macOS, CUDA on a non-macOS build with the cuda feature, CPU otherwise). forward takes &self, so no interior mutability is needed.
EmbedSummary
Outcome of an EmbedWorker::run pass.
EmbedWorker
Fills messages.vector / messages.embedding_model for the backlog of un-embedded messages. Reads messages.search_text directly, runs it through the backend in batches (one vector per message), and writes each batch back to messages by primary key.
LazyEmbedder
Lazy holder for an Embedder with idle eviction. The model isn't loaded until the first hybrid/vector call asks for it, so idle pond mcp / pond serve processes pay nothing while no vector queries arrive. After idle_threshold of inactivity, the cached backend is dropped on the next get call; as measured by macOS phys_footprint, the drop cleanly reclaims ~365-585 MiB (the post-drop floor is ~107 MiB regardless of backend). Reloading costs one synchronous model load (300-500 ms), which is absorbed in the human-paced gap between MCP queries.
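For BatchProgress, the point of the design is that the caller supplies a plain closure, so the embedding module never names indicatif in its API. The sketch below assumes hypothetical fields (batch_rows, done, total) and a hypothetical run_with_progress driver; pond's actual struct may carry different stats.

```rust
// A minimal sketch of a progress-callback seam; field names are assumptions.

/// Per-batch stats passed to the caller's callback (hypothetical fields).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BatchProgress {
    pub batch_rows: usize,
    pub done: usize,
    pub total: usize,
}

/// Processes batches of the given sizes and reports after each one.
/// Because the callback is a generic closure, this function has no
/// dependency on any progress-bar crate.
pub fn run_with_progress<F: FnMut(BatchProgress)>(batch_sizes: &[usize], mut on_batch: F) {
    let total: usize = batch_sizes.iter().sum();
    let mut done = 0;
    for &batch_rows in batch_sizes {
        // ... embed and write the batch here ...
        done += batch_rows;
        on_batch(BatchProgress { batch_rows, done, total });
    }
}
```

On the CLI side, the closure could forward each report to an indicatif ProgressBar, for example by calling its inc method with batch_rows, keeping that dependency in the binary rather than in this module.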
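The device rule described for CandleEmbedder (Metal on macOS, CUDA with the cuda feature elsewhere, CPU otherwise) is a compile-time decision. The sketch below uses a hypothetical DeviceKind enum in place of candle's own device type; a real build would map each variant onto the corresponding candle device constructor, and may handle initialization failures differently.

```rust
// A minimal sketch of compile-time device selection; `DeviceKind` is a
// hypothetical stand-in for the backend's real device type.

#[derive(Debug, PartialEq, Eq)]
pub enum DeviceKind {
    Metal,
    Cuda,
    Cpu,
}

/// Metal on macOS, CUDA on a non-macOS build with the `cuda` feature,
/// CPU otherwise. `cfg!` expands to a constant bool at compile time.
pub fn preferred_device() -> DeviceKind {
    if cfg!(target_os = "macos") {
        DeviceKind::Metal
    } else if cfg!(feature = "cuda") {
        DeviceKind::Cuda
    } else {
        DeviceKind::Cpu
    }
}
```

Checking target_os first means a macOS build picks Metal even if the cuda feature is enabled, which matches the "non-macOS" qualifier on the CUDA case.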
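The LazyEmbedder behavior (load on first use, drop on the next get after idle_threshold of inactivity) can be sketched as a small holder. This assumes a single-threaded owner and an explicit clock argument for testability; the names Lazy, get_at and loads are hypothetical, and a holder shared across pond mcp / pond serve requests would also need a lock.

```rust
use std::time::{Duration, Instant};

// A minimal sketch of lazy loading with idle eviction, assuming a
// single-threaded owner. All names here are hypothetical.
pub struct Lazy<T> {
    cached: Option<T>,
    last_used: Instant,
    idle_threshold: Duration,
    load: fn() -> T,
    pub loads: usize,
}

impl<T> Lazy<T> {
    pub fn new(idle_threshold: Duration, load: fn() -> T) -> Self {
        Self { cached: None, last_used: Instant::now(), idle_threshold, load, loads: 0 }
    }

    /// Returns the cached backend, loading it on first use. If it sat
    /// unused longer than `idle_threshold`, it is dropped first and then
    /// reloaded, so eviction happens lazily inside the getter.
    pub fn get_at(&mut self, now: Instant) -> &T {
        if self.cached.is_some() && now.duration_since(self.last_used) > self.idle_threshold {
            self.cached = None; // drop the model, releasing its memory
        }
        self.last_used = now;
        if self.cached.is_none() {
            self.cached = Some((self.load)());
            self.loads += 1;
        }
        self.cached.as_ref().expect("loaded above")
    }

    pub fn is_loaded(&self) -> bool {
        self.cached.is_some()
    }
}
```

Because eviction is checked inside the getter, no background timer thread is needed; the trade-off is that an idle model stays resident until the next call arrives.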

Constants

DEFAULT_BATCH_SIZE
Messages per model-inference + write batch. e5 truncates input at 512 tokens, so the padded attention transient for a 32-row batch stays bounded.
DEFAULT_IDLE_EVICTION
How long the cached backend can sit unused before LazyEmbedder::get drops it. Five minutes matches typical conversational pauses in interactive MCP use: short enough that a model left unused for a turn or two is gone before the next quiet window, long enough that ordinary query bursts never pay the reload cost.
DEFAULT_MODEL_ID
The default embedding model, for which pond ships a loader (spec.md#search). Used when [embeddings].model is absent. pond embed stamps the runtime model id (see model_id) into messages.embedding_model alongside every vector. e5-small (384-dim) is the default: on scripts/search-benchmarks/queries-paraphrased.tsv it showed no statistically significant quality loss vs e5-base, while halving vector storage and roughly halving model RSS.
DEFAULT_SORT_WINDOW
Messages buffered and length-sorted before being cut into model batches. The tokenizer pads every batch to its longest member, so a batch mixing a short and a long message embeds the short one at the long one's length. Sorting a window first clusters similar-length messages, so each batch pads close to its own longest member rather than to the corpus worst case. The window is bounded so peak memory is one window, not the whole backlog. See EmbedWorker::with_sort_window.
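For DEFAULT_BATCH_SIZE, the bound is simple arithmetic: every row is padded to the longest row after truncation, so a full batch can never exceed batch size × 512 tokens. The sketch assumes a batch size of 32 (taken from the "32-row batch" wording) and uses hypothetical names.

```rust
// A back-of-the-envelope sketch; BATCH = 32 is assumed from the
// "32-row batch" wording, MAX_TOKENS = 512 from e5's truncation limit.
pub const MAX_TOKENS: usize = 512;
pub const BATCH: usize = 32;

/// Padded token count for one batch: every row is padded to the
/// longest row after truncation.
pub fn padded_tokens(lens: &[usize]) -> usize {
    let longest = lens.iter().map(|&l| l.min(MAX_TOKENS)).max().unwrap_or(0);
    lens.len() * longest
}

/// Worst case: a full batch in which some row hits the truncation limit.
pub const WORST_CASE_TOKENS: usize = BATCH * MAX_TOKENS; // 16_384
```

For example, a two-row batch of 10 and 2000 tokens pads to 2 × 512 = 1024 tokens, because the long row is truncated first.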
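For DEFAULT_SORT_WINDOW, the padding argument can be checked with a small model: cut the input into windows, sort each window by length, then cut batches. The helpers batch_plan and padded_cost are hypothetical, and plain integers stand in for token counts. Setting the window equal to the batch size reproduces unsorted batching, because reordering rows inside a batch doesn't change its longest member.

```rust
// A minimal sketch of windowed length-sorting; helper names are
// hypothetical and lengths stand in for token counts.

/// Splits `lens` into windows of `window`, sorts each window by length,
/// and cuts it into batches of `batch`. Returns batches of row indices.
pub fn batch_plan(lens: &[usize], window: usize, batch: usize) -> Vec<Vec<usize>> {
    assert!(window > 0 && batch > 0);
    let idx: Vec<usize> = (0..lens.len()).collect();
    let mut plan = Vec::new();
    for win in idx.chunks(window) {
        let mut sorted = win.to_vec();
        sorted.sort_by_key(|&i| lens[i]); // cluster similar lengths
        plan.extend(sorted.chunks(batch).map(|c| c.to_vec()));
    }
    plan
}

/// Total padded tokens: each batch pads to its own longest member.
pub fn padded_cost(lens: &[usize], plan: &[Vec<usize>]) -> usize {
    plan.iter()
        .map(|b| b.len() * b.iter().map(|&i| lens[i]).max().unwrap_or(0))
        .sum()
}
```

With lengths [500, 10, 480, 12, 495, 8, 470, 11] and batches of 2, unsorted batching pads to 3890 tokens, while one window of 8 pads to 2004. A streaming worker would buffer one window at a time rather than take the whole slice, which is what keeps peak memory at one window.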

Traits

Embedder
The embedding seam (spec.md#search): text in, vectors out. The real backend is CandleEmbedder; tests substitute an instrumented fake to assert batching behavior. The vector width is checked at the write boundary and the model id is whatever model_id returns at the time of the write.
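The seam can be sketched as a one-method trait plus an instrumented fake and a width check at the write boundary. The method name embed_batch, the CountingFake type and check_width are all hypothetical; pond's trait signature may differ. The fake uses a RefCell only to record calls through &self; a real backend whose forward already takes &self needs no such cell.

```rust
// A minimal sketch of the embedding seam; all names are hypothetical.
use std::cell::RefCell;

pub trait Embedder {
    fn embed_batch(&self, texts: &[&str]) -> Vec<Vec<f32>>;
}

/// Instrumented fake: records each batch size so a test can assert batching.
pub struct CountingFake {
    pub dim: usize,
    pub calls: RefCell<Vec<usize>>,
}

impl Embedder for CountingFake {
    fn embed_batch(&self, texts: &[&str]) -> Vec<Vec<f32>> {
        self.calls.borrow_mut().push(texts.len());
        texts.iter().map(|_| vec![0.0; self.dim]).collect()
    }
}

/// Write-boundary check: every vector must have the expected width.
pub fn check_width(vectors: &[Vec<f32>], expected: usize) -> Result<(), String> {
    match vectors.iter().position(|v| v.len() != expected) {
        None => Ok(()),
        Some(i) => Err(format!("vector {i} has width {}, expected {expected}", vectors[i].len())),
    }
}
```

A batching test would then assert on the recorded sizes (for example, that 70 messages produce batches of 32, 32 and 6), and check_width catches a model swap (say 384-dim vs 768-dim) before anything is written.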

Functions

format_passage
Format a document (one message's search_text) for the embedder with the "passage: " prefix, the document half of the pair documented on format_query. Used by EmbedWorker when batching messages for pond embed.
format_query
Format a search query for the embedder. e5 is an asymmetric retriever: its model card prescribes the "query: " prefix on the search side and "passage: " on documents. Used by pond_search to prepare the query text before the candle/Metal embed call.
init_model_id
Seed model_id from config. First call wins; later calls with a different id are silently ignored - the process loads its config once.
model_id
The active model id. Returns the value installed by init_model_id or DEFAULT_MODEL_ID when nothing has installed one (tests, ad-hoc tooling).
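For format_query and format_passage, the asymmetric e5 convention amounts to prepending a fixed prefix to each side. This minimal sketch assumes the prefixes are exactly "query: " and "passage: " with no further normalization; pond's functions may also trim or otherwise clean the text.

```rust
// A minimal sketch of the e5 query/passage prefixes; any extra
// normalization in pond's real functions is not modeled here.
pub fn format_query(text: &str) -> String {
    format!("query: {text}")
}

pub fn format_passage(text: &str) -> String {
    format!("passage: {text}")
}
```

Both sides must use their matching prefix: embedding documents with the query prefix (or the reverse) would place them in a different region of the vector space than the model was trained to compare.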
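The first-call-wins behavior of init_model_id and the default fallback of model_id map naturally onto std::sync::OnceLock. This is a sketch under that assumption, not necessarily how pond stores the id, and the DEFAULT_MODEL_ID value below is a placeholder rather than pond's real default.

```rust
use std::sync::OnceLock;

// A minimal sketch using std's OnceLock. The default id is a placeholder,
// not pond's actual DEFAULT_MODEL_ID value.
pub const DEFAULT_MODEL_ID: &str = "default-model-placeholder";

static MODEL_ID: OnceLock<String> = OnceLock::new();

/// Seeds the process-wide model id. OnceLock::set fails once a value is
/// installed; the error is discarded, so later calls are silently ignored.
pub fn init_model_id(id: &str) {
    let _ = MODEL_ID.set(id.to_owned());
}

/// The installed id, or the default when nothing has been installed.
pub fn model_id() -> &'static str {
    MODEL_ID.get().map(String::as_str).unwrap_or(DEFAULT_MODEL_ID)
}
```

Because the fallback lives in model_id itself, tests and ad hoc tooling that never call init_model_id still get a usable id, and a config loader only needs to call init_model_id when [embeddings].model is present.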