Module embed

Expand description

The embedding stage: candle XLM-RoBERTa FP16 (CandleEmbedder) plus the batch-oriented EmbedWorker that fills messages.vector / messages.embedding_model (spec.md#search). One message produces one vector - there is no chunking.

LazyEmbedder caches a loaded backend for pond mcp / pond serve and drops it after DEFAULT_IDLE_EVICTION of no use. The drop is clean under macOS phys_footprint (post-drop drops to ~107 MiB regardless of backend), so time-weighted RSS over an interactive MCP session stays well under the per-instance budget despite the macOS Metal buffer pool’s iokit_mapped retention during active queries.

The worker accumulates messages and calls the model once per fixed-size batch, never once per message, and writes each batch’s vectors to messages in one column-update commit.

Structs§

BatchProgress: Per-batch stats handed to a progress callback. Lets pond embed drive an indicatif bar without leaking the crate into this module’s API.
CandleEmbedder: The candle e5 backend: XLM-RoBERTa FP16 weights on the GPU (Metal on macOS, CUDA on a cuda-feature non-macOS build, CPU otherwise). forward is &self, so no interior mutability is needed.
EmbedSummary: Outcome of an EmbedWorker::run pass.
EmbedWorker: Fills messages.vector / messages.embedding_model for the backlog of un-embedded messages. Reads messages.search_text directly, batches it through the backend one vector each, and writes each batch back to messages by primary key.
LazyEmbedder: Lazy holder for an Embedder with idle eviction. The model isn’t loaded until the first hybrid/vector call asks for it - idle pond mcp / pond serve processes pay nothing while no vector queries land. After idle_threshold of inactivity the cached backend is dropped on the next get call; under macOS phys_footprint the drop reclaims ~365-585 MiB cleanly (the post-drop floor is ~107 MiB regardless of backend). Reload cost is one synchronous model-load (300-500 ms), absorbed inside the human-paced gap between MCP queries.

Constants§

DEFAULT_BATCH_SIZE: Messages per model-inference + write batch. e5 truncates at 512 tokens, so a 32-row batch’s padded attention transient stays bounded.
DEFAULT_IDLE_EVICTION: How long the cached backend can sit unused before LazyEmbedder::get drops it. Five minutes matches typical interactive-MCP conversational pauses: short enough that a model that’s been unused for a turn or two is gone before the next quiet window, long enough that ordinary query bursts never pay the reload cost.
DEFAULT_MODEL_ID: Default embedding model pond ships a loader for (spec.md#search). Used when [embeddings].model is absent. pond embed stamps the runtime model id (see model_id) into messages.embedding_model with every vector. e5-small (384-dim) is the default; scripts/search-benchmarks/queries-paraphrased.tsv showed no statistically-significant quality loss vs e5-base while halving vector storage and ~halving model RSS.
DEFAULT_SORT_WINDOW: Messages buffered and length-sorted before being cut into model batches. The tokenizer pads every batch to its longest member, so a batch mixing a short and a long message embeds the short one at the long one’s length. Sorting a window first clusters similar-length messages, so each batch pads near its own longest, not the corpus worst case. Bounded so peak memory stays one window, not the whole backlog. See EmbedWorker::with_sort_window.

Traits§

Embedder: The embedding seam (spec.md#search): text in, vectors out. The real backend is CandleEmbedder; tests substitute an instrumented fake to assert batching behavior. The vector width is checked at the write boundary and the model id is whatever model_id returns at the time of the write.

Functions§

format_passage: Format a document (one message’s search_text) for the embedder - the passage: half of the pair documented on format_query. Used by EmbedWorker when batching messages for pond embed.
format_query: Format a search query for the embedder. e5 is an asymmetric retriever: its model card prescribes query: on the search side, passage: on documents. Used by pond_search to prepare the query text before the candle/Metal embed call.
init_model_id: Seed model_id from config. First call wins; later calls with a different id are silently ignored - the process loads its config once.
model_id: The active model id. Returns the value installed by init_model_id or DEFAULT_MODEL_ID when nothing has installed one (tests, ad-hoc tooling).

Module embed

Module embed Copy item path

Structs§

Constants§

Traits§

Functions§

Module embed