Module sampling_coordinator


SamplingCoordinator coalesces N concurrent SamplingClient::create_message calls into ⌈N/M⌉ calls, where M is the maximum batch size. A batch is bounded by a configurable time window or the batch-size limit.
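As a rough illustration of the ⌈N/M⌉ figure, the ceiling division can be sketched as below. The function name `coalesced_call_count` is hypothetical and not part of this module. The count is a best case that assumes all N requests arrive inside one coalesce window; requests spread over several windows would presumably produce more wire-level calls.

```rust
/// Number of wire-level calls needed for `n` logical requests when at most
/// `max_batch` requests share one coalesced call (ceiling division).
/// Hypothetical helper, for illustration only.
fn coalesced_call_count(n: usize, max_batch: usize) -> usize {
    assert!(max_batch > 0, "max_batch must be at least 1");
    (n + max_batch - 1) / max_batch
}
```

With the documented default batch size of 10, 25 concurrent requests would need 3 coalesced calls under these assumptions.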

Per the v0.9.0 design (docs/dev-log/0098-v0.9.0-implementation-plan.md §4 P4 / Risk #7 / MAJOR 3 batching resolution):

§Why batch?

Each sampling/createMessage call surfaces ONE approval prompt in the user’s MCP client (Claude Desktop / Claude Code / future clients). When the daemon-side consolidate-timer fires solo_storage::triples_batch::run_triples_batch_tick, it can produce N per-cluster sampling calls in quick succession, so the user is spammed with N separate approval prompts.

SamplingCoordinator collapses the N calls that arrive within one coalesce window into ONE coalesced peer.create_message call. The user sees ONE approval per coalesce window; the per-cluster results are demultiplexed back to the individual callers via their oneshot reply channels.
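The demultiplexing step can be pictured with a minimal, synchronous sketch. It is not the coordinator's actual implementation: `Pending` and `flush_batch` are hypothetical names, `std::sync::mpsc` stands in for the oneshot reply channels, and the `call` closure stands in for the single coalesced peer.create_message call.

```rust
use std::sync::mpsc;

/// One logical request waiting in the coordinator (hypothetical shape).
struct Pending {
    prompt: String,
    /// Stand-in for the caller's oneshot reply channel.
    reply: mpsc::Sender<String>,
}

/// Sends the whole batch through a single `call` and routes result `i`
/// back to caller `i`. Returns how many callers received a reply.
fn flush_batch<F>(batch: Vec<Pending>, call: F) -> usize
where
    F: FnOnce(&[String]) -> Vec<String>,
{
    let (prompts, replies): (Vec<String>, Vec<mpsc::Sender<String>>) =
        batch.into_iter().map(|p| (p.prompt, p.reply)).unzip();

    // ONE wire-level call for the whole batch.
    let results = call(&prompts);

    let mut delivered = 0;
    for (reply, result) in replies.into_iter().zip(results) {
        // A caller that has already gone away is skipped, not treated as fatal.
        if reply.send(result).is_ok() {
            delivered += 1;
        }
    }
    delivered
}
```

Pairing replies with results by position keeps the sketch short; a real implementation would also need to decide what to do when the response array is shorter than the batch.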

§When NOT to batch

The coordinator is bypassed for non-sampling backends (Anthropic / Ollama / None): those don’t surface approval prompts and have their own rate-limiting concerns. The coordinator inserts itself ONLY when wrapping [PeerSamplingClient] or a fake equivalent.
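The bypass rule amounts to a single decision per backend. The sketch below assumes a hypothetical `Backend` enum and `uses_coordinator` helper; the crate's real backend types are not shown on this page.

```rust
/// Hypothetical backend tags mirroring the backends named above.
#[derive(Debug, Clone, Copy)]
enum Backend {
    PeerSampling,
    Anthropic,
    Ollama,
    None,
}

/// Only the peer-sampling backend surfaces approval prompts, so only it
/// is wrapped by the coordinator.
fn uses_coordinator(backend: Backend) -> bool {
    match backend {
        Backend::PeerSampling => true,
        Backend::Anthropic | Backend::Ollama | Backend::None => false,
    }
}
```

The exhaustive `match` means that adding a new backend variant forces an explicit decision about whether it should be coalesced.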

§Coalesce strategy

Single-request batch: passes through as a normal create_message call, with no prompt rewriting. Zero behaviour change from the v0.9.0 P2 path.
Multi-request batch (N > 1): wraps each request in a numbered JSON object, asks the LLM for a JSON array of responses, parses the array, and demultiplexes the responses to their tasks. The prompt template is documented in [build_coalesced_request].
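The two cases above can be sketched on the request-building side as follows. This is illustrative only: the real template lives in build_coalesced_request and is not reproduced here, so the wording of the instruction, the `task` / `prompt` / `response` field names, and the helpers `json_escape` and `build_prompt` are all assumptions. Parsing the returned array is omitted.

```rust
/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical prompt builder: a single request passes through unchanged,
/// several requests are wrapped as numbered JSON objects.
fn build_prompt(requests: &[&str]) -> String {
    if let [only] = requests {
        return only.to_string(); // single-request batch: no rewriting
    }
    let tasks: Vec<String> = requests
        .iter()
        .enumerate()
        .map(|(i, r)| format!("{{\"task\": {}, \"prompt\": \"{}\"}}", i + 1, json_escape(r)))
        .collect();
    format!(
        "Answer each task. Reply with a JSON array of {} objects shaped like \
         {{\"task\": <number>, \"response\": <string>}}, in task order.\n[{}]",
        requests.len(),
        tasks.join(", ")
    )
}
```

Numbering the tasks gives the demultiplexer a key to match each response to its request, even if the model returns the array out of order.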

§Privacy invariant

The per-logical-request audit emit (one AuditOperation::LlmSamplingCall row per submitted [SamplingLlmClient::complete]) STAYS: the coordinator is an optimisation on the wire, NOT a change to the audit shape. See plan §11 Risk #8: operators MUST be able to count audit rows per logical call, not per coalesced call.
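The invariant can be stated as a small counting model: audit rows track logical requests, while wire calls track batches. The `Counters` struct and `run` function are hypothetical and only model the bookkeeping; they do not touch the real audit pipeline.

```rust
/// Hypothetical counters used only to illustrate the invariant.
#[derive(Debug, Default, PartialEq, Eq)]
struct Counters {
    audit_rows: usize,
    wire_calls: usize,
}

/// Processes `logical_requests` in batches of at most `max_batch`.
fn run(logical_requests: usize, max_batch: usize) -> Counters {
    assert!(max_batch > 0, "max_batch must be at least 1");
    let mut counters = Counters::default();
    let mut remaining = logical_requests;
    while remaining > 0 {
        let batch = remaining.min(max_batch);
        for _ in 0..batch {
            // One audit row per logical request, coalesced or not.
            counters.audit_rows += 1;
        }
        // One wire-level call (and one approval prompt) per batch.
        counters.wire_calls += 1;
        remaining -= batch;
    }
    counters
}
```

In this model, changing `max_batch` changes `wire_calls` but never `audit_rows`, which is the property operators rely on.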

Structs§

SamplingCoordinator: Wrapper around a SamplingClient that coalesces concurrent create_message calls into batched create_message calls (within a configurable time window OR batch-size limit).

Constants§

DEFAULT_COALESCE_MAX_BATCH: Default max-batch size: 10 logical requests per coalesced create_message. Plan §4 P4c default; it caps the rendered prompt size and prevents one slow batch from holding the worker indefinitely.
DEFAULT_COALESCE_WINDOW: Default coalesce window: 5 seconds. Plan §4 P4c default, chosen so the user’s approval-prompt latency stays under typical MCP-session “I’m doing work” tolerance.
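How the two defaults might combine into a flush decision is sketched below. The values 10 and 5 seconds come from the descriptions above, but the constant types (`usize`, `Duration`), the `should_flush` helper, and the choice to measure the window from the oldest pending request are assumptions, not the module's documented behaviour.

```rust
use std::time::Duration;

// Values from the constant descriptions above; the types are assumed.
const DEFAULT_COALESCE_MAX_BATCH: usize = 10;
const DEFAULT_COALESCE_WINDOW: Duration = Duration::from_secs(5);

/// Hypothetical flush rule: send the pending batch once it is full OR once
/// the oldest pending request has waited for the whole window.
fn should_flush(pending: usize, oldest_age: Duration, max_batch: usize, window: Duration) -> bool {
    pending > 0 && (pending >= max_batch || oldest_age >= window)
}
```

Under this rule a lone request waits at most one window before it is sent, and a burst of 10 or more requests is sent without waiting for the window to expire.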