Module speculation

Paper 3 — speculative tool-call dispatcher.

Given an EnrichmentPlan from the planner, the engine spawns each prefetch as a Tokio task and caps concurrency per rate_limit_host, so two providers that hit the same domain share one budget. It then waits up to prefetch_timeout_ms for results to land and, on session shutdown, aborts everything still pending.

Design

Dispatcher trait — PrefetchDispatcher abstracts the actual tools/call path so tests can plug in a mock without pulling in MCP transport. The real impl wraps the server’s handler and is wired in SessionPipeline.
Per-host concurrency cap — a Mutex<HashMap<host, in_flight>> tracks in-flight prefetches per rate-limit host; the dispatcher refuses to schedule a call when that host's cap is hit. A host of None means no cap (local tool).
Bounded synchronous wait — SpeculationEngine::wait_within blocks for at most prefetch_timeout_ms, collecting the results that landed in time; anything still pending keeps running in the background and lands later via the dedup cache.
Cascade cancellation — SpeculationEngine::shutdown (also called from Drop) aborts every pending task. No orphan IO.
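
The bounded wait described above can be illustrated with a minimal sketch. It is not the module's implementation: it assumes plain std threads and an mpsc channel as stand-ins for Tokio tasks and the JoinSet, and the names `wait_within_sketch`, `spawn_prefetches`, and `Landed` are hypothetical. The dedup cache is not modelled; late results simply stay readable on the channel.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical result record: (request id, body).
type Landed = (u32, String);

/// Collect whatever arrives on `rx` before `budget` elapses.
/// Workers that have not finished keep running; nothing is cancelled here.
fn wait_within_sketch(rx: &Receiver<Landed>, expected: usize, budget: Duration) -> Vec<Landed> {
    let deadline = Instant::now() + budget;
    let mut landed = Vec::new();
    while landed.len() < expected {
        // Recompute the remaining time each round so the total wait stays bounded.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(item) => landed.push(item),
            // Timeout: the budget is spent. Disconnected: every worker has finished.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    landed
}

/// Spawn one worker thread per (id, delay_ms) pair; each sends its result after sleeping.
fn spawn_prefetches(delays_ms: &[(u32, u64)]) -> Receiver<Landed> {
    let (tx, rx) = mpsc::channel();
    for &(id, delay) in delays_ms {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(delay));
            // Ignore send errors: the receiver may already have been dropped.
            let _ = tx.send((id, format!("body-{id}")));
        });
    }
    rx
}
```

With one fast and one slow worker, a call such as `wait_within_sketch(&rx, 2, Duration::from_millis(400))` returns only the fast result; the slow one can still be read from `rx` afterwards, which mirrors the "lands later" behaviour in spirit.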

Telemetry counters (prefetch_dispatched, prefetch_won_race, prefetch_wasted) are updated by the caller; this module only reports the outcomes.

Structs

HostBudget: Per-host in-flight counter. Cheap clone (Arc).
PrefetchRequest: One unit of work the engine decides about. Public so the host can produce the list (combining EnrichmentPlan.calls with extracted args from projection) before handing it to the engine.
SpeculationEngine: Per-turn speculation engine. One instance per SessionPipeline; holds the JoinSet and the host budget. Drop = shutdown.
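
As an illustration of a per-host in-flight counter that is cheap to clone, here is a minimal sketch. It assumes a single cap shared by all hosts and an RAII permit that releases its slot on drop; neither detail is confirmed by this page, and `HostBudgetSketch`, `Permit`, `try_acquire`, and `count_for` are hypothetical names rather than the real HostBudget API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Sketch of a per-host in-flight counter; clones share the same map through the Arc.
#[derive(Clone)]
struct HostBudgetSketch {
    cap: usize,
    in_flight: Arc<Mutex<HashMap<String, usize>>>,
}

/// RAII permit: dropping it releases the slot it holds, if any.
struct Permit {
    budget: HostBudgetSketch,
    host: Option<String>,
}

impl HostBudgetSketch {
    fn new(cap: usize) -> Self {
        Self { cap, in_flight: Arc::new(Mutex::new(HashMap::new())) }
    }

    /// Try to reserve a slot. A `None` host (local tool) is always admitted.
    /// Returns `None` when the host is already at its cap.
    fn try_acquire(&self, host: Option<&str>) -> Option<Permit> {
        if let Some(h) = host {
            let mut map = self.in_flight.lock().unwrap();
            let count = map.entry(h.to_string()).or_insert(0);
            if *count >= self.cap {
                return None;
            }
            *count += 1;
        }
        Some(Permit { budget: self.clone(), host: host.map(str::to_string) })
    }

    /// Current number of in-flight calls recorded for `host`.
    fn count_for(&self, host: &str) -> usize {
        self.in_flight.lock().unwrap().get(host).copied().unwrap_or(0)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        // Only permits tied to a rate-limit host hold a slot to give back.
        if let Some(h) = &self.host {
            let mut map = self.budget.in_flight.lock().unwrap();
            if let Some(count) = map.get_mut(h) {
                *count = count.saturating_sub(1);
            }
        }
    }
}
```

Tying the release to `Drop` means a prefetch that is aborted or panics still returns its slot, which is one plausible way to keep the counter honest under cancellation.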

Enums

PrefetchError
PrefetchOutcome: Outcome of a single prefetch task as observed by SpeculationEngine::wait_within.
SkipReason

Traits

PrefetchDispatcher: Abstracts how a prefetch is actually executed. The real impl wraps the MCP server’s tools/call handler; the test impl returns a canned body or an error after an optional sleep.
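
To show how such an abstraction lets tests avoid the transport layer, here is a minimal sketch of a dispatcher trait with a mock. It is an assumption-laden stand-in: the real PrefetchDispatcher is presumably async and its signature is not shown on this page, so the sketch uses a synchronous method, and `DispatcherSketch`, `SketchError`, `MockDispatcher`, `run_prefetch`, and the tool name `"lookup"` are all hypothetical.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical error type; the real PrefetchError variants are not listed on this page.
#[derive(Debug, Clone, PartialEq)]
enum SketchError {
    Failed(String),
}

/// Synchronous stand-in for the dispatcher abstraction.
trait DispatcherSketch {
    fn call(&self, tool: &str, args: &str) -> Result<String, SketchError>;
}

/// Test double: optionally sleeps, then returns a canned body or an error.
struct MockDispatcher {
    delay: Option<Duration>,
    reply: Result<String, SketchError>,
}

impl DispatcherSketch for MockDispatcher {
    fn call(&self, _tool: &str, _args: &str) -> Result<String, SketchError> {
        if let Some(d) = self.delay {
            thread::sleep(d);
        }
        // Hand back a copy of the canned reply on every call.
        self.reply.clone()
    }
}

/// Code under test depends only on the trait, never on a concrete transport.
fn run_prefetch(d: &dyn DispatcherSketch, tool: &str) -> String {
    match d.call(tool, "{}") {
        Ok(body) => body,
        Err(SketchError::Failed(msg)) => format!("error: {msg}"),
    }
}
```

A test can then build `MockDispatcher { delay: None, reply: Ok("canned".to_string()) }` and pass it to `run_prefetch(&mock, "lookup")` without starting a server.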