Paper 3 — speculative tool-call dispatcher.
Given an EnrichmentPlan from the planner, spawns each prefetch
as a Tokio task, caps concurrency per rate_limit_host (so two
providers hitting the same domain share the budget), waits up to
prefetch_timeout_ms for results to land, and aborts everything
still pending on session shutdown.
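The bounded wait described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's code: it uses OS threads and an `mpsc` channel in place of Tokio tasks, and the free function `wait_within`, its signature, and the `(id, body)` result shape are hypothetical stand-ins.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the engine's bounded wait: collect whatever
/// lands before the deadline, then return without waiting for stragglers.
fn wait_within(
    rx: &mpsc::Receiver<(usize, String)>,
    expected: usize,
    timeout: Duration,
) -> Vec<(usize, String)> {
    let deadline = Instant::now() + timeout;
    let mut landed = Vec::new();
    while landed.len() < expected {
        // Time left in the overall budget, not a fresh timeout per result.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(item) => landed.push(item),
            Err(_) => break, // deadline passed, or every sender is gone
        }
    }
    landed
}

/// Spawns one fast and one slow "prefetch" and waits 300 ms for results.
fn demo() -> Vec<usize> {
    let (tx, rx) = mpsc::channel();
    for (id, delay_ms) in [(0usize, 10u64), (1, 2000)] {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(delay_ms));
            // The receiver may already be gone; a late result is simply dropped here.
            let _ = tx.send((id, format!("body-{id}")));
        });
    }
    drop(tx);
    wait_within(&rx, 2, Duration::from_millis(300))
        .into_iter()
        .map(|(id, _)| id)
        .collect()
}
```

In this sketch only the fast prefetch is returned and the slow one keeps running. The real engine, per the description above, additionally aborts pending tasks on shutdown, which plain threads cannot do.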
§Design
- Dispatcher trait: PrefetchDispatcher abstracts the actual tools/call path so tests can plug in a mock without pulling in MCP transport. The real impl wraps the server’s handler and is wired in SessionPipeline.
- Per-host concurrency cap: a Mutex<HashMap<host, in_flight>> tracks in-flight prefetches per rate-limit host; the dispatcher refuses to schedule a call when the cap is hit. None host = unlimited (local tool).
- Bounded synchronous wait: SpeculationEngine::wait_within blocks at most prefetch_timeout_ms collecting results that landed in time; anything still pending keeps running in the background and lands later via the dedup cache.
- Cascade cancellation: SpeculationEngine::shutdown (also called from Drop) aborts every pending task. No orphan IO.
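A minimal sketch of the per-host concurrency cap from the list above, using only the standard library. The fields, the `try_acquire` and `current` methods, the `Permit` guard, and the single shared `cap` value are all assumptions made for illustration; the real HostBudget may be shaped differently.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical per-host in-flight counter. Cloning is cheap because
/// every clone shares the same map behind an Arc.
#[derive(Clone)]
struct HostBudget {
    cap: usize, // assumed: one cap applied to every host
    in_flight: Arc<Mutex<HashMap<String, usize>>>,
}

/// Hypothetical guard: holding it means one slot is taken; dropping it frees the slot.
struct Permit {
    host: Option<String>,
    budget: HostBudget,
}

impl HostBudget {
    fn new(cap: usize) -> Self {
        Self { cap, in_flight: Arc::new(Mutex::new(HashMap::new())) }
    }

    /// Returns None when the host is at its cap. A None host is never limited.
    fn try_acquire(&self, host: Option<&str>) -> Option<Permit> {
        if let Some(h) = host {
            let mut map = self.in_flight.lock().unwrap();
            let count = map.entry(h.to_string()).or_insert(0);
            if *count >= self.cap {
                return None; // refuse to schedule
            }
            *count += 1;
        }
        Some(Permit { host: host.map(str::to_string), budget: self.clone() })
    }

    /// Number of prefetches currently in flight for a host.
    fn current(&self, host: &str) -> usize {
        self.in_flight.lock().unwrap().get(host).copied().unwrap_or(0)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        if let Some(h) = &self.host {
            let mut map = self.budget.in_flight.lock().unwrap();
            if let Some(count) = map.get_mut(h) {
                *count = count.saturating_sub(1);
            }
        }
    }
}
```

Tying the release to `Drop` means a task that finishes, fails, or is aborted gives its slot back without any explicit bookkeeping. Two providers that resolve to the same host string draw from the same counter.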
Telemetry counters (prefetch_dispatched, prefetch_won_race,
prefetch_wasted) are updated by the caller; this module only
reports the outcomes.
Structs§
- HostBudget: Per-host in-flight counter. Cheap clone (Arc).
- PrefetchRequest: One unit of work the engine decides about. Public so the host can produce the list (combining EnrichmentPlan.calls with extracted args from projection) before handing it to the engine.
- SpeculationEngine: Per-turn speculation engine. One instance per SessionPipeline; holds the JoinSet and the host budget. Drop = shutdown.
Enums§
- PrefetchError
- PrefetchOutcome: Outcome of a single prefetch task as observed by SpeculationEngine::wait_within.
- SkipReason
Traits§
- PrefetchDispatcher: Abstracts how a prefetch is actually executed. The real impl wraps the MCP server’s tools/call handler; the test impl returns a canned body or an error after an optional sleep.
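The test-double idea can be illustrated as follows. This is a sketch under assumptions: the trait here is synchronous and takes string arguments, whereas the real PrefetchDispatcher is presumably async and uses the crate's own request and error types. The `dispatch` method, `MockDispatcher`, and `run_prefetch` are hypothetical names.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical, synchronous stand-in for the dispatcher trait.
trait PrefetchDispatcher {
    fn dispatch(&self, tool: &str, args: &str) -> Result<String, String>;
}

/// Test double: optionally sleeps, then returns a canned body or error.
struct MockDispatcher {
    delay: Option<Duration>,
    result: Result<String, String>,
}

impl PrefetchDispatcher for MockDispatcher {
    fn dispatch(&self, _tool: &str, _args: &str) -> Result<String, String> {
        if let Some(d) = self.delay {
            thread::sleep(d); // simulate a slow tool call
        }
        self.result.clone()
    }
}

/// Code under test depends only on the trait, so no MCP transport is needed.
fn run_prefetch(d: &dyn PrefetchDispatcher, tool: &str) -> String {
    match d.dispatch(tool, "{}") {
        Ok(body) => format!("ok:{body}"),
        Err(e) => format!("err:{e}"),
    }
}
```

A test can then build `MockDispatcher { delay: None, result: Ok("canned".to_string()) }` to exercise the success path, or set a delay longer than the wait budget to exercise the timeout path.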