pub struct EnrichmentEffectiveness {
    pub total_prefetches: u32,
    pub cited_prefetches: u32,
    pub total_declines: u32,
    pub late_invoked_after_decline: u32,
    pub cost_overrun_count: u32,
    pub total_predictions: u32,
    pub net_prediction_error_tokens: i64,
    pub inference_calls_saved_prefetch: u32,
    pub inference_calls_saved_dedup: u32,
    pub inference_calls_saved_fail_fast: u32,
    pub inference_tokens_saved: u64,
    pub prefetch_dispatched: u32,
    pub prefetch_won_race: u32,
    pub prefetch_wasted: u32,
}
Aggregate scoring of how well the Paper 3 enrichment planner served
the agent during a session. Populated by the live pipeline (counters)
plus the offline post-pass (cited_* numbers, see P-3-08).
Three primary rates the operator reads:
- Prefetch hit rate — fraction of planner-prefetched calls whose content was textually cited by the LLM in the next 1–3 turns. The north-star efficiency number; target ≥ 60%.
- Decline recall loss — fraction of declined candidates the LLM ended up calling itself within the next 5 turns. Higher means the planner is too greedy. Target ≤ 10%.
- Cost overrun rate — fraction of admitted calls whose actual tokens_baseline exceeded the predicted cost by ≥ 30%. Drives refresh of cost_model.typical_kb priors. Target ≤ 15%.
And the operator-facing ROI counters:
- inference_calls_saved_* — number of LLM round-trips the planner short-circuited, broken into three buckets so the contribution of each mechanism stays visible: prefetch (cited speculative calls), dedup (Paper 2 L0 hits — tool body replaced with a near-ref hint so the LLM never sees the full payload), and fail_fast (e.g. ToolSearch self-loop blocked after fail_fast_after_n).
- inference_tokens_saved — sum of tokens_baseline from those short-circuited calls. The headline “we saved this much context” number for tune analyze.
Token savings vs a no-planner baseline is the roll-up “did the enricher pay for itself” answer; it lives in the corpus-replay validation harness (Paper 3 §Validation strategy), not on this summary, because it requires running the same session both with and without the planner. This struct carries only the per-session counters that drive the three rates above.
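The three targets quoted above (hit rate ≥ 60%, decline recall loss ≤ 10%, cost overrun rate ≤ 15%) can be checked mechanically. The following is a minimal sketch, not part of this crate: the constants and the rates_off_target helper are hypothetical names introduced for illustration, and it assumes a rate of None (nothing to measure) should not count as a miss.

```rust
// Targets quoted in the description; the constant names are illustrative.
const MIN_PREFETCH_HIT_RATE: f32 = 0.60;
const MAX_DECLINE_RECALL_LOSS: f32 = 0.10;
const MAX_COST_OVERRUN_RATE: f32 = 0.15;

/// Hypothetical helper: returns the names of the rates that miss their
/// target. A rate of `None` is skipped rather than reported as a miss.
fn rates_off_target(
    hit: Option<f32>,
    decline_loss: Option<f32>,
    overrun: Option<f32>,
) -> Vec<&'static str> {
    let mut off = Vec::new();
    if hit.map_or(false, |r| r < MIN_PREFETCH_HIT_RATE) {
        off.push("prefetch_hit_rate");
    }
    if decline_loss.map_or(false, |r| r > MAX_DECLINE_RECALL_LOSS) {
        off.push("decline_recall_loss");
    }
    if overrun.map_or(false, |r| r > MAX_COST_OVERRUN_RATE) {
        off.push("cost_overrun_rate");
    }
    off
}
```

In practice the three arguments would come from prefetch_hit_rate(), decline_recall_loss() and cost_overrun_rate() on a populated summary.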
Fields
total_prefetches: u32
Number of calls the planner pre-fetched.
cited_prefetches: u32
Of total_prefetches, the count whose content was cited by the
LLM in the next 1–3 turns. Filled in by the offline post-pass;
stays 0 until the post-pass has run.
total_declines: u32
Number of candidates the planner declined for any reason.
late_invoked_after_decline: u32
Of total_declines, the count where the LLM later issued the
declined tool itself within the next 5 turns. Lower-is-better.
cost_overrun_count: u32
Number of admitted calls whose actual tokens_baseline exceeded
the planner’s prediction by ≥ 30%.
total_predictions: u32
Total admitted calls (denominator for cost_overrun_rate).
net_prediction_error_tokens: i64
Net (signed) sum of prediction error in tokens, predicted vs. actual; useful for diagnosing systematic under- or over-estimation.
inference_calls_saved_prefetch: u32
LLM tool-uses avoided because the planner pre-fetched the
content and the model cited it in the next 1–3 turns. Counted
only when PipelineEvent::cited_in_next_n_turns is Some(true).
inference_calls_saved_dedup: u32
LLM tool-uses avoided because L0 dedup replaced the response
with a near-ref hint. Counted on every event with
is_dedup_hit = true.
inference_calls_saved_fail_fast: u32
LLM tool-uses avoided because crate::enrichment short-circuited
a fail_fast_after_n loop (e.g. ToolSearch returning
0 bytes twice in a row). Incremented from the planner side via
Self::record_fail_fast_skip.
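inference_tokens_saved: u64
Sum of baseline tokens from all three saved-call buckets. Fail-fast
skips emit no event, so they contribute the predicted cost passed to
record_fail_fast_skip instead. The “we saved this much context”
headline for tune analyze.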
prefetch_dispatched: u32
Number of speculative tool-calls the host actually dispatched
out-of-band (a subset of total_prefetches: the ones the
host successfully scheduled, not just plans the planner
produced).
prefetch_won_race: u32
Of prefetch_dispatched, the count where the prefetch result
landed in the dedup cache before the LLM asked for the same
tool, so the LLM’s call collapsed to an L0 hit. The other axis
of “did the speculation pay off” — independent of textual
citation.
prefetch_wasted: u32
Prefetches the LLM never asked for in the same session. Wasted
API quota / dollars; high values trigger R7’s per-tool
auto-disable in tune analyze.
Implementations
impl EnrichmentEffectiveness
pub fn prefetch_hit_rate(&self) -> Option<f32>
Fraction of prefetches that paid off (cited by the LLM).
Returns None when no prefetches happened — distinct from a
0% hit rate.
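The None-versus-0% distinction can be illustrated with a minimal sketch. The Counters struct below is a hypothetical stand-in that carries only the two fields the hit rate needs; the actual method body is not shown on this page, so the division here is an assumption about how the ratio is formed.

```rust
/// Hypothetical stand-in holding only the two counters the hit rate needs.
#[derive(Default)]
struct Counters {
    total_prefetches: u32,
    cited_prefetches: u32,
}

impl Counters {
    /// `None` when there were no prefetches at all; otherwise
    /// cited / total, which may legitimately be 0.0.
    fn prefetch_hit_rate(&self) -> Option<f32> {
        if self.total_prefetches == 0 {
            None
        } else {
            Some(self.cited_prefetches as f32 / self.total_prefetches as f32)
        }
    }
}
```

Returning Option keeps "the planner never prefetched" separate from "the planner prefetched and nothing was cited", which a bare 0.0 would conflate.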
pub fn decline_recall_loss(&self) -> Option<f32>
Fraction of declined candidates the LLM later called anyway.
pub fn cost_overrun_rate(&self) -> Option<f32>
Fraction of admitted calls whose actual baseline exceeded the prediction by ≥ 30%.
pub fn total_calls_saved(&self) -> u32
Total LLM tool-uses the planner short-circuited across all three buckets. The headline “round-trips avoided” number.
pub fn accumulate(&mut self, ev: &PipelineEvent)
Fold one PipelineEvent into the per-session counters.
Inspects the four enricher-specific fields plus is_dedup_hit
and tokens_baseline/tokens_final to maintain:
- total_prefetches / total_predictions / cost_overrun_* when enricher_prefetched = true.
- cited_prefetches and inference_calls_saved_prefetch when the offline post-pass has set cited_in_next_n_turns = Some(true).
- total_declines when enricher_decline_reason is set.
- inference_calls_saved_dedup (and the corresponding inference_tokens_saved) on every L0 dedup hit.
Use it to drive SessionSummary.enrichment from the live
pipeline or from a JSONL post-pass — same accumulator either way.
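The folding rules listed above can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: Event and Summary are hypothetical stand-ins for PipelineEvent and EnrichmentEffectiveness, the field types are guesses, predicted_cost_tokens is an assumed name for the planner's prediction, and the sign convention for the net error (actual minus predicted) is also assumed.

```rust
/// Hypothetical stand-in for PipelineEvent; field types are assumptions.
#[derive(Default)]
struct Event {
    enricher_prefetched: bool,
    cited_in_next_n_turns: Option<bool>,
    enricher_decline_reason: Option<String>,
    is_dedup_hit: bool,
    tokens_baseline: u32,
    /// Assumed field name for the planner's predicted cost.
    predicted_cost_tokens: u32,
}

/// Hypothetical stand-in carrying a subset of the summary counters.
#[derive(Default)]
struct Summary {
    total_prefetches: u32,
    cited_prefetches: u32,
    total_declines: u32,
    cost_overrun_count: u32,
    total_predictions: u32,
    net_prediction_error_tokens: i64,
    inference_calls_saved_prefetch: u32,
    inference_calls_saved_dedup: u32,
    inference_tokens_saved: u64,
}

impl Summary {
    fn accumulate(&mut self, ev: &Event) {
        let actual = u64::from(ev.tokens_baseline);
        let predicted = u64::from(ev.predicted_cost_tokens);

        if ev.enricher_prefetched {
            self.total_prefetches += 1;
            self.total_predictions += 1;
            // Assumed convention: positive means the planner under-estimated.
            self.net_prediction_error_tokens +=
                i64::from(ev.tokens_baseline) - i64::from(ev.predicted_cost_tokens);
            // Actual exceeded the prediction by at least 30%, in integer math.
            if actual > predicted && actual * 10 >= predicted * 13 {
                self.cost_overrun_count += 1;
            }
        }
        if ev.cited_in_next_n_turns == Some(true) {
            self.cited_prefetches += 1;
            self.inference_calls_saved_prefetch += 1;
            self.inference_tokens_saved += actual;
        }
        if ev.enricher_decline_reason.is_some() {
            self.total_declines += 1;
        }
        if ev.is_dedup_hit {
            self.inference_calls_saved_dedup += 1;
            self.inference_tokens_saved += actual;
        }
    }
}
```

Because the accumulator only reads fields off the event, the same fold works whether events arrive from the live pipeline or are replayed from a JSONL log after the post-pass has filled in the citation field.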
pub fn record_fail_fast_skip(&mut self, predicted_cost_tokens: u32)
Record a fail_fast_after_n short-circuit — the planner refused
to issue a tool call (e.g. a third empty ToolSearch), so no
PipelineEvent is ever emitted for it. Call this from the
planner side to keep inference_calls_saved_fail_fast honest.
predicted_cost_tokens is the per-call estimate from the
tool’s cost_model — added to inference_tokens_saved so the
fail-fast contribution shows up in the headline number.
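A minimal sketch of how the planner side might pair a fail-fast guard with this call. FailFastCounters and should_skip are hypothetical names; the guard assumes the planner tracks consecutive empty results per tool and blocks the next call once that count reaches fail_fast_after_n, which matches the "third empty ToolSearch" example only if the threshold is 2.

```rust
/// Hypothetical stand-in for the two counters this method touches.
#[derive(Default)]
struct FailFastCounters {
    inference_calls_saved_fail_fast: u32,
    inference_tokens_saved: u64,
}

impl FailFastCounters {
    /// One skipped call: bump the bucket and credit its predicted cost.
    fn record_fail_fast_skip(&mut self, predicted_cost_tokens: u32) {
        self.inference_calls_saved_fail_fast += 1;
        self.inference_tokens_saved += u64::from(predicted_cost_tokens);
    }
}

/// Hypothetical guard: skip once the tool has already come back empty
/// `fail_fast_after_n` times in a row.
fn should_skip(consecutive_empty: u32, fail_fast_after_n: u32) -> bool {
    consecutive_empty >= fail_fast_after_n
}
```

Since no event is emitted for a skipped call, forgetting to call record_fail_fast_skip at the guard site would silently under-report this bucket.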
pub fn record_prefetch_dispatched(&mut self)
Record that the host actually dispatched a speculative tool
call (a subset of total_prefetches: the planner produced a plan
and the dispatcher succeeded in scheduling it). Increment it
alongside total_prefetches from the host side; a mismatch
between the two means the planner produced more plans than the
dispatcher could schedule, i.e. the concurrency cap was saturated.
pub fn record_prefetch_won_race(&mut self)
Record that a dispatched prefetch landed in the dedup cache before the LLM asked for the same tool, so the LLM’s call collapsed to an L0 hit. Independent of textual citation — the LLM still issued the tool, but our prefetched body served the answer at zero added latency.
pub fn record_prefetch_wasted(&mut self)
Record that a dispatched prefetch was never claimed by the
LLM during the rest of the session (offline post-pass tally).
A high prefetch_wasted / prefetch_dispatched ratio is the
signal tune analyze watches for R7’s per-tool auto-disable.
pub fn prefetch_race_win_rate(&self) -> Option<f32>
Fraction of dispatched prefetches that beat the LLM to the
dedup cache. None when nothing was dispatched.
pub fn prefetch_waste_rate(&self) -> Option<f32>
Fraction of dispatched prefetches that were never claimed by
the LLM. None when nothing was dispatched. Higher means the
planner’s speculation was wasted, which drives R7’s auto-disable.
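The dispatch lifecycle and the two rates derived from it can be sketched together. DispatchCounters is a hypothetical stand-in limited to the three dispatch counters; the method names mirror the ones documented above, but the bodies are assumptions (simple increments and a guarded division).

```rust
/// Hypothetical stand-in for the three dispatch-side counters.
#[derive(Default)]
struct DispatchCounters {
    prefetch_dispatched: u32,
    prefetch_won_race: u32,
    prefetch_wasted: u32,
}

impl DispatchCounters {
    fn record_prefetch_dispatched(&mut self) {
        self.prefetch_dispatched += 1;
    }
    fn record_prefetch_won_race(&mut self) {
        self.prefetch_won_race += 1;
    }
    fn record_prefetch_wasted(&mut self) {
        self.prefetch_wasted += 1;
    }

    /// Shared shape of both rates: `None` when nothing was dispatched.
    fn ratio(&self, numerator: u32) -> Option<f32> {
        if self.prefetch_dispatched == 0 {
            None
        } else {
            Some(numerator as f32 / self.prefetch_dispatched as f32)
        }
    }
    fn prefetch_race_win_rate(&self) -> Option<f32> {
        self.ratio(self.prefetch_won_race)
    }
    fn prefetch_waste_rate(&self) -> Option<f32> {
        self.ratio(self.prefetch_wasted)
    }
}
```

The two rates need not sum to one: a dispatched prefetch can be claimed by the LLM after losing the race, in which case it is neither a win nor a waste.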
Trait Implementations
impl Clone for EnrichmentEffectiveness
fn clone(&self) -> EnrichmentEffectiveness
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for EnrichmentEffectiveness
impl Default for EnrichmentEffectiveness
fn default() -> EnrichmentEffectiveness
impl<'de> Deserialize<'de> for EnrichmentEffectiveness
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
impl PartialEq for EnrichmentEffectiveness
fn eq(&self, other: &EnrichmentEffectiveness) -> bool
Tests for self and other values to be equal, and is used by ==.