Struct EnrichmentEffectiveness 

pub struct EnrichmentEffectiveness {
    pub total_prefetches: u32,
    pub cited_prefetches: u32,
    pub total_declines: u32,
    pub late_invoked_after_decline: u32,
    pub cost_overrun_count: u32,
    pub total_predictions: u32,
    pub net_prediction_error_tokens: i64,
    pub inference_calls_saved_prefetch: u32,
    pub inference_calls_saved_dedup: u32,
    pub inference_calls_saved_fail_fast: u32,
    pub inference_tokens_saved: u64,
    pub prefetch_dispatched: u32,
    pub prefetch_won_race: u32,
    pub prefetch_wasted: u32,
}

Aggregate scoring of how well the Paper 3 enrichment planner served the agent during a session. Populated by the live pipeline (counters) plus the offline post-pass (cited_* numbers, see P-3-08).

Three primary rates the operator reads:

  • Prefetch hit rate — fraction of planner-prefetched calls whose content was textually cited by the LLM in the next 1–3 turns. The north-star efficiency number; target ≥ 60%.
  • Decline recall loss — fraction of declined candidates the LLM ended up calling itself within the next 5 turns. Higher means the planner is declining calls the LLM actually needs. Target ≤ 10%.
  • Cost overrun rate — fraction of admitted calls whose actual tokens_baseline exceeded the predicted cost by ≥ 30%. Drives refresh of cost_model.typical_kb priors. Target ≤ 15%.

And the operator-facing ROI counters:

  • inference_calls_saved_* — number of LLM round-trips the planner short-circuited, broken into three buckets so the contribution of each mechanism stays visible: prefetch (cited speculative calls), dedup (Paper 2 L0 hits — tool body replaced with a near-ref hint so the LLM never sees the full payload), and fail_fast (e.g. ToolSearch self-loop blocked after fail_fast_after_n).
  • inference_tokens_saved — sum of tokens_baseline from those short-circuited calls. The headline “we saved this much context” number for tune analyze.

Token savings vs a no-planner baseline is the roll-up “did the enricher pay for itself” answer; it lives in the corpus-replay validation harness (Paper 3 §Validation strategy), not on this summary, because it requires running the same session both with and without the planner. This struct carries only the per-session counters that drive the three rates above.
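As a rough illustration of how an operator might read the three rates against the targets quoted above, here is a minimal sketch. The helper `missed_targets` and the constant names are hypothetical and not part of this crate; the rates are passed in as plain `Option<f32>` values so the sketch does not depend on the real struct.

```rust
/// Targets quoted in the description above (as fractions, not percentages).
const MIN_PREFETCH_HIT_RATE: f32 = 0.60;
const MAX_DECLINE_RECALL_LOSS: f32 = 0.10;
const MAX_COST_OVERRUN_RATE: f32 = 0.15;

/// Hypothetical helper: returns the names of the rates that miss their target.
/// A `None` rate means "no data" and is never reported as a miss.
pub fn missed_targets(
    prefetch_hit_rate: Option<f32>,
    decline_recall_loss: Option<f32>,
    cost_overrun_rate: Option<f32>,
) -> Vec<&'static str> {
    let mut missed = Vec::new();
    // Hit rate is higher-is-better, so a miss is falling below the floor.
    if matches!(prefetch_hit_rate, Some(r) if r < MIN_PREFETCH_HIT_RATE) {
        missed.push("prefetch_hit_rate");
    }
    // The other two are lower-is-better, so a miss is exceeding the ceiling.
    if matches!(decline_recall_loss, Some(r) if r > MAX_DECLINE_RECALL_LOSS) {
        missed.push("decline_recall_loss");
    }
    if matches!(cost_overrun_rate, Some(r) if r > MAX_COST_OVERRUN_RATE) {
        missed.push("cost_overrun_rate");
    }
    missed
}
```

Treating `None` as "not a miss" is a design choice of this sketch: a session with no prefetches says nothing about planner quality, so it should not trip an alert.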

Fields§

§total_prefetches: u32

Number of calls the planner pre-fetched.

§cited_prefetches: u32

Of total_prefetches, the count whose content was cited by the LLM in the next 1–3 turns. Filled in by the offline post-pass; stays 0 until the post-pass has run.

§total_declines: u32

Number of candidates the planner declined for any reason.

§late_invoked_after_decline: u32

Of total_declines, the count where the LLM later issued the declined tool itself within the next 5 turns. Lower-is-better.

§cost_overrun_count: u32

Number of admitted calls whose actual tokens_baseline exceeded the planner’s prediction by ≥ 30%.

§total_predictions: u32

Total admitted calls (denominator for cost_overrun_rate).

§net_prediction_error_tokens: i64

Signed sum of the per-call prediction error (predicted vs. actual) in tokens. A total that drifts far from zero points to systematic under- or over-estimation.

§inference_calls_saved_prefetch: u32

LLM tool-uses avoided because the planner pre-fetched the content and the model cited it in the next 1–3 turns. Counted only when PipelineEvent::cited_in_next_n_turns is Some(true).

§inference_calls_saved_dedup: u32

LLM tool-uses avoided because L0 dedup replaced the response with a near-ref hint. Counted on every event with is_dedup_hit = true.

§inference_calls_saved_fail_fast: u32

LLM tool-uses avoided because crate::enrichment short-circuited a fail_fast_after_n loop (e.g. ToolSearch returning 0 bytes twice in a row). Incremented from the planner side via Self::record_fail_fast_skip.

§inference_tokens_saved: u64

Sum of baseline tokens from all three saved-call buckets. The “we saved this much context” headline for tune analyze.

§prefetch_dispatched: u32

Number of speculative tool calls the host actually dispatched out-of-band. A subset of total_prefetches: the plans the host successfully scheduled, not every plan the planner produced.

§prefetch_won_race: u32

Of prefetch_dispatched, the count where the prefetch result landed in the dedup cache before the LLM asked for the same tool, so the LLM’s call collapsed to an L0 hit. The other axis of “did the speculation pay off” — independent of textual citation.

§prefetch_wasted: u32

Prefetches the LLM never asked for in the same session. Wasted API quota / dollars; high values trigger R7’s per-tool auto-disable in tune analyze.

Implementations§

impl EnrichmentEffectiveness

pub fn prefetch_hit_rate(&self) -> Option<f32>

Fraction of prefetches that paid off (cited by the LLM). Returns None when no prefetches happened — distinct from a 0% hit rate.
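The distinction between `None` and a 0% hit rate can be sketched as follows. `PrefetchCounters` is a stand-in holding only the two counters involved, not the crate's real struct, and the body is an assumption about how such a rate is typically computed.

```rust
/// Stand-in for the two counters the hit rate reads (hypothetical type).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PrefetchCounters {
    pub total_prefetches: u32,
    pub cited_prefetches: u32,
}

impl PrefetchCounters {
    /// `None` when there is no denominator, `Some(ratio)` otherwise.
    pub fn prefetch_hit_rate(&self) -> Option<f32> {
        if self.total_prefetches == 0 {
            // No prefetches happened: the rate is undefined, not zero.
            return None;
        }
        Some(self.cited_prefetches as f32 / self.total_prefetches as f32)
    }
}
```

With this shape, a caller can tell "the planner never prefetched" (`None`) apart from "the planner prefetched and nothing was cited" (`Some(0.0)`). Note that `cited_prefetches` stays 0 until the offline post-pass has run, so a live-pipeline reading of `Some(0.0)` may simply mean the post-pass is pending.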

pub fn decline_recall_loss(&self) -> Option<f32>

Fraction of declined candidates the LLM later called anyway.

pub fn cost_overrun_rate(&self) -> Option<f32>

Fraction of admitted calls whose actual baseline exceeded the prediction by ≥ 30%.

pub fn total_calls_saved(&self) -> u32

Total LLM tool-uses the planner short-circuited across all three buckets. The headline “round-trips avoided” number.
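A minimal sketch of the roll-up, assuming the total is simply the sum of the three bucket counters. `SavedCalls` is a hypothetical stand-in type, and the use of saturating addition is an assumption of this sketch, not something the documentation states.

```rust
/// Stand-in holding only the three saved-call buckets (hypothetical type).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct SavedCalls {
    pub inference_calls_saved_prefetch: u32,
    pub inference_calls_saved_dedup: u32,
    pub inference_calls_saved_fail_fast: u32,
}

impl SavedCalls {
    /// Sum of the three buckets. Saturating addition is an assumption here;
    /// it keeps the sketch panic-free on overflow.
    pub fn total_calls_saved(&self) -> u32 {
        self.inference_calls_saved_prefetch
            .saturating_add(self.inference_calls_saved_dedup)
            .saturating_add(self.inference_calls_saved_fail_fast)
    }
}
```

Keeping the buckets separate and summing on demand means the per-mechanism contribution stays visible while the headline number is always derivable.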

pub fn accumulate(&mut self, ev: &PipelineEvent)

Fold one PipelineEvent into the per-session counters.

Inspects the four enricher-specific fields plus is_dedup_hit and tokens_baseline/tokens_final to maintain:

  1. total_prefetches / total_predictions / cost_overrun_* when enricher_prefetched = true.
  2. cited_prefetches and inference_calls_saved_prefetch when the offline post-pass has set cited_in_next_n_turns = Some(true).
  3. total_declines when enricher_decline_reason is set.
  4. inference_calls_saved_dedup (and the corresponding inference_tokens_saved) on every L0 dedup hit.

Use it to drive SessionSummary.enrichment from the live pipeline or from a JSONL post-pass — same accumulator either way.
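The four steps above can be sketched roughly as follows. `Event` and `Counters` are hypothetical stand-ins for `PipelineEvent` and this struct. The field `enricher_predicted_tokens`, the `u32` type of `tokens_baseline`, the sign convention of the prediction error (actual minus predicted), and adding prefetch tokens to `inference_tokens_saved` are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for `PipelineEvent`; only the fields the sketch reads.
#[derive(Debug, Default, Clone)]
pub struct Event {
    pub enricher_prefetched: bool,
    pub enricher_decline_reason: Option<String>,
    pub enricher_predicted_tokens: Option<u32>, // assumed field name
    pub cited_in_next_n_turns: Option<bool>,
    pub is_dedup_hit: bool,
    pub tokens_baseline: u32, // assumed type
}

/// Hypothetical stand-in for the counters `accumulate` maintains.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Counters {
    pub total_prefetches: u32,
    pub cited_prefetches: u32,
    pub total_declines: u32,
    pub cost_overrun_count: u32,
    pub total_predictions: u32,
    pub net_prediction_error_tokens: i64,
    pub inference_calls_saved_prefetch: u32,
    pub inference_calls_saved_dedup: u32,
    pub inference_tokens_saved: u64,
}

impl Counters {
    pub fn accumulate(&mut self, ev: &Event) {
        // 1. Prefetched calls feed the prefetch and prediction counters.
        if ev.enricher_prefetched {
            self.total_prefetches += 1;
            if let Some(predicted) = ev.enricher_predicted_tokens {
                self.total_predictions += 1;
                let actual = i64::from(ev.tokens_baseline);
                let predicted = i64::from(predicted);
                // Assumed sign convention: positive means under-estimation.
                self.net_prediction_error_tokens += actual - predicted;
                // Overrun: actual exceeds the prediction by at least 30%.
                if actual > predicted && actual * 10 >= predicted * 13 {
                    self.cost_overrun_count += 1;
                }
            }
        }
        // 2. Citation flag, set by the offline post-pass.
        if ev.cited_in_next_n_turns == Some(true) {
            self.cited_prefetches += 1;
            self.inference_calls_saved_prefetch += 1;
            self.inference_tokens_saved += u64::from(ev.tokens_baseline);
        }
        // 3. Declined candidates.
        if ev.enricher_decline_reason.is_some() {
            self.total_declines += 1;
        }
        // 4. L0 dedup hits save one call and its baseline tokens.
        if ev.is_dedup_hit {
            self.inference_calls_saved_dedup += 1;
            self.inference_tokens_saved += u64::from(ev.tokens_baseline);
        }
    }
}
```

Because the accumulator only reads fields already present on the event, the same fold works over a live stream and over a replayed JSONL file; the citation counters simply stay at zero until the post-pass has filled in `cited_in_next_n_turns`.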

pub fn record_fail_fast_skip(&mut self, predicted_cost_tokens: u32)

Record a fail_fast_after_n short-circuit — the planner refused to issue a tool call (e.g. a third empty ToolSearch), so no PipelineEvent is ever emitted for it. Call this from the planner side to keep inference_calls_saved_fail_fast honest.

predicted_cost_tokens is the per-call estimate from the tool’s cost_model — added to inference_tokens_saved so the fail-fast contribution shows up in the headline number.
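A sketch of how a planner-side guard might feed this method. `FailFastGuard` and `Savings` are hypothetical types invented for illustration; only the idea of refusing a call after `fail_fast_after_n` consecutive empty results, and of adding the predicted cost to `inference_tokens_saved`, comes from the description above.

```rust
/// Hypothetical guard: tracks consecutive empty responses for one tool.
#[derive(Debug, Clone)]
pub struct FailFastGuard {
    pub fail_fast_after_n: u32,
    consecutive_empty: u32,
}

impl FailFastGuard {
    pub fn new(fail_fast_after_n: u32) -> Self {
        Self { fail_fast_after_n, consecutive_empty: 0 }
    }

    /// Record the size of the latest response; a non-empty one resets the streak.
    pub fn observe(&mut self, response_bytes: usize) {
        if response_bytes == 0 {
            self.consecutive_empty += 1;
        } else {
            self.consecutive_empty = 0;
        }
    }

    /// True once the streak has reached the threshold.
    pub fn should_skip(&self) -> bool {
        self.consecutive_empty >= self.fail_fast_after_n
    }
}

/// Stand-in for the two counters the fail-fast path touches.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Savings {
    pub inference_calls_saved_fail_fast: u32,
    pub inference_tokens_saved: u64,
}

impl Savings {
    /// No event is emitted for a skipped call, so the planner records it directly.
    pub fn record_fail_fast_skip(&mut self, predicted_cost_tokens: u32) {
        self.inference_calls_saved_fail_fast += 1;
        self.inference_tokens_saved += u64::from(predicted_cost_tokens);
    }
}
```

With `fail_fast_after_n = 2`, two empty responses in a row make `should_skip` return true, so the third call is refused and recorded with its predicted cost.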

pub fn record_prefetch_dispatched(&mut self)

Record that the host actually dispatched a speculative tool call (a subset of total_prefetches: the planner produced a plan and the dispatcher succeeded in scheduling it). Increment alongside total_prefetches from the host side. A mismatch between the two means the planner produced more plans than the dispatcher could schedule, i.e. the concurrency cap was saturated.

pub fn record_prefetch_won_race(&mut self)

Record that a dispatched prefetch landed in the dedup cache before the LLM asked for the same tool, so the LLM’s call collapsed to an L0 hit. Independent of textual citation — the LLM still issued the tool, but our prefetched body served the answer at zero added latency.

pub fn record_prefetch_wasted(&mut self)

Record that a dispatched prefetch was never claimed by the LLM during the rest of the session (offline post-pass tally). A high prefetch_wasted / prefetch_dispatched ratio is the signal tune analyze watches for R7’s per-tool auto-disable.

pub fn prefetch_race_win_rate(&self) -> Option<f32>

Fraction of dispatched prefetches that beat the LLM to the dedup cache. None when nothing was dispatched.

pub fn prefetch_waste_rate(&self) -> Option<f32>

Fraction of dispatched prefetches that were never claimed by the LLM. None when nothing was dispatched. Higher means more of the planner’s speculation was wasted; this drives R7’s auto-disable.
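The dispatch counters and the two rates derived from them might fit together as in the sketch below. `DispatchCounters` is a hypothetical stand-in for the three `prefetch_*` fields, and the method bodies are assumptions based on the descriptions above.

```rust
/// Stand-in for the three dispatch-side counters (hypothetical type).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct DispatchCounters {
    pub prefetch_dispatched: u32,
    pub prefetch_won_race: u32,
    pub prefetch_wasted: u32,
}

impl DispatchCounters {
    pub fn record_prefetch_dispatched(&mut self) {
        self.prefetch_dispatched += 1;
    }

    pub fn record_prefetch_won_race(&mut self) {
        self.prefetch_won_race += 1;
    }

    pub fn record_prefetch_wasted(&mut self) {
        self.prefetch_wasted += 1;
    }

    /// Shared helper: `None` when nothing was dispatched.
    fn per_dispatched(&self, numerator: u32) -> Option<f32> {
        if self.prefetch_dispatched == 0 {
            None
        } else {
            Some(numerator as f32 / self.prefetch_dispatched as f32)
        }
    }

    pub fn prefetch_race_win_rate(&self) -> Option<f32> {
        self.per_dispatched(self.prefetch_won_race)
    }

    pub fn prefetch_waste_rate(&self) -> Option<f32> {
        self.per_dispatched(self.prefetch_wasted)
    }
}
```

The two rates need not sum to 1: a dispatched prefetch can be claimed by the LLM without having won the race, in which case it counts toward neither numerator.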

pub fn report(&self) -> String

Compact one-line summary suitable for tune analyze output.

Trait Implementations§

impl Clone for EnrichmentEffectiveness

fn clone(&self) -> EnrichmentEffectiveness

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for EnrichmentEffectiveness

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for EnrichmentEffectiveness

fn default() -> EnrichmentEffectiveness

Returns the “default value” for a type.

impl<'de> Deserialize<'de> for EnrichmentEffectiveness

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl PartialEq for EnrichmentEffectiveness

fn eq(&self, other: &EnrichmentEffectiveness) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Serialize for EnrichmentEffectiveness

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

impl StructuralPartialEq for EnrichmentEffectiveness
