pub struct CostModel {
pub typical_kb: f32,
pub max_kb: Option<f32>,
pub latency_ms_p50: Option<u32>,
pub dollars: Option<f32>,
pub freshness_ttl_s: Option<u32>,
}Expand description
What the call costs in tokens, latency, dollars, and how long the result stays valid in cache.
The numbers are the planner’s prior; the actual telemetry from
PipelineEvent updates them via tune analyze (Paper 2 idiom).
Fields§
§typical_kb: f32Median response size in kilobytes — informs the knapsack
cost term. Anchored on the corpus mining in
docs/research/paper3_corpus_findings.md.
max_kb: Option<f32>p99 response size — the planner uses this for the worst-case budget reservation when it cannot afford to overshoot.
latency_ms_p50: Option<u32>Median end-to-end latency. None = unknown, treated as 0
in the planner’s latency-aware mode.
dollars: Option<f32>Per-call dollar cost for paid APIs (Anthropic, OpenAI, …).
None = free.
freshness_ttl_s: Option<u32>How long a cached response stays valid before the planner must
refetch. Used by L0 dedup: a polling endpoint with
freshness_ttl_s = 15 returns the cached body for 15 s and
then collapses to a near-ref hint.