Input tokens served from the provider’s prompt cache, billed
at the (lower) cached rate. Omitted on responses with no
cache hit. Backend wire shape: cached_tokens (i32);
promoted to Option here for headroom on future
long-cache scenarios (multi-hour video transcripts, etc.).
Multi-turn billing audits reconcile by computing
(non-cached) input_tokens vs (cached) cached_tokens — both
roll into the provider’s billable total.
Sub-component of output_tokens that was model reasoning /
thinking output (Gemini 3.x’s thoughtTokens, OpenAI o-
series’ reasoning tokens, Anthropic’s extended-thinking
blocks). Omitted on responses from non-reasoning models.
Total billable output = output_tokens (which already
includes the reasoning component — this field is just
transparency on how much of that was thinking).