Struct TokenUsage

pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub model: String,
    pub provider: String,
    pub cached_input_tokens: u64,
    pub cache_write_input_tokens: u64,
    pub cache_write_5m_input_tokens: u64,
    pub cache_write_1h_input_tokens: u64,
    pub stop_reason: Option<String>,
    pub raw_stop_reason: Option<String>,
    pub reasoning_tokens: u64,
}

Token usage from a single LLM call.

§Normalized superset convention

input_tokens is the total number of input tokens processed — it is a superset of cached_input_tokens and cache_write_input_tokens. Downstream cost logic derives the “fresh” (non-cached) portion by subtracting the two cache counts. This matches OpenAI and Gemini’s native reporting; Anthropic’s API reports the three groups as disjoint so the Anthropic parser normalizes by summing before assigning.
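The subtraction described above can be sketched as follows. This is a minimal illustration, not the crate's cost logic: `Usage` is a trimmed local stand-in for the three relevant fields, and `fresh_input_tokens` is a hypothetical helper name. Saturating arithmetic is an assumption made here so that a malformed payload cannot underflow.

```rust
/// Trimmed, local stand-in for the cache-related fields of `TokenUsage`.
#[derive(Debug, Default, Clone)]
pub struct Usage {
    /// Total input tokens (superset of the two cache counts below).
    pub input_tokens: u64,
    /// Cache-read tokens.
    pub cached_input_tokens: u64,
    /// Cache-write tokens.
    pub cache_write_input_tokens: u64,
}

/// Hypothetical helper: input tokens that were neither read from
/// nor written to the cache.
pub fn fresh_input_tokens(u: &Usage) -> u64 {
    u.input_tokens
        .saturating_sub(u.cached_input_tokens)
        .saturating_sub(u.cache_write_input_tokens)
}
```

With `input_tokens = 1000`, `cached_input_tokens = 600` and `cache_write_input_tokens = 100`, the helper returns 300.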

§Prompt-caching semantics per provider

OpenAI — cached_input_tokens counts cache READS (billed at a discount, typically 0.1x base input). Cache writes are free per OpenAI’s caching docs. cache_write_input_tokens is always 0.
Anthropic — the API returns three token groups: input_tokens (fresh, 1x), cache_read_input_tokens (0.1x), and cache_creation_input_tokens (1.25x at the default 5-minute TTL, 2.0x at 1-hour TTL). The parser remaps these to the superset convention above. The per-TTL split is surfaced as cache_write_5m_input_tokens and cache_write_1h_input_tokens (parsed from usage.cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens); these two fields sum to cache_write_input_tokens. Akribes workflows opt into the 1h TTL via the extended-cache-ttl-2025-04-11 beta header, so this split matters for cost accounting (#1091).
Gemini — only cache reads are reported; writes are not separately billed. cache_write_input_tokens is always 0.
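The Anthropic remapping described above might look roughly like this. Both struct names and `normalize_anthropic` are hypothetical; the real parser is not shown on this page, so treat this as a sketch of the superset convention under the assumption that the raw counters have already been extracted from the response.

```rust
/// Hypothetical view of the counters as Anthropic reports them:
/// three disjoint groups plus the per-TTL breakdown of cache writes.
#[derive(Debug, Default, Clone)]
pub struct AnthropicRawUsage {
    pub input_tokens: u64,                // fresh tokens only
    pub cache_read_input_tokens: u64,     // cache reads
    pub cache_creation_input_tokens: u64, // cache writes, all TTLs
    pub ephemeral_5m_input_tokens: u64,   // writes at the 5-minute TTL
    pub ephemeral_1h_input_tokens: u64,   // writes at the 1-hour TTL
}

/// Trimmed stand-in for the normalized fields of `TokenUsage`.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct NormalizedUsage {
    pub input_tokens: u64,
    pub cached_input_tokens: u64,
    pub cache_write_input_tokens: u64,
    pub cache_write_5m_input_tokens: u64,
    pub cache_write_1h_input_tokens: u64,
}

/// Sum the disjoint groups so that `input_tokens` becomes the superset.
pub fn normalize_anthropic(raw: &AnthropicRawUsage) -> NormalizedUsage {
    NormalizedUsage {
        input_tokens: raw
            .input_tokens
            .saturating_add(raw.cache_read_input_tokens)
            .saturating_add(raw.cache_creation_input_tokens),
        cached_input_tokens: raw.cache_read_input_tokens,
        cache_write_input_tokens: raw.cache_creation_input_tokens,
        cache_write_5m_input_tokens: raw.ephemeral_5m_input_tokens,
        cache_write_1h_input_tokens: raw.ephemeral_1h_input_tokens,
    }
}
```

For OpenAI and Gemini no summing would be needed under this convention, since their reported input count already includes cached tokens.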

Fields§

§input_tokens: u64

Total input tokens processed (superset of the two cache counts).

§output_tokens: u64

§model: String

§provider: String

§cached_input_tokens: u64

Cache-READ tokens (billed at CACHE_READ_RATE, ~0.1x input).

§cache_write_input_tokens: u64

Cache-WRITE / creation tokens (Anthropic only today; billed at CACHE_WRITE_RATE, 1.25x input at 5m TTL or 2.0x at 1h TTL). This is the total across both TTL buckets; the breakdown lives on Self::cache_write_5m_input_tokens and Self::cache_write_1h_input_tokens (#1091). Defaults to 0 on deserialization, for backward compatibility with events that predate this field.

§cache_write_5m_input_tokens: u64

Anthropic cache-WRITE tokens at the default 5-minute TTL, parsed from usage.cache_creation.ephemeral_5m_input_tokens. Subset of Self::cache_write_input_tokens — sums with Self::cache_write_1h_input_tokens to the total. 0 on providers that don’t report the per-TTL breakdown (OpenAI, Gemini, mock) and for pre-#1091 events that omit the field.

§cache_write_1h_input_tokens: u64

Anthropic cache-WRITE tokens at the 1-hour TTL, parsed from usage.cache_creation.ephemeral_1h_input_tokens. Subset of Self::cache_write_input_tokens — sums with Self::cache_write_5m_input_tokens to the total. 0 on providers without per-TTL reporting (OpenAI, Gemini, mock) and for pre-#1091 events that omit the field. The 1h-TTL bucket bills at 2.0x base input vs. 1.25x for 5m — pricing::compute_cost uses this split for accurate cost attribution (#1091).
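To illustrate why the per-TTL split matters, here is a sketch of a weighting function that uses the multipliers quoted above (0.1x reads, 1.25x 5m writes, 2.0x 1h writes). It is not pricing::compute_cost, whose implementation is not shown here; the constant names, `Usage`, and `weighted_input_tokens` are hypothetical. Billing write tokens that lack a TTL attribution (pre-#1091 events) at the 5m rate is an assumption of this sketch.

```rust
/// Multipliers relative to the base input price, as quoted in the docs above.
pub const READ_MULTIPLIER: f64 = 0.1;
pub const WRITE_5M_MULTIPLIER: f64 = 1.25;
pub const WRITE_1H_MULTIPLIER: f64 = 2.0;

/// Trimmed, local stand-in for the input-side fields of `TokenUsage`.
#[derive(Debug, Default, Clone)]
pub struct Usage {
    pub input_tokens: u64,
    pub cached_input_tokens: u64,
    pub cache_write_input_tokens: u64,
    pub cache_write_5m_input_tokens: u64,
    pub cache_write_1h_input_tokens: u64,
}

/// Input tokens expressed in base-rate equivalents.
pub fn weighted_input_tokens(u: &Usage) -> f64 {
    // Fresh tokens: the superset minus both cache counts.
    let fresh = u
        .input_tokens
        .saturating_sub(u.cached_input_tokens)
        .saturating_sub(u.cache_write_input_tokens);
    // Write tokens with no per-TTL attribution (assumed to bill at the 5m rate).
    let attributed = u
        .cache_write_5m_input_tokens
        .saturating_add(u.cache_write_1h_input_tokens);
    let unattributed = u.cache_write_input_tokens.saturating_sub(attributed);

    fresh as f64
        + u.cached_input_tokens as f64 * READ_MULTIPLIER
        + (u.cache_write_5m_input_tokens + unattributed) as f64 * WRITE_5M_MULTIPLIER
        + u.cache_write_1h_input_tokens as f64 * WRITE_1H_MULTIPLIER
}
```

Multiplying the result by the model's base price per input token would give the input-side cost. The same 100 write tokens weigh 125 if all are at the 5m TTL and 200 if all are at the 1h TTL, which is the gap the split is meant to capture.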

§stop_reason: Option<String>

The provider-reported stop reason for the underlying call, when known. Anthropic surfaces values like "end_turn", "max_tokens", "tool_use", "stop_sequence". OpenAI: "stop", "length", "tool_calls". Gemini: "STOP", "MAX_TOKENS", etc.

Carried alongside usage so the engine’s validation-failure path can distinguish “model truncated mid-output” (max_tokens / length / MAX_TOKENS) from “model finished cleanly but produced an invalid shape” — see issue #320 / #321. None for providers that don’t surface a stop reason or for paths that haven’t been threaded (e.g. the mock provider). Serialized with #[serde(default)] so old wire payloads that omit the field still deserialize.

Today this field carries the RAW provider value when the parse_*_usage path produced the TokenUsage (the common case for non-streamed calls). The usage_from_outcome rebuild path (streaming + some retry paths) writes the OTel-canonical form ("stop" / "max_tokens" / "tool_use" / "content_filter" / "other") because LlmCallOutcome only carries the canonical form. Consumers that need a deterministic-by-provider raw value should prefer Self::raw_stop_reason (#1077).
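Because stop_reason may hold either a raw provider value or the canonical form, a truncation check has to accept both. A minimal sketch, assuming the values listed above are the only truncation markers; `is_truncated` is a hypothetical helper, not the engine's actual validation-failure code.

```rust
/// Returns true when the stop reason indicates the model hit its output limit.
/// Covers Anthropic and the canonical form ("max_tokens"), OpenAI ("length"),
/// and Gemini ("MAX_TOKENS"). `None` is treated as "not known to be truncated".
pub fn is_truncated(stop_reason: Option<&str>) -> bool {
    matches!(stop_reason, Some("max_tokens" | "length" | "MAX_TOKENS"))
}
```

A caller holding a `TokenUsage` would pass `usage.stop_reason.as_deref()`.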

§raw_stop_reason: Option<String>

Raw provider stop reason, never lossy-mapped to OTel canonical form. Set to the same value as Self::stop_reason when the parse_*_usage path produced the usage; None otherwise (mock, streaming rebuilds via usage_from_outcome).

Bench / observability code that needs to distinguish Gemini’s "STOP" from "RECITATION" (both collapse to "stop" under the canonical mapping) or Anthropic’s "stop_sequence" from "end_turn" should read this field. #1077.
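One way observability code could follow this guidance is to prefer raw_stop_reason and fall back to stop_reason only with an explicit marker that the value may be canonical. The `Usage` stand-in, the `StopLabel` enum, and `stop_label` are all hypothetical names used for illustration.

```rust
/// Trimmed, local stand-in for the two stop-reason fields of `TokenUsage`.
#[derive(Debug, Default, Clone)]
pub struct Usage {
    pub stop_reason: Option<String>,
    pub raw_stop_reason: Option<String>,
}

/// Records which field a stop reason was read from.
#[derive(Debug, PartialEq, Eq)]
pub enum StopLabel<'a> {
    /// Taken from `raw_stop_reason`: the provider's own value.
    Raw(&'a str),
    /// Taken from `stop_reason` only: may already be the canonical form.
    MaybeCanonical(&'a str),
    /// Neither field was set.
    Unknown,
}

/// Prefer the raw value; fall back to `stop_reason` when it is absent.
pub fn stop_label(u: &Usage) -> StopLabel<'_> {
    match (u.raw_stop_reason.as_deref(), u.stop_reason.as_deref()) {
        (Some(raw), _) => StopLabel::Raw(raw),
        (None, Some(other)) => StopLabel::MaybeCanonical(other),
        (None, None) => StopLabel::Unknown,
    }
}
```

Keeping the two cases distinct lets a dashboard avoid mixing Gemini's "RECITATION" with values that were already collapsed to "stop".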

§reasoning_tokens: u64

Reasoning / thinking tokens — a SUBSET of Self::output_tokens, not in addition. Captured from:

OpenAI o-series + GPT-5: usage.completion_tokens_details.reasoning_tokens
Anthropic extended-thinking: usage.thinking_tokens (when present)
Gemini with thinkingBudget set: usageMetadata.thoughtsTokenCount

0 when the model didn’t engage reasoning or the provider didn’t surface the breakdown. #[serde(default)] keeps wire-compat with pre-#322 events that omit the field entirely.
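Since reasoning_tokens is a subset of output_tokens, the non-reasoning portion is a difference rather than a sum. A small sketch with a hypothetical helper name and a trimmed stand-in struct; saturating subtraction is an assumption to guard against inconsistent payloads.

```rust
/// Trimmed, local stand-in for the output-side fields of `TokenUsage`.
#[derive(Debug, Default, Clone)]
pub struct Usage {
    /// All output tokens, reasoning included.
    pub output_tokens: u64,
    /// Reasoning / thinking tokens (subset of `output_tokens`).
    pub reasoning_tokens: u64,
}

/// Output tokens that were not spent on reasoning.
pub fn non_reasoning_output_tokens(u: &Usage) -> u64 {
    u.output_tokens.saturating_sub(u.reasoning_tokens)
}
```

Adding reasoning_tokens to output_tokens when computing totals would double-count the reasoning portion.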

Trait Implementations§

impl Clone for TokenUsage

fn clone(&self) -> TokenUsage

fn clone_from(&mut self, source: &Self)

impl Debug for TokenUsage

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

impl Default for TokenUsage

fn default() -> TokenUsage

impl<'de> Deserialize<'de> for TokenUsage

fn deserialize<__D>(__deserializer: __D) -> Result<TokenUsage, <__D as Deserializer<'de>>::Error>
where __D: Deserializer<'de>,

impl Serialize for TokenUsage

fn serialize<__S>(&self, __serializer: __S) -> Result<<__S as Serializer>::Ok, <__S as Serializer>::Error>
where __S: Serializer,

Auto Trait Implementations§

impl Freeze for TokenUsage

impl RefUnwindSafe for TokenUsage

impl Send for TokenUsage

impl Sync for TokenUsage

impl Unpin for TokenUsage

impl UnsafeUnpin for TokenUsage

impl UnwindSafe for TokenUsage

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Sized + Policy<B, E>, P: Policy<B, E>,

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Sized + Policy<B, E>, P: Policy<B, E>,

impl<T> Same for T

type Output = T

impl<T> ToOwned for T
where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

fn with_current_subscriber(self) -> WithDispatch<Self>
