Struct TokenUsage 

pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub model: String,
    pub provider: String,
    pub cached_input_tokens: u64,
    pub cache_write_input_tokens: u64,
    pub cache_write_5m_input_tokens: u64,
    pub cache_write_1h_input_tokens: u64,
    pub stop_reason: Option<String>,
    pub raw_stop_reason: Option<String>,
    pub reasoning_tokens: u64,
}

Token usage from a single LLM call.

§Normalized superset convention

input_tokens is the total number of input tokens processed — it is a superset of cached_input_tokens and cache_write_input_tokens. Downstream cost logic derives the “fresh” (non-cached) portion by subtracting the two cache counts. This matches OpenAI and Gemini’s native reporting; Anthropic’s API reports the three groups as disjoint so the Anthropic parser normalizes by summing before assigning.
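
The subtraction described above can be sketched as follows. This is a minimal illustration, not the crate's actual cost logic: `UsageCounts` and `fresh_input_tokens` are hypothetical names, and the saturating subtraction is an assumption added here to guard against malformed counts.

```rust
/// Hypothetical, trimmed-down stand-in for the three input counters on `TokenUsage`.
#[derive(Debug, Default, Clone)]
pub struct UsageCounts {
    /// Total input tokens (superset of the two cache counts).
    pub input_tokens: u64,
    /// Cache-read tokens.
    pub cached_input_tokens: u64,
    /// Cache-write tokens.
    pub cache_write_input_tokens: u64,
}

/// Fresh (non-cached) input tokens under the superset convention.
///
/// Assumption: saturating subtraction is used so that inconsistent counts
/// yield 0 rather than an underflow panic in debug builds.
pub fn fresh_input_tokens(u: &UsageCounts) -> u64 {
    u.input_tokens
        .saturating_sub(u.cached_input_tokens)
        .saturating_sub(u.cache_write_input_tokens)
}
```

With `input_tokens = 1000`, `cached_input_tokens = 600`, and `cache_write_input_tokens = 100`, the fresh portion is 300.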

§Prompt-caching semantics per provider

  • OpenAI: cached_input_tokens counts cache READS (billed at a discount, typically 0.1x base input). Cache writes are free per OpenAI’s caching docs. cache_write_input_tokens is always 0.
  • Anthropic: the API returns three token groups: input_tokens (fresh, 1x), cache_read_input_tokens (0.1x), and cache_creation_input_tokens (1.25x at the default 5-minute TTL, 2.0x at 1-hour TTL). The parser remaps these to the superset convention above. The per-TTL split is surfaced as cache_write_5m_input_tokens and cache_write_1h_input_tokens (parsed from usage.cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens); these two fields sum to cache_write_input_tokens. Akribes workflows opt into the 1h TTL via the extended-cache-ttl-2025-04-11 beta header, so this split matters for cost accounting (#1091).
  • Gemini: only cache reads are reported; writes are not separately billed. cache_write_input_tokens is always 0.
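
The Anthropic remapping described in the list above might look roughly like the sketch below. `AnthropicUsage`, `NormalizedInput`, and `normalize_anthropic` are hypothetical names chosen for illustration; the real parser and its types are not shown in this page.

```rust
/// Hypothetical mirror of the disjoint token groups Anthropic reports.
pub struct AnthropicUsage {
    pub input_tokens: u64,                // fresh tokens only
    pub cache_read_input_tokens: u64,     // cache reads
    pub cache_creation_input_tokens: u64, // cache writes, both TTLs combined
    pub ephemeral_5m_input_tokens: u64,   // from usage.cache_creation
    pub ephemeral_1h_input_tokens: u64,   // from usage.cache_creation
}

/// Hypothetical subset of the normalized `TokenUsage` input fields.
pub struct NormalizedInput {
    pub input_tokens: u64,
    pub cached_input_tokens: u64,
    pub cache_write_input_tokens: u64,
    pub cache_write_5m_input_tokens: u64,
    pub cache_write_1h_input_tokens: u64,
}

/// Converts disjoint Anthropic counts into the superset convention by
/// summing the three groups into `input_tokens`.
pub fn normalize_anthropic(raw: &AnthropicUsage) -> NormalizedInput {
    NormalizedInput {
        input_tokens: raw.input_tokens
            + raw.cache_read_input_tokens
            + raw.cache_creation_input_tokens,
        cached_input_tokens: raw.cache_read_input_tokens,
        cache_write_input_tokens: raw.cache_creation_input_tokens,
        cache_write_5m_input_tokens: raw.ephemeral_5m_input_tokens,
        cache_write_1h_input_tokens: raw.ephemeral_1h_input_tokens,
    }
}
```

After normalization, subtracting the two cache counts from `input_tokens` recovers the fresh count Anthropic originally reported.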

Fields§

§input_tokens: u64

Total input tokens processed (superset of the two cache counts).

§output_tokens: u64

§model: String

§provider: String

§cached_input_tokens: u64

Cache-READ tokens (billed at CACHE_READ_RATE, ~0.1x input).

§cache_write_input_tokens: u64

Cache-WRITE / creation tokens (Anthropic only today; billed at CACHE_WRITE_RATE, 1.25x input at 5m TTL or 2.0x at 1h TTL). This is the total across both TTL buckets; the breakdown lives on Self::cache_write_5m_input_tokens and Self::cache_write_1h_input_tokens (#1091). Deserializes to its default (0) when absent, for backward compatibility with events predating this field.

§cache_write_5m_input_tokens: u64

Anthropic cache-WRITE tokens at the default 5-minute TTL, parsed from usage.cache_creation.ephemeral_5m_input_tokens. Subset of Self::cache_write_input_tokens — sums with Self::cache_write_1h_input_tokens to the total. 0 on providers that don’t report the per-TTL breakdown (OpenAI, Gemini, mock) and for pre-#1091 events that omit the field.

§cache_write_1h_input_tokens: u64

Anthropic cache-WRITE tokens at the 1-hour TTL, parsed from usage.cache_creation.ephemeral_1h_input_tokens. Subset of Self::cache_write_input_tokens — sums with Self::cache_write_5m_input_tokens to the total. 0 on providers without per-TTL reporting (OpenAI, Gemini, mock) and for pre-#1091 events that omit the field. The 1h-TTL bucket bills at 2.0x base input vs. 1.25x for 5m — pricing::compute_cost uses this split for accurate cost attribution (#1091).
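
To illustrate why the split matters, here is a rough sketch of per-bucket input pricing using the multipliers quoted above (0.1x reads, 1.25x for 5m writes, 2.0x for 1h writes). It is not the body of pricing::compute_cost: `InputCounts`, `input_cost`, and the constant names are hypothetical, and billing any unattributed write tokens at the 5m rate is an assumption made here for events that lack the breakdown.

```rust
/// Hypothetical subset of the `TokenUsage` input counters.
pub struct InputCounts {
    pub input_tokens: u64,
    pub cached_input_tokens: u64,
    pub cache_write_input_tokens: u64,
    pub cache_write_5m_input_tokens: u64,
    pub cache_write_1h_input_tokens: u64,
}

// Multipliers relative to the base input price, as quoted in the field docs.
const CACHE_READ_MULT: f64 = 0.1;
const CACHE_WRITE_5M_MULT: f64 = 1.25;
const CACHE_WRITE_1H_MULT: f64 = 2.0;

/// Input-side cost for one call, given a base price per input token.
pub fn input_cost(u: &InputCounts, base_price_per_token: f64) -> f64 {
    // Fresh tokens: total minus both cache counts (superset convention).
    let fresh = u
        .input_tokens
        .saturating_sub(u.cached_input_tokens)
        .saturating_sub(u.cache_write_input_tokens);

    // Assumption: write tokens not covered by either TTL bucket (for example,
    // events without the split) are billed at the default 5m rate.
    let unattributed = u
        .cache_write_input_tokens
        .saturating_sub(u.cache_write_5m_input_tokens)
        .saturating_sub(u.cache_write_1h_input_tokens);
    let write_5m = u.cache_write_5m_input_tokens + unattributed;

    let units = fresh as f64
        + u.cached_input_tokens as f64 * CACHE_READ_MULT
        + write_5m as f64 * CACHE_WRITE_5M_MULT
        + u.cache_write_1h_input_tokens as f64 * CACHE_WRITE_1H_MULT;

    units * base_price_per_token
}
```

For 1000 total input tokens with 600 reads, 40 writes at 5m, and 60 writes at 1h, the weighted total is 300 + 60 + 50 + 120 = 530 base-price units. Treating all 100 write tokens as 5m writes would give 485 instead.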

§stop_reason: Option<String>

The provider-reported stop reason for the underlying call, when known. Anthropic surfaces values like "end_turn", "max_tokens", "tool_use", "stop_sequence". OpenAI: "stop", "length", "tool_calls". Gemini: "STOP", "MAX_TOKENS", etc.

Carried alongside usage so the engine’s validation-failure path can distinguish “model truncated mid-output” (max_tokens / length / MAX_TOKENS) from “model finished cleanly but produced an invalid shape” — see issue #320 / #321. None for providers that don’t surface a stop reason or for paths that haven’t been threaded (e.g. the mock provider). Serialized with #[serde(default)] so old wire payloads that omit the field still deserialize.

Today this field carries the RAW provider value when the parse_*_usage path produced the TokenUsage (the common case for non-streamed calls). The usage_from_outcome rebuild path (streaming + some retry paths) writes the OTel-canonical form ("stop" / "max_tokens" / "tool_use" / "content_filter" / "other") because LlmCallOutcome only carries the canonical form. Consumers that need a deterministic-by-provider raw value should prefer Self::raw_stop_reason (#1077).
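
Because stop_reason may hold either a raw provider value or the canonical form, a truncation check has to accept both. The helper below is a hypothetical sketch (the name `is_truncated` is not from the crate) that matches only the truncation values listed above.

```rust
/// Returns true when the stop reason indicates the model hit its output limit.
///
/// Covers the values named in the field docs: Anthropic "max_tokens" (which is
/// also the canonical form), OpenAI "length", and Gemini "MAX_TOKENS".
/// `None` is treated as "not known to be truncated".
pub fn is_truncated(stop_reason: Option<&str>) -> bool {
    matches!(
        stop_reason,
        Some("max_tokens") | Some("length") | Some("MAX_TOKENS")
    )
}
```

A caller holding a `TokenUsage` would pass `usage.stop_reason.as_deref()`.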

§raw_stop_reason: Option<String>

Raw provider stop reason, never lossy-mapped to OTel canonical form. Set to the same value as Self::stop_reason when the parse_*_usage path produced the usage; None otherwise (mock, streaming rebuilds via usage_from_outcome).

Bench / observability code that needs to distinguish Gemini’s "STOP" from "RECITATION" (both collapse to "stop" under the canonical mapping) or Anthropic’s "stop_sequence" from "end_turn" should read this field (#1077).
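
One way a consumer might make the raw-versus-canonical distinction explicit is sketched below. `StopInfo`, `StopReasonView`, and `stop_reason_view` are hypothetical names; the sketch relies only on the behaviour described above, namely that raw_stop_reason is None whenever the usage was rebuilt from the canonical form.

```rust
/// Hypothetical subset of the two stop-reason fields on `TokenUsage`.
pub struct StopInfo {
    pub stop_reason: Option<String>,
    pub raw_stop_reason: Option<String>,
}

/// What a consumer can actually know about the stop reason.
#[derive(Debug, PartialEq)]
pub enum StopReasonView<'a> {
    /// Provider-native value, safe for per-provider comparisons.
    Raw(&'a str),
    /// Only the possibly lossy canonical form is available.
    Canonical(&'a str),
    /// Neither field is set (for example, the mock provider).
    Unknown,
}

/// Prefers the raw value and falls back to `stop_reason` only when no raw
/// value was recorded.
pub fn stop_reason_view(s: &StopInfo) -> StopReasonView<'_> {
    match (s.raw_stop_reason.as_deref(), s.stop_reason.as_deref()) {
        (Some(raw), _) => StopReasonView::Raw(raw),
        (None, Some(canonical)) => StopReasonView::Canonical(canonical),
        (None, None) => StopReasonView::Unknown,
    }
}
```

Returning an enum rather than a bare string keeps callers from comparing a canonical "stop" against provider-specific values such as "RECITATION" by accident.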

§reasoning_tokens: u64

Reasoning / thinking tokens — a SUBSET of Self::output_tokens, not in addition. Captured from:

  • OpenAI o-series + GPT-5: usage.completion_tokens_details.reasoning_tokens
  • Anthropic extended-thinking: usage.thinking_tokens (when present)
  • Gemini with thinkingBudget set: usageMetadata.thoughtsTokenCount

0 when the model didn’t engage reasoning or the provider didn’t surface the breakdown. #[serde(default)] keeps wire-compat with pre-#322 events that omit the field entirely.
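
Since reasoning tokens are already counted inside output_tokens, the non-reasoning portion of the output is a subtraction, not a sum. A minimal sketch, with `visible_output_tokens` as a hypothetical helper name and saturating subtraction as an added safeguard:

```rust
/// Output tokens that were not spent on reasoning / thinking.
///
/// `reasoning_tokens` is a subset of `output_tokens`, so the two must never
/// be added together when computing totals.
pub fn visible_output_tokens(output_tokens: u64, reasoning_tokens: u64) -> u64 {
    // Assumption: saturate at 0 if a provider ever reports inconsistent counts.
    output_tokens.saturating_sub(reasoning_tokens)
}
```

For a call with 500 output tokens of which 120 were reasoning, 380 tokens were visible output.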

Trait Implementations§

impl Clone for TokenUsage

fn clone(&self) -> TokenUsage

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for TokenUsage

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.

impl Default for TokenUsage

fn default() -> TokenUsage

Returns the “default value” for a type.

impl<'de> Deserialize<'de> for TokenUsage

fn deserialize<__D>(__deserializer: __D) -> Result<TokenUsage, <__D as Deserializer<'de>>::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl Serialize for TokenUsage

fn serialize<__S>(&self, __serializer: __S) -> Result<<__S as Serializer>::Ok, <__S as Serializer>::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.
