Batch-eligibility advisor (Batch/Flex phase 1 — ADVISORY only).
The OpenAI / Anthropic / Gemini Batch APIs price asynchronous (≤24h turnaround) traffic at roughly 50% of standard rates, the single biggest cost lever that carries no quality loss. Building the durable batch-submission queue is deferred (P3). Phase 1 is purely advisory: detect request-log traffic that is batch-eligible (tagged background / offline / nightly / bulk, i.e. latency-insensitive) and PROJECT the savings of moving it to the Batch API.
This module is the pure, tool-groundable core: given request-log aggregates
(which the advisor already reasons over) and the embedded pricing catalog, it
produces a BatchFinding per eligible tag segment with the eligible spend
and the projected Batch-API cost/savings. The savings are computed from the
real per-model batch rates in the catalog (pricing.toml carries
batch_{input,output}_per_million), NOT a hardcoded 50% — a model with no
catalog batch tier contributes no projected savings (conservative).
Nothing here submits anything to a batch API; it only surfaces the projection.
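Computing savings from per-model catalog rates, rather than a flat 50%, can be sketched as below. This is a minimal illustration under stated assumptions: BatchTier, projected_batch_cost and projected_savings are hypothetical names, the catalog is modelled as an in-memory HashMap keyed by model instead of the parsed pricing.toml, and savings are measured against the segment's observed spend and clamped at zero (an assumption chosen to stay conservative).

```rust
use std::collections::HashMap;

/// Hypothetical catalog entry mirroring pricing.toml's
/// `batch_{input,output}_per_million`; `None` means no batch tier.
#[derive(Debug, Clone, Copy, Default)]
pub struct BatchTier {
    pub batch_input_per_million: Option<f64>,
    pub batch_output_per_million: Option<f64>,
}

/// Projected Batch-API cost for one segment's token volume, or `None` if the
/// model is missing from the catalog or lacks either batch rate.
pub fn projected_batch_cost(
    catalog: &HashMap<&str, BatchTier>,
    model: &str,
    input_tokens: u64,
    output_tokens: u64,
) -> Option<f64> {
    let tier = catalog.get(model)?;
    let rate_in = tier.batch_input_per_million?;
    let rate_out = tier.batch_output_per_million?;
    Some(input_tokens as f64 / 1e6 * rate_in + output_tokens as f64 / 1e6 * rate_out)
}

/// Projected savings versus observed spend. A model with no batch tier
/// contributes 0.0; clamping negative savings to 0.0 is an assumption.
pub fn projected_savings(
    catalog: &HashMap<&str, BatchTier>,
    model: &str,
    input_tokens: u64,
    output_tokens: u64,
    spend_usd: f64,
) -> f64 {
    projected_batch_cost(catalog, model, input_tokens, output_tokens)
        .map_or(0.0, |batch| (spend_usd - batch).max(0.0))
}
```

Returning `Option` keeps "no batch tier" distinct from "zero cost"; only at the last step is it mapped to zero savings, which is the conservative behavior the module description calls for.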
Structs
- BatchFinding - A projected-savings finding for one batch-eligible tag segment.
- RequestAggregate - One tag-grouped request-log aggregate, the condensed view the advisor / inspect path computes over request_logs (e.g. SELECT provider, model, tag, SUM(input_tokens), SUM(output_tokens), SUM(cost_usd), COUNT(*) ... GROUP BY provider, model, tag). One row per (provider, model, tag) segment.
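The GROUP BY that produces these aggregates can be mirrored in memory, which is handy for unit-testing the advisor without a database. This is a minimal sketch, assuming a hypothetical raw-row type RequestLog and field names chosen here for readability; the real struct fields are not listed on this page, and the SQL's elided clauses are not reproduced.

```rust
use std::collections::BTreeMap;

/// Hypothetical raw request-log row (columns named in the SQL above).
#[derive(Debug, Clone)]
pub struct RequestLog {
    pub provider: String,
    pub model: String,
    pub tag: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cost_usd: f64,
}

/// Hypothetical shape of one (provider, model, tag) aggregate.
#[derive(Debug, Clone, PartialEq)]
pub struct RequestAggregate {
    pub provider: String,
    pub model: String,
    pub tag: String,
    pub input_tokens: u64,  // SUM(input_tokens)
    pub output_tokens: u64, // SUM(output_tokens)
    pub cost_usd: f64,      // SUM(cost_usd)
    pub requests: u64,      // COUNT(*)
}

/// In-memory equivalent of `... GROUP BY provider, model, tag`;
/// BTreeMap keeps the output ordered by the grouping key.
pub fn aggregate(rows: &[RequestLog]) -> Vec<RequestAggregate> {
    let mut groups: BTreeMap<(String, String, String), RequestAggregate> = BTreeMap::new();
    for r in rows {
        let key = (r.provider.clone(), r.model.clone(), r.tag.clone());
        let agg = groups.entry(key).or_insert_with(|| RequestAggregate {
            provider: r.provider.clone(),
            model: r.model.clone(),
            tag: r.tag.clone(),
            input_tokens: 0,
            output_tokens: 0,
            cost_usd: 0.0,
            requests: 0,
        });
        agg.input_tokens += r.input_tokens;
        agg.output_tokens += r.output_tokens;
        agg.cost_usd += r.cost_usd;
        agg.requests += 1;
    }
    groups.into_values().collect()
}
```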
Constants
- DEFAULT_BATCH_ELIGIBLE_TAGS - Default set of tags treated as non-interactive (batch-eligible) traffic.
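A plausible shape for the default tag set and its membership test is sketched below. The contents are an assumption taken from the four tags the module description names (background, offline, nightly, bulk); the real constant may differ. The helper is_batch_eligible is hypothetical and uses ASCII-only case folding, which may be narrower than the module's actual case-insensitive match.

```rust
/// Assumed contents: the latency-insensitive tags named in the module
/// description. The real constant may list more or different tags.
pub const DEFAULT_BATCH_ELIGIBLE_TAGS: &[&str] = &["background", "offline", "nightly", "bulk"];

/// Hypothetical helper: true if `tag` matches any eligible tag,
/// ignoring ASCII case ("Nightly" matches "nightly").
pub fn is_batch_eligible(tag: &str, eligible_tags: &[&str]) -> bool {
    eligible_tags.iter().any(|t| t.eq_ignore_ascii_case(tag))
}
```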
Functions
- project_batch_savings - Project Batch-API savings over request-log aggregates using the default non-interactive tag set (DEFAULT_BATCH_ELIGIBLE_TAGS).
- project_batch_savings_with_tags - Project Batch-API savings over request-log aggregates, treating any segment whose tag (case-insensitive) is in eligible_tags as batch-eligible.
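How the two entry points might fit together is sketched below. This is a hedged illustration, not the module's API: the signatures, the RequestAggregate and BatchFinding fields, the batch_rates closure standing in for the embedded pricing catalog, and the contents of DEFAULT_BATCH_ELIGIBLE_TAGS are all assumptions. It emits one finding per eligible aggregate row, matches tags with ASCII case-insensitivity, and gives a model with no batch tier zero projected savings, following the conservative rule in the module description.

```rust
/// Hypothetical aggregate row (one per provider, model, tag).
#[derive(Debug, Clone)]
pub struct RequestAggregate {
    pub provider: String,
    pub model: String,
    pub tag: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cost_usd: f64,
}

/// Hypothetical finding: eligible spend plus projected batch cost/savings.
#[derive(Debug, Clone, PartialEq)]
pub struct BatchFinding {
    pub provider: String,
    pub model: String,
    pub tag: String,
    pub eligible_spend_usd: f64,
    pub projected_batch_cost_usd: f64,
    pub projected_savings_usd: f64,
}

/// Assumed contents, taken from the tags named in the module description.
pub const DEFAULT_BATCH_ELIGIBLE_TAGS: &[&str] = &["background", "offline", "nightly", "bulk"];

/// `batch_rates(provider, model)` stands in for the catalog lookup and returns
/// `(batch_input_per_million, batch_output_per_million)` if a batch tier exists.
pub fn project_batch_savings_with_tags<F>(
    aggregates: &[RequestAggregate],
    eligible_tags: &[&str],
    batch_rates: F,
) -> Vec<BatchFinding>
where
    F: Fn(&str, &str) -> Option<(f64, f64)>,
{
    let mut findings = Vec::new();
    for agg in aggregates {
        // Skip interactive (non-eligible) segments.
        if !eligible_tags.iter().any(|t| t.eq_ignore_ascii_case(&agg.tag)) {
            continue;
        }
        // No batch tier: project the current spend, i.e. zero savings.
        let batch_cost = match batch_rates(&agg.provider, &agg.model) {
            Some((rate_in, rate_out)) => {
                agg.input_tokens as f64 / 1e6 * rate_in + agg.output_tokens as f64 / 1e6 * rate_out
            }
            None => agg.cost_usd,
        };
        findings.push(BatchFinding {
            provider: agg.provider.clone(),
            model: agg.model.clone(),
            tag: agg.tag.clone(),
            eligible_spend_usd: agg.cost_usd,
            projected_batch_cost_usd: batch_cost,
            // Clamping at zero is an assumption (stays conservative).
            projected_savings_usd: (agg.cost_usd - batch_cost).max(0.0),
        });
    }
    findings
}

/// Default-tag entry point as a thin wrapper over the configurable one.
pub fn project_batch_savings<F>(aggregates: &[RequestAggregate], batch_rates: F) -> Vec<BatchFinding>
where
    F: Fn(&str, &str) -> Option<(f64, f64)>,
{
    project_batch_savings_with_tags(aggregates, DEFAULT_BATCH_ELIGIBLE_TAGS, batch_rates)
}
```

Writing the default-tag function as a wrapper (as assumed here) means a caller with its own tagging scheme goes through exactly the same projection path.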