Module batch_advisor

Batch-eligibility advisor (Batch/Flex phase 1 — ADVISORY only).

The OpenAI, Anthropic, and Gemini Batch APIs price asynchronous (≤24h) traffic at roughly 50% of the standard rate, which makes batching the single biggest cost lever that involves no quality loss. Building the durable batch-submission queue is deferred (P3). Phase 1 is purely advisory: it detects request-log traffic that is batch-eligible (tagged background, offline, nightly, or bulk, i.e. latency-insensitive) and projects the savings from moving that traffic to the Batch API.

This module is the pure, tool-groundable core. Given request-log aggregates (which the advisor already reasons over) and the embedded pricing catalog, it produces one BatchFinding per eligible tag segment, carrying the eligible spend and the projected Batch-API cost and savings. Savings are computed from the real per-model batch rates in the catalog (pricing.toml carries batch_{input,output}_per_million), not from a hardcoded 50%. A model with no batch tier in the catalog contributes no projected savings, which keeps the projection conservative.
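The per-segment arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not the module's actual code: `RatesSketch` and `project_segment` are hypothetical names, only the `batch_{input,output}_per_million` keys come from the text, and both the "skip when either rate is missing" rule and the clamping of savings at zero are assumptions about what "conservative" means here.

```rust
/// Hypothetical per-model catalog entry, in USD per million tokens.
/// Only the two key names are taken from the description above.
#[derive(Debug, Clone, Copy)]
struct RatesSketch {
    batch_input_per_million: Option<f64>,
    batch_output_per_million: Option<f64>,
}

/// Returns `(projected_batch_cost, projected_savings)` for one segment,
/// or `None` when the model has no batch tier (no savings are projected).
fn project_segment(
    input_tokens: u64,
    output_tokens: u64,
    cost_usd: f64,
    rates: &RatesSketch,
) -> Option<(f64, f64)> {
    // Assumption: both rates must be present for the model to count as
    // having a batch tier.
    let batch_in = rates.batch_input_per_million?;
    let batch_out = rates.batch_output_per_million?;

    // Re-price the observed token volume at the batch rates.
    let batch_cost = (input_tokens as f64 / 1_000_000.0) * batch_in
        + (output_tokens as f64 / 1_000_000.0) * batch_out;

    // Assumption: never report negative savings.
    let savings = (cost_usd - batch_cost).max(0.0);
    Some((batch_cost, savings))
}
```

For example, a segment with 2,000,000 input tokens and 1,000,000 output tokens that cost 15.00 USD at standard rates, re-priced at hypothetical batch rates of 1.25 and 5.00 USD per million, would project a batch cost of 7.50 USD and savings of 7.50 USD.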

Nothing here submits anything to a batch API; it only surfaces the projection.

Structs

BatchFinding: A projected-savings finding for one batch-eligible tag segment.
RequestAggregate: One tag-grouped request-log aggregate, i.e. the condensed view that the advisor / inspect path computes over request_logs (e.g. SELECT provider, model, tag, SUM(input_tokens), SUM(output_tokens), SUM(cost_usd), COUNT(*) ... GROUP BY provider, model, tag). There is one row per (provider, model, tag) segment.
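To make the shape of that aggregate concrete, here is a minimal in-memory equivalent of the GROUP BY. The struct and field names (`LogRowSketch`, `AggregateSketch`, `request_count`) are hypothetical and simply mirror the SQL columns; the real RequestAggregate fields are not shown on this page and may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical raw request-log row, mirroring the SQL columns above.
#[derive(Debug, Clone)]
struct LogRowSketch {
    provider: String,
    model: String,
    tag: String,
    input_tokens: u64,
    output_tokens: u64,
    cost_usd: f64,
}

/// Hypothetical per-segment totals (the SUM and COUNT columns).
#[derive(Debug, Clone, Default)]
struct AggregateSketch {
    input_tokens: u64,
    output_tokens: u64,
    cost_usd: f64,
    request_count: u64,
}

/// Key type for one (provider, model, tag) segment.
type SegmentKey = (String, String, String);

/// Folds raw rows into one aggregate per (provider, model, tag) segment.
fn aggregate_rows(rows: &[LogRowSketch]) -> BTreeMap<SegmentKey, AggregateSketch> {
    let mut out: BTreeMap<SegmentKey, AggregateSketch> = BTreeMap::new();
    for row in rows {
        let key = (row.provider.clone(), row.model.clone(), row.tag.clone());
        let agg = out.entry(key).or_default();
        agg.input_tokens += row.input_tokens;
        agg.output_tokens += row.output_tokens;
        agg.cost_usd += row.cost_usd;
        agg.request_count += 1;
    }
    out
}
```

A `BTreeMap` is used here only so that iteration order is deterministic; a `HashMap` would work equally well for the grouping itself.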

Constants

DEFAULT_BATCH_ELIGIBLE_TAGS: Default set of tags treated as non-interactive (batch-eligible) traffic.

Functions

project_batch_savings: Project Batch-API savings over request-log aggregates using the default non-interactive tag set (DEFAULT_BATCH_ELIGIBLE_TAGS).
project_batch_savings_with_tags: Project Batch-API savings over request-log aggregates, treating any segment whose tag (case-insensitive) is in eligible_tags as batch-eligible.
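The relationship between the two functions (a default-tag wrapper over a tag-parameterised core, with case-insensitive matching) might look roughly like the sketch below. All names here are hypothetical, the real signatures are not shown on this page, and two points are assumptions: that the default set consists of the four tags named in the module description, and that "case-insensitive" means ASCII case folding.

```rust
/// Assumed default tag set, taken from the tags named in the module
/// description; the real DEFAULT_BATCH_ELIGIBLE_TAGS may differ.
const DEFAULT_TAGS_SKETCH: &[&str] = &["background", "offline", "nightly", "bulk"];

/// True when `tag` matches any eligible tag, ignoring ASCII case.
fn is_batch_eligible(tag: &str, eligible_tags: &[&str]) -> bool {
    eligible_tags.iter().any(|t| t.eq_ignore_ascii_case(tag))
}

/// Keeps only the segment tags that are batch-eligible under `eligible_tags`.
fn eligible_with_tags<'a>(segment_tags: &[&'a str], eligible_tags: &[&str]) -> Vec<&'a str> {
    segment_tags
        .iter()
        .copied()
        .filter(|t| is_batch_eligible(t, eligible_tags))
        .collect()
}

/// Same as `eligible_with_tags`, using the assumed default tag set.
fn eligible_default<'a>(segment_tags: &[&'a str]) -> Vec<&'a str> {
    eligible_with_tags(segment_tags, DEFAULT_TAGS_SKETCH)
}
```

Under these assumptions, tags such as `Nightly` or `BULK` would be treated as eligible by the default path, while an interactive tag such as `chat` would be excluded unless the caller passes it in a custom tag set.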