//! `OpenAI` Chat Completions wire codec.
//!
//! Builds an `OpenAI` Chat upstream request from a
//! [`crate::wire::canonical::CanonicalRequest`], parses the buffered reply into
//! a [`crate::wire::canonical::CanonicalResponse`], and maps SSE bytes to a
//! stream of [`crate::wire::canonical::CanonicalEvent`]s. Also serves
//! OpenAI-compatible providers exposing the same surface. Auth-header and
//! transport concerns stay with the gateway adapter; this module is pure wire
//! translation.
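//!
//! The byte-to-event step can be pictured with a minimal sketch of SSE
//! framing only. It assumes `\n` line endings. `Frame` and `SseSplitter` are
//! hypothetical names, not this module's types. A real codec would go on to
//! decode each JSON payload into a `CanonicalEvent`. Bytes are buffered until
//! a blank line closes an event, so a payload split across network chunks is
//! emitted only once it is complete.
//!
//! ```rust
//! /// Hypothetical stand-in for a decoded event (not `CanonicalEvent`).
//! #[derive(Debug, PartialEq)]
//! enum Frame {
//!     /// Joined `data:` payload of one SSE event.
//!     Data(String),
//!     /// The `data: [DONE]` sentinel that ends an `OpenAI` stream.
//!     Done,
//! }
//!
//! /// Hypothetical incremental splitter; assumes `\n`-terminated lines.
//! #[derive(Default)]
//! struct SseSplitter {
//!     buf: Vec<u8>,
//! }
//!
//! impl SseSplitter {
//!     /// Appends a network chunk and returns every event it completed.
//!     fn push(&mut self, chunk: &[u8]) -> Vec<Frame> {
//!         self.buf.extend_from_slice(chunk);
//!         let mut out = Vec::new();
//!         // A blank line ("\n\n") terminates one SSE event.
//!         while let Some(pos) = self.buf.windows(2).position(|w| w == b"\n\n") {
//!             let event: Vec<u8> = self.buf.drain(..pos + 2).collect();
//!             let text = String::from_utf8_lossy(&event);
//!             // Keep `data:` lines, dropping one optional leading space.
//!             let data: Vec<&str> = text
//!                 .lines()
//!                 .filter_map(|line| line.strip_prefix("data:"))
//!                 .map(|d| d.strip_prefix(' ').unwrap_or(d))
//!                 .collect();
//!             if data.is_empty() {
//!                 continue; // comment or keep-alive event, e.g. ": ping"
//!             }
//!             let payload = data.join("\n");
//!             out.push(if payload == "[DONE]" {
//!                 Frame::Done
//!             } else {
//!                 Frame::Data(payload)
//!             });
//!         }
//!         out
//!     }
//! }
//! ```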
//!
//! Reasoning models (`gpt-5*`, `o1*`, `o3*`, `o4*`) bill internal reasoning
//! tokens against the same completion budget as visible output. On the inbound
//! Anthropic surface a caller's `max_tokens` bounds only visible output, so
//! forwarding it unchanged lets reasoning consume the entire budget and can
//! trigger an upstream output-limit rejection. For these families, which
//! `is_reasoning_model` identifies, `output_token_ceiling` therefore uses the
//! full model-card cap as the budget. For every other model it clamps the
//! caller's `max_tokens` *down* to the cap when one is known (never raising
//! it). This keeps the upstream request within the model's real output limit
//! and gives operators a per-request TPM lever via the model card's
//! `limits.max_output_tokens`. Both `OpenAI` codecs (Chat Completions and
//! Responses) share these helpers.
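//!
//! A minimal sketch of that budget rule follows. The prefix list mirrors the
//! families named above. `looks_like_reasoning_model` and `ceiling_for` are
//! hypothetical stand-ins whose signatures are assumed. `cap` plays the role
//! of the model card's `limits.max_output_tokens`. Falling back to the
//! caller's value when no cap is known is also an assumption.
//!
//! ```rust
//! /// Hypothetical: matches the `gpt-5*`, `o1*`, `o3*`, `o4*` families.
//! fn looks_like_reasoning_model(model: &str) -> bool {
//!     ["gpt-5", "o1", "o3", "o4"]
//!         .iter()
//!         .any(|p| model.starts_with(*p))
//! }
//!
//! /// Hypothetical budget rule; `cap` stands in for `limits.max_output_tokens`.
//! fn ceiling_for(model: &str, requested: u32, cap: Option<u32>) -> u32 {
//!     match cap {
//!         // Reasoning families get the full cap so reasoning is less likely
//!         // to exhaust the budget before visible output is produced.
//!         Some(cap) if looks_like_reasoning_model(model) => cap,
//!         // Everything else is clamped down, never raised.
//!         Some(cap) => requested.min(cap),
//!         // Assumed: with no known cap, forward the caller's value unchanged.
//!         None => requested,
//!     }
//! }
//! ```
//!
//! With an illustrative cap of 8,192, a `gpt-5-mini` request asking for 1,024
//! tokens is sent with 8,192, while a `gpt-4o` request asking for 32,000 is
//! clamped to 8,192 and one asking for 1,024 is left at 1,024.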
pub use build_request_body;
pub use parse_response;
pub use sse_to_canonical_events;
use crateModelLimits;
use crateCanonicalRequest;
pub
pub