//! Request transformation and per-model wire-format helpers for the
//! Anthropic Messages API. Strips OpenAI-only fields off the
//! [`CreateMessageRequest`] before serialisation, picks the right
//! `anthropic-beta` token for the target model, and exposes the
//! public reasoning-effort helpers that downstream code uses to
//! configure thinking budgets.
use crate::*;
/// Name of the request header that carries Anthropic beta opt-ins.
/// The beta set is kept compact: `prompt-caching-2024-07-31` is
/// implicit for the Sonnet 4.x / Opus 4.x line, so we only have to
/// ship token-efficient tools and the compaction beta where it
/// applies.
pub const BETA_HEADER_NAME: &str = "anthropic-beta";
/// Token-efficient tools beta — opt-in to the smaller tool-result
/// schema. Cuts ~30% off tool-result tokens on Sonnet 3.7 / 4 /
/// 4.5 by streaming results in the compact wire shape. Older models
/// (Sonnet 3.5 and Opus 4.x) just ignore the header.
pub const BETA_TOKEN_EFFICIENT: &str = "token-efficient-tools-2025-02-19";
/// Server-side compaction beta — Anthropic prunes earlier turns when
/// the request input grows past the model's per-request threshold,
/// returning a `compaction` content block in the next assistant
/// response that we round-trip on subsequent calls.
///
/// Only ships when the target model supports it (Opus 4.x, Sonnet
/// 4.6 and newer) — older models 400 on the unknown beta token.
/// Referenced by name only inside the cross-check test
/// `beta_with_compact_matches_components`; the production header
/// value ships as the literal in [`BETA_TOKEN_EFFICIENT_AND_COMPACT`].
pub const BETA_COMPACT: &str = "compact-2026-01-12";
/// Compound beta token list used when the target model supports both
/// token-efficient tools and server-side compaction. Anthropic
/// accepts a comma-separated list in the single header.
pub const BETA_TOKEN_EFFICIENT_AND_COMPACT: &str =
    "token-efficient-tools-2025-02-19,compact-2026-01-12";
/// Pick the `anthropic-beta` value for `model`. Compaction is gated
/// off the same `ModelInfo::supports_server_compaction` flag the
/// request builder uses to attach the `context_management` field, so
/// the beta header and the body field can never disagree about which
/// models speak server-side compaction.
pub fn beta_header_value(model: &ModelInfo) -> &'static str {
    // Function name reconstructed from the doc comment above; the
    // gate is the same flag the request builder uses to attach the
    // `context_management` body field.
    if model.supports_server_compaction {
        BETA_TOKEN_EFFICIENT_AND_COMPACT
    } else {
        BETA_TOKEN_EFFICIENT
    }
}
/// Legacy `thinking.budget_tokens` values used for Sonnet 3.7 and
/// older models that don't accept the adaptive `output_config`
/// reasoning request shape. The four-tier mapping mirrors the
/// reasoning-effort enum: `Off` → no budget at all (no thinking
/// block), `Low` / `Medium` / `High` → these three constants.
pub const LEGACY_THINKING_BUDGET_LOW: u32 = 1024;
pub const LEGACY_THINKING_BUDGET_MEDIUM: u32 = 5120;
pub const LEGACY_THINKING_BUDGET_HIGH: u32 = 16384;
/// Default trigger floor for server-side compaction. Below this the
/// model probably hasn't earned compaction yet, and triggering early
/// would waste a compaction round-trip on a still-small history.
pub const COMPACTION_TRIGGER_FLOOR: u32 = 50_000;
/// Map a reasoning effort tier to its legacy `thinking.budget_tokens`
/// value. Returns 0 for `Off` so callers can branch on it to omit
/// the thinking block entirely. Used by request_builder when the
/// target model doesn't speak adaptive thinking.
pub fn legacy_thinking_budget(effort: ReasoningEffort) -> u32 {
    // Function name and enum variants reconstructed from the doc
    // comment above; adjust to the crate's actual reasoning-effort enum.
    match effort {
        ReasoningEffort::Off => 0,
        ReasoningEffort::Low => LEGACY_THINKING_BUDGET_LOW,
        ReasoningEffort::Medium => LEGACY_THINKING_BUDGET_MEDIUM,
        ReasoningEffort::High => LEGACY_THINKING_BUDGET_HIGH,
    }
}
/// True for models that speak the adaptive `output_config.thinking`
/// reasoning request shape instead of the legacy
/// `thinking.budget_tokens` field. Currently only Opus 4.7 — every
/// other model wants the legacy budget. The adaptive shape lets the
/// server pick the budget; sending both fields would 400, and sending
/// neither omits thinking entirely, so request_builder uses this
/// predicate to pick exactly one path.
///
/// Important: adaptive requests carry an `effort_label` field on
/// `output_config.thinking` that the server cross-checks against the
/// request's signed thinking blocks; echoing stored thinking blocks
/// back while dropping `output_config` would 400 the next turn.
///
/// Thin wrapper over the model-info table, so a new model's
/// adaptive-thinking opt-in is one entry in
/// [`crate::api::model_info::lookup`] rather than a code change here.
pub fn supports_adaptive_thinking(model: &ModelInfo) -> bool {
    // Name and signature reconstructed from the doc comment above;
    // the opt-in flag lives in the model-info table.
    model.supports_adaptive_thinking
}
/// Strip OpenAI-only fields and tools off `request` before it goes
/// out on the wire to Anthropic, and run [`sanitize_messages_for_anthropic`]
/// over the message history so OpenAI Reasoning / Summary blocks
/// don't 400 the request. Used by both the streaming and
/// non-streaming call paths.
pub fn sanitize_request_for_anthropic(request: &mut CreateMessageRequest) {
    // Name and signature reconstructed from the doc comment above.
    // The exact OpenAI-only field list lives on `CreateMessageRequest`;
    // clear those fields here before serialisation, then scrub the
    // message history.
    sanitize_messages_for_anthropic(&mut request.messages);
}
/// Drop OpenAI-only content blocks (`Summary`, `Reasoning`) from
/// every message before sending to Anthropic. A session that
/// switched providers mid-stream still carries the OpenAI blocks in
/// memory; without this strip, the next Anthropic call 400s on the
/// unknown content-block types.
pub fn sanitize_messages_for_anthropic(messages: &mut [Message]) {
    // Signature and block-variant names reconstructed from the doc
    // comment above; adjust to the crate's actual content-block enum.
    for message in messages.iter_mut() {
        message.content.retain(|block| {
            !matches!(block, ContentBlock::Summary(_) | ContentBlock::Reasoning(_))
        });
    }
}
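
// A minimal sketch of the cross-check test named in the `BETA_COMPACT`
// docs: the compound constant must equal the two component tokens
// joined by a comma, so the literals can never silently drift apart.
// (Reconstructed from the doc comments; the real test may live elsewhere
// in the crate.)
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn beta_with_compact_matches_components() {
        assert_eq!(
            BETA_TOKEN_EFFICIENT_AND_COMPACT,
            format!("{BETA_TOKEN_EFFICIENT},{BETA_COMPACT}")
        );
    }
}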