pub struct ChatModel<C: Codec + 'static, T: Transport + 'static> { /* private fields */ }Expand description
Configurable chat model — codec + transport + layer stack.
Cheap to clone (handles are Arc-backed). Builder-style
configuration via with_* methods; layer attachment via
Self::layer.
Implementations§
Source§impl<C: Codec + 'static, T: Transport + 'static> ChatModel<C, T>
impl<C: Codec + 'static, T: Transport + 'static> ChatModel<C, T>
Sourcepub fn new(codec: C, transport: T, model: impl Into<String>) -> Self
pub fn new(codec: C, transport: T, model: impl Into<String>) -> Self
Build a model bundling owned codec + transport + model name.
Sourcepub fn from_arc(
codec: Arc<C>,
transport: Arc<T>,
model: impl Into<String>,
) -> Self
pub fn from_arc( codec: Arc<C>, transport: Arc<T>, model: impl Into<String>, ) -> Self
Build from already-shared Arcs — useful when many models
share one transport / codec instance.
Sourcepub const fn with_max_tokens(self, n: u32) -> Self
pub const fn with_max_tokens(self, n: u32) -> Self
Set per-call max_tokens.
Sourcepub fn with_system(self, s: impl Into<String>) -> Self
pub fn with_system(self, s: impl Into<String>) -> Self
Attach a system / instruction prompt. Convenience
shorthand for a single-block uncached prompt — for
multi-block or cached prompts, set config.system to a
pre-built SystemPrompt directly.
Sourcepub fn with_system_prompt(self, prompt: SystemPrompt) -> Self
pub fn with_system_prompt(self, prompt: SystemPrompt) -> Self
Attach a pre-built SystemPrompt — supports multi-block
and per-block cache control.
Sourcepub const fn with_temperature(self, t: f32) -> Self
pub const fn with_temperature(self, t: f32) -> Self
Set sampling temperature.
Sourcepub const fn with_validation_retries(self, n: u32) -> Self
pub const fn with_validation_retries(self, n: u32) -> Self
Set the complete_typed validation
retry budget — number of times the loop reflects a
schema-mismatch or crate::OutputValidator failure back to
the model with corrective hint text before surfacing the
terminal Error::ModelRetry. Default 0 (no retry). Each
retry increments the conversation length by two messages
(assistant’s failed reply + retry prompt). Both retry shapes
share one budget and route through Error::ModelRetry
(invariant 20).
Sourcepub fn with_token_counter(self, counter: Arc<dyn TokenCounter>) -> Self
pub fn with_token_counter(self, counter: Arc<dyn TokenCounter>) -> Self
Wire an operator-supplied TokenCounter
for pre-flight budget checks and content-economy estimation.
Each call replaces the configured counter — the last call wins.
Vendor-accurate counters (tiktoken, HuggingFace, ko-mecab)
ship as companion crates so the core stays
zero-dependency. crate::ByteCountTokenCounter is the
English-biased zero-dependency default for development
scaffolding.
Sourcepub const fn with_top_p(self, p: f32) -> Self
pub const fn with_top_p(self, p: f32) -> Self
Set nucleus sampling parameter.
Sourcepub const fn with_top_k(self, k: u32) -> Self
pub const fn with_top_k(self, k: u32) -> Self
Set top-k sampling parameter. Native on Anthropic, Gemini,
and Bedrock-Anthropic; OpenAI codecs surface as
LossyEncode.
Sourcepub fn with_stop_sequence(self, s: impl Into<String>) -> Self
pub fn with_stop_sequence(self, s: impl Into<String>) -> Self
Append one stop sequence.
Sourcepub fn with_stop_sequences(self, seqs: Vec<String>) -> Self
pub fn with_stop_sequences(self, seqs: Vec<String>) -> Self
Replace the full stop sequences list.
Sourcepub fn with_tools(self, tools: impl Into<Arc<[ToolSpec]>>) -> Self
pub fn with_tools(self, tools: impl Into<Arc<[ToolSpec]>>) -> Self
Replace the advertised tools list. Accepts anything that
converts into Arc<[ToolSpec]> — Vec<ToolSpec> and
[ToolSpec; N] literals both qualify, so caller ergonomics
match the previous Vec shape while per-dispatch
build_request clones become an atomic refcount bump rather
than a deep walk of every tool’s JSON schema.
Sourcepub fn with_tool_choice(self, c: ToolChoice) -> Self
pub fn with_tool_choice(self, c: ToolChoice) -> Self
Set the tool-choice mode.
Sourcepub fn with_reasoning_effort(self, effort: ReasoningEffort) -> Self
pub fn with_reasoning_effort(self, effort: ReasoningEffort) -> Self
Set the cross-vendor reasoning-effort knob. Codecs translate
onto their native wire shape per the table on
crate::ir::ReasoningEffort — Off / Minimal / Low
/ Medium / High / Auto snap to vendor buckets, lossy
approximations emit ModelWarning::LossyEncode, and
VendorSpecific(s) passes the literal vendor wire value
through.
Sourcepub fn layer<L>(self, layer: L) -> Selfwhere
L: Layer<BoxedModelService> + Layer<BoxedStreamingService> + NamedLayer + Clone + Send + Sync + 'static,
<L as Layer<BoxedModelService>>::Service: Service<ModelInvocation, Response = ModelResponse, Error = Error> + Clone + Send + 'static,
<<L as Layer<BoxedModelService>>::Service as Service<ModelInvocation>>::Future: Send + 'static,
<L as Layer<BoxedStreamingService>>::Service: Service<StreamingModelInvocation, Response = ModelStream, Error = Error> + Clone + Send + 'static,
<<L as Layer<BoxedStreamingService>>::Service as Service<StreamingModelInvocation>>::Future: Send + 'static,
pub fn layer<L>(self, layer: L) -> Selfwhere
L: Layer<BoxedModelService> + Layer<BoxedStreamingService> + NamedLayer + Clone + Send + Sync + 'static,
<L as Layer<BoxedModelService>>::Service: Service<ModelInvocation, Response = ModelResponse, Error = Error> + Clone + Send + 'static,
<<L as Layer<BoxedModelService>>::Service as Service<ModelInvocation>>::Future: Send + 'static,
<L as Layer<BoxedStreamingService>>::Service: Service<StreamingModelInvocation, Response = ModelStream, Error = Error> + Clone + Send + 'static,
<<L as Layer<BoxedStreamingService>>::Service as Service<StreamingModelInvocation>>::Future: Send + 'static,
Append a tower::Layer to both dispatch spines — the
one-shot path (Service<ModelInvocation, Response = ModelResponse>) and the streaming path
(Service<StreamingModelInvocation, Response = ModelStream>). Each .layer(L) wraps L around the
already-composed stack, so the last-registered layer is
outermost (sees the request first, the response last) and
the first-registered layer sits innermost against the leaf
InnerChatModel.
PolicyLayer and OtelLayer from the policy / otel
crates satisfy both spines, so a single .layer(...) call
wraps the agent’s complete dispatch fan-out:
- one-shot:
Service<ModelInvocation, Response = ModelResponse> - streaming:
Service<StreamingModelInvocation, Response = ModelStream> - tool dispatch (separate, on
ToolRegistry::layer):Service<ToolInvocation, Response = Value>
The streaming-side layer wraps the ModelStream’s
completion future so observability events (cost,
latency, span close) fire only on the Ok branch — a
stream that errors mid-flight surfaces the error and
emits no charge (invariant 12).
Layers that legitimately apply only to the one-shot spine
(e.g. RetryLayer — retrying a partially-streamed
response is meaningless) implement a pass-through
Layer<BoxedStreamingService> so they satisfy the
constraint without affecting streaming dispatch.
§Introspection
Every layer is recorded in Self::layer_names at compose
time via NamedLayer::layer_name. First-party entelix
layers (PolicyLayer, OtelLayer) implement
NamedLayer directly; external tower::Layer middleware
wraps through crate::service::WithName to participate:
use entelix_core::WithName;
chat_model
.layer(PolicyLayer::new(registry))
.layer(OtelLayer::new("anthropic"))
.layer(WithName::new("concurrency", tower::limit::ConcurrencyLimitLayer::new(10)));
// chat_model.layer_names() == ["policy", "otel", "concurrency"]Sourcepub fn layer_named<L>(self, name: &'static str, layer: L) -> Selfwhere
L: Layer<BoxedModelService> + Layer<BoxedStreamingService> + Clone + Send + Sync + 'static,
<L as Layer<BoxedModelService>>::Service: Service<ModelInvocation, Response = ModelResponse, Error = Error> + Clone + Send + 'static,
<<L as Layer<BoxedModelService>>::Service as Service<ModelInvocation>>::Future: Send + 'static,
<L as Layer<BoxedStreamingService>>::Service: Service<StreamingModelInvocation, Response = ModelStream, Error = Error> + Clone + Send + 'static,
<<L as Layer<BoxedStreamingService>>::Service as Service<StreamingModelInvocation>>::Future: Send + 'static,
pub fn layer_named<L>(self, name: &'static str, layer: L) -> Selfwhere
L: Layer<BoxedModelService> + Layer<BoxedStreamingService> + Clone + Send + Sync + 'static,
<L as Layer<BoxedModelService>>::Service: Service<ModelInvocation, Response = ModelResponse, Error = Error> + Clone + Send + 'static,
<<L as Layer<BoxedModelService>>::Service as Service<ModelInvocation>>::Future: Send + 'static,
<L as Layer<BoxedStreamingService>>::Service: Service<StreamingModelInvocation, Response = ModelStream, Error = Error> + Clone + Send + 'static,
<<L as Layer<BoxedStreamingService>>::Service as Service<StreamingModelInvocation>>::Future: Send + 'static,
Compose a tower::Layer whose only difference from a
first-party entelix layer is the missing NamedLayer
impl. Equivalent to self.layer(WithName::new(name, layer))
— the wrapper supplies the identity surfaced through
Self::layer_names.
Use this for external middleware (tower::limit,
tower::timeout, operator-defined wrappers) without
implementing NamedLayer yourself. First-party entelix
layers already implement NamedLayer and should reach for
Self::layer directly so the canonical role name
(e.g. "policy", "otel", "retry") ships at the call
site.
Sourcepub fn layer_names(&self) -> &[&'static str]
pub fn layer_names(&self) -> &[&'static str]
Diagnostic snapshot of the composed layer stack, in
registration order: [0] is the first .layer(...) call
(innermost, against the leaf InnerChatModel); the last
element is the most recent registration (outermost, sees
requests first). Empty for a ChatModel with no layers
composed.
Surfaced for boot-time wiring assertions, debug dashboards,
and conditional-layer audits. The values are
NamedLayer::layer_name outputs — &'static str and
patch-version-stable per the trait’s contract.
Sourcepub fn codec(&self) -> &C
pub fn codec(&self) -> &C
Borrow the configured codec — exposes its name() and
capabilities() for diagnostics.
Sourcepub const fn config(&self) -> &ChatModelConfig
pub const fn config(&self) -> &ChatModelConfig
Borrow the configured request shape — model(),
max_tokens(), system(), temperature(), top_p(),
stop_sequences(), tools(), tool_choice() accessors all
live on ChatModelConfig.
Sourcepub fn service(&self) -> BoxedModelService
pub fn service(&self) -> BoxedModelService
Build the layered BoxedModelService — used by callers
who want to drive the service directly (e.g. wrap with
further tower::ServiceBuilder middleware externally).
Sourcepub fn streaming_service(&self) -> BoxedStreamingService
pub fn streaming_service(&self) -> BoxedStreamingService
Build the layered BoxedStreamingService — the streaming
counterpart to Self::service. Layers attached via
Self::layer wrap this spine the same way they wrap the
one-shot service; consumers driving the service directly
(rather than through Self::stream_deltas) drive
Service<StreamingModelInvocation, Response = ModelStream>.
Sourcepub async fn complete(
&self,
messages: Vec<Message>,
ctx: &ExecutionContext,
) -> Result<Message>
pub async fn complete( &self, messages: Vec<Message>, ctx: &ExecutionContext, ) -> Result<Message>
Send a conversation and return the assistant reply as a single
Message. The full pipeline routes through the layer
stack: each layer’s pre-call work runs before encode, each
layer’s post-call work runs after decode.
Sourcepub async fn complete_full(
&self,
messages: Vec<Message>,
ctx: &ExecutionContext,
) -> Result<ModelResponse>
pub async fn complete_full( &self, messages: Vec<Message>, ctx: &ExecutionContext, ) -> Result<ModelResponse>
Same pipeline as Self::complete, but returns the full
ModelResponse — usage, stop reason, codec warnings, and
the provider rate-limit snapshot when the codec could parse
one from the response headers.
Sourcepub async fn complete_typed<O>(
&self,
messages: Vec<Message>,
ctx: &ExecutionContext,
) -> Result<O>
pub async fn complete_typed<O>( &self, messages: Vec<Message>, ctx: &ExecutionContext, ) -> Result<O>
Send a conversation and return a typed O parsed from the
model’s structured-output channel.
The codec emits a response_format directive carrying the
schemars-derived schema for O; the dispatch shape (native
JSON-Schema vs forced tool call) is the codec’s
Codec::auto_output_strategy for the configured model
when crate::ir::OutputStrategy::Auto is selected — see
crate::ir::OutputStrategy and for the cross-vendor mapping.
Operators that need to override the strategy build their
own ResponseFormat via
ResponseFormat::with_strategy and attach it to the
request through a custom flow.
On Native dispatch, the codec produces a single text
ContentPart whose body parses as O. On Tool dispatch,
the codec emits one forced ContentPart::ToolUse whose
input is the JSON object the model produced; this method
extracts the input and parses it as O.
O: JsonSchema + DeserializeOwned + Send + 'static —
schemars derives the JSON Schema at call time (zero-cost
after the first call thanks to schemars’ static schema
caching). Production operators that cache the schema across
many calls build the JsonSchemaSpec once and attach it
to the request directly (no O type parameter on the
caller side).
Sourcepub async fn complete_typed_validated<O, V>(
&self,
messages: Vec<Message>,
validator: V,
ctx: &ExecutionContext,
) -> Result<O>
pub async fn complete_typed_validated<O, V>( &self, messages: Vec<Message>, validator: V, ctx: &ExecutionContext, ) -> Result<O>
Send a conversation, parse the structured-output response as
O, and run validator against the parsed value.
Both failure modes — schema-mismatch (the model emitted JSON
the deserialiser couldn’t bind to O) and validator failure
(the deserialised value broke a semantic invariant) — route
through one channel: crate::Error::ModelRetry. The retry
loop catches the variant, reflects the hint to the model as
a corrective user message, and re-invokes within the same
ChatModelConfig::validation_retries
budget (invariant 20).
Self::complete_typed is the no-validator shortcut — it
calls into this method with an always-Ok validator so the
schema-mismatch retry path stays uniform.
The validator surface is sync (crate::OutputValidator::validate)
so simple closures (|out: &O| -> Result<()>) compose
without async-trait ceremony. Validators that need to
.await (DB lookup, external check) compose around the
complete_typed_validated call boundary instead — run the
async work after the typed response returns.
Sourcepub async fn stream_deltas(
&self,
messages: Vec<Message>,
ctx: &ExecutionContext,
) -> Result<ModelStream>
pub async fn stream_deltas( &self, messages: Vec<Message>, ctx: &ExecutionContext, ) -> Result<ModelStream>
Open a streaming model call and return an IR StreamDelta
stream.
Pipeline: codec.encode_streaming → transport.send_streaming
→ codec.decode_stream → tap_aggregator, all driven
through the same tower::Service spine as
Self::complete_full. Layers attached via
Self::layer (e.g. OtelLayer, PolicyLayer) wrap
the streaming dispatch the same way they wrap the one-shot
dispatch; observability events (cost, span close) fire on
the streaming-side completion future’s Ok branch only —
invariant 12.
The returned ModelStream carries both the delta stream
(consumer-visible) and a completion future that resolves
to the aggregated ModelResponse after the consumer
drains the stream. Layers wrap completion to gate
observability emission on the Ok branch — a stream that
errors mid-flight surfaces the error on both the consumer
side and the completion future, and no charge fires.
Codecs without true streaming support fall back to a single-shot pseudo-stream the same way they did before the spine refactor.
Sourcepub async fn stream_typed<O>(
&self,
messages: Vec<Message>,
ctx: &ExecutionContext,
) -> Result<TypedModelStream<O>>
pub async fn stream_typed<O>( &self, messages: Vec<Message>, ctx: &ExecutionContext, ) -> Result<TypedModelStream<O>>
Streaming sibling of Self::complete_typed. Returns a
TypedModelStream<O> whose stream field exposes raw
StreamDeltas (text fragments operators echo to the user
during generation) and whose completion future resolves to
the aggregated, parsed O.
response_format = ResponseFormat::strict(JsonSchemaSpec::for::<O>())
is set on the request so the model emits the typed JSON
payload natively (Native strategy via text deltas) or
through a tool_use block (Tool strategy). The aggregator
behind Self::stream_deltas already collects both shapes
into the final ModelResponse; stream_typed parses the
completion the same way Self::complete_typed does.
O: JsonSchema + DeserializeOwned + Send + 'static —
schemars derives the JSON Schema at call time.
§Streaming + retry tradeoff
stream_typed does not retry on parse failure: by the
time completion resolves the deltas have already been
surfaced to the consumer, so re-invoking with a corrective
hint would emit a divergent second stream. Operators wanting
the ChatModelConfig::validation_retries loop call
Self::complete_typed / Self::complete_typed_validated
instead — a parse failure there is fully recoverable because
no partial output was surfaced.
A validator that needs to inspect the parsed O runs after
completion resolves: let value = stream.completion.await?; validator.validate(&value)?;. The validator’s Err does not
flow back into the stream — it surfaces alongside the typed
completion at the call site.
Source§impl ChatModel<AnthropicMessagesCodec, DirectTransport>
impl ChatModel<AnthropicMessagesCodec, DirectTransport>
Sourcepub fn anthropic(
api_key: impl Into<SecretString>,
model: impl Into<String>,
) -> Result<Self>
pub fn anthropic( api_key: impl Into<SecretString>, model: impl Into<String>, ) -> Result<Self>
One-call construction against https://api.anthropic.com.
Bundles AnthropicMessagesCodec + DirectTransport +
ApiKeyProvider. Mirrors LangChain’s ChatAnthropic(api_key=…)
surface so the 5-line agent path is achievable without
hand-wiring four components.
Returns Error::Config if the underlying HTTP client
cannot be initialised.
Source§impl ChatModel<OpenAiChatCodec, DirectTransport>
impl ChatModel<OpenAiChatCodec, DirectTransport>
Sourcepub fn openai(
api_key: impl Into<SecretString>,
model: impl Into<String>,
) -> Result<Self>
pub fn openai( api_key: impl Into<SecretString>, model: impl Into<String>, ) -> Result<Self>
One-call construction against https://api.openai.com.
Bundles OpenAiChatCodec + DirectTransport +
BearerProvider.
Source§impl ChatModel<GeminiCodec, DirectTransport>
impl ChatModel<GeminiCodec, DirectTransport>
Sourcepub fn gemini(
api_key: impl Into<SecretString>,
model: impl Into<String>,
) -> Result<Self>
pub fn gemini( api_key: impl Into<SecretString>, model: impl Into<String>, ) -> Result<Self>
One-call construction against
https://generativelanguage.googleapis.com. Bundles
GeminiCodec + DirectTransport + BearerProvider.