Struct ChatModel

Source

pub struct ChatModel<C: Codec + 'static, T: Transport + 'static> { /* private fields */ }

Expand description

Configurable chat model — codec + transport + layer stack.

Cheap to clone (handles are Arc-backed). Builder-style configuration via with_* methods; layer attachment via Self::layer.

Implementations§

Source §

impl<C: Codec + 'static, T: Transport + 'static> ChatModel<C, T>

Source

pub fn new(codec: C, transport: T, model: impl Into<String>) -> Self

Build a model bundling owned codec + transport + model name.

Source

pub fn from_arc( codec: Arc<C>, transport: Arc<T>, model: impl Into<String>, ) -> Self

Build from already-shared Arcs — useful when many models share one transport / codec instance.

Source

pub const fn with_max_tokens(self, n: u32) -> Self

Set per-call max_tokens.

Source

pub fn with_system(self, s: impl Into<String>) -> Self

Attach a system / instruction prompt. Convenience shorthand for a single-block uncached prompt — for multi-block or cached prompts, set config.system to a pre-built SystemPrompt directly.

Source

pub fn with_system_prompt(self, prompt: SystemPrompt) -> Self

Attach a pre-built SystemPrompt — supports multi-block and per-block cache control.

Source

pub const fn with_temperature(self, t: f32) -> Self

Set sampling temperature.

Source

pub const fn with_validation_retries(self, n: u32) -> Self

Set the complete_typed validation retry budget — number of times the loop reflects a schema-mismatch or crate::OutputValidator failure back to the model with corrective hint text before surfacing the terminal Error::ModelRetry. Default 0 (no retry). Each retry increments the conversation length by two messages (assistant’s failed reply + retry prompt). Both retry shapes share one budget and route through Error::ModelRetry (invariant 20).

Source

pub fn with_token_counter(self, counter: Arc<dyn TokenCounter>) -> Self

Wire an operator-supplied TokenCounter for pre-flight budget checks and content-economy estimation. Each call replaces the configured counter — the last call wins.

Vendor-accurate counters (tiktoken, HuggingFace, ko-mecab) ship as companion crates so the core stays zero-dependency. crate::ByteCountTokenCounter is the English-biased zero-dependency default for development scaffolding.

Source

pub const fn with_top_p(self, p: f32) -> Self

Set nucleus sampling parameter.

Source

pub const fn with_top_k(self, k: u32) -> Self

Set top-k sampling parameter. Native on Anthropic, Gemini, and Bedrock-Anthropic; OpenAI codecs surface as LossyEncode.

Source

pub fn with_stop_sequence(self, s: impl Into<String>) -> Self

Append one stop sequence.

Source

pub fn with_stop_sequences(self, seqs: Vec<String>) -> Self

Replace the full stop sequences list.

Source

pub fn with_tools(self, tools: impl Into<Arc<[ToolSpec]>>) -> Self

Replace the advertised tools list. Accepts anything that converts into Arc<[ToolSpec]> — Vec<ToolSpec> and [ToolSpec; N] literals both qualify, so caller ergonomics match the previous Vec shape while per-dispatch build_request clones become an atomic refcount bump rather than a deep walk of every tool’s JSON schema.

Source

pub fn with_tool_choice(self, c: ToolChoice) -> Self

Set the tool-choice mode.

Source

pub fn with_reasoning_effort(self, effort: ReasoningEffort) -> Self

Set the cross-vendor reasoning-effort knob. Codecs translate onto their native wire shape per the table on crate::ir::ReasoningEffort — Off / Minimal / Low / Medium / High / Auto snap to vendor buckets, lossy approximations emit ModelWarning::LossyEncode, and VendorSpecific(s) passes the literal vendor wire value through.

Source

pub fn layer<L>(self, layer: L) -> Self
where L: Layer<BoxedModelService> + Layer<BoxedStreamingService> + NamedLayer + Clone + Send + Sync + 'static, <L as Layer<BoxedModelService>>::Service: Service<ModelInvocation, Response = ModelResponse, Error = Error> + Clone + Send + 'static, <<L as Layer<BoxedModelService>>::Service as Service<ModelInvocation>>::Future: Send + 'static, <L as Layer<BoxedStreamingService>>::Service: Service<StreamingModelInvocation, Response = ModelStream, Error = Error> + Clone + Send + 'static, <<L as Layer<BoxedStreamingService>>::Service as Service<StreamingModelInvocation>>::Future: Send + 'static,

Append a tower::Layer to both dispatch spines — the one-shot path (Service<ModelInvocation, Response = ModelResponse>) and the streaming path (Service<StreamingModelInvocation, Response = ModelStream>). Each .layer(L) wraps L around the already-composed stack, so the last-registered layer is outermost (sees the request first, the response last) and the first-registered layer sits innermost against the leaf InnerChatModel.

PolicyLayer and OtelLayer from the policy / otel crates satisfy both spines, so a single .layer(...) call wraps the agent’s complete dispatch fan-out:

one-shot: Service<ModelInvocation, Response = ModelResponse>
streaming: Service<StreamingModelInvocation, Response = ModelStream>
tool dispatch (separate, on ToolRegistry::layer): Service<ToolInvocation, Response = Value>

The streaming-side layer wraps the ModelStream’s completion future so observability events (cost, latency, span close) fire only on the Ok branch — a stream that errors mid-flight surfaces the error and emits no charge (invariant 12).

Layers that legitimately apply only to the one-shot spine (e.g. RetryLayer — retrying a partially-streamed response is meaningless) implement a pass-through Layer<BoxedStreamingService> so they satisfy the constraint without affecting streaming dispatch.

§Introspection

Every layer is recorded in Self::layer_names at compose time via NamedLayer::layer_name. First-party entelix layers (PolicyLayer, OtelLayer) implement NamedLayer directly; external tower::Layer middleware wraps through crate::service::WithName to participate:

use entelix_core::WithName;

chat_model
    .layer(PolicyLayer::new(registry))
    .layer(OtelLayer::new("anthropic"))
    .layer(WithName::new("concurrency", tower::limit::ConcurrencyLimitLayer::new(10)));
// chat_model.layer_names() == ["policy", "otel", "concurrency"]

Source

pub fn layer_named<L>(self, name: &'static str, layer: L) -> Self
where L: Layer<BoxedModelService> + Layer<BoxedStreamingService> + Clone + Send + Sync + 'static, <L as Layer<BoxedModelService>>::Service: Service<ModelInvocation, Response = ModelResponse, Error = Error> + Clone + Send + 'static, <<L as Layer<BoxedModelService>>::Service as Service<ModelInvocation>>::Future: Send + 'static, <L as Layer<BoxedStreamingService>>::Service: Service<StreamingModelInvocation, Response = ModelStream, Error = Error> + Clone + Send + 'static, <<L as Layer<BoxedStreamingService>>::Service as Service<StreamingModelInvocation>>::Future: Send + 'static,

Compose a tower::Layer whose only difference from a first-party entelix layer is the missing NamedLayer impl. Equivalent to self.layer(WithName::new(name, layer)) — the wrapper supplies the identity surfaced through Self::layer_names.

Use this for external middleware (tower::limit, tower::timeout, operator-defined wrappers) without implementing NamedLayer yourself. First-party entelix layers already implement NamedLayer and should reach for Self::layer directly so the canonical role name (e.g. "policy", "otel", "retry") ships at the call site.

Source

pub fn layer_names(&self) -> &[&'static str]

Diagnostic snapshot of the composed layer stack, in registration order: [0] is the first .layer(...) call (innermost, against the leaf InnerChatModel); the last element is the most recent registration (outermost, sees requests first). Empty for a ChatModel with no layers composed.

Surfaced for boot-time wiring assertions, debug dashboards, and conditional-layer audits. The values are NamedLayer::layer_name outputs — &'static str and patch-version-stable per the trait’s contract.

Source

pub fn codec(&self) -> &C

Borrow the configured codec — exposes its name() and capabilities() for diagnostics.

Source

pub fn transport(&self) -> &T

Borrow the configured transport.

Source

pub const fn config(&self) -> &ChatModelConfig

Borrow the configured request shape — model(), max_tokens(), system(), temperature(), top_p(), stop_sequences(), tools(), tool_choice() accessors all live on ChatModelConfig.

Source

pub fn service(&self) -> BoxedModelService

Build the layered BoxedModelService — used by callers who want to drive the service directly (e.g. wrap with further tower::ServiceBuilder middleware externally).

Source

pub fn streaming_service(&self) -> BoxedStreamingService

Build the layered BoxedStreamingService — the streaming counterpart to Self::service. Layers attached via Self::layer wrap this spine the same way they wrap the one-shot service; consumers driving the service directly (rather than through Self::stream_deltas) drive Service<StreamingModelInvocation, Response = ModelStream>.

Source

pub async fn complete( &self, messages: Vec<Message>, ctx: &ExecutionContext, ) -> Result<Message>

Send a conversation and return the assistant reply as a single Message. The full pipeline routes through the layer stack: each layer’s pre-call work runs before encode, each layer’s post-call work runs after decode.

Source

pub async fn complete_full( &self, messages: Vec<Message>, ctx: &ExecutionContext, ) -> Result<ModelResponse>

Same pipeline as Self::complete, but returns the full ModelResponse — usage, stop reason, codec warnings, and the provider rate-limit snapshot when the codec could parse one from the response headers.

Source

pub async fn complete_typed<O>( &self, messages: Vec<Message>, ctx: &ExecutionContext, ) -> Result<O>
where O: JsonSchema + DeserializeOwned + Send + 'static,

Send a conversation and return a typed O parsed from the model’s structured-output channel.

The codec emits a response_format directive carrying the schemars-derived schema for O; the dispatch shape (native JSON-Schema vs forced tool call) is the codec’s Codec::auto_output_strategy for the configured model when crate::ir::OutputStrategy::Auto is selected — see crate::ir::OutputStrategy and for the cross-vendor mapping. Operators that need to override the strategy build their own ResponseFormat via ResponseFormat::with_strategy and attach it to the request through a custom flow.

On Native dispatch, the codec produces a single text ContentPart whose body parses as O. On Tool dispatch, the codec emits one forced ContentPart::ToolUse whose input is the JSON object the model produced; this method extracts the input and parses it as O.

O: JsonSchema + DeserializeOwned + Send + 'static — schemars derives the JSON Schema at call time (zero-cost after the first call thanks to schemars’ static schema caching). Production operators that cache the schema across many calls build the JsonSchemaSpec once and attach it to the request directly (no O type parameter on the caller side).

Source

pub async fn complete_typed_validated<O, V>( &self, messages: Vec<Message>, validator: V, ctx: &ExecutionContext, ) -> Result<O>
where O: JsonSchema + DeserializeOwned + Send + 'static, V: OutputValidator<O>,

Send a conversation, parse the structured-output response as O, and run validator against the parsed value.

Both failure modes — schema-mismatch (the model emitted JSON the deserialiser couldn’t bind to O) and validator failure (the deserialised value broke a semantic invariant) — route through one channel: crate::Error::ModelRetry. The retry loop catches the variant, reflects the hint to the model as a corrective user message, and re-invokes within the same ChatModelConfig::validation_retries budget (invariant 20).

Self::complete_typed is the no-validator shortcut — it calls into this method with an always-Ok validator so the schema-mismatch retry path stays uniform.

The validator surface is sync (crate::OutputValidator::validate) so simple closures (|out: &O| -> Result<()>) compose without async-trait ceremony. Validators that need to .await (DB lookup, external check) compose around the complete_typed_validated call boundary instead — run the async work after the typed response returns.

Source

pub async fn stream_deltas( &self, messages: Vec<Message>, ctx: &ExecutionContext, ) -> Result<ModelStream>

Open a streaming model call and return an IR StreamDelta stream.

Pipeline: codec.encode_streaming → transport.send_streaming → codec.decode_stream → tap_aggregator, all driven through the same tower::Service spine as Self::complete_full. Layers attached via Self::layer (e.g. OtelLayer, PolicyLayer) wrap the streaming dispatch the same way they wrap the one-shot dispatch; observability events (cost, span close) fire on the streaming-side completion future’s Ok branch only — invariant 12.

The returned ModelStream carries both the delta stream (consumer-visible) and a completion future that resolves to the aggregated ModelResponse after the consumer drains the stream. Layers wrap completion to gate observability emission on the Ok branch — a stream that errors mid-flight surfaces the error on both the consumer side and the completion future, and no charge fires.

Codecs without true streaming support fall back to a single-shot pseudo-stream the same way they did before the spine refactor.

Source

pub async fn stream_typed<O>( &self, messages: Vec<Message>, ctx: &ExecutionContext, ) -> Result<TypedModelStream<O>>
where O: JsonSchema + DeserializeOwned + Send + 'static,

Streaming sibling of Self::complete_typed. Returns a TypedModelStream<O> whose stream field exposes raw StreamDeltas (text fragments operators echo to the user during generation) and whose completion future resolves to the aggregated, parsed O.

response_format = ResponseFormat::strict(JsonSchemaSpec::for::<O>()) is set on the request so the model emits the typed JSON payload natively (Native strategy via text deltas) or through a tool_use block (Tool strategy). The aggregator behind Self::stream_deltas already collects both shapes into the final ModelResponse; stream_typed parses the completion the same way Self::complete_typed does.

O: JsonSchema + DeserializeOwned + Send + 'static — schemars derives the JSON Schema at call time.

§Streaming + retry tradeoff

stream_typed does not retry on parse failure: by the time completion resolves the deltas have already been surfaced to the consumer, so re-invoking with a corrective hint would emit a divergent second stream. Operators wanting the ChatModelConfig::validation_retries loop call Self::complete_typed / Self::complete_typed_validated instead — a parse failure there is fully recoverable because no partial output was surfaced.

A validator that needs to inspect the parsed O runs after completion resolves: let value = stream.completion.await?; validator.validate(&value)?;. The validator’s Err does not flow back into the stream — it surfaces alongside the typed completion at the call site.

Source §

impl ChatModel<AnthropicMessagesCodec, DirectTransport>

Source

pub fn anthropic( api_key: impl Into<SecretString>, model: impl Into<String>, ) -> Result<Self>

One-call construction against https://api.anthropic.com. Bundles AnthropicMessagesCodec + DirectTransport + ApiKeyProvider. Mirrors LangChain’s ChatAnthropic(api_key=…) surface so the 5-line agent path is achievable without hand-wiring four components.

Returns Error::Config if the underlying HTTP client cannot be initialised.

Source §