```rust
pub struct ModelStream {
    pub stream: BoxDeltaStream<'static>,
    pub completion: BoxFuture<'static, Result<ModelResponse, Error>>,
}
```
Streaming dispatch result returned by
Service<ModelInvocation, Response = ModelStream> — the
caller-visible delta stream paired with a future that resolves
to the aggregated terminal response.
The Self::stream field carries the raw StreamDelta flow
(text chunks, tool-use boundaries, usage, rate-limit, warnings,
terminal Stop). The Self::completion future resolves to
Ok(ModelResponse) after the stream has been fully consumed
AND a StreamAggregator has reconstructed the final response;
it resolves to Err(...) if the stream errored mid-flight, was
dropped before terminal Stop, or violated the aggregator’s
protocol invariants.
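The resolution rules above (Ok only after a terminal Stop, Err on truncation or protocol violation) can be illustrated with a minimal aggregator. This is a std-only sketch under stated assumptions, not the real StreamAggregator: Delta, Response, AggError, Aggregator and aggregate are hypothetical simplified stand-ins, and the two invariants checked ("Stop must arrive" and "nothing after Stop") are illustrative choices.

```rust
// Hypothetical, simplified stand-ins for StreamDelta, ModelResponse and Error.
#[derive(Debug, PartialEq)]
enum Delta {
    Text(String),
    Usage { output_tokens: u32 },
    Stop,
}

#[derive(Debug, PartialEq)]
struct Response {
    text: String,
    output_tokens: u32,
}

#[derive(Debug, PartialEq)]
enum AggError {
    /// The stream ended (or was dropped) before a terminal Stop.
    MissingStop,
    /// A delta arrived after the terminal Stop.
    DeltaAfterStop,
}

/// Hypothetical aggregator: folds deltas and checks two protocol invariants.
#[derive(Default)]
struct Aggregator {
    text: String,
    output_tokens: u32,
    stopped: bool,
    violation: Option<AggError>,
}

impl Aggregator {
    fn push(&mut self, delta: &Delta) {
        if self.stopped {
            // Anything after Stop breaks the protocol; keep the first violation.
            if self.violation.is_none() {
                self.violation = Some(AggError::DeltaAfterStop);
            }
            return;
        }
        match delta {
            Delta::Text(chunk) => self.text.push_str(chunk),
            Delta::Usage { output_tokens } => self.output_tokens = *output_tokens,
            Delta::Stop => self.stopped = true,
        }
    }

    /// Ok only if Stop was seen and no invariant was violated.
    fn finish(self) -> Result<Response, AggError> {
        if let Some(err) = self.violation {
            return Err(err);
        }
        if !self.stopped {
            return Err(AggError::MissingStop);
        }
        Ok(Response { text: self.text, output_tokens: self.output_tokens })
    }
}

/// Runs a whole delta sequence through the aggregator.
fn aggregate(deltas: &[Delta]) -> Result<Response, AggError> {
    let mut agg = Aggregator::default();
    for delta in deltas {
        agg.push(delta);
    }
    agg.finish()
}

fn main() {
    let complete = [
        Delta::Text("Hel".into()),
        Delta::Text("lo".into()),
        Delta::Usage { output_tokens: 2 },
        Delta::Stop,
    ];
    println!("{:?}", aggregate(&complete));
    // Truncated stream: no terminal Stop, so the result is Err(MissingStop).
    println!("{:?}", aggregate(&[Delta::Text("Hel".into())]));
}
```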
Layers (OtelLayer, PolicyLayer) wrap completion to emit
observability / cost events on the Ok branch only, per
invariant 12. A stream that errors mid-flight surfaces the
error twice: as an Err item on the consumer's side of the stream,
and as completion resolving to Err. Either way, no cost charge
fires.
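The Ok-only gating can be sketched synchronously. In this sketch, assumed for illustration, a boxed FnOnce closure stands in for the BoxFuture, and with_cost_layer, run and the token counter are hypothetical; this is not the OtelLayer or PolicyLayer API. The wrapper resolves the inner completion, records a charge only when the result is Ok, and passes the result through unchanged.

```rust
use std::cell::Cell;
use std::rc::Rc;

#[derive(Debug, PartialEq)]
struct Response {
    output_tokens: u32,
}

// Synchronous stand-in (assumption) for BoxFuture<'static, Result<ModelResponse, Error>>.
type Completion = Box<dyn FnOnce() -> Result<Response, String>>;

/// Hypothetical cost layer: wraps the completion and charges only on Ok.
fn with_cost_layer(inner: Completion, charged_tokens: Rc<Cell<u32>>) -> Completion {
    Box::new(move || {
        let result = inner();
        if let Ok(resp) = &result {
            charged_tokens.set(charged_tokens.get() + resp.output_tokens);
        }
        // An Err passes through unchanged and no charge is recorded.
        result
    })
}

/// Resolves a wrapped completion; returns (was_ok, tokens_charged).
fn run(outcome: Result<Response, String>) -> (bool, u32) {
    let charged = Rc::new(Cell::new(0));
    let completion: Completion = Box::new(move || outcome);
    let wrapped = with_cost_layer(completion, Rc::clone(&charged));
    let ok = wrapped().is_ok();
    (ok, charged.get())
}

fn main() {
    println!("{:?}", run(Ok(Response { output_tokens: 7 })));
    println!("{:?}", run(Err("stream reset mid-flight".to_string())));
}
```

Keeping the charge inside the wrapper, rather than in the consumer, means every caller gets the same "no charge on Err" behavior without having to remember it.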
completion is driven internally by the same deltas that the
stream field carries, so consumers do not need to poll it
separately. The aggregator runs as the consumer drains the
stream; completion resolves naturally when the consumer
reads the terminal Stop (or drops the stream early, in which
case completion resolves to Err).
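The "driven by draining, Err on early drop" behavior can be sketched with std types only. This is a synchronous analog under stated assumptions: an Iterator stands in for the delta stream, an mpsc Receiver stands in for the completion future, and Tapped, tapped_stream, drain and take_first are hypothetical names. The completion is resolved as a side effect of next() reaching Stop, or by Drop if the stream goes away first.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Debug, Clone, PartialEq)]
enum Delta {
    Text(String),
    Stop,
}

type Outcome = Result<String, String>;

/// Hypothetical tap: forwards each delta and resolves the completion channel
/// as a side effect of the consumer draining the stream.
struct Tapped<I: Iterator<Item = Delta>> {
    inner: I,
    text: String,
    tx: Option<Sender<Outcome>>,
}

impl<I: Iterator<Item = Delta>> Iterator for Tapped<I> {
    type Item = Delta;

    fn next(&mut self) -> Option<Delta> {
        let delta = self.inner.next()?;
        match &delta {
            Delta::Text(chunk) => self.text.push_str(chunk),
            Delta::Stop => {
                // Terminal Stop observed: resolve the completion with Ok.
                if let Some(tx) = self.tx.take() {
                    let _ = tx.send(Ok(std::mem::take(&mut self.text)));
                }
            }
        }
        Some(delta)
    }
}

impl<I: Iterator<Item = Delta>> Drop for Tapped<I> {
    fn drop(&mut self) {
        // Still unresolved at drop time means no Stop was seen: resolve with Err.
        if let Some(tx) = self.tx.take() {
            let _ = tx.send(Err("stream dropped before terminal Stop".to_string()));
        }
    }
}

/// Pairs the tapped stream with its completion receiver.
fn tapped_stream<I: Iterator<Item = Delta>>(inner: I) -> (Tapped<I>, Receiver<Outcome>) {
    let (tx, rx) = channel();
    (Tapped { inner, text: String::new(), tx: Some(tx) }, rx)
}

/// Consumer reads every delta; completion is only inspected afterwards.
fn drain(deltas: Vec<Delta>) -> Outcome {
    let (stream, completion) = tapped_stream(deltas.into_iter());
    let _seen: Vec<Delta> = stream.collect();
    completion.recv().expect("tap always resolves the completion")
}

/// Consumer reads one delta, then drops the stream early.
fn take_first(deltas: Vec<Delta>) -> Outcome {
    let (mut stream, completion) = tapped_stream(deltas.into_iter());
    let _first = stream.next();
    drop(stream);
    completion.recv().expect("tap always resolves the completion")
}

fn main() {
    let deltas = vec![Delta::Text("Hi".into()), Delta::Text("!".into()), Delta::Stop];
    println!("{:?}", drain(deltas.clone()));
    println!("{:?}", take_first(deltas));
}
```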
Fields
stream: BoxDeltaStream<'static>
Raw delta stream surfaced to the caller. The wrapper
produced by entelix_core::stream::tap_aggregator taps
each delta into a StreamAggregator as it flows past, so
the caller sees an unmodified stream while
Self::completion receives the aggregated final response
without a second pass.
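The single-pass tap idea can be illustrated with the standard library's Iterator::inspect, which lets a closure observe each item by reference without altering it. This is a sketch under assumptions, not entelix_core::stream::tap_aggregator: tap and tap_once are hypothetical, an Iterator replaces the async stream, and the "aggregator" is reduced to concatenating text. The pull counter shows that the source is read exactly once.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

#[derive(Debug, Clone, PartialEq)]
enum Delta {
    Text(String),
    Stop,
}

/// Hypothetical tap: passes every delta through unchanged while folding text into `agg`.
fn tap<I>(inner: I, agg: Rc<RefCell<String>>) -> impl Iterator<Item = Delta>
where
    I: Iterator<Item = Delta>,
{
    inner.inspect(move |delta| {
        if let Delta::Text(chunk) = delta {
            agg.borrow_mut().push_str(chunk);
        }
    })
}

/// Returns (deltas the caller saw, aggregated text, times the source yielded an item).
fn tap_once(source: Vec<Delta>) -> (Vec<Delta>, String, usize) {
    let pulls = Cell::new(0usize);
    let agg = Rc::new(RefCell::new(String::new()));
    // Count how often the underlying source yields an item.
    let counted = source.into_iter().inspect(|_| pulls.set(pulls.get() + 1));
    let seen: Vec<Delta> = tap(counted, Rc::clone(&agg)).collect();
    let aggregated = agg.borrow().clone();
    (seen, aggregated, pulls.get())
}

fn main() {
    let source = vec![Delta::Text("a".into()), Delta::Text("b".into()), Delta::Stop];
    let (seen, aggregated, pulls) = tap_once(source);
    println!("{seen:?} / {aggregated} / {pulls}");
}
```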
completion: BoxFuture<'static, Result<ModelResponse, Error>>
Future resolving to the aggregated ModelResponse after
the stream has been consumed to its terminal Stop. Layers
wrap this future to gate observability emission on success
(invariant 12). Consumers that do not use completion
themselves (e.g. because they wire it into a fire-and-forget
OTel layer) do not need to await it directly: dropping the
ModelStream is the canonical "I'm done" signal, which lets
any wrapping layer observe stream completion regardless of
whether the consumer polled completion itself.
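The "drop is the done signal" contract can be sketched as a Drop hook that reports the outcome to a layer-registered observer. This is a simplified, synchronous illustration under stated assumptions: ObservedStream, Observer and consume_then_drop are hypothetical, and a callback replaces the real layer's wrapped future. The consumer only reads deltas and never touches a completion handle, yet the observer fires exactly once, with Ok if Stop was read and Err otherwise.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug, Clone, PartialEq)]
enum Delta {
    Text(String),
    Stop,
}

type Outcome = Result<String, String>;
type Observer = Box<dyn FnOnce(Outcome)>;

/// Hypothetical ModelStream analog: a layer registers `on_complete` up front,
/// and it fires exactly once, when the stream is dropped.
struct ObservedStream {
    deltas: std::vec::IntoIter<Delta>,
    text: String,
    saw_stop: bool,
    on_complete: Option<Observer>,
}

impl Iterator for ObservedStream {
    type Item = Delta;

    fn next(&mut self) -> Option<Delta> {
        let delta = self.deltas.next()?;
        match &delta {
            Delta::Text(chunk) => self.text.push_str(chunk),
            Delta::Stop => self.saw_stop = true,
        }
        Some(delta)
    }
}

impl Drop for ObservedStream {
    fn drop(&mut self) {
        if let Some(observer) = self.on_complete.take() {
            let outcome = if self.saw_stop {
                Ok(std::mem::take(&mut self.text))
            } else {
                Err("dropped before terminal Stop".to_string())
            };
            observer(outcome);
        }
    }
}

/// Consumes `n` deltas, drops the stream, and returns what the layer observed.
fn consume_then_drop(deltas: Vec<Delta>, n: usize) -> Vec<Outcome> {
    let log: Rc<RefCell<Vec<Outcome>>> = Rc::new(RefCell::new(Vec::new()));
    let sink = Rc::clone(&log);
    let stream = ObservedStream {
        deltas: deltas.into_iter(),
        text: String::new(),
        saw_stop: false,
        on_complete: Some(Box::new(move |r: Outcome| sink.borrow_mut().push(r))),
    };
    // The consumer only reads deltas; `take` consumes the stream, so it is
    // dropped (and the observer fires) when `collect` returns.
    let _read: Vec<Delta> = stream.take(n).collect();
    let observed = log.borrow().clone();
    observed
}

fn main() {
    let deltas = vec![Delta::Text("ok".into()), Delta::Stop];
    println!("{:?}", consume_then_drop(deltas.clone(), 2));
    println!("{:?}", consume_then_drop(deltas, 1));
}
```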