HTTP streaming client for LLM APIs.
Sends conversation messages to an LLM API and streams back response events via Server-Sent Events (SSE). Features:
- Prompt caching with cache_control markers
- Beta header negotiation (thinking, structured outputs, effort)
- Retry with exponential backoff and fallback model
- Tool choice constraints
- Thinking/reasoning token configuration
Structs
- CompletionRequest - A request to the LLM API.
- LlmClient - Client for communicating with an LLM API.
Enums
- EffortLevel - Agent effort level (influences thoroughness and token usage).
- ThinkingMode - Configuration for thinking/reasoning behavior.
- ToolChoice - Controls how the model selects tools.