LLM provider abstraction and backend implementations for the Zeph agent.
§Overview
zeph-llm is the inference layer of the Zeph agent stack. It defines the
LlmProvider trait and supplies concrete backends for every supported
inference provider. All backends are interchangeable through the type-erased AnyProvider enum and composable through the router module, so callers never need to depend on a specific backend.
§Core Abstraction
LlmProvider is the central trait. Every backend implements:
- LlmProvider::chat — single-turn, blocking response
- LlmProvider::chat_stream — streaming response as a ChatStream
- LlmProvider::embed — embedding generation
- LlmProvider::chat_with_tools — structured tool-call protocol
- LlmProvider::chat_typed — schema-driven structured JSON extraction
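Because every backend implements the same trait, application code can stay generic over it. A minimal sketch, assuming chat resolves to a plain String as the example at the end of this page suggests:

```rust
use zeph_llm::provider::{LlmProvider, Message, Role};
use zeph_llm::LlmError;

// Generic over any backend: Ollama, Claude, OpenAI, Gemini, a compatible
// endpoint, or a router. Assumes `chat` yields Result<String, LlmError>,
// matching the crate example further down this page.
async fn ask<P: LlmProvider>(provider: &P, question: &str) -> Result<String, LlmError> {
    let messages = vec![Message::from_legacy(Role::User, question)];
    provider.chat(&messages).await
}
```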
§Backends
| Module | Backend | Feature flag |
|---|---|---|
| ollama | Ollama local models | always |
| claude | Anthropic Claude API | always |
| openai | OpenAI API | always |
| gemini | Google Gemini API | always |
| compatible | OpenAI-compatible endpoints | always |
| candle_provider | HuggingFace Candle local inference | candle |
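Only the Candle backend is feature-gated; everything else compiles unconditionally. A sketch of gating its use on the candle feature (the type name is a hypothetical illustration; only the module path is documented here):

```rust
// Compiled only when the crate is built with `--features candle`.
#[cfg(feature = "candle")]
use zeph_llm::candle_provider::CandleProvider; // hypothetical type name
```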
§Provider Routing
The router module provides router::RouterProvider, which wraps a list
of backends and selects among them using one of four strategies:
- EMA — exponential moving average latency-aware ordering (default; sketched below)
- Thompson — Bayesian Beta-distribution sampling
- Cascade — cheapest-first with automatic escalation on degenerate output
- Bandit — contextual LinUCB with online learning (PILOT)
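As a conceptual sketch of the default strategy (not the crate's actual ema module internals), each provider keeps an exponential moving average of observed latency, and the router tries providers in ascending-EMA order:

```rust
// Conceptual EMA latency tracker; field and method names are illustrative,
// not the crate's API.
struct EmaTracker {
    /// Smoothing factor in (0, 1]; larger values weight recent samples more.
    alpha: f64,
    ema_ms: Option<f64>,
}

impl EmaTracker {
    fn record(&mut self, latency_ms: f64) {
        self.ema_ms = Some(match self.ema_ms {
            // Standard EMA update: new = alpha * sample + (1 - alpha) * old.
            Some(prev) => self.alpha * latency_ms + (1.0 - self.alpha) * prev,
            // The first observation seeds the average.
            None => latency_ms,
        });
    }
}
```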
§Structured Extraction
Extractor wraps any provider and exposes a typed extract::<T>() method
that injects a JSON schema into the prompt and parses the response. Use it for
entity extraction, classification, and any structured LLM output.
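A hypothetical usage sketch, given some provider value: Extractor::new and the exact bounds on T are assumptions (a serde-deserializable target is the obvious candidate), since only the extract::<T>() method is named above.

```rust
use serde::Deserialize;
use zeph_llm::Extractor;

// Hypothetical target type; per the description above, a JSON schema
// derived from it is injected into the prompt and the reply is parsed.
#[derive(Deserialize)]
struct Ticket {
    category: String,
    urgent: bool,
}

// `Extractor::new(provider)` is an assumed constructor, not documented API.
let extractor = Extractor::new(provider);
let ticket: Ticket = extractor.extract::<Ticket>("Printer is on fire, need help now").await?;
```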
§Error Handling
All fallible operations return LlmError. Callers can inspect the error type
to distinguish retriable failures (rate limiting, transient HTTP errors) from
permanent failures (invalid input, context length exceeded).
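For illustration, a retry wrapper could look like the sketch below. LlmError's variants are not listed on this page, so is_retriable is a placeholder for whatever inspection the type actually exposes, and a tokio runtime is assumed for sleeping.

```rust
use std::time::Duration;
use zeph_llm::provider::{LlmProvider, Message};
use zeph_llm::LlmError;

// Placeholder predicate: real code would match on LlmError's variants to
// single out rate limiting and transient HTTP failures.
fn is_retriable(_err: &LlmError) -> bool {
    false
}

// Retry transient failures with exponential backoff; permanent errors
// (invalid input, context length exceeded) surface immediately.
async fn chat_with_retry<P: LlmProvider>(
    provider: &P,
    messages: &[Message],
) -> Result<String, LlmError> {
    let mut backoff = Duration::from_millis(200);
    for _ in 0..2 {
        match provider.chat(messages).await {
            Ok(resp) => return Ok(resp),
            Err(e) if is_retriable(&e) => {
                tokio::time::sleep(backoff).await; // assumes a tokio runtime
                backoff *= 2;
            }
            Err(e) => return Err(e),
        }
    }
    provider.chat(messages).await // final attempt propagates its error
}
```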
§Examples
```rust
use zeph_llm::provider::{LlmProvider, Message, Role};
use zeph_llm::ollama::OllamaProvider;

let provider = OllamaProvider::new("http://localhost:11434", "llama3.2".into(), "nomic-embed-text".into());
let messages = vec![Message::from_legacy(Role::User, "Hello!")];
let response = provider.chat(&messages).await?;
println!("{response}");
```

Re-exports§
pub use classifier::metrics::ClassifierMetrics;
pub use classifier::metrics::ClassifierMetricsSnapshot;
pub use classifier::metrics::TaskMetricsSnapshot;
pub use error::LlmError;
pub use extractor::Extractor;
pub use provider::ChatExtras;
pub use provider::ChatStream;
pub use provider::LlmProvider;
pub use provider::StreamChunk;
pub use provider::ThinkingBlock;
pub use provider_dyn::LlmProviderDyn;
pub use router::aware::RouterAware;
pub use router::coe::CoeConfig;
pub use router::coe::CoeMetrics;
pub use router::coe::CoeRouter;
pub use stt::SpeechToText;
pub use stt::Transcription;
Modules§
- any
  Type-erased provider enum wrapping all concrete backends.
- classifier
  ML-backed classifier infrastructure (feature classifiers).
- claude
  Claude (Anthropic) LLM provider implementation.
- compatible
  OpenAI-compatible provider adapter.
- ema
  Per-provider EMA tracker for latency-aware router::RouterProvider ordering.
- error
  Error type for all LLM provider operations.
- extractor
  Typed structured extraction from free-form text.
- gemini
  Gemini (Google) LLM provider implementation.
- http
  Shared HTTP client construction for consistent timeout and TLS configuration.
- model_cache
  Disk-backed cache for remote model listings with 24-hour TTL.
- ollama
  Ollama local model backend.
- openai
  OpenAI API backend.
- provider
  Core LlmProvider trait, message types, and streaming primitives.
- provider_dyn
  Object-safe adapter for LlmProvider.
- router
  Multi-provider router with pluggable routing strategies.
- stt
  Speech-to-text (STT) abstraction and result type.
- whisper
  Whisper-based speech-to-text backend.
Enums§
- CacheTtl
  Prompt-cache TTL variant for the Anthropic API.
- GeminiThinkingLevel
  Thinking level for Gemini models that support extended reasoning.
- ThinkingConfig
  Extended or adaptive thinking mode for Claude.
- ThinkingEffort
  Effort level for adaptive thinking.