LLM provider abstraction and backend implementations for the Zeph agent.
§Overview
zeph-llm is the inference layer of the Zeph agent stack. It defines the
LlmProvider trait and supplies concrete backends for every supported
inference provider. All backends are interchangeable through the type-erased AnyProvider enum and composable through the router module, so callers never need to depend on a specific backend.
§Core Abstraction
LlmProvider is the central trait. Every backend implements:
- LlmProvider::chat — single-turn, blocking response
- LlmProvider::chat_stream — streaming response as a ChatStream
- LlmProvider::embed — embedding generation
- LlmProvider::chat_with_tools — structured tool-call protocol
- LlmProvider::chat_typed — schema-driven structured JSON extraction
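Because every backend implements the same trait, application code can stay generic over it. A minimal sketch, assuming chat resolves to a plain String as the example at the end of this page suggests:

```rust
use zeph_llm::provider::{LlmProvider, Message, Role};
use zeph_llm::LlmError;

// Generic over any backend: Ollama, Claude, OpenAI, Gemini, a compatible
// endpoint, or a router. Assumes `chat` yields Result<String, LlmError>,
// matching the crate example further down this page.
async fn ask<P: LlmProvider>(provider: &P, question: &str) -> Result<String, LlmError> {
    let messages = vec![Message::from_legacy(Role::User, question)];
    provider.chat(&messages).await
}
```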
§Backends
| Module | Backend | Feature flag |
|---|---|---|
| ollama | Ollama local models | always |
| claude | Anthropic Claude API | always |
| openai | OpenAI API | always |
| gemini | Google Gemini API | always |
| compatible | OpenAI-compatible endpoints | always |
| candle_provider | HuggingFace Candle local inference | candle |
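Only the Candle backend is feature-gated; everything else compiles unconditionally. A sketch of gating its use on the candle feature (the type name is a hypothetical illustration; only the module path is documented here):

```rust
// Compiled only when the crate is built with `--features candle`.
#[cfg(feature = "candle")]
use zeph_llm::candle_provider::CandleProvider; // hypothetical type name
```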
§Provider Routing
The router module provides router::RouterProvider, which wraps a list
of backends and selects among them using one of four strategies:
- EMA — exponential moving average latency-aware ordering (default; sketched below)
- Thompson — Bayesian Beta-distribution sampling
- Cascade — cheapest-first with automatic escalation on degenerate output
- Bandit — contextual LinUCB with online learning (PILOT)
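As a conceptual sketch of the default strategy (not the crate's actual ema module internals), each provider keeps an exponential moving average of observed latency, and the router tries providers in ascending-EMA order:

```rust
// Conceptual EMA latency tracker; field and method names are illustrative,
// not the crate's API.
struct EmaTracker {
    /// Smoothing factor in (0, 1]; larger values weight recent samples more.
    alpha: f64,
    ema_ms: Option<f64>,
}

impl EmaTracker {
    fn record(&mut self, latency_ms: f64) {
        self.ema_ms = Some(match self.ema_ms {
            // Standard EMA update: new = alpha * sample + (1 - alpha) * old.
            Some(prev) => self.alpha * latency_ms + (1.0 - self.alpha) * prev,
            // The first observation seeds the average.
            None => latency_ms,
        });
    }
}
```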
§Structured Extraction
Extractor wraps any provider and exposes a typed extract::<T>() method
that injects a JSON schema into the prompt and parses the response. Use it for
entity extraction, classification, and any structured LLM output.
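A hypothetical usage sketch, given some provider value: Extractor::new and the exact bounds on T are assumptions (a serde-deserializable target is the obvious candidate), since only the extract::<T>() method is named above.

```rust
use serde::Deserialize;
use zeph_llm::Extractor;

// Hypothetical target type; per the description above, a JSON schema
// derived from it is injected into the prompt and the reply is parsed.
#[derive(Deserialize)]
struct Ticket {
    category: String,
    urgent: bool,
}

// `Extractor::new(provider)` is an assumed constructor, not documented API.
let extractor = Extractor::new(provider);
let ticket: Ticket = extractor.extract::<Ticket>("Printer is on fire, need help now").await?;
```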
§Error Handling
All fallible operations return LlmError. Callers can inspect the error type
to distinguish retriable failures (rate limiting, transient HTTP errors) from
permanent failures (invalid input, context length exceeded).
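For illustration, a retry wrapper could look like the sketch below. LlmError's variants are not listed on this page, so is_retriable is a placeholder for whatever inspection the type actually exposes, and a tokio runtime is assumed for sleeping.

```rust
use std::time::Duration;
use zeph_llm::provider::{LlmProvider, Message};
use zeph_llm::LlmError;

// Placeholder predicate: real code would match on LlmError's variants to
// single out rate limiting and transient HTTP failures.
fn is_retriable(_err: &LlmError) -> bool {
    false
}

// Retry transient failures with exponential backoff; permanent errors
// (invalid input, context length exceeded) surface immediately.
async fn chat_with_retry<P: LlmProvider>(
    provider: &P,
    messages: &[Message],
) -> Result<String, LlmError> {
    let mut backoff = Duration::from_millis(200);
    for _ in 0..2 {
        match provider.chat(messages).await {
            Ok(resp) => return Ok(resp),
            Err(e) if is_retriable(&e) => {
                tokio::time::sleep(backoff).await; // assumes a tokio runtime
                backoff *= 2;
            }
            Err(e) => return Err(e),
        }
    }
    provider.chat(messages).await // final attempt propagates its error
}
```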
§Examples
```rust
use zeph_llm::provider::{LlmProvider, Message, Role};
use zeph_llm::ollama::OllamaProvider;

let provider = OllamaProvider::new("http://localhost:11434", "llama3.2".into(), "nomic-embed-text".into());
let messages = vec![Message::from_legacy(Role::User, "Hello!")];
let response = provider.chat(&messages).await?;
println!("{response}");
```

Re-exports§
pub use classifier::metrics::ClassifierMetrics;
pub use classifier::metrics::ClassifierMetricsSnapshot;
pub use classifier::metrics::TaskMetricsSnapshot;
pub use error::LlmError;
pub use extractor::Extractor;
pub use provider::ChatExtras;
pub use provider::ChatStream;
pub use provider::LlmProvider;
pub use provider::StreamChunk;
pub use provider::ThinkingBlock;
pub use provider_dyn::LlmProviderDyn;
pub use router::aware::RouterAware;
pub use router::coe::CoeConfig;
pub use router::coe::CoeMetrics;
pub use router::coe::CoeRouter;
pub use stt::SpeechToText;
pub use stt::Transcription;
Modules§
- any
  Type-erased provider enum wrapping all concrete backends.
- classifier
  ML-backed classifier infrastructure (feature classifiers).
- claude
  Claude (Anthropic) LLM provider implementation.
- compatible
  OpenAI-compatible provider adapter.
- ema
  Per-provider EMA tracker for latency-aware router::RouterProvider ordering.
- error
  Error type for all LLM provider operations.
- extractor
  Typed structured extraction from free-form text.
- gemini
  Gemini (Google) LLM provider implementation.
- http
  Shared HTTP client construction for consistent timeout and TLS configuration.
- model_cache
  Disk-backed cache for remote model listings with 24-hour TTL.
- ollama
  Ollama local model backend.
- openai
  OpenAI API backend.
- provider
  Core LlmProvider trait, message types, and streaming primitives.
- provider_dyn
  Object-safe adapter for LlmProvider.
- router
  Multi-provider router with pluggable routing strategies.
- stt
  Speech-to-text (STT) abstraction and result type.
- whisper
  Whisper-based speech-to-text backend.
Enums§
- CacheTtl
  Prompt-cache TTL variant for the Anthropic API.
- GeminiThinkingLevel
  Thinking level for Gemini models that support extended reasoning.
- ThinkingConfig
  Extended or adaptive thinking mode for Claude.
- ThinkingEffort
  Effort level for adaptive thinking.