Crate zeph_llm

LLM provider abstraction and backend implementations for the Zeph agent.

§Overview

zeph-llm is the inference layer of the Zeph agent stack. It defines the LlmProvider trait and supplies concrete backends for every supported inference provider. All providers are composable via AnyProvider and the router module, so callers never need to depend on a specific backend.

§Core Abstraction

LlmProvider is the central trait; every backend implements it, so code written against the trait works with any provider. A hedged sketch of coding against it generically follows.
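
A minimal sketch, assuming the trait shape implied by the example below (chat appears there; the generic bound and the return type are assumptions, so treat the provider module as authoritative):

use zeph_llm::provider::{LlmProvider, Message, Role};

// Illustrative only: a function generic over any backend, so call sites
// never depend on a concrete provider. The Result<String, _> return type
// is an assumption inferred from the crate example.
async fn ask<P: LlmProvider>(provider: &P, prompt: &str) -> Result<String, zeph_llm::LlmError> {
    let messages = vec![Message::from_legacy(Role::User, prompt)];
    provider.chat(&messages).await
}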

§Backends

Module           Backend                              Feature flag
ollama           Ollama local models                  always
claude           Anthropic Claude API                 always
openai           OpenAI API                           always
gemini           Google Gemini API                    always
compatible       OpenAI-compatible endpoints          always
candle_provider  HuggingFace Candle local inference   candle

§Provider Routing

The router module provides router::RouterProvider, which wraps a list of backends and selects among them using one of four strategies (a conceptual sketch of the default strategy follows the list):

  • EMA — exponential moving average latency-aware ordering (default)
  • Thompson — Bayesian Beta-distribution sampling
  • Cascade — cheapest-first with automatic escalation on degenerate output
  • Bandit — contextual LinUCB with online learning (PILOT)
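
As a conceptual sketch of the default strategy (this mirrors the idea behind the ema module; the struct and field names here are illustrative assumptions, not the crate's API):

// Illustrative only: an exponential moving average of observed latency.
// A provider with a lower ema_ms estimate is tried earlier next time.
struct EmaTracker {
    alpha: f64,  // smoothing factor in (0, 1]; higher weights recent samples more
    ema_ms: f64, // current latency estimate in milliseconds
}

impl EmaTracker {
    fn record(&mut self, latency_ms: f64) {
        self.ema_ms = self.alpha * latency_ms + (1.0 - self.alpha) * self.ema_ms;
    }
}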

§Structured Extraction

Extractor wraps any provider and exposes a typed extract::<T>() method that injects a JSON schema into the prompt and parses the response. Use it for entity extraction, classification, and any structured LLM output.
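
A hedged sketch of the call shape (extract::<T>() comes from the description above; the generic Extractor<P> wrapper, the Deserialize bound, and the Invoice type are assumptions made for illustration):

use serde::Deserialize;
use zeph_llm::provider::LlmProvider;
use zeph_llm::Extractor;

// Hypothetical target type; that T must implement Deserialize is an
// assumption based on the documented schema-inject-and-parse behaviour.
#[derive(Deserialize)]
struct Invoice {
    vendor: String,
    total_cents: u64,
}

async fn pull_invoice<P: LlmProvider>(
    extractor: &Extractor<P>,
    text: &str,
) -> Result<Invoice, zeph_llm::LlmError> {
    extractor.extract::<Invoice>(text).await
}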

§Error Handling

All fallible operations return LlmError. Callers can inspect the error type to distinguish retriable failures (rate limiting, transient HTTP errors) from permanent failures (invalid input, context length exceeded).
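
A hedged retry sketch, assuming a Tokio runtime (is_retriable is a hypothetical stand-in for whatever inspection the error module actually exposes):

use std::time::Duration;
use zeph_llm::provider::{LlmProvider, Message};
use zeph_llm::LlmError;

// Hypothetical classifier: match on the real LlmError variants here,
// e.g. rate limiting or transient HTTP failures.
fn is_retriable(_err: &LlmError) -> bool {
    false
}

// Illustrative only: retry transient failures with backoff, surface
// permanent ones immediately.
async fn chat_with_retry<P: LlmProvider>(
    provider: &P,
    messages: &[Message],
) -> Result<String, LlmError> {
    let mut attempts: u64 = 0;
    loop {
        match provider.chat(messages).await {
            Ok(response) => return Ok(response),
            Err(err) if is_retriable(&err) && attempts < 3 => {
                attempts += 1;
                tokio::time::sleep(Duration::from_millis(250 * attempts)).await;
            }
            Err(err) => return Err(err),
        }
    }
}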

§Examples

use zeph_llm::ollama::OllamaProvider;
use zeph_llm::provider::{LlmProvider, Message, Role};

async fn hello() -> Result<(), zeph_llm::LlmError> {
    let provider = OllamaProvider::new(
        "http://localhost:11434",
        "llama3.2".into(),
        "nomic-embed-text".into(),
    );
    let messages = vec![Message::from_legacy(Role::User, "Hello!")];
    let response = provider.chat(&messages).await?;
    println!("{response}");
    Ok(())
}

Re-exports§

pub use classifier::metrics::ClassifierMetrics;
pub use classifier::metrics::ClassifierMetricsSnapshot;
pub use classifier::metrics::TaskMetricsSnapshot;
pub use error::LlmError;
pub use extractor::Extractor;
pub use provider::ChatExtras;
pub use provider::ChatStream;
pub use provider::LlmProvider;
pub use provider::StreamChunk;
pub use provider::ThinkingBlock;
pub use provider_dyn::LlmProviderDyn;
pub use router::aware::RouterAware;
pub use router::coe::CoeConfig;
pub use router::coe::CoeMetrics;
pub use router::coe::CoeRouter;
pub use stt::SpeechToText;
pub use stt::Transcription;

Modules§

any
Type-erased provider enum wrapping all concrete backends.
classifier
ML-backed classifier infrastructure (feature classifiers).
claude
Claude (Anthropic) LLM provider implementation.
compatible
OpenAI-compatible provider adapter.
ema
Per-provider EMA tracker for latency-aware router::RouterProvider ordering.
error
Error type for all LLM provider operations.
extractor
Typed structured extraction from free-form text.
gemini
Gemini (Google) LLM provider implementation.
http
Shared HTTP client construction for consistent timeout and TLS configuration.
model_cache
Disk-backed cache for remote model listings with 24-hour TTL.
ollama
Ollama local model backend.
openai
OpenAI API backend.
provider
Core LlmProvider trait plus chat message and streaming types.
provider_dyn
Object-safe adapter for LlmProvider.
router
Multi-provider router with pluggable routing strategies.
stt
Speech-to-text (STT) abstraction and result type.
whisper
Whisper-based SpeechToText implementation.

Enums§

CacheTtl
Prompt-cache TTL variant for the Anthropic API.
GeminiThinkingLevel
Thinking level for Gemini models that support extended reasoning.
ThinkingConfig
Extended or adaptive thinking mode for Claude.
ThinkingEffort
Effort level for adaptive thinking.