// SPDX-FileCopyrightText: 2026 Andrei G <bug-ops>
// SPDX-License-Identifier: MIT OR Apache-2.0
//! LLM provider abstraction and backend implementations for the Zeph agent.
//!
//! # Overview
//!
//! `zeph-llm` is the inference layer of the Zeph agent stack. It defines the
//! [`LlmProvider`] trait and supplies concrete backends for every supported
//! inference provider. All providers are composable via `AnyProvider` and the
//! [`router`] module, so callers never need to depend on a specific backend.
//!
//! # Core Abstraction
//!
//! [`LlmProvider`] is the central trait. Every backend implements:
//! - [`LlmProvider::chat`] — single-turn, buffered (non-streaming) response
//! - [`LlmProvider::chat_stream`] — streaming response as a [`ChatStream`]
//! - [`LlmProvider::embed`] — embedding generation
//! - [`LlmProvider::chat_with_tools`] — structured tool-call protocol
//! - [`LlmProvider::chat_typed`] — schema-driven structured JSON extraction
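//!
//! The buffered and streaming calls can be contrasted with a short sketch.
//! This assumes [`ChatStream`] yields `Result` text chunks as a
//! `futures`-style stream; treat it as an illustrative sketch rather than
//! the verified chunk type:
//!
//! ```rust,no_run
//! use futures_util::StreamExt;
//! use zeph_llm::provider::{LlmProvider, Message, Role};
//! use zeph_llm::ollama::OllamaProvider;
//!
//! # async fn example() -> Result<(), zeph_llm::LlmError> {
//! let provider = OllamaProvider::new("http://localhost:11434", "llama3.2".into(), "nomic-embed-text".into());
//! let messages = vec![Message::from_legacy(Role::User, "Hello!")];
//! // Stream tokens as they arrive instead of waiting for the full reply.
//! let mut stream = provider.chat_stream(&messages).await?;
//! while let Some(chunk) = stream.next().await {
//!     print!("{}", chunk?);
//! }
//! # Ok(())
//! # }
//! ```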
//!
//! # Backends
//!
//! | Module | Backend | Feature flag |
//! |---|---|---|
//! | [`ollama`] | `Ollama` local models | always |
//! | [`claude`] | `Anthropic` Claude API | always |
//! | [`openai`] | `OpenAI` API | always |
//! | [`gemini`] | `Google` Gemini API | always |
//! | [`compatible`] | `OpenAI`-compatible endpoints | always |
//! | `candle_provider` | `HuggingFace` Candle local inference | `candle` |
//!
//! # Provider Routing
//!
//! The [`router`] module provides [`router::RouterProvider`], which wraps a list
//! of backends and selects among them using one of four strategies:
//!
//! - **EMA** — exponential moving average latency-aware ordering (default)
//! - **Thompson** — Bayesian Beta-distribution sampling
//! - **Cascade** — cheapest-first with automatic escalation on degenerate output
//! - **Bandit** — contextual `LinUCB` with online learning (PILOT)
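//!
//! A minimal construction sketch (the constructor shown here is an
//! assumption, not the verified [`router::RouterProvider`] API):
//!
//! ```rust,ignore
//! use zeph_llm::router::RouterProvider;
//!
//! // Wrap several backends; the router re-orders them per request based on
//! // observed latency (the default EMA strategy).
//! let router = RouterProvider::new(vec![provider_a, provider_b]);
//! ```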
//!
//! # Structured Extraction
//!
//! [`Extractor`] wraps any provider and exposes a typed `extract::<T>()` method
//! that injects a JSON schema into the prompt and parses the response. Use it for
//! entity extraction, classification, and any structured LLM output.
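//!
//! A sketch of typed extraction. The `Person` type, its derive set, and the
//! `Extractor` constructor and method signature are illustrative assumptions:
//!
//! ```rust,ignore
//! use serde::Deserialize;
//! use zeph_llm::Extractor;
//!
//! #[derive(Deserialize, schemars::JsonSchema)]
//! struct Person { name: String, age: u32 }
//!
//! let extractor = Extractor::new(provider);
//! // The JSON schema derived from `Person` is injected into the prompt and
//! // the model's JSON response is parsed back into the type.
//! let person: Person = extractor.extract::<Person>("Alice is 30 years old").await?;
//! ```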
//!
//! # Error Handling
//!
//! All fallible operations return [`LlmError`]. Callers can inspect the error type
//! to distinguish retriable failures (rate limiting, transient HTTP errors) from
//! permanent failures (invalid input, context length exceeded).
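//!
//! A minimal sketch of handling a failed call (concrete [`LlmError`] variant
//! names are intentionally not shown, since they are backend-specific):
//!
//! ```rust,no_run
//! # use zeph_llm::provider::{LlmProvider, Message, Role};
//! # use zeph_llm::ollama::OllamaProvider;
//! # async fn example() {
//! # let provider = OllamaProvider::new("http://localhost:11434", "llama3.2".into(), "nomic-embed-text".into());
//! # let messages = vec![Message::from_legacy(Role::User, "Hello!")];
//! match provider.chat(&messages).await {
//!     Ok(text) => println!("{text}"),
//!     // Inspect the error to decide whether a retry is worthwhile.
//!     Err(err) => eprintln!("chat failed: {err}"),
//! }
//! # }
//! ```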
//!
//! # Examples
//!
//! ```rust,no_run
//! use zeph_llm::provider::{LlmProvider, Message, Role};
//! use zeph_llm::ollama::OllamaProvider;
//!
//! # async fn example() -> Result<(), zeph_llm::LlmError> {
//! let provider = OllamaProvider::new("http://localhost:11434", "llama3.2".into(), "nomic-embed-text".into());
//! let messages = vec![Message::from_legacy(Role::User, "Hello!")];
//! let response = provider.chat(&messages).await?;
//! println!("{response}");
//! # Ok(())
//! # }
//! ```
pub mod claude;
pub mod compatible;
pub mod gemini;
pub mod ollama;
pub mod openai;
pub mod provider;
pub mod router;

// Module layout and re-export paths below are reconstructed from the crate
// docs above; the exact source locations of these items are assumptions.
mod extractor;

#[cfg(feature = "candle")]
pub mod candle_provider;

pub use extractor::Extractor;
pub use gemini::ThinkingLevel as GeminiThinkingLevel;
pub use provider::{AnyProvider, ChatStream, LlmError, LlmProvider, Message, Role};