§llmkit
A unified, async, multi-provider LLM client for Rust. One trait, one streaming API, one Tower middleware stack — across OpenAI, Anthropic, and local Ollama models, with no provider lock-in.
    use llmkit::prelude::*;
    use std::time::Duration;

    let client = LlmClientBuilder::new()
        .provider(AnthropicProvider::from_env()?.model("claude-opus-4-8"))
        .fallback(OpenAiProvider::from_env()?.model("gpt-4o-mini"))
        .layer(TracingLayer::new())
        .layer(RetryLayer::exponential(3, Duration::from_millis(200)))
        .build()?;

    let resp = client
        .chat(ChatRequest::builder().user("Hello!").build())
        .await?;
    println!("{}", resp.text().unwrap_or_default());

Provider adapters are feature-gated (openai, anthropic, ollama; all on by default). Disable defaults and opt in to slim the dependency tree.
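
The example above configures retries with RetryLayer::exponential(3, Duration::from_millis(200)). As a rough illustration of what an exponential schedule usually means, the std-only sketch below assumes the delay doubles after every attempt and that no jitter or upper cap is applied. This index does not state llmkit's actual multiplier, jitter, or cap, and backoff_delays is a hypothetical helper, not an llmkit API.

```rust
use std::time::Duration;

/// Hypothetical helper (not part of llmkit): the wait before each retry,
/// assuming the base delay doubles on every attempt and no jitter is added.
fn backoff_delays(max_retries: u32, base: Duration) -> Vec<Duration> {
    (0..max_retries)
        // attempt 0 waits `base`, attempt 1 waits 2 * base, and so on;
        // saturating arithmetic avoids overflow panics for large inputs.
        .map(|attempt| base.saturating_mul(2u32.saturating_pow(attempt)))
        .collect()
}
```

Under these assumptions, three retries with a 200 ms base would wait 200 ms, 400 ms, and 800 ms. Real retry layers commonly add random jitter so that many clients do not retry in lockstep.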
Modules§
- prelude
- Common imports for application code.
- pricing
- Per-model pricing lookup. Unknown models (e.g. local Ollama) return None.
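
The pricing entry above says that unknown models return None. A minimal sketch of how an optional per-1M-token price lookup can feed a cost estimate follows. All names here (Pricing, lookup, estimate_usd) and the model name and prices are hypothetical placeholders, not the contents of llmkit's pricing module.

```rust
/// Hypothetical stand-in for a price entry, in USD per 1M tokens.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pricing {
    input_per_mtok: f64,
    output_per_mtok: f64,
}

/// Hypothetical lookup table. The model name and prices are made-up
/// placeholders, not real rates. Unknown models yield `None`.
fn lookup(model: &str) -> Option<Pricing> {
    match model {
        "example-hosted-model" => Some(Pricing {
            input_per_mtok: 2.0,
            output_per_mtok: 8.0,
        }),
        _ => None,
    }
}

/// Cost in USD for the given token counts, or `None` if the model is unpriced.
fn estimate_usd(model: &str, input_tokens: u64, output_tokens: u64) -> Option<f64> {
    let p = lookup(model)?; // propagate `None` for unknown models
    Some(
        input_tokens as f64 / 1_000_000.0 * p.input_per_mtok
            + output_tokens as f64 / 1_000_000.0 * p.output_per_mtok,
    )
}
```

Returning Option rather than a zero cost lets callers distinguish "free" from "price unknown", which matters when a budget is being enforced.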
Structs§
- AnthropicProvider
- Anthropic (Claude) provider over the /v1/messages API.
- ChatBuilder
- A chat request with optional registered tools, executed when awaited.
- ChatRequest
- A single-shot or streaming chat completion request.
- ChatRequestBuilder
- Ergonomic builder for ChatRequest.
- ChatResponse
- A completed chat response.
- CostEstimate
- Computed cost breakdown in USD.
- CostTracking
- Provider produced by CostTrackingLayer.
- CostTrackingLayer
- Tracks per-request and cumulative cost; optionally enforces a budget.
- EmbedRequest
- An embeddings request.
- EmbedResponse
- An embeddings response, one vector per input.
- FallbackProvider
- Tries providers in order, advancing to the next on a retryable failure.
- LlmClient
- A composed, ready-to-use LLM client.
- LlmClientBuilder
- Builds an LlmClient from a primary provider, optional fallbacks, and a stack of Tower layers.
- Message
- A single conversational message.
- ModelAliases
- Resolves model aliases to concrete model slugs.
- ModelPricing
- USD pricing per 1M tokens.
- OllamaProvider
- Ollama provider for local models (Llama, Mistral, …).
- OpenAiProvider
- OpenAI provider (GPT-4o, o1, embeddings).
- RateLimit
- Provider produced by RateLimitLayer.
- RateLimitLayer
- Token-bucket rate limiter: capacity tokens refilling over window.
- Retry
- Provider produced by RetryLayer.
- RetryLayer
- Configures exponential-backoff retries for retryable errors.
- SessionCost
- Shared, cloneable handle to the running session cost (USD), in micro-dollars.
- TokenUsage
- Normalised token counts for one request.
- Tool
- A tool the model may call, defined by a name, description, and JSON Schema.
- ToolCall
- A tool invocation requested by the model.
- ToolResult
- The result of executing a ToolCall, returned to the model.
- Tracing
- Provider produced by TracingLayer.
- TracingLayer
- Emits a tracing span around each call with latency and usage.
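
RateLimitLayer is described above as a token bucket holding capacity tokens that refill over window. The sketch below shows one common way to implement that idea with continuous refill. It is an illustration under stated assumptions, not llmkit's implementation: TokenBucket and try_acquire are hypothetical names, each request is assumed to cost one token, and the clock is passed in explicitly so the behaviour is deterministic.

```rust
use std::time::{Duration, Instant};

/// Hypothetical token bucket (not llmkit's implementation): holds up to
/// `capacity` tokens and refills continuously, so an empty bucket is full
/// again after `window`. Assumes `window` is non-zero.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    window: Duration,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts with a full bucket at time `now`.
    fn new(capacity: u32, window: Duration, now: Instant) -> Self {
        Self {
            capacity: capacity as f64,
            tokens: capacity as f64,
            window,
            last_refill: now,
        }
    }

    /// Tries to take one token at time `now`; returns `false` if none is available.
    fn try_acquire(&mut self, now: Instant) -> bool {
        // Refill in proportion to the time elapsed since the last call,
        // never exceeding the configured capacity.
        let elapsed = now.saturating_duration_since(self.last_refill);
        let refill = elapsed.as_secs_f64() / self.window.as_secs_f64() * self.capacity;
        self.tokens = (self.tokens + refill).min(self.capacity);
        self.last_refill = now;

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A bucket allows short bursts up to capacity while bounding the long-run rate to capacity per window. An async layer would typically sleep until the next token is available instead of returning false.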
Enums§
- ContentPart
- A typed part of multi-part message content.
- FinishReason
- Why the model stopped generating.
- LlmError
- The single error type returned by every llmkit operation.
- MessageContent
- The content of a Message.
- Role
- Role of a Message.
- StreamDelta
- One incremental event in a streaming chat response.
- ToolChoice
- How the model should choose among available tools.
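
StreamDelta is described above as one incremental event in a streaming response, and FinishReason as why generation stopped. The sketch below shows how a consumer might fold such events into a final text. The Delta and Finish enums and their variants are hypothetical stand-ins: this index does not list the real variants, which may differ, and a real ChatStream would be consumed asynchronously rather than from an iterator.

```rust
/// Hypothetical stand-in for a streaming event; llmkit's real `StreamDelta`
/// variants are not shown in this index and may differ.
#[derive(Debug, Clone, PartialEq)]
enum Delta {
    /// A chunk of generated text.
    Text(String),
    /// The stream ended for the given reason.
    Done(Finish),
}

/// Hypothetical stand-in for a finish reason.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Finish {
    Stop,
    Length,
}

/// Folds a sequence of deltas into the full text and the finish reason, if any.
fn collect(deltas: impl IntoIterator<Item = Delta>) -> (String, Option<Finish>) {
    let mut text = String::new();
    let mut finish = None;
    for delta in deltas {
        match delta {
            Delta::Text(chunk) => text.push_str(&chunk),
            Delta::Done(reason) => finish = Some(reason),
        }
    }
    (text, finish)
}
```

Keeping the finish reason separate from the text lets callers detect truncated output (for example a length limit) even when the accumulated text looks complete.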
Traits§
- LlmLayer
- Wraps an inner LlmProvider in a new one, adding cross-cutting behaviour.
- LlmProvider
- A unified, async LLM backend.
- ToolSchema
- Implemented by tool input structs to expose name, description, and schema.
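
LlmLayer is described above as wrapping an inner LlmProvider in a new one. The sketch below illustrates that decorator shape with simplified, synchronous traits. Provider, Layer, Echo, CountingLayer, and Counting are all hypothetical names for illustration: the real LlmProvider is async, and neither trait's actual methods appear in this index.

```rust
use std::cell::Cell;

/// Hypothetical, synchronous stand-in for a provider trait.
trait Provider {
    fn chat(&self, prompt: &str) -> Result<String, String>;
}

/// Hypothetical stand-in for a layer: consumes an inner provider and
/// returns a new provider with extra behaviour.
trait Layer<P: Provider> {
    type Wrapped: Provider;
    fn wrap(&self, inner: P) -> Self::Wrapped;
}

/// A toy backend that echoes the prompt.
struct Echo;

impl Provider for Echo {
    fn chat(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// Example layer; `Counting` is the provider it produces.
struct CountingLayer;

struct Counting<P> {
    inner: P,
    calls: Cell<u32>,
}

impl<P: Provider> Provider for Counting<P> {
    fn chat(&self, prompt: &str) -> Result<String, String> {
        // Cross-cutting behaviour first, then delegate to the inner provider.
        self.calls.set(self.calls.get() + 1);
        self.inner.chat(prompt)
    }
}

impl<P: Provider> Layer<P> for CountingLayer {
    type Wrapped = Counting<P>;

    fn wrap(&self, inner: P) -> Counting<P> {
        Counting { inner, calls: Cell::new(0) }
    }
}
```

The layer and the provider it produces are separate types, which matches the pairing seen in the struct list (for example Retry is the provider produced by RetryLayer). Because the wrapped value is itself a provider, layers can be stacked in any order.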
Type Aliases§
- ChatStream
- A boxed async stream of StreamDelta items, the unified streaming output across every provider.
- LlmResult
- Result alias used throughout llmkit.
Derive Macros§
- ToolSchema
- #[derive(ToolSchema)] for typed tool inputs. Derive llmkit_core::ToolSchema for a struct.