§LLMKit - Unified LLM API for Rust
LLMKit is a unified interface for interacting with multiple LLM providers, written in pure Rust with Python and TypeScript bindings.
§Features
- Unified API: Single interface for Anthropic, OpenAI, and many other providers
- Streaming: First-class support for streaming responses
- Tool Calling: Consistent tool/function calling across providers
- Type-Safe: Full Rust type safety with compile-time provider selection
- Feature Flags: Compile only the providers you need (see the Cargo.toml sketch below)
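Provider backends sit behind Cargo features, so unused providers never enter the build. A minimal manifest sketch enabling just two of them; the version number and the `default-features = false` opt-in behavior are assumptions, not taken from the crate docs:

```toml
# Hypothetical Cargo.toml entry. Feature names match the table under
# "Supported Providers"; the version and default-feature behavior are assumptions.
[dependencies]
llmkit = { version = "0.1", default-features = false, features = ["anthropic", "openai"] }
tokio = { version = "1", features = ["full"] } # needed for the async examples below
```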
§Quick Start
```rust
use llmkit::{LLMKitClient, Message, CompletionRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a client with providers configured from environment variables
    let client = LLMKitClient::builder()
        .with_anthropic_from_env()
        .with_openai_from_env()
        .build()?;

    // Make a request - the provider is auto-detected from the model name
    let request = CompletionRequest::new(
        "claude-sonnet-4-20250514",
        vec![Message::user("Hello, how are you?")],
    );

    let response = client.complete(request).await?;
    println!("{}", response.text_content());

    Ok(())
}
```
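Since `error::Error` and `error::Result` are re-exported at the crate root, application code can also return them directly instead of boxing. A minimal sketch, assuming `text_content()` yields something convertible to `String` (its exact return type isn't shown on this page):

```rust
use llmkit::{CompletionRequest, LLMKitClient, Message};

// Sketch: returning llmkit::Result instead of Box<dyn Error>.
// Assumption: `text_content()` produces displayable text.
async fn ask(client: &LLMKitClient, prompt: &str) -> llmkit::Result<String> {
    let request = CompletionRequest::new(
        "claude-sonnet-4-20250514",
        vec![Message::user(prompt)],
    );
    let response = client.complete(request).await?;
    Ok(response.text_content().to_string())
}
```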
§Supported Providers

| Provider | Feature Flag | Models |
|---|---|---|
| Anthropic | anthropic | Claude 3, Claude 3.5, Claude 4 |
| OpenAI | openai | GPT-4o, GPT-4, o1 |
| OpenRouter | openrouter | 100+ models |
| Ollama | ollama | Local models |
| Azure OpenAI | azure | Azure-hosted GPT models |
| AWS Bedrock | bedrock | Claude, Titan, etc. |
| Google Vertex AI | vertex | Gemini |
| Mistral | mistral | Mistral, Mixtral |
| Groq | groq | Fast inference |
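The model registry behind this table can also be queried at runtime through the functions re-exported from `models`. A hedged sketch; `get_model_info`'s signature and the `Debug` impl on its return type are assumptions inferred from the names alone:

```rust
use llmkit::get_model_info;

// Hypothetical usage of the registry lookup re-exported from `models`.
// Assumptions: takes a model-name &str, returns Option<ModelInfo>,
// and ModelInfo implements Debug.
fn main() {
    if let Some(info) = get_model_info("claude-sonnet-4-20250514") {
        println!("{info:?}");
    }
}
```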
§Streaming Example
```rust
use futures::StreamExt;
use llmkit::{CompletionRequest, ContentDelta, LLMKitClient, Message};

// Print a response incrementally as text chunks arrive.
async fn stream_story(client: &LLMKitClient) -> llmkit::Result<()> {
    let request = CompletionRequest::new("gpt-4o", vec![Message::user("Write a story")])
        .with_streaming();

    let mut stream = client.complete_stream(request).await?;
    while let Some(chunk) = stream.next().await {
        if let Ok(chunk) = chunk {
            if let Some(ContentDelta::Text { text }) = chunk.delta {
                print!("{}", text);
            }
        }
    }
    Ok(())
}
```
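If you want streaming transport but a single final value, the crate also re-exports `collect_stream` from the `stream` module. A minimal sketch, assuming it folds the chunk stream back into one complete response (that signature is not confirmed here):

```rust
use llmkit::{collect_stream, CompletionRequest, LLMKitClient, Message};

// Sketch only: `collect_stream` is re-exported from `stream`, but the
// shape assumed here (stream in, full response out) is an assumption.
async fn stream_then_collect(client: &LLMKitClient) -> llmkit::Result<()> {
    let request = CompletionRequest::new("gpt-4o", vec![Message::user("Write a story")])
        .with_streaming();
    let stream = client.complete_stream(request).await?;
    let response = collect_stream(stream).await?;
    println!("{}", response.text_content());
    Ok(())
}
```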
§Tool Calling Example

```rust
use llmkit::{CompletionRequest, LLMKitClient, Message, ToolBuilder, ToolDefinition};

async fn ask_weather(client: &LLMKitClient) -> llmkit::Result<()> {
    let tool: ToolDefinition = ToolBuilder::new("get_weather")
        .description("Get the current weather for a location")
        .string_param("location", "The city name", true) // name, description, required
        .build();

    let request = CompletionRequest::new(
        "claude-sonnet-4-20250514",
        vec![Message::user("What's the weather in Paris?")],
    )
    .with_tools(vec![tool]);

    let response = client.complete(request).await?;
    if response.has_tool_use() {
        for tool_use in response.tool_uses() {
            // Execute the tool here, then send its output back
            // in a follow-up request (see the sketch below).
            let _ = tool_use;
        }
    }
    Ok(())
}
```
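Handling the call means executing the function and sending its output back in a follow-up request. The crate's exact shape for tool results isn't shown on this page, so in the sketch below `tool_use.id` and `Message::tool_result` are hypothetical stand-ins:

```rust
use llmkit::{CompletionRequest, LLMKitClient, Message, Result};

// Hypothetical round trip: the `tool_use.id` field and the
// `Message::tool_result` constructor are assumptions, not confirmed API.
// (Also assumes Message: Clone.)
async fn run_tool_loop(client: &LLMKitClient, mut messages: Vec<Message>) -> Result<String> {
    loop {
        let request = CompletionRequest::new("claude-sonnet-4-20250514", messages.clone());
        let response = client.complete(request).await?;
        if !response.has_tool_use() {
            return Ok(response.text_content().to_string());
        }
        for tool_use in response.tool_uses() {
            let output = "18°C, partly cloudy"; // stand-in for a real weather lookup
            messages.push(Message::tool_result(tool_use.id.clone(), output));
        }
    }
}
```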
Re-exports§

```rust
pub use audio::{get_audio_model_info, AudioFormat, AudioInput, AudioModelInfo, AudioModelType, SpeechProvider, SpeechRequest, SpeechResponse, TimestampGranularity, TranscriptFormat, TranscriptSegment, TranscriptWord, TranscriptionProvider, TranscriptionRequest, TranscriptionResponse, VoiceInfo, AUDIO_MODELS};
pub use cache::{CacheBackend, CacheConfig, CacheKeyBuilder, CacheStats, CachedResponse, CachingProvider, InMemoryCache};
pub use circuit_breaker::{CircuitBreaker, CircuitBreakerConfig, CircuitState, HealthMetrics};
pub use client::{ClientBuilder, LLMKitClient};
pub use embedding::{get_embedding_model_info, get_embedding_models_by_provider, Embedding, EmbeddingInput, EmbeddingInputType, EmbeddingModelInfo, EmbeddingProvider, EmbeddingRequest, EmbeddingResponse, EmbeddingUsage, EncodingFormat, EMBEDDING_MODELS};
pub use error::{Error, Result};
pub use failover::{FailoverConfig, FailoverProvider, FailoverTrigger, FallbackProvider};
pub use guardrails::{Finding, FindingType, GuardedProvider, Guardrails, GuardrailsBuilder, GuardrailsConfig, GuardrailsResult, PiiPattern, PiiType, SecretPattern, SecretType, Severity};
pub use health::{DeploymentStatus, HealthCheckResult, HealthCheckType, HealthChecker, HealthCheckerHandle, PoolHealthStatus};
pub use image::{get_image_model_info, AsyncImageProvider, GeneratedImage, ImageEditRequest, ImageFormat, ImageGenerationRequest, ImageGenerationResponse, ImageInput, ImageModelInfo, ImageProvider, ImageQuality, ImageSize, ImageStyle, ImageVariationRequest, JobId, JobStatus, IMAGE_MODELS};
pub use metering::{CostTracker, InMemoryMeteringSink, MeteringProvider, MeteringSink, ModelStats, TenantStats, UsageFilter, UsageRecord, UsageStats};
pub use models::{get_all_models, get_available_models, get_cheapest_model, get_classifier_models, get_current_models, get_model_info, get_models_by_provider, get_models_with_capability, get_registry_stats, list_providers, supports_structured_output, ModelBenchmarks, ModelCapabilities, ModelInfo, ModelPricing, ModelStatus, Provider as ProviderKind, RegistryStats};
pub use observability::{MetricsRecorder, MetricsSnapshot, Observability, ObservabilityConfig, RequestSpan, TracingContext};
pub use pool::{DeploymentConfig, DeploymentHealth, HealthCheckConfig, ProviderPool, ProviderPoolBuilder, RoutingStrategy};
pub use provider::{ModelInfo as ProviderModelInfo, Provider, ProviderConfig};
pub use rate_limiter::{RateLimiter, TokenBucketConfig};
pub use retry::{ProviderExt, RetryConfig, RetryingProvider};
pub use smart_router::{Optimization, ProviderMetrics, RouterProviderConfig, RouterStats, RoutingDecision, SmartRouter, SmartRouterBuilder};
pub use specialized::{get_moderation_model_info, get_ranking_model_info, ClassificationExample, ClassificationPrediction, ClassificationProvider, ClassificationRequest, ClassificationResponse, ModerationCategories, ModerationInput, ModerationModelInfo, ModerationProvider, ModerationRequest, ModerationResponse, ModerationScores, RankedDocument, RankingMeta, RankingModelInfo, RankingProvider, RankingRequest, RankingResponse, MODERATION_MODELS, RANKING_MODELS};
pub use stream::{collect_stream, CollectingStream};
pub use streaming_multiplexer::{MultiplexedStream, MultiplexerStats, StreamingMultiplexer};
pub use templates::{patterns as template_patterns, PromptTemplate, TemplateRegistry, TemplatedRequestBuilder};
pub use tenant::{CostLimitConfig, CostLimitExceeded, CostLimitType, RateLimitConfig, RateLimitExceeded, RateLimitType, TenantConfig, TenantError, TenantId, TenantManager, TenantProvider, TenantUsageStats};
pub use tools::{ToolBuilder, ToolChoice, ToolDefinition};
pub use types::{BatchError, BatchJob, BatchRequest, BatchRequestCounts, BatchResult, BatchStatus, CompletionRequest, CompletionResponse, ContentBlock, ContentDelta, JsonSchemaDefinition, Message, Role, StopReason, StreamChunk, StreamEventType, StructuredOutput, StructuredOutputType, ThinkingConfig, ThinkingEffort, ThinkingType, TokenCountRequest, TokenCountResult, Usage};
pub use video::{get_video_model_info, get_video_models_by_provider, CameraMotion, VideoGenerationRequest, VideoGenerationResponse, VideoInput, VideoJobStatus, VideoModelInfo, VideoProvider, VideoResolution, VIDEO_MODELS};
pub use providers::{AnthropicProvider, OpenAIProvider};
pub use providers::audio::GrokRealtimeProvider;
pub use providers::chat::{ChatLawProvider, LatamGPTProvider, LightOnProvider};
```
Modules§
- audio - Audio APIs for text-to-speech (TTS) and speech-to-text (STT).
- cache - Response caching infrastructure for LLM API calls.
- circuit_breaker - Adaptive circuit breaker with real-time anomaly detection.
- client - LLMKit client for unified LLM access.
- embedding - Embedding API for generating text embeddings across multiple providers.
- error - Error types for the LLMKit library.
- failover - Failover configuration and logic for provider routing.
- guardrails - Content filtering and guardrails for LLM requests/responses.
- health - Health checking infrastructure for provider pools.
- image - Image generation API for creating images from text prompts.
- metering - Cost tracking and usage metering for LLM requests.
- models - Model registry - a database of LLM model specifications.
- observability - Built-in observability with OpenTelemetry integration.
- pool - Provider pool for load balancing and failover.
- provider - Provider trait and related types.
- providers - LLM provider implementations organized by modality.
- rate_limiter - Lock-free rate limiter for high-performance request throttling.
- retry - Retry logic for LLM provider operations.
- smart_router - Adaptive smart router with ML-based provider selection.
- specialized - Specialized AI APIs for ranking, moderation, and classification.
- stream - Streaming utilities for handling LLM response streams.
- streaming_multiplexer - Zero-copy streaming multiplexer for request deduplication.
- templates - Prompt template infrastructure for variable substitution.
- tenant - Tenant context and multi-tenancy support for LLM requests.
- tools - Tool/function definition types for LLM providers.
- types - Core types for the LLMKit unified LLM API.
- video - Video generation API for creating videos from text prompts and images.
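Several of these modules (cache, retry, guardrails, metering, tenant) each export a `*Provider` wrapper, which suggests decorator-style layering over the base `Provider` trait. A hedged sketch of that composition; the `new` constructors, their config arguments, and the assumption that the wrappers themselves implement `Provider` are all inferred from the type names, not confirmed signatures:

```rust
use llmkit::{CacheConfig, CachingProvider, Provider, RetryConfig, RetryingProvider};

// Hypothetical layering: retry closest to the wire, caching on top.
// Constructor names/arguments are assumptions; consult the module
// docs for the real signatures.
fn build_stack(base: impl Provider) -> impl Provider {
    let retried = RetryingProvider::new(base, RetryConfig::default());
    CachingProvider::new(retried, CacheConfig::default())
}
```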