Crate llmkit

§LLMKit - Unified LLM API for Rust

LLMKit is a unified interface for interacting with multiple LLM providers, written in pure Rust with Python and TypeScript bindings.

§Features

  • Unified API: Single interface for Anthropic, OpenAI, and many other providers
  • Streaming: First-class support for streaming responses
  • Tool Calling: Consistent tool/function calling across providers
  • Type-Safe: Full Rust type safety with compile-time provider selection
  • Feature Flags: Compile only the providers you need (see the Cargo.toml sketch below)
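
Each provider sits behind a Cargo feature, so disabled providers are never compiled. A minimal Cargo.toml sketch, assuming the feature names match the flags in the provider table below and that providers are feature-gated by default:

[dependencies]
llmkit = { version = "…", default-features = false, features = ["anthropic", "openai"] }  # version elided; use the current release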

§Quick Start

use llmkit::{LLMKitClient, Message, CompletionRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client with providers from environment
    let client = LLMKitClient::builder()
        .with_anthropic_from_env()
        .with_openai_from_env()
        .build()?;

    // Make a request - provider auto-detected from model name
    let request = CompletionRequest::new(
        "claude-sonnet-4-20250514",
        vec![Message::user("Hello, how are you?")]
    );

    let response = client.complete(request).await?;
    println!("{}", response.text_content());

    Ok(())
}
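
The quick start bubbles failures up through Box<dyn std::error::Error>, but the crate also re-exports its own error::Error and error::Result. Below is a hedged sketch of matching on the result directly; the ask helper is illustrative, complete is assumed to return the crate's Result type, and since variant names are not shown on this page only the Display impl implied by std::error::Error is used:

use llmkit::{CompletionRequest, LLMKitClient, Message};

async fn ask(client: &LLMKitClient, prompt: &str) -> llmkit::Result<String> {
    let request = CompletionRequest::new("gpt-4o", vec![Message::user(prompt)]);
    match client.complete(request).await {
        Ok(response) => Ok(response.text_content().to_string()),
        Err(e) => {
            // llmkit::Error implements std::error::Error, so Display is available here.
            eprintln!("completion failed: {e}");
            Err(e)
        }
    }
}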

§Supported Providers

Provider          Feature Flag  Models
Anthropic         anthropic     Claude 3, Claude 3.5, Claude 4
OpenAI            openai        GPT-4o, GPT-4, o1
OpenRouter        openrouter    100+ models
Ollama            ollama        Local models
Azure OpenAI      azure         Azure-hosted GPT models
AWS Bedrock       bedrock       Claude, Titan, etc.
Google Vertex AI  vertex        Gemini
Mistral           mistral       Mistral, Mixtral
Groq              groq          Fast inference

§Streaming Example

use futures::StreamExt;
use llmkit::{CompletionRequest, ContentDelta, Message};

// `client` is an LLMKitClient built as in the Quick Start above.
let request = CompletionRequest::new("gpt-4o", vec![Message::user("Write a story")])
    .with_streaming();

let mut stream = client.complete_stream(request).await?;

while let Some(chunk) = stream.next().await {
    if let Ok(chunk) = chunk {
        // Print text deltas as they arrive; chunk-level errors are skipped here.
        if let Some(ContentDelta::Text { text }) = chunk.delta {
            print!("{}", text);
        }
    }
}
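
When the incremental deltas are not needed, the re-exported collect_stream helper from the stream module can assemble the chunks instead. A minimal sketch, assuming it consumes the stream and folds it into a complete response (the exact signature is not shown on this page):

use llmkit::collect_stream;

let stream = client.complete_stream(request).await?;
// Assumed shape: folds every chunk into one CompletionResponse.
let response = collect_stream(stream).await?;
println!("{}", response.text_content());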

§Tool Calling Example

use llmkit::{CompletionRequest, Message, ToolBuilder, ToolDefinition};

let tool = ToolBuilder::new("get_weather")
    .description("Get the current weather for a location")
    .string_param("location", "The city name", true)
    .build();

let request = CompletionRequest::new("claude-sonnet-4-20250514", vec![Message::user("What's the weather in Paris?")])
    .with_tools(vec![tool]);

let response = client.complete(request).await?;

if response.has_tool_use() {
    for tool_use in response.tool_uses() {
        // Handle the tool call (see the round-trip sketch below)
    }
}
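
Executing the tool and returning its output to the model completes the loop. The sketch below is assumption-heavy: the tool_use accessors (id, input) and the Message::tool_result constructor are hypothetical names, not API confirmed by this page, and my_weather_lookup stands in for your own tool implementation:

let mut messages = vec![Message::user("What's the weather in Paris?")];

for tool_use in response.tool_uses() {
    // Hypothetical accessor; run your own tool with the model-supplied arguments.
    let output = my_weather_lookup(tool_use.input());
    // Hypothetical constructor pairing the output with the originating call id.
    messages.push(Message::tool_result(tool_use.id(), output));
}

// Real conversations also replay the assistant's tool_use turn; elided here.
let followup = CompletionRequest::new("claude-sonnet-4-20250514", messages);
let final_response = client.complete(followup).await?;
println!("{}", final_response.text_content());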

Re-exports§

pub use audio::get_audio_model_info;
pub use audio::AudioFormat;
pub use audio::AudioInput;
pub use audio::AudioModelInfo;
pub use audio::AudioModelType;
pub use audio::SpeechProvider;
pub use audio::SpeechRequest;
pub use audio::SpeechResponse;
pub use audio::TimestampGranularity;
pub use audio::TranscriptFormat;
pub use audio::TranscriptSegment;
pub use audio::TranscriptWord;
pub use audio::TranscriptionProvider;
pub use audio::TranscriptionRequest;
pub use audio::TranscriptionResponse;
pub use audio::VoiceInfo;
pub use audio::AUDIO_MODELS;
pub use cache::CacheBackend;
pub use cache::CacheConfig;
pub use cache::CacheKeyBuilder;
pub use cache::CacheStats;
pub use cache::CachedResponse;
pub use cache::CachingProvider;
pub use cache::InMemoryCache;
pub use circuit_breaker::CircuitBreaker;
pub use circuit_breaker::CircuitBreakerConfig;
pub use circuit_breaker::CircuitState;
pub use circuit_breaker::HealthMetrics;
pub use client::ClientBuilder;
pub use client::LLMKitClient;
pub use embedding::get_embedding_model_info;
pub use embedding::get_embedding_models_by_provider;
pub use embedding::Embedding;
pub use embedding::EmbeddingInput;
pub use embedding::EmbeddingInputType;
pub use embedding::EmbeddingModelInfo;
pub use embedding::EmbeddingProvider;
pub use embedding::EmbeddingRequest;
pub use embedding::EmbeddingResponse;
pub use embedding::EmbeddingUsage;
pub use embedding::EncodingFormat;
pub use embedding::EMBEDDING_MODELS;
pub use error::Error;
pub use error::Result;
pub use failover::FailoverConfig;
pub use failover::FailoverProvider;
pub use failover::FailoverTrigger;
pub use failover::FallbackProvider;
pub use guardrails::Finding;
pub use guardrails::FindingType;
pub use guardrails::GuardedProvider;
pub use guardrails::Guardrails;
pub use guardrails::GuardrailsBuilder;
pub use guardrails::GuardrailsConfig;
pub use guardrails::GuardrailsResult;
pub use guardrails::PiiPattern;
pub use guardrails::PiiType;
pub use guardrails::SecretPattern;
pub use guardrails::SecretType;
pub use guardrails::Severity;
pub use health::DeploymentStatus;
pub use health::HealthCheckResult;
pub use health::HealthCheckType;
pub use health::HealthChecker;
pub use health::HealthCheckerHandle;
pub use health::PoolHealthStatus;
pub use image::get_image_model_info;
pub use image::AsyncImageProvider;
pub use image::GeneratedImage;
pub use image::ImageEditRequest;
pub use image::ImageFormat;
pub use image::ImageGenerationRequest;
pub use image::ImageGenerationResponse;
pub use image::ImageInput;
pub use image::ImageModelInfo;
pub use image::ImageProvider;
pub use image::ImageQuality;
pub use image::ImageSize;
pub use image::ImageStyle;
pub use image::ImageVariationRequest;
pub use image::JobId;
pub use image::JobStatus;
pub use image::IMAGE_MODELS;
pub use metering::CostTracker;
pub use metering::InMemoryMeteringSink;
pub use metering::MeteringProvider;
pub use metering::MeteringSink;
pub use metering::ModelStats;
pub use metering::TenantStats;
pub use metering::UsageFilter;
pub use metering::UsageRecord;
pub use metering::UsageStats;
pub use models::get_all_models;
pub use models::get_available_models;
pub use models::get_cheapest_model;
pub use models::get_classifier_models;
pub use models::get_current_models;
pub use models::get_model_info;
pub use models::get_models_by_provider;
pub use models::get_models_with_capability;
pub use models::get_registry_stats;
pub use models::list_providers;
pub use models::supports_structured_output;
pub use models::ModelBenchmarks;
pub use models::ModelCapabilities;
pub use models::ModelInfo;
pub use models::ModelPricing;
pub use models::ModelStatus;
pub use models::Provider as ProviderKind;
pub use models::RegistryStats;
pub use observability::MetricsRecorder;
pub use observability::MetricsSnapshot;
pub use observability::Observability;
pub use observability::ObservabilityConfig;
pub use observability::RequestSpan;
pub use observability::TracingContext;
pub use pool::DeploymentConfig;
pub use pool::DeploymentHealth;
pub use pool::HealthCheckConfig;
pub use pool::ProviderPool;
pub use pool::ProviderPoolBuilder;
pub use pool::RoutingStrategy;
pub use provider::ModelInfo as ProviderModelInfo;
pub use provider::Provider;
pub use provider::ProviderConfig;
pub use rate_limiter::RateLimiter;
pub use rate_limiter::TokenBucketConfig;
pub use retry::ProviderExt;
pub use retry::RetryConfig;
pub use retry::RetryingProvider;
pub use smart_router::Optimization;
pub use smart_router::ProviderMetrics;
pub use smart_router::RouterProviderConfig;
pub use smart_router::RouterStats;
pub use smart_router::RoutingDecision;
pub use smart_router::SmartRouter;
pub use smart_router::SmartRouterBuilder;
pub use specialized::get_moderation_model_info;
pub use specialized::get_ranking_model_info;
pub use specialized::ClassificationExample;
pub use specialized::ClassificationPrediction;
pub use specialized::ClassificationProvider;
pub use specialized::ClassificationRequest;
pub use specialized::ClassificationResponse;
pub use specialized::ModerationCategories;
pub use specialized::ModerationInput;
pub use specialized::ModerationModelInfo;
pub use specialized::ModerationProvider;
pub use specialized::ModerationRequest;
pub use specialized::ModerationResponse;
pub use specialized::ModerationScores;
pub use specialized::RankedDocument;
pub use specialized::RankingMeta;
pub use specialized::RankingModelInfo;
pub use specialized::RankingProvider;
pub use specialized::RankingRequest;
pub use specialized::RankingResponse;
pub use specialized::MODERATION_MODELS;
pub use specialized::RANKING_MODELS;
pub use stream::collect_stream;
pub use stream::CollectingStream;
pub use streaming_multiplexer::MultiplexedStream;
pub use streaming_multiplexer::MultiplexerStats;
pub use streaming_multiplexer::StreamingMultiplexer;
pub use templates::patterns as template_patterns;
pub use templates::PromptTemplate;
pub use templates::TemplateRegistry;
pub use templates::TemplatedRequestBuilder;
pub use tenant::CostLimitConfig;
pub use tenant::CostLimitExceeded;
pub use tenant::CostLimitType;
pub use tenant::RateLimitConfig;
pub use tenant::RateLimitExceeded;
pub use tenant::RateLimitType;
pub use tenant::TenantConfig;
pub use tenant::TenantError;
pub use tenant::TenantId;
pub use tenant::TenantManager;
pub use tenant::TenantProvider;
pub use tenant::TenantUsageStats;
pub use tools::ToolBuilder;
pub use tools::ToolChoice;
pub use tools::ToolDefinition;
pub use types::BatchError;
pub use types::BatchJob;
pub use types::BatchRequest;
pub use types::BatchRequestCounts;
pub use types::BatchResult;
pub use types::BatchStatus;
pub use types::CompletionRequest;
pub use types::CompletionResponse;
pub use types::ContentBlock;
pub use types::ContentDelta;
pub use types::JsonSchemaDefinition;
pub use types::Message;
pub use types::Role;
pub use types::StopReason;
pub use types::StreamChunk;
pub use types::StreamEventType;
pub use types::StructuredOutput;
pub use types::StructuredOutputType;
pub use types::ThinkingConfig;
pub use types::ThinkingEffort;
pub use types::ThinkingType;
pub use types::TokenCountRequest;
pub use types::TokenCountResult;
pub use types::Usage;
pub use video::get_video_model_info;
pub use video::get_video_models_by_provider;
pub use video::CameraMotion;
pub use video::VideoGenerationRequest;
pub use video::VideoGenerationResponse;
pub use video::VideoInput;
pub use video::VideoJobStatus;
pub use video::VideoModelInfo;
pub use video::VideoProvider;
pub use video::VideoResolution;
pub use video::VIDEO_MODELS;
pub use providers::AnthropicProvider;
pub use providers::OpenAIProvider;
pub use providers::audio::GrokRealtimeProvider;
pub use providers::chat::ChatLawProvider;
pub use providers::chat::LatamGPTProvider;
pub use providers::chat::LightOnProvider;

Modules§

audio
Audio APIs for text-to-speech (TTS) and speech-to-text (STT).
cache
Response caching infrastructure for LLM API calls.
circuit_breaker
Adaptive circuit breaker with real-time anomaly detection.
client
LLMKit client for unified LLM access.
embedding
Embedding API for generating text embeddings across multiple providers.
error
Error types for the LLMKit library.
failover
Failover configuration and logic for provider routing.
guardrails
Content filtering and guardrails for LLM requests/responses.
health
Health checking infrastructure for provider pools.
image
Image generation API for creating images from text prompts.
metering
Cost tracking and usage metering for LLM requests.
models
Model Registry - Database of LLM model specifications.
observability
Built-in observability with OpenTelemetry integration.
pool
Provider pool for load balancing and failover.
provider
Provider trait and related types.
providers
LLM Provider implementations organized by modality.
rate_limiter
Lock-free rate limiter for high-performance request throttling.
retry
Retry logic for LLM provider operations.
smart_router
Adaptive smart router with ML-based provider selection.
specialized
Specialized AI APIs for ranking, moderation, and classification.
stream
Streaming utilities for handling LLM response streams.
streaming_multiplexer
Zero-copy streaming multiplexer for request deduplication.
templates
Prompt template infrastructure for variable substitution.
tenant
Tenant context and multi-tenancy support for LLM requests.
tools
Tool/function definition types for LLM providers.
types
Core types for the LLMKit unified LLM API.
video
Video generation API for creating videos from text prompts and images.