Crate llmkit

§LLMKit - Unified LLM API for Rust

LLMKit is a unified interface for interacting with multiple LLM providers, written in pure Rust with Python and TypeScript bindings.

§Features

  • Unified API: Single interface for Anthropic, OpenAI, and many other providers
  • Streaming: First-class support for streaming responses
  • Tool Calling: Consistent tool/function calling across providers
  • Type-Safe: Full Rust type safety with compile-time provider selection
  • Feature Flags: Compile only the providers you need (see the Cargo.toml sketch below)
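
Each provider sits behind a Cargo feature, so disabled providers are never compiled. A minimal Cargo.toml sketch, assuming the feature names match the flags in the provider table below and that providers are feature-gated by default:

[dependencies]
llmkit = { version = "…", default-features = false, features = ["anthropic", "openai"] }  # version elided; use the current release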

§Quick Start

use llmkit::{LLMKitClient, Message, CompletionRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client with providers from environment
    let client = LLMKitClient::builder()
        .with_anthropic_from_env()
        .with_openai_from_env()
        .build()?;

    // Make a request - provider auto-detected from model name
    let request = CompletionRequest::new(
        "claude-sonnet-4-20250514",
        vec![Message::user("Hello, how are you?")]
    );

    let response = client.complete(request).await?;
    println!("{}", response.text_content());

    Ok(())
}
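
The quick start bubbles failures up through Box<dyn std::error::Error>, but the crate also re-exports its own error::Error and error::Result. Below is a hedged sketch of matching on the result directly; the ask helper is illustrative, complete is assumed to return the crate's Result type, and since variant names are not shown on this page only the Display impl implied by std::error::Error is used:

use llmkit::{CompletionRequest, LLMKitClient, Message};

async fn ask(client: &LLMKitClient, prompt: &str) -> llmkit::Result<String> {
    let request = CompletionRequest::new("gpt-4o", vec![Message::user(prompt)]);
    match client.complete(request).await {
        Ok(response) => Ok(response.text_content().to_string()),
        Err(e) => {
            // llmkit::Error implements std::error::Error, so Display is available here.
            eprintln!("completion failed: {e}");
            Err(e)
        }
    }
}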

§Supported Providers

Provider          Feature Flag  Models
Anthropic         anthropic     Claude 3, Claude 3.5, Claude 4
OpenAI            openai        GPT-4o, GPT-4, o1
OpenRouter        openrouter    100+ models
Ollama            ollama        Local models
Azure OpenAI      azure         Azure-hosted GPT models
AWS Bedrock       bedrock       Claude, Titan, etc.
Google Vertex AI  vertex        Gemini
Mistral           mistral       Mistral, Mixtral
Groq              groq          Fast inference

§Streaming Example

use futures::StreamExt;
use llmkit::{CompletionRequest, ContentDelta, Message};

// `client` is an LLMKitClient built as in the Quick Start above.
let request = CompletionRequest::new("gpt-4o", vec![Message::user("Write a story")])
    .with_streaming();

let mut stream = client.complete_stream(request).await?;

while let Some(chunk) = stream.next().await {
    if let Ok(chunk) = chunk {
        // Print text deltas as they arrive; chunk-level errors are skipped here.
        if let Some(ContentDelta::Text { text }) = chunk.delta {
            print!("{}", text);
        }
    }
}
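
When the incremental deltas are not needed, the re-exported collect_stream helper from the stream module can assemble the chunks instead. A minimal sketch, assuming it consumes the stream and folds it into a complete response (the exact signature is not shown on this page):

use llmkit::collect_stream;

let stream = client.complete_stream(request).await?;
// Assumed shape: folds every chunk into one CompletionResponse.
let response = collect_stream(stream).await?;
println!("{}", response.text_content());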

§Tool Calling Example

use llmkit::{CompletionRequest, Message, ToolBuilder, ToolDefinition};

let tool = ToolBuilder::new("get_weather")
    .description("Get the current weather for a location")
    .string_param("location", "The city name", true)
    .build();

let request = CompletionRequest::new("claude-sonnet-4-20250514", vec![Message::user("What's the weather in Paris?")])
    .with_tools(vec![tool]);

let response = client.complete(request).await?;

if response.has_tool_use() {
    for tool_use in response.tool_uses() {
        // Handle the tool call (see the round-trip sketch below)
    }
}
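
Executing the tool and returning its output to the model completes the loop. The sketch below is assumption-heavy: the tool_use accessors (id, input) and the Message::tool_result constructor are hypothetical names, not API confirmed by this page, and my_weather_lookup stands in for your own tool implementation:

let mut messages = vec![Message::user("What's the weather in Paris?")];

for tool_use in response.tool_uses() {
    // Hypothetical accessor; run your own tool with the model-supplied arguments.
    let output = my_weather_lookup(tool_use.input());
    // Hypothetical constructor pairing the output with the originating call id.
    messages.push(Message::tool_result(tool_use.id(), output));
}

// Real conversations also replay the assistant's tool_use turn; elided here.
let followup = CompletionRequest::new("claude-sonnet-4-20250514", messages);
let final_response = client.complete(followup).await?;
println!("{}", final_response.text_content());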

Re-exports§

pub use audio::get_audio_model_info;
pub use audio::AudioFormat;
pub use audio::AudioInput;
pub use audio::AudioModelInfo;
pub use audio::AudioModelType;
pub use audio::SpeechProvider;
pub use audio::SpeechRequest;
pub use audio::SpeechResponse;
pub use audio::TimestampGranularity;
pub use audio::TranscriptFormat;
pub use audio::TranscriptSegment;
pub use audio::TranscriptWord;
pub use audio::TranscriptionProvider;
pub use audio::TranscriptionRequest;
pub use audio::TranscriptionResponse;
pub use audio::VoiceInfo;
pub use audio::AUDIO_MODELS;
pub use cache::CacheBackend;
pub use cache::CacheConfig;
pub use cache::CacheKeyBuilder;
pub use cache::CacheStats;
pub use cache::CachedResponse;
pub use cache::CachingProvider;
pub use cache::InMemoryCache;
pub use circuit_breaker::CircuitBreaker;
pub use circuit_breaker::CircuitBreakerConfig;
pub use circuit_breaker::CircuitState;
pub use circuit_breaker::HealthMetrics;
pub use client::ClientBuilder;
pub use client::LLMKitClient;
pub use embedding::get_embedding_model_info;
pub use embedding::get_embedding_models_by_provider;
pub use embedding::Embedding;
pub use embedding::EmbeddingInput;
pub use embedding::EmbeddingInputType;
pub use embedding::EmbeddingModelInfo;
pub use embedding::EmbeddingProvider;
pub use embedding::EmbeddingRequest;
pub use embedding::EmbeddingResponse;
pub use embedding::EmbeddingUsage;
pub use embedding::EncodingFormat;
pub use embedding::EMBEDDING_MODELS;
pub use error::Error;
pub use error::Result;
pub use failover::FailoverConfig;
pub use failover::FailoverProvider;
pub use failover::FailoverTrigger;
pub use failover::FallbackProvider;
pub use guardrails::Finding;
pub use guardrails::FindingType;
pub use guardrails::GuardedProvider;
pub use guardrails::Guardrails;
pub use guardrails::GuardrailsBuilder;
pub use guardrails::GuardrailsConfig;
pub use guardrails::GuardrailsResult;
pub use guardrails::PiiPattern;
pub use guardrails::PiiType;
pub use guardrails::SecretPattern;
pub use guardrails::SecretType;
pub use guardrails::Severity;
pub use health::DeploymentStatus;
pub use health::HealthCheckResult;
pub use health::HealthCheckType;
pub use health::HealthChecker;
pub use health::HealthCheckerHandle;
pub use health::PoolHealthStatus;
pub use image::get_image_model_info;
pub use image::AsyncImageProvider;
pub use image::GeneratedImage;
pub use image::ImageEditRequest;
pub use image::ImageFormat;
pub use image::ImageGenerationRequest;
pub use image::ImageGenerationResponse;
pub use image::ImageInput;
pub use image::ImageModelInfo;
pub use image::ImageProvider;
pub use image::ImageQuality;
pub use image::ImageSize;
pub use image::ImageStyle;
pub use image::ImageVariationRequest;
pub use image::JobId;
pub use image::JobStatus;
pub use image::IMAGE_MODELS;
pub use metering::CostTracker;
pub use metering::InMemoryMeteringSink;
pub use metering::MeteringProvider;
pub use metering::MeteringSink;
pub use metering::ModelStats;
pub use metering::TenantStats;
pub use metering::UsageFilter;
pub use metering::UsageRecord;
pub use metering::UsageStats;
pub use models::get_all_models;
pub use models::get_available_models;
pub use models::get_cheapest_model;
pub use models::get_classifier_models;
pub use models::get_current_models;
pub use models::get_model_info;
pub use models::get_models_by_provider;
pub use models::get_models_with_capability;
pub use models::get_registry_stats;
pub use models::list_providers;
pub use models::supports_structured_output;
pub use models::ModelBenchmarks;
pub use models::ModelCapabilities;
pub use models::ModelInfo;
pub use models::ModelPricing;
pub use models::ModelStatus;
pub use models::Provider as ProviderKind;
pub use models::RegistryStats;
pub use observability::MetricsRecorder;
pub use observability::MetricsSnapshot;
pub use observability::Observability;
pub use observability::ObservabilityConfig;
pub use observability::RequestSpan;
pub use observability::TracingContext;
pub use pool::DeploymentConfig;
pub use pool::DeploymentHealth;
pub use pool::HealthCheckConfig;
pub use pool::ProviderPool;
pub use pool::ProviderPoolBuilder;
pub use pool::RoutingStrategy;
pub use provider::ModelInfo as ProviderModelInfo;
pub use provider::Provider;
pub use provider::ProviderConfig;
pub use rate_limiter::RateLimiter;
pub use rate_limiter::TokenBucketConfig;
pub use retry::ProviderExt;
pub use retry::RetryConfig;
pub use retry::RetryingProvider;
pub use smart_router::Optimization;
pub use smart_router::ProviderMetrics;
pub use smart_router::RouterProviderConfig;
pub use smart_router::RouterStats;
pub use smart_router::RoutingDecision;
pub use smart_router::SmartRouter;
pub use smart_router::SmartRouterBuilder;
pub use specialized::get_moderation_model_info;
pub use specialized::get_ranking_model_info;
pub use specialized::ClassificationExample;
pub use specialized::ClassificationPrediction;
pub use specialized::ClassificationProvider;
pub use specialized::ClassificationRequest;
pub use specialized::ClassificationResponse;
pub use specialized::ModerationCategories;
pub use specialized::ModerationInput;
pub use specialized::ModerationModelInfo;
pub use specialized::ModerationProvider;
pub use specialized::ModerationRequest;
pub use specialized::ModerationResponse;
pub use specialized::ModerationScores;
pub use specialized::RankedDocument;
pub use specialized::RankingMeta;
pub use specialized::RankingModelInfo;
pub use specialized::RankingProvider;
pub use specialized::RankingRequest;
pub use specialized::RankingResponse;
pub use specialized::MODERATION_MODELS;
pub use specialized::RANKING_MODELS;
pub use stream::collect_stream;
pub use stream::CollectingStream;
pub use streaming_multiplexer::MultiplexedStream;
pub use streaming_multiplexer::MultiplexerStats;
pub use streaming_multiplexer::StreamingMultiplexer;
pub use templates::patterns as template_patterns;
pub use templates::PromptTemplate;
pub use templates::TemplateRegistry;
pub use templates::TemplatedRequestBuilder;
pub use tenant::CostLimitConfig;
pub use tenant::CostLimitExceeded;
pub use tenant::CostLimitType;
pub use tenant::RateLimitConfig;
pub use tenant::RateLimitExceeded;
pub use tenant::RateLimitType;
pub use tenant::TenantConfig;
pub use tenant::TenantError;
pub use tenant::TenantId;
pub use tenant::TenantManager;
pub use tenant::TenantProvider;
pub use tenant::TenantUsageStats;
pub use tools::ToolBuilder;
pub use tools::ToolChoice;
pub use tools::ToolDefinition;
pub use types::BatchError;
pub use types::BatchJob;
pub use types::BatchRequest;
pub use types::BatchRequestCounts;
pub use types::BatchResult;
pub use types::BatchStatus;
pub use types::CompletionRequest;
pub use types::CompletionResponse;
pub use types::ContentBlock;
pub use types::ContentDelta;
pub use types::JsonSchemaDefinition;
pub use types::Message;
pub use types::Role;
pub use types::StopReason;
pub use types::StreamChunk;
pub use types::StreamEventType;
pub use types::StructuredOutput;
pub use types::StructuredOutputType;
pub use types::ThinkingConfig;
pub use types::ThinkingEffort;
pub use types::ThinkingType;
pub use types::TokenCountRequest;
pub use types::TokenCountResult;
pub use types::Usage;
pub use video::get_video_model_info;
pub use video::get_video_models_by_provider;
pub use video::CameraMotion;
pub use video::VideoGenerationRequest;
pub use video::VideoGenerationResponse;
pub use video::VideoInput;
pub use video::VideoJobStatus;
pub use video::VideoModelInfo;
pub use video::VideoProvider;
pub use video::VideoResolution;
pub use video::VIDEO_MODELS;
pub use providers::AnthropicProvider;
pub use providers::OpenAIProvider;
pub use providers::audio::GrokRealtimeProvider;
pub use providers::chat::ChatLawProvider;
pub use providers::chat::LatamGPTProvider;
pub use providers::chat::LightOnProvider;

Modules§

audio
Audio APIs for text-to-speech (TTS) and speech-to-text (STT).
cache
Response caching infrastructure for LLM API calls.
circuit_breaker
Adaptive circuit breaker with real-time anomaly detection.
client
LLMKit client for unified LLM access.
embedding
Embedding API for generating text embeddings across multiple providers.
error
Error types for the LLMKit library.
failover
Failover configuration and logic for provider routing.
guardrails
Content filtering and guardrails for LLM requests/responses.
health
Health checking infrastructure for provider pools.
image
Image generation API for creating images from text prompts.
metering
Cost tracking and usage metering for LLM requests.
models
Model Registry - Database of LLM model specifications.
observability
Built-in observability with OpenTelemetry integration.
pool
Provider pool for load balancing and failover.
provider
Provider trait and related types.
providers
LLM Provider implementations organized by modality.
rate_limiter
Lock-free rate limiter for high-performance request throttling.
retry
Retry logic for LLM provider operations.
smart_router
Adaptive smart router with ML-based provider selection.
specialized
Specialized AI APIs for ranking, moderation, and classification.
stream
Streaming utilities for handling LLM response streams.
streaming_multiplexer
Zero-copy streaming multiplexer for request deduplication.
templates
Prompt template infrastructure for variable substitution.
tenant
Tenant context and multi-tenancy support for LLM requests.
tools
Tool/function definition types for LLM providers.
types
Core types for the LLMKit unified LLM API.
video
Video generation API for creating videos from text prompts and images.