LLMKit
The production-grade LLM client. One API for 100+ providers. Pure Rust core with native bindings.
11,000+ models · 100+ providers · Rust | Python | Node.js
┌──────────────┐
│ Rust Core │
└──────┬───────┘
┌──────────┬─────────┼─────────┬──────────┐
▼ ▼ ▼ ▼ ▼
┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
│Python │ │ Node │ │ WASM │ │ Go │ │ ... │
│ ✅ │ │ ✅ │ │ Soon │ │ Soon │ │ │
└───────┘ └───────┘ └───────┘ └───────┘ └───────┘
Documentation · Changelog · Contributing
Why LLMKit?
Built for Production
LLMKit is written in pure Rust — no Python runtime, no garbage collector, no GC pauses. Deploy with confidence knowing your LLM infrastructure won't degrade over time or crash under load.
- Memory Safety — Rust's ownership model rules out whole classes of memory bugs by design
- True Concurrency — No GIL. Handle thousands of concurrent streams efficiently (see the sketch after this list)
- Minimal Footprint — Native binary, not a 150MB Python package
- Run Forever — No worker restarts, no memory bloat, no surprises
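To make the concurrency point concrete, here is a minimal sketch of fanning out many completions from a single client. It assumes LLMKitClient is cheaply cloneable (typical for Rust API clients) and reuses the request types from the Quick Start below; treat the names as illustrative rather than the exact API.

use futures::future::try_join_all;
use llmkit::{CompletionRequest, LLMKitClient, Message};

// Sketch: run N completions concurrently on one client handle.
// Assumes LLMKitClient is Clone, as is typical for Rust API clients.
async fn fan_out(client: &LLMKitClient, prompts: Vec<String>) -> anyhow::Result<Vec<String>> {
    let tasks = prompts.into_iter().map(|prompt| {
        let client = client.clone();
        async move {
            let request = CompletionRequest::new(
                "anthropic/claude-sonnet-4-20250514",
                vec![Message::user(&prompt)],
            );
            let response = client.complete(request).await?;
            Ok::<_, anyhow::Error>(response.text_content())
        }
    });
    try_join_all(tasks).await // no GIL: all requests progress concurrently
}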
Features That Actually Work
- Prompt Caching — Native support for Anthropic, OpenAI, Google, DeepSeek. Save up to 90% on API costs
- Extended Thinking — Unified API for reasoning across 5 providers (Anthropic, OpenAI, Google, DeepSeek, OpenRouter)
- Streaming — Zero-copy streaming with automatic request deduplication
- 11,000+ Model Registry — Pricing, context limits, and capabilities baked in. No external API calls
Production Features
| Feature | Description |
|---|---|
| Smart Router | ML-based provider selection optimizing for latency, cost, or reliability |
| Circuit Breaker | Automatic failure detection and recovery with anomaly detection |
| Rate Limiting | Lock-free, hierarchical rate limiting at scale |
| Cost Tracking | Multi-tenant metering with cache-aware pricing |
| Guardrails | PII detection, secret scanning, prompt injection prevention |
| Observability | OpenTelemetry integration for tracing and metrics |
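As a rough sketch of how these pieces might compose at the call site, a builder-style configuration could look like the following. The builder and type names are assumptions for illustration, not the shipped API; see the documentation for the real configuration surface.

// Hypothetical configuration sketch - builder and type names are assumptions.
let client = LLMKitClient::builder()
    .routing(RoutingPolicy::OptimizeCost)             // Smart Router: prefer the cheapest healthy provider
    .circuit_breaker(CircuitBreakerConfig::default()) // trip and recover on provider failures
    .rate_limit(RateLimit::per_minute(600))           // lock-free hierarchical limiting
    .build()?;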
Quick Start
Rust
// Names mirror the Node.js example below
use llmkit::{CompletionRequest, LLMKitClient, Message};

let client = LLMKitClient::from_env()?;
let request = CompletionRequest::new("anthropic/claude-sonnet-4-20250514", vec![Message::user("Hello!")]);
let response = client.complete(request).await?;
println!("{}", response.text_content());
Python
from llmkit import CompletionRequest, LLMKitClient, Message

client = LLMKitClient.from_env()
response = client.complete(CompletionRequest("anthropic/claude-sonnet-4-20250514", [Message.user("Hello!")]))
print(response.text_content())
Node.js
import { LLMKitClient, Message, CompletionRequest } from 'llmkit-node'
const client = LLMKitClient.fromEnv()
const response = await client.complete(
new CompletionRequest('anthropic/claude-sonnet-4-20250514', [Message.user('Hello!')])
)
console.log(response.textContent())
Installation
Rust
[dependencies]
llmkit = { version = "0.1", features = ["anthropic", "openai"] }
Python
pip install llmkit  # PyPI package name assumed from the crate name
Node.js
npm install llmkit-node
Features
| Chat | Media | Specialized |
|---|---|---|
| Streaming | Image Generation | Embeddings |
| Tool Calling | Vision/Images | Token Counting |
| Structured Output | Audio STT/TTS | Batch Processing |
| Extended Thinking | Video Generation | Model Registry |
| Prompt Caching | | 11,000+ Models |
Providers
| Category | Providers |
|---|---|
| Core | Anthropic, OpenAI, Azure OpenAI |
| Cloud | AWS Bedrock, Google Vertex AI, Google AI |
| Fast Inference | Groq, Mistral, Cerebras, SambaNova, Fireworks, DeepSeek |
| Enterprise | Cohere, AI21 |
| Hosted | Together, Perplexity, DeepInfra, OpenRouter |
| Local | Ollama, LM Studio, vLLM |
| Audio | Deepgram, ElevenLabs |
| Video | Runware |
See PROVIDERS.md for the full list with environment variables.
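Because the provider is encoded in the model ID's prefix, switching providers is usually just a change to the model string; the request shape stays the same. The model IDs below are illustrative:

// Same request shape across providers; only the model string changes.
let hosted = CompletionRequest::new("anthropic/claude-sonnet-4-20250514", messages.clone());
let fast   = CompletionRequest::new("groq/llama-3.3-70b-versatile", messages.clone());
let local  = CompletionRequest::new("ollama/llama3", messages);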
Examples
Streaming
let mut stream = client.complete_stream(request).await?;
while let Some(chunk) = stream.next().await {
    print!("{}", chunk?.text_delta()); // chunk accessor name assumed
}
Tool Calling
# Builder-method names assumed; get_weather_tool is a Tool defined elsewhere
request = CompletionRequest("anthropic/claude-sonnet-4-20250514",
                            [Message.user("What's the weather in Paris?")]) \
    .with_tools([get_weather_tool]) \
    .with_tool_choice("auto")
response = client.complete(request)
Prompt Caching
# Cache large system prompts - save up to 90% on repeated calls (cache-control method name assumed)
request = CompletionRequest("anthropic/claude-sonnet-4-20250514",
                            [Message.system(big_system_prompt).with_cache_control(),
                             Message.user("Summarize section 4")])
Extended Thinking
// Unified reasoning API across providers
const request = new CompletionRequest('anthropic/claude-sonnet-4-20250514', messages)
.withThinking({ budgetTokens: 10000 })
const response = await client.complete(request)
console.log(response.thinkingContent()) // See the reasoning process
console.log(response.textContent()) // Final answer
Model Registry
# Get model details - no API calls, instant lookup (registry accessor names assumed)
info = ModelRegistry.get("anthropic/claude-sonnet-4-20250514")
print(info.context_window, info.input_price_per_token)

# Find models by provider
anthropic_models = ModelRegistry.by_provider("anthropic")
For more examples, see examples/.
Documentation
- Getting Started (Rust)
- Getting Started (Python)
- Getting Started (Node.js)
- Model Registry — 11,000+ models with pricing
Building from Source
# Python bindings (paths and tooling assumed: maturin for the PyO3 build)
cd bindings/python && maturin develop

# Node.js bindings
cd bindings/node && npm install && npm run build
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
License
Dual-licensed under MIT or Apache-2.0.
Built with Rust · Production Ready · GitHub