# limit-llm

Multi-provider LLM client for Rust with streaming support.

Unified API for Anthropic Claude, OpenAI, z.ai, and local LLMs with built-in token tracking, state persistence, and automatic model handoff.

Part of the Limit ecosystem.
## Why This Exists
Building AI applications shouldn't require learning different APIs for each LLM provider. limit-llm provides a single, consistent interface that works across Anthropic Claude, OpenAI GPT, z.ai GLM, and local models—so you can switch providers without rewriting code.
## Features

- Multi-provider support: Anthropic Claude, OpenAI GPT, z.ai GLM, and local LLMs (Ollama, LM Studio, vLLM)
- Streaming responses: Async streaming with `futures::Stream` for real-time output
- Token tracking: SQLite-based usage tracking with cost estimation
- State persistence: Serialize/restore conversation state with bincode
- Model handoff: Automatic fallback between providers on failure
- Tool calling: Full function/tool support for all compatible providers
- Thinking mode: Extended reasoning support (Claude, z.ai)
- Type-safe: Full Rust type system with serde integration
## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
limit-llm = "0.0.27"
```

Requirements: Rust 1.70+, tokio runtime

## Quick Start

### Basic Usage

```rust
// (sketch; exact constructors and method names per docs.rs/limit-llm)
use limit_llm::{AnthropicClient, Message};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reads ANTHROPIC_API_KEY from the environment
    let client = AnthropicClient::from_env()?;

    // Stream the response chunk by chunk
    let mut stream = client.send(vec![Message::user("Hello!")]).await?;
    while let Some(chunk) = stream.next().await {
        println!("{:?}", chunk);
    }
    Ok(())
}
```
### With Configuration File

```rust
// (sketch; exact names per docs.rs/limit-llm)
use limit_llm::config;

// Load from ~/.limit/config.toml
let config = config::load()?;

// Create provider from config
let provider = config::from_config(&config)?;

// Use the provider
let stream = provider.send(messages).await?;
```
## Providers

| Provider | Client | Streaming | Tools | Thinking |
|---|---|---|---|---|
| Anthropic Claude | `AnthropicClient` | ✓ | ✓ | ✓ |
| OpenAI GPT | `OpenAiProvider` | ✓ | ✓ | — |
| z.ai GLM | `ZaiProvider` | ✓ | ✓ | ✓ |
| Local/Ollama | `LocalProvider` | ✓ | — | — |
### Provider Configuration

```toml
# ~/.limit/config.toml
# (key names illustrative; see docs.rs/limit-llm)
provider = "anthropic"

[anthropic]
model = "claude-sonnet-4-6-20260217"
max_tokens = 4096
timeout = 60
```
### Environment Variables

| Variable | Provider |
|---|---|
| `ANTHROPIC_API_KEY` | Anthropic Claude |
| `OPENAI_API_KEY` | OpenAI |
| `ZAI_API_KEY` | z.ai |
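Clients read the relevant key from the environment before use; for example (the key value below is a placeholder):

```shell
# Placeholder value — substitute your real Anthropic key
export ANTHROPIC_API_KEY="sk-ant-placeholder"
```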
## Tool Calling

```rust
// (sketch; exact constructors and signatures per docs.rs/limit-llm)
use limit_llm::{AnthropicClient, Message, Tool};
use serde_json::json;

let tools = vec![Tool::new(
    "get_weather",
    "Get the current weather for a location",
    json!({
        "type": "object",
        "properties": { "location": { "type": "string" } },
        "required": ["location"]
    }),
)];

let messages = vec![Message::user("What's the weather in Paris?")];

let client = AnthropicClient::from_env()?;
let stream = client.send(messages, tools).await?;
```
## Token Tracking

```rust
// (sketch; exact names per docs.rs/limit-llm)
use limit_llm::TrackingDb;

let tracking = TrackingDb::new("usage.db")?;

// Record usage (automatically done by clients)
tracking.record_usage(&usage)?;

// Get statistics
let stats = tracking.get_stats()?;
println!("{:?}", stats);
```
## State Persistence

```rust
// (sketch; exact names per docs.rs/limit-llm)
use limit_llm::StatePersistence;

let persistence = StatePersistence::new("conversation.bin")?;

// Save conversation
persistence.save(&messages)?;

// Restore later
let restored = persistence.restore()?;
```
## Model Handoff

Automatic fallback between providers:

```rust
// (sketch; exact names per docs.rs/limit-llm)
use limit_llm::ModelHandoff;

let handoff = ModelHandoff::new()
    .with_primary(anthropic)
    .with_fallback(openai)
    .with_fallback(local);

// Automatically falls back if primary fails
let response = handoff.complete(messages).await?;
```
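Under the hood, handoff is ordered fallback: try each provider in turn and return the first success. A crate-agnostic sketch of the pattern (none of the names below come from limit-llm):

```rust
// Ordered fallback: call each "provider" in order, return the first
// success, or the last error if all fail (None for an empty list).
fn complete_with_fallback<T, E>(
    providers: &[&dyn Fn() -> Result<T, E>],
) -> Option<Result<T, E>> {
    let mut last = None;
    for provider in providers {
        match provider() {
            // Primary (or an earlier fallback) succeeded — stop here
            Ok(v) => return Some(Ok(v)),
            // Remember the failure and try the next provider
            Err(e) => last = Some(Err(e)),
        }
    }
    last
}

fn main() {
    let primary = || Err::<&str, &str>("rate limited");
    let fallback = || Ok::<&str, &str>("hello from fallback");
    let result = complete_with_fallback(&[&primary, &fallback]);
    assert_eq!(result, Some(Ok("hello from fallback")));
}
```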
## Core Types

| Type | Description |
|---|---|
| `Message` | Chat message with role, content, and tool calls |
| `Role` | User, Assistant, System, or Tool |
| `Tool` / `ToolCall` | Function calling definitions |
| `Usage` | Token counting for prompt/completion |
| `Response` | Complete response with content and metadata |
## API Reference

See [docs.rs/limit-llm](https://docs.rs/limit-llm) for full API documentation.
## Examples

```sh
# Run examples
cargo run --example <name>
```
## License

MIT © Mário Idival