llm-connector
Next-generation Rust library for LLM protocol abstraction with native multi-modal support.
Supports 11+ providers: OpenAI, Anthropic, Aliyun, Zhipu, Ollama, Tencent, Volcengine, LongCat, Moonshot, DeepSeek, and more. Clean architecture with unified output format, multi-modal content support, and configuration-driven design for maximum flexibility.
Key Features
- Multi-modal Content Support: Native support for text + images in a single message (v0.5.0+)
- Function Calling / Tools: Full support for OpenAI-compatible function calling with streaming
- Reasoning Models Support: Universal support for reasoning models (Volcengine Doubao-Seed-Code, DeepSeek R1, OpenAI o1, etc.)
- 11+ Provider Support: OpenAI, Anthropic, Aliyun, Zhipu, Ollama, Tencent, Volcengine, LongCat, Moonshot, DeepSeek, and more
- Unified Output Format: All providers return the same StreamingResponse type
- Configuration-Driven Architecture: Clean Protocol/Provider separation with flexible configuration
- Extreme Performance: 7,000x+ faster client creation (7µs vs 53ms)
- Memory Efficient: Only 16 bytes per client instance
- Type-Safe: Full Rust type safety with Result-based error handling
- No Hardcoded Models: Use any model name without restrictions
- Online Model Discovery: Fetch available models dynamically from API
- Universal Streaming: Real-time streaming with format abstraction
- Ollama Model Management: Full CRUD operations for local models
Quick Start
Installation
Add to your Cargo.toml:
[dependencies]
llm-connector = "0.5.4"
tokio = { version = "1", features = ["full"] }
Optional features:
# Streaming support
llm-connector = { version = "0.5.4", features = ["streaming"] }
Basic Usage
use ;
async
Multi-modal Content (v0.5.0+)
Send text and images in a single message:
use ;
async
Supported Content Types:
- MessageBlock::text(text) - Text content
- MessageBlock::image_url(url) - Image from URL (OpenAI format)
- MessageBlock::image_base64(media_type, data) - Base64 encoded image
- MessageBlock::image_url_anthropic(url) - Image from URL (Anthropic format)
Provider Support:
- OpenAI - Full support (text + images)
- Anthropic - Full support (text + images)
- Other providers - Text only (images converted to text description)
See examples/multimodal_basic.rs for more examples.
List Supported Providers
Get a list of all supported provider names:
use LlmClient;
Output:
openai
aliyun
anthropic
zhipu
ollama
tencent
volcengine
longcat_anthropic
azure_openai
openai_compatible
See examples/list_providers.rs for a complete example.
Supported Providers
llm-connector supports 11+ LLM providers with a unified interface:
| Provider | Quick Start | Features |
|---|---|---|
| OpenAI | LlmClient::openai("sk-...") | Chat, Streaming, Tools, Multi-modal, Reasoning (o1) |
| Anthropic | LlmClient::anthropic("sk-ant-...") | Chat, Streaming, Multi-modal |
| Aliyun | LlmClient::aliyun("sk-...") | Chat, Streaming, Qwen models |
| Zhipu | LlmClient::zhipu("key") | Chat, Streaming, Tools, GLM models |
| Ollama | LlmClient::ollama() | Chat, Streaming, Local models, Model management |
| Tencent | LlmClient::tencent("key") | Chat, Streaming, Hunyuan models |
| Volcengine | LlmClient::volcengine("key") | Chat, Streaming, Reasoning (Doubao-Seed-Code) |
| DeepSeek | LlmClient::deepseek("sk-...") | Chat, Streaming, Reasoning (R1) |
| Moonshot | LlmClient::moonshot("sk-...") | Chat, Streaming, Long context |
| LongCat | LlmClient::longcat_openai("ak-...") | Chat, Streaming |
For detailed provider documentation and advanced configuration, see:
- Detailed Protocol Information below
- Provider Guides for provider-specific features
Function Calling / Tools
llm-connector supports OpenAI-compatible function calling (tools) with both streaming and non-streaming modes.
Basic Usage
use ;
async
Streaming Tool Calls
use StreamExt;
let request = ChatRequest ;
let mut stream = client.chat_stream.await?;
while let Some = stream.next.await
Key Features:
- Automatic deduplication of streaming tool_calls
- Incremental accumulation support
- Compatible with OpenAI streaming format
- Works with Zhipu, OpenAI, and other compatible providers
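The accumulation and deduplication listed above can be pictured with a small, std-only sketch. It is not the library's implementation: `ToolCallDelta`, `ToolCall`, and `ToolCallAccumulator` are hypothetical names, and the only assumption taken from the OpenAI streaming format is that fragments sharing an `index` belong to the same call.

```rust
use std::collections::BTreeMap;

/// One streamed fragment of a tool call. Hypothetical type: fragments that
/// share an `index` belong to the same call, as in the OpenAI streaming format.
#[derive(Debug, Clone)]
struct ToolCallDelta {
    index: usize,
    id: Option<String>,
    name: Option<String>,
    arguments: String,
}

/// A fully assembled tool call.
#[derive(Debug, Clone, Default, PartialEq)]
struct ToolCall {
    id: String,
    name: String,
    arguments: String,
}

/// Merges fragments by index and hands each assembled call out exactly once.
#[derive(Debug, Default)]
struct ToolCallAccumulator {
    pending: BTreeMap<usize, ToolCall>,
}

impl ToolCallAccumulator {
    /// Fold one fragment into the call it belongs to.
    fn push(&mut self, delta: ToolCallDelta) {
        let call = self.pending.entry(delta.index).or_default();
        if let Some(id) = delta.id {
            call.id = id;
        }
        if let Some(name) = delta.name {
            call.name = name;
        }
        call.arguments.push_str(&delta.arguments);
    }

    /// Drain the assembled calls; calling this again yields nothing, so a
    /// caller cannot execute the same tool call twice.
    fn finish(&mut self) -> Vec<ToolCall> {
        std::mem::take(&mut self.pending).into_values().collect()
    }
}

/// Sample fragments for one call whose JSON arguments arrive in two parts.
fn sample_deltas() -> Vec<ToolCallDelta> {
    vec![
        ToolCallDelta {
            index: 0,
            id: Some("call_1".to_string()),
            name: Some("get_weather".to_string()),
            arguments: "{\"city\":".to_string(),
        },
        ToolCallDelta {
            index: 0,
            id: None,
            name: None,
            arguments: "\"Paris\"}".to_string(),
        },
    ]
}

fn main() {
    let mut acc = ToolCallAccumulator::default();
    for delta in sample_deltas() {
        acc.push(delta);
    }
    for call in acc.finish() {
        println!("{} {}({})", call.id, call.name, call.arguments);
    }
}
```

Draining in `finish` is one simple way to get the "sent only once" property; a real stream handler would typically call it when the provider signals the end of the tool-call turn.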
For complete examples, see:
- examples/zhipu_tools.rs - Basic tool calling
- examples/zhipu_multiround_tools.rs - Multi-round conversations with tools
- examples/test_aliyun_streaming_tools.rs - Streaming tool calls
Technical Details: See docs/STREAMING_TOOL_CALLS.md for implementation details.
Streaming
llm-connector provides unified streaming support across all providers with the streaming feature.
Enable Streaming
[dependencies]
llm-connector = { version = "0.5.4", features = ["streaming"] }
Basic Streaming
use ;
use StreamExt;
async
All providers return the same StreamingResponse type, making it easy to switch between providers without changing your code.
For examples, see:
- examples/anthropic_streaming.rs
- examples/ollama_streaming.rs
- examples/volcengine_streaming.rs
Supported Protocols
1. OpenAI Protocol
Standard OpenAI API format with multiple deployment options.
// OpenAI (default)
let client = LlmClient::openai("sk-...")?;
// Custom base URL
let client = openai_with_base_url?;
// Azure OpenAI
let client = azure_openai?;
// OpenAI-compatible services
let client = openai_compatible?;
Features:
- No hardcoded models - use any model name
- Online model discovery via models()
- Azure OpenAI support
- Works with OpenAI-compatible providers (DeepSeek, Moonshot, etc.)
Example Models: gpt-4, gpt-4-turbo, gpt-3.5-turbo, o1-preview, o1-mini
2. Anthropic Protocol
Claude Messages API with multiple deployment options.
// Standard Anthropic API
let client = LlmClient::anthropic("sk-ant-...")?;
// Google Vertex AI
let client = anthropic_vertex?;
// Amazon Bedrock
let client = anthropic_bedrock?;
Models: claude-3-5-sonnet-20241022, claude-3-opus, claude-3-haiku
3. Zhipu Protocol (ChatGLM)
Supports both native and OpenAI-compatible formats.
// Native format
let client = LlmClient::zhipu("key")?;
// OpenAI-compatible format (recommended)
let client = zhipu_openai_compatible?;
Models: glm-4, glm-4-flash, glm-4-air, glm-4-plus, glm-4x
4. Aliyun Protocol (DashScope)
Custom protocol for Qwen models with regional support.
// Default (China)
let client = LlmClient::aliyun("sk-...")?;
// International
let client = aliyun_international?;
// Private cloud
let client = aliyun_private?;
Models: qwen-turbo, qwen-plus, qwen-max
5. Ollama Protocol (Local)
Local LLM server with comprehensive model management.
// Default: localhost:11434
let client = LlmClient::ollama()?;
// Custom URL
let client = ollama_with_base_url?;
// With custom configuration
let client = ollama_with_config?;
Models: llama3.2, llama3.1, mistral, mixtral, qwen2.5, etc.
Features:
- Model listing and management
- Pull, delete, and inspect models
- Local server support with custom URLs
- Enhanced error handling for Ollama-specific operations
- Direct access to Ollama-specific features
6. Tencent Hunyuan
OpenAI-compatible API for Tencent Cloud.
// Default
let client = LlmClient::tencent("key")?;
// With custom configuration
let client = tencent_with_config?;
Models: hunyuan-lite, hunyuan-standard, hunyuan-pro, hunyuan-turbo
7. Volcengine
OpenAI-compatible API with custom endpoint paths. Supports both standard chat models and reasoning models (Doubao-Seed-Code).
// Default
let client = LlmClient::volcengine("key")?;
// With custom configuration
let client = volcengine_with_config?;
// Streaming example (works with both standard and reasoning models)
Endpoint: Uses /api/v3/chat/completions instead of /v1/chat/completions
Models:
- Standard models: Use endpoint ID (e.g., ep-...)
- Reasoning models: Doubao-Seed-Code (outputs via reasoning_content field, automatically handled)
Streaming Support: Full support for both standard and reasoning models. The library automatically extracts content from the appropriate field (content or reasoning_content).
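The endpoint difference noted above amounts to a per-provider path choice. A minimal sketch, assuming a hypothetical helper `chat_completions_url` and placeholder base URLs (neither comes from the library):

```rust
/// Build the chat completions URL for a provider.
/// Hypothetical helper: only the two paths come from the text above.
fn chat_completions_url(base_url: &str, provider: &str) -> String {
    let path = match provider {
        // Volcengine uses its own path prefix.
        "volcengine" => "/api/v3/chat/completions",
        // Other OpenAI-compatible services use the standard path.
        _ => "/v1/chat/completions",
    };
    // Trim a trailing slash so the join never produces "//".
    format!("{}{}", base_url.trim_end_matches('/'), path)
}

fn main() {
    // The base URLs are placeholders, not real provider endpoints.
    println!("{}", chat_completions_url("https://volcengine.example.com/", "volcengine"));
    println!("{}", chat_completions_url("https://openai.example.com", "openai"));
}
```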
8. LongCat API
Supports both OpenAI and Anthropic formats.
// OpenAI format
let client = LlmClient::longcat_openai("ak-...")?;
// Anthropic format (with Bearer auth)
let client = longcat_anthropic?;
Models: LongCat-Flash-Chat and other LongCat models
Note: LongCat's Anthropic format uses Authorization: Bearer instead of x-api-key
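The authentication difference in the note can be sketched as a choice between two header styles. `AuthStyle` and `auth_header` are hypothetical names used only for illustration; the two header names are the ones mentioned above.

```rust
/// How an Anthropic-format request carries its key (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum AuthStyle {
    /// Standard Anthropic API: key sent in the `x-api-key` header.
    ApiKeyHeader,
    /// LongCat's Anthropic format: key sent as a Bearer token.
    Bearer,
}

/// Return the (header name, header value) pair for the given style.
fn auth_header(style: AuthStyle, key: &str) -> (&'static str, String) {
    match style {
        AuthStyle::ApiKeyHeader => ("x-api-key", key.to_string()),
        AuthStyle::Bearer => ("Authorization", format!("Bearer {}", key)),
    }
}

fn main() {
    for style in [AuthStyle::ApiKeyHeader, AuthStyle::Bearer] {
        let (name, value) = auth_header(style, "ak-test");
        println!("{}: {}", name, value);
    }
}
```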
9. Moonshot
OpenAI-compatible API for Moonshot AI.
// Default
let client = LlmClient::moonshot("sk-...")?;
// With custom configuration
let client = moonshot_with_config?;
Models: moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k
Features:
- OpenAI-compatible API format
- Long context support (up to 128k tokens)
- Streaming support
- Unified output format
10. DeepSeek
OpenAI-compatible API with reasoning models support.
// Default
let client = LlmClient::deepseek("sk-...")?;
// With custom configuration
let client = deepseek_with_config?;
Models:
- deepseek-chat - Standard chat model
- deepseek-reasoner - Reasoning model with thinking process
Features:
- OpenAI-compatible API format
- Reasoning content support (thinking process)
- Streaming support
- Unified output format
- Automatic extraction of reasoning content
Reasoning Model Example:
let request = ChatRequest ;
let response = client.chat.await?;
// Get reasoning process (thinking)
if let Some = response.reasoning_content
// Get final answer
println!;
Ollama Model Management
Access Ollama-specific features through the special interface:
let client = LlmClient::ollama()?;
// Access Ollama-specific features
if let Some = client.as_ollama
Supported Ollama Operations
- List Models: models() - Get all locally installed models
- Pull Models: pull_model(name) - Download models from registry
- Delete Models: delete_model(name) - Remove local models
- Show Details: show_model(name) - Get comprehensive model information
- Check Existence: model_exists(name) - Verify if model is installed
Universal Streaming Format Support
The library provides comprehensive streaming support with universal format abstraction for maximum flexibility:
Standard OpenAI Format (Default)
use StreamExt;
use ;
let client = LlmClient::anthropic("sk-ant-...")?;
let request = ChatRequest ;
let mut stream = client.chat_stream.await?;
while let Some = stream.next.await
Pure Ollama Format for Tool Integration
For perfect compatibility with tools like Zed.dev, use the pure Ollama streaming format:
use StreamExt;
// Use pure Ollama format (perfect for Zed.dev)
let mut stream = client.chat_stream_ollama.await?;
while let Some = stream.next.await
Legacy Ollama Format (Embedded)
For backward compatibility, the embedded format is still available:
use StreamExt;
// Use embedded Ollama format (legacy)
let mut stream = client.chat_stream_ollama_embedded.await?;
while let Some = stream.next.await
Streaming Chat Completions
For real-time streaming responses, use the streaming interface:
use ;
use StreamExt;
let request = ChatRequest ;
let mut stream = client.chat_stream.await?;
while let Some = stream.next.await
Advanced Streaming Features
The streaming response provides rich information and convenience methods:
let mut stream = client.chat_stream.await?;
while let Some = stream.next.await
Format Comparison
| Format | Output Example | Use Case |
|---|---|---|
| JSON | {"content":"hello"} | API responses, standard JSON |
| SSE | data: {"content":"hello"}\n\n | Web real-time streaming |
| NDJSON | {"content":"hello"}\n | Log processing, data pipelines |
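The three formats differ only in how a JSON payload is framed on the wire. The sketch below reproduces the table's output examples; `StreamFormat` and `frame` are hypothetical names, not part of the library's API.

```rust
/// Wire framing for one JSON payload (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy)]
enum StreamFormat {
    Json,
    Sse,
    Ndjson,
}

/// Frame an already-serialized JSON payload in the requested format.
fn frame(payload: &str, format: StreamFormat) -> String {
    match format {
        // Plain JSON: the payload as-is.
        StreamFormat::Json => payload.to_string(),
        // Server-sent events: a `data:` field terminated by a blank line.
        StreamFormat::Sse => format!("data: {}\n\n", payload),
        // Newline-delimited JSON: one payload per line.
        StreamFormat::Ndjson => format!("{}\n", payload),
    }
}

fn main() {
    let payload = r#"{"content":"hello"}"#;
    for format in [StreamFormat::Json, StreamFormat::Sse, StreamFormat::Ndjson] {
        // `{:?}` on the framed string makes the trailing newlines visible.
        println!("{:?} -> {:?}", format, frame(payload, format));
    }
}
```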
Enhanced Anthropic Streaming Features
- State Management: Proper handling of message_start, content_block_delta, message_delta, message_stop events
- Event Processing: Correct parsing of complex Anthropic streaming responses
- Usage Tracking: Real-time token usage statistics during streaming
- Error Resilience: Robust error handling for streaming interruptions
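To illustrate the kind of state tracking this involves, here is a simplified, std-only sketch that folds the four event types into one running state. The `Event` and `StreamState` types are hypothetical and carry only a few fields; the library's actual parser is not shown here.

```rust
/// Simplified view of the four Anthropic stream events named above
/// (hypothetical type; real events carry more fields).
#[derive(Debug)]
enum Event {
    MessageStart { input_tokens: u32 },
    ContentBlockDelta { text: String },
    MessageDelta { output_tokens: u32 },
    MessageStop,
}

/// State carried across events while a response streams in.
#[derive(Debug, Default)]
struct StreamState {
    text: String,
    input_tokens: u32,
    output_tokens: u32,
    finished: bool,
}

impl StreamState {
    /// Update the state with one event.
    fn apply(&mut self, event: Event) {
        match event {
            Event::MessageStart { input_tokens } => self.input_tokens = input_tokens,
            Event::ContentBlockDelta { text } => self.text.push_str(&text),
            Event::MessageDelta { output_tokens } => self.output_tokens = output_tokens,
            Event::MessageStop => self.finished = true,
        }
    }
}

/// Replay a sequence of events and return the final state.
fn replay(events: Vec<Event>) -> StreamState {
    let mut state = StreamState::default();
    for event in events {
        state.apply(event);
    }
    state
}

fn main() {
    let state = replay(vec![
        Event::MessageStart { input_tokens: 12 },
        Event::ContentBlockDelta { text: "Hel".to_string() },
        Event::ContentBlockDelta { text: "lo".to_string() },
        Event::MessageDelta { output_tokens: 2 },
        Event::MessageStop,
    ]);
    println!("{:?}", state);
}
```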
Model Discovery
Fetch the latest available models from the API:
let client = LlmClient::openai("sk-...")?;
// Fetch models online from the API
let models = client.models().await?;
println!;
Supported by:
- OpenAI Protocol (including OpenAI-compatible providers like DeepSeek, Zhipu, Moonshot)
- Anthropic Protocol (limited support - returns fallback endpoint)
- Ollama Protocol (full support via /api/tags)
- Aliyun Protocol (not supported)
Example Results:
- DeepSeek: ["deepseek-chat", "deepseek-reasoner"]
- Zhipu: ["glm-4.5", "glm-4.5-air", "glm-4.6"]
- Moonshot: ["moonshot-v1-32k", "kimi-latest", ...]
Recommendation:
- Cache models() results to avoid repeated API calls
- For protocols that don't support model listing, you can use any model name directly in your requests
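One way to follow the caching recommendation is a small time-to-live cache in front of the model listing call. This is a sketch under assumptions: `ModelCache` is a hypothetical helper, and the `fetch` closure is a synchronous stand-in for the real, asynchronous `models()` call.

```rust
use std::time::{Duration, Instant};

/// Caches a model list for a fixed time-to-live (hypothetical helper).
struct ModelCache {
    ttl: Duration,
    entry: Option<(Instant, Vec<String>)>,
}

impl ModelCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entry: None }
    }

    /// Return the cached list, calling `fetch` only when the cache is
    /// empty or older than the TTL.
    fn get_or_fetch<F>(&mut self, fetch: F) -> &[String]
    where
        F: FnOnce() -> Vec<String>,
    {
        let fresh = matches!(&self.entry, Some((at, _)) if at.elapsed() < self.ttl);
        if !fresh {
            self.entry = Some((Instant::now(), fetch()));
        }
        &self.entry.as_ref().expect("entry was just populated").1
    }
}

fn main() {
    let mut cache = ModelCache::new(Duration::from_secs(300));
    // Stand-in for a real, asynchronous model listing call.
    let fetch = || vec!["deepseek-chat".to_string(), "deepseek-reasoner".to_string()];
    println!("{:?}", cache.get_or_fetch(fetch));
    // Served from the cache: this closure is never invoked.
    println!("{:?}", cache.get_or_fetch(Vec::new));
}
```

In async code the same idea applies, with the fetch awaited before the entry is stored; the TTL value is a judgment call that depends on how often a provider changes its model list.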
Request Examples
OpenAI / OpenAI-compatible
let request = ChatRequest ;
Anthropic (requires max_tokens)
let request = ChatRequest ;
Aliyun (DashScope)
let request = ChatRequest ;
Ollama (Local)
let request = ChatRequest ;
Ollama Streaming (GLM-4.6 via Remote Gateway)
If you expose an Ollama-compatible API while the backend actually calls Zhipu glm-4.6 (remote gateway), you do NOT need any local model installation. Just point the client to your gateway and use the model id defined by your service:
use StreamExt;
use ;
async
Run example (requires streaming feature):
Note: This setup targets a remote Ollama-compatible gateway. The model id is defined by your backend (e.g. glm-4.6); no local installation is required. If your gateway uses a different identifier, replace it accordingly.
Reasoning Models Support
llm-connector provides universal support for reasoning models across different providers. No matter which field the reasoning content is in (reasoning_content, reasoning, thought, thinking), it's automatically extracted and available via get_content().
Supported Reasoning Models
| Provider | Model | Reasoning Field | Status |
|---|---|---|---|
| Volcengine | Doubao-Seed-Code | reasoning_content | Verified |
| DeepSeek | DeepSeek R1 | reasoning_content / reasoning | Supported |
| OpenAI | o1-preview, o1-mini | thought / reasoning_content | Supported |
| Qwen | Qwen-Plus | reasoning | Supported |
| Anthropic | Claude 3.5 Sonnet | thinking | Supported |
Usage Example
The same code works for all reasoning models:
use StreamExt;
// Works with Volcengine Doubao-Seed-Code
let provider = volcengine_with_config?;
// Works with DeepSeek R1
// let provider = openai_with_config("deepseek-key", Some("https://api.deepseek.com"), None, None)?;
// Works with OpenAI o1
// let provider = openai("openai-key")?;
let request = ChatRequest ;
let mut stream = provider.chat_stream.await?;
while let Some = stream.next.await
Key Benefits:
- Zero Configuration: Automatic field detection
- Unified Interface: Same code for all reasoning models
- Backward Compatible: Standard models (GPT-4, Claude) work as before
- Priority-Based: Standard content field takes precedence when available
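The priority rule can be sketched as a fallback chain: use `content` when it is present, otherwise the first reasoning field that is. The `Delta` struct and `visible_text` function below are hypothetical stand-ins, and both the order of the reasoning fields and the treatment of an empty `content` as absent are assumptions.

```rust
/// Stand-in for a streaming delta carrying the fields named in this section
/// (hypothetical struct, not the library's type).
#[derive(Debug, Default, Clone)]
struct Delta {
    content: Option<String>,
    reasoning_content: Option<String>,
    reasoning: Option<String>,
    thought: Option<String>,
    thinking: Option<String>,
}

/// Pick the text to show: non-empty `content` wins, then the first
/// reasoning field that is set (field order is an assumption).
fn visible_text(delta: &Delta) -> Option<&str> {
    delta
        .content
        .as_deref()
        .filter(|s| !s.is_empty())
        .or(delta.reasoning_content.as_deref())
        .or(delta.reasoning.as_deref())
        .or(delta.thought.as_deref())
        .or(delta.thinking.as_deref())
}

fn main() {
    let standard = Delta {
        content: Some("answer".to_string()),
        reasoning_content: Some("scratch work".to_string()),
        ..Default::default()
    };
    let reasoning_only = Delta {
        thinking: Some("step 1".to_string()),
        ..Default::default()
    };
    println!("{:?}", visible_text(&standard));
    println!("{:?}", visible_text(&reasoning_only));
}
```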
See Reasoning Models Support Guide for detailed documentation.
Error Handling
use LlmConnectorError;
match client.chat.await
Configuration
Simple API Key (Recommended)
let client = openai;
Environment Variables
use env;
let api_key = var?;
let client = openai;
Protocol Information
let client = LlmClient::openai("sk-...")?;
// Get provider name
println!;
// Fetch models online (requires API call)
let models = client.models().await?;
println!;
Reasoning Synonyms
Many providers return hidden or provider-specific keys for model reasoning content (chain-of-thought). To simplify usage across providers, we normalize four common keys:
reasoning_content,reasoning,thought,thinking
Post-processing automatically scans raw JSON and fills these optional fields on both regular messages (Message) and streaming deltas (Delta). You can read the first available value via a convenience method:
// Non-streaming
let msg = &response.choices.message;
if let Some = msg.reasoning_any
// Streaming
while let Some = stream.next.await
Notes:
- Fields remain None if the provider does not return any reasoning keys.
- The normalization is provider-agnostic and applied uniformly to OpenAI, Anthropic, Aliyun (Qwen), Zhipu (GLM), and DeepSeek flows (including streaming).
- StreamingResponse also backfills its top-level reasoning_content from the first delta that contains reasoning.
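The backfill described in the last note can be sketched with simplified stand-in types. `Delta`, `Chunk`, and `backfill` are hypothetical; `reasoning_any` borrows its name from the convenience method mentioned above, but the body here is only an assumption about how "first available value" might be resolved.

```rust
/// Stand-in for a streaming delta with the four normalized keys.
#[derive(Debug, Default, Clone)]
struct Delta {
    reasoning_content: Option<String>,
    reasoning: Option<String>,
    thought: Option<String>,
    thinking: Option<String>,
}

impl Delta {
    /// First reasoning value that is set (key order is an assumption).
    fn reasoning_any(&self) -> Option<&str> {
        self.reasoning_content
            .as_deref()
            .or(self.reasoning.as_deref())
            .or(self.thought.as_deref())
            .or(self.thinking.as_deref())
    }
}

/// Stand-in for one streaming chunk with a top-level reasoning field.
#[derive(Debug, Default)]
struct Chunk {
    deltas: Vec<Delta>,
    reasoning_content: Option<String>,
}

/// Fill the top-level field from the first delta that carries reasoning,
/// leaving an existing value untouched.
fn backfill(chunk: &mut Chunk) {
    if chunk.reasoning_content.is_none() {
        chunk.reasoning_content = chunk
            .deltas
            .iter()
            .find_map(|delta| delta.reasoning_any())
            .map(str::to_owned);
    }
}

fn main() {
    let mut chunk = Chunk {
        deltas: vec![
            Delta::default(),
            Delta { thought: Some("first".to_string()), ..Default::default() },
            Delta { reasoning: Some("second".to_string()), ..Default::default() },
        ],
        reasoning_content: None,
    };
    backfill(&mut chunk);
    println!("{:?}", chunk.reasoning_content);
}
```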
Unified Output Format
All providers output the same unified StreamingResponse format, regardless of their native API format.
Different Input Formats → Protocol Conversion → Unified StreamingResponse
Why This Matters
- Consistent API - Same code works with all providers
- Easy Switching - Change providers without changing business logic
- Type Safety - Compile-time guarantees across all providers
- Lower Learning Curve - Learn once, use everywhere
Example
// Same code works with ANY provider
let mut stream = client.chat_stream.await?;
while let Some = stream.next.await
How It Works
| Provider | Native Format | Conversion | Complexity |
|---|---|---|---|
| OpenAI | OpenAI standard | Direct mapping | Simple |
| Tencent | OpenAI compatible | Direct mapping | Simple |
| Volcengine | OpenAI compatible | Direct mapping | Simple |
| Anthropic | Multi-event stream | Custom parser | Complex |
| Aliyun | DashScope format | Custom parser | Medium |
| Zhipu | GLM format | Custom parser | Medium |
All conversions happen transparently in the Protocol layer - you just get consistent StreamingResponse objects!
Debugging & Troubleshooting
Common Issues
Authentication Error:
Authentication failed: Incorrect API key provided
Solutions:
- Verify your API key is correct (no extra spaces)
- Check if your account has credits
- Generate a new API key from your provider's dashboard
- Test with a simple chat request to verify the key works
Timeout Error:
Request timeout
Solutions:
- Check your network connection
- Increase timeout settings using *_with_timeout() methods
- Verify the provider's API endpoint is accessible
Model Not Found:
Model not found
Solutions:
- Use models() to list the available models
- Check the model name spelling
- Verify your account has access to the model
Recent Changes
v0.5.4 (Latest)
Streaming Tool Calls Fix
- Fixed: Incremental accumulation and deduplication logic for streaming tool_calls
- Improved: Support for OpenAI streaming API's incremental tool_calls format
- Guaranteed: Each tool_call is sent only once, preventing duplicate execution
- Compatible: Fully backward compatible, no impact on existing code
v0.5.3
Universal Reasoning Models Support
- Support for all major reasoning models (Volcengine Doubao-Seed-Code, DeepSeek R1, OpenAI o1, etc.)
- Zero-configuration automatic field detection
- Unified interface, same code works for all reasoning models
v0.4.8
Simplified Configuration Architecture
- Unified chat_stream() method
- 3000x+ performance improvement
- Support for reasoning content and usage statistics
For complete changelog, see CHANGELOG.md
Design Philosophy
Minimal by Design:
- Only 4 protocols to cover all major LLM providers
- No hardcoded model restrictions - use any model name
- No complex configuration files or registries
- Direct API usage with clear abstractions
Protocol-first:
- Group providers by API protocol, not by company
- OpenAI-compatible providers share one implementation
- Extensible through protocol adapters
Examples
Check out the examples/ directory for various usage examples:
# Basic usage examples
# Multi-modal support
# Ollama model management
# Function calling / Tools
# Streaming examples
# List all available providers
Example Descriptions
Basic Examples:
- openai_basic.rs - Simple OpenAI chat example
- anthropic_streaming.rs - Anthropic streaming with proper event handling
- aliyun_basic.rs - Aliyun DashScope basic usage
- zhipu_basic.rs - Zhipu GLM basic usage
- ollama_basic.rs - Ollama local model usage
- tencent_basic.rs - Tencent Hunyuan basic usage
Advanced Examples:
- multimodal_basic.rs - Multi-modal content (text + images)
- ollama_model_management.rs - Complete Ollama model CRUD operations
- zhipu_tools.rs - Function calling with Zhipu
- zhipu_multiround_tools.rs - Multi-round conversation with tools
- volcengine_streaming.rs - Volcengine streaming with reasoning models
Utility Examples:
- list_providers.rs - List all available providers and their configurations
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT