llm-connector
Next-generation Rust library for LLM protocol abstraction.
Supports 5 protocols: OpenAI, Anthropic, Zhipu, Aliyun, Ollama. Clean architecture with clear Protocol/Provider separation for maximum performance and extensibility.
Having Authentication Issues?
Test your API keys right now:
cargo run --example test_keys_yaml
This will tell you exactly what's wrong with your API keys! See Debugging & Troubleshooting for more details.
Key Features
- 5 Protocol Support: OpenAI, Anthropic, Zhipu, Aliyun, Ollama
- V2 Architecture: Clean Protocol/Provider separation for maximum extensibility
- Extreme Performance: 7,000x+ faster client creation (7µs vs 53ms)
- Memory Efficient: Only 16 bytes per client instance
- Type-Safe: Full Rust type safety with Result-based error handling
- No Hardcoded Models: Use any model name without restrictions
- Online Model Discovery: Fetch available models dynamically from API
- Universal Streaming: Real-time streaming with format abstraction (JSON/SSE/NDJSON)
- Ollama Model Management: Full CRUD operations for local models
- Unified Interface: Same API for all protocols
Quick Start
Installation
Add to your Cargo.toml:
[dependencies]
llm-connector = "0.4.0"
tokio = { version = "1", features = ["full"] }
Optional features:
# Streaming support
llm-connector = { version = "0.4.0", features = ["streaming"] }
# V1 legacy compatibility
llm-connector = { version = "0.4.0", features = ["v1-legacy"] }
# Both streaming and V1 compatibility
llm-connector = { version = "0.4.0", features = ["streaming", "v1-legacy"] }
Basic Usage
use ;
async
Supported Protocols
1. OpenAI Protocol
Standard OpenAI API format with multiple deployment options.
// OpenAI (default)
let client = openai?;
// Custom base URL
let client = openai_with_base_url?;
// Azure OpenAI
let client = azure_openai?;
// OpenAI-compatible services
let client = openai_compatible?;
Features:
- No hardcoded models - use any model name
- Online model discovery via models()
- Azure OpenAI support
- Works with OpenAI-compatible providers (DeepSeek, Moonshot, etc.)
Example Models: gpt-4, gpt-4-turbo, gpt-3.5-turbo, o1-preview, o1-mini
2. Anthropic Protocol
Claude Messages API with multiple deployment options.
// Standard Anthropic API
let client = anthropic?;
// Google Vertex AI
let client = anthropic_vertex?;
// Amazon Bedrock
let client = anthropic_bedrock?;
Models: claude-3-5-sonnet-20241022, claude-3-opus, claude-3-haiku
3. Zhipu Protocol (ChatGLM)
Supports both native and OpenAI-compatible formats.
// Native format
let client = zhipu?;
// OpenAI-compatible format (recommended)
let client = zhipu_openai_compatible?;
Models: glm-4, glm-4-flash, glm-4-air, glm-4-plus, glm-4x
4. Aliyun Protocol (DashScope)
Custom protocol for Qwen models with regional support.
// Default (China)
let client = aliyun?;
// International
let client = aliyun_international?;
// Private cloud
let client = aliyun_private?;
Models: qwen-turbo, qwen-plus, qwen-max
5. Ollama Protocol (Local)
Local LLM server with comprehensive model management.
// Default: localhost:11434
let client = ollama?;
// Custom URL
let client = ollama_with_url?;
// With custom configuration
let client = ollama_with_config?;
Models: llama3.2, llama3.1, mistral, mixtral, qwen2.5, etc.
Features:
- Model listing and management
- Pull, delete, and inspect models
- Local server support with custom URLs
- Enhanced error handling for Ollama-specific operations
- Direct access to Ollama-specific features
Ollama Model Management
Access Ollama-specific features through the special interface:
let client = ollama?;
// Access Ollama-specific features
if let Some = client.as_ollama
Supported Ollama Operations
- List Models: models() - Get all locally installed models
- Pull Models: pull_model(name) - Download models from registry
- Delete Models: delete_model(name) - Remove local models
- Show Details: show_model(name) - Get comprehensive model information
- Check Existence: model_exists(name) - Verify if model is installed
Universal Streaming Format Support
The library provides comprehensive streaming support with universal format abstraction for maximum flexibility:
Standard OpenAI Format (Default)
use StreamExt;
use ;
let client = anthropic?;
let request = ChatRequest ;
let mut stream = client.chat_stream.await?;
while let Some = stream.next.await
Pure Ollama Format for Tool Integration
For perfect compatibility with tools like Zed.dev, use the pure Ollama streaming format:
use StreamExt;
// Use pure Ollama format (perfect for Zed.dev)
let mut stream = client.chat_stream_ollama.await?;
while let Some = stream.next.await
Legacy Ollama Format (Embedded)
For backward compatibility, the embedded format is still available:
use StreamExt;
// Use embedded Ollama format (legacy)
let mut stream = client.chat_stream_ollama_embedded.await?;
while let Some = stream.next.await
Streaming Chat Completions
For real-time streaming responses, use the streaming interface:
use ;
use StreamExt;
let request = ChatRequest ;
let mut stream = client.chat_stream.await?;
while let Some = stream.next.await
Advanced Streaming Features
The streaming response provides rich information and convenience methods:
let mut stream = client.chat_stream.await?;
while let Some = stream.next.await
Format Comparison
| Format | Output Example | Use Case |
|---|---|---|
| JSON | {"content":"hello"} | API responses, standard JSON |
| SSE | data: {"content":"hello"}\n\n | Web real-time streaming |
| NDJSON | {"content":"hello"}\n | Log processing, data pipelines |
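The three output formats differ only in how a serialized chunk is framed. The following is a minimal standalone sketch of that framing, not the library's own implementation: the OutputFormat enum and the frame function are hypothetical names introduced for illustration, and the payload is assumed to be a single-line JSON document that has already been serialized.

```rust
/// Hypothetical output formats mirroring the rows of the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    Json,
    Sse,
    Ndjson,
}

/// Wrap an already-serialized, single-line JSON payload in the framing
/// that `format` calls for.
pub fn frame(payload: &str, format: OutputFormat) -> String {
    match format {
        // Plain JSON: the payload is emitted unchanged.
        OutputFormat::Json => payload.to_string(),
        // Server-Sent Events: a `data:` field terminated by a blank line.
        OutputFormat::Sse => format!("data: {payload}\n\n"),
        // Newline-delimited JSON: one document per line.
        OutputFormat::Ndjson => format!("{payload}\n"),
    }
}
```

Keeping the framing separate from the payload means the same chunk content can be sent to a browser as SSE or appended to a log as NDJSON without re-serializing it.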
Enhanced Anthropic Streaming Features
- State Management: Proper handling of message_start, content_block_delta, message_delta, message_stop events
- Event Processing: Correct parsing of complex Anthropic streaming responses
- Usage Tracking: Real-time token usage statistics during streaming
- Error Resilience: Robust error handling for streaming interruptions
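To illustrate what state management across these four event types can look like, here is a minimal standalone sketch. It is not the library's internal code: the Event enum and StreamState struct are hypothetical, and the events are assumed to be parsed already, with message_start carrying the input token count and message_delta carrying the running output token count.

```rust
/// Hypothetical, already-parsed view of the four event types named above.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    MessageStart { input_tokens: u32 },
    ContentBlockDelta { text: String },
    MessageDelta { output_tokens: u32 },
    MessageStop,
}

/// Accumulated state for one streamed message.
#[derive(Debug, Default, PartialEq)]
pub struct StreamState {
    pub started: bool,
    pub finished: bool,
    pub content: String,
    pub input_tokens: u32,
    pub output_tokens: u32,
}

impl StreamState {
    /// Apply one event; returns an error for events that arrive out of order.
    pub fn apply(&mut self, event: Event) -> Result<(), String> {
        if self.finished {
            return Err("event received after message_stop".to_string());
        }
        match event {
            Event::MessageStart { input_tokens } => {
                if self.started {
                    return Err("duplicate message_start".to_string());
                }
                self.started = true;
                self.input_tokens = input_tokens;
            }
            // Every other event is only valid once the message has started.
            _ if !self.started => {
                return Err("event received before message_start".to_string());
            }
            // Text deltas are appended to the accumulated content.
            Event::ContentBlockDelta { text } => self.content.push_str(&text),
            // Usage updates overwrite the running output token count.
            Event::MessageDelta { output_tokens } => self.output_tokens = output_tokens,
            Event::MessageStop => self.finished = true,
        }
        Ok(())
    }
}
```

Rejecting out-of-order events explicitly is one way to surface an interrupted or malformed stream as an error instead of silently returning partial content.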
Model Discovery
Fetch the latest available models from the API:
let client = openai?;
// Fetch models online from the API
let models = client.models().await?;
println!;
Supported by:
- OpenAI Protocol (including OpenAI-compatible providers like DeepSeek, Zhipu, Moonshot)
- Anthropic Protocol (limited support - returns fallback endpoint)
- Ollama Protocol (full support via /api/tags)
- Aliyun Protocol (not supported)
Example Results:
- DeepSeek: ["deepseek-chat", "deepseek-reasoner"]
- Zhipu: ["glm-4.5", "glm-4.5-air", "glm-4.6"]
- Moonshot: ["moonshot-v1-32k", "kimi-latest", ...]
Recommendation:
- Cache models() results to avoid repeated API calls
- For protocols that don't support model listing, you can use any model name directly in your requests
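One possible way to follow the caching recommendation is a small time-based cache. The sketch below is an assumption-laden illustration, not part of the library: ModelCache and get_or_fetch are hypothetical names, and the fetch step is modeled as a synchronous closure, whereas real code would await client.models() at that point.

```rust
use std::time::{Duration, Instant};

/// Hypothetical time-based cache for a list of model names.
pub struct ModelCache {
    ttl: Duration,
    entry: Option<(Instant, Vec<String>)>,
}

impl ModelCache {
    /// Create an empty cache whose entries stay fresh for `ttl`.
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entry: None }
    }

    /// Return the cached list if it is still fresh; otherwise call `fetch`,
    /// store its result, and return it. Fetch errors are not cached.
    pub fn get_or_fetch<E>(
        &mut self,
        fetch: impl FnOnce() -> Result<Vec<String>, E>,
    ) -> Result<Vec<String>, E> {
        if let Some((stored_at, models)) = &self.entry {
            if stored_at.elapsed() < self.ttl {
                return Ok(models.clone());
            }
        }
        let models = fetch()?;
        self.entry = Some((Instant::now(), models.clone()));
        Ok(models)
    }
}
```

A TTL of a few minutes is usually enough for model lists, which change rarely; the right value depends on how quickly your application needs to see newly released models.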
Request Examples
OpenAI / OpenAI-compatible
let request = ChatRequest ;
Anthropic (requires max_tokens)
let request = ChatRequest ;
Aliyun (DashScope)
let request = ChatRequest ;
Ollama (Local)
let request = ChatRequest ;
Ollama Streaming (GLM-4.6 via Remote Gateway)
If you expose an Ollama-compatible API while the backend actually calls Zhipu glm-4.6 (remote gateway), you do NOT need any local model installation. Just point the client to your gateway and use the model id defined by your service:
use StreamExt;
use ;
async
Run example (requires streaming feature):
Note: This setup targets a remote Ollama-compatible gateway. The model id is defined by your backend (e.g. glm-4.6); no local installation is required. If your gateway uses a different identifier, replace it accordingly.
Streaming (Optional Feature)
Enable streaming in your Cargo.toml:
llm-connector = { version = "0.3.13", features = ["streaming"] }
use StreamExt;
let mut stream = client.chat_stream.await?;
while let Some = stream.next.await
Error Handling
use LlmConnectorError;
match client.chat.await
Configuration
Simple API Key (Recommended)
let client = openai;
Environment Variables
use env;
let api_key = var?;
let client = openai;
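When loading a key from the environment, it can help to trim stray whitespace and reject empty values before constructing a client, since extra spaces are a common cause of authentication failures. The helpers below are a hypothetical sketch using only the standard library; the function names are not part of llm-connector, and the environment variable name is left to the caller.

```rust
use std::env;

/// Validate a raw key value: trims surrounding whitespace and rejects
/// missing or empty keys.
pub fn clean_api_key(raw: Option<String>) -> Result<String, String> {
    match raw {
        None => Err("API key variable is not set".to_string()),
        Some(value) => {
            let trimmed = value.trim();
            if trimmed.is_empty() {
                Err("API key variable is set but empty".to_string())
            } else {
                Ok(trimmed.to_string())
            }
        }
    }
}

/// Read the key from the environment variable `name` (chosen by the caller)
/// and validate it with `clean_api_key`.
pub fn api_key_from_env(name: &str) -> Result<String, String> {
    clean_api_key(env::var(name).ok())
}
```

The returned string can then be passed to whichever client constructor you use.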
Protocol Information
let client = openai?;
// Get provider name
println!;
// Fetch models online (requires API call)
let models = client.models().await?;
println!;
Reasoning Synonyms
Many providers return hidden or provider-specific keys for model reasoning content (chain-of-thought). To simplify usage across providers, we normalize four common keys:
reasoning_content, reasoning, thought, thinking
Post-processing automatically scans raw JSON and fills these optional fields on both regular messages (Message) and streaming deltas (Delta). You can read the first available value via a convenience method:
// Non-streaming
let msg = &response.choices.message;
if let Some = msg.reasoning_any
// Streaming
while let Some = stream.next.await
Notes:
- Fields remain None if the provider does not return any reasoning keys.
- The normalization is provider-agnostic and applied uniformly to OpenAI, Anthropic, Aliyun (Qwen), Zhipu (GLM), and DeepSeek flows (including streaming).
- StreamingResponse also backfills its top-level reasoning_content from the first delta that contains reasoning.
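The "first available value" lookup can be pictured with a small standalone sketch. The ReasoningFields struct below is a hypothetical stand-in for the optional fields on Message and Delta, and the lookup order (the order in which the four keys are listed above) is an assumption; the library's actual precedence may differ.

```rust
/// Hypothetical stand-in for the four optional reasoning fields.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ReasoningFields {
    pub reasoning_content: Option<String>,
    pub reasoning: Option<String>,
    pub thought: Option<String>,
    pub thinking: Option<String>,
}

impl ReasoningFields {
    /// Return the first populated field, checked in the order
    /// reasoning_content, reasoning, thought, thinking (assumed order).
    pub fn reasoning_any(&self) -> Option<&str> {
        self.reasoning_content
            .as_deref()
            .or(self.reasoning.as_deref())
            .or(self.thought.as_deref())
            .or(self.thinking.as_deref())
    }
}
```

Callers only need one check regardless of which key a given provider happened to use.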
Debugging & Troubleshooting
Test Your API Keys
Quickly test if your API keys are valid:
# Test all keys from keys.yaml
# Debug DeepSeek specifically
The test tool will:
- Validate API key format
- Test authentication with the provider
- Show exactly what's wrong if a key fails
- Provide specific fix instructions
Troubleshooting Guides
- TROUBLESHOOTING.md - Comprehensive troubleshooting guide
- HOW_TO_TEST_YOUR_KEYS.md - How to test your API keys
- TEST_YOUR_DEEPSEEK_KEY.md - Quick start for DeepSeek users
Common Issues
Authentication Error:
Authentication failed: Incorrect API key provided
Solutions:
- Verify your API key is correct (no extra spaces)
- Check if your account has credits
- Generate a new API key from your provider's dashboard
- Run cargo run --example test_keys_yaml to diagnose
Recent Changes
v0.4.8 (Current)
Simplified Configuration Architecture
- Single Configuration Module: Consolidated src/config/ directory into src/config.rs
- Eliminated Naming Confusion: Clear separation between configuration and providers
- Streamlined Streaming API: Unified chat_stream() method for all streaming needs
- Enhanced Performance: 3000x+ performance improvements in V2 architecture
Current Streaming API:
- chat_stream() - Unified streaming interface with rich response data
- StreamingResponse with convenience methods like get_content()
- Support for reasoning content and usage statistics
- Compatible with all providers (OpenAI, Anthropic, Aliyun, Zhipu, Ollama)
v0.3.13 (V1 Legacy)
Note: The following features are from the V1 architecture (available via features = ["v1-legacy"]).
Universal Streaming Format Abstraction
- StreamFormat Enum: Support for JSON, SSE, and NDJSON output formats
- StreamChunk Universal Container: Unified abstraction for all streaming responses
- Format Conversion Methods: to_json(), to_sse(), to_ndjson(), to_format()
- Content Extraction: Universal extract_content() method for both OpenAI and Ollama formats
V1 Streaming Methods:
- chat_stream_universal() - Most flexible interface with full format control
- chat_stream_sse() - Convenient Server-Sent Events format for web apps
- chat_stream_ndjson() - Convenient Newline-Delimited JSON for data pipelines
- Enhanced StreamingConfig with separate content and output format controls
Architecture Improvements:
- Separation of Concerns: Content format (OpenAI/Ollama) vs Output format (JSON/SSE/NDJSON)
- Format Abstraction: No more hardcoded JSON strings in streaming responses
- Extensible Design: Easy to add new output formats in the future
- Type Safety: Strong typing for all format options
Use Cases:
- Web Applications: Use SSE format for real-time streaming
- API Services: Use JSON format for standard responses
- Data Processing: Use NDJSON format for logs and pipelines
- Tool Integration: Combine any content format with any output format
Enhanced Documentation:
- Comprehensive format comparison table
- Detailed usage examples for each format
- Clear migration guide from previous versions
v0.3.12
Critical Fix: Pure Ollama Format Streaming
- Fixed Double Format Issue: chat_stream_ollama() now returns pure Ollama format instead of nested format
- Direct Compatibility: Perfect integration with Zed.dev and other Ollama-compatible tools
- Simplified Usage: No more JSON parsing required - direct OllamaStreamChunk access
- Backward Compatibility: Added chat_stream_ollama_embedded() for legacy nested format
Format Changes:
- Before: Ollama JSON embedded in OpenAI format content field (required parsing)
- After: Direct OllamaStreamChunk objects with native field access
- New Type: OllamaChatStream for pure Ollama format streams
- Enhanced API: Cleaner, more intuitive streaming interface
Updated Documentation:
- Clear distinction between pure and embedded Ollama formats
- Updated examples with direct field access patterns
- Enhanced streaming format comparison section
New Examples:
- test_pure_ollama_format.rs - Validation of pure format output
- Updated ollama_streaming_simple.rs - Demonstrates direct field access
v0.3.11
Major New Features:
- Multiple Streaming Formats: Support for both OpenAI and Ollama streaming formats
  - chat_stream_ollama() - Ollama-compatible streaming for Zed.dev integration
  - chat_stream_with_format() - Custom streaming configuration
  - StreamingFormat::OpenAI and StreamingFormat::Ollama options
- Enhanced Tool Integration: Perfect compatibility with Zed.dev and other Ollama-compatible tools
- Tencent Hunyuan Native API: Initial implementation of TC3-HMAC-SHA256 signature authentication
  - hunyuan_native() - Native Tencent Cloud API support
  - Full region support (ap-beijing, ap-shanghai, ap-guangzhou)
  - Better error handling and debugging capabilities
Improvements:
- Streaming Format Conversion: Automatic conversion between OpenAI and Ollama formats
- Done Marker Handling: Proper done: true final chunk for Ollama format
- Usage Statistics: Complete token usage and timing information in Ollama format
- Backward Compatibility: All existing streaming code continues to work unchanged
Documentation:
- Complete streaming format comparison and usage examples
- New examples: ollama_streaming_simple.rs, streaming_ollama_format.rs
- Updated README with detailed format explanations
- Enhanced troubleshooting guides for streaming
Breaking Changes:
- None - all changes are backward compatible
v0.3.8
Major Stability and Debugging Improvements:
- Enhanced Timeout Configuration: All providers now support custom timeout settings
  - LlmClient::openai_with_timeout() - OpenAI with custom timeout
  - LlmClient::anthropic_with_timeout() - Anthropic with custom timeout
  - LlmClient::zhipu_with_timeout() - Zhipu with custom timeout
  - Default timeout increased to 30 seconds for better stability
- Advanced Debugging Support: Comprehensive request/response debugging
  - LLM_DEBUG_REQUEST_RAW=1 - Show detailed request information
  - LLM_DEBUG_RESPONSE_RAW=1 - Show response status and headers
  - LLM_DEBUG_STREAM_RAW=1 - Show streaming response details
  - Enhanced error messages with specific troubleshooting guidance
- Zhipu Stability Improvements: Dedicated tools for diagnosing Zhipu API issues
  - New zhipu_stability_test.rs example for comprehensive testing
  - Improved error handling and timeout management
  - Better connection stability monitoring
New Examples:
- enhanced_error_handling.rs - Comprehensive error handling and debugging
- unified_config.rs - Unified configuration interface for all providers
- zhipu_stability_test.rs - Dedicated Zhipu stability testing tool
Documentation:
- Updated troubleshooting guides with timeout configuration
- Enhanced error handling examples
- Improved debugging instructions
v0.3.1
Major New Features:
- Complete Ollama Model Management: Full CRUD operations for local models
  - list_models() - List all installed models
  - pull_model() - Download models from registry
  - push_model() - Upload models to registry
  - delete_model() - Remove local models
  - show_model() - Get detailed model information
- Enhanced Anthropic Streaming: Proper event state management
  - Correct handling of message_start, content_block_delta, message_delta, message_stop events
  - Real-time token usage tracking during streaming
  - Improved error resilience and state management
Improvements:
- Expanded Model Discovery Support:
  - Added Ollama model listing via /api/tags endpoint
  - Limited Anthropic model discovery support
- Enhanced Client Interface: New methods for Ollama model management
- Updated Examples: Added comprehensive model management and streaming examples
Documentation:
- Complete rewrite of Ollama section with model management examples
- Enhanced streaming documentation with code examples
- Updated feature descriptions and supported operations
v0.2.3
Breaking Changes:
- Removed supported_models() method - Use fetch_models() instead
- Removed supports_model() method - No longer needed
New Features:
- Improved error messages - Removed confusing OpenAI URLs for other providers
- New debugging tools:
  - examples/test_keys_yaml.rs - Test all API keys
  - examples/debug_deepseek.rs - Debug DeepSeek authentication
- Comprehensive documentation:
  - TROUBLESHOOTING.md - Troubleshooting guide
  - HOW_TO_TEST_YOUR_KEYS.md - Testing instructions
  - TEST_YOUR_DEEPSEEK_KEY.md - Quick start guide
Migration from v0.2.2:
// Old (no longer works)
let models = client.supported_models();
// New
let models = client.fetch_models().await?;
v0.2.2
New Features:
- Added fetch_models() for online model discovery
- OpenAI protocol supports dynamic model fetching from /v1/models endpoint
- Works with OpenAI-compatible providers (DeepSeek, Zhipu, Moonshot, etc.)
Design Philosophy
Minimal by Design:
- Only 5 protocols to cover all major LLM providers
- No hardcoded model restrictions - use any model name
- No complex configuration files or registries
- Direct API usage with clear abstractions
Protocol-first:
- Group providers by API protocol, not by company
- OpenAI-compatible providers share one implementation
- Extensible through protocol adapters
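The protocol-first idea can be sketched as a lookup from provider name to protocol. This is an illustrative assumption, not the library's actual types: the Protocol enum and protocol_for function are hypothetical, and the provider names are limited to those mentioned in this README.

```rust
/// The five protocols listed in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    OpenAi,
    Anthropic,
    Zhipu,
    Aliyun,
    Ollama,
}

/// Hypothetical lookup from a provider name to the protocol it speaks.
/// Matching is case-insensitive; unknown providers return `None`.
pub fn protocol_for(provider: &str) -> Option<Protocol> {
    match provider.to_ascii_lowercase().as_str() {
        // OpenAI-compatible providers share one implementation.
        "openai" | "deepseek" | "moonshot" => Some(Protocol::OpenAi),
        "anthropic" => Some(Protocol::Anthropic),
        "zhipu" => Some(Protocol::Zhipu),
        "aliyun" => Some(Protocol::Aliyun),
        "ollama" => Some(Protocol::Ollama),
        _ => None,
    }
}
```

Under this grouping, supporting a new OpenAI-compatible provider means adding one name to an existing arm instead of writing a new implementation.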
Examples
Check out the examples/ directory:
# Test your API keys from keys.yaml
# Debug DeepSeek authentication
# Simple fetch_models() demo
# Ollama model management (NEW!)
# Anthropic streaming (NEW! - requires streaming feature)
# Ollama streaming (NEW! - requires streaming feature)
# LongCat demo (OpenAI/Anthropic compatible)
Example Descriptions
test_keys_yaml.rs (New!)
- Tests all API keys from your keys.yaml file
- Provides specific troubleshooting for each error
- Run this first if you have authentication issues!
debug_deepseek.rs (New!)
- Interactive debugging tool for DeepSeek API
- Validates API key format
- Tests model fetching and chat requests
- Provides detailed troubleshooting guidance
fetch_models_simple.rs
- Simple demonstration of fetch_models()
- Shows how to fetch models from OpenAI-compatible providers
- Includes usage recommendations
ollama_model_management.rs (New!)
- Demonstrates complete Ollama model management functionality
- Shows how to list, pull, delete, and get model details
- Includes error handling and practical usage examples
anthropic_streaming.rs (New!)
- Shows enhanced Anthropic streaming with proper event handling
- Demonstrates real-time response streaming and usage tracking
- Includes both regular and streaming chat examples
Removed redundant examples
test_fetch_models.rs and test_with_keys.rs overlapped with other examples and have been removed.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT