llm-connector

Minimal Rust library for LLM protocol abstraction.

Supports 6 protocols: OpenAI, Anthropic, Zhipu, Aliyun, Ollama, Hunyuan. No complex configuration - just pick a protocol and start chatting.

🚨 Having Authentication Issues?

Test your API keys right now:

cargo run --example test_keys_yaml

This will tell you exactly what's wrong with your API keys! See Debugging & Troubleshooting for more details.

✨ Key Features

6 Protocol Support: OpenAI, Anthropic, Zhipu, Aliyun, Ollama, Hunyuan
No Hardcoded Models: Use any model name without restrictions
Online Model Discovery: Fetch available models dynamically from API
Universal Streaming Formats: Real-time streaming with format abstraction (JSON/SSE/NDJSON) and pure Ollama support
Ollama Model Management: Full CRUD operations for local models
Unified Interface: Same API for all protocols
Type-Safe: Full Rust type safety with async/await

Quick Start

Installation

Add to your Cargo.toml:

[dependencies]
llm-connector = "0.3.13"
tokio = { version = "1", features = ["full"] }

Optional features:

# Streaming support
llm-connector = { version = "0.3.13", features = ["streaming"] }

# Tencent Cloud native API support
llm-connector = { version = "0.3.13", features = ["tencent-native"] }

# Both streaming and Tencent native API
llm-connector = { version = "0.3.13", features = ["streaming", "tencent-native"] }

Basic Usage

use llm_connector::{LlmClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Pick one constructor. OpenAI with the default base URL is used here.
    let client = LlmClient::openai("sk-...", None);

    // The other protocols follow the same pattern:
    // let client = LlmClient::anthropic("sk-ant-...");  // Anthropic
    // let client = LlmClient::aliyun("sk-...");         // Aliyun (DashScope)
    // let client = LlmClient::hunyuan("sk-...");        // Tencent Hunyuan
    // let client = LlmClient::ollama(None);             // Ollama (local, no API key needed)

    // The model name must be one that the chosen provider serves.
    let request = ChatRequest {
        model: "gpt-4".to_string(),
        messages: vec![Message::user("Hello!")],
        ..Default::default()
    };

    let response = client.chat(&request).await?;
    println!("Response: {}", response.choices[0].message.content);
    Ok(())
}

Supported Protocols

1. OpenAI Protocol

Standard OpenAI API format.

// OpenAI (default base URL)
let client = LlmClient::openai("sk-...", None);

// OpenAI-compatible endpoints (custom base URL)
let client = LlmClient::openai("sk-...", Some("https://api.example.com/v1"));

Features:

✅ No hardcoded models - use any model name
✅ Online model discovery via fetch_models()
✅ Works with OpenAI-compatible providers (DeepSeek, Zhipu, Moonshot, etc.)

Example Models: gpt-4, gpt-4-turbo, gpt-3.5-turbo, o1-preview, o1-mini

2. Anthropic Protocol

Claude Messages API with separate system messages.

let client = LlmClient::anthropic("sk-ant-...");

Models: claude-3-5-sonnet-20241022, claude-3-opus, claude-3-haiku

3. Zhipu Protocol (ChatGLM)

OpenAI-compatible format with Zhipu-specific error handling.

let client = LlmClient::zhipu("sk-...");

Models: glm-4, glm-4-flash, glm-4-air, glm-4-plus, glm-4x

4. Aliyun Protocol (DashScope)

Custom protocol for Qwen models.

let client = LlmClient::aliyun("sk-...");

Models: qwen-turbo, qwen-plus, qwen-max

5. Tencent Hunyuan Protocol

Two implementation options for Tencent's Hunyuan models:

5.1 OpenAI-Compatible Interface

let client = LlmClient::hunyuan("sk-...");

Features:

✅ OpenAI-compatible API format
✅ Supports streaming responses
✅ Online model discovery via fetch_models()

5.2 Native Tencent Cloud API (Recommended)

// Requires "tencent-native" feature
let client = LlmClient::hunyuan_native("secret-id", "secret-key", Some("ap-beijing"));

Features:

✅ Native Tencent Cloud API with TC3-HMAC-SHA256 signature
✅ Full access to Tencent Cloud features
✅ Better error handling and debugging
✅ Supports streaming responses
✅ Region specification support

Models: hunyuan-lite, hunyuan-standard, hunyuan-pro

6. Ollama Protocol (Local)

Local LLM server with no API key required.

// Default: localhost:11434
let client = LlmClient::ollama(None);

// Custom URL
let client = LlmClient::ollama(Some("http://192.168.1.100:11434"));

Models: llama3.2, llama3.1, mistral, mixtral, qwen2.5, etc.

Features:

✅ Model listing via /api/tags
✅ Model management (pull, push, delete, show details)
✅ Local server support with custom URLs
✅ Enhanced error handling for Ollama-specific operations

Ollama Model Management

The library now provides comprehensive Ollama model management capabilities:

use llm_connector::ollama::OllamaModelOps;
let client = LlmClient::ollama(None);

// List all installed models
let models = client.list_models().await?;
for model in models {
    println!("Available model: {}", model);
}

// Pull a new model
client.pull_model("llama3.2").await?;

// Get detailed model information
let details = client.show_model("llama3.2").await?;
println!("Model size: {} bytes", details.size.unwrap_or(0));

// Delete a model
client.delete_model("llama3.2").await?;

Supported Ollama Operations

List Models: list_models() - Get all locally installed models
Pull Models: pull_model(name) - Download models from registry
Push Models: push_model(name) - Upload models to registry
Delete Models: delete_model(name) - Remove local models
Show Details: show_model(name) - Get comprehensive model information

Universal Streaming Format Support

Streaming responses can be consumed in two content formats (OpenAI or Ollama) and emitted in three output formats (JSON, SSE, or NDJSON):

Standard OpenAI Format (Default)

use futures_util::StreamExt;

let client = LlmClient::anthropic("sk-ant-...");
let request = ChatRequest {
    model: "claude-3-5-sonnet-20241022".to_string(),
    messages: vec![Message::user("Hello!")],
    max_tokens: Some(200),
    ..Default::default()
};

let mut stream = client.chat_stream(&request).await?;

while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(content) = chunk.get_content() {
        print!("{}", content);
    }
}

Pure Ollama Format for Tool Integration

For perfect compatibility with tools like Zed.dev, use the pure Ollama streaming format:

use futures_util::StreamExt;

// Use pure Ollama format (perfect for Zed.dev)
let mut stream = client.chat_stream_ollama(&request).await?;

while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    // chunk is now a pure OllamaStreamChunk
    if !chunk.message.content.is_empty() {
        print!("{}", chunk.message.content);
    }

    // Check for final chunk
    if chunk.done {
        println!("\nStreaming complete!");
        break;
    }
}

Legacy Ollama Format (Embedded)

For backward compatibility, the embedded format is still available:

use futures_util::StreamExt;

// Use embedded Ollama format (legacy)
let mut stream = client.chat_stream_ollama_embedded(&request).await?;

while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    // chunk.content contains Ollama-formatted JSON string
    if let Ok(ollama_chunk) = serde_json::from_str::<serde_json::Value>(&chunk.content) {
        if let Some(content) = ollama_chunk
            .get("message")
            .and_then(|m| m.get("content"))
            .and_then(|c| c.as_str())
        {
            print!("{}", content);
        }
    }
}

Universal Format Abstraction

For maximum flexibility, use the universal streaming interface with format abstraction:

use futures_util::StreamExt;
use llm_connector::types::{StreamingConfig, StreamingFormat, StreamFormat};

// Configure both content format and output format
let config = StreamingConfig {
    format: StreamingFormat::Ollama,        // Content format (OpenAI/Ollama)
    stream_format: StreamFormat::SSE,       // Output format (JSON/SSE/NDJSON)
    include_usage: true,
    include_reasoning: false,
};

let mut stream = client.chat_stream_universal(&request, &config).await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    // chunk.to_format() returns formatted string based on stream_format
    println!("{}", chunk.to_format());

    // Or extract content directly
    if let Some(content) = chunk.extract_content() {
        print!("{}", content);
    }
}

Convenient Format Methods

For common use cases, use the convenient format-specific methods:

// Server-Sent Events format (perfect for web applications)
let mut sse_stream = client.chat_stream_sse(&request).await?;
while let Some(chunk) = sse_stream.next().await {
    let chunk = chunk?;
    println!("{}", chunk.to_format()); // "data: {...}\n\n"
}

// Newline-Delimited JSON format (perfect for data pipelines)
let mut ndjson_stream = client.chat_stream_ndjson(&request).await?;
while let Some(chunk) = ndjson_stream.next().await {
    let chunk = chunk?;
    println!("{}", chunk.to_format()); // "{...}\n"
}

Format Comparison

Format	Output Example	Use Case
JSON	`{"content":"hello"}`	API responses, standard JSON
SSE	`data: {"content":"hello"}\n\n`	Web real-time streaming
NDJSON	`{"content":"hello"}\n`	Log processing, data pipelines
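
The three rows above differ only in how the same JSON payload is framed. A minimal sketch of that framing follows, using only the standard library. `OutputFormat` and `frame` are hypothetical names made up for this illustration; they are not the library's `StreamFormat` or `to_format()`.

```rust
/// Hypothetical stand-in for an output-format option (not the library's type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum OutputFormat {
    Json,
    Sse,
    Ndjson,
}

/// Wrap an already-serialized JSON payload in the framing each format uses.
fn frame(payload: &str, format: OutputFormat) -> String {
    match format {
        // Plain JSON: the payload is passed through unchanged.
        OutputFormat::Json => payload.to_string(),
        // Server-Sent Events: a "data: " prefix and a blank line after each event.
        OutputFormat::Sse => format!("data: {}\n\n", payload),
        // Newline-delimited JSON: one object per line.
        OutputFormat::Ndjson => format!("{}\n", payload),
    }
}

fn main() {
    let payload = r#"{"content":"hello"}"#;
    for format in [OutputFormat::Json, OutputFormat::Sse, OutputFormat::Ndjson] {
        // {:?} escapes the newlines so the framing stays visible.
        println!("{:?} => {:?}", format, frame(payload, format));
    }
}
```

Keeping the payload and the framing separate is what allows any content format to be combined with any output format.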

Enhanced Anthropic Streaming Features

State Management: Proper handling of message_start, content_block_delta, message_delta, message_stop events
Event Processing: Correct parsing of complex Anthropic streaming responses
Usage Tracking: Real-time token usage statistics during streaming
Error Resilience: Robust error handling for streaming interruptions
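
The state management described above can be pictured as folding each event into a small accumulator. The sketch below is an illustration only, not the library's implementation: `StreamEvent`, `StreamState`, and `collect` are hypothetical names, the payload fields are simplified assumptions, and other events in the real stream (for example content block start and stop markers) are left out.

```rust
/// Simplified stand-in for the streaming events named above.
/// The payload fields are assumptions made for this sketch.
enum StreamEvent {
    MessageStart { input_tokens: u32 },
    ContentBlockDelta { text: String },
    MessageDelta { output_tokens: u32 },
    MessageStop,
}

/// State carried across events while one response streams in.
#[derive(Debug, Default)]
struct StreamState {
    text: String,
    input_tokens: u32,
    output_tokens: u32,
    finished: bool,
}

impl StreamState {
    /// Fold one event into the state; return newly arrived text, if any.
    fn apply(&mut self, event: StreamEvent) -> Option<String> {
        match event {
            StreamEvent::MessageStart { input_tokens } => {
                self.input_tokens = input_tokens;
                None
            }
            StreamEvent::ContentBlockDelta { text } => {
                self.text.push_str(&text);
                Some(text)
            }
            StreamEvent::MessageDelta { output_tokens } => {
                // Treated here as a running total, so it overwrites the old value.
                self.output_tokens = output_tokens;
                None
            }
            StreamEvent::MessageStop => {
                self.finished = true;
                None
            }
        }
    }
}

/// Run a whole event sequence through a fresh state.
fn collect(events: Vec<StreamEvent>) -> StreamState {
    let mut state = StreamState::default();
    for event in events {
        let _ = state.apply(event);
    }
    state
}

fn main() {
    let state = collect(vec![
        StreamEvent::MessageStart { input_tokens: 12 },
        StreamEvent::ContentBlockDelta { text: "Hel".to_string() },
        StreamEvent::ContentBlockDelta { text: "lo".to_string() },
        StreamEvent::MessageDelta { output_tokens: 2 },
        StreamEvent::MessageStop,
    ]);
    println!("{:?}", state);
}
```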

Model Discovery

Fetch the latest available models from the API:

let client = LlmClient::openai("sk-...", None);

// Fetch models online from the API
let models = client.fetch_models().await?;
println!("Available models: {:?}", models);

Supported by:

✅ OpenAI Protocol (including OpenAI-compatible providers like DeepSeek, Zhipu, Moonshot)
✅ Anthropic Protocol (limited support - returns fallback endpoint)
✅ Ollama Protocol (full support via /api/tags)
❌ Aliyun Protocol (not supported)

Example Results:

DeepSeek: ["deepseek-chat", "deepseek-reasoner"]
Zhipu: ["glm-4.5", "glm-4.5-air", "glm-4.6"]
Moonshot: ["moonshot-v1-32k", "kimi-latest", ...]

Recommendation:

Cache fetch_models() results to avoid repeated API calls
For protocols that don't support model listing, you can use any model name directly in your requests
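
One way to follow the caching recommendation is a small time-to-live cache around the fetch call. This is a sketch under stated assumptions: `ModelCache` and `get_or_fetch` are hypothetical and not part of llm-connector, and the fetch is modeled as a synchronous closure to keep the example self-contained, whereas `fetch_models()` itself is async.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cache for a model list; not part of llm-connector.
struct ModelCache {
    ttl: Duration,
    entry: Option<(Instant, Vec<String>)>,
}

impl ModelCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entry: None }
    }

    /// Return the cached list while it is fresh; otherwise call `fetch`
    /// and store its result. Errors are passed through and not cached.
    fn get_or_fetch<E>(
        &mut self,
        fetch: impl FnOnce() -> Result<Vec<String>, E>,
    ) -> Result<Vec<String>, E> {
        if let Some((stored_at, models)) = &self.entry {
            if stored_at.elapsed() < self.ttl {
                return Ok(models.clone());
            }
        }
        let models = fetch()?;
        self.entry = Some((Instant::now(), models.clone()));
        Ok(models)
    }
}

fn main() {
    let mut cache = ModelCache::new(Duration::from_secs(300));
    let mut calls = 0;

    for _ in 0..3 {
        // The closure stands in for a real API call.
        let models: Result<Vec<String>, String> = cache.get_or_fetch(|| {
            calls += 1;
            Ok(vec!["deepseek-chat".to_string(), "deepseek-reasoner".to_string()])
        });
        println!("{:?}", models);
    }
    println!("API calls made: {}", calls);
}
```

With a five-minute lifetime, the three lookups in `main` trigger a single fetch. Pick the lifetime according to how often your provider's model list changes.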

Request Examples

OpenAI / OpenAI-compatible

let request = ChatRequest {
    model: "gpt-4".to_string(),
    messages: vec![
        Message::system("You are a helpful assistant."),
        Message::user("Hello!"),
    ],
    temperature: Some(0.7),
    max_tokens: Some(100),
    ..Default::default()
};

Anthropic (requires max_tokens)

let request = ChatRequest {
    model: "claude-3-5-sonnet-20241022".to_string(),
    messages: vec![Message::user("Hello!")],
    max_tokens: Some(200), // Required for Anthropic
    ..Default::default()
};

Aliyun (DashScope)

let request = ChatRequest {
    model: "qwen-max".to_string(),
    messages: vec![Message::user("你好！")],
    ..Default::default()
};

Ollama (Local)

let request = ChatRequest {
    model: "llama3.2".to_string(),
    messages: vec![Message::user("Hello!")],
    ..Default::default()
};

Ollama Streaming (GLM-4.6 via Remote Gateway)

If you expose an Ollama-compatible API while the backend actually calls Zhipu glm-4.6 (remote gateway), you do NOT need any local model installation. Just point the client to your gateway and use the model id defined by your service:

use futures_util::StreamExt;
use llm_connector::{LlmClient, types::{ChatRequest, Message}};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Point to your remote Ollama-compatible gateway (replace with your actual URL)
    let client = LlmClient::ollama(Some("https://your-ollama-gateway.example.com"));

    let request = ChatRequest {
        model: "glm-4.6".to_string(),
        messages: vec![Message::user("Briefly explain the benefits of streaming.")],
        max_tokens: Some(128),
        ..Default::default()
    };

    let mut stream = client.chat_stream(&request).await?;
    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        if let Some(content) = chunk.get_content() {
            print!("{}", content);
        }
    }

    Ok(())
}

Run example (requires streaming feature):

cargo run --example ollama_streaming --features streaming

Note: This setup targets a remote Ollama-compatible gateway. The model id is defined by your backend (e.g. glm-4.6); no local installation is required. If your gateway uses a different identifier, replace it accordingly.

Streaming (Optional Feature)

Enable streaming in your Cargo.toml:

llm-connector = { version = "0.3.13", features = ["streaming"] }

Then read the stream chunk by chunk:

use futures_util::StreamExt;

let mut stream = client.chat_stream(&request).await?;

while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(content) = chunk.get_content() {
        print!("{}", content);
    }
}

Error Handling

use llm_connector::error::LlmConnectorError;

match client.chat(&request).await {
    Ok(response) => {
        println!("Response: {}", response.choices[0].message.content);
    }
    Err(e) => {
        match e {
            LlmConnectorError::AuthenticationError(msg) => {
                eprintln!("Auth error: {}", msg);
            }
            LlmConnectorError::RateLimitError(msg) => {
                eprintln!("Rate limit: {}", msg);
            }
            LlmConnectorError::UnsupportedOperation(msg) => {
                eprintln!("Not supported: {}", msg);
            }
            _ => eprintln!("Error: {}", e),
        }
    }
}
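
If several call sites need the same reaction to these errors, the decision can be pulled into one function. The sketch below uses a local stand-in enum that mirrors only the three variants shown above, so it compiles without the crate. `ConnectorError`, `NextStep`, and `next_step` are hypothetical names, and the policy itself is an assumption of this example.

```rust
/// Local stand-in that mirrors only the variants shown above.
#[derive(Debug)]
enum ConnectorError {
    AuthenticationError(String),
    RateLimitError(String),
    UnsupportedOperation(String),
}

impl ConnectorError {
    /// The message carried by any variant.
    fn message(&self) -> &str {
        match self {
            ConnectorError::AuthenticationError(m)
            | ConnectorError::RateLimitError(m)
            | ConnectorError::UnsupportedOperation(m) => m.as_str(),
        }
    }
}

/// What the caller should do next. The policy is an assumption of this sketch.
#[derive(Debug, PartialEq)]
enum NextStep {
    FixCredentials,
    RetryLater,
    ChangeRequest,
}

fn next_step(error: &ConnectorError) -> NextStep {
    match error {
        // A bad key will not start working on its own, so do not retry.
        ConnectorError::AuthenticationError(_) => NextStep::FixCredentials,
        // Rate limits are temporary, so waiting and retrying is reasonable.
        ConnectorError::RateLimitError(_) => NextStep::RetryLater,
        // The protocol cannot do this; the request itself has to change.
        ConnectorError::UnsupportedOperation(_) => NextStep::ChangeRequest,
    }
}

fn main() {
    let errors = vec![
        ConnectorError::AuthenticationError("Incorrect API key provided".to_string()),
        ConnectorError::RateLimitError("Too many requests".to_string()),
        ConnectorError::UnsupportedOperation("Model listing".to_string()),
    ];
    for error in &errors {
        println!("{} -> {:?}", error.message(), next_step(error));
    }
}
```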

Configuration

Simple API Key (Recommended)

let client = LlmClient::openai("your-api-key", None);

Environment Variables

export OPENAI_API_KEY="sk-your-key"
export ANTHROPIC_API_KEY="sk-ant-your-key"
export ALIYUN_API_KEY="sk-your-key"

use std::env;

let api_key = env::var("OPENAI_API_KEY")?;
let client = LlmClient::openai(&api_key, None);
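
A key copied into a shell profile often picks up stray whitespace, or the variable is exported but left empty. The sketch below shows one way to catch both cases before a client is created. `resolve_api_key` is a hypothetical helper, not a library function; it takes the lookup as a closure so that the logic can be exercised without touching the real environment.

```rust
use std::env;

/// Look up an API key through `lookup`, trim stray whitespace, and reject
/// missing or blank values. Hypothetical helper, not a library API.
fn resolve_api_key(
    name: &str,
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<String, String> {
    match lookup(name) {
        Some(value) if !value.trim().is_empty() => Ok(value.trim().to_string()),
        Some(_) => Err(format!("{} is set but empty", name)),
        None => Err(format!("{} is not set", name)),
    }
}

fn main() {
    // In real code the lookup reads the process environment.
    match resolve_api_key("OPENAI_API_KEY", |name| env::var(name).ok()) {
        Ok(key) => println!("Loaded a key of {} characters", key.len()),
        Err(reason) => eprintln!("Cannot create client: {}", reason),
    }
}
```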

Protocol Information

let client = LlmClient::openai("sk-...", None);

// Get protocol name
println!("Protocol: {}", client.protocol_name());

// Fetch models online (requires API call)
let models = client.fetch_models().await?;
println!("Available models: {:?}", models);

Reasoning Synonyms

Many providers return hidden or provider-specific keys for model reasoning content (chain-of-thought). To simplify usage across providers, we normalize four common keys:

reasoning_content, reasoning, thought, thinking

Post-processing automatically scans raw JSON and fills these optional fields on both regular messages (Message) and streaming deltas (Delta). You can read the first available value via a convenience method:

// Non-streaming
let msg = &response.choices[0].message;
if let Some(reason) = msg.reasoning_any() {
    println!("Reasoning: {}", reason);
}

// Streaming
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(reason) = chunk.choices[0].delta.reasoning_any() {
        println!("Reasoning (stream): {}", reason);
    }
}

Notes:

Fields remain None if the provider does not return any reasoning keys.
The normalization is provider-agnostic and applied uniformly to OpenAI, Anthropic, Aliyun (Qwen), Zhipu (GLM), and DeepSeek flows (including streaming).
StreamingResponse also backfills its top-level reasoning_content from the first delta that contains reasoning.
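
Conceptually, reading "the first available value" is a chain of fallbacks over the four optional fields. The sketch below illustrates that idea with a stand-in struct. `ReasoningFields` is hypothetical, and the precedence order (the order in which the keys are listed above) is an assumption of this example, not something the library is confirmed to guarantee.

```rust
/// Hypothetical stand-in for the optional reasoning fields on a message.
#[derive(Default)]
struct ReasoningFields {
    reasoning_content: Option<String>,
    reasoning: Option<String>,
    thought: Option<String>,
    thinking: Option<String>,
}

impl ReasoningFields {
    /// Return the first populated field, checked in declaration order.
    /// The precedence order is an assumption of this sketch.
    fn reasoning_any(&self) -> Option<&str> {
        self.reasoning_content
            .as_deref()
            .or(self.reasoning.as_deref())
            .or(self.thought.as_deref())
            .or(self.thinking.as_deref())
    }
}

fn main() {
    // Only one provider-specific key is present in this message.
    let msg = ReasoningFields {
        thinking: Some("Compare both options first.".to_string()),
        ..Default::default()
    };
    if let Some(reason) = msg.reasoning_any() {
        println!("Reasoning: {}", reason);
    }

    // No reasoning keys at all: the accessor yields None.
    let empty = ReasoningFields::default();
    println!("Has reasoning: {}", empty.reasoning_any().is_some());
}
```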

Debugging & Troubleshooting

Test Your API Keys

Quickly test if your API keys are valid:

# Test all keys from keys.yaml
cargo run --example test_keys_yaml

# Debug DeepSeek specifically
cargo run --example debug_deepseek -- sk-your-key

The test tool will:

✅ Validate API key format
✅ Test authentication with the provider
✅ Show exactly what's wrong if a key fails
✅ Provide specific fix instructions

Troubleshooting Guides

TROUBLESHOOTING.md - Comprehensive troubleshooting guide
HOW_TO_TEST_YOUR_KEYS.md - How to test your API keys
TEST_YOUR_DEEPSEEK_KEY.md - Quick start for DeepSeek users

Common Issues

Authentication Error:

❌ Authentication failed: Incorrect API key provided

Solutions:

Verify your API key is correct (no extra spaces)
Check if your account has credits
Generate a new API key from your provider's dashboard
Run cargo run --example test_keys_yaml to diagnose

Recent Changes

v0.3.13 (Latest)

🚀 Universal Streaming Format Abstraction

StreamFormat Enum: Support for JSON, SSE, and NDJSON output formats
StreamChunk Universal Container: Unified abstraction for all streaming responses
Format Conversion Methods: to_json(), to_sse(), to_ndjson(), to_format()
Content Extraction: Universal extract_content() method for both OpenAI and Ollama formats

🎯 New Streaming Methods:

chat_stream_universal() - Most flexible interface with full format control
chat_stream_sse() - Convenient Server-Sent Events format for web apps
chat_stream_ndjson() - Convenient Newline-Delimited JSON for data pipelines
Enhanced StreamingConfig with separate content and output format controls

🔧 Architecture Improvements:

Separation of Concerns: Content format (OpenAI/Ollama) vs Output format (JSON/SSE/NDJSON)
Format Abstraction: No more hardcoded JSON strings in streaming responses
Extensible Design: Easy to add new output formats in the future
Type Safety: Strong typing for all format options

💡 Use Cases:

Web Applications: Use SSE format for real-time streaming
API Services: Use JSON format for standard responses
Data Processing: Use NDJSON format for logs and pipelines
Tool Integration: Combine any content format with any output format

📚 Enhanced Documentation:

Comprehensive format comparison table
Detailed usage examples for each format
Clear migration guide from previous versions

v0.3.12

🔧 Critical Fix: Pure Ollama Format Streaming

Fixed Double Format Issue: chat_stream_ollama() now returns pure Ollama format instead of nested format
Direct Compatibility: Perfect integration with Zed.dev and other Ollama-compatible tools
Simplified Usage: No more JSON parsing required - direct OllamaStreamChunk access
Backward Compatibility: Added chat_stream_ollama_embedded() for legacy nested format

🎯 Format Changes:

Before: Ollama JSON embedded in OpenAI format content field (required parsing)
After: Direct OllamaStreamChunk objects with native field access
New Type: OllamaChatStream for pure Ollama format streams
Enhanced API: Cleaner, more intuitive streaming interface

📚 Updated Documentation:

Clear distinction between pure and embedded Ollama formats
Updated examples with direct field access patterns
Enhanced streaming format comparison section

🧪 New Examples:

test_pure_ollama_format.rs - Validation of pure format output
Updated ollama_streaming_simple.rs - Demonstrates direct field access

v0.3.11

🚀 Major New Features:

Multiple Streaming Formats: Support for both OpenAI and Ollama streaming formats
- chat_stream_ollama() - Ollama-compatible streaming for Zed.dev integration
- chat_stream_with_format() - Custom streaming configuration
- StreamingFormat::OpenAI and StreamingFormat::Ollama options
Enhanced Tool Integration: Perfect compatibility with Zed.dev and other Ollama-compatible tools
Tencent Hunyuan Native API: Initial implementation of TC3-HMAC-SHA256 signature authentication
- hunyuan_native() - Native Tencent Cloud API support
- Full region support (ap-beijing, ap-shanghai, ap-guangzhou)
- Better error handling and debugging capabilities

🔧 Improvements:

Streaming Format Conversion: Automatic conversion between OpenAI and Ollama formats
Done Marker Handling: Proper done: true final chunk for Ollama format
Usage Statistics: Complete token usage and timing information in Ollama format
Backward Compatibility: All existing streaming code continues to work unchanged

📚 Documentation:

Complete streaming format comparison and usage examples
New examples: ollama_streaming_simple.rs, streaming_ollama_format.rs
Updated README with detailed format explanations
Enhanced troubleshooting guides for streaming

🎯 Breaking Changes:

None - all changes are backward compatible

v0.3.8

🚀 Major Stability and Debugging Improvements:

Enhanced Timeout Configuration: All providers now support custom timeout settings
- LlmClient::openai_with_timeout() - OpenAI with custom timeout
- LlmClient::anthropic_with_timeout() - Anthropic with custom timeout
- LlmClient::zhipu_with_timeout() - Zhipu with custom timeout
- Default timeout increased to 30 seconds for better stability
Advanced Debugging Support: Comprehensive request/response debugging
- LLM_DEBUG_REQUEST_RAW=1 - Show detailed request information
- LLM_DEBUG_RESPONSE_RAW=1 - Show response status and headers
- LLM_DEBUG_STREAM_RAW=1 - Show streaming response details
- Enhanced error messages with specific troubleshooting guidance
Zhipu Stability Improvements: Dedicated tools for diagnosing Zhipu API issues
- New zhipu_stability_test.rs example for comprehensive testing
- Improved error handling and timeout management
- Better connection stability monitoring

🔧 New Examples:

enhanced_error_handling.rs - Comprehensive error handling and debugging
unified_config.rs - Unified configuration interface for all providers
zhipu_stability_test.rs - Dedicated Zhipu stability testing tool

📚 Documentation:

Updated troubleshooting guides with timeout configuration
Enhanced error handling examples
Improved debugging instructions

v0.3.1

🚀 Major New Features:

Complete Ollama Model Management: Full CRUD operations for local models
- list_models() - List all installed models
- pull_model() - Download models from registry
- push_model() - Upload models to registry
- delete_model() - Remove local models
- show_model() - Get detailed model information
Enhanced Anthropic Streaming: Proper event state management
- Correct handling of message_start, content_block_delta, message_delta, message_stop events
- Real-time token usage tracking during streaming
- Improved error resilience and state management

🔧 Improvements:

Expanded Model Discovery Support:
- Added Ollama model listing via /api/tags endpoint
- Limited Anthropic model discovery support
Enhanced Client Interface: New methods for Ollama model management
Updated Examples: Added comprehensive model management and streaming examples

📚 Documentation:

Complete rewrite of Ollama section with model management examples
Enhanced streaming documentation with code examples
Updated feature descriptions and supported operations

v0.2.3

🔧 Breaking Changes:

Removed supported_models() method - Use fetch_models() instead
Removed supports_model() method - No longer needed

✨ New Features:

Improved error messages - Removed confusing OpenAI URLs for other providers
New debugging tools:
- examples/test_keys_yaml.rs - Test all API keys
- examples/debug_deepseek.rs - Debug DeepSeek authentication
Comprehensive documentation:
- TROUBLESHOOTING.md - Troubleshooting guide
- HOW_TO_TEST_YOUR_KEYS.md - Testing instructions
- TEST_YOUR_DEEPSEEK_KEY.md - Quick start guide

Migration from v0.2.2:

// ❌ Old (no longer works)
let models = client.supported_models();

// ✅ New
let models = client.fetch_models().await?;

v0.2.2

✨ New Features:

Added fetch_models() for online model discovery
OpenAI protocol supports dynamic model fetching from /v1/models endpoint
Works with OpenAI-compatible providers (DeepSeek, Zhipu, Moonshot, etc.)

Design Philosophy

Minimal by Design:

Only 6 protocols to cover all major LLM providers
No hardcoded model restrictions - use any model name
No complex configuration files or registries
Direct API usage with clear abstractions

Protocol-first:

Group providers by API protocol, not by company
OpenAI-compatible providers share one implementation
Extensible through protocol adapters
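
The protocol-first idea can be sketched as a lookup from provider name to protocol, where several providers resolve to the same entry. `Protocol` and `protocol_for` are hypothetical names for this illustration; the variants follow the six protocols listed in this README, and the provider names are the OpenAI-compatible ones mentioned above.

```rust
/// Hypothetical protocol tag; one variant per protocol listed in this README.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Protocol {
    OpenAi,
    Anthropic,
    Zhipu,
    Aliyun,
    Hunyuan,
    Ollama,
}

/// Map a provider name to the protocol that serves it.
/// Several providers can share one protocol implementation.
fn protocol_for(provider: &str) -> Option<Protocol> {
    match provider.to_ascii_lowercase().as_str() {
        // OpenAI-compatible providers all resolve to the same protocol.
        "openai" | "deepseek" | "moonshot" => Some(Protocol::OpenAi),
        "anthropic" => Some(Protocol::Anthropic),
        "zhipu" => Some(Protocol::Zhipu),
        "aliyun" => Some(Protocol::Aliyun),
        "hunyuan" => Some(Protocol::Hunyuan),
        "ollama" => Some(Protocol::Ollama),
        _ => None,
    }
}

fn main() {
    for provider in ["OpenAI", "DeepSeek", "Moonshot", "Anthropic", "Unknown"] {
        println!("{} -> {:?}", provider, protocol_for(provider));
    }
}
```

Adding another OpenAI-compatible provider in this model means adding a name to an existing arm rather than writing a new implementation.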

Examples

Check out the examples/ directory:

# Test your API keys from keys.yaml
cargo run --example test_keys_yaml

# Debug DeepSeek authentication
cargo run --example debug_deepseek -- sk-your-key

# Simple fetch_models() demo
cargo run --example fetch_models_simple

# Ollama model management (NEW!)
cargo run --example ollama_model_management

# Anthropic streaming (NEW! - requires streaming feature)
cargo run --example anthropic_streaming --features streaming

# Ollama streaming (NEW! - requires streaming feature)
cargo run --example ollama_streaming --features streaming

# LongCat demo (OpenAI/Anthropic compatible)
cargo run --example longcat_dual

Example Descriptions

test_keys_yaml.rs ⭐ New!

Tests all API keys from your keys.yaml file
Validates API key format and authentication
Provides specific troubleshooting for each error
Run this first if you have authentication issues!

debug_deepseek.rs ⭐ New!

Interactive debugging tool for DeepSeek API
Validates API key format
Tests model fetching and chat requests
Provides detailed troubleshooting guidance

fetch_models_simple.rs

Simple demonstration of fetch_models()
Shows how to fetch models from OpenAI-compatible providers
Includes usage recommendations

ollama_model_management.rs ⭐ New!

Demonstrates complete Ollama model management functionality
Shows how to list, pull, delete, and get model details
Includes error handling and practical usage examples

anthropic_streaming.rs ⭐ New!

Shows enhanced Anthropic streaming with proper event handling
Demonstrates real-time response streaming and usage tracking
Includes both regular and streaming chat examples

Removed redundant examples

test_fetch_models.rs and test_with_keys.rs overlapped with other examples and have been removed.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT
